Re: release of debug/test tools for dapl and verbs

2009-12-07 Thread Devesh Sharma
Hello Chien,
Thanks for your prompt response. In addition to the TED explanation,
here is some elaboration on Vertde.

As long as the ib-verbs API remains the same, it should work with iWARP
hardware as well; however, we have not tested this path. This will be
taken care of in future releases.

Vertde is developed to test and debug kernel-level ib-verbs. It does
not use the IB or iWARP protocol-dependent connection management
service for path and address resolution; this has to be done manually
(see the sketch below).
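
To illustrate what "manually" means here, below is a minimal sketch of
the kind of setup the test has to do by hand when no CM is used. It is
illustrative only: the remote LID, QPN and PSN are assumed to have been
exchanged out of band (for example over a socket), and error handling
is omitted.

/* Illustrative only: move an RC QP to RTR without CM-based path
 * resolution.  Requires <rdma/ib_verbs.h>.
 */
static int move_qp_to_rtr(struct ib_qp *qp, u16 remote_lid,
			  u32 remote_qpn, u32 remote_psn)
{
	struct ib_qp_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.qp_state		= IB_QPS_RTR;
	attr.path_mtu		= IB_MTU_1024;
	attr.dest_qp_num	= remote_qpn;
	attr.rq_psn		= remote_psn;
	attr.max_dest_rd_atomic	= 1;
	attr.min_rnr_timer	= 12;
	attr.ah_attr.dlid	= remote_lid;	/* resolved by hand, not by CM */
	attr.ah_attr.sl		= 0;
	attr.ah_attr.port_num	= 1;

	return ib_modify_qp(qp, &attr,
			    IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
			    IB_QP_DEST_QPN | IB_QP_RQ_PSN |
			    IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER);
}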

Comments, suggestions and feedback are welcome.

Thanks & Regards
Devesh

On Mon, Dec 7, 2009 at 5:29 PM, yogeshwar sonawane  wrote:
> Hi chien,
>
> TED is an environment on top of the uDAPL library. Hence TED can be run
> over iWARP adapters using their uDAPL libraries.
>
> Currently, TED source has uDAPL provider limits set for OFED uDAPL
> library. Kindly have a look at them (dbg_def.h under src/include path)
> and if required, change them. Things should work then.
>
> So far, we have not tested TED with iWARP adapters. We will try to do
> so in the future.
>
> Any questions/suggestions/feedback is welcome.
>
> Thanks & Regards,
> Yogeshwar
>
> On Fri, Dec 4, 2009 at 9:52 PM, Tung, Chien Tin
>  wrote:
>>>can be used over any implementation of uDAPL/verbs over IB. There will be
>>>regular updates with new features, bug fixes, docs, etc. Suggestions are
>>>welcome.
>>
>> Will your tools work for iWARP adapters as well?
>>
>> Chien


Promiscuous mode support in IPoIB

2011-06-23 Thread Devesh Sharma
Hello list,

Does IPoIB support promiscuous mode? Is it possible to implement this
mode using IB verbs?



Re: Promiscuous mode support in IPoIB

2011-07-18 Thread Devesh Sharma
Hello List,

Could someone kindly help me with this? I need to know about promiscuous
mode support in IPoIB, and I am not sure whether it is possible at all on
the IPoIB stack. If yes, then how?

On Fri, Jun 24, 2011 at 11:19 AM, Devesh Sharma  wrote:
> Hello list,
>
> Does IPoIB support promiscuous mode? Is it possible to implement this
> mode using IB verbs?
>





Re: Promiscuous mode support in IPoIB

2011-07-20 Thread Devesh Sharma
Thanks, Woodruff, for your response. Eli, could you please throw some
light on this? Thank you for your help and time.

On Mon, Jul 18, 2011 at 11:55 PM, Woodruff, Robert J
 wrote:
> Devesh Sharma wrote,
>
>>Hello List,
>
>>Could someone kindly help me with this? I need to know about promiscuous
>>mode support in IPoIB, and I am not sure whether it is possible at all on
>>the IPoIB stack. If yes, then how?
>
> I am not sure that it is supported, but Eli would know for sure.





RE: [PATCH 15/30] RDMA/ocrdma: changes to support RoCE-v2 in UD path

2015-02-21 Thread Devesh Sharma
Hi Som,

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Somnath Kotur
> Sent: Friday, February 20, 2015 3:33 AM
> To: rol...@kernel.org
> Cc: linux-rdma@vger.kernel.org; Devesh Sharma; Somnath Kotur
> Subject: [PATCH 15/30] RDMA/ocrdma: changes to support RoCE-v2 in UD path
> 
> From: Devesh Sharma 
> 
> To support UD protocol this patch adds following changes to existing UD
> implementation.
> 
> 1. AH creation resolves gid-type for a given index.
> 2. Based on GID-type protocol header is built.
> 3. Work completion reports l3-type if f/w supports RoCE-v2
>and sets IB_WC_WITH_NETWORK_HDR_TYPE flag in wc->wc_flags.
> 
> Signed-off-by: Somnath Kotur 
> Signed-off-by: Devesh Sharma 
> ---
>  drivers/infiniband/hw/ocrdma/ocrdma.h   |1 +
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c|   68
> ++-
>  drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |5 ++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   23 +++--
>  4 files changed, 80 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h
> b/drivers/infiniband/hw/ocrdma/ocrdma.h
> index 97f971a..302fd0e 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma.h
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
> @@ -341,6 +341,7 @@ struct ocrdma_ah {
>   struct ocrdma_av *av;
>   u16 sgid_index;
>   u32 id;
> + u8 hdr_type;
>  };
> 
>  struct ocrdma_qp_hwq_info {
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> index 7ecd230..70a885b 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> @@ -39,6 +39,20 @@
> 
>  #define OCRDMA_VID_PCP_SHIFT 0xD
> 
> +static u16 ocrdma_hdr_type_to_proto_num(u8 hdr_type) {
> + switch (hdr_type) {
> + case OCRDMA_L3_TYPE_IB_GRH:
> + return (u16)0x8915;
> + case OCRDMA_L3_TYPE_IPV4:
> + return (u16)0x0800;
> + case OCRDMA_L3_TYPE_IPV6:
> + return (u16)0x86dd;
> + default:
> + return 0;
> + }
> +}
> +
>  static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
>   struct ib_ah_attr *attr, union ib_gid *sgid,
>   int pdid, bool *isvlan, u16 vlan_tag) @@ -47,22 +61,32
> @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah
> *ah,
>   struct ocrdma_eth_vlan eth;
>   struct ocrdma_grh grh;
>   int eth_sz;
> + u16 proto_num = 0;
> + struct iphdr ipv4;
> + union {
> + struct sockaddr _sockaddr;
> + struct sockaddr_in  _sockaddr_in;
> + struct sockaddr_in6 _sockaddr_in6;
> + } sgid_addr, dgid_addr;
> 
>   memset(ð, 0, sizeof(eth));
>   memset(&grh, 0, sizeof(grh));
> + /* Protocol Number */
> + proto_num = ocrdma_hdr_type_to_proto_num(ah->hdr_type);
> +
> 
>   /* VLAN */
>   if (!vlan_tag || (vlan_tag > 0xFFF))
>   vlan_tag = dev->pvid;
>   if (vlan_tag && (vlan_tag < 0x1000)) {
>   eth.eth_type = cpu_to_be16(0x8100);
> - eth.roce_eth_type =
> cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
> + eth.roce_eth_type = cpu_to_be16(proto_num);
>   vlan_tag |= (dev->sl & 0x07) << OCRDMA_VID_PCP_SHIFT;
>   eth.vlan_tag = cpu_to_be16(vlan_tag);
>   eth_sz = sizeof(struct ocrdma_eth_vlan);
>   *isvlan = true;
>   } else {
> - eth.eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
> + eth.eth_type = cpu_to_be16(proto_num);
>   eth_sz = sizeof(struct ocrdma_eth_basic);
>   }
>   /* MAC */
> @@ -71,18 +95,34 @@ static inline int set_av_attr(struct ocrdma_dev *dev,
> struct ocrdma_ah *ah,
>   if (status)
>   return status;
>   ah->sgid_index = attr->grh.sgid_index;
> - memcpy(&grh.sgid[0], sgid->raw, sizeof(union ib_gid));
> - memcpy(&grh.dgid[0], attr->grh.dgid.raw, sizeof(attr->grh.dgid.raw));
> -
> - grh.tclass_flow = cpu_to_be32((6 << 28) |
> - (attr->grh.traffic_class << 24) |
> - attr->grh.flow_label);
> - /* 0x1b is next header value in GRH */
> - grh.pdid_hoplimit = cpu_to_be32((pdid << 16) |
> - (0x1b << 8) | attr->grh.hop_limit);
>   /* Eth HDR */
>   memcpy(&ah->av->eth_hdr, ð, eth_sz);
> - memcpy((u8 *)ah->av + eth_sz, &grh, sizeof(stru

RE: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-22 Thread Devesh Sharma
Hi Matan,

Please find a comment inline below:

-Regards
Devesh
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Somnath Kotur
> Sent: Friday, February 20, 2015 3:32 AM
> To: rol...@kernel.org
> Cc: linux-rdma@vger.kernel.org; Matan Barak; Somnath Kotur
> Subject: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use
> roce_gid_cache
> 
> From: Matan Barak 
> 
> Previously, we resolved the dmac and took the smac and vlan from the resolved
> address. Changing that into finding a net device that matches the IP and vlan 
> of
> the network packet and querying the RoCE GID cache for this net device, GID
> and GID type.
> 
> ocrdma driver changes were done by Somnath Kotur
> 
> 
> Signed-off-by: Matan Barak 
> Signed-off-by: Somnath Kotur 
> ---
>  drivers/infiniband/core/addr.c   |3 +-
>  drivers/infiniband/core/cm.c |   30 --
>  drivers/infiniband/core/cma.c|9 --
>  drivers/infiniband/core/core_priv.h  |4 +-
>  drivers/infiniband/core/sa_query.c   |4 -
>  drivers/infiniband/core/ucma.c   |1 -
>  drivers/infiniband/core/uverbs_cmd.c |6 +-
>  drivers/infiniband/core/verbs.c  |  159 +
>  drivers/infiniband/hw/mlx4/ah.c  |   15 +++-
>  drivers/infiniband/hw/mlx4/mad.c |   12 ++-
>  drivers/infiniband/hw/mlx4/mcg.c |2 +-
>  drivers/infiniband/hw/mlx4/mlx4_ib.h |2 +-
>  drivers/infiniband/hw/mlx4/qp.c  |   42 ++--
>  drivers/infiniband/hw/ocrdma/ocrdma.h|1 +
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   20 +++--
>  drivers/infiniband/hw/ocrdma/ocrdma_hw.c |   17 ++-
>  include/rdma/ib_addr.h   |2 +-
>  include/rdma/ib_sa.h |2 -
>  include/rdma/ib_verbs.h  |7 +-
>  19 files changed, 183 insertions(+), 155 deletions(-)
> 
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index f80da50..43af7f5 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr
> *src_addr,  }
> 
>  int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8
> *dmac,
> -u16 *vlan_id)
> +u16 *vlan_id, int if_index)
>  {
>   int ret = 0;
>   struct rdma_dev_addr dev_addr;
> @@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid,
> union ib_gid *dgid, u8 *dmac,
>   return ret;
> 
>   memset(&dev_addr, 0, sizeof(dev_addr));
> + dev_addr.bound_dev_if = if_index;
> 
>   ctx.addr = &dev_addr;
>   init_completion(&ctx.comp);
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index
> d88f2ae..7974e74 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -178,8 +178,6 @@ struct cm_av {
>   struct ib_ah_attr ah_attr;
>   u16 pkey_index;
>   u8 timeout;
> - u8  valid;
> - u8  smac[ETH_ALEN];
>  };
> 
>  struct cm_work {
> @@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec
> *path, struct cm_av *av)
>&av->ah_attr);
>   av->timeout = path->packet_life_time + 1;
> 
> - av->valid = 1;
>   return 0;
>  }
> 
> @@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work *work)
>   cm_format_paths_from_req(req_msg, &work->path[0], &work-
> >path[1]);
> 
>   memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac,
> ETH_ALEN);
> - work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
>   ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
>   if (ret) {
>   ib_get_cached_gid(work->port->cm_dev->ib_device,
> @@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private
> *cm_id_priv,
>   *qp_attr_mask = IB_QP_STATE | IB_QP_AV |
> IB_QP_PATH_MTU |
>   IB_QP_DEST_QPN | IB_QP_RQ_PSN;
>   qp_attr->ah_attr = cm_id_priv->av.ah_attr;
> - if (!cm_id_priv->av.valid) {
> - spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> - return -EINVAL;
> - }
> - if (cm_id_priv->av.ah_attr.vlan_id != 0x) {
> - qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
> - *qp_attr_mask |= IB_QP_VID;
> - }
> - if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
> - memcpy(qp_attr->smac, cm_id_priv->av.smac,
> -sizeof(qp_attr->smac));
> - *qp_attr_mask |= IB_QP_SMAC;
> - }
> - if (cm_id_priv->alt_av.valid) {
> - if (cm_id_priv->alt_av.ah_attr.vlan_id != 0x) {
> - qp_attr->alt_vlan_id =
> -   

RE: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-23 Thread Devesh Sharma

> -Original Message-
> From: Matan Barak [mailto:mat...@mellanox.com]
> Sent: Monday, February 23, 2015 3:47 PM
> To: Devesh Sharma; Somnath Kotur; rol...@kernel.org
> Cc: linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use
> roce_gid_cache
> 
> 
> 
> On 2/23/2015 7:25 AM, Devesh Sharma wrote:
> > Hi Matan,
> >
> > Please find a comment inline below:
> >
> > -Regards
> > Devesh
> >> -Original Message-
> >> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> >> ow...@vger.kernel.org] On Behalf Of Somnath Kotur
> >> Sent: Friday, February 20, 2015 3:32 AM
> >> To: rol...@kernel.org
> >> Cc: linux-rdma@vger.kernel.org; Matan Barak; Somnath Kotur
> >> Subject: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to
> >> use roce_gid_cache
> >>
> >> From: Matan Barak 
> >>
> >> Previously, we resolved the dmac and took the smac and vlan from the
> >> resolved address. Changing that into finding a net device that
> >> matches the IP and vlan of the network packet and querying the RoCE
> >> GID cache for this net device, GID and GID type.
> >>
> >> ocrdma driver changes were done by Somnath Kotur
> >> 
> >>
> >> Signed-off-by: Matan Barak 
> >> Signed-off-by: Somnath Kotur 
> >> ---
> >>   drivers/infiniband/core/addr.c   |3 +-
> >>   drivers/infiniband/core/cm.c |   30 --
> >>   drivers/infiniband/core/cma.c|9 --
> >>   drivers/infiniband/core/core_priv.h  |4 +-
> >>   drivers/infiniband/core/sa_query.c   |4 -
> >>   drivers/infiniband/core/ucma.c   |1 -
> >>   drivers/infiniband/core/uverbs_cmd.c |6 +-
> >>   drivers/infiniband/core/verbs.c  |  159 
> >> +
> >>   drivers/infiniband/hw/mlx4/ah.c  |   15 +++-
> >>   drivers/infiniband/hw/mlx4/mad.c |   12 ++-
> >>   drivers/infiniband/hw/mlx4/mcg.c |2 +-
> >>   drivers/infiniband/hw/mlx4/mlx4_ib.h |2 +-
> >>   drivers/infiniband/hw/mlx4/qp.c  |   42 ++--
> >>   drivers/infiniband/hw/ocrdma/ocrdma.h|1 +
> >>   drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   20 +++--
> >>   drivers/infiniband/hw/ocrdma/ocrdma_hw.c |   17 ++-
> >>   include/rdma/ib_addr.h   |2 +-
> >>   include/rdma/ib_sa.h |2 -
> >>   include/rdma/ib_verbs.h  |7 +-
> >>   19 files changed, 183 insertions(+), 155 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/core/addr.c
> >> b/drivers/infiniband/core/addr.c index f80da50..43af7f5 100644
> >> --- a/drivers/infiniband/core/addr.c
> >> +++ b/drivers/infiniband/core/addr.c
> >> @@ -458,7 +458,7 @@ static void resolve_cb(int status, struct
> >> sockaddr *src_addr,  }
> >>
> >>   int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid
> >> *dgid, u8 *dmac,
> >> - u16 *vlan_id)
> >> + u16 *vlan_id, int if_index)
> >>   {
> >>int ret = 0;
> >>struct rdma_dev_addr dev_addr;
> >> @@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid
> >> *sgid, union ib_gid *dgid, u8 *dmac,
> >>return ret;
> >>
> >>memset(&dev_addr, 0, sizeof(dev_addr));
> >> +  dev_addr.bound_dev_if = if_index;
> >>
> >>ctx.addr = &dev_addr;
> >>init_completion(&ctx.comp);
> >> diff --git a/drivers/infiniband/core/cm.c
> >> b/drivers/infiniband/core/cm.c index
> >> d88f2ae..7974e74 100644
> >> --- a/drivers/infiniband/core/cm.c
> >> +++ b/drivers/infiniband/core/cm.c
> >> @@ -178,8 +178,6 @@ struct cm_av {
> >>struct ib_ah_attr ah_attr;
> >>u16 pkey_index;
> >>u8 timeout;
> >> -  u8  valid;
> >> -  u8  smac[ETH_ALEN];
> >>   };
> >>
> >>   struct cm_work {
> >> @@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct
> >> ib_sa_path_rec *path, struct cm_av *av)
> >> &av->ah_attr);
> >>av->timeout = path->packet_life_time + 1;
> >>
> >> -  av->valid = 1;
> >>return 0;
> >>   }
> >>
> >> @@ -1563,7 +1560,6 @@ static int cm

RE: RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter

2015-02-26 Thread Devesh Sharma
Thanks, Dan, for pointing this out. We will address this issue and send out a
patch to fix it.
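
For reference, a possible shape of the fix, following Dan's reading of the
comment, might look like the sketch below. This is only a sketch: it assumes
ocrdma_qp_state_change() returns non-zero when the new and previous states
are identical, and the actual patch may end up looking different.

	if (attr_mask & IB_QP_STATE)
		status = ocrdma_qp_state_change(qp, attr->qp_state, &old_qps);
	/* If the new and previous states are the same, the hardware does
	 * not need to be told about it, so skip the mailbox command.
	 */
	if (status)
		return 0;
	status = ocrdma_mbx_modify_qp(dev, qp, attr, attr_mask);

	return status;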

-Regards
Devesh

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Dan Carpenter
> Sent: Thursday, February 26, 2015 6:00 PM
> To: Parav Pandit
> Cc: linux-rdma@vger.kernel.org
> Subject: re: RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA
> adapter
> 
> Hello Parav Pandit,
> 
> The patch fe2caefcdf58: "RDMA/ocrdma: Add driver for Emulex OneConnect
> IBoE RDMA adapter" from Mar 21, 2012, leads to the following static checker
> warning:
> 
>   drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1426
> _ocrdma_modify_qp()
>   warn: bool is not less than zero.
> 
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
>   1411  int _ocrdma_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
>   1412int attr_mask)
>   1413  {
>   1414  int status = 0;
>   1415  struct ocrdma_qp *qp;
>   1416  struct ocrdma_dev *dev;
>   1417  enum ib_qp_state old_qps;
>   1418
>   1419  qp = get_ocrdma_qp(ibqp);
>   1420  dev = get_ocrdma_dev(ibqp->device);
>   1421  if (attr_mask & IB_QP_STATE)
>   1422  status = ocrdma_qp_state_change(qp, attr->qp_state,
> &old_qps);
>   1423  /* if new and previous states are same hw doesn't need to
>   1424   * know about it.
>   1425   */
>   1426  if (status < 0)
> 
> This check is never true.  Based on the comment then the check should
> be:
>   if (status == 1)
>   return SOMETHING;
> 
>   1427  return status;
>   1428  status = ocrdma_mbx_modify_qp(dev, qp, attr, attr_mask);
>   1429
>   1430  return status;
>   1431  }
> 
> regards,
> dan carpenter


RE: [PATCH v1 01/16] xprtrdma: Display IPv6 addresses and port numbers correctly

2015-03-24 Thread Devesh Sharma
I see that in the svcrdma code there is a check that aborts listener creation
if the address family is not AF_INET (a sketch of a possible relaxation
follows the quoted code below).
Do we have plans to address this on the server side as well?

static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
struct net *net,
struct sockaddr *sa, int salen,
int flags)
{
struct rdma_cm_id *listen_id;
struct svcxprt_rdma *cma_xprt;
int ret;

dprintk("svcrdma: Creating RDMA socket\n");
if (sa->sa_family != AF_INET) {
dprintk("svcrdma: Address family %d is not supported.\n", 
sa->sa_family);
return ERR_PTR(-EAFNOSUPPORT);
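
For reference, a minimal sketch of how that check might be relaxed once IPv6
listeners are supported (illustrative only; the rest of svcrdma would also
need to handle AF_INET6 addresses for this to be sufficient):

	dprintk("svcrdma: Creating RDMA socket\n");
	if (sa->sa_family != AF_INET && sa->sa_family != AF_INET6) {
		dprintk("svcrdma: Address family %d is not supported.\n",
			sa->sa_family);
		return ERR_PTR(-EAFNOSUPPORT);
	}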

-Regards
Devesh

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Saturday, March 14, 2015 2:57 AM
> To: linux-rdma@vger.kernel.org
> Subject: [PATCH v1 01/16] xprtrdma: Display IPv6 addresses and port numbers
> correctly
> 
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/transport.c |   47
> ---
>  net/sunrpc/xprtrdma/verbs.c |   21 +++--
>  2 files changed, 47 insertions(+), 21 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index 2e192ba..26a62e7 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -157,12 +157,47 @@ static struct ctl_table sunrpc_table[] = {
>  static struct rpc_xprt_ops xprt_rdma_procs;  /* forward reference */
> 
>  static void
> +xprt_rdma_format_addresses4(struct rpc_xprt *xprt, struct sockaddr
> +*sap) {
> + struct sockaddr_in *sin = (struct sockaddr_in *)sap;
> + char buf[20];
> +
> + snprintf(buf, sizeof(buf), "%08x", ntohl(sin->sin_addr.s_addr));
> + xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf,
> +GFP_KERNEL);
> +
> + xprt->address_strings[RPC_DISPLAY_NETID] = "rdma"; }
> +
> +static void
> +xprt_rdma_format_addresses6(struct rpc_xprt *xprt, struct sockaddr
> +*sap) {
> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
> + char buf[40];
> +
> + snprintf(buf, sizeof(buf), "%pi6", &sin6->sin6_addr);
> + xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf,
> +GFP_KERNEL);
> +
> + xprt->address_strings[RPC_DISPLAY_NETID] = "rdma6"; }
> +
> +static void
>  xprt_rdma_format_addresses(struct rpc_xprt *xprt)  {
>   struct sockaddr *sap = (struct sockaddr *)
>   &rpcx_to_rdmad(xprt).addr;
> - struct sockaddr_in *sin = (struct sockaddr_in *)sap;
> - char buf[64];
> + char buf[128];
> +
> + switch (sap->sa_family) {
> + case AF_INET:
> + xprt_rdma_format_addresses4(xprt, sap);
> + break;
> + case AF_INET6:
> + xprt_rdma_format_addresses6(xprt, sap);
> + break;
> + default:
> + pr_err("rpcrdma: Unrecognized address family\n");
> + return;
> + }
> 
>   (void)rpc_ntop(sap, buf, sizeof(buf));
>   xprt->address_strings[RPC_DISPLAY_ADDR] = kstrdup(buf,
> GFP_KERNEL); @@ -170,16 +205,10 @@ xprt_rdma_format_addresses(struct
> rpc_xprt *xprt)
>   snprintf(buf, sizeof(buf), "%u", rpc_get_port(sap));
>   xprt->address_strings[RPC_DISPLAY_PORT] = kstrdup(buf,
> GFP_KERNEL);
> 
> - xprt->address_strings[RPC_DISPLAY_PROTO] = "rdma";
> -
> - snprintf(buf, sizeof(buf), "%08x", ntohl(sin->sin_addr.s_addr));
> - xprt->address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf,
> GFP_KERNEL);
> -
>   snprintf(buf, sizeof(buf), "%4hx", rpc_get_port(sap));
>   xprt->address_strings[RPC_DISPLAY_HEX_PORT] = kstrdup(buf,
> GFP_KERNEL);
> 
> - /* netid */
> - xprt->address_strings[RPC_DISPLAY_NETID] = "rdma";
> + xprt->address_strings[RPC_DISPLAY_PROTO] = "rdma";
>  }
> 
>  static void
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index
> 124676c..1aa55b7 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "xprt_rdma.h"
> @@ -424,7 +425,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct
> rdma_cm_event *event)
>   struct rpcrdma_ia *ia = &xprt->rx_ia;
>   struct rpcrdma_ep *ep = &xprt->rx_ep;
>  #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
> - struct sockaddr_in *addr = (struct sockaddr_in *) &ep-
> >rep_remote_addr;
> + struct sockaddr *sap = (struct sockaddr *)&ep->rep_remote_addr;
>  #endif
>   struct ib_qp_attr *attr = &ia->ri_qp_attr;
>   struct ib_qp_init_attr *iattr = &ia->ri_qp_init_attr; @@ -480,9 +481,8
> @@ connected:
>   wake_up_all(&ep->rep_connect_wait);
>   /*FALLTHROUGH*/
>   default:
> - dprintk("RPC:   %s: %pI4:%u (ep 0x%p): %s\n",
> - __func__, &addr->sin

RE: [PATCH v1 06/16] xprtrdma: Add a "deregister_external" op for each memreg mode

2015-03-24 Thread Devesh Sharma
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Saturday, March 14, 2015 2:57 AM
> To: linux-rdma@vger.kernel.org
> Subject: [PATCH v1 06/16] xprtrdma: Add a "deregister_external" op for each
> memreg mode
> 
> There is very little common processing among the different external memory
> deregistration functions.
> 
> In addition, instead of calling the deregistration function for each segment,
> have one call release all segments for a request. This makes the API a little
> asymmetrical, but a hair faster.
> 
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/fmr_ops.c  |   37 
>  net/sunrpc/xprtrdma/frwr_ops.c |   46 
>  net/sunrpc/xprtrdma/physical_ops.c |   13 ++
>  net/sunrpc/xprtrdma/rpc_rdma.c |7 +--
>  net/sunrpc/xprtrdma/transport.c|8 +---
>  net/sunrpc/xprtrdma/verbs.c|   81 
> 
>  net/sunrpc/xprtrdma/xprt_rdma.h|5 +-
>  7 files changed, 103 insertions(+), 94 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
> index 45fb646..9b983b4 100644
> --- a/net/sunrpc/xprtrdma/fmr_ops.c
> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
> @@ -20,6 +20,32 @@
>  /* Maximum scatter/gather per FMR */
>  #define RPCRDMA_MAX_FMR_SGES (64)
> 
> +/* Use the ib_unmap_fmr() verb to prevent further remote
> + * access via RDMA READ or RDMA WRITE.
> + */
> +static int
> +__fmr_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) {
> + struct rpcrdma_ia *ia = &r_xprt->rx_ia;
> + struct rpcrdma_mr_seg *seg1 = seg;
> + int rc, nsegs = seg->mr_nsegs;
> + LIST_HEAD(l);
> +
> + list_add(&seg1->rl_mw->r.fmr->list, &l);
> + rc = ib_unmap_fmr(&l);
> + read_lock(&ia->ri_qplock);
> + while (seg1->mr_nsegs--)
> + rpcrdma_unmap_one(ia, seg++);
> + read_unlock(&ia->ri_qplock);
> + if (rc)
> + goto out_err;
> + return nsegs;
> +
> +out_err:
> + dprintk("RPC:   %s: ib_unmap_fmr status %i\n", __func__, rc);
> + return nsegs;
> +}
> +
>  /* FMR mode conveys up to 64 pages of payload per chunk segment.
>   */
>  static size_t
> @@ -79,8 +105,19 @@ out_maperr:
>   return rc;
>  }
> 
> +static void
> +fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
> +  unsigned int count)
> +{
> + unsigned int i;
> +
> + for (i = 0; count--;)
> + i += __fmr_unmap(r_xprt, &req->rl_segments[i]); }
> +
>  const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
>   .ro_map = fmr_op_map,
> + .ro_unmap   = fmr_op_unmap,
>   .ro_maxpages= fmr_op_maxpages,
>   .ro_displayname = "fmr",
>  };
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 2b5ccb0..05b5761 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -17,6 +17,41 @@
>  # define RPCDBG_FACILITY RPCDBG_TRANS
>  #endif
> 
> +/* Post a LOCAL_INV Work Request to prevent further remote access
> + * via RDMA READ or RDMA WRITE.
> + */
> +static int
> +__frwr_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) {
> + struct rpcrdma_mr_seg *seg1 = seg;
> + struct rpcrdma_ia *ia = &r_xprt->rx_ia;
> + struct ib_send_wr invalidate_wr, *bad_wr;
> + int rc, nsegs = seg->mr_nsegs;
> +
> + seg1->rl_mw->r.frmr.fr_state = FRMR_IS_INVALID;
> +
> + memset(&invalidate_wr, 0, sizeof(invalidate_wr));
> + invalidate_wr.wr_id = (unsigned long)(void *)seg1->rl_mw;
> + invalidate_wr.opcode = IB_WR_LOCAL_INV;
> + invalidate_wr.ex.invalidate_rkey = seg1->rl_mw->r.frmr.fr_mr->rkey;
> + DECR_CQCOUNT(&r_xprt->rx_ep);
> +
> + read_lock(&ia->ri_qplock);
> + while (seg1->mr_nsegs--)
> + rpcrdma_unmap_one(ia, seg++);
> + rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
> + read_unlock(&ia->ri_qplock);
> + if (rc)
> + goto out_err;
> + return nsegs;
> +
> +out_err:
> + /* Force rpcrdma_buffer_get() to retry */
> + seg1->rl_mw->r.frmr.fr_state = FRMR_IS_STALE;
> + dprintk("RPC:   %s: ib_post_send status %i\n", __func__, rc);
> + return nsegs;
> +}
> +
>  /* FRWR mode conveys a list of pages per chunk segment. The
>   * maximum length of that list is the FRWR page list depth.
>   */
> @@ -116,8 +151,19 @@ out_err:
>   return rc;
>  }
> 
> +static void
> +frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
> +   unsigned int count)
> +{
> + unsigned int i;
> +
> + for (i = 0; count--;)
> + i += __frwr_unmap(r_xprt, &req->rl_segments[i]); }
> +
>  const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
>   .ro_map = frwr_op_map,
> + .ro_unmap   = frwr_op_un

RE: [PATCH v1 08/16] xprtrdma: Add "reset MRs" memreg op

2015-03-24 Thread Devesh Sharma
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Saturday, March 14, 2015 2:58 AM
> To: linux-rdma@vger.kernel.org
> Subject: [PATCH v1 08/16] xprtrdma: Add "reset MRs" memreg op
> 
> This method is invoked when a transport instance is about to be reconnected.
> Each Memory Region object is reset to its initial state.
> 
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/fmr_ops.c  |   23 
>  net/sunrpc/xprtrdma/frwr_ops.c |   46 
>  net/sunrpc/xprtrdma/physical_ops.c |6 ++
>  net/sunrpc/xprtrdma/verbs.c|  103 
> +---
>  net/sunrpc/xprtrdma/xprt_rdma.h|1
>  5 files changed, 78 insertions(+), 101 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
> index 1501db0..1ccb3de 100644
> --- a/net/sunrpc/xprtrdma/fmr_ops.c
> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
> @@ -156,10 +156,33 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct
> rpcrdma_req *req,
>   i += __fmr_unmap(r_xprt, &req->rl_segments[i]);  }
> 
> +/* After a disconnect, unmap all FMRs.
> + *
> + * This is invoked only in the transport connect worker in order
> + * to serialize with rpcrdma_register_fmr_external().
> + */
> +static void
> +fmr_op_reset(struct rpcrdma_xprt *r_xprt) {
> + struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
> + struct rpcrdma_mw *r;
> + LIST_HEAD(list);
> + int rc;
> +
> + list_for_each_entry(r, &buf->rb_all, mw_all)
> + list_add(&r->r.fmr->list, &list);
> +
> + rc = ib_unmap_fmr(&list);
> + if (rc)
> + dprintk("RPC:   %s: ib_unmap_fmr failed %i\n",
> + __func__, rc);
> +}
> +
>  const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
>   .ro_map = fmr_op_map,
>   .ro_unmap   = fmr_op_unmap,
>   .ro_maxpages= fmr_op_maxpages,
>   .ro_init= fmr_op_init,
> + .ro_reset   = fmr_op_reset,
>   .ro_displayname = "fmr",
>  };
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 975372c..b4ce0e5 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -81,6 +81,18 @@ out_err:
>   return nsegs;
>  }
> 
> +static void
> +__frwr_release(struct rpcrdma_mw *r)
> +{
> + int rc;
> +
> + rc = ib_dereg_mr(r->r.frmr.fr_mr);
> + if (rc)
> + dprintk("RPC:   %s: ib_dereg_mr status %i\n",
> + __func__, rc);
> + ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
> +}
> +
>  /* FRWR mode conveys a list of pages per chunk segment. The
>   * maximum length of that list is the FRWR page list depth.
>   */
> @@ -226,10 +238,44 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct
> rpcrdma_req *req,
>   i += __frwr_unmap(r_xprt, &req->rl_segments[i]);  }
> 
> +/* After a disconnect, a flushed FAST_REG_MR can leave an FRMR in
> + * an unusable state. Find FRMRs in this state and dereg / reg
> + * each.  FRMRs that are VALID and attached to an rpcrdma_req are
> + * also torn down.
> + *
> + * This gives all in-use FRMRs a fresh rkey and leaves them INVALID.
> + *
> + * This is invoked only in the transport connect worker in order
> + * to serialize with rpcrdma_register_frmr_external().
> + */
> +static void
> +frwr_op_reset(struct rpcrdma_xprt *r_xprt) {
> + struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
> + struct ib_device *device = r_xprt->rx_ia.ri_id->device;
> + unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
> + struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
> + struct rpcrdma_mw *r;
> + int rc;
> +
> + list_for_each_entry(r, &buf->rb_all, mw_all) {
> + if (r->r.frmr.fr_state == FRMR_IS_INVALID)
> + continue;
> +
> + __frwr_release(r);
> + rc = __frwr_init(r, pd, device, depth);
> + if (rc)
> + continue;

Should we print something here, e.g. "failed to allocate FRMR; the mount will
work with fewer FRMRs, but a performance hit is expected"?
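
Something along these lines, perhaps (just a sketch of the suggestion):

	rc = __frwr_init(r, pd, device, depth);
	if (rc) {
		dprintk("RPC:       %s: __frwr_init failed %i; "
			"continuing with fewer FRMRs\n", __func__, rc);
		continue;
	}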

> +
> + r->r.frmr.fr_state = FRMR_IS_INVALID;
> + }
> +}
> +
>  const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
>   .ro_map = frwr_op_map,
>   .ro_unmap   = frwr_op_unmap,
>   .ro_maxpages= frwr_op_maxpages,
>   .ro_init= frwr_op_init,
> + .ro_reset   = frwr_op_reset,
>   .ro_displayname = "frwr",
>  };
> diff --git a/net/sunrpc/xprtrdma/physical_ops.c
> b/net/sunrpc/xprtrdma/physical_ops.c
> index ae2b0bc..0afc691 100644
> --- a/net/sunrpc/xprtrdma/physical_ops.c
> +++ b/net/sunrpc/xprtrdma/physical_ops.c
> @@ -62,10 +62,16 @@ physical_op_unm

RE: [PATCH v1 10/16] xprtrdma: Add "open" memreg op

2015-03-24 Thread Devesh Sharma
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Saturday, March 14, 2015 2:58 AM
> To: linux-rdma@vger.kernel.org
> Subject: [PATCH v1 10/16] xprtrdma: Add "open" memreg op
> 
> The open op determines the size of various transport data structures based on
> device capabilities and memory registration mode.
> 
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/fmr_ops.c  |   22 +
>  net/sunrpc/xprtrdma/frwr_ops.c |   60
> 
>  net/sunrpc/xprtrdma/physical_ops.c |   22 +
>  net/sunrpc/xprtrdma/verbs.c|   54 ++--
>  net/sunrpc/xprtrdma/xprt_rdma.h|3 ++
>  5 files changed, 110 insertions(+), 51 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
> index 3115e4b..96e6cd3 100644
> --- a/net/sunrpc/xprtrdma/fmr_ops.c
> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
> @@ -46,6 +46,27 @@ out_err:
>   return nsegs;
>  }
> 
> +static int
> +fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
> + struct rpcrdma_create_data_internal *cdata) {
> + struct ib_device_attr *devattr = &ia->ri_devattr;
> + unsigned int wrs, max_wrs;
> +
> + max_wrs = devattr->max_qp_wr;
> + if (cdata->max_requests > max_wrs)
> + cdata->max_requests = max_wrs;
> +
> + wrs = cdata->max_requests;
> + ep->rep_attr.cap.max_send_wr = wrs;
> + ep->rep_attr.cap.max_recv_wr = wrs;
> +
> + dprintk("RPC:   %s: pre-allocating %u send WRs, %u recv WRs\n",
> + __func__, ep->rep_attr.cap.max_send_wr,
> + ep->rep_attr.cap.max_recv_wr);
> + return 0;
> +}
> +
>  /* FMR mode conveys up to 64 pages of payload per chunk segment.
>   */
>  static size_t
> @@ -201,6 +222,7 @@ fmr_op_destroy(struct rpcrdma_buffer *buf)  const
> struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
>   .ro_map = fmr_op_map,
>   .ro_unmap   = fmr_op_unmap,
> + .ro_open= fmr_op_open,
>   .ro_maxpages= fmr_op_maxpages,
>   .ro_init= fmr_op_init,
>   .ro_reset   = fmr_op_reset,
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index fc3a228..9bb4b2d 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -93,6 +93,65 @@ __frwr_release(struct rpcrdma_mw *r)
>   ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
>  }
> 
> +static int
> +frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
> +  struct rpcrdma_create_data_internal *cdata) {
> + struct ib_device_attr *devattr = &ia->ri_devattr;
> + unsigned int wrs, max_wrs;
> + int depth = 7;
> +
> + max_wrs = devattr->max_qp_wr;
> + if (cdata->max_requests > max_wrs)
> + cdata->max_requests = max_wrs;
> +
> + wrs = cdata->max_requests;
> + ep->rep_attr.cap.max_send_wr = wrs;
> + ep->rep_attr.cap.max_recv_wr = wrs;
> +
> + ia->ri_max_frmr_depth =
> + min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
> +   devattr->max_fast_reg_page_list_len);
> + dprintk("RPC:   %s: device's max FR page list len = %u\n",
> + __func__, ia->ri_max_frmr_depth);
> +
> + /* Add room for frmr register and invalidate WRs.
> +  * 1. FRMR reg WR for head
> +  * 2. FRMR invalidate WR for head
> +  * 3. N FRMR reg WRs for pagelist
> +  * 4. N FRMR invalidate WRs for pagelist
> +  * 5. FRMR reg WR for tail
> +  * 6. FRMR invalidate WR for tail
> +  * 7. The RDMA_SEND WR
> +  */
> +
> + /* Calculate N if the device max FRMR depth is smaller than
> +  * RPCRDMA_MAX_DATA_SEGS.
> +  */
> + if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
> + int delta = RPCRDMA_MAX_DATA_SEGS - ia-
> >ri_max_frmr_depth;
> +
> + do {
> + depth += 2; /* FRMR reg + invalidate */
> + delta -= ia->ri_max_frmr_depth;
> + } while (delta > 0);

Please add a check that ia->ri_max_frmr_depth is non-zero. A bug in a
provider (reporting max_fast_reg_page_list_len = 0 in query_device) would
turn this into an infinite loop, and the mount would be stuck.
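
Something like this, for example, right after ri_max_frmr_depth is computed
(a sketch only):

	if (!ia->ri_max_frmr_depth) {
		dprintk("RPC:       %s: device reports zero FRMR depth\n",
			__func__);
		return -EINVAL;
	}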

> + }
> +
> + ep->rep_attr.cap.max_send_wr *= depth;
> + if (ep->rep_attr.cap.max_send_wr > max_wrs) {
> + cdata->max_requests = max_wrs / depth;
> + if (!cdata->max_requests)
> + return -EINVAL;
> + ep->rep_attr.cap.max_send_wr = cdata->max_requests *
> +depth;
> + }
> +
> + dprintk("RPC:   %s: pre-allocating %u send WRs, %u recv WRs\n",
> + __func__, ep->rep_attr.cap.max_send_wr,
> + ep->rep_attr.cap.max_recv_wr);
> +   

RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

2015-03-26 Thread Devesh Sharma
Hi Chuck,

I have validated this set of patches with the ocrdma device; iozone passes
with them.

-Regards
Devesh

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Anna Schumaker
> Sent: Friday, March 27, 2015 12:10 AM
> To: Chuck Lever; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
> Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
> 
> Hey Chuck,
> 
> I didn't see anything that needs to be fixed up in these patches.  Are they 
> ready
> for me?
> 
> Anna
> 
> On 03/24/2015 04:30 PM, Chuck Lever wrote:
> > This is a series of client-side patches for NFS/RDMA. In preparation
> > for increasing the transport credit limit and maximum rsize/wsize,
> > I've re-factored the memory registration logic into separate files,
> > invoked via a method API.
> >
> > The two main optimizations in v1 of this series have been dropped.
> > Sagi Grimberg didn't like the complexity of the solution, and there
> > isn't enough time to rework it, test the new version, and get it
> > reviewed before the 4.1 merge window opens. I'm going to prepare these
> > for 4.2.
> >
> > Fixes suggested by reviewers have been included before the refactoring
> > patches to make it easier to backport them to previous kernels.
> >
> > The series is available in the nfs-rdma-for-4.1 topic branch at
> >
> > git://linux-nfs.org/projects/cel/cel-2.6.git
> >
> > Changes since v1:
> > - Rebased on 4.0-rc5
> > - Main optimizations postponed to 4.2
> > - Addressed review comments from Anna, Sagi, and Devesh
> >
> > ---
> >
> > Chuck Lever (15):
> >   SUNRPC: Introduce missing well-known netids
> >   xprtrdma: Display IPv6 addresses and port numbers correctly
> >   xprtrdma: Perform a full marshal on retransmit
> >   xprtrdma: Byte-align FRWR registration
> >   xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
> >   xprtrdma: Add vector of ops for each memory registration strategy
> >   xprtrdma: Add a "max_payload" op for each memreg mode
> >   xprtrdma: Add a "register_external" op for each memreg mode
> >   xprtrdma: Add a "deregister_external" op for each memreg mode
> >   xprtrdma: Add "init MRs" memreg op
> >   xprtrdma: Add "reset MRs" memreg op
> >   xprtrdma: Add "destroy MRs" memreg op
> >   xprtrdma: Add "open" memreg op
> >   xprtrdma: Handle non-SEND completions via a callout
> >   xprtrdma: Make rpcrdma_{un}map_one() into inline functions
> >
> >
> >  include/linux/sunrpc/msg_prot.h|8
> >  net/sunrpc/xprtrdma/Makefile   |3
> >  net/sunrpc/xprtrdma/fmr_ops.c  |  208 +++
> >  net/sunrpc/xprtrdma/frwr_ops.c |  353 ++
> >  net/sunrpc/xprtrdma/physical_ops.c |   94 +
> >  net/sunrpc/xprtrdma/rpc_rdma.c |   87 ++--
> >  net/sunrpc/xprtrdma/transport.c|   61 ++-
> >  net/sunrpc/xprtrdma/verbs.c|  699 
> > +++-
> >  net/sunrpc/xprtrdma/xprt_rdma.h|   90 -
> >  9 files changed, 882 insertions(+), 721 deletions(-)  create mode
> > 100644 net/sunrpc/xprtrdma/fmr_ops.c  create mode 100644
> > net/sunrpc/xprtrdma/frwr_ops.c  create mode 100644
> > net/sunrpc/xprtrdma/physical_ops.c
> >
> > --
> > Chuck Lever


RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

2015-03-26 Thread Devesh Sharma
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Devesh Sharma
> Sent: Friday, March 27, 2015 11:13 AM
> To: Anna Schumaker; Chuck Lever; linux-rdma@vger.kernel.org; linux-
> n...@vger.kernel.org
> Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
> 
> Hi Chuck,
> 
> I have validated these set of patches with ocrdma device, iozone passes with
> these.


Thanks to Meghna.

> 
> -Regards
> Devesh
> 
> > -Original Message-
> > From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> > ow...@vger.kernel.org] On Behalf Of Anna Schumaker
> > Sent: Friday, March 27, 2015 12:10 AM
> > To: Chuck Lever; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
> > Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
> >
> > Hey Chuck,
> >
> > I didn't see anything that needs to be fixed up in these patches.  Are
> > they ready for me?
> >
> > Anna
> >
> > On 03/24/2015 04:30 PM, Chuck Lever wrote:
> > > This is a series of client-side patches for NFS/RDMA. In preparation
> > > for increasing the transport credit limit and maximum rsize/wsize,
> > > I've re-factored the memory registration logic into separate files,
> > > invoked via a method API.
> > >
> > > The two main optimizations in v1 of this series have been dropped.
> > > Sagi Grimberg didn't like the complexity of the solution, and there
> > > isn't enough time to rework it, test the new version, and get it
> > > reviewed before the 4.1 merge window opens. I'm going to prepare
> > > these for 4.2.
> > >
> > > Fixes suggested by reviewers have been included before the
> > > refactoring patches to make it easier to backport them to previous 
> > > kernels.
> > >
> > > The series is available in the nfs-rdma-for-4.1 topic branch at
> > >
> > > git://linux-nfs.org/projects/cel/cel-2.6.git
> > >
> > > Changes since v1:
> > > - Rebased on 4.0-rc5
> > > - Main optimizations postponed to 4.2
> > > - Addressed review comments from Anna, Sagi, and Devesh
> > >
> > > ---
> > >
> > > Chuck Lever (15):
> > >   SUNRPC: Introduce missing well-known netids
> > >   xprtrdma: Display IPv6 addresses and port numbers correctly
> > >   xprtrdma: Perform a full marshal on retransmit
> > >   xprtrdma: Byte-align FRWR registration
> > >   xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
> > >   xprtrdma: Add vector of ops for each memory registration strategy
> > >   xprtrdma: Add a "max_payload" op for each memreg mode
> > >   xprtrdma: Add a "register_external" op for each memreg mode
> > >   xprtrdma: Add a "deregister_external" op for each memreg mode
> > >   xprtrdma: Add "init MRs" memreg op
> > >   xprtrdma: Add "reset MRs" memreg op
> > >   xprtrdma: Add "destroy MRs" memreg op
> > >   xprtrdma: Add "open" memreg op
> > >   xprtrdma: Handle non-SEND completions via a callout
> > >   xprtrdma: Make rpcrdma_{un}map_one() into inline functions
> > >
> > >
> > >  include/linux/sunrpc/msg_prot.h|8
> > >  net/sunrpc/xprtrdma/Makefile   |3
> > >  net/sunrpc/xprtrdma/fmr_ops.c  |  208 +++
> > >  net/sunrpc/xprtrdma/frwr_ops.c |  353 ++
> > >  net/sunrpc/xprtrdma/physical_ops.c |   94 +
> > >  net/sunrpc/xprtrdma/rpc_rdma.c |   87 ++--
> > >  net/sunrpc/xprtrdma/transport.c|   61 ++-
> > >  net/sunrpc/xprtrdma/verbs.c|  699 
> > > +++-
> > >  net/sunrpc/xprtrdma/xprt_rdma.h|   90 -
> > >  9 files changed, 882 insertions(+), 721 deletions(-)  create mode
> > > 100644 net/sunrpc/xprtrdma/fmr_ops.c  create mode 100644
> > > net/sunrpc/xprtrdma/frwr_ops.c  create mode 100644
> > > net/sunrpc/xprtrdma/physical_ops.c
> > >
> > > --
> > > Chuck Lever

RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

2015-03-27 Thread Devesh Sharma
Yes, you can add my and Meghna's names in the Tested-by: tag.

-Thanks

> -Original Message-
> From: Chuck Lever [mailto:chuck.le...@oracle.com]
> Sent: Friday, March 27, 2015 7:48 PM
> To: Devesh Sharma
> Cc: Anna Schumaker; linux-rdma@vger.kernel.org; Linux NFS Mailing List;
> Meghana Cheripady
> Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
> 
> 
> On Mar 27, 2015, at 12:44 AM, Devesh Sharma
>  wrote:
> 
> >> -Original Message-
> >> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> >> ow...@vger.kernel.org] On Behalf Of Devesh Sharma
> >> Sent: Friday, March 27, 2015 11:13 AM
> >> To: Anna Schumaker; Chuck Lever; linux-rdma@vger.kernel.org; linux-
> >> n...@vger.kernel.org
> >> Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
> >>
> >> Hi Chuck,
> >>
> >> I have validated these set of patches with ocrdma device, iozone
> >> passes with these.
> >
> >
> > Thanks to Meghna.
> 
> Hi Devesh-
> 
> Is there a Tested-by tag that Anna can add to these patches?
> 
> 
> >>
> >> -Regards
> >> Devesh
> >>
> >>> -Original Message-
> >>> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> >>> ow...@vger.kernel.org] On Behalf Of Anna Schumaker
> >>> Sent: Friday, March 27, 2015 12:10 AM
> >>> To: Chuck Lever; linux-rdma@vger.kernel.org;
> >>> linux-...@vger.kernel.org
> >>> Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
> >>>
> >>> Hey Chuck,
> >>>
> >>> I didn't see anything that needs to be fixed up in these patches.
> >>> Are they ready for me?
> >>>
> >>> Anna
> >>>
> >>> On 03/24/2015 04:30 PM, Chuck Lever wrote:
> >>>> This is a series of client-side patches for NFS/RDMA. In
> >>>> preparation for increasing the transport credit limit and maximum
> >>>> rsize/wsize, I've re-factored the memory registration logic into
> >>>> separate files, invoked via a method API.
> >>>>
> >>>> The two main optimizations in v1 of this series have been dropped.
> >>>> Sagi Grimberg didn't like the complexity of the solution, and there
> >>>> isn't enough time to rework it, test the new version, and get it
> >>>> reviewed before the 4.1 merge window opens. I'm going to prepare
> >>>> these for 4.2.
> >>>>
> >>>> Fixes suggested by reviewers have been included before the
> >>>> refactoring patches to make it easier to backport them to previous
> kernels.
> >>>>
> >>>> The series is available in the nfs-rdma-for-4.1 topic branch at
> >>>>
> >>>> git://linux-nfs.org/projects/cel/cel-2.6.git
> >>>>
> >>>> Changes since v1:
> >>>> - Rebased on 4.0-rc5
> >>>> - Main optimizations postponed to 4.2
> >>>> - Addressed review comments from Anna, Sagi, and Devesh
> >>>>
> >>>> ---
> >>>>
> >>>> Chuck Lever (15):
> >>>>  SUNRPC: Introduce missing well-known netids
> >>>>  xprtrdma: Display IPv6 addresses and port numbers correctly
> >>>>  xprtrdma: Perform a full marshal on retransmit
> >>>>  xprtrdma: Byte-align FRWR registration
> >>>>  xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
> >>>>  xprtrdma: Add vector of ops for each memory registration strategy
> >>>>  xprtrdma: Add a "max_payload" op for each memreg mode
> >>>>  xprtrdma: Add a "register_external" op for each memreg mode
> >>>>  xprtrdma: Add a "deregister_external" op for each memreg mode
> >>>>  xprtrdma: Add "init MRs" memreg op
> >>>>  xprtrdma: Add "reset MRs" memreg op
> >>>>  xprtrdma: Add "destroy MRs" memreg op
> >>>>  xprtrdma: Add "open" memreg op
> >>>>  xprtrdma: Handle non-SEND completions via a callout
> >>>>  xprtrdma: Make rpcrdma_{un}map_one() into inline functions
> >>>>
> >>>>
> >>>> include/linux/sunrpc/msg_prot.h|8
> >>>> net/sunrpc/xprtrdma/Makefile   |3
> >>>> net/sunrpc/xprtrdma/fmr_ops.c  |  208 +++

RE: [PATCH for-next 3/6] IB/ipoib: Handle QP in SQE state

2015-04-15 Thread Devesh Sharma
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Or Gerlitz
> Sent: Thursday, April 02, 2015 4:09 PM
> To: Roland Dreier; Doug Ledford
> Cc: linux-rdma@vger.kernel.org; Erez Shitrit; Tal Alon; Amir Vadai; Or Gerlitz
> Subject: [PATCH for-next 3/6] IB/ipoib: Handle QP in SQE state
> 
> From: Erez Shitrit 
> 
> As the result of a completion error the QP can moved to SQE state by the
> hardware. Since it's not the Error state, there are no flushes and hence the
> driver doesn't know about that.

As per the spec, a QP transition to SQE causes flush completions for subsequent
WQEs; the description seems to say otherwise. Am I missing something?

> 
> The fix creates a task that after completion with error which is not a flush
> tracks the QP state and if it is in SQE state moves it back to RTS.
> 
> Signed-off-by: Erez Shitrit 
> Signed-off-by: Or Gerlitz 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h|5 +++
>  drivers/infiniband/ulp/ipoib/ipoib_ib.c |   59
> ++-
>  2 files changed, 63 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 769044c..2703d9a 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -299,6 +299,11 @@ struct ipoib_neigh_table {
>   struct completion   deleted;
>  };
> 
> +struct ipoib_qp_state_validate {
> + struct work_struct work;
> + struct ipoib_dev_priv   *priv;
> +};
> +
>  /*
>   * Device private locking: network stack tx_lock protects members used
>   * in TX fast path, lock protects everything else.  lock nests inside diff 
> --git
> a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index 29b376d..63b92cb 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -327,6 +327,51 @@ static void ipoib_dma_unmap_tx(struct ib_device *ca,
>   }
>  }
> 
> +/*
> + * As the result of a completion error the QP Can be transferred to SQE 
> states.
> + * The function checks if the (send)QP is in SQE state and
> + * moves it back to RTS state, that in order to have it functional again.
> + */
> +static void ipoib_qp_state_validate_work(struct work_struct *work) {
> + struct ipoib_qp_state_validate *qp_work =
> + container_of(work, struct ipoib_qp_state_validate, work);
> +
> + struct ipoib_dev_priv *priv = qp_work->priv;
> + struct ib_qp_attr qp_attr;
> + struct ib_qp_init_attr query_init_attr;
> + int ret;
> +
> + ret = ib_query_qp(priv->qp, &qp_attr, IB_QP_STATE, &query_init_attr);
> + if (ret) {
> + ipoib_warn(priv, "%s: Failed to query QP ret: %d\n",
> +__func__, ret);
> + goto free_res;
> + }
> + pr_info("%s: QP: 0x%x is in state: %d\n",
> + __func__, priv->qp->qp_num, qp_attr.qp_state);
> +
> + /* currently support only in SQE->RTS transition*/
> + if (qp_attr.qp_state == IB_QPS_SQE) {
> + qp_attr.qp_state = IB_QPS_RTS;
> +
> + ret = ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE);
> + if (ret) {
> + pr_warn("failed(%d) modify QP:0x%x SQE->RTS\n",
> + ret, priv->qp->qp_num);
> + goto free_res;
> + }
> + pr_info("%s: QP: 0x%x moved from IB_QPS_SQE to
> IB_QPS_RTS\n",
> + __func__, priv->qp->qp_num);
> + } else {
> + pr_warn("QP (%d) will stay in state: %d\n",
> + priv->qp->qp_num, qp_attr.qp_state);
> + }
> +
> +free_res:
> + kfree(qp_work);
> +}
> +
>  static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)  
> {
>   struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -358,10 +403,22
> @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc
> *wc)
>   netif_wake_queue(dev);
> 
>   if (wc->status != IB_WC_SUCCESS &&
> - wc->status != IB_WC_WR_FLUSH_ERR)
> + wc->status != IB_WC_WR_FLUSH_ERR) {
> + struct ipoib_qp_state_validate *qp_work;
>   ipoib_warn(priv, "failed send event "
>  "(status=%d, wrid=%d vend_err %x)\n",
>  wc->status, wr_id, wc->vendor_err);
> + qp_work = kzalloc(sizeof(*qp_work), GFP_ATOMIC);
> + if (!qp_work) {
> + ipoib_warn(priv, "%s Failed alloc
> ipoib_qp_state_validate for qp: 0x%x\n",
> +__func__, priv->qp->qp_num);
> + return;
> + }
> +
> + INIT_WORK(&qp_work->work, ipoib_qp_state_validate_work);
> + qp_work->priv = priv;
> + queue_work(priv->wq, &qp_work->work);
> + }
>  }
> 
>  static int poll_tx(struct ipoib_dev_priv *priv)
> --

RE: [PATCH for-next 3/6] IB/ipoib: Handle QP in SQE state

2015-04-16 Thread Devesh Sharma
> -Original Message-
> From: Erez Shitrit [mailto:ere...@dev.mellanox.co.il]
> Sent: Thursday, April 16, 2015 12:14 PM
> To: Devesh Sharma; Or Gerlitz; Roland Dreier; Doug Ledford
> Cc: linux-rdma@vger.kernel.org; Erez Shitrit; Tal Alon; Amir Vadai
> Subject: Re: [PATCH for-next 3/6] IB/ipoib: Handle QP in SQE state
> 
> On 4/15/2015 8:20 PM, Devesh Sharma wrote:
> >> -Original Message-
> >> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> >> ow...@vger.kernel.org] On Behalf Of Or Gerlitz
> >> Sent: Thursday, April 02, 2015 4:09 PM
> >> To: Roland Dreier; Doug Ledford
> >> Cc: linux-rdma@vger.kernel.org; Erez Shitrit; Tal Alon; Amir Vadai;
> >> Or Gerlitz
> >> Subject: [PATCH for-next 3/6] IB/ipoib: Handle QP in SQE state
> >>
> >> From: Erez Shitrit 
> >>
> >> As the result of a completion error the QP can moved to SQE state by
> >> the hardware. Since it's not the Error state, there are no flushes
> >> and hence the driver doesn't know about that.
> > As per spec, QP transition to SQE causes flush completion for subsequent
> WQEs, the description is telling other way. Am I missing something?
> 
> No you are not :) . the description tries to say the following: the driver 
> cannot
> distinguish between IB_WC_WR_FLUSH_ERR that threated as IB_WC_SUCCESS
> to IB_WC_WR_FLUSH_ERR that comes after other errors that take the QP to
> SQE, The driver must recognize the first error that is not
> IB_WC_WR_FLUSH_ERR and handle accordingly.
> For example, the driver can take the QP to ERROR state as a part of its life
> cycle (drain it, driver down, etc.) and at these situations many
> IB_WC_WR_FLUSH_ERR return and no need to change the state of the QP, it is
> already under the handling of the driver, which is not when other error comes.
> this is the intention of that patch, to return the QP back to life after that 
> (un-
> handed) cases.

Okay, makes sense to me.

> 
> >> The fix creates a task that after completion with error which is not
> >> a flush tracks the QP state and if it is in SQE state moves it back to RTS.
> >>
> >> Signed-off-by: Erez Shitrit 
> >> Signed-off-by: Or Gerlitz 
> >> ---
> >>   drivers/infiniband/ulp/ipoib/ipoib.h|5 +++
> >>   drivers/infiniband/ulp/ipoib/ipoib_ib.c |   59
> >> ++-
> >>   2 files changed, 63 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
> >> b/drivers/infiniband/ulp/ipoib/ipoib.h
> >> index 769044c..2703d9a 100644
> >> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> >> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> >> @@ -299,6 +299,11 @@ struct ipoib_neigh_table {
> >>struct completion   deleted;
> >>   };
> >>
> >> +struct ipoib_qp_state_validate {
> >> +  struct work_struct work;
> >> +  struct ipoib_dev_priv   *priv;
> >> +};
> >> +
> >>   /*
> >>* Device private locking: network stack tx_lock protects members used
> >>* in TX fast path, lock protects everything else.  lock nests
> >> inside diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> >> b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> >> index 29b376d..63b92cb 100644
> >> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> >> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> >> @@ -327,6 +327,51 @@ static void ipoib_dma_unmap_tx(struct ib_device
> *ca,
> >>}
> >>   }
> >>
> >> +/*
> >> + * As the result of a completion error the QP Can be transferred to SQE
> states.
> >> + * The function checks if the (send)QP is in SQE state and
> >> + * moves it back to RTS state, that in order to have it functional again.
> >> + */
> >> +static void ipoib_qp_state_validate_work(struct work_struct *work) {
> >> +  struct ipoib_qp_state_validate *qp_work =
> >> +  container_of(work, struct ipoib_qp_state_validate, work);
> >> +
> >> +  struct ipoib_dev_priv *priv = qp_work->priv;
> >> +  struct ib_qp_attr qp_attr;
> >> +  struct ib_qp_init_attr query_init_attr;
> >> +  int ret;
> >> +
> >> +  ret = ib_query_qp(priv->qp, &qp_attr, IB_QP_STATE, &query_init_attr);
> >> +  if (ret) {
> >> +  ipoib_warn(priv, "%s: Failed to query QP ret: %d\n",
> >> + __func__, ret);
> >> +  goto free_res;
> >> +  }
> >>
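
The hunk is cut short above; per the description, the rest of the handler
checks for SQE and kicks the QP back to RTS, along these lines (a sketch
based on the patch description, not necessarily the exact merged code):

    /* continues ipoib_qp_state_validate_work() from the hunk above */
    if (qp_attr.qp_state == IB_QPS_SQE) {
        /* The send queue hit a non-flush error; move it back to RTS. */
        qp_attr.qp_state = IB_QPS_RTS;
        ret = ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE);
        if (ret)
            ipoib_warn(priv, "%s: Failed to modify QP to RTS: %d\n",
                       __func__, ret);
    }
free_res:
    kfree(qp_work);
}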

RE: [PATCH v5 00/27] IB/Verbs: IB Management Helpers

2015-04-20 Thread Devesh Sharma
Hi Michael,

is there a specific git branch available to pull out all the patches?

-Regards
Devesh

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Michael Wang
> Sent: Monday, April 20, 2015 1:59 PM
> To: Roland Dreier; Sean Hefty; Hal Rosenstock; linux-rdma@vger.kernel.org;
> linux-ker...@vger.kernel.org; h...@dev.mellanox.co.il
> Cc: Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike
> Marciniszyn; Eli Cohen; Faisal Latif; Jack Morgenstein; Or Gerlitz; Haggai 
> Eran;
> Ira Weiny; Tom Talpey; Jason Gunthorpe; Doug Ledford; Michael Wang
> Subject: [PATCH v5 00/27] IB/Verbs: IB Management Helpers
> 
> 
> Since v4:
>   * Thanks for the comments from Hal, Sean, Tom, Or Gerlitz, Jason,
> Roland, Ira and Steve :-) Please remind me if anything missed :-P
>   * Fix logical issue inside 3#, 14#
>   * Refine 3#, 4#, 5# with label 'free'
>   * Rework 10# to stop using port 1 when port already assigned
> 
> There are plenty of lengthy code to check the transport type of IB device, or 
> the
> link layer type of it's port, but actually we are just speculating whether a
> particular management/feature is supported by the device/port.
> 
> Thus instead of inferring, we should have our own mechanism for IB
> management capability/protocol/feature checking, several proposals below.
> 
> This patch set will reform the method of getting transport type, we will now
> using query_transport() instead of inferring from transport and link layer
> respectively, also we defined the new transport type to make the concept more
> reasonable.
> 
> Mapping List:
>   node-type   link-layer  old-transport   new-transport
> nes   RNICETH IWARP   IWARP
> amso1100  RNICETH IWARP   IWARP
> cxgb3 RNICETH IWARP   IWARP
> cxgb4 RNICETH IWARP   IWARP
> usnic USNIC_UDP   ETH USNIC_UDP   USNIC_UDP
> ocrdmaIB_CA   ETH IB  IBOE
> mlx4  IB_CA   IB/ETH  IB  IB/IBOE
> mlx5  IB_CA   IB  IB  IB
> ehca  IB_CA   IB  IB  IB
> ipath IB_CA   IB  IB  IB
> mthca IB_CA   IB  IB  IB
> qib   IB_CA   IB  IB  IB
> 
> For example:
>   if (transport == IB) && (link-layer == ETH) will now become:
>   if (query_transport() == IBOE)
> 
> Thus we will be able to get rid of the respective transport and link-layer
> checking, and it will help us to add new protocol/Technology (like OPA) more
> easier, also with the introduced management helpers, IB management logical
> will be more clear and easier for extending.
> 
> Highlights:
> The patch set covered a wide range of IB stuff, thus for those who are
> familiar with the particular part, your suggestion would be invaluable ;-)
> 
> Patch 1#~15# included all the logical reform, 16#~25# introduced the
> management helpers, 26#~27# do clean up.
> 
> Patches haven't been tested yet, we appreciate if any one who have these
> HW willing to provide his Tested-by :-)
> 
> Doug suggested the bitmask mechanism:
>   https://www.mail-archive.com/linux-
> r...@vger.kernel.org/msg23765.html
> which could be the plan for future reforming, we prefer that to be another
> series which focus on semantic and performance.
> 
> This patch-set is somewhat 'bloated' now and it may be a good timing for
> staging, I'd like to suggest we focus on improving existed helpers and 
> push
> all the further reforms into next series ;-)
> 
> Proposals:
> Sean:
>   https://www.mail-archive.com/linux-
> r...@vger.kernel.org/msg23339.html
> Doug:
>   https://www.mail-archive.com/linux-
> r...@vger.kernel.org/msg23418.html
>   https://www.mail-archive.com/linux-
> r...@vger.kernel.org/msg23765.html
> Jason:
>   https://www.mail-archive.com/linux-
> r...@vger.kernel.org/msg23425.html
> 
> Michael Wang (27):
> IB/Verbs: Implement new callback query_transport()
> IB/Verbs: Implement raw management helpers
> IB/Verbs: Reform IB-core mad/agent/user_mad
> IB/Verbs: Reform IB-core cm
> IB/Verbs: Reform IB-core sa_query
> IB/Verbs: Reform IB-core multicast
> IB/Verbs: Reform IB-ulp ipoib
> IB/Verbs: Reform IB-ulp xprtrdma
> IB/Verbs: Reform IB-core verbs/uverbs_cmd/sysfs
> IB/Verbs: Reform cm related part in IB-core cma/ucm
> IB/Verbs: Reform route related part in IB-core cma
> IB/Verbs: Reform mcast related part in IB-core cma
> IB/Verbs: Reserve legacy transport type in 'dev_addr'
> IB/Verbs: Reform cma_acquire_dev()
> IB/Verbs: Reform rest part in IB-
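
To make the cover letter's point concrete, this is the kind of call-site
change the helpers enable; handle_roce() is a hypothetical consumer, while
rdma_tech_iboe() is one of the helpers the series introduces:

    /* Before: infer RoCE by combining transport and link-layer checks */
    if (rdma_node_get_transport(device->node_type) == RDMA_TRANSPORT_IB &&
        rdma_port_get_link_layer(device, port) == IB_LINK_LAYER_ETHERNET)
        handle_roce(device, port);

    /* After: ask the per-port management helper directly */
    if (rdma_tech_iboe(device, port))
        handle_roce(device, port);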

RE: [PATCH v5 13/27] IB/Verbs: Reserve legacy transport type in 'dev_addr'

2015-04-20 Thread Devesh Sharma
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Michael Wang
> Sent: Monday, April 20, 2015 2:08 PM
> To: Roland Dreier; Sean Hefty; linux-rdma@vger.kernel.org; linux-
> ker...@vger.kernel.org; h...@dev.mellanox.co.il
> Cc: Michael Wang; Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph
> Raisch; Mike Marciniszyn; Eli Cohen; Faisal Latif; Jack Morgenstein; Or 
> Gerlitz;
> Haggai Eran; Ira Weiny; Tom Talpey; Jason Gunthorpe; Doug Ledford
> Subject: [PATCH v5 13/27] IB/Verbs: Reserve legacy transport type in
> 'dev_addr'
> 
> 
> Reserve the legacy transport type for the 'transport' member of 'struct
> rdma_dev_addr' until we make sure this is no longer needed.
> 
> Cc: Hal Rosenstock 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cma.c | 25 +++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index ebac646..6195bf6 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -244,14 +244,35 @@ static inline void cma_set_ip_ver(struct cma_hdr
> *hdr, u8 ip_ver)
>   hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);  }
> 
> +static inline void cma_set_legacy_transport(struct rdma_cm_id *id) {
> + switch (id->device->node_type) {
> + case RDMA_NODE_IB_CA:
> + case RDMA_NODE_IB_SWITCH:
> + case RDMA_NODE_IB_ROUTER:
> + id->route.addr.dev_addr.transport = RDMA_TRANSPORT_IB;

What about the IBOE transport, am I missing something here? As of today ocrdma
exports node_type as RDMA_NODE_IB_CA, so here the transport will be set to
RDMA_TRANSPORT_IB.
Should it be RDMA_TRANSPORT_IBOE?

> + break;
> + case RDMA_NODE_RNIC:
> + id->route.addr.dev_addr.transport =
> RDMA_TRANSPORT_IWARP;
> + break;
> + case RDMA_NODE_USNIC:
> + id->route.addr.dev_addr.transport =
> RDMA_TRANSPORT_USNIC;
> + break;
> + case RDMA_NODE_USNIC_UDP:
> + id->route.addr.dev_addr.transport =
> RDMA_TRANSPORT_USNIC_UDP;
> + break;
> + default:
> + BUG();
> + }
> +}
> +
>  static void cma_attach_to_dev(struct rdma_id_private *id_priv,
> struct cma_device *cma_dev)
>  {
>   atomic_inc(&cma_dev->refcount);
>   id_priv->cma_dev = cma_dev;
>   id_priv->id.device = cma_dev->device;
> - id_priv->id.route.addr.dev_addr.transport =
> - rdma_node_get_transport(cma_dev->device->node_type);
> + cma_set_legacy_transport(&id_priv->id);
>   list_add_tail(&id_priv->list, &cma_dev->id_list);  }
> 
> --
> 2.1.0
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the 
> body
> of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5 14/27] IB/Verbs: Reform cma_acquire_dev()

2015-04-20 Thread Devesh Sharma
Looks good, I would like to test with ocrdma before confirming.

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Michael Wang
> Sent: Monday, April 20, 2015 2:08 PM
> To: Roland Dreier; Sean Hefty; linux-rdma@vger.kernel.org; linux-
> ker...@vger.kernel.org; h...@dev.mellanox.co.il
> Cc: Michael Wang; Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph
> Raisch; Mike Marciniszyn; Eli Cohen; Faisal Latif; Jack Morgenstein; Or 
> Gerlitz;
> Haggai Eran; Ira Weiny; Tom Talpey; Jason Gunthorpe; Doug Ledford
> Subject: [PATCH v5 14/27] IB/Verbs: Reform cma_acquire_dev()
> 
> 
> Reform cma_acquire_dev() with management helpers, introduce
> cma_validate_port() to make the code more clean.
> 
> Cc: Hal Rosenstock 
> Cc: Steve Wise 
> Cc: Tom Talpey 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> Cc: Ira Weiny 
> Cc: Sean Hefty 
> Signed-off-by: Michael Wang 
> ---
>  drivers/infiniband/core/cma.c | 68 +
> --
>  1 file changed, 40 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 6195bf6..44e7bb9 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -370,18 +370,35 @@ static int cma_translate_addr(struct sockaddr *addr,
> struct rdma_dev_addr *dev_a
>   return ret;
>  }
> 
> +static inline int cma_validate_port(struct ib_device *device, u8 port,
> +   union ib_gid *gid, int dev_type) {
> + u8 found_port;
> + int ret = -ENODEV;
> +
> + if ((dev_type == ARPHRD_INFINIBAND) && !rdma_tech_ib(device,
> port))
> + return ret;
> +
> + if ((dev_type != ARPHRD_INFINIBAND) && rdma_tech_ib(device, port))
> + return ret;
> +
> + ret = ib_find_cached_gid(device, gid, &found_port, NULL);
> + if (port != found_port)
> + return -ENODEV;
> +
> + return ret;
> +}
> +
>  static int cma_acquire_dev(struct rdma_id_private *id_priv,
>  struct rdma_id_private *listen_id_priv)  {
>   struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
>   struct cma_device *cma_dev;
> - union ib_gid gid, iboe_gid;
> + union ib_gid gid, iboe_gid, *gidp;
>   int ret = -ENODEV;
> - u8 port, found_port;
> - enum rdma_link_layer dev_ll = dev_addr->dev_type ==
> ARPHRD_INFINIBAND ?
> - IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
> + u8 port;
> 
> - if (dev_ll != IB_LINK_LAYER_INFINIBAND &&
> + if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
>   id_priv->id.ps == RDMA_PS_IPOIB)
>   return -EINVAL;
> 
> @@ -391,41 +408,36 @@ static int cma_acquire_dev(struct rdma_id_private
> *id_priv,
> 
>   memcpy(&gid, dev_addr->src_dev_addr +
>  rdma_addr_gid_offset(dev_addr), sizeof gid);
> - if (listen_id_priv &&
> - rdma_port_get_link_layer(listen_id_priv->id.device,
> -  listen_id_priv->id.port_num) == dev_ll) {
> +
> + if (listen_id_priv) {
>   cma_dev = listen_id_priv->cma_dev;
>   port = listen_id_priv->id.port_num;
> - if (rdma_node_get_transport(cma_dev->device->node_type)
> == RDMA_TRANSPORT_IB &&
> - rdma_port_get_link_layer(cma_dev->device, port) ==
> IB_LINK_LAYER_ETHERNET)
> - ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
> -  &found_port, NULL);
> - else
> - ret = ib_find_cached_gid(cma_dev->device, &gid,
> -  &found_port, NULL);
> + gidp = rdma_tech_iboe(cma_dev->device, port) ?
> +&iboe_gid : &gid;
> 
> - if (!ret && (port  == found_port)) {
> - id_priv->id.port_num = found_port;
> + ret = cma_validate_port(cma_dev->device, port, gidp,
> + dev_addr->dev_type);
> + if (!ret) {
> + id_priv->id.port_num = port;
>   goto out;
>   }
>   }
> +
>   list_for_each_entry(cma_dev, &dev_list, list) {
>   for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port)
> {
>   if (listen_id_priv &&
>   listen_id_priv->cma_dev == cma_dev &&
>   listen_id_priv->id.port_num == port)
>   continue;
> - if (rdma_port_get_link_layer(cma_dev->device, port)
> == dev_ll) {
> - if (rdma_node_get_transport(cma_dev-
> >device->node_type) == RDMA_TRANSPORT_IB &&
> - rdma_port_get_link_layer(cma_dev-
> >device, port) == IB_LINK_LAYER_ETHERNET)
> - ret = ib_find_cached_gid(cma_dev-
> >device, &iboe_gid,
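
The hunk is trimmed here; the per-port loop below this point gets the same
treatment as the listen_id path above, roughly (a sketch, not the full diff):

    gidp = rdma_tech_iboe(cma_dev->device, port) ?
           &iboe_gid : &gid;

    ret = cma_validate_port(cma_dev->device, port, gidp,
                            dev_addr->dev_type);
    if (!ret) {
        id_priv->id.port_num = port;
        goto out;
    }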

RE: [PATCH v5 00/27] IB/Verbs: IB Management Helpers

2015-04-21 Thread Devesh Sharma
Hi Michael,

It would be a great help if you could base your patches on Roland's existing
tree and share the branch details to pull from,
just like Chuck Lever does for his nfs-rdma patches.

-Regards
Devesh

> -Original Message-
> From: Michael Wang [mailto:yun.w...@profitbricks.com]
> Sent: Tuesday, April 21, 2015 1:17 PM
> To: Devesh Sharma; Roland Dreier; Sean Hefty; Hal Rosenstock; linux-
> r...@vger.kernel.org; linux-ker...@vger.kernel.org; h...@dev.mellanox.co.il
> Cc: Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike
> Marciniszyn; Eli Cohen; Faisal Latif; Jack Morgenstein; Or Gerlitz; Haggai 
> Eran;
> Ira Weiny; Tom Talpey; Jason Gunthorpe; Doug Ledford
> Subject: Re: [PATCH v5 00/27] IB/Verbs: IB Management Helpers
> 
> Hi, Devesh
> 
> On 04/21/2015 07:41 AM, Devesh Sharma wrote:
> > Hi Michael,
> >
> > is there a specific git branch available to pull out all the patches?
> 
> Not yet, we may need the maintainer to tell us which branch could the series
> been applied for testing purpose, after we all satisfied :-)
> 
> For now we could 'git am' these patches to 'infiniband.git/for-next'
> in order to do testing.
> 
> Regards,
> Michael Wang
> 
> >
> > -Regards
> > Devesh
> >
> >> -Original Message-
> >> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> >> ow...@vger.kernel.org] On Behalf Of Michael Wang
> >> Sent: Monday, April 20, 2015 1:59 PM
> >> To: Roland Dreier; Sean Hefty; Hal Rosenstock;
> >> linux-rdma@vger.kernel.org; linux-ker...@vger.kernel.org;
> >> h...@dev.mellanox.co.il
> >> Cc: Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike
> >> Marciniszyn; Eli Cohen; Faisal Latif; Jack Morgenstein; Or Gerlitz;
> >> Haggai Eran; Ira Weiny; Tom Talpey; Jason Gunthorpe; Doug Ledford;
> >> Michael Wang
> >> Subject: [PATCH v5 00/27] IB/Verbs: IB Management Helpers
> >>
> >>
> >> Since v4:
> >>   * Thanks for the comments from Hal, Sean, Tom, Or Gerlitz, Jason,
> >> Roland, Ira and Steve :-) Please remind me if anything missed :-P
> >>   * Fix logical issue inside 3#, 14#
> >>   * Refine 3#, 4#, 5# with label 'free'
> >>   * Rework 10# to stop using port 1 when port already assigned
> >>
> >> There are plenty of lengthy code to check the transport type of IB
> >> device, or the link layer type of it's port, but actually we are just
> >> speculating whether a particular management/feature is supported by the
> device/port.
> >>
> >> Thus instead of inferring, we should have our own mechanism for IB
> >> management capability/protocol/feature checking, several proposals below.
> >>
> >> This patch set will reform the method of getting transport type, we
> >> will now using query_transport() instead of inferring from transport
> >> and link layer respectively, also we defined the new transport type
> >> to make the concept more reasonable.
> >>
> >> Mapping List:
> >>node-type   link-layer  old-transport   new-transport
> >> nesRNICETH IWARP   IWARP
> >> amso1100   RNICETH IWARP   IWARP
> >> cxgb3  RNICETH IWARP   IWARP
> >> cxgb4  RNICETH IWARP   IWARP
> >> usnic  USNIC_UDP   ETH USNIC_UDP   USNIC_UDP
> >> ocrdma IB_CA   ETH IB  IBOE
> >> mlx4   IB_CA   IB/ETH  IB  IB/IBOE
> >> mlx5   IB_CA   IB  IB  IB
> >> ehca   IB_CA   IB  IB  IB
> >> ipath  IB_CA   IB  IB  IB
> >> mthca  IB_CA   IB  IB  IB
> >> qibIB_CA   IB  IB  IB
> >>
> >> For example:
> >>if (transport == IB) && (link-layer == ETH) will now become:
> >>if (query_transport() == IBOE)
> >>
> >> Thus we will be able to get rid of the respective transport and
> >> link-layer checking, and it will help us to add new
> >> protocol/Technology (like OPA) more easier, also with the introduced
> >> management helpers, IB management logical will be more clear and easier
> for extending.
> >>
> >> Highlights:
> >> The patch set covered a wide range of IB stuff

RE: [PATCH v6 00/26] IB/Verbs: IB Management Helpers

2015-04-27 Thread Devesh Sharma
Tested-By: Devesh Sharma 

I am still in the process of reviewing the series. Will respond soon.

-Regards
Devesh
> -Original Message-
> From: Michael Wang [mailto:yun.w...@profitbricks.com]
> Sent: Friday, April 24, 2015 6:43 PM
> To: Roland Dreier; Sean Hefty; Hal Rosenstock; linux-rdma@vger.kernel.org;
> linux-ker...@vger.kernel.org
> Cc: Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike
> Marciniszyn; Eli Cohen; Faisal Latif; Jack Morgenstein; Or Gerlitz; Haggai 
> Eran;
> Ira Weiny; Tom Talpey; Jason Gunthorpe; Doug Ledford; Devesh Sharma; Liran
> Liss; Dave Goodell
> Subject: Re: [PATCH v6 00/26] IB/Verbs: IB Management Helpers
> 
> Add missing Cc:
> Devesh Sharma "
> Liran Liss "
> Dave Goodell "
> 
> Regards,
> Michael Wang
> 
> On 04/24/2015 02:23 PM, Michael Wang wrote:
> > Since v5:
> >   * Thanks to Ira, Devesh for the review and testing :-)
> >   * Thanks for the comments from Steve, Tom, Jason, Hal, Devesh, Ira,
> > Liran, Jason, Dave :-) Please remind me if anything missed :-P
> >   * Trivial fix for 4#
> >   * Drop the reform on acquiring link-layer in 9#
> >   * Drop cap_ipoib()
> >
> > There are plenty of lengthy code to check the transport type of IB
> > device, or the link layer type of it's port, but actually we are just
> > speculating whether a particular management/feature is supported by the
> device/port.
> >
> > Thus instead of inferring, we should have our own mechanism for IB
> > management capability/protocol/feature checking, several proposals below.
> >
> > This patch set will reform the method of getting transport type, we
> > will now using query_transport() instead of inferring from transport
> > and link layer respectively, also we defined the new transport type to
> > make the concept more reasonable.
> >
> > Mapping List:
> > node-type   link-layer  old-transport   new-transport
> > nes RNICETH IWARP   IWARP
> > amso1100RNICETH IWARP   IWARP
> > cxgb3   RNICETH IWARP   IWARP
> > cxgb4   RNICETH IWARP   IWARP
> > usnic   USNIC_UDP   ETH USNIC_UDP   USNIC_UDP
> > ocrdma  IB_CA   ETH IB  IBOE
> > mlx4IB_CA   IB/ETH  IB  IB/IBOE
> > mlx5IB_CA   IB  IB  IB
> > ehcaIB_CA   IB  IB  IB
> > ipath   IB_CA   IB  IB  IB
> > mthca   IB_CA   IB  IB  IB
> > qib IB_CA   IB  IB  IB
> >
> > For example:
> > if (transport == IB) && (link-layer == ETH) will now become:
> > if (query_transport() == IBOE)
> >
> > Thus we will be able to get rid of the respective transport and
> > link-layer checking, and it will help us to add new
> > protocol/Technology (like OPA) more easier, also with the introduced
> > management helpers, IB management logical will be more clear and easier
> for extending.
> >
> > Highlights:
> > The patch set covered a wide range of IB stuff, thus for those who are
> > familiar with the particular part, your suggestion would be
> > invaluable ;-)
> >
> > Patch 1#~15# included all the logical reform, 16#~25# introduced the
> > management helpers, 26#~27# do clean up.
> >
> > we appreciate for those one who have the HW willing to provide
> > Tested-by :-)
> >
> > Doug suggested the bitmask mechanism:
> > https://www.mail-archive.com/linux-
> r...@vger.kernel.org/msg23765.html
> > which could be the plan for future reforming, we prefer that to be 
> > another
> > series which focus on semantic and performance.
> >
> > This patch-set is somewhat 'bloated' now and it may be a good timing for
> > staging, I'd like to suggest we focus on improving existed helpers and 
> > push
> > all the further reforms into next series ;-)
> >
> > We now have a repository based on latest infiniband/for-next with this
> > series applied:
> > g...@github.com:ywang-pb/infiniband-wy.git
> >
> > Proposals:
> > Sean:
> > https://www.mail-archive.com/linux-
> r...@vger.kernel.org/msg23339.html
> > Doug:
> > https://www.mail-archive.com/linux-
> r...@vger.kernel.org/msg23418.html
&

RE: [PATCH v1 02/14] xprtrdma: Warn when there are orphaned IB objects

2015-05-07 Thread Devesh Sharma
> -Original Message-
> From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
> Sent: Wednesday, May 06, 2015 10:18 PM
> To: Devesh Sharma
> Cc: Chuck Lever; linux-rdma@vger.kernel.org; Linux NFS Mailing List
> Subject: Re: [PATCH v1 02/14] xprtrdma: Warn when there are orphaned IB
> objects
>
> On Wed, May 06, 2015 at 07:52:03PM +0530, Devesh Sharma wrote:
> > >> Should we check for EBUSY explicitly? other then this is an error
> > >> in vendor specific ib_dealloc_pd()
> > >
> > > Any error return means ib_dealloc_pd() has failed, right? Doesn’t
> > > that mean the PD is still allocated, and could cause problems later?
> >
> > Yes, you are correct, I was thinking ib_dealloc_pd() has a refcount
> > implemented in the core layer, thus if the PD is used by any resource,
> > it will always fail with -EBUSY.
>
> .. and it will not be freed, which indicates a serious bug in the caller,
> so the
> caller should respond to the failure with a BUG_ON or WARN_ON.

Yes, that’s what this patch is doing.
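
For the record, the caller-side pattern being discussed amounts to something
like this (a sketch; where exactly it lands in xprtrdma, e.g. the ia->ri_pd
teardown, is an assumption on my part):

    rc = ib_dealloc_pd(ia->ri_pd);
    /* A non-zero return means resources still reference the PD, i.e. a
     * caller bug, so warn loudly rather than trying to "handle" it. */
    WARN_ON(rc);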

>
> > With the Emulex adapter it is possible for dealloc_pd to fail with ENOMEM or
> > EIO in cases where the device f/w is not responding, etc. This situation does
> > not mean the PD is actually in use.
>
> This is a really bad idea. If the pd was freed and from the consumer's
> perspective everything is sane then it should return success.
>
> If the driver detects an internal failure, then it should move the driver
> to a
> failed state (whatever that means, but at a minimum it means the firmware
> state and driver state must be resync'd), and still succeed the dealloc.

Makes sense.

>
> There is absolutely nothing the caller can do about a driver level failure
> here,
> and it doesn't indicate a caller bug.
>
> Returning ENOMEM for dealloc is what we'd call an insane API. You can't
> have
> failable memory allocations in a dealloc path.

I will supply a fix in ocrdma.

Reviewed-by: Devesh Sharma 
>
> Jason


RE: [PATCH v2 00/16] NFS/RDMA patches proposed for 4.2

2015-05-20 Thread Devesh Sharma
Medusa test passes with average load.

Tested-By: Devesh Sharma 

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Monday, May 11, 2015 11:32 PM
> To: linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
> Subject: [PATCH v2 00/16] NFS/RDMA patches proposed for 4.2
>
> I'd like these patches to be considered for merging upstream. This patch
> series
> includes:
>
>   - JIT allocation of rpcrdma_mw structures
>   - Break-up of rb_lock
>   - Reduction of how many rpcrdma_mw structs are needed per transport
>
> These are pre-requisites for increasing the RPC slot count and r/wsize on
> RPC/RDMA transports, and provide scalability benefits even on their own.
> And:
>
>   - A generic transport fault injector
>
> This is useful to discover regressions in logic that handles transport
> reconnection.
>
> You can find these in my git repo in the "nfs-rdma-for-4.2" topic branch.
> See:
>
>   git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
>
> Changes since v1:
>
>   - Rebased on 4.1-rc3
>   - Transport fault injector controlled from debugfs rather than /proc
>   - Transport fault injector works for all transport types
>   - bc_send() clean up suggested by Christoph Hellwig
>   - Added Reviewed-by: tags. Many thanks to reviewers!
>   - Addressed all review comments but one: Sagi's comment about
>  ri_device remains unresolved.
>
> ---
>
> Chuck Lever (16):
>   SUNRPC: Transport fault injection
>   xprtrdma: Warn when there are orphaned IB objects
>   xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt
>   xprtrdma: Remove rr_func
>   xprtrdma: Use ib_device pointer safely
>   xprtrdma: Introduce helpers for allocating MWs
>   xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external()
>   xprtrdma: Introduce an FRMR recovery workqueue
>   xprtrdma: Acquire MRs in rpcrdma_register_external()
>   xprtrdma: Remove unused LOCAL_INV recovery logic
>   xprtrdma: Remove ->ro_reset
>   xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy
>   xprtrdma: Split rb_lock
>   xprtrdma: Stack relief in fmr_op_map()
>   xprtrdma: Reduce per-transport MR allocation
>   SUNRPC: Clean up bc_send()
>
>
>  include/linux/sunrpc/bc_xprt.h |1
>  include/linux/sunrpc/xprt.h|   19 +++
>  include/linux/sunrpc/xprtrdma.h|3
>  net/sunrpc/Makefile|2
>  net/sunrpc/bc_svc.c|   63 -
>  net/sunrpc/clnt.c  |1
>  net/sunrpc/debugfs.c   |   77 +++
>  net/sunrpc/svc.c   |   33 -
>  net/sunrpc/xprt.c  |2
>  net/sunrpc/xprtrdma/fmr_ops.c  |  120 +++--
>  net/sunrpc/xprtrdma/frwr_ops.c |  227
> +++-
>  net/sunrpc/xprtrdma/physical_ops.c |   14 --
>  net/sunrpc/xprtrdma/rpc_rdma.c |8 -
>  net/sunrpc/xprtrdma/transport.c|   30 +++-
>  net/sunrpc/xprtrdma/verbs.c|  257
> +---
>  net/sunrpc/xprtrdma/xprt_rdma.h|   38 -
>  net/sunrpc/xprtsock.c  |   10 +
>  17 files changed, 492 insertions(+), 413 deletions(-)  delete mode 100644
> net/sunrpc/bc_svc.c
>
> --
> Chuck Lever
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body
> of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html


RE: [PATCH] RDMA/ocrdma: remove unneeded tests

2015-05-26 Thread Devesh Sharma
Although it is already applied: thanks, Laurent and Doug.

Acked-By: Devesh Sharma 

> -Original Message-
> From: Doug Ledford [mailto:dledf...@redhat.com]
> Sent: Tuesday, May 26, 2015 6:15 PM
> To: Laurent Navet
> Cc: selvin.xav...@emulex.com; devesh.sha...@emulex.com;
> mitesh.ah...@emulex.com; sean.he...@intel.com; hal.rosenst...@gmail.com;
> linux-rdma@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH] RDMA/ocrdma: remove unneeded tests
>
> On Thu, 2015-05-21 at 22:07 +0200, Laurent Navet wrote:
> > The same code is executed regardless status value, so these tests can
> > be removed.
> > Fix Coverity CID 1271151 and 1268788
>
> Thanks, applied for 4.2.
>
> > Signed-off-by: Laurent Navet 
> > ---
> >  drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 6 --
> >  1 file changed, 6 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
> > b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
> > index 0c9e959..e748090 100644
> > --- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
> > +++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
> > @@ -1960,9 +1960,6 @@ static int ocrdma_mbx_reg_mr_cont(struct
> ocrdma_dev *dev,
> > upper_32_bits(hwmr->pbl_table[i + pbl_offset].pa);
> > }
> > status = ocrdma_mbx_cmd(dev, (struct ocrdma_mqe *)cmd);
> > -   if (status)
> > -   goto mbx_err;
> > -mbx_err:
> > kfree(cmd);
> > return status;
> >  }
> > @@ -3044,9 +3041,6 @@ static int ocrdma_mbx_modify_eqd(struct
> ocrdma_dev *dev, struct ocrdma_eq *eq,
> > (eq[i].aic_obj.prev_eqd * 65)/100;
> > }
> > status = ocrdma_mbx_cmd(dev, (struct ocrdma_mqe *)cmd);
> > -   if (status)
> > -   goto mbx_err;
> > -mbx_err:
> > kfree(cmd);
> > return status;
> >  }
>
>
> --
> Doug Ledford 
>   GPG KeyID: 0E572FDD


RE: [PATCH] MAINTAINERS: update Emulex ocrdma email addresses

2015-05-26 Thread Devesh Sharma
Thanks Laurent,

My earlier mail bounced back from the linux-kernel mailing list, so I am
resending.

CC'ing Doug.

Acked-By: Devesh Sharma 


> -Original Message-
> From: Laurent Navet [mailto:laurent.na...@gmail.com]
> Sent: Wednesday, May 27, 2015 12:46 AM
> To: a...@linux-foundation.org; gre...@linuxfoundation.org;
> da...@davemloft.net; mche...@osg.samsung.com; a...@arndb.de;
> j...@perches.com; jingooh...@gmail.com; selvin.xav...@avagotech.com;
> devesh.sha...@avagotech.com; mitesh.ah...@avagotech.com
> Cc: linux-ker...@vger.kernel.org; Laurent Navet
> Subject: [PATCH] MAINTAINERS: update Emulex ocrdma email addresses
>
> @emulex.com addresses respond to use @avagotech.com.
>
> Signed-off-by: Laurent Navet 
> ---
>  MAINTAINERS | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f8e0afb..05766f7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8846,9 +8846,9 @@ S:  Supported
>  F:   drivers/net/ethernet/emulex/benet/
>
>  EMULEX ONECONNECT ROCE DRIVER
> -M:   Selvin Xavier 
> -M:   Devesh Sharma 
> -M:   Mitesh Ahuja 
> +M:   Selvin Xavier 
> +M:   Devesh Sharma 
> +M:   Mitesh Ahuja 
>  L:   linux-rdma@vger.kernel.org
>  W:   http://www.emulex.com
>  S:   Supported
> --
> 2.1.4


Re: [PATCH 1/3] RDMA/ocrdma: Fix memory leak in _ocrdma_alloc_pd()

2015-05-31 Thread Devesh Sharma
Acked-By: Devesh Sharma 

On Sat, May 30, 2015 at 8:56 PM, Doug Ledford  wrote:
> On Fri, 2015-05-29 at 23:10 -0700, Roland Dreier wrote:
>> From: Roland Dreier 
>>
>> If ocrdma_get_pd_num() fails, then we need to free the pd struct we 
>> allocated.
>>
>> This was detected by Coverity (CID 1271245).
>>
>> Signed-off-by: Roland Dreier 
>
> Thanks, series applied.
>
>> ---
>>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c 
>> b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
>> index 9dcb66077d6c..fcb86749efc9 100644
>> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
>> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
>> @@ -375,7 +375,12 @@ static struct ocrdma_pd *_ocrdma_alloc_pd(struct 
>> ocrdma_dev *dev,
>>
>>   if (dev->pd_mgr->pd_prealloc_valid) {
>>   status = ocrdma_get_pd_num(dev, pd);
>> - return (status == 0) ? pd : ERR_PTR(status);
>> + if (status == 0) {
>> + return pd;
>> + } else {
>> + kfree(pd);
>> + return ERR_PTR(status);
>> + }
>>   }
>>
>>  retry:
>
>
> --
> Doug Ledford 
>   GPG KeyID: 0E572FDD
>


Re: [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible

2015-06-01 Thread Devesh Sharma
Looks good.

Reviewed-By: Devesh Sharma 

On Sun, May 31, 2015 at 5:44 PM, Or Gerlitz  wrote:
> From: Matan Barak 
>
> Add a new ib_cq_init_attr structure which contains the
> previous cqe (minimum number of CQ entries) and comp_vector
> (completion vector) in addition to a new flags field.
> All vendors' create_cq callbacks are changed in order
> to work with the new API.
>
> This commit does not change any functionality.
>
> Signed-off-by: Matan Barak 
> Signed-off-by: Or Gerlitz 
> ---
>  drivers/infiniband/core/uverbs_cmd.c |6 --
>  drivers/infiniband/core/verbs.c  |3 ++-
>  drivers/infiniband/hw/amso1100/c2_provider.c |7 ++-
>  drivers/infiniband/hw/cxgb3/iwch_provider.c  |   11 ---
>  drivers/infiniband/hw/cxgb4/cq.c |9 +++--
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |8 
>  drivers/infiniband/hw/ehca/ehca_cq.c |7 ++-
>  drivers/infiniband/hw/ehca/ehca_iverbs.h |3 ++-
>  drivers/infiniband/hw/ipath/ipath_cq.c   |9 +++--
>  drivers/infiniband/hw/ipath/ipath_verbs.h|3 ++-
>  drivers/infiniband/hw/mlx4/cq.c  |8 +++-
>  drivers/infiniband/hw/mlx4/mlx4_ib.h |3 ++-
>  drivers/infiniband/hw/mlx5/cq.c  |   10 --
>  drivers/infiniband/hw/mlx5/main.c|3 ++-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |5 +++--
>  drivers/infiniband/hw/mthca/mthca_provider.c |8 ++--
>  drivers/infiniband/hw/nes/nes_verbs.c|   11 ---
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |7 ++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |6 --
>  drivers/infiniband/hw/qib/qib_cq.c   |   11 ---
>  drivers/infiniband/hw/qib/qib_verbs.h|5 +++--
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c |   10 +++---
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.h |7 ---
>  include/rdma/ib_verbs.h  |   10 --
>  24 files changed, 124 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_cmd.c 
> b/drivers/infiniband/core/uverbs_cmd.c
> index a9f0489..1954ebb 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
> struct ib_uverbs_event_file*ev_file = NULL;
> struct ib_cq   *cq;
> int ret;
> +   struct ib_cq_init_attr attr = {};
>
> if (out_len < sizeof resp)
> return -ENOSPC;
> @@ -1376,8 +1377,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
> INIT_LIST_HEAD(&obj->comp_list);
> INIT_LIST_HEAD(&obj->async_list);
>
> -   cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe,
> -cmd.comp_vector,
> +   attr.cqe = cmd.cqe;
> +   attr.comp_vector = cmd.comp_vector;
> +   cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
>  file->ucontext, &udata);
> if (IS_ERR(cq)) {
> ret = PTR_ERR(cq);
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 685a362..f7615d4 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1078,8 +1078,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device,
>void *cq_context, int cqe, int comp_vector)
>  {
> struct ib_cq *cq;
> +   struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = 
> comp_vector};
>
> -   cq = device->create_cq(device, cqe, comp_vector, NULL, NULL);
> +   cq = device->create_cq(device, &attr, NULL, NULL);
>
> if (!IS_ERR(cq)) {
> cq->device= device;
> diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c 
> b/drivers/infiniband/hw/amso1100/c2_provider.c
> index d396c39..a43e022 100644
> --- a/drivers/infiniband/hw/amso1100/c2_provider.c
> +++ b/drivers/infiniband/hw/amso1100/c2_provider.c
> @@ -286,13 +286,18 @@ static int c2_destroy_qp(struct ib_qp *ib_qp)
> return 0;
>  }
>
> -static struct ib_cq *c2_create_cq(struct ib_device *ibdev, int entries, int 
> vector,
> +static struct ib_cq *c2_create_cq(struct ib_device *ibdev,
> + const struct ib_cq_init_attr *attr,
>   struct ib_ucontext *context,
>   struct ib_udata *udata)
>  {
> +   int entries
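
The rest of the diff is trimmed above, but the calling convention it
establishes boils down to the verbs.c hunk already quoted:

    /* cqe/comp_vector are no longer separate arguments; the new flags
     * field is simply left at zero by existing callers. */
    struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};

    cq = device->create_cq(device, &attr, NULL, NULL);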

Re: [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device

2015-06-01 Thread Devesh Sharma
The ocrdma part looks good.

Reviewed-By: Devesh Sharma 

On Sun, May 31, 2015 at 5:44 PM, Or Gerlitz  wrote:
> From: Matan Barak 
>
> Vendors should be able to pass vendor specific data to/from
> user-space via query_device uverb. In order to do this,
> we need to pass the vendors' specific udata.
>
> Signed-off-by: Matan Barak 
> Signed-off-by: Or Gerlitz 
> ---
>  drivers/infiniband/core/device.c |4 +++-
>  drivers/infiniband/core/uverbs_cmd.c |2 +-
>  drivers/infiniband/hw/amso1100/c2_provider.c |7 +--
>  drivers/infiniband/hw/cxgb3/iwch_provider.c  |8 ++--
>  drivers/infiniband/hw/cxgb4/provider.c   |8 ++--
>  drivers/infiniband/hw/ehca/ehca_hca.c|6 +-
>  drivers/infiniband/hw/ehca/ehca_iverbs.h |3 ++-
>  drivers/infiniband/hw/ipath/ipath_verbs.c|7 +--
>  drivers/infiniband/hw/mlx4/main.c|6 +-
>  drivers/infiniband/hw/mlx5/main.c|9 +++--
>  drivers/infiniband/hw/mthca/mthca_provider.c |7 +--
>  drivers/infiniband/hw/nes/nes_verbs.c|6 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |6 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |3 ++-
>  drivers/infiniband/hw/qib/qib_verbs.c|6 --
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c |6 +-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.h |3 ++-
>  include/rdma/ib_verbs.h  |3 ++-
>  18 files changed, 75 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/infiniband/core/device.c 
> b/drivers/infiniband/core/device.c
> index 568cb41..694bd66 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -539,9 +539,11 @@ EXPORT_SYMBOL(ib_dispatch_event);
>  int ib_query_device(struct ib_device *device,
> struct ib_device_attr *device_attr)
>  {
> +   struct ib_udata uhw = {.outlen = 0, .inlen = 0};
> +
> memset(device_attr, 0, sizeof(*device_attr));
>
> -   return device->query_device(device, device_attr);
> +   return device->query_device(device, device_attr, &uhw);
>  }
>  EXPORT_SYMBOL(ib_query_device);
>
> diff --git a/drivers/infiniband/core/uverbs_cmd.c 
> b/drivers/infiniband/core/uverbs_cmd.c
> index 11ee298..bbb02ff 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -3428,7 +3428,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file 
> *file,
>
> memset(&attr, 0, sizeof(attr));
>
> -   err = device->query_device(device, &attr);
> +   err = device->query_device(device, &attr, uhw);
> if (err)
> return err;
>
> diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c 
> b/drivers/infiniband/hw/amso1100/c2_provider.c
> index a43e022..382f109 100644
> --- a/drivers/infiniband/hw/amso1100/c2_provider.c
> +++ b/drivers/infiniband/hw/amso1100/c2_provider.c
> @@ -63,13 +63,16 @@
>  #include "c2_provider.h"
>  #include "c2_user.h"
>
> -static int c2_query_device(struct ib_device *ibdev,
> -  struct ib_device_attr *props)
> +static int c2_query_device(struct ib_device *ibdev, struct ib_device_attr 
> *props,
> +  struct ib_udata *uhw)
>  {
> struct c2_dev *c2dev = to_c2dev(ibdev);
>
> pr_debug("%s:%u\n", __func__, __LINE__);
>
> +   if (uhw->inlen || uhw->outlen)
> +   return -EINVAL;
> +
> *props = c2dev->props;
> return 0;
>  }
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
> b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index 2eaf7e8..c4b5936 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1150,13 +1150,17 @@ static u64 fw_vers_string_to_u64(struct iwch_dev 
> *iwch_dev)
>(fw_mic & 0x);
>  }
>
> -static int iwch_query_device(struct ib_device *ibdev,
> -struct ib_device_attr *props)
> +static int iwch_query_device(struct ib_device *ibdev, struct ib_device_attr 
> *props,
> +struct ib_udata *uhw)
>  {
>
> struct iwch_dev *dev;
> +
> PDBG("%s ibdev %p\n", __func__, ibdev);
>
> +   if (uhw->inlen || uhw->outlen)
> +   return -EINVAL;
> +
> dev = to_iwch_dev(ibdev);
> memset(props, 0, sizeof *props);
> memcpy(&props->sys_image_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 
> 6);
> diff --git a/drivers/infiniband/hw/cxgb4
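
The remaining hunks are trimmed, but they all follow the per-provider pattern
shown above; as a sketch (foo/to_foo_dev are hypothetical placeholders for a
provider that defines no vendor-specific data yet):

    static int foo_query_device(struct ib_device *ibdev,
                                struct ib_device_attr *props,
                                struct ib_udata *uhw)
    {
        /* No vendor-specific input or output is defined here. */
        if (uhw->inlen || uhw->outlen)
            return -EINVAL;

        *props = to_foo_dev(ibdev)->attr;   /* fill attributes as before */
        return 0;
    }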

Re: [PATCH] RDMA/ocrdma: fix double free on pd

2015-06-07 Thread Devesh Sharma
Acked-By: Devesh Sharma 

On Fri, Jun 5, 2015 at 8:17 PM, Colin King  wrote:
> From: Colin Ian King 
>
> A reorganisation of the PD allocation and deallocation in commit
> 9ba1377daa ("RDMA/ocrdma: Move PD resource management to driver.")
> introduced a double free on pd, as detected by static analysis by
> smatch:
>
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:682 ocrdma_alloc_pd()
>   error: double free of 'pd'^
>
> The original call to ocrdma_mbx_dealloc_pd() (which does not kfree
> pd) was replaced with a call to _ocrdma_dealloc_pd() (which does
> kfree pd).  The kfree following this call causes the double free,
> so just remove it to fix the problem.
>
> Fixes: 9ba1377daa ("RDMA/ocrdma: Move PD resource management to driver.")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c 
> b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> index 9dcb660..219f212 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> @@ -679,7 +679,6 @@ err:
> ocrdma_release_ucontext_pd(uctx);
> } else {
> status = _ocrdma_dealloc_pd(dev, pd);
> -   kfree(pd);
> }
>  exit:
> return ERR_PTR(status);
> --
> 2.1.4
>


[PATCH 2/2] RDMA/ocrdma: update module license to dual license

2015-06-12 Thread Devesh Sharma
This patch updates the ocrdma module license from GPL to
Dual BSD/GPL licensing.

Signed-off-by: Devesh Sharma 
---
 drivers/infiniband/hw/ocrdma/ocrdma_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index cee43c1..2917324 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -60,7 +60,7 @@
 MODULE_VERSION(OCRDMA_ROCE_DRV_VERSION);
 MODULE_DESCRIPTION(OCRDMA_ROCE_DRV_DESC " " OCRDMA_ROCE_DRV_VERSION);
 MODULE_AUTHOR("Emulex Corporation");
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("Dual BSD/GPL");
 
 static LIST_HEAD(ocrdma_dev_list);
 static DEFINE_SPINLOCK(ocrdma_devlist_lock);
-- 
1.7.1



[PATCH 0/2] update ocrdma to dual license

2015-06-12 Thread Devesh Sharma
A series to update the license from GPL to GPL/Dual-BSD licensing
for ocrdma source.

Devesh Sharma (2):
  RDMA/ocrdma: update license from gpl to dual license
  RDMA/ocrdma: update module license to dual license

 drivers/infiniband/hw/ocrdma/ocrdma.h   |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_abi.h   |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c|   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h|   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c|   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_hw.h|   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |   55 +--
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_stats.c |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_stats.h |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   53 -
 12 files changed, 409 insertions(+), 229 deletions(-)



[PATCH 1/2] RDMA/ocrdma: update license from gpl to dual license

2015-06-12 Thread Devesh Sharma
This patch edits the legal statement for ocrdma driver code and
moves it to GPL/Dual-BSD license.

Signed-off-by: Devesh Sharma 
---
 drivers/infiniband/hw/ocrdma/ocrdma.h   |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_abi.h   |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c|   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h|   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c|   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.h|   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_stats.c |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_stats.h |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   53 +--
 12 files changed, 408 insertions(+), 228 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index b396344..6a36338 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -1,21 +1,36 @@
-/***
- * This file is part of the Emulex RoCE Device Driver for  *
- * RoCE (RDMA over Converged Ethernet) adapters.   *
- * Copyright (C) 2008-2012 Emulex. All rights reserved.*
- * EMULEX and SLI are trademarks of Emulex.*
- * www.emulex.com  *
- * *
- * This program is free software; you can redistribute it and/or   *
- * modify it under the terms of version 2 of the GNU General   *
- * Public License as published by the Free Software Foundation.*
- * This program is distributed in the hope that it will be useful. *
- * ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND  *
- * WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY,  *
- * FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE  *
- * DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD *
- * TO BE LEGALLY INVALID.  See the GNU General Public License for  *
- * more details, a copy of which can be found in the file COPYING  *
- * included with this package. *
+/* This file is part of the Emulex RoCE Device Driver for
+ * RoCE (RDMA over Converged Ethernet) adapters.
+ * Copyright (C) 2012-2015 Emulex. All rights reserved.
+ * EMULEX and SLI are trademarks of Emulex.
+ * www.emulex.com
+ *
+ * This software is available to you under a choice of one of two licenses.
+ * You may choose to be licensed under the terms of the GNU General Public
+ * License (GPL) Version 2, available from the file COPYING in the main
+ * directory of this source tree, or the BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * - Redistributions of source code must retain the above copyright notice,
+ *   this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * Contact Information:
  * linux-driv...@emulex.com
@@ -23,7 +38,7 @@
  * Emulex
  *  Susan Street
  * Costa Mesa, CA 92626
- ***/
+ */
 
 #ifndef __OCRDMA_H__
 #define __OCRDMA_H__
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_abi.h 
b/drivers/infiniband/hw/ocrdma/ocrdma_abi.h
index 1554cca..430b135 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_abi.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_abi.h
@@ -

Re: [PATCH 0/2] update ocrdma to dual license

2015-06-29 Thread Devesh Sharma
Hi Doug,

a gentle reminder to pull this simple patch set into your tree.

-Regards
Devesh

On Fri, Jun 12, 2015 at 10:15 PM, Devesh Sharma
 wrote:
> A series to update the license from GPL to GPL/Dual-BSD licensing
> for ocrdma source.
>
> Devesh Sharma (2):
>   RDMA/ocrdma: update license from gpl to dual license
>   RDMA/ocrdma: update module license to dual license
>
>  drivers/infiniband/hw/ocrdma/ocrdma.h   |   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_abi.h   |   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c|   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.h|   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_hw.c|   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_hw.h|   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_main.c  |   55 
> +--
>  drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_stats.c |   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_stats.h |   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   53 -
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   53 -
>  12 files changed, 409 insertions(+), 229 deletions(-)
>


Re: [PATCH 0/2] update ocrdma to dual license

2015-06-30 Thread Devesh Sharma
Hi Christoph, I really do not have it.

However, this change was initiated with the consent of Emulex management.

-Regards
Devesh

On Tue, Jun 30, 2015 at 11:36 AM, Christoph Hellwig  wrote:
> On Fri, Jun 12, 2015 at 10:15:03PM +0530, Devesh Sharma wrote:
>> A series to update the license from GPL to GPL/Dual-BSD licensing
>> for ocrdma source.
>
> Do you have a written consent from everyone who contributed to the
> driver to do this?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] update ocrdma to dual license

2015-07-02 Thread Devesh Sharma
Christoph,


Apologies, I misspoke in my response to you.  There was a study of the
code and we thought it was reasonable to post.  However, in retrospect
we should have used more due diligence.  We're going back to seek
explicit consent from key contributors.


-Regards
Devesh

On Wed, Jul 1, 2015 at 12:51 PM, Christoph Hellwig  wrote:
> On Tue, Jun 30, 2015 at 04:19:43PM +0530, Devesh Sharma wrote:
>> Hi Christoph, I really do not have it.
>>
>> However, this change is initiated with consent of Emulex management.
>
> Emulex managament can't relicense code written by other people.
>
> There isn't a lot of non-Emulex contributions here, but you'll have to be
> really careful with it, and this whole patchset absolutely misses due
> care.


Re: [PATCH 0/2] update ocrdma to dual license

2015-07-07 Thread Devesh Sharma
On Fri, Jul 3, 2015 at 9:08 PM, Weiny, Ira  wrote:
>>
>> Christoph,
>>
>>
>> Apologies, I misspoke in my response to you.  There was a study of the code 
>> and
>> we thought it was reasonable to post.  However, in retrospect we should have
>> used more due diligence.  We're going back to seek explicit consent from key
>> contributors.
>
> I'm no legal expert, but don't you need consent from _all_ contributors?

We're contacting all of the contributors that we identified. Quite a
few have already replied in the affirmative. We are waiting for replies from
a few and tracking down some bounced emails from others.

>
> Ira
>


Re: [PATCH 0/2] update ocrdma to dual license

2015-07-07 Thread Devesh Sharma
Hi Christoph,

On Fri, Jul 3, 2015 at 9:22 PM, Christoph Hellwig  wrote:
> On Fri, Jul 03, 2015 at 03:38:55PM +, Weiny, Ira wrote:
>> >
>> > Christoph,
>> >
>> >
>> > Apologies, I misspoke in my response to you.  There was a study of the 
>> > code and
>> > we thought it was reasonable to post.  However, in retrospect we should 
>> > have
>> > used more due diligence.  We're going back to seek explicit consent from 
>> > key
>> > contributors.
>>
>> I'm no legal expert, but don't you need consent from _all_ contributors?
>
> Exactly.  I'd also like to see a really good argument why you'd want to
> relicense code that's been part of the kernel for a while.

We (Emulex/Avago) were lobbied by the Open-Fabrics Alliance (OFA) to
change the licensing from just GPLv2 to a dual GPLv2/BSD license.
They would prefer the elements in the OFED stack all be dual licensed.
We're trying to move to this position.


Re: [PATCH v1 02/12] xprtrdma: Raise maximum payload size to one megabyte

2015-07-10 Thread Devesh Sharma
Looks good

Reviewed-By: Devesh Sharma 
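
For reference, with the common 4 KiB PAGE_SIZE the new limit works out to
(1 * 1024 * 1024) / 4096 = 256 data segments per RPC, compared with the
previous static 64.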

On Fri, Jul 10, 2015 at 2:11 AM, Chuck Lever  wrote:
> The point of larger rsize and wsize is to reduce the per-byte cost
> of memory registration and deregistration. Modern HCAs can typically
> handle a megabyte or more with a single registration operation.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/xprt_rdma.h |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index f49dd8b..abee472 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -165,8 +165,7 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
>   * struct rpcrdma_buffer. N is the max number of outstanding requests.
>   */
>
> -/* temporary static scatter/gather max */
> -#define RPCRDMA_MAX_DATA_SEGS  (64)/* max scatter/gather */
> +#define RPCRDMA_MAX_DATA_SEGS  ((1 * 1024 * 1024) / PAGE_SIZE)
>  #define RPCRDMA_MAX_SEGS   (RPCRDMA_MAX_DATA_SEGS + 2) /* head+tail = 2 
> */
>
>  struct rpcrdma_buffer;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 03/12] xprtrdma: Increase default credit limit

2015-07-10 Thread Devesh Sharma
Increasing the default slot table entries will increase the MR
requirements per mount.

Currently, with 32 as the default, the client ends up allocating 2178 FRMRs
(ref: kernel 4.1-rc4) for a single mount. With 128, the FRMR requirement
at startup would be 8448.

8K+ MRs per mount just for start-up; I am a little doubtful about this
change. We can always release-note that "for better performance,
increase the slot table entries via echo 128 >
/proc/sys/sunrpc/rdma_slot_table_entries".
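
For context, the arithmetic behind those numbers: RPCRDMA_MAX_SEGS is 66 at
this point (64 data segments plus head and tail), and the client pre-allocates
roughly that many FRMRs per slot:

    66 MRs/slot x  33 (32 slots plus one spare set) = 2178 MRs today
    66 MRs/slot x 128 slots                         = 8448 MRs with the new default

(the "one spare set" part is my reading of the allocator, so treat the exact
breakdown as approximate).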

-Regards
Devesh

On Fri, Jul 10, 2015 at 2:12 AM, Chuck Lever  wrote:
> In preparation for similar increases on NFS/RDMA servers, bump the
> advertised credit limit for RPC/RDMA to 128. This allocates some
> extra resources, but the client will continue to allow only the
> number of RPCs in flight that the server requests via its advertised
> credit limit.
>
> Signed-off-by: Chuck Lever 
> ---
>  include/linux/sunrpc/xprtrdma.h |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
> index b176130..b7b279b 100644
> --- a/include/linux/sunrpc/xprtrdma.h
> +++ b/include/linux/sunrpc/xprtrdma.h
> @@ -49,7 +49,7 @@
>   * a single chunk type per message is supported currently.
>   */
>  #define RPCRDMA_MIN_SLOT_TABLE (2U)
> -#define RPCRDMA_DEF_SLOT_TABLE (32U)
> +#define RPCRDMA_DEF_SLOT_TABLE (128U)
>  #define RPCRDMA_MAX_SLOT_TABLE (256U)
>
>  #define RPCRDMA_DEF_INLINE  (1024) /* default inline max */
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 04/12] xprtrdma: Remove last ib_reg_phys_mr() call site

2015-07-10 Thread Devesh Sharma
Looks good.

Reviewed-By: Devesh Sharma 

On Fri, Jul 10, 2015 at 2:12 AM, Chuck Lever  wrote:
> All HCA providers have an ib_get_dma_mr() verb. Thus
> rpcrdma_ia_open() will either grab the device's local_dma_key if one
> is available, or it will call ib_get_dma_mr() which is a 100%
> guaranteed fallback. There is never any need to use the
> ib_reg_phys_mr() code path in rpcrdma_register_internal(), so it can
> be removed.
>
> The remaining logic in rpcrdma_{de}register_internal() is folded
> into rpcrdma_{alloc,free}_regbuf().
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/verbs.c |  102 
> ---
>  net/sunrpc/xprtrdma/xprt_rdma.h |1
>  2 files changed, 21 insertions(+), 82 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 891c4ed..cdf5220 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1229,75 +1229,6 @@ rpcrdma_mapping_error(struct rpcrdma_mr_seg *seg)
> (unsigned long long)seg->mr_dma, seg->mr_dmalen);
>  }
>
> -static int
> -rpcrdma_register_internal(struct rpcrdma_ia *ia, void *va, int len,
> -   struct ib_mr **mrp, struct ib_sge *iov)
> -{
> -   struct ib_phys_buf ipb;
> -   struct ib_mr *mr;
> -   int rc;
> -
> -   /*
> -* All memory passed here was kmalloc'ed, therefore phys-contiguous.
> -*/
> -   iov->addr = ib_dma_map_single(ia->ri_device,
> -   va, len, DMA_BIDIRECTIONAL);
> -   if (ib_dma_mapping_error(ia->ri_device, iov->addr))
> -   return -ENOMEM;
> -
> -   iov->length = len;
> -
> -   if (ia->ri_have_dma_lkey) {
> -   *mrp = NULL;
> -   iov->lkey = ia->ri_dma_lkey;
> -   return 0;
> -   } else if (ia->ri_bind_mem != NULL) {
> -   *mrp = NULL;
> -   iov->lkey = ia->ri_bind_mem->lkey;
> -   return 0;
> -   }
> -
> -   ipb.addr = iov->addr;
> -   ipb.size = iov->length;
> -   mr = ib_reg_phys_mr(ia->ri_pd, &ipb, 1,
> -   IB_ACCESS_LOCAL_WRITE, &iov->addr);
> -
> -   dprintk("RPC:   %s: phys convert: 0x%llx "
> -   "registered 0x%llx length %d\n",
> -   __func__, (unsigned long long)ipb.addr,
> -   (unsigned long long)iov->addr, len);
> -
> -   if (IS_ERR(mr)) {
> -   *mrp = NULL;
> -   rc = PTR_ERR(mr);
> -   dprintk("RPC:   %s: failed with %i\n", __func__, rc);
> -   } else {
> -   *mrp = mr;
> -   iov->lkey = mr->lkey;
> -   rc = 0;
> -   }
> -
> -   return rc;
> -}
> -
> -static int
> -rpcrdma_deregister_internal(struct rpcrdma_ia *ia,
> -   struct ib_mr *mr, struct ib_sge *iov)
> -{
> -   int rc;
> -
> -   ib_dma_unmap_single(ia->ri_device,
> -   iov->addr, iov->length, DMA_BIDIRECTIONAL);
> -
> -   if (NULL == mr)
> -   return 0;
> -
> -   rc = ib_dereg_mr(mr);
> -   if (rc)
> -   dprintk("RPC:   %s: ib_dereg_mr failed %i\n", __func__, 
> rc);
> -   return rc;
> -}
> -
>  /**
>   * rpcrdma_alloc_regbuf - kmalloc and register memory for SEND/RECV buffers
>   * @ia: controlling rpcrdma_ia
> @@ -1317,26 +1248,30 @@ struct rpcrdma_regbuf *
>  rpcrdma_alloc_regbuf(struct rpcrdma_ia *ia, size_t size, gfp_t flags)
>  {
> struct rpcrdma_regbuf *rb;
> -   int rc;
> +   struct ib_sge *iov;
>
> -   rc = -ENOMEM;
> rb = kmalloc(sizeof(*rb) + size, flags);
> if (rb == NULL)
> goto out;
>
> -   rb->rg_size = size;
> -   rb->rg_owner = NULL;
> -   rc = rpcrdma_register_internal(ia, rb->rg_base, size,
> -  &rb->rg_mr, &rb->rg_iov);
> -   if (rc)
> +   iov = &rb->rg_iov;
> +   iov->addr = ib_dma_map_single(ia->ri_device,
> + (void *)rb->rg_base, size,
> + DMA_BIDIRECTIONAL);
> +   if (ib_dma_mapping_error(ia->ri_device, iov->addr))
> goto out_free;
>
> +   iov->length = size;
> +   iov->lkey = ia->ri_have_dma_lkey ?
> +   ia->ri_dma_lkey : ia->ri_bind_mem->lkey;

Re: [PATCH v1 05/12] xprtrdma: Account for RPC/RDMA header size when deciding to inline

2015-07-10 Thread Devesh Sharma
Looks good.

Reviewed-By: Devesh Sharma 

On Fri, Jul 10, 2015 at 2:12 AM, Chuck Lever  wrote:
> When marshaling RPC/RDMA requests, ensure the combined size of
> RPC/RDMA header and RPC header do not exceed the inline threshold.
> Endpoints typically reject RPC/RDMA messages that exceed the size
> of their receive buffers.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/rpc_rdma.c |   29 +++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index 84ea37d..8cf9402 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -71,6 +71,31 @@ static const char transfertypes[][12] = {
>  };
>  #endif
>
> +/* The client can send a request inline as long as the RPCRDMA header
> + * plus the RPC call fit under the transport's inline limit. If the
> + * combined call message size exceeds that limit, the client must use
> + * the read chunk list for this operation.
> + */
> +static bool rpcrdma_args_inline(struct rpc_rqst *rqst)
> +{
> +   unsigned int callsize = RPCRDMA_HDRLEN_MIN + rqst->rq_snd_buf.len;
> +
> +   return callsize <= RPCRDMA_INLINE_WRITE_THRESHOLD(rqst);
> +}
> +
> +/* The client can’t know how large the actual reply will be. Thus it
> + * plans for the largest possible reply for that particular ULP
> + * operation. If the maximum combined reply message size exceeds that
> + * limit, the client must provide a write list or a reply chunk for
> + * this request.
> + */
> +static bool rpcrdma_results_inline(struct rpc_rqst *rqst)
> +{
> +   unsigned int repsize = RPCRDMA_HDRLEN_MIN + rqst->rq_rcv_buf.buflen;
> +
> +   return repsize <= RPCRDMA_INLINE_READ_THRESHOLD(rqst);
> +}
> +
>  /*
>   * Chunk assembly from upper layer xdr_buf.
>   *
> @@ -418,7 +443,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
>  * a READ, then use write chunks to separate the file data
>  * into pages; otherwise use reply chunks.
>  */
> -   if (rqst->rq_rcv_buf.buflen <= RPCRDMA_INLINE_READ_THRESHOLD(rqst))
> +   if (rpcrdma_results_inline(rqst))
> wtype = rpcrdma_noch;
> else if (rqst->rq_rcv_buf.page_len == 0)
> wtype = rpcrdma_replych;
> @@ -441,7 +466,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
>  * implies the op is a write.
>  * TBD check NFSv4 setacl
>  */
> -   if (rqst->rq_snd_buf.len <= RPCRDMA_INLINE_WRITE_THRESHOLD(rqst))
> +   if (rpcrdma_args_inline(rqst))
> rtype = rpcrdma_noch;
> else if (rqst->rq_snd_buf.page_len == 0)
> rtype = rpcrdma_areadch;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 06/12] xprtrdma: Always provide a write list when sending NFS READ

2015-07-10 Thread Devesh Sharma
Looks good.

Reviewed-By: Devesh Sharma 

On Fri, Jul 10, 2015 at 2:12 AM, Chuck Lever  wrote:
> The client has been setting up a reply chunk for NFS READs that are
> smaller than the inline threshold. This is not efficient: both the
> server and client CPUs have to copy the reply's data payload into
> and out of the memory region that is then transferred via RDMA.
>
> Using the write list, the data payload is moved by the device and no
> extra data copying is necessary.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/rpc_rdma.c |   21 -
>  1 file changed, 4 insertions(+), 17 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index 8cf9402..e569da4 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -427,28 +427,15 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
> /*
>  * Chunks needed for results?
>  *
> +* o Read ops return data as write chunk(s), header as inline.
>  * o If the expected result is under the inline threshold, all ops
>  *   return as inline (but see later).
>  * o Large non-read ops return as a single reply chunk.
> -* o Large read ops return data as write chunk(s), header as inline.
> -*
> -* Note: the NFS code sending down multiple result segments implies
> -* the op is one of read, readdir[plus], readlink or NFSv4 getacl.
> -*/
> -
> -   /*
> -* This code can handle read chunks, write chunks OR reply
> -* chunks -- only one type. If the request is too big to fit
> -* inline, then we will choose read chunks. If the request is
> -* a READ, then use write chunks to separate the file data
> -* into pages; otherwise use reply chunks.
>  */
> -   if (rpcrdma_results_inline(rqst))
> -   wtype = rpcrdma_noch;
> -   else if (rqst->rq_rcv_buf.page_len == 0)
> -   wtype = rpcrdma_replych;
> -   else if (rqst->rq_rcv_buf.flags & XDRBUF_READ)
> +   if (rqst->rq_rcv_buf.flags & XDRBUF_READ)
> wtype = rpcrdma_writech;
> +   else if (rpcrdma_results_inline(rqst))
> +   wtype = rpcrdma_noch;
> else
> wtype = rpcrdma_replych;
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Devesh Sharma
Don't we need to honor the device's limits by checking
dev_attr.max_sge? A vendor may not support 4 SGEs.
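
Something along these lines, perhaps (just a sketch of the idea; it
assumes the device attributes are already cached in ia->ri_devattr or
fetched with ib_query_device()):

/* Sketch only: clamp the requested send SGEs to what the HCA
 * reports instead of hard-coding RPCRDMA_MAX_IOVS.
 */
struct ib_device_attr *devattr = &ia->ri_devattr;
unsigned int max_sge;

max_sge = min_t(unsigned int, RPCRDMA_MAX_IOVS, devattr->max_sge);
if (max_sge < 2) {
	dprintk("RPC:       %s: device reports too few SGEs (%u)\n",
		__func__, max_sge);
	return -ENODEV;	/* hypothetical error handling */
}
ep->rep_attr.cap.max_send_sge = max_sge;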

On Fri, Jul 10, 2015 at 2:13 AM, Chuck Lever  wrote:
> Only the RPC/RDMA header is sent when making an RDMA_NOMSG call.
> That header resides in the first element of the iovec array
> passed to rpcrdma_ep_post().
>
> Instead of special casing the iovec element with the pad, just
> sync all the elements in the send iovec. Syncing the zero pad is
> not strictly necessary, but the pad is rarely if ever used these
> days, and the extra cost in that case is small.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/rpc_rdma.c  |4 
>  net/sunrpc/xprtrdma/verbs.c |   27 +++
>  net/sunrpc/xprtrdma/xprt_rdma.h |   18 ++
>  3 files changed, 25 insertions(+), 24 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index cb05233..2e721f2 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -575,6 +575,10 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
> req->rl_send_iov[0].length = hdrlen;
> req->rl_send_iov[0].lkey = rdmab_lkey(req->rl_rdmabuf);
>
> +   req->rl_niovs = 1;
> +   if (rtype == rpcrdma_areadch)
> +   return 0;
> +
> req->rl_send_iov[1].addr = rdmab_addr(req->rl_sendbuf);
> req->rl_send_iov[1].length = rpclen;
> req->rl_send_iov[1].lkey = rdmab_lkey(req->rl_sendbuf);
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index cdf5220..9199436 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -651,7 +651,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct 
> rpcrdma_ia *ia,
> if (rc)
> return rc;
> ep->rep_attr.cap.max_recv_wr = cdata->max_requests;
> -   ep->rep_attr.cap.max_send_sge = (cdata->padding ? 4 : 2);
> +   ep->rep_attr.cap.max_send_sge = RPCRDMA_MAX_IOVS;
> ep->rep_attr.cap.max_recv_sge = 1;
> ep->rep_attr.cap.max_inline_data = 0;
> ep->rep_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
> @@ -1303,9 +1303,11 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
> struct rpcrdma_ep *ep,
> struct rpcrdma_req *req)
>  {
> +   struct ib_device *device = ia->ri_device;
> struct ib_send_wr send_wr, *send_wr_fail;
> struct rpcrdma_rep *rep = req->rl_reply;
> -   int rc;
> +   struct ib_sge *iov = req->rl_send_iov;
> +   int i, rc;
>
> if (rep) {
> rc = rpcrdma_ep_post_recv(ia, ep, rep);
> @@ -1316,22 +1318,15 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
>
> send_wr.next = NULL;
> send_wr.wr_id = RPCRDMA_IGNORE_COMPLETION;
> -   send_wr.sg_list = req->rl_send_iov;
> +   send_wr.sg_list = iov;
> send_wr.num_sge = req->rl_niovs;
> send_wr.opcode = IB_WR_SEND;
> -   if (send_wr.num_sge == 4)   /* no need to sync any pad (constant) 
> */
> -   ib_dma_sync_single_for_device(ia->ri_device,
> - req->rl_send_iov[3].addr,
> - req->rl_send_iov[3].length,
> - DMA_TO_DEVICE);
> -   ib_dma_sync_single_for_device(ia->ri_device,
> - req->rl_send_iov[1].addr,
> - req->rl_send_iov[1].length,
> - DMA_TO_DEVICE);
> -   ib_dma_sync_single_for_device(ia->ri_device,
> - req->rl_send_iov[0].addr,
> - req->rl_send_iov[0].length,
> - DMA_TO_DEVICE);
> +
> +   for (i = 0; i < send_wr.num_sge; i++)
> +   ib_dma_sync_single_for_device(device, iov[i].addr,
> + iov[i].length, DMA_TO_DEVICE);
> +   dprintk("RPC:   %s: posting %d s/g entries\n",
> +   __func__, send_wr.num_sge);
>
> if (DECR_CQCOUNT(ep) > 0)
> send_wr.send_flags = 0;
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index ce4e79e..90da480 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -256,16 +256,18 @@ struct rpcrdma_mr_seg {   /* chunk descriptors 
> */
> char*mr_offset; /* kva if no page, else offset */
>  };
>
> +#define RPCRDMA_MAX_IOVS   (4)
> +
>  struct rpcrdma_req {
> -   unsigned intrl_niovs;   /* 0, 2 or 4 */
> -   unsigned intrl_nchunks; /* non-zero if chunks */
> -   unsigned intrl_connect_cookie;  /* retry detection */
> -   struct rpcrdma_buffer *rl_buffer; /* home base for this structure */
> +   unsigned intrl_niovs;
> +   unsigned intrl_nchunks;
> +   unsigned int   

Re: [PATCH] IB/core: Destroy ocrdma_dev_id IDR on module exit

2015-07-10 Thread Devesh Sharma
We missed acking this patch. Thanks Doug and Johannes.

Acked-by: Devesh Sharma 

On Thu, Jul 9, 2015 at 3:13 AM, Doug Ledford  wrote:
> On 07/08/2015 11:23 AM, Johannes Thumshirn wrote:
>> Destroy ocrdma_dev_id IDR on module exit, reclaiming the allocated memory.
>>
>
> Thanks, applied.
>
>
> --
> Doug Ledford 
>   GPG KeyID: 0E572FDD
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Devesh Sharma
On Fri, Jul 10, 2015 at 6:28 PM, Tom Talpey  wrote:
> On 7/10/2015 7:29 AM, Devesh Sharma wrote:
>>
>> we need to honor the max limits of device by checking
>> dev_attr.max_sge? a vendor may not support 4 sges.
>
>
> iWARP requires a minimum of 4 send SGEs (draft-hilland-verbs 8.1.3.2)
>
>An RI MUST support at least four Scatter/Gather Elements per
>Scatter/Gather List when the Scatter/Gather List refers to the Data
>Source of a Send Operation Type or the Data Sink of a Receive
>Operation. An RI is NOT REQUIRED to support more than one
>Scatter/Gather Element per Scatter/Gather List when the
>Scatter/Gather List refers to the Data Source of an RDMA Write.
>
> I'm not certain if IB and RoCE state a similar minimum requirement,
> but it seems a very bad idea to have fewer.

To my knowledge the IBTA spec does not pose any such minimum
requirement, and neither does RoCE. I think it's fine if xprtrdma
honors the device limits; that also covers iWARP devices, since all
iWARP devices support a minimum of 4.

Chuck would correct me if xprtrdma does have any minimum requirement.

>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 03/12] xprtrdma: Increase default credit limit

2015-07-10 Thread Devesh Sharma
Yes, we are covered here. My reference was 4.1-rc4, and that series
was pulled in for 4.1-rc7.

I will update my test bench and re-validate the numbers.
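
For reference, the arithmetic works out roughly like this (a userspace
sketch of my reading of the above; the floor-at-one behavior and the
fast-reg depth value are assumptions, not taken from any driver):

/* Rough check of MR usage after commit 40c6ed0c8a7f:
 * MRs per credit is roughly 256 divided by the HCA's
 * max_fast_reg_page_list_len, never dropping below one.
 */
#include <stdio.h>

int main(void)
{
	unsigned int credits = 128;
	unsigned int frwr_depth = 511;	/* illustrative, device dependent */
	unsigned int mrs_per_credit = 256 / frwr_depth;

	if (!mrs_per_credit)
		mrs_per_credit = 1;
	printf("approx MRs allocated per mount: %u\n",
	       credits * mrs_per_credit);
	return 0;
}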

-Regards

On Fri, Jul 10, 2015 at 8:03 PM, Chuck Lever  wrote:
>
> On Jul 10, 2015, at 6:45 AM, Devesh Sharma  
> wrote:
>
>> Increasing the default slot table entries will increase the MR
>> requirements per mount.
>
> Yes, but:
>
>> Currently, with 32 as default Client ends up allocating 2178 frmrs
>> (ref: kernel 4.1-rc4) for a single mount. With 128 frmr requirement
>> for startup would be 8448.
>
> Commit 40c6ed0c8a7f ("xprtrdma: Reduce per-transport MR allocation”)
> is supposed to address this. This commit is in 4.1.
>
> The number of MRs per credit is now 256 divided by the HCA’s
> max_fast_reg_page_list_len. See frwr_op_open().
>
> For mlx4 the number of MRs per credit is just 1, for example.
>
>
>> 8K+ MRs per mount just for start-up, I am a little doubtful about this
>> change. We can always release-note that "for better performance
>> increase the slot table entries by echo 128 >
>> /proc/sys/sunrpc/rdma_slot_table_entries"
>>
>> -Regards
>> Devesh
>>
>> On Fri, Jul 10, 2015 at 2:12 AM, Chuck Lever  wrote:
>>> In preparation for similar increases on NFS/RDMA servers, bump the
>>> advertised credit limit for RPC/RDMA to 128. This allocates some
>>> extra resources, but the client will continue to allow only the
>>> number of RPCs in flight that the server requests via its advertised
>>> credit limit.
>>>
>>> Signed-off-by: Chuck Lever 
>>> ---
>>> include/linux/sunrpc/xprtrdma.h |2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/sunrpc/xprtrdma.h 
>>> b/include/linux/sunrpc/xprtrdma.h
>>> index b176130..b7b279b 100644
>>> --- a/include/linux/sunrpc/xprtrdma.h
>>> +++ b/include/linux/sunrpc/xprtrdma.h
>>> @@ -49,7 +49,7 @@
>>>  * a single chunk type per message is supported currently.
>>>  */
>>> #define RPCRDMA_MIN_SLOT_TABLE (2U)
>>> -#define RPCRDMA_DEF_SLOT_TABLE (32U)
>>> +#define RPCRDMA_DEF_SLOT_TABLE (128U)
>>> #define RPCRDMA_MAX_SLOT_TABLE (256U)
>>>
>>> #define RPCRDMA_DEF_INLINE  (1024) /* default inline max */
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How to RoCE v2?

2015-07-13 Thread Devesh Sharma
Hi Ram,

The RoCE-v2 patch series is still under review; you should see an
updated patch series on this mailing list soon.

-Regards
Devesh

On Mon, Jul 13, 2015 at 5:56 PM, Rupert Dance  wrote:
> Hi Ram,
>
> There were several issues discovered in OFED 3.18 RC3 which have been
> resolved in the latest daily build.
>
> https://www.openfabrics.org/downloads/OFED/ofed-3.18-daily/OFED-3.18-2015070
> 7-2235.tgz
>
> The issues are documented in the latest bugs in Bugzilla:
>
> http://bugs.openfabrics.org/bugzilla/
>
> There are also additional notes about the build procedures in the following
> document:
>
> https://www.openfabrics.org/downloads/WorkGroups/ewg/Build%20Process/OFED-3-
> 18-Build%20process-v3.pdf
>
> Finally we hope to have full support for RoCE v2 in out next major OFED
> Build which will be based on the 4.x kernel.
>
> Thanks
>
> Rupert
>
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Ram Amrani
> Sent: Monday, July 13, 2015 7:47 AM
> To: linux-rdma@vger.kernel.org
> Subject: How to RoCE v2?
>
> Hi,
> This is my first e-mail on this list. I have a few questions that I hope
> you'll answer -
>
> In the latest OFED (3.18-rc3) I don't see any RoCE v2 code, albeit it being
> introduce in 2014. How come?
>
> I've been unable to track patches from e-mail archives about relevant
> e-mails such as "[PATCH 0/2] Adding support for RoCE V2 specification".
> Where can patches be found/tracked? Is there a git repository with log?
>
> I've also tried to build the latest OFED, in the hope that it will contain
> the RoCE 2 code, but got errors. I followed the instructions in the web
> (https://www.openfabrics.org/index.php/installation.html) but got errors
> when applying patches ("./scripts/admin_rdma.sh -n -p") and also when
> skipping it and running configure and make... are the instructions
> up-to-date?
>
> Thanks,
>
> Ram
>
>
> 
>
> This message and any attached documents contain information from the sending
> company or its parent company(s), subsidiaries, divisions or branch offices
> that may be confidential. If you are not the intended recipient, you may not
> read, copy, distribute, or use this information. If you have received this
> transmission in error, please notify the sender immediately by reply e-mail
> and then delete this message.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How to RoCE v2?

2015-07-14 Thread Devesh Sharma
Those should be available on Marc.

On Tue, Jul 14, 2015 at 5:45 PM, Ram Amrani  wrote:
> Thanks Robert, Rupert and Devesh.
>
> Where can I access the patches under review?
>
>
>
> -Original Message-
> From: Devesh Sharma [mailto:devesh.sha...@avagotech.com]
> Sent: Monday, July 13, 2015 3:36 PM
> To: Rupert Dance
> Cc: Ram Amrani; linux-rdma@vger.kernel.org
> Subject: Re: How to RoCE v2?
>
> Hi Ram,
>
> RoCE-v2 patch series is still under review, you should see updated patch 
> series soon on this mailing list.
>
> -Regards
> Devesh
>
> On Mon, Jul 13, 2015 at 5:56 PM, Rupert Dance  wrote:
>> Hi Ram,
>>
>> There were several issues discovered in OFED 3.18 RC3 which have been
>> resolved in the latest daily build.
>>
>> https://www.openfabrics.org/downloads/OFED/ofed-3.18-daily/OFED-3.18-2
>> 015070
>> 7-2235.tgz
>>
>> The issues are documented in the latest bugs in Bugzilla:
>>
>> http://bugs.openfabrics.org/bugzilla/
>>
>> There are also additional notes about the build procedures in the
>> following
>> document:
>>
>> https://www.openfabrics.org/downloads/WorkGroups/ewg/Build%20Process/O
>> FED-3-
>> 18-Build%20process-v3.pdf
>>
>> Finally we hope to have full support for RoCE v2 in out next major
>> OFED Build which will be based on the 4.x kernel.
>>
>> Thanks
>>
>> Rupert
>>
>> -Original Message-
>> From: linux-rdma-ow...@vger.kernel.org
>> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Ram Amrani
>> Sent: Monday, July 13, 2015 7:47 AM
>> To: linux-rdma@vger.kernel.org
>> Subject: How to RoCE v2?
>>
>> Hi,
>> This is my first e-mail on this list. I have a few questions that I
>> hope you'll answer -
>>
>> In the latest OFED (3.18-rc3) I don't see any RoCE v2 code, albeit it
>> being introduce in 2014. How come?
>>
>> I've been unable to track patches from e-mail archives about relevant
>> e-mails such as "[PATCH 0/2] Adding support for RoCE V2 specification".
>> Where can patches be found/tracked? Is there a git repository with log?
>>
>> I've also tried to build the latest OFED, in the hope that it will
>> contain the RoCE 2 code, but got errors. I followed the instructions
>> in the web
>> (https://www.openfabrics.org/index.php/installation.html) but got
>> errors when applying patches ("./scripts/admin_rdma.sh -n -p") and
>> also when skipping it and running configure and make... are the
>> instructions up-to-date?
>>
>> Thanks,
>>
>> Ram
>>
>>
>> 
>>
>> This message and any attached documents contain information from the
>> sending company or its parent company(s), subsidiaries, divisions or
>> branch offices that may be confidential. If you are not the intended
>> recipient, you may not read, copy, distribute, or use this
>> information. If you have received this transmission in error, please
>> notify the sender immediately by reply e-mail and then delete this message.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
>> in the body of a message to majord...@vger.kernel.org More majordomo
>> info at http://vger.kernel.org/majordomo-info.html
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
>> in the body of a message to majord...@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>
> 
>
> This message and any attached documents contain information from the sending 
> company or its parent company(s), subsidiaries, divisions or branch offices 
> that may be confidential. If you are not the intended recipient, you may not 
> read, copy, distribute, or use this information. If you have received this 
> transmission in error, please notify the sender immediately by reply e-mail 
> and then delete this message.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 07/14] xprtrdma: Remove logic that constructs RDMA_MSGP type calls

2015-07-15 Thread Devesh Sharma
With MAX_IOVS set to 2, iozone passes with the ocrdma device. My
testing covers both the svcrdma and xprtrdma series.

On Wed, Jul 15, 2015 at 12:31 AM, Chuck Lever  wrote:
>
> On Jul 14, 2015, at 3:00 PM, Tom Talpey  wrote:
>
>> On 7/13/2015 12:30 PM, Chuck Lever wrote:
>>> RDMA_MSGP type calls insert a zero pad in the middle of the RPC
>>> message to align the RPC request's data payload to the server's
>>> alignment preferences. A server can then "page flip" the payload
>>> into place to avoid a data copy in certain circumstances. However:
>>> ...
>>>
>>> Clean up the marshaling code by removing the logic that constructs
>>> RDMA_MSGP type calls. This also reduces the maximum send iovec size
>>> from four to just two elements.
>>>
>>
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h 
>>> b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> index 8219011..0b50103 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>> ...>
>>> +#define RPCRDMA_MAX_IOVS(4)
>>> +
>>
>> So, shouldn't this constant be "2"? The extra 2 iov's were used
>> only for constructing the pad.
>
> Yes, thanks. I folded a couple of patches together into this
> one, and forgot to update the constant.
>
>
> --
> Chuck Lever
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] update ocrdma to dual license

2015-07-16 Thread Devesh Sharma
We have received appropriate permissions from the code authors and
would like to resubmit the patches to change to a dual-licensed
driver.

Thank you.

On Thu, Jul 9, 2015 at 2:38 AM, Doug Ledford  wrote:
> On 07/08/2015 04:25 PM, Christoph Hellwig wrote:
>> On Wed, Jul 08, 2015 at 04:15:00PM -0400, Doug Ledford wrote:
>>> On 07/08/2015 04:02 PM, Christoph Hellwig wrote:
 So how about someone tells OFED to stop trying to enforce this BS?
>>>
>>> Unfortunately, simply "not enforcing" a bylaw of a multi-company
>>> organization isn't really a valid option, you should know that.  You
>>> have to work to change the bylaw, which usually involves its own
>>> draconian process.
>>
>> Looks like it's time to get that started.
>
> If they care to, then I'm sure they can.  Unlike you, they might
> consider the dual license a benefit.
>
>>  Or just tell OFED to piss off
>> because they really shouldn't be able to have that sort of influence
>> over code in the Linux kernel.
>
> OFED is a distribution made by the EWG that is a working group of the
> OFA.  You can't tell OFED to piss off, it's an inanimate object.  You
> *could* tell the EWG or OFA to do so.  However, they don't really have
> influence over the linux kernel except that their members contribute
> more code to the RDMA stack than all other contributors combined by
> orders of magnitude.  If an individual code contributor (read Avagotech)
> decides that they wish to comply with the EWG bylaws and make their own
> code compliant (read ocrdma driver), I have no problem with that.  If
> this weren't their code, or if they weren't actively maintaining it and
> the primary contributors to its ongoing changes, it would be a different
> issue.  But that's not the case,  So I'm not inclined to take the stance
> you are.  And since I know they are currently pursuing due diligence on
> getting permission to do so, I'm inclined to block further patches from
> non-Avagotech addresses until either the change is complete or abandoned.
>
> --
> Doug Ledford 
>   GPG KeyID: 0E572FDD
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 0/2] ocrdma license change

2015-07-23 Thread Devesh Sharma
Doug- Thanks for your help!  Resubmitting, per your request

Devesh Sharma (2):
  RDMA/ocrdma: update ocrdma license to dual-license
  RDMA/ocrdma: update ocrdma module license string

 drivers/infiniband/hw/ocrdma/ocrdma.h   |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_abi.h   |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c|   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h|   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c|   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_hw.h|   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |   55 +--
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_stats.c |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_stats.h |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   53 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   53 -
 12 files changed, 409 insertions(+), 229 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 1/2] RDMA/ocrdma: update ocrdma license to dual-license

2015-07-23 Thread Devesh Sharma
Change of license from GPLv2 to dual-license (GPLv2 and BSD 2-Clause)

Cc: Tejun Heo 
Cc: Duan Jiong 
Cc: Roland Dreier 
Cc: Jes Sorensen 
Cc: Sasha Levin 
Cc: Dan Carpenter 
Cc: Prarit Bhargava 
Cc: Colin Ian King 
Cc: Wei Yongjun 
Cc: Moni Shoua 
Cc: Rasmus Villemoes 
Cc: Li RongQing 
Cc: Devendra Naga 
Signed-off-by: Devesh Sharma 
---
 drivers/infiniband/hw/ocrdma/ocrdma.h   |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_abi.h   |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c|   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h|   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c|   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.h|   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_stats.c |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_stats.h |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   53 +--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   53 +--
 12 files changed, 408 insertions(+), 228 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index b396344..6a36338 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -1,21 +1,36 @@
-/***
- * This file is part of the Emulex RoCE Device Driver for  *
- * RoCE (RDMA over Converged Ethernet) adapters.   *
- * Copyright (C) 2008-2012 Emulex. All rights reserved.*
- * EMULEX and SLI are trademarks of Emulex.*
- * www.emulex.com  *
- * *
- * This program is free software; you can redistribute it and/or   *
- * modify it under the terms of version 2 of the GNU General   *
- * Public License as published by the Free Software Foundation.*
- * This program is distributed in the hope that it will be useful. *
- * ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND  *
- * WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY,  *
- * FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE  *
- * DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD *
- * TO BE LEGALLY INVALID.  See the GNU General Public License for  *
- * more details, a copy of which can be found in the file COPYING  *
- * included with this package. *
+/* This file is part of the Emulex RoCE Device Driver for
+ * RoCE (RDMA over Converged Ethernet) adapters.
+ * Copyright (C) 2012-2015 Emulex. All rights reserved.
+ * EMULEX and SLI are trademarks of Emulex.
+ * www.emulex.com
+ *
+ * This software is available to you under a choice of one of two licenses.
+ * You may choose to be licensed under the terms of the GNU General Public
+ * License (GPL) Version 2, available from the file COPYING in the main
+ * directory of this source tree, or the BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * - Redistributions of source code must retain the above copyright notice,
+ *   this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * Contact Information:
  * linux-driv...@emulex.com
@@ -23,7 +38,7 @@
  * Emulex
  *  Susan Street
  * Costa Mesa, CA 92626
- ***/
+ */
 
 #ifndef __OCRDMA_H__
 #define __OCRDMA_H__
diff --git a/drivers/infiniband

[PATCH v1 2/2] RDMA/ocrdma: update ocrdma module license string

2015-07-23 Thread Devesh Sharma
Replace MODULE_LICENSE "GPL" with "Dual BSD/GPL".

Cc: Tejun Heo 
Cc: Duan Jiong 
Cc: Roland Dreier 
Cc: Jes Sorensen 
Cc: Sasha Levin 
Cc: Dan Carpenter 
Cc: Prarit Bhargava 
Cc: Colin Ian King 
Cc: Wei Yongjun 
Cc: Moni Shoua 
Cc: Rasmus Villemoes 
Cc: Li RongQing 
Cc: Devendra Naga 
Signed-off-by: Devesh Sharma 
---
 drivers/infiniband/hw/ocrdma/ocrdma_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 6ded95a..b119a34 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -61,7 +61,7 @@
 MODULE_VERSION(OCRDMA_ROCE_DRV_VERSION);
 MODULE_DESCRIPTION(OCRDMA_ROCE_DRV_DESC " " OCRDMA_ROCE_DRV_VERSION);
 MODULE_AUTHOR("Emulex Corporation");
-MODULE_LICENSE("GPL");
+MODULE_LICENSE("Dual BSD/GPL");
 
 static LIST_HEAD(ocrdma_dev_list);
 static DEFINE_SPINLOCK(ocrdma_devlist_lock);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] xprtrdma: take vendor driver refcount at client

2015-07-27 Thread Devesh Sharma
Thanks Chuck Lever for the valuable feedback and suggestions.

This is a rework of the following patch sent almost a year back:
http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html

In the presence of an active mount, if someone tries to rmmod the
vendor driver, the command remains stuck forever waiting for the
destruction of all rdma-cm-ids. In the worst case the client can
crash during shutdown with active mounts.

The existing code assumes that ia->ri_id->device cannot change during
the lifetime of a transport. Lifting that assumption is a long chain
of work, and is planned.

The community decided that preventing the hang right now is more
important than waiting for architectural changes.

Signed-off-by: Devesh Sharma 
---
 net/sunrpc/xprtrdma/verbs.c |   31 +++
 1 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 891c4ed..d16f599 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include  /* try_module_get()/module_put() */
 
 #include "xprt_rdma.h"
 
@@ -414,6 +415,14 @@ connected:
return 0;
 }
 
+static void rpcrdma_destroy_id(struct rdma_cm_id *id)
+{
+   if (id) {
+   module_put(id->device->owner);
+   rdma_destroy_id(id);
+   }
+}
+
 static struct rdma_cm_id *
 rpcrdma_create_id(struct rpcrdma_xprt *xprt,
struct rpcrdma_ia *ia, struct sockaddr *addr)
@@ -440,6 +449,11 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
}
wait_for_completion_interruptible_timeout(&ia->ri_done,
msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
+   if (!ia->ri_async_rc && !try_module_get(id->device->owner)) {
+   dprintk("RPC:   %s: Failed to get device module\n",
+   __func__);
+   ia->ri_async_rc = -ENODEV;
+   }
rc = ia->ri_async_rc;
if (rc)
goto out;
@@ -449,16 +463,17 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
if (rc) {
dprintk("RPC:   %s: rdma_resolve_route() failed %i\n",
__func__, rc);
-   goto out;
+   goto put;
}
wait_for_completion_interruptible_timeout(&ia->ri_done,
msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
rc = ia->ri_async_rc;
if (rc)
-   goto out;
+   goto put;
 
return id;
-
+put:
+   module_put(id->device->owner);
 out:
rdma_destroy_id(id);
return ERR_PTR(rc);
@@ -592,7 +607,7 @@ out3:
ib_dealloc_pd(ia->ri_pd);
ia->ri_pd = NULL;
 out2:
-   rdma_destroy_id(ia->ri_id);
+   rpcrdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
 out1:
return rc;
@@ -618,7 +633,7 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
if (ia->ri_id->qp)
rdma_destroy_qp(ia->ri_id);
-   rdma_destroy_id(ia->ri_id);
+   rpcrdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
}
 
@@ -825,7 +840,7 @@ retry:
if (ia->ri_device != id->device) {
printk("RPC:   %s: can't reconnect on "
"different device!\n", __func__);
-   rdma_destroy_id(id);
+   rpcrdma_destroy_id(id);
rc = -ENETUNREACH;
goto out;
}
@@ -834,7 +849,7 @@ retry:
if (rc) {
dprintk("RPC:   %s: rdma_create_qp failed %i\n",
__func__, rc);
-   rdma_destroy_id(id);
+   rpcrdma_destroy_id(id);
rc = -ENETUNREACH;
goto out;
}
@@ -845,7 +860,7 @@ retry:
write_unlock(&ia->ri_qplock);
 
rdma_destroy_qp(old);
-   rdma_destroy_id(old);
+   rpcrdma_destroy_id(old);
} else {
dprintk("RPC:   %s: connecting...\n", __func__);
rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xprtrdma: take vendor driver refcount at client

2015-07-28 Thread Devesh Sharma
On Tue, Jul 28, 2015 at 7:43 PM, Chuck Lever  wrote:
>
> On Jul 28, 2015, at 4:46 AM, Sagi Grimberg  wrote:
>
>> On 7/28/2015 2:01 AM, Devesh Sharma wrote:
>>> Thanks Chuck Lever for the valuable feedback and suggestions.
>>>
>>> This is a rework of the following patch sent almost a year back:
>>> http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html
>>>
>>> In presence of active mount if someone tries to rmmod vendor-driver, the
>>> command remains stuck forever waiting for destruction of all rdma-cm-id.
>>> in worst case client can crash during shutdown with active mounts.
>>
>> Ouch, taking a reference on the module preventing it from unloading is
>> not very well behaved (putting it nicely). That's also breaking the
>> layering of ULPs <-> core <-> provider scheme.
>>
>> Why not just cleanup everything upon DEVICE_REMOVAL?
>
> xprtrdma does not support DEVICE_REMOVAL yet. That's why we are
> taking this temporary approach.
>
>
>>> The existing code assumes that ia->ri_id->device cannot change during
>>> the lifetime of a transport. Lifting that assumption is a long chain
>>> of work, and is in plan.
>>>
>>> The community decided that preventing the hang right now is more
>>> important than waiting for architectural changes.
>>
>> Well, if you are putting a bandage here - the code should be documented
>> with a proper FIXME.
>
> That's a good suggestion.

I will add this appropriately in my next post.

>
>
> --
> Chuck Lever
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1] xprtrdma: take vendor driver refcount at client

2015-07-29 Thread Devesh Sharma
Thanks Chuck Lever for the valuable feedback and suggestions.

This is a rework of the following patch sent almost a year back:
http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html

In the presence of an active mount, if someone tries to rmmod the
vendor driver, the command remains stuck forever waiting for the
destruction of all rdma-cm-ids. In the worst case the client can
crash during shutdown with active mounts.

The existing code assumes that ia->ri_id->device cannot change during
the lifetime of a transport. xprtrdma does not support the
DEVICE_REMOVAL event either. Lifting that assumption and adding
support for DEVICE_REMOVAL is a long chain of work, and is planned.

The community decided that preventing the hang right now is more
important than waiting for architectural changes.

Thus, this patch introduces a temporary workaround to acquire a module
reference count during the mount of an nfs-rdma mount point.

Cc: chuck.le...@oracle.com
Cc: linux-...@vger.kernel.org
Signed-off-by: Devesh Sharma 
Reviewed-by: Sagi Grimberg 
---
 net/sunrpc/xprtrdma/verbs.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 891c4ed..d59d638 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include  /* try_module_get()/module_put() */
 
 #include "xprt_rdma.h"
 
@@ -414,6 +415,14 @@ connected:
return 0;
 }
 
+static void rpcrdma_destroy_id(struct rdma_cm_id *id)
+{
+   if (id) {
+   module_put(id->device->owner);
+   rdma_destroy_id(id);
+   }
+}
+
 static struct rdma_cm_id *
 rpcrdma_create_id(struct rpcrdma_xprt *xprt,
struct rpcrdma_ia *ia, struct sockaddr *addr)
@@ -440,6 +449,18 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
}
wait_for_completion_interruptible_timeout(&ia->ri_done,
msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
+
+   /* FIXME: We hate to break the notion of ULP<-->Core<-->Provider
+* by calling try_module_get() on vendor driver. This is to prevent a
+* system hang or a possible crash during reboot with active nfs-rdma
+* mount. We will keep this workaround until xprtrdma comes back with a
+* massive architectural changes to have proper fix.
+*/
+   if (!ia->ri_async_rc && !try_module_get(id->device->owner)) {
+   dprintk("RPC:   %s: Failed to get device module\n",
+   __func__);
+   ia->ri_async_rc = -ENODEV;
+   }
rc = ia->ri_async_rc;
if (rc)
goto out;
@@ -449,16 +470,17 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
if (rc) {
dprintk("RPC:   %s: rdma_resolve_route() failed %i\n",
__func__, rc);
-   goto out;
+   goto put;
}
wait_for_completion_interruptible_timeout(&ia->ri_done,
msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
rc = ia->ri_async_rc;
if (rc)
-   goto out;
+   goto put;
 
return id;
-
+put:
+   module_put(id->device->owner);
 out:
rdma_destroy_id(id);
return ERR_PTR(rc);
@@ -592,7 +614,7 @@ out3:
ib_dealloc_pd(ia->ri_pd);
ia->ri_pd = NULL;
 out2:
-   rdma_destroy_id(ia->ri_id);
+   rpcrdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
 out1:
return rc;
@@ -618,7 +640,7 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
if (ia->ri_id->qp)
rdma_destroy_qp(ia->ri_id);
-   rdma_destroy_id(ia->ri_id);
+   rpcrdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
}
 
@@ -825,7 +847,7 @@ retry:
if (ia->ri_device != id->device) {
printk("RPC:   %s: can't reconnect on "
"different device!\n", __func__);
-   rdma_destroy_id(id);
+   rpcrdma_destroy_id(id);
rc = -ENETUNREACH;
goto out;
}
@@ -834,7 +856,7 @@ retry:
if (rc) {
dprintk("RPC:   %s: rdma_create_qp failed %i\n",
__func__, rc);
-   rdma_destroy_id(id);
+   rpcrdma_destroy_id(id);
rc = -ENETUNREACH;
goto out;
}
@@ -845,7 +867,7 @@ retry:
write_unlock(&ia->ri_qplock);
 
rdma_destr

[PATCH v1 RESEND] xprtrdma: take vendor driver refcount at client

2015-07-29 Thread Devesh Sharma
Thanks Chuck Lever for the valuable feedback and suggestions.

This is a rework of the following patch sent almost a year back:
http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html

In the presence of an active mount, if someone tries to rmmod the
vendor driver, the command remains stuck forever waiting for the
destruction of all rdma-cm-ids. In the worst case the client can
crash during shutdown with active mounts.

The existing code assumes that ia->ri_id->device cannot change during
the lifetime of a transport. xprtrdma does not support the
DEVICE_REMOVAL event either. Lifting that assumption and adding
support for DEVICE_REMOVAL is a long chain of work, and is planned.

The community decided that preventing the hang right now is more
important than waiting for architectural changes.

Thus, this patch introduces a temporary workaround to acquire a module
reference count during the mount of an nfs-rdma mount point.

Cc: chuck.le...@oracle.com
Cc: linux-...@vger.kernel.org
Signed-off-by: Devesh Sharma 
Reviewed-by: Sagi Grimberg 
---
 net/sunrpc/xprtrdma/verbs.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 891c4ed..d59d638 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include  /* try_module_get()/module_put() */
 
 #include "xprt_rdma.h"
 
@@ -414,6 +415,14 @@ connected:
return 0;
 }
 
+static void rpcrdma_destroy_id(struct rdma_cm_id *id)
+{
+   if (id) {
+   module_put(id->device->owner);
+   rdma_destroy_id(id);
+   }
+}
+
 static struct rdma_cm_id *
 rpcrdma_create_id(struct rpcrdma_xprt *xprt,
struct rpcrdma_ia *ia, struct sockaddr *addr)
@@ -440,6 +449,18 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
}
wait_for_completion_interruptible_timeout(&ia->ri_done,
msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
+
+   /* FIXME: We hate to break the notion of ULP<-->Core<-->Provider
+* by calling try_module_get() on vendor driver. This is to prevent a
+* system hang or a possible crash during reboot with active nfs-rdma
+* mount. We will keep this workaround until xprtrdma comes back with a
+* massive architectural changes to have proper fix.
+*/
+   if (!ia->ri_async_rc && !try_module_get(id->device->owner)) {
+   dprintk("RPC:   %s: Failed to get device module\n",
+   __func__);
+   ia->ri_async_rc = -ENODEV;
+   }
rc = ia->ri_async_rc;
if (rc)
goto out;
@@ -449,16 +470,17 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
if (rc) {
dprintk("RPC:   %s: rdma_resolve_route() failed %i\n",
__func__, rc);
-   goto out;
+   goto put;
}
wait_for_completion_interruptible_timeout(&ia->ri_done,
msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
rc = ia->ri_async_rc;
if (rc)
-   goto out;
+   goto put;
 
return id;
-
+put:
+   module_put(id->device->owner);
 out:
rdma_destroy_id(id);
return ERR_PTR(rc);
@@ -592,7 +614,7 @@ out3:
ib_dealloc_pd(ia->ri_pd);
ia->ri_pd = NULL;
 out2:
-   rdma_destroy_id(ia->ri_id);
+   rpcrdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
 out1:
return rc;
@@ -618,7 +640,7 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
if (ia->ri_id->qp)
rdma_destroy_qp(ia->ri_id);
-   rdma_destroy_id(ia->ri_id);
+   rpcrdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
}
 
@@ -825,7 +847,7 @@ retry:
if (ia->ri_device != id->device) {
printk("RPC:   %s: can't reconnect on "
"different device!\n", __func__);
-   rdma_destroy_id(id);
+   rpcrdma_destroy_id(id);
rc = -ENETUNREACH;
goto out;
}
@@ -834,7 +856,7 @@ retry:
if (rc) {
dprintk("RPC:   %s: rdma_create_qp failed %i\n",
__func__, rc);
-   rdma_destroy_id(id);
+   rpcrdma_destroy_id(id);
rc = -ENETUNREACH;
goto out;
}
@@ -845,7 +867,7 @@ retry:
write_unlock(&ia->ri_qplock);
 
 

Re: [PATCH v1] xprtrdma: take vendor driver refcount at client

2015-07-29 Thread Devesh Sharma
On Wed, Jul 29, 2015 at 1:03 PM, Christoph Hellwig  wrote:
> Hi Devesh,
>
> I don't understand your use of "vendor driver" here.  It seems your'e
> talking about the HCA driver.

Yes, I meant the HCA driver. I will change this in the next revision;
it's confusing right now.

>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] xprtrdma: take HCA driver refcount at client

2015-07-29 Thread Devesh Sharma
Thanks Chuck Lever for the valuable feedback and suggestions.

This is a rework of the following patch sent almost a year back:
http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html

In the presence of an active mount, if someone tries to rmmod the
vendor driver, the command remains stuck forever waiting for the
destruction of all rdma-cm-ids. In the worst case the client can
crash during shutdown with active mounts.

The existing code assumes that ia->ri_id->device cannot change during
the lifetime of a transport. xprtrdma does not support the
DEVICE_REMOVAL event either. Lifting that assumption and adding
support for DEVICE_REMOVAL is a long chain of work, and is planned.

The community decided that preventing the hang right now is more
important than waiting for architectural changes.

Thus, this patch introduces a temporary workaround to acquire an HCA
driver module reference count during the mount of an nfs-rdma mount
point.

Cc: chuck.le...@oracle.com
Cc: linux-...@vger.kernel.org
Signed-off-by: Devesh Sharma 
Reviewed-by: Sagi Grimberg 
---
 net/sunrpc/xprtrdma/verbs.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 891c4ed..1c3c420 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include  /* try_module_get()/module_put() */
 
 #include "xprt_rdma.h"
 
@@ -414,6 +415,14 @@ connected:
return 0;
 }
 
+static void rpcrdma_destroy_id(struct rdma_cm_id *id)
+{
+   if (id) {
+   module_put(id->device->owner);
+   rdma_destroy_id(id);
+   }
+}
+
 static struct rdma_cm_id *
 rpcrdma_create_id(struct rpcrdma_xprt *xprt,
struct rpcrdma_ia *ia, struct sockaddr *addr)
@@ -440,6 +449,18 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
}
wait_for_completion_interruptible_timeout(&ia->ri_done,
msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
+
+   /* FIXME: We hate to break the notion of ULP<-->Core<-->Provider
+* by calling try_module_get() on HCA driver. This is to prevent a
+* system hang or a possible crash during reboot with active nfs-rdma
+* mount. We will keep this workaround until xprtrdma comes back with a
+* massive architectural changes to have proper fix.
+*/
+   if (!ia->ri_async_rc && !try_module_get(id->device->owner)) {
+   dprintk("RPC:   %s: Failed to get device module\n",
+   __func__);
+   ia->ri_async_rc = -ENODEV;
+   }
rc = ia->ri_async_rc;
if (rc)
goto out;
@@ -449,16 +470,17 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
if (rc) {
dprintk("RPC:   %s: rdma_resolve_route() failed %i\n",
__func__, rc);
-   goto out;
+   goto put;
}
wait_for_completion_interruptible_timeout(&ia->ri_done,
msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
rc = ia->ri_async_rc;
if (rc)
-   goto out;
+   goto put;
 
return id;
-
+put:
+   module_put(id->device->owner);
 out:
rdma_destroy_id(id);
return ERR_PTR(rc);
@@ -592,7 +614,7 @@ out3:
ib_dealloc_pd(ia->ri_pd);
ia->ri_pd = NULL;
 out2:
-   rdma_destroy_id(ia->ri_id);
+   rpcrdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
 out1:
return rc;
@@ -618,7 +640,7 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
if (ia->ri_id->qp)
rdma_destroy_qp(ia->ri_id);
-   rdma_destroy_id(ia->ri_id);
+   rpcrdma_destroy_id(ia->ri_id);
ia->ri_id = NULL;
}
 
@@ -825,7 +847,7 @@ retry:
if (ia->ri_device != id->device) {
printk("RPC:   %s: can't reconnect on "
"different device!\n", __func__);
-   rdma_destroy_id(id);
+   rpcrdma_destroy_id(id);
rc = -ENETUNREACH;
goto out;
}
@@ -834,7 +856,7 @@ retry:
if (rc) {
dprintk("RPC:   %s: rdma_create_qp failed %i\n",
__func__, rc);
-   rdma_destroy_id(id);
+   rpcrdma_destroy_id(id);
rc = -ENETUNREACH;
goto out;
}
@@ -845,7 +867,7 @@ retry:
write_unlock(&ia->ri_qplock);
 
 

Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets

2015-09-17 Thread Devesh Sharma
On Fri, Sep 18, 2015 at 2:14 AM, Chuck Lever  wrote:
>
> Commit 8301a2c047cc ("xprtrdma: Limit work done by completion
> handler") was supposed to prevent xprtrdma's upcall handlers from
> starving other softIRQ work by letting them return to the provider
> before all CQEs have been polled.
>
> The logic assumes the provider will call the upcall handler again
> immediately if the CQ is re-armed while there are still queued CQEs.
>
> This assumption is invalid. The IBTA spec says that after a CQ is
> armed, the hardware must interrupt only when a new CQE is inserted.
> xprtrdma can't rely on the provider calling again, even though some
> providers do.
>
> Therefore, leaving CQEs on queue makes sense only when there is
> another mechanism that ensures all remaining CQEs are consumed in a
> timely fashion. xprtrdma does not have such a mechanism. If a CQE
> remains queued, the transport can wait forever to send the next RPC.
>
> Finally, move the wcs array back onto the stack to ensure that the
> poll array is always local to the CPU where the completion upcall is
> running.
>
> Fixes: 8301a2c047cc ("xprtrdma: Limit work done by completion ...")
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/verbs.c |  100 
> ++-
>  net/sunrpc/xprtrdma/xprt_rdma.h |5 --
>  2 files changed, 45 insertions(+), 60 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 8a477e2..f2e3863 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -158,34 +158,37 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
> }
>  }
>
> -static int
> +/* The wc array is on stack: automatic memory is always CPU-local.
> + *
> + * The common case is a single completion is ready. By asking
> + * for two entries, a return code of 1 means there is exactly
> + * one completion and no more. We don't have to poll again to
> + * know that the CQ is now empty.
> + */
> +static void
>  rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
>  {
> -   struct ib_wc *wcs;
> -   int budget, count, rc;
> +   struct ib_wc *pos, wcs[2];
> +   int count, rc;
>
> -   budget = RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE;
> do {
> -   wcs = ep->rep_send_wcs;
> +   pos = wcs;
>
> -   rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
> -   if (rc <= 0)
> -   return rc;
> +   rc = ib_poll_cq(cq, ARRAY_SIZE(wcs), pos);
> +   if (rc < 0)
> +   goto out_warn;
>
> count = rc;
> while (count-- > 0)
> -   rpcrdma_sendcq_process_wc(wcs++);
> -   } while (rc == RPCRDMA_POLLSIZE && --budget);
> -   return 0;
> +   rpcrdma_sendcq_process_wc(pos++);
> +   } while (rc == ARRAY_SIZE(wcs));

I think I have missed something; I am not able to understand the
reason for polling 2 CQEs in one poll. It is possible that in a given
poll_cq call you end up getting only 1 completion while the other
completion is delayed for some reason. Would it be better to poll for
1 in every poll call, or otherwise have this:

while (rc <= ARRAY_SIZE(wcs) && rc);
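
The first alternative (polling one CQE per call) would look something
like this (just a sketch of the idea, reusing the names from the
patch):

/* Sketch: poll one CQE at a time until the CQ is drained, rather
 * than inferring emptiness from a partial return of a two-entry poll.
 */
do {
	rc = ib_poll_cq(cq, 1, wcs);
	if (rc < 0)
		goto out_warn;
	if (rc)
		rpcrdma_sendcq_process_wc(wcs);
} while (rc > 0);
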

> +   return;
> +
> +out_warn:
> +   pr_warn("RPC:   %s: ib_poll_cq() failed %i\n", __func__, rc);
>  }
>
> -/*
> - * Handle send, fast_reg_mr, and local_inv completions.
> - *
> - * Send events are typically suppressed and thus do not result
> - * in an upcall. Occasionally one is signaled, however. This
> - * prevents the provider's completion queue from wrapping and
> - * losing a completion.
> +/* Handle provider send completion upcalls.
>   */
>  static void
>  rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
> @@ -193,12 +196,7 @@ rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
> struct rpcrdma_ep *ep = (struct rpcrdma_ep *)cq_context;
> int rc;
>
> -   rc = rpcrdma_sendcq_poll(cq, ep);
> -   if (rc) {
> -   dprintk("RPC:   %s: ib_poll_cq failed: %i\n",
> -   __func__, rc);
> -   return;
> -   }
> +   rpcrdma_sendcq_poll(cq, ep);
>
> rc = ib_req_notify_cq(cq,
> IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
> @@ -247,44 +245,41 @@ out_fail:
> goto out_schedule;
>  }
>
> -static int
> +/* The wc array is on stack: automatic memory is always CPU-local.
> + *
> + * struct ib_wc is 64 bytes, making the poll array potentially
> + * large. But this is at the bottom of the call chain. Further
> + * substantial work is done in another thread.
> + */
> +static void
>  rpcrdma_recvcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
>  {
> -   struct list_head sched_list;
> -   struct ib_wc *wcs;
> -   int budget, count, rc;
> +   struct ib_wc *pos, wcs[4];
> +   LIST_HEAD(sched_list);
> +   int count, rc;
>
> -   INIT_LIST_HEAD(&sched_list);
> - 

Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets

2015-09-21 Thread Devesh Sharma
On Fri, Sep 18, 2015 at 7:49 PM, Chuck Lever  wrote:
> Hi Devesh-
>
>
> On Sep 18, 2015, at 2:52 AM, Devesh Sharma  
> wrote:
>
>> On Fri, Sep 18, 2015 at 2:14 AM, Chuck Lever  wrote:
>>>
>>> Commit 8301a2c047cc ("xprtrdma: Limit work done by completion
>>> handler") was supposed to prevent xprtrdma's upcall handlers from
>>> starving other softIRQ work by letting them return to the provider
>>> before all CQEs have been polled.
>>>
>>> The logic assumes the provider will call the upcall handler again
>>> immediately if the CQ is re-armed while there are still queued CQEs.
>>>
>>> This assumption is invalid. The IBTA spec says that after a CQ is
>>> armed, the hardware must interrupt only when a new CQE is inserted.
>>> xprtrdma can't rely on the provider calling again, even though some
>>> providers do.
>>>
>>> Therefore, leaving CQEs on queue makes sense only when there is
>>> another mechanism that ensures all remaining CQEs are consumed in a
>>> timely fashion. xprtrdma does not have such a mechanism. If a CQE
>>> remains queued, the transport can wait forever to send the next RPC.
>>>
>>> Finally, move the wcs array back onto the stack to ensure that the
>>> poll array is always local to the CPU where the completion upcall is
>>> running.
>>>
>>> Fixes: 8301a2c047cc ("xprtrdma: Limit work done by completion ...")
>>> Signed-off-by: Chuck Lever 
>>> ---
>>> net/sunrpc/xprtrdma/verbs.c |  100 
>>> ++-
>>> net/sunrpc/xprtrdma/xprt_rdma.h |5 --
>>> 2 files changed, 45 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index 8a477e2..f2e3863 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -158,34 +158,37 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
>>>}
>>> }
>>>
>>> -static int
>>> +/* The wc array is on stack: automatic memory is always CPU-local.
>>> + *
>>> + * The common case is a single completion is ready. By asking
>>> + * for two entries, a return code of 1 means there is exactly
>>> + * one completion and no more. We don't have to poll again to
>>> + * know that the CQ is now empty.
>>> + */
>>> +static void
>>> rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
>>> {
>>> -   struct ib_wc *wcs;
>>> -   int budget, count, rc;
>>> +   struct ib_wc *pos, wcs[2];
>>> +   int count, rc;
>>>
>>> -   budget = RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE;
>>>do {
>>> -   wcs = ep->rep_send_wcs;
>>> +   pos = wcs;
>>>
>>> -   rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
>>> -   if (rc <= 0)
>>> -   return rc;
>>> +   rc = ib_poll_cq(cq, ARRAY_SIZE(wcs), pos);
>>> +   if (rc < 0)
>>> +   goto out_warn;
>>>
>>>count = rc;
>>>while (count-- > 0)
>>> -   rpcrdma_sendcq_process_wc(wcs++);
>>> -   } while (rc == RPCRDMA_POLLSIZE && --budget);
>>> -   return 0;
>>> +   rpcrdma_sendcq_process_wc(pos++);
>>> +   } while (rc == ARRAY_SIZE(wcs));
>>
>> I think I have missed something and not able to understand the reason
>> for polling 2 CQEs in one poll?
>
> See the block comment above.
>
> When ib_poll_cq() returns the same number of WCs as the
> consumer requested, there may still be CQEs waiting to
> be polled. Another call to ib_poll_cq() is needed to find
> out if that's the case.

True...

>
> When ib_poll_cq() returns fewer WCs than the consumer
> requested, the consumer doesn't have to call again to
> know that the CQ is empty.

Agree, the while loop will terminate here. But what if, immediately
after vendor_poll_cq() decides to report only 1 CQE and the polling
loop terminates, another CQE is added? That new CQE will be polled
only after T usec (where T is the interrupt latency).
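
FWIW, the only way I can see to pick up such a late CQE without
waiting for the next interrupt is to re-arm with
IB_CQ_REPORT_MISSED_EVENTS and go around again when it reports missed
completions. An untested sketch of the pattern (example_cq_upcall()
and drain_cq() are just placeholders, e.g. for the upcall and for
rpcrdma_sendcq_poll()):

/* Untested sketch: drain the CQ, re-arm, and if the re-arm reports
 * completions that slipped in after the last poll, drain again rather
 * than leaving them for the next interrupt.
 */
static void
example_cq_upcall(struct ib_cq *cq, void *cq_context)
{
        do {
                drain_cq(cq);   /* placeholder poll-and-process loop */
        } while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
                                  IB_CQ_REPORT_MISSED_EVENTS) > 0);
}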

>
> The common case, by far, is that one CQE is ready. By
> requesting two, the number returned is less than the
> number requested, and the consumer can tell immediately
> that the CQE is drained. The extra ib_poll_cq call is
> avoided.
>
> N

Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets

2015-09-21 Thread Devesh Sharma
On Sun, Sep 20, 2015 at 4:05 PM, Sagi Grimberg  wrote:
>>> It is possible that in a given poll_cq
>>> call you end up getting on 1 completion, the other completion is
>>> delayed due to some reason.
>>
>>
>> If a CQE is allowed to be delayed, how does polling
>> again guarantee that the consumer can retrieve it?
>>
>> What happens if a signal occurs, there is only one CQE,
>> but it is delayed? ib_poll_cq would return 0 in that
>> case, and the consumer would never call again, thinking
>> the CQ is empty. There's no way the consumer can know
>> for sure when a CQ is drained.
>>
>> If the delayed CQE happens only when there is more
>> than one CQE, how can polling multiple WCs ever work
>> reliably?
>>
>> Maybe I don't understand what is meant by delayed.
>>
>
> If I'm not mistaken, Devesh meant that if between ib_poll_cq (where you
> polled the last 2 wcs) until the while statement another CQE was
> generated then you lost a bit of efficiency. Correct?

Yes, That's the point.

>
>
>>
>>> Would it be better to poll for 1 in every
>>> poll call Or
>>> otherwise have this
>>> while ( rc <= ARRAY_SIZE(wcs) && rc);
>>>
>


Re: [PATCH v1 01/18] xprtrdma: Enable swap-on-NFS/RDMA

2015-09-21 Thread Devesh Sharma
Looks Good.

On Fri, Sep 18, 2015 at 2:14 AM, Chuck Lever  wrote:
> After adding a swapfile on an NFS/RDMA mount and removing the
> normal swap partition, I was able to push the NFS client well
> into swap without any issue.
>
> I forgot to swapoff the NFS file before rebooting. This pinned
> the NFS mount and the IB core and provider, causing shutdown to
> hang. I think this is expected and safe behavior. Probably
> shutdown scripts should "swapoff -a" before unmounting any
> filesystems.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/transport.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index 41e452b..e9e5ed7 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -676,7 +676,7 @@ static void xprt_rdma_print_stats(struct rpc_xprt *xprt, 
> struct seq_file *seq)
>  static int
>  xprt_rdma_enable_swap(struct rpc_xprt *xprt)
>  {
> -   return -EINVAL;
> +   return 0;
>  }
>
>  static void
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 02/18] xprtrdma: Replace global lkey with lkey local to PD

2015-09-21 Thread Devesh Sharma
Looks good, will test this with ocrdma and update you.

On Fri, Sep 18, 2015 at 2:14 AM, Chuck Lever  wrote:
> The core API has changed so that devices that do not have a global
> DMA lkey automatically create an mr, per-PD, and make that lkey
> available. The global DMA lkey interface is going away in favor of
> the per-PD DMA lkey.
>
> The per-PD DMA lkey is always available. Convert xprtrdma to use the
> device's per-PD DMA lkey for regbufs, no matter which memory
> registration scheme is in use.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/fmr_ops.c  |   19 ---
>  net/sunrpc/xprtrdma/frwr_ops.c |5 -
>  net/sunrpc/xprtrdma/physical_ops.c |   10 +-
>  net/sunrpc/xprtrdma/verbs.c|2 +-
>  net/sunrpc/xprtrdma/xprt_rdma.h|1 -
>  5 files changed, 2 insertions(+), 35 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
> index cb25c89..f1e8daf 100644
> --- a/net/sunrpc/xprtrdma/fmr_ops.c
> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
> @@ -39,25 +39,6 @@ static int
>  fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
> struct rpcrdma_create_data_internal *cdata)
>  {
> -   struct ib_device_attr *devattr = &ia->ri_devattr;
> -   struct ib_mr *mr;
> -
> -   /* Obtain an lkey to use for the regbufs, which are
> -* protected from remote access.
> -*/
> -   if (devattr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
> -   ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
> -   } else {
> -   mr = ib_get_dma_mr(ia->ri_pd, IB_ACCESS_LOCAL_WRITE);
> -   if (IS_ERR(mr)) {
> -   pr_err("%s: ib_get_dma_mr for failed with %lX\n",
> -  __func__, PTR_ERR(mr));
> -   return -ENOMEM;
> -   }
> -   ia->ri_dma_lkey = ia->ri_dma_mr->lkey;
> -   ia->ri_dma_mr = mr;
> -   }
> -
> return 0;
>  }
>
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 21b3efb..004f1ad 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -189,11 +189,6 @@ frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep 
> *ep,
> struct ib_device_attr *devattr = &ia->ri_devattr;
> int depth, delta;
>
> -   /* Obtain an lkey to use for the regbufs, which are
> -* protected from remote access.
> -*/
> -   ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
> -
> ia->ri_max_frmr_depth =
> min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
>   devattr->max_fast_reg_page_list_len);
> diff --git a/net/sunrpc/xprtrdma/physical_ops.c 
> b/net/sunrpc/xprtrdma/physical_ops.c
> index 72cf8b1..617b76f 100644
> --- a/net/sunrpc/xprtrdma/physical_ops.c
> +++ b/net/sunrpc/xprtrdma/physical_ops.c
> @@ -23,7 +23,6 @@ static int
>  physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
>  struct rpcrdma_create_data_internal *cdata)
>  {
> -   struct ib_device_attr *devattr = &ia->ri_devattr;
> struct ib_mr *mr;
>
> /* Obtain an rkey to use for RPC data payloads.
> @@ -37,15 +36,8 @@ physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep 
> *ep,
>__func__, PTR_ERR(mr));
> return -ENOMEM;
> }
> -   ia->ri_dma_mr = mr;
> -
> -   /* Obtain an lkey to use for regbufs.
> -*/
> -   if (devattr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
> -   ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
> -   else
> -   ia->ri_dma_lkey = ia->ri_dma_mr->lkey;
>
> +   ia->ri_dma_mr = mr;
> return 0;
>  }
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 01a314a..8a477e2 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1255,7 +1255,7 @@ rpcrdma_alloc_regbuf(struct rpcrdma_ia *ia, size_t 
> size, gfp_t flags)
> goto out_free;
>
> iov->length = size;
> -   iov->lkey = ia->ri_dma_lkey;
> +   iov->lkey = ia->ri_pd->local_dma_lkey;
> rb->rg_size = size;
> rb->rg_owner = NULL;
> return rb;
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 0251222..c09414e 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -65,7 +65,6 @@ struct rpcrdma_ia {
> struct rdma_cm_id   *ri_id;
> struct ib_pd*ri_pd;
> struct ib_mr*ri_dma_mr;
> -   u32 ri_dma_lkey;
> struct completion   ri_done;
> int ri_async_rc;
> unsigned intri_max_frmr_depth;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More ma

Re: [PATCH v1 04/18] xprtrdma: Refactor reply handler error handling

2015-09-21 Thread Devesh Sharma
Looks good.

On Fri, Sep 18, 2015 at 2:14 AM, Chuck Lever  wrote:
> Clean up: The error cases in rpcrdma_reply_handler() almost never
> execute. Ensure the compiler places them out of the hot path.
>
> No behavior change expected.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/rpc_rdma.c  |   90 
> ++-
>  net/sunrpc/xprtrdma/verbs.c |2 -
>  net/sunrpc/xprtrdma/xprt_rdma.h |2 +
>  3 files changed, 54 insertions(+), 40 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index bc8bd65..287c874 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -741,52 +741,27 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
> unsigned long cwnd;
> u32 credits;
>
> -   /* Check status. If bad, signal disconnect and return rep to pool */
> -   if (rep->rr_len == ~0U) {
> -   rpcrdma_recv_buffer_put(rep);
> -   if (r_xprt->rx_ep.rep_connected == 1) {
> -   r_xprt->rx_ep.rep_connected = -EIO;
> -   rpcrdma_conn_func(&r_xprt->rx_ep);
> -   }
> -   return;
> -   }
> -   if (rep->rr_len < RPCRDMA_HDRLEN_MIN) {
> -   dprintk("RPC:   %s: short/invalid reply\n", __func__);
> -   goto repost;
> -   }
> +   dprintk("RPC:   %s: incoming rep %p\n", __func__, rep);
> +
> +   if (rep->rr_len == RPCRDMA_BAD_LEN)
> +   goto out_badstatus;
> +   if (rep->rr_len < RPCRDMA_HDRLEN_MIN)
> +   goto out_shortreply;
> +
> headerp = rdmab_to_msg(rep->rr_rdmabuf);
> -   if (headerp->rm_vers != rpcrdma_version) {
> -   dprintk("RPC:   %s: invalid version %d\n",
> -   __func__, be32_to_cpu(headerp->rm_vers));
> -   goto repost;
> -   }
> +   if (headerp->rm_vers != rpcrdma_version)
> +   goto out_badversion;
>
> /* Get XID and try for a match. */
> spin_lock(&xprt->transport_lock);
> rqst = xprt_lookup_rqst(xprt, headerp->rm_xid);
> -   if (rqst == NULL) {
> -   spin_unlock(&xprt->transport_lock);
> -   dprintk("RPC:   %s: reply 0x%p failed "
> -   "to match any request xid 0x%08x len %d\n",
> -   __func__, rep, be32_to_cpu(headerp->rm_xid),
> -   rep->rr_len);
> -repost:
> -   r_xprt->rx_stats.bad_reply_count++;
> -   if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, &r_xprt->rx_ep, rep))
> -   rpcrdma_recv_buffer_put(rep);
> -
> -   return;
> -   }
> +   if (!rqst)
> +   goto out_nomatch;
>
> /* get request object */
> req = rpcr_to_rdmar(rqst);
> -   if (req->rl_reply) {
> -   spin_unlock(&xprt->transport_lock);
> -   dprintk("RPC:   %s: duplicate reply 0x%p to RPC "
> -   "request 0x%p: xid 0x%08x\n", __func__, rep, req,
> -   be32_to_cpu(headerp->rm_xid));
> -   goto repost;
> -   }
> +   if (req->rl_reply)
> +   goto out_duplicate;
>
> dprintk("RPC:   %s: reply 0x%p completes request 0x%p\n"
> "   RPC request 0x%p xid 0x%08x\n",
> @@ -883,8 +858,45 @@ badheader:
> if (xprt->cwnd > cwnd)
> xprt_release_rqst_cong(rqst->rq_task);
>
> +   xprt_complete_rqst(rqst->rq_task, status);
> +   spin_unlock(&xprt->transport_lock);
> dprintk("RPC:   %s: xprt_complete_rqst(0x%p, 0x%p, %d)\n",
> __func__, xprt, rqst, status);
> -   xprt_complete_rqst(rqst->rq_task, status);
> +   return;
> +
> +out_badstatus:
> +   rpcrdma_recv_buffer_put(rep);
> +   if (r_xprt->rx_ep.rep_connected == 1) {
> +   r_xprt->rx_ep.rep_connected = -EIO;
> +   rpcrdma_conn_func(&r_xprt->rx_ep);
> +   }
> +   return;
> +
> +out_shortreply:
> +   dprintk("RPC:   %s: short/invalid reply\n", __func__);
> +   goto repost;
> +
> +out_badversion:
> +   dprintk("RPC:   %s: invalid version %d\n",
> +   __func__, be32_to_cpu(headerp->rm_vers));
> +   goto repost;
> +
> +out_nomatch:
> +   spin_unlock(&xprt->transport_lock);
> +   dprintk("RPC:   %s: reply 0x%p failed "
> +   "to match any request xid 0x%08x len %d\n",
> +   __func__, rep, be32_to_cpu(headerp->rm_xid),
> +   rep->rr_len);
> +   goto repost;
> +
> +out_duplicate:
> spin_unlock(&xprt->transport_lock);
> +   dprintk("RPC:   %s: duplicate reply 0x%p to RPC "
> +   "request 0x%p: xid 0x%08x\n", __func__, rep, req,
> +   be32_to_cpu(headerp->rm_xid));
> +
> +repost:
> +   r_xprt->rx_stats.bad_reply_count++;
> +   if (rpcrdma_ep_post_recv(&r_

Re: [PATCH v1 07/18] xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers

2015-09-21 Thread Devesh Sharma
Looks good.

On Fri, Sep 18, 2015 at 2:15 AM, Chuck Lever  wrote:
> xprtrdma's backward direction send and receive buffers are the same
> size as the forechannel's inline threshold, and must be pre-
> registered.
>
> The consumer has no control over which receive buffer the adapter
> chooses to catch an incoming backwards-direction call. Any receive
> buffer can be used for either a forward reply or a backward call.
> Thus both types of RPC message must all be the same size.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/Makefile  |1
>  net/sunrpc/xprtrdma/backchannel.c |  204 
> +
>  net/sunrpc/xprtrdma/transport.c   |7 +
>  net/sunrpc/xprtrdma/verbs.c   |   92 ++---
>  net/sunrpc/xprtrdma/xprt_rdma.h   |   20 
>  5 files changed, 309 insertions(+), 15 deletions(-)
>  create mode 100644 net/sunrpc/xprtrdma/backchannel.c
>
> diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile
> index 48913de..33f99d3 100644
> --- a/net/sunrpc/xprtrdma/Makefile
> +++ b/net/sunrpc/xprtrdma/Makefile
> @@ -5,3 +5,4 @@ rpcrdma-y := transport.o rpc_rdma.o verbs.o \
> svc_rdma.o svc_rdma_transport.o \
> svc_rdma_marshal.o svc_rdma_sendto.o svc_rdma_recvfrom.o \
> module.o
> +rpcrdma-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel.o
> diff --git a/net/sunrpc/xprtrdma/backchannel.c 
> b/net/sunrpc/xprtrdma/backchannel.c
> new file mode 100644
> index 000..c0a42ad
> --- /dev/null
> +++ b/net/sunrpc/xprtrdma/backchannel.c
> @@ -0,0 +1,204 @@
> +/*
> + * Copyright (c) 2015 Oracle.  All rights reserved.
> + *
> + * Support for backward direction RPCs on RPC/RDMA.
> + */
> +
> +#include 
> +
> +#include "xprt_rdma.h"
> +
> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
> +# define RPCDBG_FACILITY   RPCDBG_TRANS
> +#endif
> +
> +static void rpcrdma_bc_free_rqst(struct rpcrdma_xprt *r_xprt,
> +struct rpc_rqst *rqst)
> +{
> +   struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
> +   struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
> +
> +   spin_lock(&buf->rb_reqslock);
> +   list_del(&req->rl_all);
> +   spin_unlock(&buf->rb_reqslock);
> +
> +   rpcrdma_destroy_req(&r_xprt->rx_ia, req);
> +
> +   kfree(rqst);
> +}
> +
> +static int rpcrdma_bc_setup_rqst(struct rpcrdma_xprt *r_xprt,
> +struct rpc_rqst *rqst)
> +{
> +   struct rpcrdma_ia *ia = &r_xprt->rx_ia;
> +   struct rpcrdma_regbuf *rb;
> +   struct rpcrdma_req *req;
> +   struct xdr_buf *buf;
> +   size_t size;
> +
> +   req = rpcrdma_create_req(r_xprt);
> +   if (!req)
> +   return -ENOMEM;
> +   req->rl_backchannel = true;
> +
> +   size = RPCRDMA_INLINE_WRITE_THRESHOLD(rqst);
> +   rb = rpcrdma_alloc_regbuf(ia, size, GFP_KERNEL);
> +   if (IS_ERR(rb))
> +   goto out_fail;
> +   req->rl_rdmabuf = rb;
> +
> +   size += RPCRDMA_INLINE_READ_THRESHOLD(rqst);
> +   rb = rpcrdma_alloc_regbuf(ia, size, GFP_KERNEL);
> +   if (IS_ERR(rb))
> +   goto out_fail;
> +   rb->rg_owner = req;
> +   req->rl_sendbuf = rb;
> +   /* so that rpcr_to_rdmar works when receiving a request */
> +   rqst->rq_buffer = (void *)req->rl_sendbuf->rg_base;
> +
> +   buf = &rqst->rq_snd_buf;
> +   buf->head[0].iov_base = rqst->rq_buffer;
> +   buf->head[0].iov_len = 0;
> +   buf->tail[0].iov_base = NULL;
> +   buf->tail[0].iov_len = 0;
> +   buf->page_len = 0;
> +   buf->len = 0;
> +   buf->buflen = size;
> +
> +   return 0;
> +
> +out_fail:
> +   rpcrdma_bc_free_rqst(r_xprt, rqst);
> +   return -ENOMEM;
> +}
> +
> +/* Allocate and add receive buffers to the rpcrdma_buffer's existing
> + * list of rep's. These are released when the transport is destroyed. */
> +static int rpcrdma_bc_setup_reps(struct rpcrdma_xprt *r_xprt,
> +unsigned int count)
> +{
> +   struct rpcrdma_buffer *buffers = &r_xprt->rx_buf;
> +   struct rpcrdma_rep *rep;
> +   unsigned long flags;
> +   int rc = 0;
> +
> +   while (count--) {
> +   rep = rpcrdma_create_rep(r_xprt);
> +   if (IS_ERR(rep)) {
> +   pr_err("RPC:   %s: reply buffer alloc failed\n",
> +  __func__);
> +   rc = PTR_ERR(rep);
> +   break;
> +   }
> +
> +   spin_lock_irqsave(&buffers->rb_lock, flags);
> +   list_add(&rep->rr_list, &buffers->rb_recv_bufs);
> +   spin_unlock_irqrestore(&buffers->rb_lock, flags);
> +   }
> +
> +   return rc;
> +}
> +
> +/**
> + * xprt_rdma_bc_setup - Pre-allocate resources for handling backchannel 
> requests
> + * @xprt: transport associated with these backchannel resources
> + * @reqs: number of concurrent incoming requests to expect
> + *
> + * Returns 0 on success; oth

Re: [PATCH v1 08/18] xprtrdma: Pre-allocate Work Requests for backchannel

2015-09-21 Thread Devesh Sharma
On Fri, Sep 18, 2015 at 2:15 AM, Chuck Lever  wrote:
> Pre-allocate extra send and receive Work Requests needed to handle
> backchannel receives and sends.
>
> The transport doesn't know how many extra WRs to pre-allocate until
> the xprt_setup_backchannel() call, but that's long after the WRs are
> allocated during forechannel setup.
>
> So, use a fixed value for now.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/backchannel.c |4 
>  net/sunrpc/xprtrdma/verbs.c   |   14 --
>  net/sunrpc/xprtrdma/xprt_rdma.h   |   10 ++
>  3 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/backchannel.c 
> b/net/sunrpc/xprtrdma/backchannel.c
> index c0a42ad..f5c7122 100644
> --- a/net/sunrpc/xprtrdma/backchannel.c
> +++ b/net/sunrpc/xprtrdma/backchannel.c
> @@ -123,6 +123,9 @@ int xprt_rdma_bc_setup(struct rpc_xprt *xprt, unsigned 
> int reqs)
>  * Twice as many rpc_rqsts are prepared to ensure there is
>  * always an rpc_rqst available as soon as a reply is sent.
>  */
> +   if (reqs > RPCRDMA_BACKWARD_WRS >> 1)
> +   goto out_err;
> +
> for (i = 0; i < (reqs << 1); i++) {
> rqst = kzalloc(sizeof(*rqst), GFP_KERNEL);
> if (!rqst) {
> @@ -159,6 +162,7 @@ int xprt_rdma_bc_setup(struct rpc_xprt *xprt, unsigned 
> int reqs)
>  out_free:
> xprt_rdma_bc_destroy(xprt, reqs);
>
> +out_err:
> pr_err("RPC:   %s: setup backchannel transport failed\n", 
> __func__);
> return -ENOMEM;
>  }
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 1e4a948..133c720 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -614,6 +614,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct 
> rpcrdma_ia *ia,
> struct ib_device_attr *devattr = &ia->ri_devattr;
> struct ib_cq *sendcq, *recvcq;
> struct ib_cq_init_attr cq_attr = {};
> +   unsigned int max_qp_wr;
> int rc, err;
>
> if (devattr->max_sge < RPCRDMA_MAX_IOVS) {
> @@ -622,18 +623,27 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct 
> rpcrdma_ia *ia,
> return -ENOMEM;
> }
>
> +   if (devattr->max_qp_wr <= RPCRDMA_BACKWARD_WRS) {
> +   dprintk("RPC:   %s: insufficient wqe's available\n",
> +   __func__);
> +   return -ENOMEM;
> +   }
> +   max_qp_wr = devattr->max_qp_wr - RPCRDMA_BACKWARD_WRS;
> +
> /* check provider's send/recv wr limits */
> -   if (cdata->max_requests > devattr->max_qp_wr)
> -   cdata->max_requests = devattr->max_qp_wr;
> +   if (cdata->max_requests > max_qp_wr)
> +   cdata->max_requests = max_qp_wr;

Should we instead set
cdata->max_requests = max_qp_wr - RPCRDMA_BACKWARD_WRS here?
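
Just to put hypothetical numbers on the budget as the patch stands
(assuming a device reporting devattr->max_qp_wr = 1024, a requested
cdata->max_requests = 2048, and RPCRDMA_BACKWARD_WRS = 8 as defined
below): the clamp gives max_qp_wr = 1016 and cdata->max_requests =
1016, so max_send_wr = max_recv_wr = 1016 + 8 = 1024.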

>
> ep->rep_attr.event_handler = rpcrdma_qp_async_error_upcall;
> ep->rep_attr.qp_context = ep;
> ep->rep_attr.srq = NULL;
> ep->rep_attr.cap.max_send_wr = cdata->max_requests;
> +   ep->rep_attr.cap.max_send_wr += RPCRDMA_BACKWARD_WRS;

Looks like this will cause a QP-create failure if some hypothetical
device reports devattr->max_qp_wr equal to cdata->max_requests.

> rc = ia->ri_ops->ro_open(ia, ep, cdata);
> if (rc)
> return rc;
> ep->rep_attr.cap.max_recv_wr = cdata->max_requests;
> +   ep->rep_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
> ep->rep_attr.cap.max_send_sge = RPCRDMA_MAX_IOVS;
> ep->rep_attr.cap.max_recv_sge = 1;
> ep->rep_attr.cap.max_inline_data = 0;
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 2ca0567..37d0d7f 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -101,6 +101,16 @@ struct rpcrdma_ep {
>   */
>  #define RPCRDMA_IGNORE_COMPLETION  (0ULL)
>
> +/* Pre-allocate extra Work Requests for handling backward receives
> + * and sends. This is a fixed value because the Work Queues are
> + * allocated when the forward channel is set up.
> + */
> +#if defined(CONFIG_SUNRPC_BACKCHANNEL)
> +#define RPCRDMA_BACKWARD_WRS   (8)
> +#else
> +#define RPCRDMA_BACKWARD_WRS   (0)
> +#endif
> +
>  /* Registered buffer -- registered kmalloc'd memory for RDMA SEND/RECV
>   *
>   * The below structure appears at the front of a large region of kmalloc'd
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets

2015-09-22 Thread Devesh Sharma
On Mon, Sep 21, 2015 at 9:15 PM, Chuck Lever  wrote:
>
>> On Sep 21, 2015, at 1:51 AM, Devesh Sharma  
>> wrote:
>>
>> On Sun, Sep 20, 2015 at 4:05 PM, Sagi Grimberg  
>> wrote:
>>>>> It is possible that in a given poll_cq
>>>>> call you end up getting on 1 completion, the other completion is
>>>>> delayed due to some reason.
>>>>
>>>>
>>>> If a CQE is allowed to be delayed, how does polling
>>>> again guarantee that the consumer can retrieve it?
>>>>
>>>> What happens if a signal occurs, there is only one CQE,
>>>> but it is delayed? ib_poll_cq would return 0 in that
>>>> case, and the consumer would never call again, thinking
>>>> the CQ is empty. There's no way the consumer can know
>>>> for sure when a CQ is drained.
>>>>
>>>> If the delayed CQE happens only when there is more
>>>> than one CQE, how can polling multiple WCs ever work
>>>> reliably?
>>>>
>>>> Maybe I don't understand what is meant by delayed.
>>>>
>>>
>>> If I'm not mistaken, Devesh meant that if between ib_poll_cq (where you
>>> polled the last 2 wcs) until the while statement another CQE was
>>> generated then you lost a bit of efficiency. Correct?
>>
>> Yes, That's the point.
>
> I’m optimizing for the common case where 1 CQE is ready
> to be polled. How much of an efficiency loss are you
> talking about, how often would this loss occur, and is
> this a problem for all providers / devices?

Whether the scenario would happen is difficult to predict, but I guess
it's quite possible with any vendor, depending on the load on the PCI
bus. It may affect the latency figures, though.

>
> Is this an issue for the current arrangement where 8 WCs
> are polled at a time?

Yes, it's there even today.

>
>
> —
> Chuck Lever
>
>
>


Re: [PATCH v2 03/16] xprtrdma: Prevent loss of completion signals

2015-10-06 Thread Devesh Sharma
Looks good!

Reviewed-By: Devesh Sharma 

On Tue, Oct 6, 2015 at 8:28 PM, Chuck Lever  wrote:
> Commit 8301a2c047cc ("xprtrdma: Limit work done by completion
> handler") was supposed to prevent xprtrdma's upcall handlers from
> starving other softIRQ work by letting them return to the provider
> before all CQEs have been polled.
>
> The logic assumes the provider will call the upcall handler again
> immediately if the CQ is re-armed while there are still queued CQEs.
>
> This assumption is invalid. The IBTA spec says that after a CQ is
> armed, the hardware must interrupt only when a new CQE is inserted.
> xprtrdma can't rely on the provider calling again, even though some
> providers do.
>
> Therefore, leaving CQEs on queue makes sense only when there is
> another mechanism that ensures all remaining CQEs are consumed in a
> timely fashion. xprtrdma does not have such a mechanism. If a CQE
> remains queued, the transport can wait forever to send the next RPC.
>
> Finally, move the wcs array back onto the stack to ensure that the
> poll array is always local to the CPU where the completion upcall is
> running.
>
> Fixes: 8301a2c047cc ("xprtrdma: Limit work done by completion ...")
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/verbs.c |   74 
> ---
>  net/sunrpc/xprtrdma/xprt_rdma.h |5 ---
>  2 files changed, 38 insertions(+), 41 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index c713909..e9599e9 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -158,25 +158,30 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
> }
>  }
>
> -static int
> -rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
> +/* The common case is a single send completion is waiting. By
> + * passing two WC entries to ib_poll_cq, a return code of 1
> + * means there is exactly one WC waiting and no more. We don't
> + * have to invoke ib_poll_cq again to know that the CQ has been
> + * properly drained.
> + */
> +static void
> +rpcrdma_sendcq_poll(struct ib_cq *cq)
>  {
> -   struct ib_wc *wcs;
> -   int budget, count, rc;
> +   struct ib_wc *pos, wcs[2];
> +   int count, rc;
>
> -   budget = RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE;
> do {
> -   wcs = ep->rep_send_wcs;
> +   pos = wcs;
>
> -   rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
> -   if (rc <= 0)
> -   return rc;
> +   rc = ib_poll_cq(cq, ARRAY_SIZE(wcs), pos);
> +   if (rc < 0)
> +   break;
>
> count = rc;
> while (count-- > 0)
> -   rpcrdma_sendcq_process_wc(wcs++);
> -   } while (rc == RPCRDMA_POLLSIZE && --budget);
> -   return 0;
> +   rpcrdma_sendcq_process_wc(pos++);
> +   } while (rc == ARRAY_SIZE(wcs));
> +   return;
>  }
>
>  /* Handle provider send completion upcalls.
> @@ -184,10 +189,8 @@ rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep 
> *ep)
>  static void
>  rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
>  {
> -   struct rpcrdma_ep *ep = (struct rpcrdma_ep *)cq_context;
> -
> do {
> -   rpcrdma_sendcq_poll(cq, ep);
> +   rpcrdma_sendcq_poll(cq);
> } while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
>   IB_CQ_REPORT_MISSED_EVENTS) > 0);
>  }
> @@ -226,31 +229,32 @@ out_fail:
> goto out_schedule;
>  }
>
> -static int
> -rpcrdma_recvcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
> +/* The wc array is on stack: automatic memory is always CPU-local.
> + *
> + * struct ib_wc is 64 bytes, making the poll array potentially
> + * large. But this is at the bottom of the call chain. Further
> + * substantial work is done in another thread.
> + */
> +static void
> +rpcrdma_recvcq_poll(struct ib_cq *cq)
>  {
> -   struct list_head sched_list;
> -   struct ib_wc *wcs;
> -   int budget, count, rc;
> +   struct ib_wc *pos, wcs[4];
> +   LIST_HEAD(sched_list);
> +   int count, rc;
>
> -   INIT_LIST_HEAD(&sched_list);
> -   budget = RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE;
> do {
> -   wcs = ep->rep_recv_wcs;
> +   pos = wcs;
>
> -   rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
> -   if (rc <= 0)
> -   goto out_schedule;
> +   rc = ib_poll_cq(cq, AR

Re: [PATCH v2 02/16] xprtrdma: Re-arm after missed events

2015-10-06 Thread Devesh Sharma
Looks good,

Reviewed-By: Devesh Sharma 

Will send a test report of this series with the ocrdma driver.



On Tue, Oct 6, 2015 at 8:28 PM, Chuck Lever  wrote:
> ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns a positive
> value if WCs were added to a CQ after the last completion upcall
> but before the CQ has been re-armed.
>
> Commit 7f23f6f6e388 ("xprtrmda: Reduce lock contention in
> completion handlers") assumed that when ib_req_notify_cq() returned
> a positive RC, the CQ had also been successfully re-armed, making
> it safe to return control to the provider without losing any
> completion signals. That is an invalid assumption.
>
> Change both completion handlers to continue polling while
> ib_req_notify_cq() returns a positive value.
>
> Fixes: 7f23f6f6e388 ("xprtrmda: Reduce lock contention in ...")
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/verbs.c |   66 
> +++
>  1 file changed, 10 insertions(+), 56 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 8a477e2..c713909 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -179,38 +179,17 @@ rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep 
> *ep)
> return 0;
>  }
>
> -/*
> - * Handle send, fast_reg_mr, and local_inv completions.
> - *
> - * Send events are typically suppressed and thus do not result
> - * in an upcall. Occasionally one is signaled, however. This
> - * prevents the provider's completion queue from wrapping and
> - * losing a completion.
> +/* Handle provider send completion upcalls.
>   */
>  static void
>  rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
>  {
> struct rpcrdma_ep *ep = (struct rpcrdma_ep *)cq_context;
> -   int rc;
> -
> -   rc = rpcrdma_sendcq_poll(cq, ep);
> -   if (rc) {
> -   dprintk("RPC:   %s: ib_poll_cq failed: %i\n",
> -   __func__, rc);
> -   return;
> -   }
>
> -   rc = ib_req_notify_cq(cq,
> -   IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
> -   if (rc == 0)
> -   return;
> -   if (rc < 0) {
> -   dprintk("RPC:   %s: ib_req_notify_cq failed: %i\n",
> -   __func__, rc);
> -   return;
> -   }
> -
> -   rpcrdma_sendcq_poll(cq, ep);
> +   do {
> +   rpcrdma_sendcq_poll(cq, ep);
> +   } while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
> + IB_CQ_REPORT_MISSED_EVENTS) > 0);
>  }
>
>  static void
> @@ -274,42 +253,17 @@ out_schedule:
> return rc;
>  }
>
> -/*
> - * Handle receive completions.
> - *
> - * It is reentrant but processes single events in order to maintain
> - * ordering of receives to keep server credits.
> - *
> - * It is the responsibility of the scheduled tasklet to return
> - * recv buffers to the pool. NOTE: this affects synchronization of
> - * connection shutdown. That is, the structures required for
> - * the completion of the reply handler must remain intact until
> - * all memory has been reclaimed.
> +/* Handle provider receive completion upcalls.
>   */
>  static void
>  rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
>  {
> struct rpcrdma_ep *ep = (struct rpcrdma_ep *)cq_context;
> -   int rc;
> -
> -   rc = rpcrdma_recvcq_poll(cq, ep);
> -   if (rc) {
> -   dprintk("RPC:   %s: ib_poll_cq failed: %i\n",
> -   __func__, rc);
> -   return;
> -   }
>
> -   rc = ib_req_notify_cq(cq,
> -   IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
> -   if (rc == 0)
> -   return;
> -   if (rc < 0) {
> -   dprintk("RPC:   %s: ib_req_notify_cq failed: %i\n",
> -   __func__, rc);
> -   return;
> -   }
> -
> -   rpcrdma_recvcq_poll(cq, ep);
> +   do {
> +   rpcrdma_recvcq_poll(cq, ep);
> +   } while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
> + IB_CQ_REPORT_MISSED_EVENTS) > 0);
>  }
>
>  static void
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/16] xprtrdma: Refactor reply handler error handling

2015-10-06 Thread Devesh Sharma
Looks Good,

Reviewed-By: Devesh Sharma 

On Tue, Oct 6, 2015 at 8:29 PM, Chuck Lever  wrote:
> Clean up: The error cases in rpcrdma_reply_handler() almost never
> execute. Ensure the compiler places them out of the hot path.
>
> No behavior change expected.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/rpc_rdma.c  |   89 
> ++-
>  net/sunrpc/xprtrdma/verbs.c |2 -
>  net/sunrpc/xprtrdma/xprt_rdma.h |2 +
>  3 files changed, 53 insertions(+), 40 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index bc8bd65..60ffa63 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -741,52 +741,27 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
> unsigned long cwnd;
> u32 credits;
>
> -   /* Check status. If bad, signal disconnect and return rep to pool */
> -   if (rep->rr_len == ~0U) {
> -   rpcrdma_recv_buffer_put(rep);
> -   if (r_xprt->rx_ep.rep_connected == 1) {
> -   r_xprt->rx_ep.rep_connected = -EIO;
> -   rpcrdma_conn_func(&r_xprt->rx_ep);
> -   }
> -   return;
> -   }
> -   if (rep->rr_len < RPCRDMA_HDRLEN_MIN) {
> -   dprintk("RPC:   %s: short/invalid reply\n", __func__);
> -   goto repost;
> -   }
> +   dprintk("RPC:   %s: incoming rep %p\n", __func__, rep);
> +
> +   if (rep->rr_len == RPCRDMA_BAD_LEN)
> +   goto out_badstatus;
> +   if (rep->rr_len < RPCRDMA_HDRLEN_MIN)
> +   goto out_shortreply;
> +
> headerp = rdmab_to_msg(rep->rr_rdmabuf);
> -   if (headerp->rm_vers != rpcrdma_version) {
> -   dprintk("RPC:   %s: invalid version %d\n",
> -   __func__, be32_to_cpu(headerp->rm_vers));
> -   goto repost;
> -   }
> +   if (headerp->rm_vers != rpcrdma_version)
> +   goto out_badversion;
>
> /* Get XID and try for a match. */
> spin_lock(&xprt->transport_lock);
> rqst = xprt_lookup_rqst(xprt, headerp->rm_xid);
> -   if (rqst == NULL) {
> -   spin_unlock(&xprt->transport_lock);
> -   dprintk("RPC:   %s: reply 0x%p failed "
> -   "to match any request xid 0x%08x len %d\n",
> -   __func__, rep, be32_to_cpu(headerp->rm_xid),
> -   rep->rr_len);
> -repost:
> -   r_xprt->rx_stats.bad_reply_count++;
> -   if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, &r_xprt->rx_ep, rep))
> -   rpcrdma_recv_buffer_put(rep);
> -
> -   return;
> -   }
> +   if (!rqst)
> +   goto out_nomatch;
>
> /* get request object */
> req = rpcr_to_rdmar(rqst);
> -   if (req->rl_reply) {
> -   spin_unlock(&xprt->transport_lock);
> -   dprintk("RPC:   %s: duplicate reply 0x%p to RPC "
> -   "request 0x%p: xid 0x%08x\n", __func__, rep, req,
> -   be32_to_cpu(headerp->rm_xid));
> -   goto repost;
> -   }
> +   if (req->rl_reply)
> +   goto out_duplicate;
>
> dprintk("RPC:   %s: reply 0x%p completes request 0x%p\n"
> "   RPC request 0x%p xid 0x%08x\n",
> @@ -883,8 +858,44 @@ badheader:
> if (xprt->cwnd > cwnd)
> xprt_release_rqst_cong(rqst->rq_task);
>
> +   xprt_complete_rqst(rqst->rq_task, status);
> +   spin_unlock(&xprt->transport_lock);
> dprintk("RPC:   %s: xprt_complete_rqst(0x%p, 0x%p, %d)\n",
> __func__, xprt, rqst, status);
> -   xprt_complete_rqst(rqst->rq_task, status);
> +   return;
> +
> +out_badstatus:
> +   rpcrdma_recv_buffer_put(rep);
> +   if (r_xprt->rx_ep.rep_connected == 1) {
> +   r_xprt->rx_ep.rep_connected = -EIO;
> +   rpcrdma_conn_func(&r_xprt->rx_ep);
> +   }
> +   return;
> +
> +out_shortreply:
> +   dprintk("RPC:   %s: short/invalid reply\n", __func__);
> +   goto repost;
> +
> +out_badversion:
> +   dprintk("RPC:   %s: invalid version %d\n",
> +   __func__, be32_to_cpu(headerp->rm_vers));
> +   goto repost;
> +
> +out_nomatch:

Re: [PATCH v2 05/16] xprtrdma: Replace send and receive arrays

2015-10-06 Thread Devesh Sharma
looks good,

Reviewed-By: Devesh Sharma 

On Tue, Oct 6, 2015 at 8:29 PM, Chuck Lever  wrote:
> The rb_send_bufs and rb_recv_bufs arrays are used to implement a
> pair of stacks for keeping track of free rpcrdma_req and rpcrdma_rep
> structs. Replace those arrays with free lists.
>
> To allow more than 512 RPCs in-flight at once, each of these arrays
> would be larger than a page (assuming 8-byte addresses and 4KB
> pages). Allowing up to 64K in-flight RPCs (as TCP now does), each
> buffer array would have to be 128 pages. That's an order-6
> allocation. (Not that we're going there.)
>
> A list is easier to expand dynamically. Instead of allocating a
> larger array of pointers and copying the existing pointers to the
> new array, simply append more buffers to each list.
>
> This also makes it simpler to manage receive buffers that might
> catch backwards-direction calls, or to post receive buffers in
> bulk to amortize the overhead of ib_post_recv.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/verbs.c |  155 
> +--
>  net/sunrpc/xprtrdma/xprt_rdma.h |9 +-
>  2 files changed, 73 insertions(+), 91 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 0076129..ab26392 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -928,44 +928,18 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
>  {
> struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
> struct rpcrdma_ia *ia = &r_xprt->rx_ia;
> -   struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
> -   char *p;
> -   size_t len;
> int i, rc;
>
> -   buf->rb_max_requests = cdata->max_requests;
> +   buf->rb_max_requests = r_xprt->rx_data.max_requests;
> spin_lock_init(&buf->rb_lock);
>
> -   /* Need to allocate:
> -*   1.  arrays for send and recv pointers
> -*   2.  arrays of struct rpcrdma_req to fill in pointers
> -*   3.  array of struct rpcrdma_rep for replies
> -* Send/recv buffers in req/rep need to be registered
> -*/
> -   len = buf->rb_max_requests *
> -   (sizeof(struct rpcrdma_req *) + sizeof(struct rpcrdma_rep *));
> -
> -   p = kzalloc(len, GFP_KERNEL);
> -   if (p == NULL) {
> -   dprintk("RPC:   %s: req_t/rep_t/pad kzalloc(%zd) 
> failed\n",
> -   __func__, len);
> -   rc = -ENOMEM;
> -   goto out;
> -   }
> -   buf->rb_pool = p;   /* for freeing it later */
> -
> -   buf->rb_send_bufs = (struct rpcrdma_req **) p;
> -   p = (char *) &buf->rb_send_bufs[buf->rb_max_requests];
> -   buf->rb_recv_bufs = (struct rpcrdma_rep **) p;
> -   p = (char *) &buf->rb_recv_bufs[buf->rb_max_requests];
> -
> rc = ia->ri_ops->ro_init(r_xprt);
> if (rc)
> goto out;
>
> +   INIT_LIST_HEAD(&buf->rb_send_bufs);
> for (i = 0; i < buf->rb_max_requests; i++) {
> struct rpcrdma_req *req;
> -   struct rpcrdma_rep *rep;
>
> req = rpcrdma_create_req(r_xprt);
> if (IS_ERR(req)) {
> @@ -974,7 +948,12 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
> rc = PTR_ERR(req);
> goto out;
> }
> -   buf->rb_send_bufs[i] = req;
> +   list_add(&req->rl_free, &buf->rb_send_bufs);
> +   }
> +
> +   INIT_LIST_HEAD(&buf->rb_recv_bufs);
> +   for (i = 0; i < buf->rb_max_requests + 2; i++) {
> +   struct rpcrdma_rep *rep;
>
> rep = rpcrdma_create_rep(r_xprt);
> if (IS_ERR(rep)) {
> @@ -983,7 +962,7 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
> rc = PTR_ERR(rep);
> goto out;
> }
> -   buf->rb_recv_bufs[i] = rep;
> +   list_add(&rep->rr_list, &buf->rb_recv_bufs);
> }
>
> return 0;
> @@ -992,6 +971,28 @@ out:
> return rc;
>  }
>
> +static struct rpcrdma_req *
> +rpcrdma_buffer_get_req_locked(struct rpcrdma_buffer *buf)
> +{
> +   struct rpcrdma_req *req;
> +
> +   req = list_first_entry(&buf->rb_send_bufs,
> +  struct rpcrdma_req, rl_free);
> +   list_del(&req->rl_free);
> +   return req;
> +}
> +
> +static struct rpcrdma_rep

Re: [PATCH v2 06/16] xprtrdma: Use workqueue to process RPC/RDMA replies

2015-10-06 Thread Devesh Sharma
Looks good! I will send a test report with the ocrdma driver.

Reviewed-By: Devesh Sharma 

On Tue, Oct 6, 2015 at 8:29 PM, Chuck Lever  wrote:
> The reply tasklet is fast, but it's single threaded. After reply
> traffic saturates a single CPU, there's no more reply processing
> capacity.
>
> Replace the tasklet with a workqueue to spread reply handling across
> all CPUs.  This also moves RPC/RDMA reply handling out of the soft
> IRQ context and into a context that allows sleeps.
>
> Signed-off-by: Chuck Lever 
> ---
>  net/sunrpc/xprtrdma/rpc_rdma.c  |   17 +++-
>  net/sunrpc/xprtrdma/transport.c |8 ++
>  net/sunrpc/xprtrdma/verbs.c |   54 
> ---
>  net/sunrpc/xprtrdma/xprt_rdma.h |4 +++
>  4 files changed, 65 insertions(+), 18 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index 60ffa63..95774fc 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -723,8 +723,8 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
> schedule_delayed_work(&ep->rep_connect_worker, 0);
>  }
>
> -/*
> - * Called as a tasklet to do req/reply match and complete a request
> +/* Process received RPC/RDMA messages.
> + *
>   * Errors must result in the RPC task either being awakened, or
>   * allowed to timeout, to discover the errors at that time.
>   */
> @@ -752,13 +752,14 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
> if (headerp->rm_vers != rpcrdma_version)
> goto out_badversion;
>
> -   /* Get XID and try for a match. */
> -   spin_lock(&xprt->transport_lock);
> +   /* Match incoming rpcrdma_rep to an rpcrdma_req to
> +* get context for handling any incoming chunks.
> +*/
> +   spin_lock_bh(&xprt->transport_lock);
> rqst = xprt_lookup_rqst(xprt, headerp->rm_xid);
> if (!rqst)
> goto out_nomatch;
>
> -   /* get request object */
> req = rpcr_to_rdmar(rqst);
> if (req->rl_reply)
> goto out_duplicate;
> @@ -859,7 +860,7 @@ badheader:
> xprt_release_rqst_cong(rqst->rq_task);
>
> xprt_complete_rqst(rqst->rq_task, status);
> -   spin_unlock(&xprt->transport_lock);
> +   spin_unlock_bh(&xprt->transport_lock);
> dprintk("RPC:   %s: xprt_complete_rqst(0x%p, 0x%p, %d)\n",
> __func__, xprt, rqst, status);
> return;
> @@ -882,14 +883,14 @@ out_badversion:
> goto repost;
>
>  out_nomatch:
> -   spin_unlock(&xprt->transport_lock);
> +   spin_unlock_bh(&xprt->transport_lock);
> dprintk("RPC:   %s: no match for incoming xid 0x%08x len %d\n",
> __func__, be32_to_cpu(headerp->rm_xid),
> rep->rr_len);
> goto repost;
>
>  out_duplicate:
> -   spin_unlock(&xprt->transport_lock);
> +   spin_unlock_bh(&xprt->transport_lock);
> dprintk("RPC:   %s: "
> "duplicate reply %p to RPC request %p: xid 0x%08x\n",
> __func__, rep, req, be32_to_cpu(headerp->rm_xid));
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index e9e5ed7..897a2f3 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -732,6 +732,7 @@ void xprt_rdma_cleanup(void)
> dprintk("RPC:   %s: xprt_unregister returned %i\n",
> __func__, rc);
>
> +   rpcrdma_destroy_wq();
> frwr_destroy_recovery_wq();
>  }
>
> @@ -743,8 +744,15 @@ int xprt_rdma_init(void)
> if (rc)
> return rc;
>
> +   rc = rpcrdma_alloc_wq();
> +   if (rc) {
> +   frwr_destroy_recovery_wq();
> +   return rc;
> +   }
> +
> rc = xprt_register_transport(&xprt_rdma);
> if (rc) {
> +   rpcrdma_destroy_wq();
> frwr_destroy_recovery_wq();
> return rc;
> }
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index ab26392..cf2f5b3 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -100,6 +100,35 @@ rpcrdma_run_tasklet(unsigned long data)
>
>  static DECLARE_TASKLET(rpcrdma_tasklet_g, rpcrdma_run_tasklet, 0UL);
>
> +static struct workqueue_struct *rpcrdma_receive_wq;
> +
> +int
> +rpcrdma_alloc_wq(void)
> +{
> +   struct workqueue_struct *recv_wq;
> +
>

Re: [PATCH rdma-RC] IB/cm: Fix sleeping while atomic when creating AH from WC

2015-10-12 Thread Devesh Sharma
Looks good, just one doubt inline:

On Sun, Oct 11, 2015 at 6:28 PM, Matan Barak  wrote:
> When IP based addressing was introduced, ib_create_ah_from_wc was
> changed in order to support a suitable AH. Since this AH should
> now contains the DMAC (which isn't a simple derivative of the GID).
> In order to find the DMAC, an ARP should sometime be sent. This ARP
> is a sleeping context.
>
> ib_create_ah_from_wc is called from cm_alloc_response_msg, which is
> sometimes called from an atomic context. This caused a
> sleeping-while-atomic bug. Fixing this by splitting
> cm_alloc_response_msg to an atomic and sleep-able part. When
> cm_alloc_response_msg is used in an atomic context, we try to create
> the AH before entering the atomic context.
>
> Fixes: 66bd20a72d2f ('IB/core: Ethernet L2 attributes in verbs/cm structures')
> Signed-off-by: Matan Barak 
> ---
>
> Hi Doug,
> This patch fixes an old bug in the CM. IP based addressing requires
> ARP resolution which isn't sleep-able by its nature. This resolution
> was sometimes done in non sleep-able context. Our regression tests
> picked up this bug and verified this fix.
>
> Thanks,
> Matan
>
>  drivers/infiniband/core/cm.c | 60 
> 
>  1 file changed, 49 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index ea4db9c..f5cf1c4 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -287,17 +287,12 @@ static int cm_alloc_msg(struct cm_id_private 
> *cm_id_priv,
> return 0;
>  }
>
> -static int cm_alloc_response_msg(struct cm_port *port,
> -struct ib_mad_recv_wc *mad_recv_wc,
> -struct ib_mad_send_buf **msg)
> +static int _cm_alloc_response_msg(struct cm_port *port,
> + struct ib_mad_recv_wc *mad_recv_wc,
> + struct ib_ah *ah,
> + struct ib_mad_send_buf **msg)
>  {
> struct ib_mad_send_buf *m;
> -   struct ib_ah *ah;
> -
> -   ah = ib_create_ah_from_wc(port->mad_agent->qp->pd, mad_recv_wc->wc,
> - mad_recv_wc->recv_buf.grh, port->port_num);
> -   if (IS_ERR(ah))
> -   return PTR_ERR(ah);
>
> m = ib_create_send_mad(port->mad_agent, 1, 
> mad_recv_wc->wc->pkey_index,
>0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
> @@ -312,6 +307,20 @@ static int cm_alloc_response_msg(struct cm_port *port,
> return 0;
>  }
>
> +static int cm_alloc_response_msg(struct cm_port *port,
> +struct ib_mad_recv_wc *mad_recv_wc,
> +struct ib_mad_send_buf **msg)
> +{
> +   struct ib_ah *ah;
> +
> +   ah = ib_create_ah_from_wc(port->mad_agent->qp->pd, mad_recv_wc->wc,
> + mad_recv_wc->recv_buf.grh, port->port_num);
> +   if (IS_ERR(ah))
> +   return PTR_ERR(ah);
> +
> +   return _cm_alloc_response_msg(port, mad_recv_wc, ah, msg);
> +}
> +
>  static void cm_free_msg(struct ib_mad_send_buf *msg)
>  {
> ib_destroy_ah(msg->ah);
> @@ -2201,6 +2210,7 @@ static int cm_dreq_handler(struct cm_work *work)
> struct cm_id_private *cm_id_priv;
> struct cm_dreq_msg *dreq_msg;
> struct ib_mad_send_buf *msg = NULL;
> +   struct ib_ah *ah;
> int ret;
>
> dreq_msg = (struct cm_dreq_msg *)work->mad_recv_wc->recv_buf.mad;
> @@ -2213,6 +2223,11 @@ static int cm_dreq_handler(struct cm_work *work)
> return -EINVAL;
> }
>
> +   ah = ib_create_ah_from_wc(work->port->mad_agent->qp->pd,
> + work->mad_recv_wc->wc,
> + work->mad_recv_wc->recv_buf.grh,
> + work->port->port_num);
> +

Shouldn't the IS_ERR(ah) check below be done here, instead of there?
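
i.e. something like this (untested, just to show the placement I mean;
I may well be missing a reason for deferring the check):

        ah = ib_create_ah_from_wc(work->port->mad_agent->qp->pd,
                                  work->mad_recv_wc->wc,
                                  work->mad_recv_wc->recv_buf.grh,
                                  work->port->port_num);
        /* bail out right away instead of testing ah inside the switch */
        if (IS_ERR(ah))
                return PTR_ERR(ah);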

> work->cm_event.private_data = &dreq_msg->private_data;
>
> spin_lock_irq(&cm_id_priv->lock);
> @@ -2234,9 +2249,13 @@ static int cm_dreq_handler(struct cm_work *work)
> case IB_CM_TIMEWAIT:
> 
> atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
> counter[CM_DREQ_COUNTER]);
> -   if (cm_alloc_response_msg(work->port, work->mad_recv_wc, 
> &msg))
> +   if (IS_ERR(ah))
> +   goto unlock;
> +   if (_cm_alloc_response_msg(work->port, work->mad_recv_wc, ah,
> +  &msg))
> goto unlock;
>
> +   ah = NULL;
> cm_format_drep((struct cm_drep_msg *) msg->mad, cm_id_priv,
>cm_id_priv->private_data,
>cm_id_priv->private_data_len);
> @@ -2259,6 +2278,8 @@ static int cm_dreq_handler(struct cm_work *work)
>   

Re: [PATCH v2 00/16] NFS/RDMA patches for merging into v4.4

2015-10-14 Thread Devesh Sharma
Hi Chuck,

With the server crash fix in place, ocrdma is passing iozone on this series.

Series Tested-By: Devesh Sharma 

On Tue, Oct 6, 2015 at 8:28 PM, Chuck Lever  wrote:
> Introduce client-side support for bi-directional RPC/RDMA.
> Bi-directional RPC/RDMA is a pre-requisite for NFSv4.1 on RDMA
> transports.
>
> Also available in the "nfs-rdma-for-4.4" topic branch of this git repo:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
> Or for browsing:
>
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.4
>
> Changes since v1:
> - Dropped "RFC" in Subject: line
> - Rebased on v4.3-rc4 + Steve W's recent fixes
> - NFS Server-side backchannel support postponed
> - "xprtrdma: Replace global lkey" dropped, already merged
> - Addressed Sagi's comments on "Replace send and receive arrays"
> - Addressed Jason's comment regarding ib_req_notify_cq return code
> - Moved RPC/RDMA reply handling into a work queue
>
> I'm assuming recent list discussion has addressed Devesh's and
> Sagi's concerns with "Prevent loss of completion signals". Let me
> know if there is still a problem.
>
> ---
>
> Chuck Lever (16):
>   xprtrdma: Enable swap-on-NFS/RDMA
>   xprtrdma: Re-arm after missed events
>   xprtrdma: Prevent loss of completion signals
>   xprtrdma: Refactor reply handler error handling
>   xprtrdma: Replace send and receive arrays
>   xprtrdma: Use workqueue to process RPC/RDMA replies
>   xprtrdma: Remove reply tasklet
>   xprtrdma: Saving IRQs no longer needed for rb_lock
>   SUNRPC: Abstract backchannel operations
>   xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers
>   xprtrdma: Pre-allocate Work Requests for backchannel
>   xprtrdma: Add support for sending backward direction RPC replies
>   xprtrdma: Handle incoming backward direction RPC calls
>   svcrdma: Add backward direction service for RPC/RDMA transport
>   SUNRPC: Remove the TCP-only restriction in bc_svc_process()
>   NFS: Enable client side NFSv4.1 backchannel to use other transports
>
>
>  fs/nfs/callback.c|   33 +-
>  include/linux/sunrpc/bc_xprt.h   |5
>  include/linux/sunrpc/svc_rdma.h  |6
>  include/linux/sunrpc/xprt.h  |7
>  net/sunrpc/backchannel_rqst.c|   24 +-
>  net/sunrpc/svc.c |5
>  net/sunrpc/xprtrdma/Makefile |1
>  net/sunrpc/xprtrdma/backchannel.c|  373 +++
>  net/sunrpc/xprtrdma/rpc_rdma.c   |  148 ++---
>  net/sunrpc/xprtrdma/svc_rdma.c   |6
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   58 
>  net/sunrpc/xprtrdma/transport.c  |   18 +
>  net/sunrpc/xprtrdma/verbs.c  |  479 
> +++---
>  net/sunrpc/xprtrdma/xprt_rdma.h  |   53 +++
>  net/sunrpc/xprtsock.c|   16 +
>  15 files changed, 916 insertions(+), 316 deletions(-)
>  create mode 100644 net/sunrpc/xprtrdma/backchannel.c
>
> --
> Chuck Lever
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 2/7] IB: Introduce Work Queue object and its verbs

2015-10-18 Thread Devesh Sharma
Hi All,

Would it be a good idea to have a separate header for this feature?
Let's not keep appending to ib_verbs.h.
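
E.g. something along these lines -- the file name and contents are
purely hypothetical, only a sketch of the split I have in mind:

/* include/rdma/ib_wq.h -- hypothetical new home for the Work Queue
 * object and its verbs introduced by this series, so that ib_verbs.h
 * does not keep growing.
 */
#ifndef IB_WQ_H
#define IB_WQ_H

#include <rdma/ib_verbs.h>

/* struct ib_wq, the WQ create/modify/destroy verbs and the related
 * attribute structures from this series would be declared here.
 */

#endif /* IB_WQ_H */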

-Regards
Devesh

On Sun, Oct 18, 2015 at 8:45 PM, Parav Pandit  wrote:
> On Sun, Oct 18, 2015 at 8:38 PM, Yishai Hadas
>  wrote:
>> On 10/15/2015 7:49 PM, Parav Pandit wrote:
>>
>>> If there is stateless WQ being used by multiple QPs in multiplexed
>>
>>
>> The WQ is not stateless and always has its own PD.
>>
>>> way, it should be able to multiplex between QP's of different PD as
>>> well.
>>> Otherwise for every PD being created, there will have be one WQ needed
>>> to service all the QPs belonging to that PD.
>>
>>
>> As mentioned, same WQ can serve multiple QPs, from PD point of view it
>> behaves similarly to SRQ that may be associated with many QPs with different
>> PDs.
>>
>> See IB SPEC, Release 1.3, o10-2.2.1:
>> "SRQ may be associated with the same PD as used by one or more of its
>> associated QPs or a different PD."
>>
>> As part of coming V1 will improve the commit message to better clarify the
>> WQ's PD behavior, thanks.
>
> Ok. Got it. Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next v2 02/10] IB/core: Expose and rename ib_find_cached_gid_by_port cache API

2015-10-18 Thread Devesh Sharma
Looks good. Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> Sometime consumers might want to search for a GID in a specific port.
> For example, when a WC arrives and we want to search the GID
> that matches that port - it's better to search only the relevant
> port.
> Exposing and renaming ib_cache_gid_find_by_port in order to match
> the naming convention of the module.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/cache.c |  9 +
>  drivers/infiniband/core/core_priv.h |  5 -
>  drivers/infiniband/core/device.c|  4 ++--
>  include/rdma/ib_cache.h | 19 +++
>  4 files changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index 5c05407..639a726 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -409,10 +409,10 @@ static int ib_cache_gid_find(struct ib_device *ib_dev,
> mask, port, index);
>  }
>
> -int ib_cache_gid_find_by_port(struct ib_device *ib_dev,
> - const union ib_gid *gid,
> - u8 port, struct net_device *ndev,
> - u16 *index)
> +int ib_find_cached_gid_by_port(struct ib_device *ib_dev,
> +  const union ib_gid *gid,
> +  u8 port, struct net_device *ndev,
> +  u16 *index)
>  {
> int local_index;
> struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
> @@ -438,6 +438,7 @@ int ib_cache_gid_find_by_port(struct ib_device *ib_dev,
>
> return -ENOENT;
>  }
> +EXPORT_SYMBOL(ib_find_cached_gid_by_port);
>
>  static struct ib_gid_table *alloc_gid_table(int sz)
>  {
> diff --git a/drivers/infiniband/core/core_priv.h 
> b/drivers/infiniband/core/core_priv.h
> index 70bb36e..0df82b1 100644
> --- a/drivers/infiniband/core/core_priv.h
> +++ b/drivers/infiniband/core/core_priv.h
> @@ -65,11 +65,6 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter,
>   roce_netdev_callback cb,
>   void *cookie);
>
> -int ib_cache_gid_find_by_port(struct ib_device *ib_dev,
> - const union ib_gid *gid,
> - u8 port, struct net_device *ndev,
> - u16 *index);
> -
>  enum ib_cache_gid_default_mode {
> IB_CACHE_GID_DEFAULT_MODE_SET,
> IB_CACHE_GID_DEFAULT_MODE_DELETE
> diff --git a/drivers/infiniband/core/device.c 
> b/drivers/infiniband/core/device.c
> index f22ce48..179e813 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -838,8 +838,8 @@ int ib_find_gid(struct ib_device *device, union ib_gid 
> *gid,
>
> for (port = rdma_start_port(device); port <= rdma_end_port(device); 
> ++port) {
> if (rdma_cap_roce_gid_table(device, port)) {
> -   if (!ib_cache_gid_find_by_port(device, gid, port,
> -  ndev, index)) {
> +   if (!ib_find_cached_gid_by_port(device, gid, port,
> +   ndev, index)) {
> *port_num = port;
> return 0;
> }
> diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
> index dcc9bed..679d7ca 100644
> --- a/include/rdma/ib_cache.h
> +++ b/include/rdma/ib_cache.h
> @@ -75,6 +75,25 @@ int ib_find_cached_gid(struct ib_device *device,
>u16  *index);
>
>  /**
> + * ib_find_cached_gid_by_port - Returns the GID table index where a specified
> + * GID value occurs
> + * @device: The device to query.
> + * @gid: The GID value to search for.
> + * @port_num: The port number of the device where the GID value sould be
> + *   searched.
> + * @ndev: In RoCE, the net device of the device. Null means ignore.
> + * @index: The index into the cached GID table where the GID was found.  This
> + *   parameter may be NULL.
> + *
> + * ib_find_cached_gid() searches for the specified GID value in
> + * the local software cache.
> + */
> +int ib_find_cached_gid_by_port(struct ib_device *device,
> +  const union ib_gid *gid,
> +  u8   port_num,
> +  struct net_device *ndev,
> +  u16  *index);
> +/**
>   * ib_get_cached_pkey - Returns a cached PKey t

Re: [PATCH for-next v2 01/10] IB/core: Add netdev and gid attributes paramteres to cache

2015-10-18 Thread Devesh Sharma
Looks Good
Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> Adding an ability to query the IB cache by a netdev and get the
> attributes of a GID. These parameters are necessary in order to
> successfully resolve the required GID (when the netdevice is known)
> and get the Ethernet L2 attributes from a GID.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/cache.c| 10 ++
>  drivers/infiniband/core/cm.c   |  5 +++--
>  drivers/infiniband/core/cma.c  | 10 ++
>  drivers/infiniband/core/device.c   | 17 -
>  drivers/infiniband/core/mad.c  |  2 +-
>  drivers/infiniband/core/multicast.c|  3 ++-
>  drivers/infiniband/core/sa_query.c |  2 +-
>  drivers/infiniband/core/sysfs.c|  2 +-
>  drivers/infiniband/core/verbs.c|  7 ---
>  drivers/infiniband/hw/mlx4/main.c  |  4 ++--
>  drivers/infiniband/hw/mlx4/qp.c|  5 +++--
>  drivers/infiniband/hw/mthca/mthca_av.c |  2 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c|  2 +-
>  drivers/infiniband/ulp/ipoib/ipoib_main.c  |  2 +-
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  2 +-
>  drivers/infiniband/ulp/srp/ib_srp.c|  2 +-
>  drivers/infiniband/ulp/srpt/ib_srpt.c  |  3 ++-
>  include/rdma/ib_cache.h| 13 +
>  include/rdma/ib_verbs.h|  5 +++--
>  19 files changed, 60 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index 87471ef..5c05407 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -649,21 +649,23 @@ static int gid_table_setup_one(struct ib_device *ib_dev)
>  int ib_get_cached_gid(struct ib_device *device,
>   u8port_num,
>   int   index,
> - union ib_gid *gid)
> + union ib_gid *gid,
> + struct ib_gid_attr *gid_attr)
>  {
> if (port_num < rdma_start_port(device) || port_num > 
> rdma_end_port(device))
> return -EINVAL;
>
> -   return __ib_cache_gid_get(device, port_num, index, gid, NULL);
> +   return __ib_cache_gid_get(device, port_num, index, gid, gid_attr);
>  }
>  EXPORT_SYMBOL(ib_get_cached_gid);
>
>  int ib_find_cached_gid(struct ib_device *device,
>const union ib_gid *gid,
> +  struct net_device *ndev,
>u8   *port_num,
>u16  *index)
>  {
> -   return ib_cache_gid_find(device, gid, NULL, port_num, index);
> +   return ib_cache_gid_find(device, gid, ndev, port_num, index);
>  }
>  EXPORT_SYMBOL(ib_find_cached_gid);
>
> @@ -845,7 +847,7 @@ static void ib_cache_update(struct ib_device *device,
> if (!use_roce_gid_table) {
> for (i = 0;  i < gid_cache->table_len; ++i) {
> ret = ib_query_gid(device, port, i,
> -  gid_cache->table + i);
> +  gid_cache->table + i, NULL);
> if (ret) {
> printk(KERN_WARNING "ib_query_gid failed (%d) 
> for %s (index %d)\n",
>ret, device->name, i);
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index ea4db9c..15d60f2 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -365,7 +365,7 @@ static int cm_init_av_by_path(struct ib_sa_path_rec 
> *path, struct cm_av *av)
> read_lock_irqsave(&cm.device_lock, flags);
> list_for_each_entry(cm_dev, &cm.device_list, list) {
> if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid,
> -   &p, NULL)) {
> +   NULL, &p, NULL)) {
> port = cm_dev->port[p-1];
> break;
> }
> @@ -1638,7 +1638,8 @@ static int cm_req_handler(struct cm_work *work)
> ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
> if (ret) {
> ib_get_cached_gid(work->port->cm_dev->ib_device,
> - work->port->port_num, 0, 
> &work->path[0].sgid);
> + work->port->port_num, 0, 
> &wor

Re: [PATCH for-next v2 03/10] IB/core: Add netdev to path record

2015-10-18 Thread Devesh Sharma
Looks Good
Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> In order to find the sgid_index, one could just query the IB cache
> with the correct GID and netdevice. Therefore, instead of storing
> the L2 attributes directly in the path, we only store the
> ifindex and net and use them later to get the sgid_index.
> The vlan_id and smac L2 attributes are removed in a later patch.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/sa_query.c| 13 +++--
>  drivers/infiniband/core/uverbs_marshall.c |  2 ++
>  include/rdma/ib_sa.h  | 10 ++
>  3 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/sa_query.c 
> b/drivers/infiniband/core/sa_query.c
> index 9a4e789..c9d9d7a 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -1007,18 +1007,25 @@ int ib_init_ah_from_path(struct ib_device *device, u8 
> port_num,
> force_grh = rdma_cap_eth_ah(device, port_num);
>
> if (rec->hop_limit > 1 || force_grh) {
> +   struct net_device *ndev = ib_get_ndev_from_path(rec);
> +
> ah_attr->ah_flags = IB_AH_GRH;
> ah_attr->grh.dgid = rec->dgid;
>
> -   ret = ib_find_cached_gid(device, &rec->sgid, NULL, &port_num,
> +   ret = ib_find_cached_gid(device, &rec->sgid, ndev, &port_num,
>  &gid_index);
> -   if (ret)
> +   if (ret) {
> +   if (ndev)
> +   dev_put(ndev);
> return ret;
> +   }
>
> ah_attr->grh.sgid_index= gid_index;
> ah_attr->grh.flow_label= be32_to_cpu(rec->flow_label);
> ah_attr->grh.hop_limit = rec->hop_limit;
> ah_attr->grh.traffic_class = rec->traffic_class;
> +   if (ndev)
> +   dev_put(ndev);
> }
> if (force_grh) {
> memcpy(ah_attr->dmac, rec->dmac, ETH_ALEN);
> @@ -1151,6 +1158,8 @@ static void ib_sa_path_rec_callback(struct ib_sa_query 
> *sa_query,
> ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table),
>   mad->data, &rec);
> rec.vlan_id = 0x;
> +   rec.net = NULL;
> +   rec.ifindex = 0;
> memset(rec.dmac, 0, ETH_ALEN);
> memset(rec.smac, 0, ETH_ALEN);
> query->callback(status, &rec, query->context);
> diff --git a/drivers/infiniband/core/uverbs_marshall.c 
> b/drivers/infiniband/core/uverbs_marshall.c
> index abd9724..484698c 100644
> --- a/drivers/infiniband/core/uverbs_marshall.c
> +++ b/drivers/infiniband/core/uverbs_marshall.c
> @@ -144,5 +144,7 @@ void ib_copy_path_rec_from_user(struct ib_sa_path_rec 
> *dst,
> memset(dst->smac, 0, sizeof(dst->smac));
> memset(dst->dmac, 0, sizeof(dst->dmac));
> dst->vlan_id = 0x;
> +   dst->net = NULL;
> +   dst->ifindex = 0;
>  }
>  EXPORT_SYMBOL(ib_copy_path_rec_from_user);
> diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
> index 7e071a6..406ecf1 100644
> --- a/include/rdma/ib_sa.h
> +++ b/include/rdma/ib_sa.h
> @@ -39,6 +39,7 @@
>  #include 
>
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -157,8 +158,17 @@ struct ib_sa_path_rec {
> u8   smac[ETH_ALEN];
> u8   dmac[ETH_ALEN];
> u16  vlan_id;
> +   /* ignored in IB */
> +   int  ifindex;
> +   /* ignored in IB */
> +   struct net  *net;
>  };
>
> +static inline struct net_device *ib_get_ndev_from_path(struct ib_sa_path_rec 
> *rec)
> +{
> +   return rec->net ? dev_get_by_index(rec->net, rec->ifindex) : NULL;
> +}
> +
>  #define IB_SA_MCMEMBER_REC_MGID
> IB_SA_COMP_MASK( 0)
>  #define IB_SA_MCMEMBER_REC_PORT_GIDIB_SA_COMP_MASK( 1)
>  #define IB_SA_MCMEMBER_REC_QKEY
> IB_SA_COMP_MASK( 2)
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next v2 04/10] IB/cm: cm_init_av_by_path should find a GID by its netdevice

2015-10-18 Thread Devesh Sharma
Looks Good

Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> Previously, the CM has searched the cache for any sgid_index whose
> GID matches the path's GID. Since the path record stores the net
> device, the CM should now search only for GIDs which originated from
> this net device.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/cm.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 15d60f2..ea7f3c5 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -361,17 +361,21 @@ static int cm_init_av_by_path(struct ib_sa_path_rec 
> *path, struct cm_av *av)
> unsigned long flags;
> int ret;
> u8 p;
> +   struct net_device *ndev = ib_get_ndev_from_path(path);
>
> read_lock_irqsave(&cm.device_lock, flags);
> list_for_each_entry(cm_dev, &cm.device_list, list) {
> if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid,
> -   NULL, &p, NULL)) {
> +   ndev, &p, NULL)) {
> port = cm_dev->port[p-1];
> break;
> }
> }
> read_unlock_irqrestore(&cm.device_lock, flags);
>
> +   if (ndev)
> +   dev_put(ndev);
> +
> if (!port)
> return -EINVAL;
>
> @@ -384,7 +388,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec 
> *path, struct cm_av *av)
> ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path,
>  &av->ah_attr);
> av->timeout = path->packet_life_time + 1;
> -   memcpy(av->smac, path->smac, sizeof(av->smac));
>
> av->valid = 1;
> return 0;
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next v2 05/10] IB/cma: cma_validate_port should verify the port and netdevice

2015-10-18 Thread Devesh Sharma
Looks Good
Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> Previously, cma_validate_port searched for GIDs in IB cache and then
> tried to verify the found port. This could fail when there are
> identical GIDs on both ports. In addition, netdevice should be taken
> into account when searching the GID table.
> Fixing cma_validate_port to search only the relevant port's cache
> and netdevice.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/cma.c | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index b15d9d5..849c280 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -427,10 +427,11 @@ static int cma_translate_addr(struct sockaddr *addr, 
> struct rdma_dev_addr *dev_a
>  }
>
>  static inline int cma_validate_port(struct ib_device *device, u8 port,
> - union ib_gid *gid, int dev_type)
> + union ib_gid *gid, int dev_type,
> + int bound_if_index)
>  {
> -   u8 found_port;
> int ret = -ENODEV;
> +   struct net_device *ndev = NULL;
>
> if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, 
> port))
> return ret;
> @@ -438,9 +439,13 @@ static inline int cma_validate_port(struct ib_device 
> *device, u8 port,
> if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
> return ret;
>
> -   ret = ib_find_cached_gid(device, gid, NULL, &found_port, NULL);
> -   if (port != found_port)
> -   return -ENODEV;
> +   if (dev_type == ARPHRD_ETHER)
> +   ndev = dev_get_by_index(&init_net, bound_if_index);
> +
> +   ret = ib_find_cached_gid_by_port(device, gid, port, ndev, NULL);
> +
> +   if (ndev)
> +   dev_put(ndev);
>
> return ret;
>  }
> @@ -472,7 +477,8 @@ static int cma_acquire_dev(struct rdma_id_private 
> *id_priv,
>&iboe_gid : &gid;
>
> ret = cma_validate_port(cma_dev->device, port, gidp,
> -   dev_addr->dev_type);
> +   dev_addr->dev_type,
> +   dev_addr->bound_dev_if);
> if (!ret) {
> id_priv->id.port_num = port;
> goto out;
> @@ -490,7 +496,8 @@ static int cma_acquire_dev(struct rdma_id_private 
> *id_priv,
>&iboe_gid : &gid;
>
> ret = cma_validate_port(cma_dev->device, port, gidp,
> -   dev_addr->dev_type);
> +   dev_addr->dev_type,
> +   dev_addr->bound_dev_if);
> if (!ret) {
> id_priv->id.port_num = port;
> goto out;
> @@ -2270,8 +2277,11 @@ static int cma_resolve_iboe_route(struct 
> rdma_id_private *id_priv)
>
> route->num_paths = 1;
>
> -   if (addr->dev_addr.bound_dev_if)
> +   if (addr->dev_addr.bound_dev_if) {
> ndev = dev_get_by_index(&init_net, 
> addr->dev_addr.bound_dev_if);
> +   route->path_rec->net = &init_net;
> +   route->path_rec->ifindex = addr->dev_addr.bound_dev_if;
> +   }
> if (!ndev) {
> ret = -ENODEV;
> goto err2;
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next v2 06/10] IB/cache: Add ib_find_gid_by_filter cache API

2015-10-18 Thread Devesh Sharma
Looks Good
Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> GID cache API users might want to search for GIDs with specific
> attributes rather than just specifying GID, net device and port.
> This is used in a later patch, where we find the sgid index by
> L2 Ethernet attributes.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/cache.c | 93 
> +
>  include/rdma/ib_cache.h |  8 
>  2 files changed, 101 insertions(+)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index 639a726..89bebea 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -440,6 +440,81 @@ int ib_find_cached_gid_by_port(struct ib_device *ib_dev,
>  }
>  EXPORT_SYMBOL(ib_find_cached_gid_by_port);
>
> +/**
> + * ib_find_gid_by_filter - Returns the GID table index where a specified
> + * GID value occurs
> + * @device: The device to query.
> + * @gid: The GID value to search for.
> + * @port_num: The port number of the device where the GID value could be
> + *   searched.
> + * @filter: The filter function is executed on any matching GID in the table.
> + *   If the filter function returns true, the corresponding index is 
> returned,
> + *   otherwise, we continue searching the GID table. It's guaranteed that
> + *   while filter is executed, ndev field is valid and the structure won't
> + *   change. filter is executed in an atomic context. filter must not be 
> NULL.
> + * @index: The index into the cached GID table where the GID was found.  This
> + *   parameter may be NULL.
> + *
> + * ib_cache_gid_find_by_filter() searches for the specified GID value
> + * of which the filter function returns true in the port's GID table.
> + * This function is only supported on RoCE ports.
> + *
> + */
> +static int ib_cache_gid_find_by_filter(struct ib_device *ib_dev,
> +  const union ib_gid *gid,
> +  u8 port,
> +  bool (*filter)(const union ib_gid *,
> + const struct 
> ib_gid_attr *,
> + void *),
> +  void *context,
> +  u16 *index)
> +{
> +   struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
> +   struct ib_gid_table *table;
> +   unsigned int i;
> +   bool found = false;
> +
> +   if (!ports_table)
> +   return -EOPNOTSUPP;
> +
> +   if (port < rdma_start_port(ib_dev) ||
> +   port > rdma_end_port(ib_dev) ||
> +   !rdma_protocol_roce(ib_dev, port))
> +   return -EPROTONOSUPPORT;
> +
> +   table = ports_table[port - rdma_start_port(ib_dev)];
> +
> +   for (i = 0; i < table->sz; i++) {
> +   struct ib_gid_attr attr;
> +   unsigned long flags;
> +
> +   read_lock_irqsave(&table->data_vec[i].lock, flags);
> +   if (table->data_vec[i].props & GID_TABLE_ENTRY_INVALID)
> +   goto next;
> +
> +   if (memcmp(gid, &table->data_vec[i].gid, sizeof(*gid)))
> +   goto next;
> +
> +   memcpy(&attr, &table->data_vec[i].attr, sizeof(attr));
> +
> +   if (filter(gid, &attr, context))
> +   found = true;
> +
> +next:
> +   read_unlock_irqrestore(&table->data_vec[i].lock, flags);
> +
> +   if (found)
> +   break;
> +   }
> +
> +   if (!found)
> +   return -ENOENT;
> +
> +   if (index)
> +   *index = i;
> +   return 0;
> +}
> +
>  static struct ib_gid_table *alloc_gid_table(int sz)
>  {
> unsigned int i;
> @@ -670,6 +745,24 @@ int ib_find_cached_gid(struct ib_device *device,
>  }
>  EXPORT_SYMBOL(ib_find_cached_gid);
>
> +int ib_find_gid_by_filter(struct ib_device *device,
> + const union ib_gid *gid,
> + u8 port_num,
> + bool (*filter)(const union ib_gid *gid,
> +const struct ib_gid_attr *,
> +void *),
> + void *context, u16 *index)
> +{
> +   /* Only RoCE GID table supports filter function */
> +   if (!rdma_cap_roce_gid_table(device, port_num) && filter)
> +   return -EPROTON

Re: [PATCH for-next v2 08/10] IB/cm: Remove the usage of smac and vid of qp_attr and cm_av

2015-10-18 Thread Devesh Sharma
Looks Good
Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> The cm and cma don't need to explicitly handle vlan and smac,
> as they are resolved from the GID index now. Removing this
> portion of code.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/cm.c  | 30 --
>  drivers/infiniband/core/cma.c |  6 --
>  2 files changed, 36 deletions(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index ea7f3c5..6dd24a5 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -179,8 +179,6 @@ struct cm_av {
> struct ib_ah_attr ah_attr;
> u16 pkey_index;
> u8 timeout;
> -   u8  valid;
> -   u8  smac[ETH_ALEN];
>  };
>
>  struct cm_work {
> @@ -389,7 +387,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec 
> *path, struct cm_av *av)
>  &av->ah_attr);
> av->timeout = path->packet_life_time + 1;
>
> -   av->valid = 1;
> return 0;
>  }
>
> @@ -1637,7 +1634,6 @@ static int cm_req_handler(struct cm_work *work)
> cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
>
> memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
> -   work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
> ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
> if (ret) {
> ib_get_cached_gid(work->port->cm_dev->ib_device,
> @@ -3614,32 +3610,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private 
> *cm_id_priv,
> *qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
> IB_QP_DEST_QPN | IB_QP_RQ_PSN;
> qp_attr->ah_attr = cm_id_priv->av.ah_attr;
> -   if (!cm_id_priv->av.valid) {
> -   spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> -   return -EINVAL;
> -   }
> -   if (cm_id_priv->av.ah_attr.vlan_id != 0x) {
> -   qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
> -   *qp_attr_mask |= IB_QP_VID;
> -   }
> -   if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
> -   memcpy(qp_attr->smac, cm_id_priv->av.smac,
> -  sizeof(qp_attr->smac));
> -   *qp_attr_mask |= IB_QP_SMAC;
> -   }
> -   if (cm_id_priv->alt_av.valid) {
> -   if (cm_id_priv->alt_av.ah_attr.vlan_id != 0x) {
> -   qp_attr->alt_vlan_id =
> -   cm_id_priv->alt_av.ah_attr.vlan_id;
> -   *qp_attr_mask |= IB_QP_ALT_VID;
> -   }
> -   if (!is_zero_ether_addr(cm_id_priv->alt_av.smac)) {
> -   memcpy(qp_attr->alt_smac,
> -  cm_id_priv->alt_av.smac,
> -  sizeof(qp_attr->alt_smac));
> -   *qp_attr_mask |= IB_QP_ALT_SMAC;
> -   }
> -   }
> qp_attr->path_mtu = cm_id_priv->path_mtu;
> qp_attr->dest_qp_num = be32_to_cpu(cm_id_priv->remote_qpn);
> qp_attr->rq_psn = be32_to_cpu(cm_id_priv->rq_psn);
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 849c280..bc11ea4 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -733,12 +733,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private 
> *id_priv,
>
> BUG_ON(id_priv->cma_dev->device != id_priv->id.device);
>
> -   if (rdma_protocol_roce(id_priv->id.device, id_priv->id.port_num)) {
> -   ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
> -
> -   if (ret)
> -   goto out;
> -   }
> if (conn_param)
> qp_attr.max_dest_rd_atomic = conn_param->responder_resources;
> ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask);
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next v2 09/10] IB/core: Remove smac and vlan id from qp_attr and ah_attr

2015-10-18 Thread Devesh Sharma
Looks Good
Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> Smac and vlan id could be resolved from the GID attribute, and thus
> these attributes aren't needed anymore. Removing them.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/sa_query.c   |  4 
>  drivers/infiniband/core/ucma.c   |  1 -
>  drivers/infiniband/core/uverbs_cmd.c |  1 -
>  include/rdma/ib_verbs.h  | 13 -
>  4 files changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/infiniband/core/sa_query.c 
> b/drivers/infiniband/core/sa_query.c
> index c9d9d7a..77f5afc 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -1029,11 +1029,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 
> port_num,
> }
> if (force_grh) {
> memcpy(ah_attr->dmac, rec->dmac, ETH_ALEN);
> -   ah_attr->vlan_id = rec->vlan_id;
> -   } else {
> -   ah_attr->vlan_id = 0x;
> }
> -
> return 0;
>  }
>  EXPORT_SYMBOL(ib_init_ah_from_path);
> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> index a53fc9b..ac55a9e 100644
> --- a/drivers/infiniband/core/ucma.c
> +++ b/drivers/infiniband/core/ucma.c
> @@ -1211,7 +1211,6 @@ static int ucma_set_ib_path(struct ucma_context *ctx,
> return -EINVAL;
>
> memset(&sa_path, 0, sizeof(sa_path));
> -   sa_path.vlan_id = 0x;
>
> ib_sa_unpack_path(path_data->path_rec, &sa_path);
> ret = rdma_set_ib_paths(ctx->cm_id, &sa_path, 1);
> diff --git a/drivers/infiniband/core/uverbs_cmd.c 
> b/drivers/infiniband/core/uverbs_cmd.c
> index b242480..ae5f912 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -2698,7 +2698,6 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
> attr.grh.sgid_index= cmd.attr.grh.sgid_index;
> attr.grh.hop_limit = cmd.attr.grh.hop_limit;
> attr.grh.traffic_class = cmd.attr.grh.traffic_class;
> -   attr.vlan_id   = 0;
> memset(&attr.dmac, 0, sizeof(attr.dmac));
> memcpy(attr.grh.dgid.raw, cmd.attr.grh.dgid, 16);
>
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 9868cab..0b20658 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -697,7 +697,6 @@ struct ib_ah_attr {
> u8  ah_flags;
> u8  port_num;
> u8  dmac[ETH_ALEN];
> -   u16 vlan_id;
>  };
>
>  enum ib_wc_status {
> @@ -957,10 +956,10 @@ enum ib_qp_attr_mask {
> IB_QP_PATH_MIG_STATE= (1<<18),
> IB_QP_CAP   = (1<<19),
> IB_QP_DEST_QPN  = (1<<20),
> -   IB_QP_SMAC  = (1<<21),
> -   IB_QP_ALT_SMAC  = (1<<22),
> -   IB_QP_VID   = (1<<23),
> -   IB_QP_ALT_VID   = (1<<24),
> +   IB_QP_RESERVED1 = (1<<21),
> +   IB_QP_RESERVED2 = (1<<22),
> +   IB_QP_RESERVED3 = (1<<23),
> +   IB_QP_RESERVED4 = (1<<24),
>  };
>
>  enum ib_qp_state {
> @@ -1010,10 +1009,6 @@ struct ib_qp_attr {
> u8  rnr_retry;
> u8  alt_port_num;
> u8  alt_timeout;
> -   u8  smac[ETH_ALEN];
> -   u8  alt_smac[ETH_ALEN];
> -   u16 vlan_id;
> -   u16 alt_vlan_id;
>  };
>
>  enum ib_wr_opcode {
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH for-next v2 10/10] IB/core: Remove smac and vlan id from path record

2015-10-18 Thread Devesh Sharma
Looks Good
Reviewed-By: Devesh Sharma 

On Thu, Oct 15, 2015 at 9:08 PM, Matan Barak  wrote:
> The GID cache accompanies every GID with attributes.
> The GID attributes link the GID with its netdevice, which could be
> resolved to smac and vlan id easily. Since we've added the netdevice
> (ifindex and net) to the path record, storing the L2 attributes is
> duplicated data and hence these attributes are removed.
>
> Signed-off-by: Matan Barak 
> ---
>  drivers/infiniband/core/cma.c | 2 --
>  drivers/infiniband/core/sa_query.c| 2 --
>  drivers/infiniband/core/uverbs_marshall.c | 2 --
>  include/rdma/ib_sa.h  | 2 --
>  4 files changed, 8 deletions(-)
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index bc11ea4..2914460 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -2281,9 +2281,7 @@ static int cma_resolve_iboe_route(struct 
> rdma_id_private *id_priv)
> goto err2;
> }
>
> -   route->path_rec->vlan_id = rdma_vlan_dev_vlan_id(ndev);
> memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, ETH_ALEN);
> -   memcpy(route->path_rec->smac, ndev->dev_addr, ndev->addr_len);
>
> rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
> &route->path_rec->sgid);
> diff --git a/drivers/infiniband/core/sa_query.c 
> b/drivers/infiniband/core/sa_query.c
> index 77f5afc..dcdaa79 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -1153,11 +1153,9 @@ static void ib_sa_path_rec_callback(struct ib_sa_query 
> *sa_query,
>
> ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table),
>   mad->data, &rec);
> -   rec.vlan_id = 0x;
> rec.net = NULL;
> rec.ifindex = 0;
> memset(rec.dmac, 0, ETH_ALEN);
> -   memset(rec.smac, 0, ETH_ALEN);
> query->callback(status, &rec, query->context);
> } else
> query->callback(status, NULL, query->context);
> diff --git a/drivers/infiniband/core/uverbs_marshall.c 
> b/drivers/infiniband/core/uverbs_marshall.c
> index 484698c..7d2f14c 100644
> --- a/drivers/infiniband/core/uverbs_marshall.c
> +++ b/drivers/infiniband/core/uverbs_marshall.c
> @@ -141,9 +141,7 @@ void ib_copy_path_rec_from_user(struct ib_sa_path_rec 
> *dst,
> dst->preference = src->preference;
> dst->packet_life_time_selector = src->packet_life_time_selector;
>
> -   memset(dst->smac, 0, sizeof(dst->smac));
> memset(dst->dmac, 0, sizeof(dst->dmac));
> -   dst->vlan_id = 0x;
> dst->net = NULL;
> dst->ifindex = 0;
>  }
> diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
> index 406ecf1..3019695 100644
> --- a/include/rdma/ib_sa.h
> +++ b/include/rdma/ib_sa.h
> @@ -155,9 +155,7 @@ struct ib_sa_path_rec {
> u8   packet_life_time_selector;
> u8   packet_life_time;
> u8   preference;
> -   u8   smac[ETH_ALEN];
> u8   dmac[ETH_ALEN];
> -   u16  vlan_id;
> /* ignored in IB */
> int  ifindex;
> /* ignored in IB */
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] RDMA/libocrdma: set vlan present bit for UD

2015-11-01 Thread Devesh Sharma
This patch tells the f/w about the presence of a VLAN tag in
the AH being supplied to the QP.
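
For illustration, a minimal user-space sketch (purely illustrative: device/PD
setup omitted, and sgid_index 1 is only assumed to hold the VLAN-based local
GID) of creating the UD address handle that exercises this path:

/* Minimal sketch: create a UD address handle over a VLAN-based RoCE GID.
 * Assumes the PD is already allocated and the remote GID is known; the
 * sgid_index used here is an example value, not a requirement. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

struct ibv_ah *create_vlan_ud_ah(struct ibv_pd *pd, union ibv_gid *remote_gid,
                                 uint8_t port_num)
{
        struct ibv_ah_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.is_global      = 1;            /* RoCE traffic always carries a GRH */
        attr.grh.dgid       = *remote_gid;  /* destination GID */
        attr.grh.sgid_index = 1;            /* assumed VLAN-based local GID entry */
        attr.grh.hop_limit  = 64;
        attr.port_num       = port_num;

        /* libocrdma derives the AH id -- and, with this patch, the
         * VLAN-present hint -- from the table shared with the kernel driver */
        return ibv_create_ah(pd, &attr);
}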

Signed-off-by: Devesh Sharma 
---
 src/ocrdma_abi.h   | 7 ---
 src/ocrdma_main.h  | 7 +++
 src/ocrdma_verbs.c | 8 ++--
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/ocrdma_abi.h b/src/ocrdma_abi.h
index ad7abd4..8102c1c 100644
--- a/src/ocrdma_abi.h
+++ b/src/ocrdma_abi.h
@@ -51,14 +51,14 @@ enum {
 };
 
 #define OCRDMA_DB_CQ_RING_ID_MASK  0x3FF   /* bits 0 - 9 */
-#define OCRDMA_DB_CQ_RING_ID_EXT_MASK  0x0C00  /* bits 10-11 of qid placing at 
12-11 */
-#define OCRDMA_DB_CQ_RING_ID_EXT_MASK_SHIFT  0x1   /* qid #2 msbits 
placing at 12-11 */
+#define OCRDMA_DB_CQ_RING_ID_EXT_MASK  0x0C00  /* bits 10-11 of qid 
placing at 12-11 */
+#define OCRDMA_DB_CQ_RING_ID_EXT_MASK_SHIFT0x1 /* qid #2 msbits 
placing at 12-11 */
 #define OCRDMA_DB_CQ_NUM_POPPED_SHIFT  (16)/* bits 16 - 28 */
 /* Rearm bit */
 #define OCRDMA_DB_CQ_REARM_SHIFT   (29)/* bit 29 */
 
 /* solicited bit */
-#define OCRDMA_DB_CQ_SOLICIT_SHIFT   (31)  /* bit 31 */
+#define OCRDMA_DB_CQ_SOLICIT_SHIFT (31)/* bit 31 */
 
 struct ocrdma_get_context {
struct ibv_get_context cmd;
@@ -291,6 +291,7 @@ enum {
OCRDMA_FLAG_FENCE_R = 0x8,
OCRDMA_FLAG_SOLICIT = 0x10,
OCRDMA_FLAG_IMM = 0x20,
+   OCRDMA_FLAG_AH_VLAN_PR  = 0x40,
 
/* Stag flags */
OCRDMA_LKEY_FLAG_LOCAL_WR   = 0x1,
diff --git a/src/ocrdma_main.h b/src/ocrdma_main.h
index 5a386bb..4e7be75 100644
--- a/src/ocrdma_main.h
+++ b/src/ocrdma_main.h
@@ -211,10 +211,17 @@ struct ocrdma_qp {
int signaled;   /* signaled QP */
 };
 
+enum {
+   OCRDMA_AH_ID_MASK   = 0x3FF,
+   OCRDMA_AH_VLAN_VALID_MASK   = 0x01,
+   OCRDMA_AH_VLAN_VALID_SHIFT  = 0x1F
+};
+
 struct ocrdma_ah {
struct ibv_ah ibv_ah;
struct ocrdma_pd *pd;
uint16_t id;
+   uint8_t isvlan;
 };
 
 #define get_ocrdma_xxx(xxx, type)  \
diff --git a/src/ocrdma_verbs.c b/src/ocrdma_verbs.c
index ab90b4f..d80ab27 100644
--- a/src/ocrdma_verbs.c
+++ b/src/ocrdma_verbs.c
@@ -1196,6 +1196,9 @@ static void ocrdma_build_ud_hdr(struct ocrdma_qp *qp,
ud_hdr->rsvd_dest_qpn = wr->wr.ud.remote_qpn;
ud_hdr->qkey = wr->wr.ud.remote_qkey;
ud_hdr->rsvd_ahid = ah->id;
+   if (ah->isvlan)
+   hdr->cw |= (OCRDMA_FLAG_AH_VLAN_PR <<
+   OCRDMA_WQE_FLAGS_SHIFT);
 }
 
 static void ocrdma_build_sges(struct ocrdma_hdr_wqe *hdr,
@@ -2156,9 +2159,10 @@ struct ibv_ah *ocrdma_create_ah(struct ibv_pd *ibpd, 
struct ibv_ah_attr *attr)
if (status)
goto cmd_err;
 
-   ah->id = pd->uctx->ah_tbl[ahtbl_idx];
+   ah->id = pd->uctx->ah_tbl[ahtbl_idx] & OCRDMA_AH_ID_MASK;
+   ah->isvlan = (pd->uctx->ah_tbl[ahtbl_idx] >>
+   OCRDMA_AH_VLAN_VALID_SHIFT);
return &ah->ibv_ah;
-
 cmd_err:
ocrdma_free_ah_tbl_id(pd->uctx, ahtbl_idx);
 tbl_err:
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] RDMA/libocrdma: update libocrdma version string

2015-11-01 Thread Devesh Sharma
version string updated from 1.0.5 to 1.0.6

Signed-off-by: Devesh Sharma 
---
 configure.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure.in b/configure.in
index 140c07b..653bc43 100644
--- a/configure.in
+++ b/configure.in
@@ -1,11 +1,11 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(libocrdma, 1.0.5, linux-rdma@vger.kernel.org)
+AC_INIT(libocrdma, 1.0.6, linux-rdma@vger.kernel.org)
 AC_CONFIG_SRCDIR([src/ocrdma_main.h])
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
-AM_INIT_AUTOMAKE(libocrdma, 1.0.5)
+AM_INIT_AUTOMAKE(libocrdma, 1.0.6)
 AM_PROG_LIBTOOL
 
 AC_ARG_ENABLE(libcheck, [ --disable-libcheckdo not test for the presence 
of ib libraries],
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] RDMA/libocrdma:sync qp-state with hw state

2015-11-01 Thread Devesh Sharma
From: Padmanabh Ratnakar 

This patch syncs up the QP state with the underlying h/w
QP state and reports the same to the user application.
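
For reference, the consumer-visible effect is that ibv_query_qp() now reports
the state the hardware is actually in; a minimal sketch of such a query (error
handling trimmed, names illustrative):

/* Minimal sketch: query the current QP state through libibverbs.  With this
 * patch the provider's software QP state is synced to what is reported. */
#include <infiniband/verbs.h>
#include <stdio.h>

int print_qp_state(struct ibv_qp *qp)
{
        struct ibv_qp_attr attr;
        struct ibv_qp_init_attr init_attr;
        int ret;

        ret = ibv_query_qp(qp, &attr, IBV_QP_STATE, &init_attr);
        if (ret)
                return ret;

        printf("QP 0x%x is in state %d\n", qp->qp_num, attr.qp_state);
        return 0;
}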

Signed-off-by: Padmanabh Ratnakar 
Signed-off-by: Devesh Sharma 
---
 src/ocrdma_verbs.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/src/ocrdma_verbs.c b/src/ocrdma_verbs.c
index cf2ecd2..ab90b4f 100644
--- a/src/ocrdma_verbs.c
+++ b/src/ocrdma_verbs.c
@@ -651,20 +651,6 @@ mbx_err:
return NULL;
 }
 
-/*
- * ocrdma_query_qp
- */
-int ocrdma_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
-   int attr_mask, struct ibv_qp_init_attr *init_attr)
-{
-   struct ibv_query_qp cmd;
-   int status;
-
-   status =
-   ibv_cmd_query_qp(qp, attr, attr_mask, init_attr, &cmd, sizeof(cmd));
-   return status;
-}
-
 enum ocrdma_qp_state get_ocrdma_qp_state(enum ibv_qp_state qps)
 {
switch (qps) {
@@ -896,6 +882,25 @@ int ocrdma_modify_qp(struct ibv_qp *ibqp, struct 
ibv_qp_attr *attr,
return status;
 }
 
+/*
+ * ocrdma_query_qp
+ */
+int ocrdma_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+   int attr_mask, struct ibv_qp_init_attr *init_attr)
+{
+   struct ibv_query_qp cmd;
+   struct ocrdma_qp *qp = get_ocrdma_qp(ibqp);
+   int status;
+
+   status = ibv_cmd_query_qp(ibqp, attr, attr_mask,
+ init_attr, &cmd, sizeof(cmd));
+
+   if (!status)
+   ocrdma_qp_state_machine(qp, attr->qp_state);
+
+   return status;
+}
+
 static void ocrdma_srq_toggle_bit(struct ocrdma_srq *srq, int idx)
 {
int i = idx / 32;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] RDMA/libocrdma: Prevent CQ-Doorbell floods

2015-11-01 Thread Devesh Sharma
Change the CQ-Doorbell (DB) logic to prevent DB floods: the DB is supposed to
be pressed only if at least one hw CQE was polled. If a cq-arm was requested
previously, then arm the CQ regardless of the number of hw CQEs polled.
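
For context, the arm request comes from the application through
ibv_req_notify_cq() and is now pressed together with the consumed-CQE count on
the next poll; a minimal sketch of the standard event loop this interacts with
(assuming the CQ and completion channel are already set up):

/* Minimal sketch of the CQ event loop this doorbell logic serves.  The arm
 * requested by ibv_req_notify_cq() is recorded by libocrdma and rung on the
 * next ibv_poll_cq() together with the number of CQEs consumed. */
#include <infiniband/verbs.h>

int drain_cq_once(struct ibv_comp_channel *channel)
{
        struct ibv_cq *ev_cq;
        void *ev_ctx;
        struct ibv_wc wc[16];
        int n;

        if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx))
                return -1;
        ibv_ack_cq_events(ev_cq, 1);

        /* request the next completion event; the arm DB is deferred */
        if (ibv_req_notify_cq(ev_cq, 0))
                return -1;

        /* polling rings the DB only when CQEs were consumed or an arm is
         * pending, which is exactly what this patch enforces */
        while ((n = ibv_poll_cq(ev_cq, 16, wc)) > 0)
                ; /* process wc[0..n-1] here */

        return n; /* 0 when the CQ is drained, negative on error */
}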

Signed-off-by: Devesh Sharma 
---
 src/ocrdma_verbs.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/ocrdma_verbs.c b/src/ocrdma_verbs.c
index d80ab27..cf6f72c 100644
--- a/src/ocrdma_verbs.c
+++ b/src/ocrdma_verbs.c
@@ -2003,14 +2003,11 @@ expand_cqe:
}
 stop_cqe:
cq->getp = cur_getp;
-   if (cq->deferred_arm) {
-   ocrdma_ring_cq_db(cq, 1, cq->deferred_sol, polled_hw_cqes);
+   if (cq->deferred_arm || polled_hw_cqes) {
+   ocrdma_ring_cq_db(cq, cq->deferred_arm,
+ cq->deferred_sol, polled_hw_cqes);
cq->deferred_arm = 0;
cq->deferred_sol = 0;
-   } else {
-   /* We need to pop the CQE. No need to arm */
-   ocrdma_ring_cq_db(cq, 0, cq->deferred_sol, polled_hw_cqes);
-   cq->deferred_sol = 0;
}
 
return i;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] libocrdma bug-fixes

2015-11-01 Thread Devesh Sharma
Bug-fix series for the user-space RDMA driver for Emulex
devices.

Devesh Sharma (3):
  RDMA/libocrdma: set vlan present bit for UD
  RDMA/libocrdma: Prevent CQ-Doorbell floods
  RDMA/libocrdma: update libocrdma version string

Padmanabh Ratnakar (1):
  RDMA/libocrdma:sync qp-state with hw state

 configure.in   |  4 ++--
 src/ocrdma_abi.h   |  7 ---
 src/ocrdma_main.h  |  7 +++
 src/ocrdma_verbs.c | 50 --
 4 files changed, 41 insertions(+), 27 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/11] IB: remove support for phys MRs

2015-11-24 Thread Devesh Sharma
Reviewed-By: Devesh Sharma (ocrdma)

On Sun, Nov 22, 2015 at 11:16 PM, Christoph Hellwig  wrote:
> We have stopped using phys MRs in the kernel a while ago, so let's
> remove all the cruft used to implement them.
>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Steve Wise[cxgb3, cxgb4]
> ---
>  drivers/infiniband/hw/cxgb3/iwch_mem.c   |  31 ---
>  drivers/infiniband/hw/cxgb3/iwch_provider.c  |  69 --
>  drivers/infiniband/hw/cxgb3/iwch_provider.h  |   4 -
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |  11 -
>  drivers/infiniband/hw/cxgb4/mem.c| 248 -
>  drivers/infiniband/hw/cxgb4/provider.c   |   2 -
>  drivers/infiniband/hw/mthca/mthca_provider.c |  84 ---
>  drivers/infiniband/hw/nes/nes_cm.c   |   7 +-
>  drivers/infiniband/hw/nes/nes_verbs.c|   3 +-
>  drivers/infiniband/hw/nes/nes_verbs.h|   5 +
>  drivers/infiniband/hw/ocrdma/ocrdma_main.c   |   1 -
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  | 163 --
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |   3 -
>  drivers/infiniband/hw/qib/qib_mr.c   |  51 +
>  drivers/infiniband/hw/qib/qib_verbs.c|   1 -
>  drivers/infiniband/hw/qib/qib_verbs.h|   4 -
>  drivers/staging/rdma/amso1100/c2_provider.c  |   1 -
>  drivers/staging/rdma/ehca/ehca_iverbs.h  |  11 -
>  drivers/staging/rdma/ehca/ehca_main.c|   2 -
>  drivers/staging/rdma/ehca/ehca_mrmw.c| 321 
> ---
>  drivers/staging/rdma/ehca/ehca_mrmw.h|   5 -
>  drivers/staging/rdma/hfi1/mr.c   |  51 +
>  drivers/staging/rdma/hfi1/verbs.c|   1 -
>  drivers/staging/rdma/hfi1/verbs.h|   4 -
>  drivers/staging/rdma/ipath/ipath_mr.c|  55 -
>  drivers/staging/rdma/ipath/ipath_verbs.c |   1 -
>  drivers/staging/rdma/ipath/ipath_verbs.h |   4 -
>  include/rdma/ib_verbs.h  |  16 +-
>  28 files changed, 15 insertions(+), 1144 deletions(-)
>
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c 
> b/drivers/infiniband/hw/cxgb3/iwch_mem.c
> index 5c36ee2..3a5e27d 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
> @@ -75,37 +75,6 @@ int iwch_register_mem(struct iwch_dev *rhp, struct iwch_pd 
> *php,
> return ret;
>  }
>
> -int iwch_reregister_mem(struct iwch_dev *rhp, struct iwch_pd *php,
> -   struct iwch_mr *mhp,
> -   int shift,
> -   int npages)
> -{
> -   u32 stag;
> -   int ret;
> -
> -   /* We could support this... */
> -   if (npages > mhp->attr.pbl_size)
> -   return -ENOMEM;
> -
> -   stag = mhp->attr.stag;
> -   if (cxio_reregister_phys_mem(&rhp->rdev,
> -  &stag, mhp->attr.pdid,
> -  mhp->attr.perms,
> -  mhp->attr.zbva,
> -  mhp->attr.va_fbo,
> -  mhp->attr.len,
> -  shift - 12,
> -  mhp->attr.pbl_size, mhp->attr.pbl_addr))
> -   return -ENOMEM;
> -
> -   ret = iwch_finish_mem_reg(mhp, stag);
> -   if (ret)
> -   cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
> -  mhp->attr.pbl_addr);
> -
> -   return ret;
> -}
> -
>  int iwch_alloc_pbl(struct iwch_mr *mhp, int npages)
>  {
> mhp->attr.pbl_addr = cxio_hal_pblpool_alloc(&mhp->rhp->rdev,
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
> b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index 1567b5b..9576e15 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -556,73 +556,6 @@ err:
>
>  }
>
> -static int iwch_reregister_phys_mem(struct ib_mr *mr,
> -int mr_rereg_mask,
> -struct ib_pd *pd,
> -struct ib_phys_buf *buffer_list,
> -int num_phys_buf,
> -int acc, u64 * iova_start)
> -{
> -
> -   struct iwch_mr mh, *mhp;
> -   struct iwch_pd *php;
> -   struct iwch_dev *rhp;
> -   __be64 *page_list = NULL;
> -   int shift = 0;
> -   u64 total_size;
> -   int npages = 0;
> -   int ret;
> -
> -   PDBG("%s ib_mr %p ib_pd %p\n"

Re: [PATCH V4 9/9] IB/mlx4: Enable mlx4_ib support for MODIFY_QP_EX

2013-09-11 Thread Devesh Sharma
Hi Or,

I don't see any patches in the librdmacm/libibverbs git trees that call the _EX
versions of the uverbs commands. The patch you have pointed to in the v4 cover
letter still seems to be incomplete. Are these broken?

On Tue, Sep 10, 2013 at 8:11 PM, Or Gerlitz  wrote:
> From: Matan Barak 
>
> mlx4_ib driver should indicate that it supports
> MODIFY_QP_EX user verbs extended command.
>
> Signed-off-by: Matan Barak 
> Signed-off-by: Or Gerlitz 
> ---
>  drivers/infiniband/hw/mlx4/main.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx4/main.c 
> b/drivers/infiniband/hw/mlx4/main.c
> index 7a29ad5..77c87d0 100644
> --- a/drivers/infiniband/hw/mlx4/main.c
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -1755,7 +1755,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
> (1ull << IB_USER_VERBS_CMD_QUERY_SRQ)   |
> (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ) |
> (1ull << IB_USER_VERBS_CMD_CREATE_XSRQ) |
> -   (1ull << IB_USER_VERBS_CMD_OPEN_QP);
> +   (1ull << IB_USER_VERBS_CMD_OPEN_QP) |
> +   (1ull << IB_USER_VERBS_CMD_MODIFY_QP_EX);
>
> ibdev->ib_dev.query_device  = mlx4_ib_query_device;
> ibdev->ib_dev.query_port= mlx4_ib_query_port;
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4 9/9] IB/mlx4: Enable mlx4_ib support for MODIFY_QP_EX

2013-09-12 Thread Devesh Sharma
Inline below:

On Thu, Sep 12, 2013 at 4:15 PM, Or Gerlitz  wrote:
> On 12/09/2013 08:26, Devesh Sharma wrote:
>>
>> I don't see any patches to librdmacm/libibverbs git to call _EX version of
>> uverbs commands.
>
>
> We've posted the kernel patches, that should be enough for the review. If
> you have any specific questions re user
> space aspects of this series, feel free to send them now.

Yes! For kernel space I see that the above set of patches will work fine
without any issues. On the other hand, if a user-space application tries
to establish a connection using RDMA-CM, the driver will receive the dmac
and vlan-id fields as zeros, because libibverbs/librdmacm still does not
call the _EX versions of the UVERBS/UCM commands, which are introduced in
this set of patches (7/9, 8/9). So, for example, if I try to run ib_send_bw
with -R, traffic will not run!

So what are the plans to add these changes to the libibverbs/librdmacm libraries?
 OR
is there some flaw in my understanding that librdmacm/libibverbs needs
changes in order to use the newly proposed scheme? Please clarify.

-Regards
 Devesh
>
>
>> The patch you have pointed in v4-0 patch still seems to be incomplete. Are
>> these broken?
>
>
> I don't understand the question. During the review of V4 we were pointed to
> a part missing in patch 5/9
> and this will be fixed in V5, sure.
>
> Or.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4 9/9] IB/mlx4: Enable mlx4_ib support for MODIFY_QP_EX

2013-09-12 Thread Devesh Sharma
Inline below, on the second question:

On Thu, Sep 12, 2013 at 4:15 PM, Or Gerlitz  wrote:
> On 12/09/2013 08:26, Devesh Sharma wrote:
>>
>> I don't see any patches to librdmacm/libibverbs git to call _EX version of
>> uverbs commands.
>
>
> We've posted the kernel patches, that should be enough for the review. If
> you have any specific questions re user
> space aspects of this series, feel free to send them now.
>
>
>> The patch you have pointed in v4-0 patch still seems to be incomplete. Are
>> these broken?
>
>
> I don't understand the question. During the review of V4 we were pointed to
> a part missing in patch 5/9
> and this will be fixed in V5, sure.
Yes, 5/9 misses a part related to populating the gid table during load time.
Well, I was mainly concerned about the user-space apps: with the current
git of libibverbs/librdmacm, user apps will fail to perform data
transfer operations.
After digging more into the linux-rdma git I found the complete set
of patches for flow steering, which introduces the extension commands in
kernel space. I am still looking for the corresponding patches for
libibverbs/librdmacm.

-Regards
 Devesh Sharma

>
> Or.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4 9/9] IB/mlx4: Enable mlx4_ib support for MODIFY_QP_EX

2013-10-02 Thread Devesh Sharma
Hi Or,

One more point I have is: since current applications like
perftest/qperf/rping/krping do not have code to accept an IPv6 address,
do you have plans to modify these?
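
For what it is worth, on the application side the change is small once
rdma_getaddrinfo() is used for address resolution; a minimal librdmacm sketch
(node/port strings are placeholders supplied by the caller) that accepts an
IPv6 literal as-is:

/* Minimal sketch: let librdmacm resolve an IPv4 or IPv6 literal and create
 * a connection endpoint. */
#include <rdma/rdma_cma.h>
#include <string.h>

int create_connect_ep(char *node, char *port, struct rdma_cm_id **id)
{
        struct rdma_addrinfo hints, *res;
        int ret;

        memset(&hints, 0, sizeof(hints));
        hints.ai_port_space = RDMA_PS_TCP;

        ret = rdma_getaddrinfo(node, port, &hints, &res);  /* handles IPv6 too */
        if (ret)
                return ret;

        ret = rdma_create_ep(id, res, NULL, NULL);  /* no PD/QP attrs yet */
        rdma_freeaddrinfo(res);
        return ret;
}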

On Sun, Sep 29, 2013 at 4:18 PM, Or Gerlitz  wrote:
> On 17/09/2013 23:49, Or Gerlitz wrote:
>>
>> On Tue, Sep 17, 2013 at 8:50 PM, Roland Dreier wrote:
>>>
>>> On Thu, Sep 12, 2013 at 10:22 AM, Jason Gunthorpe wrote:

 On Thu, Sep 12, 2013 at 03:24:46PM +0300, Or Gerlitz wrote:
>
> Let me clarify this. The idea is that current RoCE applications will
> run as is after they update "their" librdmacm, since its this
> library that works with the new uverbs entries.

 Or, we are not supposed to break userspace. You can't insist that a
 user space library be updated in-sync with the kernel.
>>>
>>> Agree.  This "IP based addressing" for RoCE looks like a big problem
>>> at the moment.  Let me reiterate my understanding, and you guys can
>>> correct me if I get something wrong:
>>>
>>>   - current addressing scheme is broken for virtualization use cases,
>>> because VMs may not know about what VLANs are in use.  (also there are
>>> issues around bonding modes that use different Ethernet addresses)
>>
>> The current addressing is actually broken for vlan use cases, both
>> native and virtualized, for the virt as of the argument you mentioned,
>> for native as of one node connected to Ethernet edge switch acting in
>> access mode (that is the switch does vlan insertion/stripping) and the
>> other node handling vlans by itself. Each one will form different GID
>> for the other party.
>>
>>>   - proposed change requires:
>>> * all systems must update kernel at the same time, because old and
>>> new kernels cannot talk to each other
>>> * all systems must update librdmacm when they update the kernel,
>>> because old librdmacm does not work with new kernel
>>> I understand that we want to fix the issue around VLAN tagged traffic
>>> from VMs, but I don't see how we can break the whole stack to
>>> accomplish that.  Isn't there some incremental way forward?
>>
>> To begin with, we don't break the whole stack -- using the current
>> patch set, for ports whose link is IB, all biz as usual, and this is
>> the in the port resolution, that is if for a given device one port is
>> IB and one port Eth, existing librdmacm keep working on the IB por.
>>
>> Another fact to put in the fire is that SRIOV VMs don't have RoCE now
>> (not supported by upstream). Actually we're holding off with the SRIOV
>> RoCE patches submission b/c of the breakage with the current scheme
>> --> no need for backward compatibility here either. The vast majority
>> if not all the Cloud use cases we are aware to which would use RoCE
>> need VST and need it to work right.
>>
>> With vlans being broken already, I would say we need 1st and most fix
>> that and only/maybe later worry on backward compatibility for the few
>> native mode use cases that somehow manage to workaround the buggish
>> gid format when they use vlans.
>>
>> As for those who don't use vlans, which is also rare, as RoCE is
>> working best over some lossless channel which is typically achieved
>> using PFC over a vlan... we can use the fact that the IP bases
>> addressing patches configure both interface IPv4 and IPv6 addresses
>> into the gid table.
>>
>> Now,  the IPv6 link address is actually also plugged into the gid
>> table by nodes running the old code since this is how the non-vlan MAC
>> based GID is constructed. Using this fact, we can allow
>>
>> 1. the patched kernel to work with non updated user space, as long as
>> they use the GID which relates to an IPv6 link local address
>>
>> 2. node running the "old" code to talk with "new" node over what the
>> old node sees as a non-vlan MAC based GID and the new node sees as
>> IPv6 link local gid.
>>
>> Sounds better?
>>
>>
>
> Hi Roland, ping, I have wrote a detailed reply to your concerns and no word
> from you except on the
> "begin with" part, can you? Or.
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP

2013-11-19 Thread Devesh Sharma
Hi Roland,

I agree with Or. RDMA-CM takes care of resolving L2 addresses for kernel ULPs,
and there are no _non-rdma-cm_ ULPs right now in the IB kernel stack. On the
other hand, for the user-space applications which use RDMA-CM, V5 is a
simplified approach. All the patch sets <= v4 were an effort to change RDMA-CM,
and they changed the entire user/kernel interface of rdma-cm and verbs; this
was not acceptable.

With the V5 patch set I still have a concern about those user apps which do not
use rdma-cm (e.g. ib_send_bw without the -R option): how will the DMAC and SMAC
be resolved?

However, with the approach of moving address resolution to core verbs, i.e.
ib_modify_qp() and ib_create_ah(), the vendor driver will have the freedom to
resolve L2 addresses in its own way.
Init_ah_from_wc() would still need changes; this change will be the same as the
one done in the v5 patch set. This approach would also solve the "ib_send_bw
without -R option" issue.
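
To make the gap concrete, here is a minimal user-space sketch (the sgid_index
and the remote parameters are placeholders, exchanged out of band) of how a
non-rdma-cm app like ib_send_bw transitions its RC QP to RTR -- note that
nothing at this layer carries a DMAC or VLAN, so the resolution has to happen
in core verbs or below:

/* Minimal sketch: move a RoCE RC QP to RTR the way a non-rdma-cm benchmark
 * does.  Only GID-level addressing is available here; dmac/vlan must be
 * resolved underneath this call. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int move_to_rtr(struct ibv_qp *qp, union ibv_gid *remote_gid,
                uint32_t remote_qpn, uint8_t port_num)
{
        struct ibv_qp_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state               = IBV_QPS_RTR;
        attr.path_mtu               = IBV_MTU_1024;
        attr.dest_qp_num            = remote_qpn;
        attr.rq_psn                 = 0;
        attr.max_dest_rd_atomic     = 1;
        attr.min_rnr_timer          = 12;
        attr.ah_attr.is_global      = 1;
        attr.ah_attr.grh.dgid       = *remote_gid;
        attr.ah_attr.grh.sgid_index = 0;    /* assumed local GID slot */
        attr.ah_attr.grh.hop_limit  = 64;
        attr.ah_attr.port_num       = port_num;

        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
                             IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
                             IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER);
}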

-Regards
 Devesh

-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Or Gerlitz
Sent: Wednesday, November 20, 2013 2:19 AM
To: Roland Dreier
Cc: Or Gerlitz; linux-rdma@vger.kernel.org; monis; matanb; Tzahi Oved; Moni 
Shoua
Subject: Re: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when 
modifying QP

On Tue, Nov 19, 2013 at 8:08 PM, Roland Dreier  wrote:
>> Existing user space applications provide only IBoE L3 address 
>> attributes to the kernel when they issue QP modify. To comply with 
>> them and let such apps to keep work transparently under the IBoE GID 
>> IP addressing changes, added Eth L2 address resolution in the user-kernel 
>> linking piece - uverbs.

> I don't get why this belongs in uverbs.  In the current design serves 
> as a transport between userspace and kernel and the kernel verbs are 
> the same as user verbs.  The only exception to this that introduces 
> complexity is the stuff related to sharing XRCs and that makes sense 
> because multiple processes etc. is definitely a userspace-only concern.

> However in this case I don't see why address resolution is something 
> only userspace cares about.  Wouldn't it make sense to put this 
> resolution in core verbs?

Basically, we've put it into uverbs b/c for kernel consumers that use the 
rdma-cm the problem doesn't exist, since the Ethernet L2 attributes are filled 
into the qp attributes used by the rdma-cm throughout the address resolution 
process.

Since currently there are no in-tree non rdma-cm cosumer ULPs that are  
applicable to RoCE, the kernel is done deal in that respect.

If it helps or/and make more sense, sure we can move the reslution to be done @ 
the core verbs, e.g in core/verbs.c :: ib_modify_qp, anything else expect for 
this feedback?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP

2013-11-20 Thread Devesh Sharma
Okay, got it; I think I got confused. Putting the changes in verbs.c/modify_qp
and verbs.c/create_ah makes sense to me.

Agree with Som and Or.

-Regards
 Devesh

From: Or Gerlitz [ogerl...@mellanox.com]
Sent: Wednesday, November 20, 2013 3:38 PM
To: Somnath Kotur; Devesh Sharma; Or Gerlitz; Roland Dreier
Cc: linux-rdma@vger.kernel.org; monis; matanb; Tzahi Oved; Moni Shoua
Subject: Re: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when 
modifying QP

On 20/11/2013 09:15, Somnath Kotur wrote:
>> However, in the approach to move address resolution to core verbs i.e.
>> >ib_modify_qp() and ib_create_ah(), vendor driver will have freedom to
>> >resolve l2 addresses in its own way.
> This I am not sure if it's a good idea for each vendor driver to implement L2 
> address resolution in it's own way? Not sure if that was the intent behind 
> Roland's statement ?

I agree with Somnath, I don't see the point in putting L3 --> L2 address
resolution within vendor drivers. This will create huge code
duplication, and more problems. Again, with the proposed patches all
kernel ULPs that are applicable to RoCE are covered and hence there's no
address resolution for kernel session. To comply with non-modified user
space applications/libraries V5 added a code to do address resolution
and Roland just pointed out it may makes more sense to put that small
code piece in the core verbs modify_qp function and not in the uverbs call.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP

2013-12-11 Thread Devesh Sharma
On Thu, Dec 12, 2013 at 2:11 AM, Or Gerlitz  wrote:
> On Wed, Dec 11, 2013 at 7:59 PM, Roland Dreier  wrote:
>
>> OK, I guess I'm still confused.  Do we have two ways of resolving
>> addresses, one way for consumers that use the RDMA CM and another way
>> for consumers that don't?  Does that mean we end up having two
>> different paths through core verbs?  Even if we put address resolution
>> in uverbs, how do user apps that *do* use librdmacm work?
>
> Let me clarify -- with V5 it doesn't matter how the application does
> address resolution in the 1st place. The kernel uverbs layer is
> provided with local/remote GIDs in the modify qp uverb entry and for
> Ethernet ports issues address resolution through the relevant
> netdevice etc. This is works the same for applications that use
> librdmacm and for applications who don't.

Yes, ib_send_bw-type apps will work seamlessly with V5. Thanks, Or,
for the illustration.
>
> With your comment/insight, we will move that small logic within the
> ib_core module from uverbs to verbs.c, which means this step will be
> applied when needed to kernel non rdma-cm consumers too, such as SRP,
> which you referred to below.
>
> When needed means when the relevant qp attributes (== L2 Ethernet
> constructs: source/destination MACs and vlan ID) are missing. Kernel
> rdma-cm consumers will have these filled in their qp attr, since they are
> filled by the rdma-cm / ib-addr module throughout the ARP address
> resolution; non rdma-cm consumers will not, and the rest follows.
Yes, I agree with Or too.
>
>> Also the statement that there are no non-rdma-cm consumers in the
>> kernel is definitely false -- the SRP initiator does not use IP
>> addressing, and it does seem like a legitimate and useful application
>> to run SRP over ethernet.
>
> fair-enough, so with the change you proposed, this will be supported!
The proposal from Roland makes sense to me too.


RE: [PATCH] RDMA/ocrdma: SQ and RQ doorbell offset clean up

2014-01-29 Thread Devesh Sharma
Hi Roland,

Please discard this patch.
Due to the IP-based GID changes, this won't apply cleanly. We are planning to send
you a new series of patches based on the for-next tree; please use that instead.

-Regards
 Devesh
-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of devesh.sha...@emulex.com
Sent: Monday, November 18, 2013 4:58 PM
To: linux-rdma@vger.kernel.org
Subject: [PATCH] RDMA/ocrdma: SQ and RQ doorbell offset clean up

From: Devesh Sharma

Introduce new macros for the SQ and RQ doorbell shift values.

Signed-off-by: Devesh Sharma 
---
 drivers/infiniband/hw/ocrdma/ocrdma.h       |    7 -------
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |    5 ++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   18 ++++++------------
 3 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index adc11d1..07d156f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -384,13 +384,6 @@ static inline struct ocrdma_srq *get_ocrdma_srq(struct ib_srq *ibsrq)
 	return container_of(ibsrq, struct ocrdma_srq, ibsrq);
 }
 
-
-static inline int ocrdma_get_num_posted_shift(struct ocrdma_qp *qp)
-{
-	return ((qp->dev->nic_info.dev_family == OCRDMA_GEN2_FAMILY &&
-		 qp->id < 128) ? 24 : 16);
-}
-
 static inline int is_cqe_valid(struct ocrdma_cq *cq, struct ocrdma_cqe *cqe)
 {
 	int cqe_valid;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_sli.h b/drivers/infiniband/hw/ocrdma/ocrdma_sli.h
index 9f9570e..38df269 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_sli.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_sli.h
@@ -103,7 +103,10 @@ enum {
OCRDMA_DB_GEN2_SRQ_OFFSET   = OCRDMA_DB_GEN2_RQ_OFFSET,
OCRDMA_DB_CQ_OFFSET = 0x120,
OCRDMA_DB_EQ_OFFSET = OCRDMA_DB_CQ_OFFSET,
-   OCRDMA_DB_MQ_OFFSET = 0x140
+   OCRDMA_DB_MQ_OFFSET = 0x140,
+
+   OCRDMA_DB_SQ_SHIFT  = 16,
+   OCRDMA_DB_RQ_SHIFT  = 24
 };
 
 #define OCRDMA_DB_CQ_RING_ID_MASK   0x3FF  /* bits 0 - 9 */
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 69f1d12..c80ba6e 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1092,15 +1092,9 @@ static int ocrdma_copy_qp_uresp(struct ocrdma_qp *qp,
}
uresp.db_page_addr = usr_db;
uresp.db_page_size = dev->nic_info.db_page_size;
-   if (dev->nic_info.dev_family == OCRDMA_GEN2_FAMILY) {
-   uresp.db_sq_offset = OCRDMA_DB_GEN2_SQ_OFFSET;
-   uresp.db_rq_offset = OCRDMA_DB_GEN2_RQ_OFFSET;
-   uresp.db_shift = 24;
-   } else {
-   uresp.db_sq_offset = OCRDMA_DB_SQ_OFFSET;
-   uresp.db_rq_offset = OCRDMA_DB_RQ_OFFSET;
-   uresp.db_shift = 16;
-   }
+   uresp.db_sq_offset = OCRDMA_DB_GEN2_SQ_OFFSET;
+   uresp.db_rq_offset = OCRDMA_DB_GEN2_RQ_OFFSET;
+   uresp.db_shift = OCRDMA_DB_RQ_SHIFT;
 
if (qp->dpp_enabled) {
uresp.dpp_credit = dpp_credit_lmt;
@@ -1273,7 +1267,7 @@ static void ocrdma_flush_rq_db(struct ocrdma_qp *qp)
 {
if (qp->db_cache) {
u32 val = qp->rq.dbid | (qp->db_cache <<
-   ocrdma_get_num_posted_shift(qp));
+   OCRDMA_DB_RQ_SHIFT);
iowrite32(val, qp->rq_db);
qp->db_cache = 0;
}
@@ -2018,7 +2012,7 @@ static int ocrdma_build_fr(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
 
 static void ocrdma_ring_sq_db(struct ocrdma_qp *qp)
 {
-   u32 val = qp->sq.dbid | (1 << 16);
+   u32 val = qp->sq.dbid | (1 << OCRDMA_DB_SQ_SHIFT);
 
iowrite32(val, qp->sq_db);
 }
@@ -2123,7 +2117,7 @@ int ocrdma_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
 
 static void ocrdma_ring_rq_db(struct ocrdma_qp *qp)
 {
-   u32 val = qp->rq.dbid | (1 << ocrdma_get_num_posted_shift(qp));
+   u32 val = qp->rq.dbid | (1 << OCRDMA_DB_RQ_SHIFT);
 
if (qp->state != OCRDMA_QPS_INIT)
iowrite32(val, qp->rq_db);
--
1.7.1
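
For reference, the doorbell word the driver rings packs the queue id into the
low bits and the number of newly posted WQEs above it, at the bit position the
new OCRDMA_DB_SQ_SHIFT / OCRDMA_DB_RQ_SHIFT macros name. A small sketch
derived from the hunks above (the two helper names are made up for
illustration and are not part of the patch):

/* Illustrative only: compose a doorbell word as the driver does above. */
static inline u32 ocrdma_db_val(u32 dbid, u32 num_posted, u32 shift)
{
	/* queue id in the low bits, count of newly posted WQEs above it */
	return dbid | (num_posted << shift);
}

/* e.g. ring the SQ doorbell after posting a single WQE */
static void example_ring_sq_db(struct ocrdma_qp *qp)
{
	iowrite32(ocrdma_db_val(qp->sq.dbid, 1, OCRDMA_DB_SQ_SHIFT), qp->sq_db);
}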


