Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2016-01-03 Thread ira.weiny
On Sat, Jan 02, 2016 at 09:03:31AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 30, 2015 at 09:00:07PM -0500, ira.weiny wrote:
> > On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> > > Hi Ira,
> > > 
> > > please take a look at the patches I've attached - they are just WIP
> > > that hasn't been tested as I'm on a vacation without access to my
> > > IB setup until New Year's Eve.
> > 
> > I have them on a branch.
> > 
> > I'll try and do some testing over the weekend.
> 
> FYI, you probably need this fix from Bart:
> 
> http://marc.info/?l=linux-kernel&m=145138288102008&w=2

For some reason I did not...

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2016-01-03 Thread ira.weiny
On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> Hi Ira,
> 
> please take a look at the patches I've attached - they are just WIP
> that hasn't been tested as I'm on a vacation without access to my
> IB setup until New Year's Eve.
> 
> Patch 1 is, I think, a genuine bug fix caused by the madness (pun
> intended) of the wr_id abuses.
> 
> Patch 2 passes the mad_send_buf explicitly to the mad handlers to get rid
> of that mess.
> 
> Patch 3 is the CQ API conversion which becomes relatively simple once
> the prior issues above are sorted out.
> 

I've tested all 3 patches together and things seem to be working just fine.
That said, I have some comments below.

> From a22609131ca353278015b6b4aec3077db06ad9f5 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:49:22 +0100
> Subject: IB/mad: pass send buf in wr_id in local_completions
> 
> The sa_query recv_handler expects the send_buf in wr_id, and all other recv
> handlers ignore wr_id.  It seems this is what we should pass, please confirm.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/mad.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index d4d2a61..e0859e5 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2734,7 +2734,7 @@ static void local_completions(struct work_struct *work)
>* before request
>*/
>  		build_smp_wc(recv_mad_agent->agent.qp,
> -			     (unsigned long) local->mad_send_wr,
> +			     (unsigned long) &local->mad_send_wr->send_buf,

This is not right.

local_completions is only called for locally processed SMP packets (not SA
packets).  SMPs use the wr_id assigned by ib_create_send_mad, which points to
the internal struct ib_mad_send_wr_private object.

It seems like a better fix (to make the code more explicit) would be:

20:08:58 > git di
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index e0859e515b47..2840f27a18a0 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2734,7 +2734,7 @@ static void local_completions(struct work_struct *work)
 * before request
 *  */
 	build_smp_wc(recv_mad_agent->agent.qp,
-		     (unsigned long) &local->mad_send_wr->send_buf,
+		     local->mad_send_wr->send_wr.wr.wr_id,
 be16_to_cpu(IB_LID_PERMISSIVE),
 local->mad_send_wr->send_wr.pkey_index,
 	     recv_mad_agent->agent.port_num, &wc);

But this is just removed later anyway.


All that said, I agree there is some hackiness (or, as you said, "abuse") in
the wr_id handling, in that the SA is depending on this code in ib_mad.

2013 	mad_recv_wc->wc->wr_id = (unsigned long) &mad_send_wr->send_buf;
2014 	mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
2015 					   mad_recv_wc);

Who controls wr_id is confusing in this area of the code.  For that
reason I like your next patch, which makes the ownership explicit throughout.


>be16_to_cpu(IB_LID_PERMISSIVE),
>local->mad_send_wr->send_wr.pkey_index,
>   	     recv_mad_agent->agent.port_num, &wc);
> -- 
> 1.9.1
> 

> From c6101bfa6543d0b35c2ca2fa0add09341f456e88 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:54:02 +0100
> Subject: IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
> 
> Stop abusing wr_id and just pass the parameter explicitly.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/cm.c  |  1 +
>  drivers/infiniband/core/mad.c | 18 ++
>  drivers/infiniband/core/sa_query.c|  7 ++-
>  drivers/infiniband/core/user_mad.c|  1 +
>  drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
>  include/rdma/ib_mad.h |  2 ++
>  6 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index e3a95d1..ad3726d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
> ib_event_type event)
>  EXPORT_SYMBOL(ib_cm_notify);
>  
>  static void cm_recv_handler(struct ib_mad_agent *mad_agent,
> + struct ib_mad_send_buf *send_buf,
>   struct ib_mad_recv_wc *mad_recv_wc)
>  {
>   struct cm_port *port = mad_agent->context;
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2016-01-03 Thread ira.weiny
On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> Hi Ira,
> 
> please take a look at the patches I've attached - they are just WIP
> that hasn't been tested as I'm on a vacation without access to my
> IB setup until New Year's Eve.
> 
> Patch 1 is, I think, a genuine bug fix caused by the madness (pun
> intended) of the wr_id abuses.
> 
> Patch 2 passes the mad_send_buf explicitly to the mad handlers to get rid
> of that mess.
> 
> Patch 3 is the CQ API conversion which becomes relatively simple once
> the prior issues above are sorted out.
> 

[snip]

> From 0a367aa36e5410f41333d613b7587913b57eb75f Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:55:32 +0100
> Subject: IB/mad: use CQ abstraction
> 
> Remove the local workqueue to process mad completions and use the CQ API
> instead.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/mad.c  | 154 
> +
>  drivers/infiniband/core/mad_priv.h |   2 +-
>  2 files changed, 55 insertions(+), 101 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index f15fcd6..b04ad05 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -61,18 +61,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in 
> number of work requests
>  module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
>  MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work 
> requests");
>  
> -/*
> - * Define a limit on the number of completions which will be processed by the
> - * worker thread in a single work item.  This ensures that other work items
> - * (potentially from other users) are processed fairly.
> - *
> - * The number of completions was derived from the default queue sizes above.
> - * We use a value which is double the larger of the 2 queues (receive @ 512)
> - * but keep it fixed such that an increase in that value does not introduce
> - * unfairness.
> - */
> -#define MAD_COMPLETION_PROC_LIMIT 1024
> -
>  static struct list_head ib_mad_port_list;
>  static u32 ib_mad_client_id = 0;
>  
> @@ -96,6 +84,9 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req 
> *mad_reg_req,
> u8 mgmt_class);
>  static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
>  struct ib_mad_agent_private *agent_priv);
> +static bool mad_error_handler(struct ib_mad_port_private *port_priv,
> +   struct ib_wc *wc);
> +static void ib_mad_send_done(struct ib_cq *cq, struct ib_wc *wc);
>  
>  /*
>   * Returns a ib_mad_port_private structure or NULL for a device/port
> @@ -702,11 +693,11 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
>  }
>  
>  static void build_smp_wc(struct ib_qp *qp,
> -  u64 wr_id, u16 slid, u16 pkey_index, u8 port_num,
> +  void *wr_cqe, u16 slid, u16 pkey_index, u8 port_num,
>struct ib_wc *wc)
>  {
>   memset(wc, 0, sizeof *wc);
> - wc->wr_id = wr_id;
> + wc->wr_cqe = wr_cqe;
>   wc->status = IB_WC_SUCCESS;
>   wc->opcode = IB_WC_RECV;
>   wc->pkey_index = pkey_index;
> @@ -844,7 +835,7 @@ static int handle_outgoing_dr_smp(struct 
> ib_mad_agent_private *mad_agent_priv,
>   }
>  
>   build_smp_wc(mad_agent_priv->agent.qp,
> -  send_wr->wr.wr_id, drslid,
> +  send_wr->wr.wr_cqe, drslid,
>send_wr->pkey_index,
>    	     send_wr->port_num, &mad_wc);
>  
> @@ -1051,7 +1042,9 @@ struct ib_mad_send_buf * ib_create_send_mad(struct 
> ib_mad_agent *mad_agent,
>  
>   mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
>  
> - mad_send_wr->send_wr.wr.wr_id = (unsigned long) mad_send_wr;
> + mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
> +
> + mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
>   mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
>   mad_send_wr->send_wr.wr.num_sge = 2;
>   mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
> @@ -1163,8 +1156,9 @@ int ib_send_mad(struct ib_mad_send_wr_private 
> *mad_send_wr)
>  
>   /* Set WR ID to find mad_send_wr upon completion */
>   qp_info = mad_send_wr->mad_agent_priv->qp_info;
> - mad_send_wr->send_wr.wr.wr_id = (unsigned long)&mad_send_wr->mad_list;
>   mad_send_wr->mad_list.mad_queue = &qp_info->send_queue;
> + mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
> + mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;

I'm not sure it is absolutely required to assign cqe.done in both
ib_create_send_mad and ib_send_mad, but I don't think it hurts either.

>  
>   mad_agent = mad_send_wr->send_buf.mad_agent;
>   sge = mad_send_wr->sg_list;
> @@ -2185,13 +2179,14 @@ handle_smi(struct ib_mad_port_private *port_priv,
>   return handle_ib_smi(port_priv, qp_info, wc, port_num, 

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2016-01-02 Thread Christoph Hellwig
On Wed, Dec 30, 2015 at 09:00:07PM -0500, ira.weiny wrote:
> On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> > Hi Ira,
> > 
> > please take a look at the patches I've attached - they are just WIP
> > that hasn't been tested as I'm on a vacation without access to my
> > IB setup until New Year's Eve.
> 
> I have them on a branch.
> 
> I'll try and do some testing over the weekend.

FYI, you probably need this fix from Bart:

http://marc.info/?l=linux-kernel&m=145138288102008&w=2


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-30 Thread ira.weiny
On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> Hi Ira,
> 
> please take a look at the patches I've attached - they are just WIP
> that hasn't been tested as I'm on a vacation without access to my
> IB setup until New Year's Eve.

I have them on a branch.

I'll try and do some testing over the weekend.

Ira

> 
> Patch 1 is, I think, a genuine bug fix caused by the madness (pun
> intended) of the wr_id abuses.
> 
> Patch 2 passes the mad_send_buf explicitly to the mad handlers to get rid
> of that mess.
> 
> Patch 3 is the CQ API conversion which becomes relatively simple once
> the prior issues above are sorted out.
> 

> From a22609131ca353278015b6b4aec3077db06ad9f5 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:49:22 +0100
> Subject: IB/mad: pass send buf in wr_id in local_completions
> 
> The sa_query recv_handler expects the send_buf in wr_id, and all other recv
> handlers ignore wr_id.  It seems this is what we should pass, please confirm.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/mad.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index d4d2a61..e0859e5 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2734,7 +2734,7 @@ static void local_completions(struct work_struct *work)
>* before request
>*/
>   build_smp_wc(recv_mad_agent->agent.qp,
> -		     (unsigned long) local->mad_send_wr,
> +		     (unsigned long) &local->mad_send_wr->send_buf,
>be16_to_cpu(IB_LID_PERMISSIVE),
>local->mad_send_wr->send_wr.pkey_index,
>  	     recv_mad_agent->agent.port_num, &wc);
> -- 
> 1.9.1
> 

> From c6101bfa6543d0b35c2ca2fa0add09341f456e88 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:54:02 +0100
> Subject: IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
> 
> Stop abusing wr_id and just pass the parameter explicitly.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/cm.c  |  1 +
>  drivers/infiniband/core/mad.c | 18 ++
>  drivers/infiniband/core/sa_query.c|  7 ++-
>  drivers/infiniband/core/user_mad.c|  1 +
>  drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
>  include/rdma/ib_mad.h |  2 ++
>  6 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index e3a95d1..ad3726d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
> ib_event_type event)
>  EXPORT_SYMBOL(ib_cm_notify);
>  
>  static void cm_recv_handler(struct ib_mad_agent *mad_agent,
> + struct ib_mad_send_buf *send_buf,
>   struct ib_mad_recv_wc *mad_recv_wc)
>  {
>   struct cm_port *port = mad_agent->context;
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index e0859e5..f15fcd6 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
>  
>  	atomic_inc(&mad_snoop_priv->refcount);
>  	spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
> -	mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent,
> +	mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, NULL,
>  					   mad_recv_wc);
>  	deref_snoop_agent(mad_snoop_priv);
>  	spin_lock_irqsave(&qp_info->snoop_lock, flags);
> @@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   /* user rmpp is in effect
>* and this is an active RMPP MAD
>*/
> -	mad_recv_wc->wc->wr_id = 0;
> -	mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
> -					   mad_recv_wc);
> +	mad_agent_priv->agent.recv_handler(
> +		&mad_agent_priv->agent, NULL,
> +		mad_recv_wc);
>  	atomic_dec(&mad_agent_priv->refcount);
>  	} else {
>  	/* not user rmpp, revert to normal behavior and
> @@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>  	spin_unlock_irqrestore(&mad_agent_priv->lock, 

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-30 Thread Christoph Hellwig
Hi Ira,

please take a look at the patches I've attached - they are just WIP
that hasn't been tested as I'm on a vacation without access to my
IB setup until New Year's Eve.

Patch 1 is, I think, a genuine bug fix caused by the madness (pun
intended) of the wr_id abuses.

Patch 2 passes the mad_send_buf explicitly to the mad handlers to get rid
of that mess.

Patch 3 is the CQ API conversion which becomes relatively simple once
the prior issues above are sorted out.

From a22609131ca353278015b6b4aec3077db06ad9f5 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Wed, 30 Dec 2015 11:49:22 +0100
Subject: IB/mad: pass send buf in wr_id in local_completions

The sa_query recv_handler expects the send_buf in wr_id, and all other recv
handlers ignore wr_id.  It seems this is what we should pass, please confirm.

Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/core/mad.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index d4d2a61..e0859e5 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2734,7 +2734,7 @@ static void local_completions(struct work_struct *work)
 * before request
 */
build_smp_wc(recv_mad_agent->agent.qp,
-		     (unsigned long) local->mad_send_wr,
+		     (unsigned long) &local->mad_send_wr->send_buf,
 		     be16_to_cpu(IB_LID_PERMISSIVE),
 		     local->mad_send_wr->send_wr.pkey_index,
 		     recv_mad_agent->agent.port_num, &wc);
-- 
1.9.1

From c6101bfa6543d0b35c2ca2fa0add09341f456e88 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Wed, 30 Dec 2015 11:54:02 +0100
Subject: IB/mad: pass ib_mad_send_buf explicitly to the recv_handler

Stop abusing wr_id and just pass the parameter explicitly.

Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/core/cm.c  |  1 +
 drivers/infiniband/core/mad.c | 18 ++
 drivers/infiniband/core/sa_query.c|  7 ++-
 drivers/infiniband/core/user_mad.c|  1 +
 drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
 include/rdma/ib_mad.h |  2 ++
 6 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e3a95d1..ad3726d 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
ib_event_type event)
 EXPORT_SYMBOL(ib_cm_notify);
 
 static void cm_recv_handler(struct ib_mad_agent *mad_agent,
+   struct ib_mad_send_buf *send_buf,
struct ib_mad_recv_wc *mad_recv_wc)
 {
struct cm_port *port = mad_agent->context;
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index e0859e5..f15fcd6 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
 
 	atomic_inc(&mad_snoop_priv->refcount);
 	spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
-	mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent,
+	mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, NULL,
 					   mad_recv_wc);
 	deref_snoop_agent(mad_snoop_priv);
 	spin_lock_irqsave(&qp_info->snoop_lock, flags);
@@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct 
ib_mad_agent_private *mad_agent_priv,
/* user rmpp is in effect
 * and this is an active RMPP MAD
 */
-	mad_recv_wc->wc->wr_id = 0;
-	mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
-					   mad_recv_wc);
+	mad_agent_priv->agent.recv_handler(
+		&mad_agent_priv->agent, NULL,
+		mad_recv_wc);
 	atomic_dec(&mad_agent_priv->refcount);
 	} else {
 	/* not user rmpp, revert to normal behavior and
@@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct 
ib_mad_agent_private *mad_agent_priv,
 	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);

 	/* Defined behavior is to complete response before request */
-	mad_recv_wc->wc->wr_id = (unsigned long) &mad_send_wr->send_buf;
-	mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
-  

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-30 Thread Eli Cohen
On Mon, Dec 28, 2015 at 07:35:14PM -0500, ira.weiny wrote:
> 
> I'm still confused.  Here is the code with the patch applied:
> 
> 
> /*
>  * IB MAD completion callback
>  */
> static void ib_mad_completion_handler(struct work_struct *work)
> {
> 	struct ib_mad_port_private *port_priv;
> 	struct ib_wc wc;
> 	int count = 0;
> 
> 	port_priv = container_of(work, struct ib_mad_port_private, work);
> 	ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
> 
> 	while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) {
> 		if (wc.status == IB_WC_SUCCESS) {
> 			switch (wc.opcode) {
> 			case IB_WC_SEND:
> 				ib_mad_send_done_handler(port_priv, &wc);
> 				break;
> 			case IB_WC_RECV:
> 				ib_mad_recv_done_handler(port_priv, &wc);
> 				break;
> 			default:
> 				BUG_ON(1);
> 				break;
> 			}
> 		} else
> 			mad_error_handler(port_priv, &wc);
> 
> 		if (++count > MAD_COMPLETION_PROC_LIMIT) {
> 			queue_work(port_priv->wq, &port_priv->work);
> 			break;
> 		}
> 	}
> }
> 
> 
> How is "return" any different from "break"?  Calling return will still result
> in a rearm on the next work task.

My argument is that if you break the loop because you exhausted the quota, you
don't need to call ib_req_notify_cq(). If you think about it, you will see that
this was the original logic of this function: only after you have drained the
CQ do you need to call ib_req_notify_cq(), so that the next completion triggers
the interrupt handler, which in turn calls the completion handler. The result
is that you generate fewer interrupts in the system.

To achieve that you do like this:

/*
 * IB MAD completion callback
 */
static void ib_mad_completion_handler(struct work_struct *work)
{
	struct ib_mad_port_private *port_priv;
	struct ib_wc wc;
	int count = 0;

	port_priv = container_of(work, struct ib_mad_port_private, work);

	while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) {
		if (wc.status == IB_WC_SUCCESS) {
			switch (wc.opcode) {
			case IB_WC_SEND:
				ib_mad_send_done_handler(port_priv, &wc);
				break;
			case IB_WC_RECV:
				ib_mad_recv_done_handler(port_priv, &wc);
				break;
			default:
				BUG_ON(1);
				break;
			}
		} else
			mad_error_handler(port_priv, &wc);

		if (++count > MAD_COMPLETION_PROC_LIMIT) {
			queue_work(port_priv->wq, &port_priv->work);
			return;  /* <== return and avoid requesting notification */
		}
	}

	ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); /* <== only called if
		there are no more completions to process */
}

I hope my point is clear now.

> 
> Perhaps this code is wrong in the first place?  Should it call ib_req_notify_cq
> after the while loop?  This code has been this way forever...
> 
> 1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2568) 
> ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
> 1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2569)
> 1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2570)   while 
> (ib_poll_cq(port_priv->cq, 1, ) == 1) {
> 
> 
> Ira
> 
> 
> > 
> > > 
> > > I'm not quite sure what you mean about moving the ib_req_notify_cq 
> > > outside of
> > > the while loop.  It seems like to do what you say we would need another 
> > > work
> > > item which just does ib_poll_cq.  Is that what you meant?
> > > 
> > > Ira
> > > 
> > > > 
> > > > > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct 
> > > > > work_struct *work)
> > > > >   }
> > > > > 

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-29 Thread ira.weiny
On Tue, Dec 29, 2015 at 11:51:19AM +0200, Sagi Grimberg wrote:
> 
> >Please just convert the mad handler to the new CQ API in
> >drivers/infiniband/core/cq.c.  If you have any question about it I'd be
> >glad to help you.
> 
> +1 on this suggestion.
> 
> We had these sorts of questions in our ULPs as well. The CQ API should
> take care of all that for you and leaves you to just handle the
> completions...

I saw your work and agree it would be nice, but it will take some time to
convert and debug the MAD stack.  I'll try to find some time, but it is
unlikely I will anytime soon.

We can hit this bug regularly with hfi1, but have not hit it with qib or mlx4.
I leave it up to Doug whether he wants to take this fix before someone finds
time to convert the MAD stack.

Ira



Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-29 Thread Christoph Hellwig
Please just convert the mad handler to the new CQ API in
drivers/infiniband/core/cq.c.  If you have any question about it I'd be
glad to help you.


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-29 Thread Sagi Grimberg



Please just convert the mad handler to the new CQ API in
drivers/infiniband/core/cq.c.  If you have any question about it I'd be
glad to help you.


+1 on this suggestion.

We had these sorts of questions in our ULPs as well. The CQ API should
take care of all that for you and leaves you to just handle the
completions...


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-28 Thread Eli Cohen
On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.we...@intel.com wrote:
> From: Dean Luick 
> 
>  
> @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct 
> work_struct *work)
>  {
>   struct ib_mad_port_private *port_priv;
>   struct ib_wc wc;
> + int count = 0;
>  
>   port_priv = container_of(work, struct ib_mad_port_private, work);
>   ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);

I think you should push the call to ib_req_notify_cq outside the
while loop. You don't need to arm the CQ if you re-queued the work.
Only when you have drained the CQ should you re-arm.

> @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct 
> work_struct *work)
>   }
>   } else
> > 		mad_error_handler(port_priv, &wc);
> > +
> > +	if (++count > MAD_COMPLETION_PROC_LIMIT) {
> > +		queue_work(port_priv->wq, &port_priv->work);
> + break;
> + }
>   }
>  }
>  
> -- 
> 1.8.2
> 


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-28 Thread ira.weiny
On Mon, Dec 28, 2015 at 06:51:30PM +0200, Eli Cohen wrote:
> On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.we...@intel.com wrote:
> > From: Dean Luick 
> > 
> >  
> > @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct 
> > work_struct *work)
> >  {
> > struct ib_mad_port_private *port_priv;
> > struct ib_wc wc;
> > +   int count = 0;
> >  
> > port_priv = container_of(work, struct ib_mad_port_private, work);
> > ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
> 
> I think you should push the call to ib_req_notify_cq outside the
> while loop. You don't need to arm the CQ if you re-queued the work.
> Only when you have drained the CQ should you re-arm.

Will it hurt to rearm?  The way the code stands, I think the worst that will
happen is an extra work item being scheduled and an extra ib_poll_cq call.

I'm not quite sure what you mean about moving the ib_req_notify_cq outside of
the while loop.  It seems like to do what you say we would need another work
item which just does ib_poll_cq.  Is that what you meant?

Ira

> 
> > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct 
> > work_struct *work)
> > }
> > } else
> > 		mad_error_handler(port_priv, &wc);
> > +
> > +	if (++count > MAD_COMPLETION_PROC_LIMIT) {
> > +		queue_work(port_priv->wq, &port_priv->work);
> > +   break;
> > +   }
> > }
> >  }
> >  
> > -- 
> > 1.8.2
> > 


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-28 Thread ira.weiny
On Tue, Dec 29, 2015 at 01:25:33AM +0200, Eli Cohen wrote:
> On Mon, Dec 28, 2015 at 06:05:46PM -0500, ira.weiny wrote:
> > 
> > Will it hurt to rearm?  The way the code stands, I think the worst that will
> > happen is an extra work item being scheduled and an extra ib_poll_cq call.
> 
> If you re-arm unconditionally, you generate extra interrupts that you
> can do without. When you break out of the completion-processing loop because
> you exhausted the quota, you queue the work so you can continue
> processing completions in the work task's next time slot. After
> queueing the work you should "return" instead of "break".
> If you processed all the completions before reaching
> MAD_COMPLETION_PROC_LIMIT, you will exit the while loop and
> re-arming can then take place.

I'm still confused.  Here is the code with the patch applied:


/*
 * IB MAD completion callback
 */
static void ib_mad_completion_handler(struct work_struct *work)
{
	struct ib_mad_port_private *port_priv;
	struct ib_wc wc;
	int count = 0;

	port_priv = container_of(work, struct ib_mad_port_private, work);
	ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);

	while (ib_poll_cq(port_priv->cq, 1, &wc) == 1) {
		if (wc.status == IB_WC_SUCCESS) {
			switch (wc.opcode) {
			case IB_WC_SEND:
				ib_mad_send_done_handler(port_priv, &wc);
				break;
			case IB_WC_RECV:
				ib_mad_recv_done_handler(port_priv, &wc);
				break;
			default:
				BUG_ON(1);
				break;
			}
		} else
			mad_error_handler(port_priv, &wc);

		if (++count > MAD_COMPLETION_PROC_LIMIT) {
			queue_work(port_priv->wq, &port_priv->work);
			break;
		}
	}
}


How is "return" any different from "break"?  Calling return will still result
in a rearm on the next work task.

Perhaps this code is wrong in the first place?  Should it call ib_req_notify_cq
after the while loop?  This code has been this way forever...

1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2568) 
ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2569)
1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2570)   while 
(ib_poll_cq(port_priv->cq, 1, ) == 1) {


Ira


> 
> > 
> > I'm not quite sure what you mean about moving the ib_req_notify_cq outside 
> > of
> > the while loop.  It seems like to do what you say we would need another work
> > item which just does ib_poll_cq.  Is that what you meant?
> > 
> > Ira
> > 
> > > 
> > > > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct 
> > > > work_struct *work)
> > > > }
> > > > } else
> > > > 		mad_error_handler(port_priv, &wc);
> > > > +
> > > > +	if (++count > MAD_COMPLETION_PROC_LIMIT) {
> > > > +		queue_work(port_priv->wq, &port_priv->work);
> > > > +   break;
> > > > +   }
> > > > }
> > > >  }
> > > >  
> > > > -- 
> > > > 1.8.2
> > > > 


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-28 Thread Eli Cohen
On Mon, Dec 28, 2015 at 06:05:46PM -0500, ira.weiny wrote:
> 
> Will it hurt to rearm?  The way the code stands, I think the worst that will
> happen is an extra work item being scheduled and an extra ib_poll_cq call.

If you re-arm unconditionally you call for extra interrupts which you
can do without. When you break the loop of processing completions since
you exhausted the quota, you queue the work so you can continue
processing completions in the next time slot of the work task. After
queueing the work you should call "return" instead of "break".
If you processed all the completions before reaching
MAD_COMPLETION_PROC_LIMIT you will exit the while loop and then
re-arming can take place.

> 
> I'm not quite sure what you mean about moving the ib_req_notify_cq outside of
> the while loop.  It seems like to do what you say we would need another work
> item which just does ib_poll_cq.  Is that what you meant?
> 
> Ira
> 
> > 
> > > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work)
> > >   }
> > >   } else
> > >   mad_error_handler(port_priv, &wc);
> > > +
> > > + if (++count > MAD_COMPLETION_PROC_LIMIT) {
> > > + queue_work(port_priv->wq, &port_priv->work);
> > > + break;
> > > + }
> > >   }
> > >  }
> > >  
> > > -- 
> > > 1.8.2
> > > 


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-23 Thread ira.weiny
Doug,

With your email troubles I just wanted to make sure you did not lose track of
this patch.

https://patchwork.kernel.org/patch/7822511/

Thanks,
Ira

On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.weiny@intel.com wrote:
> From: Dean Luick 
> 
> It was found that when a process was rapidly sending MADs other processes could
> be hung in their unregister calls.
> 
> This would happen when process A was injecting packets fast enough that the
> single threaded workqueue was never exiting ib_mad_completion_handler.
> Therefore when process B called flush_workqueue via the unregister call it
> would hang until process A stopped sending MADs.
> 
> The fix is to periodically reschedule ib_mad_completion_handler after
> processing a large number of completions.  The number of completions chosen was
> decided based on the defaults for the recv queue size.  However, it was kept
> fixed such that increasing those queue sizes would not adversely affect
> fairness in the future.
> 
> Reviewed-by: Ira Weiny 
> Signed-off-by: Dean Luick 
> ---
>  drivers/infiniband/core/mad.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 2281de122038..d4d2a618fd66 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -61,6 +61,18 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests
>  module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
>  MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests");
>  
> +/*
> + * Define a limit on the number of completions which will be processed by the
> + * worker thread in a single work item.  This ensures that other work items
> + * (potentially from other users) are processed fairly.
> + *
> + * The number of completions was derived from the default queue sizes above.
> + * We use a value which is double the larger of the 2 queues (receive @ 512)
> + * but keep it fixed such that an increase in that value does not introduce
> + * unfairness.
> + */
> +#define MAD_COMPLETION_PROC_LIMIT 1024
> +
>  static struct list_head ib_mad_port_list;
>  static u32 ib_mad_client_id = 0;
>  
> @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct work_struct *work)
>  {
>   struct ib_mad_port_private *port_priv;
>   struct ib_wc wc;
> + int count = 0;
>  
>   port_priv = container_of(work, struct ib_mad_port_private, work);
>   ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
> @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work)
>   }
>   } else
>   mad_error_handler(port_priv, &wc);
> +
> + if (++count > MAD_COMPLETION_PROC_LIMIT) {
> + queue_work(port_priv->wq, &port_priv->work);
> + break;
> + }
>   }
>  }
>  
> -- 
> 1.8.2
> 


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-23 Thread Doug Ledford
On 12/23/2015 03:01 PM, ira.weiny wrote:
> Doug,
> 
> With your email troubles I just wanted to make sure you did not lose track of
> this patch.
> 
> https://patchwork.kernel.org/patch/7822511/

I've got it, thanks.

> Thanks,
> Ira
> 
> On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.weiny@intel.com wrote:
>> From: Dean Luick 
>>
>> It was found that when a process was rapidly sending MADs other processes could
>> be hung in their unregister calls.
>>
>> This would happen when process A was injecting packets fast enough that the
>> single threaded workqueue was never exiting ib_mad_completion_handler.
>> Therefore when process B called flush_workqueue via the unregister call it
>> would hang until process A stopped sending MADs.
>>
>> The fix is to periodically reschedule ib_mad_completion_handler after
>> processing a large number of completions.  The number of completions chosen was
>> decided based on the defaults for the recv queue size.  However, it was kept
>> fixed such that increasing those queue sizes would not adversely affect
>> fairness in the future.
>>
>> Reviewed-by: Ira Weiny 
>> Signed-off-by: Dean Luick 
>> ---
>>  drivers/infiniband/core/mad.c | 18 ++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> index 2281de122038..d4d2a618fd66 100644
>> --- a/drivers/infiniband/core/mad.c
>> +++ b/drivers/infiniband/core/mad.c
>> @@ -61,6 +61,18 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests
>>  module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
>>  MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests");
>>  
>> +/*
>> + * Define a limit on the number of completions which will be processed by the
>> + * worker thread in a single work item.  This ensures that other work items
>> + * (potentially from other users) are processed fairly.
>> + *
>> + * The number of completions was derived from the default queue sizes above.
>> + * We use a value which is double the larger of the 2 queues (receive @ 512)
>> + * but keep it fixed such that an increase in that value does not introduce
>> + * unfairness.
>> + */
>> +#define MAD_COMPLETION_PROC_LIMIT 1024
>> +
>>  static struct list_head ib_mad_port_list;
>>  static u32 ib_mad_client_id = 0;
>>  
>> @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct work_struct *work)
>>  {
>>  struct ib_mad_port_private *port_priv;
>>  struct ib_wc wc;
>> +int count = 0;
>>  
>>  port_priv = container_of(work, struct ib_mad_port_private, work);
>>  ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
>> @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct work_struct *work)
>>  }
>>  } else
>>  mad_error_handler(port_priv, &wc);
>> +
>> +if (++count > MAD_COMPLETION_PROC_LIMIT) {
>> +queue_work(port_priv->wq, &port_priv->work);
>> +break;
>> +}
>>  }
>>  }
>>  
>> -- 
>> 1.8.2
>>


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD



