On Wed, Feb 15, 2017 at 3:29 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2017-02-15 at 13:10 +0200, Saeed Mahameed wrote:
>> On Fri, Feb 10, 2017 at 2:27 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > From: Eric Dumazet <eduma...@google.com>
>> >
>> > All rx and rx netdev interrupts are handled by respectively
>> > by mlx4_en_rx_irq() and mlx4_en_tx_irq() which simply schedule a NAPI.
>> >
>> > But mlx4_eq_int() also fires a tasklet to service all items that were
>> > queued via mlx4_add_cq_to_tasklet(), but this handler was not called
>> > unless user cqe was handled.
>> >
>> > This is very confusing, as "mpstat -I SCPU ..." show huge number of
>> > tasklet invocations.
>> >
>> > This patch saves this overhead, by carefully firing the tasklet directly
>> > from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.
>> >
>> > Signed-off-by: Eric Dumazet <eduma...@google.com>
>> > Cc: Tariq Toukan <tar...@mellanox.com>
>> > Cc: Saeed Mahameed <sae...@mellanox.com>
>> > ---
>> >  drivers/net/ethernet/mellanox/mlx4/cq.c |    6 +++++-
>> >  drivers/net/ethernet/mellanox/mlx4/eq.c |    9 +--------
>> >  2 files changed, 6 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c
>> > b/drivers/net/ethernet/mellanox/mlx4/cq.c
>> > index
>> > 6b8635378f1fcb2aae4e8ac390bcd09d552c2256..fa6d2354a0e910ee160863e3cbe21a512d77bf03
>> > 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/cq.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
>> > @@ -81,8 +81,9 @@ void mlx4_cq_tasklet_cb(unsigned long data)
>> >
>> >  static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
>> >  {
>> > -	unsigned long flags;
>> >  	struct mlx4_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
>> > +	unsigned long flags;
>> > +	bool kick;
>> >
>> >  	spin_lock_irqsave(&tasklet_ctx->lock, flags);
>> >  	/* When migrating CQs between EQs will be implemented, please note
>> > @@ -92,7 +93,10 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
>> >  	 */
>> >  	if (list_empty_careful(&cq->tasklet_ctx.list)) {
>> >  		atomic_inc(&cq->refcount);
>> > +		kick = list_empty(&tasklet_ctx->list);
>>
>> So first one in would fire the tasklet, but wouldn't this cause CQE
>> processing loss
>> in the same mlx4_eq_int loop if the tasklet was fast enough to
>> schedule and while other CQEs are going to add themselves to the
>> tasklet_ctx->list ?
>
>
> mlx4_eq_int() is a hard irq handler.
>
> How a tasklet could run in the middle of it ?
>
Can the tasklet run on a different core?

> A tasklet is a softirq handler.
>
> softirq must wait that the current hard irq handler is done.
>>
>> Anyway i tried to find race scenarios that could cause such thing but
>> synchronization looks good.
>>
>> >  		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
>> > +		if (kick)
>> > +			tasklet_schedule(&tasklet_ctx->task);
>> >  	}
>> >  	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
>> >  }
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c
>> > b/drivers/net/ethernet/mellanox/mlx4/eq.c
>> > index
>> > 0509996957d9664b612358dd805359f4bc67b8dc..39232b6a974f4b4b961d3b0b8634f04e6b9d0caa
>> > 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
>> > @@ -494,7 +494,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct
>> > mlx4_eq *eq)
>> >  {
>> >  	struct mlx4_priv *priv = mlx4_priv(dev);
>> >  	struct mlx4_eqe *eqe;
>> > -	int cqn = -1;
>> > +	int cqn;
>> >  	int eqes_found = 0;
>> >  	int set_ci = 0;
>> >  	int port;
>> > @@ -840,13 +840,6 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct
>> > mlx4_eq *eq)
>> >
>> >  	eq_set_ci(eq, 1);
>> >
>> > -	/* cqn is 24bit wide but is initialized such that its higher bits
>> > -	 * are ones too. Thus, if we got any event, cqn's high bits should
>> > be off
>> > -	 * and we need to schedule the tasklet.
>> > -	 */
>> > -	if (!(cqn & ~0xffffff))
>>
>> what if we simply change this condition to:
>> if (!list_empty_careful(eq->tasklet_ctx.list))
>>
>> Wouldn't this be sort of equivalent to what you did ? and this way we
>> would simply fire the tasklet only when needed and not on every
>> handled CQE.
>
> Still this test would be done one million time per second on my hosts.
>
> What is the point exactly ?
>

The point is that if the EQ is full of CQEs from different CQs, you
would do the "kick = list_empty(&tasklet_ctx->list);" test once per CQ
with an empty list entry rather than once at the end.
In the mlx4_en case you have only two CQs on each EQ, but in RoCE/IB
you can have as many CQs as you want.

> Thanks.
>
>
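
To make the trade-off discussed in this thread concrete, here is a minimal user-space sketch in C. It is not driver code and does not use any kernel API. It only counts how many list-emptiness tests and tasklet schedules each approach would perform for one interrupt, under the assumption that only "user" CQs (for example RoCE/IB) go through mlx4_add_cq_to_tasklet() and that each such CQ is queued at most once per interrupt. All names in it (struct tasklet_model, add_cq_kick_on_first(), end_of_irq_check(), the simulate_*() helpers) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for one EQ's tasklet context. */
struct tasklet_model {
	size_t queued;      /* CQs currently on the pending list */
	size_t empty_tests; /* number of "is the list empty?" checks */
	size_t schedules;   /* number of times the tasklet was scheduled */
};

/* Approach in the patch: the first CQ queued on an empty list kicks the tasklet. */
static void add_cq_kick_on_first(struct tasklet_model *t)
{
	bool kick;

	assert(t != NULL);
	t->empty_tests++;
	kick = (t->queued == 0);
	t->queued++;
	if (kick)
		t->schedules++;
}

/* Alternative from the review: queue only, decide once after the EQ loop. */
static void add_cq_no_kick(struct tasklet_model *t)
{
	assert(t != NULL);
	t->queued++;
}

static void end_of_irq_check(struct tasklet_model *t)
{
	assert(t != NULL);
	t->empty_tests++;
	if (t->queued != 0)
		t->schedules++;
}

/* One interrupt that queues user_cqs distinct tasklet-based CQs. */
static struct tasklet_model simulate_kick_on_first(size_t user_cqs)
{
	struct tasklet_model t = {0};

	for (size_t i = 0; i < user_cqs; i++)
		add_cq_kick_on_first(&t);
	return t;
}

static struct tasklet_model simulate_check_at_end(size_t user_cqs)
{
	struct tasklet_model t = {0};

	for (size_t i = 0; i < user_cqs; i++)
		add_cq_no_kick(&t);
	end_of_irq_check(&t);
	return t;
}
```

In this model both approaches schedule the tasklet at most once per interrupt. With zero user CQs (the NAPI-only mlx4_en case, as assumed above) the kick-on-first variant performs no test at all, while the check-at-end variant still performs one per interrupt. With many user CQs on one EQ the kick-on-first variant performs one test per queued CQ, while the check-at-end variant still performs exactly one.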