Re: [PATCH 13/17] net: stmmac: Implement NAPI for TX

2017-02-03 Thread Corentin Labbe
On Fri, Feb 03, 2017 at 10:15:30AM -0500, David Miller wrote:
> From: Corentin Labbe 
> Date: Fri, 3 Feb 2017 14:41:45 +0100
> 
> > On Tue, Jan 31, 2017 at 11:12:25PM -0500, David Miller wrote:
> >> From: Corentin Labbe 
> >> Date: Tue, 31 Jan 2017 10:11:48 +0100
> >> 
> >> > The stmmac driver runs TX completion under NAPI but without checking
> >> > the work done by the TX completion function.
> >> 
> >> The current behavior is correct and completely intentional.
> >> 
> >> A driver should _never_ account TX work to the NAPI poll budget.
> >> 
> >> This is because TX liberation is orders of magnitude cheaper than
> >> receiving a packet, and such SKB freeing makes more SKBs available
> >> for RX processing.
> >> 
> >> Therefore, TX work should never count against the NAPI budget.
> >> 
> >> Please do not fix something which is not broken.
> > 
> > So at least the documentation I read must be fixed 
> > (https://wiki.linuxfoundation.org/networking/napi)
> 
> We have no control over nor care about what the Linux Foundation writes
> about the Linux networking code.
> 
> Complain to them and please do not bother us about it.
> 
> Thank you.

Sorry, I did not mean to bother you.

Could you give me your opinion on the other question from my mail? (Copied
below.)
So perhaps the best way is to do as Intel igb/ixgbe do: stay under NAPI until
the stmmac_tx_clean function reports that it has finished handling the queue
(with a distinct TX budget)?

Thanks
Regards
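
The igb/ixgbe-style alternative asked about above might look roughly like the
following user-space model. It is a sketch based on a general understanding of
that pattern, not code taken from those drivers or from stmmac. struct q_priv,
MODEL_TX_LIMIT and the q_* functions are hypothetical names introduced only for
illustration. TX cleanup gets its own limit, is never added to the RX count,
and the poll routine reports the full budget while TX work remains so that it
stays scheduled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-poll TX limit, independent of the NAPI budget. */
#define MODEL_TX_LIMIT 128

/* Hypothetical stand-in for driver state; not the real stmmac_priv. */
struct q_priv {
	int tx_pending;	/* completed TX descriptors not yet reclaimed */
	int rx_pending;	/* received packets not yet processed */
};

/* Reclaim up to tx_limit descriptors; return true once the queue is empty. */
static bool q_tx_clean(struct q_priv *priv, int tx_limit)
{
	int n = priv->tx_pending < tx_limit ? priv->tx_pending : tx_limit;

	priv->tx_pending -= n;
	return priv->tx_pending == 0;
}

/* Process up to limit received packets; return how many were handled. */
static int q_rx(struct q_priv *priv, int limit)
{
	int n = priv->rx_pending < limit ? priv->rx_pending : limit;

	priv->rx_pending -= n;
	return n;
}

/*
 * Returning budget means "poll me again"; anything lower means the caller
 * may complete NAPI and re-enable interrupts.
 */
static int q_poll(struct q_priv *priv, int budget)
{
	bool tx_done;
	int rx_done;

	assert(budget > 0);
	tx_done = q_tx_clean(priv, MODEL_TX_LIMIT);
	rx_done = q_rx(priv, budget);

	if (!tx_done || rx_done >= budget)
		return budget;
	return rx_done;
}
```

With 300 pending TX descriptors and 10 RX packets, the first two calls to
q_poll() with a budget of 64 return 64 because TX is not finished, and the
third returns 0.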


Re: [PATCH 13/17] net: stmmac: Implement NAPI for TX

2017-02-03 Thread David Miller
From: Corentin Labbe 
Date: Fri, 3 Feb 2017 14:41:45 +0100

> On Tue, Jan 31, 2017 at 11:12:25PM -0500, David Miller wrote:
>> From: Corentin Labbe 
>> Date: Tue, 31 Jan 2017 10:11:48 +0100
>> 
>> > The stmmac driver runs TX completion under NAPI but without checking
>> > the work done by the TX completion function.
>> 
>> The current behavior is correct and completely intentional.
>> 
>> A driver should _never_ account TX work to the NAPI poll budget.
>> 
>> This is because TX liberation is orders of magnitude cheaper than
>> receiving a packet, and such SKB freeing makes more SKBs available
>> for RX processing.
>> 
>> Therefore, TX work should never count against the NAPI budget.
>> 
>> Please do not fix something which is not broken.
> 
> So at least the documentation I read must be fixed 
> (https://wiki.linuxfoundation.org/networking/napi)

We have no control over nor care about what the Linux Foundation writes
about the Linux networking code.

Complain to them and please do not bother us about it.

Thank you.


Re: [PATCH 13/17] net: stmmac: Implement NAPI for TX

2017-02-03 Thread Corentin Labbe
On Tue, Jan 31, 2017 at 11:12:25PM -0500, David Miller wrote:
> From: Corentin Labbe 
> Date: Tue, 31 Jan 2017 10:11:48 +0100
> 
> > The stmmac driver runs TX completion under NAPI but without checking
> > the work done by the TX completion function.
> 
> The current behavior is correct and completely intentional.
> 
> A driver should _never_ account TX work to the NAPI poll budget.
> 
> This is because TX liberation is orders of magnitude cheaper than
> receiving a packet, and such SKB freeing makes more SKBs available
> for RX processing.
> 
> Therefore, TX work should never count against the NAPI budget.
> 
> Please do not fix something which is not broken.

So at least the documentation I read must be fixed 
(https://wiki.linuxfoundation.org/networking/napi)

So perhaps the best way is to do as Intel igb/ixgbe do: stay under NAPI until
the stmmac_tx_clean function reports that it has finished handling the queue?

Regards
Corentin Labbe


Re: [PATCH 13/17] net: stmmac: Implement NAPI for TX

2017-01-31 Thread David Miller
From: Corentin Labbe 
Date: Tue, 31 Jan 2017 10:11:48 +0100

> The stmmac driver runs TX completion under NAPI but without checking
> the work done by the TX completion function.

The current behavior is correct and completely intentional.

A driver should _never_ account TX work to the NAPI poll budget.

This is because TX liberation is orders of magnitude cheaper than
receiving a packet, and such SKB freeing makes more SKBs available
for RX processing.

Therefore, TX work should never count against the NAPI budget.

Please do not fix something which is not broken.
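
A minimal user-space sketch of the behaviour described above, in which TX
cleanup runs on every poll but only RX packets are charged to the budget.
struct model_priv and the model_* functions are hypothetical stand-ins, not the
real stmmac types or functions.

```c
#include <assert.h>

/* Hypothetical stand-in for driver state; not the real stmmac_priv. */
struct model_priv {
	int tx_pending;	/* completed TX descriptors not yet reclaimed */
	int rx_pending;	/* received packets not yet processed */
};

/* Reclaim every completed TX descriptor; the count is deliberately dropped. */
static void model_tx_clean(struct model_priv *priv)
{
	priv->tx_pending = 0;
}

/* Process up to limit received packets; return how many were handled. */
static int model_rx(struct model_priv *priv, int limit)
{
	int n = priv->rx_pending < limit ? priv->rx_pending : limit;

	priv->rx_pending -= n;
	return n;
}

/*
 * TX cleanup is free; the return value is RX work only. A value below
 * budget tells the caller that polling can stop.
 */
static int model_poll(struct model_priv *priv, int budget)
{
	assert(budget > 0);
	model_tx_clean(priv);
	return model_rx(priv, budget);
}
```

However many TX descriptors are pending, model_poll() still hands the whole
budget to RX.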


Re: [PATCH 13/17] net: stmmac: Implement NAPI for TX

2017-01-31 Thread Corentin Labbe
On Tue, Jan 31, 2017 at 11:28:03AM +0100, Giuseppe CAVALLARO wrote:
> On 1/31/2017 10:11 AM, Corentin Labbe wrote:
> > The stmmac driver runs TX completion under NAPI but without checking the
> > work done by the TX completion function.
> >
> > This patch adds work/budget accounting to the TX completion function.
> >
> > The visible effect is that it keeps the driver under NAPI longer and
> > boosts performance.
> > Under dwmac-sun8i, iperf goes from 140Mbit/s to 500Mbit/s.
> > Under dwmac-sunxi, an iperf run uses half as many interrupts.
> 
> I think that this patch should be sent separately with more details
> about the implementation you are adopting and results.
> 

This patch just implements what the NAPI documentation says:
"The budget parameter places a limit on the amount of work the driver may do. 
Each received packet counts as one unit of work. The poll() function may also 
process TX completions, in which case if it processes the entire TX ring then 
it should count that work as the rest of the budget. Otherwise, TX completions 
are not counted."
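
As a rough illustration of that reading of the documentation, the accounting
that this patch introduces can be modeled in user-space C. struct wiki_priv and
the wiki_* functions are hypothetical stand-ins, not the driver's real types;
only the budget logic is meant to mirror the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for driver state; not the real stmmac_priv. */
struct wiki_priv {
	int tx_pending;	/* completed TX descriptors not yet reclaimed */
	int rx_pending;	/* received packets not yet processed */
};

/*
 * Reclaim at most budget descriptors. As in the patch, the work is only
 * reported when the limit was reached; otherwise it counts as zero.
 */
static int wiki_tx_clean(struct wiki_priv *priv, int budget)
{
	int work = 0;

	while (priv->tx_pending > 0 && work < budget) {
		priv->tx_pending--;
		work++;
	}
	return work < budget ? 0 : work;
}

/* Process up to limit received packets; return how many were handled. */
static int wiki_rx(struct wiki_priv *priv, int limit)
{
	int n = priv->rx_pending < limit ? priv->rx_pending : limit;

	priv->rx_pending -= n;
	return n;
}

/* RX only gets whatever part of the budget TX did not consume. */
static int wiki_poll(struct wiki_priv *priv, int budget)
{
	int work_done = 0;

	assert(budget > 0);
	work_done += wiki_tx_clean(priv, budget);
	if (work_done < budget)
		work_done += wiki_rx(priv, budget - work_done);
	return work_done;
}
```

In this model a TX backlog of at least one budget's worth of descriptors
consumes the whole poll, so RX is not serviced in that round.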

For the record, I wrote the sun8i-emac driver for H3/A64 and got very good
performance with it.
Some people then found that the hardware was in fact a modified version of
dwmac, so I started working on the dwmac-sun8i glue driver.
Testing dwmac-sun8i gave very bad performance, and the only real difference
between those two drivers was the handling of NAPI TX mitigation.

The performance was tested with a simple iperf run.
I will redo the tests and report some numbers.

> For example, in the timer callback you force 256 (it seems
> DMA_TX_SIZE/2); do you think this should be tunable or fixed to
> NAPI budget?

I think that the whole "TX mitigation timer" is useless when using TX
completion within NAPI.
I will run some benchmarks with and without it to check.

> 
> I'd like to understand whether the performance you get is for TCP traffic;
> can you tell me what happens on unidirectional traffic?
> 
> Thx a lot for your effort, pls let me know
> 
> Regards
> peppe
> 



Re: [PATCH 13/17] net: stmmac: Implement NAPI for TX

2017-01-31 Thread Giuseppe CAVALLARO

On 1/31/2017 10:11 AM, Corentin Labbe wrote:

> The stmmac driver runs TX completion under NAPI but without checking the
> work done by the TX completion function.
>
> This patch adds work/budget accounting to the TX completion function.
>
> The visible effect is that it keeps the driver under NAPI longer and
> boosts performance.
> Under dwmac-sun8i, iperf goes from 140Mbit/s to 500Mbit/s.
> Under dwmac-sunxi, an iperf run uses half as many interrupts.


I think that this patch should be sent separately with more details
about the implementation you are adopting and results.

For example, in the timer callback you force 256 (it seems
DMA_TX_SIZE/2); do you think this should be tunable or fixed to
NAPI budget?

I'd like to understand whether the performance you get is for TCP traffic;
can you tell me what happens on unidirectional traffic?

Thx a lot for your effort, pls let me know

Regards
peppe



[PATCH 13/17] net: stmmac: Implement NAPI for TX

2017-01-31 Thread Corentin Labbe
The stmmac driver runs TX completion under NAPI but without checking the
work done by the TX completion function.

This patch adds work/budget accounting to the TX completion function.

The visible effect is that it keeps the driver under NAPI longer and
boosts performance.
Under dwmac-sun8i, iperf goes from 140Mbit/s to 500Mbit/s.
Under dwmac-sunxi, an iperf run uses half as many interrupts.

Signed-off-by: Corentin Labbe 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 2df36bd..e53b727 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1299,10 +1299,11 @@ static void stmmac_dma_operation_mode(struct stmmac_priv *priv)
  * @priv: driver private structure
  * Description: it reclaims the transmit resources after transmission completes.
  */
-static void stmmac_tx_clean(struct stmmac_priv *priv)
+static int stmmac_tx_clean(struct stmmac_priv *priv, int budget)
 {
unsigned int bytes_compl = 0, pkts_compl = 0;
unsigned int entry = priv->dirty_tx;
+   int work = 0;
 
netif_tx_lock(priv->dev);
 
@@ -1369,6 +1370,9 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
priv->hw->desc->release_tx_desc(p, priv->mode);
 
entry = STMMAC_GET_ENTRY(entry, DMA_TX_SIZE);
+   work++;
+   if (work >= budget)
+   break;
}
priv->dirty_tx = entry;
 
@@ -1386,6 +1390,11 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer));
}
netif_tx_unlock(priv->dev);
+
+   if (work < budget)
+   work = 0;
+
+   return work;
 }
 
 static inline void stmmac_enable_dma_irq(struct stmmac_priv *priv)
@@ -1617,7 +1626,7 @@ static void stmmac_tx_timer(unsigned long data)
 {
struct stmmac_priv *priv = (struct stmmac_priv *)data;
 
-   stmmac_tx_clean(priv);
+   stmmac_tx_clean(priv, 256);
 }
 
 /**
@@ -2657,9 +2666,10 @@ static int stmmac_poll(struct napi_struct *napi, int budget)
int work_done = 0;
 
priv->xstats.napi_poll++;
-   stmmac_tx_clean(priv);
+   work_done += stmmac_tx_clean(priv, budget);
 
-   work_done = stmmac_rx(priv, budget);
+   if (work_done < budget)
+   work_done += stmmac_rx(priv, budget - work_done);
if (work_done < budget) {
napi_complete(napi);
stmmac_enable_dma_irq(priv);
-- 
2.10.2