Re: Page allocator order-0 optimizations merged

2017-03-29 Thread Tariq Toukan



On 28/03/2017 9:24 PM, Jesper Dangaard Brouer wrote:

On Tue, 28 Mar 2017 19:05:12 +0300
Tariq Toukan  wrote:


On 28/03/2017 10:32 AM, Tariq Toukan wrote:



On 27/03/2017 4:32 PM, Mel Gorman wrote:

On Mon, Mar 27, 2017 at 02:39:47PM +0200, Jesper Dangaard Brouer wrote:

On Mon, 27 Mar 2017 10:55:14 +0200
Jesper Dangaard Brouer  wrote:


A possible solution would be to use local_bh_{disable,enable} instead
of the {preempt_disable,enable} calls.  But it is slower: using numbers
from [1] (19 vs 11 cycles), the expected cycle saving is
38-19=19.

The problematic part of using local_bh_enable is that this adds a
softirq/bottom-halves rescheduling point (as it checks for pending
BHs).  Thus, this might affect real workloads.


I implemented this solution in patch below... and tested it on mlx5 at
50G with manually disabled driver-page-recycling.  It works for me.

To Mel, what do you prefer... a partial revert or something like this?



If Tariq confirms it works for him as well, this looks like a far safer patch


Great.
I will test Jesper's patch today in the afternoon.



It looks very good!
I get line-rate (94Gbits/sec) with 8 streams, in comparison to less than
55Gbits/sec before.


Just confirming, this is when you have disabled mlx5 driver
page-recycling, right?



Right.
This is a great result!


than having a dedicated IRQ-safe queue. Your concern about the BH
scheduling point is valid but if it's proven to be a problem, there is
still the option of a partial revert.




Re: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif

2017-03-29 Thread Johannes Berg
Hi Sven,


> But I could be completely wrong about it. It would therefore be
> interesting for me to know who would be responsible to start the
> queues when ieee80211_do_open rejected it for IBSS.

Well, once ieee80211_offchannel_return() is called, that should do the
needful and end up in ieee80211_propagate_queue_wake().

Can you check what the IBSS vif's queues are (vif->hw_queue[...])?

However, I also don't understand the difference between encrypted and
unencrypted here.

johannes


Re: [PATCH net-next] net: phy: Allow building mdio-boardinfo into the kernel

2017-03-29 Thread Arnd Bergmann
On Tue, Mar 28, 2017 at 9:57 PM, Florian Fainelli  wrote:
> mdio-boardinfo contains code that is helpful for platforms to register
> specific MDIO bus devices independent of how CONFIG_MDIO_DEVICE or
> CONFIG_PHYLIB will be selected (modular or built-in). In order to make
> that possible, let's do the following:
>
> - descend into drivers/net/phy/ unconditionally
>
> - make mdiobus_setup_mdiodev_from_board_info() take a callback argument
>   which allows us not to expose the internal MDIO board info list and
>   mutex, yet maintain the logic within the same file
>
> - relocate the code that creates a MDIO device into
>   drivers/net/phy/mdio_bus.c
>
> - build mdio-boardinfo.o into the kernel as soon as MDIO_DEVICE is
>   defined (y or m)
>
> Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs")
> Fixes: 648ea0134069 ("net: phy: Allow pre-declaration of MDIO devices")
> Signed-off-by: Florian Fainelli 

It survived the overnight randconfig build,

Tested-by: Arnd Bergmann 

On a related note, I ran into one more case of a network driver selecting a
particular PHY:

drivers/net/built-in.o: In function `octeon_mdiobus_remove':
wilink_platform_data.c:(.text+0xe58): undefined reference to
`mdiobus_unregister'
wilink_platform_data.c:(.text+0xe60): undefined reference to `mdiobus_free'
drivers/net/built-in.o: In function `octeon_mdiobus_probe':
wilink_platform_data.c:(.text+0xec8): undefined reference to
`devm_mdiobus_alloc_size'
wilink_platform_data.c:(.text+0x1090): undefined reference to
`of_mdiobus_register'
wilink_platform_data.c:(.text+0x10d0): undefined reference to `mdiobus_free'

Building with this hack fixes the three instances I found so far, but my
current workaround seems rather fragile:

@@ -28,7 +28,7 @@ config MDIO_BCM_UNIMAC

 config MDIO_BITBANG
 	tristate "Bitbanged MDIO buses"
-	depends on !(MDIO_DEVICE=y && PHYLIB=m)
+	depends on m || !(MDIO_DEVICE=y && PHYLIB=m)
 	help
 	  This module implements the MDIO bus protocol in software,
 	  for use by low level drivers that export the ability to
@@ -118,6 +118,7 @@ config MDIO_OCTEON
 config MDIO_SUN4I
 	tristate "Allwinner sun4i MDIO interface support"
 	depends on ARCH_SUNXI
+	depends on m || !(MDIO_DEVICE=y && PHYLIB=m)
 	help
 	  This driver supports the MDIO interface found in the network
 	  interface units of the Allwinner SoC that have an EMAC (A10,
@@ -109,6 +109,7 @@ config MDIO_OCTEON
 	tristate "Octeon and some ThunderX SOCs MDIO buses"
 	depends on 64BIT
 	depends on HAS_IOMEM
+	depends on m || !(MDIO_DEVICE=y && PHYLIB=m)
 	select MDIO_CAVIUM
 	help
 	  This module provides a driver for the Octeon and ThunderX MDIO

The configuration causing it is something like this:

CONFIG_MDIO_OCTEON=y
CONFIG_MDIO_DEVICE=y
CONFIG_PHYLIB=m
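
The effect of the "depends on m || !(MDIO_DEVICE=y && PHYLIB=m)" workaround
above can be worked out by hand with Kconfig's tristate rules (n < m < y,
"&&" is the minimum, "||" is the maximum, "!x" is y minus x, and "A=B"
yields y or n). The C sketch below only models that arithmetic. The enum and
function names are made up for illustration and are not kernel or Kconfig
code, and it assumes CONFIG_MODULES is enabled so that the bare "m"
evaluates to m.

```c
/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "a && b" is the minimum of both operands. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* "a || b" is the maximum of both operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* "!a" is y minus a, so !m stays m. */
static enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(TRI_Y - a);
}

/* "a=b" compares the symbol values and yields y or n. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;
}

/*
 * Upper limit that
 *   depends on m || !(MDIO_DEVICE=y && PHYLIB=m)
 * places on a driver symbol (hypothetical helper, illustration only).
 */
enum tristate mdio_driver_limit(enum tristate mdio_device,
				enum tristate phylib)
{
	enum tristate broken = tri_and(tri_eq(mdio_device, TRI_Y),
				       tri_eq(phylib, TRI_M));

	return tri_or(TRI_M, tri_not(broken));
}
```

For the failing configuration (MDIO_DEVICE=y, PHYLIB=m) the limit comes out
as m, so a driver such as MDIO_OCTEON can no longer be built in; for every
other combination the limit is y and the extra line has no effect.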

This is what I'm trying now:

--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -7,7 +7,16 @@ menuconfig MDIO_DEVICE
 	help
 	  MDIO devices and driver infrastructure code.

-if MDIO_DEVICE
+config MDIO_BUS
+	tristate
+	default m if PHYLIB=m
+	default MDIO_DEVICE
+	help
+	  This internal symbol is used for link time dependencies and it
+	  reflects whether the mdio_bus/mdio_device code is built as a
+	  loadable module or built-in.
+
+if MDIO_BUS

 config MDIO_BCM_IPROC
 	tristate "Broadcom iProc MDIO bus controller"
@@ -28,7 +37,6 @@ config MDIO_BCM_UNIMAC

 config MDIO_BITBANG
 	tristate "Bitbanged MDIO buses"
-	depends on m || !(MDIO_DEVICE=y && PHYLIB=m)
 	help
 	  This module implements the MDIO bus protocol in software,
 	  for use by low level drivers that export the ability to
@@ -109,7 +117,6 @@ config MDIO_OCTEON
 	tristate "Octeon and some ThunderX SOCs MDIO buses"
 	depends on 64BIT
 	depends on HAS_IOMEM
-	depends on m || !(MDIO_DEVICE=y && PHYLIB=m)
 	select MDIO_CAVIUM
 	help
 	  This module provides a driver for the Octeon and ThunderX MDIO
@@ -119,7 +126,6 @@ config MDIO_OCTEON
 config MDIO_SUN4I
 	tristate "Allwinner sun4i MDIO interface support"
 	depends on ARCH_SUNXI
-	depends on m || !(MDIO_DEVICE=y && PHYLIB=m)
 	help
 	  This driver supports the MDIO interface found in the network
 	  interface units of the Allwinner SoC that have an EMAC (A10,


Arnd


Re: in_irq_or_nmi()

2017-03-29 Thread Peter Zijlstra
On Mon, Mar 27, 2017 at 09:58:17AM -0700, Matthew Wilcox wrote:
> On Mon, Mar 27, 2017 at 05:15:00PM +0200, Jesper Dangaard Brouer wrote:
> > And I also verified it worked:
> > 
> >   0.63 │       mov    __preempt_count,%eax
> >        │     free_hot_cold_page():
> >   1.25 │       test   $0x1f,%eax
> >        │     ↓ jne    1e4
> > 
> > And this simplification also made the compiler change this into a
> > unlikely branch, which is a micro-optimization (that I will leave up to
> > the compiler).
> 
> Excellent!  That said, I think we should define in_irq_or_nmi() in
> preempt.h, rather than hiding it in the memory allocator.  And since we're
> doing that, we might as well make it look like the other definitions:
> 
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 7eeceac52dea..af98c29abd9d 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -81,6 +81,7 @@
>  #define in_interrupt()   (irq_count())
>  #define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)
>  #define in_nmi() (preempt_count() & NMI_MASK)
> +#define in_irq_or_nmi()  (preempt_count() & (HARDIRQ_MASK | NMI_MASK))
>  #define in_task()(!(preempt_count() & \
>  (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
>  

No, that's horrible. Also, wth is this about? A memory allocator that
needs in_nmi()? That sounds beyond broken.


Re: [RFC PATCH tip/master 2/3] kprobes: Allocate kretprobe instance if its free list is empty

2017-03-29 Thread Masami Hiramatsu
On Wed, 29 Mar 2017 08:30:05 +0200
Ingo Molnar  wrote:
> 
> * Masami Hiramatsu  wrote:
> 
> > @@ -1824,6 +1823,30 @@ void unregister_jprobes(struct jprobe **jps, int num)
> >  EXPORT_SYMBOL_GPL(unregister_jprobes);
> >  
> >  #ifdef CONFIG_KRETPROBES
> > +
> > +/* Try to use free instance first, if failed, try to allocate new instance */
> > +struct kretprobe_instance *kretprobe_alloc_instance(struct kretprobe *rp)
> > +{
> > +   struct kretprobe_instance *ri = NULL;
> > +   unsigned long flags = 0;
> > +
> > +   raw_spin_lock_irqsave(&rp->lock, flags);
> > +   if (!hlist_empty(&rp->free_instances)) {
> > +   ri = hlist_entry(rp->free_instances.first,
> > +   struct kretprobe_instance, hlist);
> > +   hlist_del(&ri->hlist);
> > +   }
> > +   raw_spin_unlock_irqrestore(&rp->lock, flags);
> > +
> > +   /* Populate max active instance if possible */
> > +   if (!ri && rp->maxactive < KRETPROBE_MAXACTIVE_ALLOC) {
> > +   ri = kmalloc(sizeof(*ri) + rp->data_size, GFP_ATOMIC);
> > +   if (ri)
> > +   rp->maxactive++;
> > +   }
> > +
> > +   return ri;
> > +}
> >  /*
> >   * This kprobe pre_handler is registered with every kretprobe. When probe
> >   * hits it will set up the return probe.
> > @@ -1846,14 +1869,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> > }
> >  
> > /* TODO: consider to only swap the RA after the last pre_handler fired */
> > -   hash = hash_ptr(current, KPROBE_HASH_BITS);
> > -   raw_spin_lock_irqsave(&rp->lock, flags);
> > -   if (!hlist_empty(&rp->free_instances)) {
> > -   ri = hlist_entry(rp->free_instances.first,
> > -   struct kretprobe_instance, hlist);
> > -   hlist_del(&ri->hlist);
> > -   raw_spin_unlock_irqrestore(&rp->lock, flags);
> > -
> > +   ri = kretprobe_alloc_instance(rp);
> > +   if (ri) {
> > ri->rp = rp;
> > ri->task = current;
> >  
> > @@ -1868,13 +1885,13 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> >  
> > /* XXX(hch): why is there no hlist_move_head? */
> > INIT_HLIST_NODE(&ri->hlist);
> > +   hash = hash_ptr(current, KPROBE_HASH_BITS);
> > kretprobe_table_lock(hash, &flags);
> > hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
> > kretprobe_table_unlock(hash, &flags);
> > -   } else {
> > +   } else
> > rp->nmissed++;
> > -   raw_spin_unlock_irqrestore(&rp->lock, flags);
> > -   }
> > +
> > return 0;
> >  }
> >  NOKPROBE_SYMBOL(pre_handler_kretprobe);
> 
> So this is something I missed while the original code was merged, but the
> concept looks a bit weird: why do we do any "allocation" while a handler
> is executing?
> 
> That's fundamentally fragile. What's the maximum number of parallel
> 'kretprobe_instance' required per kretprobe - one per CPU?

It depends on where we put the probe. If the probed function can block
(yield to other tasks), then we need as many instances as there are threads
on the system that can invoke the function. So, ultimately, it is the same
as the function_graph tracer: we need one for each thread.

> 
> If so then we should preallocate all of them when they are installed and not
> do any alloc/free dance when executing them.
> 
> This will also speed them up, and increase robustness all around.

I see. kretprobe already does that, and I have kept it in the code.

By default, kretprobe allocates NR_CPU kretprobe_instance objects for each
kretprobe. For the usual use case (functions deeper inside the kernel) that
is OK. However, as Lukasz reported, a function near the syscall entry may
require more instances. In that case, the kretprobe user needs to increase
maxactive before registering kretprobes, which will be done by Alban's patch.
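
The policy described here (take a preallocated instance first, grow the pool
only when the free list is empty, and count a miss once a cap is reached) can
be modelled in plain userspace C. This is only a sketch under assumptions:
every name below is hypothetical, MODEL_MAXACTIVE_ALLOC merely stands in for
the KRETPROBE_MAXACTIVE_ALLOC limit from the quoted patch, malloc() stands in
for kmalloc(GFP_ATOMIC), and the locking the kernel code needs is left out.

```c
#include <stdlib.h>

/* Illustrative cap on how far the pool may grow (assumed value). */
#define MODEL_MAXACTIVE_ALLOC 4

struct model_instance {
	struct model_instance *next;
};

struct model_rp {
	struct model_instance *free_instances; /* preallocated free list */
	int maxactive;                         /* instances allocated so far */
	int nmissed;                           /* hits with no instance */
};

/* Preallocate 'maxactive' instances, as registration would do. */
int model_rp_init(struct model_rp *rp, int maxactive)
{
	rp->free_instances = NULL;
	rp->maxactive = 0;
	rp->nmissed = 0;

	while (rp->maxactive < maxactive) {
		struct model_instance *ri = malloc(sizeof(*ri));

		if (!ri)
			return -1;
		ri->next = rp->free_instances;
		rp->free_instances = ri;
		rp->maxactive++;
	}
	return 0;
}

/* Use a free instance first; if none is left, try to grow the pool. */
struct model_instance *model_get_instance(struct model_rp *rp)
{
	struct model_instance *ri = rp->free_instances;

	if (ri) {
		rp->free_instances = ri->next;
		return ri;
	}

	if (rp->maxactive < MODEL_MAXACTIVE_ALLOC) {
		ri = malloc(sizeof(*ri));
		if (ri) {
			rp->maxactive++;
			return ri;
		}
	}

	rp->nmissed++;
	return NULL;
}

/* Return an instance to the free list once the probed function returns. */
void model_put_instance(struct model_rp *rp, struct model_instance *ri)
{
	ri->next = rp->free_instances;
	rp->free_instances = ri;
}
```

With two preallocated instances and a cap of four, the first four concurrent
hits succeed and the fifth is counted in nmissed, which is the behaviour a
user would otherwise have to tune by raising maxactive by hand.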

However, the next question is how many instances are actually needed.
The user may have to go through a trial-and-error loop to find out.
Professional users will do that, but light users may not want to.

I'm also considering providing a "knob" in debugfs for disabling this dynamic
allocation feature, which will help users who would like to avoid memory
allocation in kretprobes.

Thank you,

-- 
Masami Hiramatsu 


[PATCH] net: ipv6: netfilter: Format block comments.

2017-03-29 Thread Arushi Singhal
Fix checkpatch warnings:
WARNING: Block comments use a trailing */ on a separate line
WARNING: Block comments use * on subsequent lines

Signed-off-by: Arushi Singhal 
---
 net/ipv6/netfilter/ip6_tables.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index b8cb61c27aa1..ac69ce3bfa1e 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -51,14 +51,14 @@ void *ip6t_alloc_initial_table(const struct xt_table *info)
 }
 EXPORT_SYMBOL_GPL(ip6t_alloc_initial_table);
 
-/*
-   We keep a set of rules for each CPU, so we can avoid write-locking
-   them in the softirq when updating the counters and therefore
-   only need to read-lock in the softirq; doing a write_lock_bh() in user
-   context stops packets coming through and allows user context to read
-   the counters or update the rules.
-
-   Hence the start of any table is given by get_table() below.  */
+/* We keep a set of rules for each CPU, so we can avoid write-locking
+ * them in the softirq when updating the counters and therefore
+ * only need to read-lock in the softirq; doing a write_lock_bh() in user
+ * context stops packets coming through and allows user context to read
+ * the counters or update the rules.
+ *
+ * Hence the start of any table is given by get_table() below.
+ */
 
 /* Returns whether matches rule or not. */
 /* Performance critical - called for every packet */
-- 
2.11.0



[PATCH net-next] net: mvneta: add RGMII_RXID and RGMII_TXID support

2017-03-29 Thread Jisheng Zhang
RGMII_RXID and RGMII_TXID share the same GMAC CTRL setting as RGMII
or RGMII_ID.

Signed-off-by: Jisheng Zhang 
---
 drivers/net/ethernet/marvell/mvneta.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index aebbc5399a06..7a6c65b44d7e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4099,6 +4099,8 @@ static int mvneta_port_power_up(struct mvneta_port *pp, int phy_mode)
 		break;
 	case PHY_INTERFACE_MODE_RGMII:
 	case PHY_INTERFACE_MODE_RGMII_ID:
+	case PHY_INTERFACE_MODE_RGMII_RXID:
+	case PHY_INTERFACE_MODE_RGMII_TXID:
 		ctrl |= MVNETA_GMAC2_PORT_RGMII;
 		break;
 	default:
-- 
2.11.0



Re: [PATCH v2] netfilter: Clean up tests if NULL returned on failure

2017-03-29 Thread SIMRAN SINGHAL
On Wed, Mar 29, 2017 at 12:25 PM, Jan Engelhardt  wrote:
>
> On Tuesday 2017-03-28 18:23, SIMRAN SINGHAL wrote:
>>On Tue, Mar 28, 2017 at 7:24 PM, Jan Engelhardt  wrote:
>>> On Tuesday 2017-03-28 15:13, simran singhal wrote:
>>>
>>>>Some functions like kmalloc/kzalloc return NULL on failure. When NULL
>>>>represents failure, !x is commonly used.
>>>>
>>>>@@ -910,7 +910,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
>>>>   }
>>>>
>>>>   dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL);
>>>>-  if (dest == NULL)
>>>>+  if (!dest)
>>>>       return -ENOMEM;
>>>
>>> This kind of transformation however is not cleanup anymore, it's really
>>> bikeshedding and should be avoided. There are pro and cons for both
>>> variants, and there is not really an overwhelming number of arguments
>>> for either variant to justify the change.
>>
>>Sorry, but I didn't get what you are trying to convey. And particularly pros
>>and cons of both variants.
>
> The ==NULL/!=NULL part sort of ensures that the left side is a pointer, which
> is lost when just using the variable and have it implicitly convert to bool.

Thanks for the explanation.

But, in my opinion, we should prefer != NULL over == NULL according to
the coding style.


Re: [PATCH 3/3] net: stmmac: Prefer kcalloc() over kmalloc_array()

2017-03-29 Thread Niklas Cassel
(resending mail without SPAM header)

Hi Thierry

Sorry that I missed your previous email,
for some reason it got stuck in the spam filter.

Really good catch. This patch fixes the random
RX brokenness for me. Thanks!

It would be nice with a Fixes tag though,
but lately there's been so many changes,
so it might be hard to point to a single commit.

Nevertheless:

Tested-by: Niklas Cassel 
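
The difference the patch relies on has a direct userspace analogue: calloc()
returns zero-filled memory, while malloc(), like kmalloc_array(), leaves the
contents indeterminate. The sketch below is a model only. The struct and its
fields are hypothetical stand-ins for per-queue counters, not the real
stmmac_rx_queue layout.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-queue state; any field the driver reads before
 * writing must start from a known value. */
struct model_rx_queue {
	unsigned int queue_index;
	unsigned int cur_rx;
	unsigned int dirty_rx;
};

/*
 * Zeroing allocation, the userspace counterpart of kcalloc(): every
 * counter in every queue starts at 0 without further initialisation.
 */
struct model_rx_queue *model_alloc_queues(size_t count)
{
	return calloc(count, sizeof(struct model_rx_queue));
}

/* Returns 1 if all counters of all queues are zero, 0 otherwise. */
int model_queues_start_clean(const struct model_rx_queue *q, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (q[i].cur_rx != 0 || q[i].dirty_rx != 0)
			return 0;
	}
	return 1;
}
```

Had model_alloc_queues() used malloc(count * sizeof(...)) instead, reading
cur_rx or dirty_rx before assigning them would yield whatever happened to be
in the heap, which matches the "works 2 out of 10 times" symptom described in
the quoted commit message.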

On 03/28/2017 03:57 PM, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Some of the data in the new queue structures seems to not be properly
> initialized, causing undefined behaviour (networking will work about 2
> out of 10 tries). kcalloc() will zero the allocated memory and results
> in 10 out of 10 successful boots.
> 
> Signed-off-by: Thierry Reding 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 37 +++
>  1 file changed, 17 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index ec5bba85c529..845320bc 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1412,13 +1412,12 @@ static void free_dma_desc_resources(struct stmmac_priv *priv)
>  static int alloc_rx_dma_desc_resources(struct stmmac_priv *priv)
>  {
>   u32 rx_count = priv->plat->rx_queues_to_use;
> + struct stmmac_rx_queue *rx_q;
>   int ret = -ENOMEM;
>   u32 queue = 0;
>  
>   /* Allocate RX queues array */
> - priv->rx_queue = kmalloc_array(rx_count,
> -sizeof(struct stmmac_rx_queue),
> -GFP_KERNEL);
> + priv->rx_queue = kcalloc(rx_count, sizeof(*rx_q), GFP_KERNEL);
>   if (!priv->rx_queue) {
>   kfree(priv->rx_queue);
>   return -ENOMEM;
> @@ -1426,20 +1425,19 @@ static int alloc_rx_dma_desc_resources(struct stmmac_priv *priv)
>  
>   /* RX queues buffers and DMA */
>   for (queue = 0; queue < rx_count; queue++) {
> - struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
> + rx_q = &priv->rx_queue[queue];
>  
>   rx_q->queue_index = queue;
>   rx_q->priv_data = priv;
>  
> - rx_q->rx_skbuff_dma = kmalloc_array(DMA_RX_SIZE,
> - sizeof(dma_addr_t),
> - GFP_KERNEL);
> + rx_q->rx_skbuff_dma = kcalloc(DMA_RX_SIZE, sizeof(dma_addr_t),
> +   GFP_KERNEL);
>   if (!rx_q->rx_skbuff_dma)
>   goto err_dma_buffers;
>  
> - rx_q->rx_skbuff = kmalloc_array(DMA_RX_SIZE,
> - sizeof(struct sk_buff *),
> - GFP_KERNEL);
> + rx_q->rx_skbuff = kcalloc(DMA_RX_SIZE,
> +   sizeof(struct sk_buff *),
> +   GFP_KERNEL);
>   if (!rx_q->rx_skbuff)
>   goto err_dma_buffers;
>  
> @@ -1477,33 +1475,32 @@ static int alloc_rx_dma_desc_resources(struct stmmac_priv *priv)
>  static int alloc_tx_dma_desc_resources(struct stmmac_priv *priv)
>  {
>   u32 tx_count = priv->plat->tx_queues_to_use;
> + struct stmmac_tx_queue *tx_q;
>   int ret = -ENOMEM;
>   u32 queue = 0;
>  
>   /* Allocate TX queues array */
> - priv->tx_queue = kmalloc_array(tx_count,
> -sizeof(struct stmmac_tx_queue),
> -GFP_KERNEL);
> + priv->tx_queue = kcalloc(tx_count, sizeof(*tx_q), GFP_KERNEL);
>   if (!priv->tx_queue)
>   return -ENOMEM;
>  
>   /* TX queues buffers and DMA */
>   for (queue = 0; queue < tx_count; queue++) {
> - struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
> + tx_q = &priv->tx_queue[queue];
>  
>   tx_q->queue_index = queue;
>   tx_q->priv_data = priv;
>  
> - tx_q->tx_skbuff_dma = kmalloc_array(DMA_TX_SIZE,
> -   sizeof(struct stmmac_tx_info),
> -   GFP_KERNEL);
> + tx_q->tx_skbuff_dma = kcalloc(DMA_TX_SIZE,
> +   sizeof(struct stmmac_tx_info),
> +   GFP_KERNEL);
>  
>   if (!tx_q->tx_skbuff_dma)
>   goto err_dma_buffers;
>  
> - tx_q->tx_skbuff = kmalloc_array(DMA_TX_SIZE,
> - sizeof(struct sk_buff *),
> - GFP_KERNEL);
> + tx_q->tx_skbuff = kcalloc(DMA_TX_SIZE,
> +   sizeof(struct sk_buff *),
> +   

[PATCH net-next v2] net: mvneta: set rx mode during resume if interface is running

2017-03-29 Thread Jisheng Zhang
I found a bug by:

0. boot and start dhcp client
1. echo mem > /sys/power/state
2. resume back immediately
3. don't touch dhcp client to renew the lease
4. ping the gateway. No acks

Usually, after step 2, the DHCP lease hasn't expired, so in theory
everything should resume working. But in fact it doesn't. It turns out
the rx mode isn't restored correctly. This patch fixes it by calling
mvneta_set_rx_mode(dev) in the resume hook if the interface is running.

Signed-off-by: Jisheng Zhang 
---
Since v1:
 - rebased to the latest net-next tree and explicitly mention it

 drivers/net/ethernet/marvell/mvneta.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index aebbc5399a06..cc126204dc4d 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4449,8 +4449,11 @@ static int mvneta_resume(struct device *device)
 		mvneta_fixed_link_update(pp, dev->phydev);
 
 	netif_device_attach(dev);
-	if (netif_running(dev))
+	if (netif_running(dev)) {
 		mvneta_open(dev);
+		mvneta_set_rx_mode(dev);
+	}
+
 	return 0;
 }
 #endif
-- 
2.11.0



Re: in_irq_or_nmi()

2017-03-29 Thread Jesper Dangaard Brouer
On Wed, 29 Mar 2017 10:12:19 +0200
Peter Zijlstra  wrote:

> On Mon, Mar 27, 2017 at 09:58:17AM -0700, Matthew Wilcox wrote:
> > On Mon, Mar 27, 2017 at 05:15:00PM +0200, Jesper Dangaard Brouer wrote:  
> > > And I also verified it worked:
> > > 
> > >   0.63 │       mov    __preempt_count,%eax
> > >        │     free_hot_cold_page():
> > >   1.25 │       test   $0x1f,%eax
> > >        │     ↓ jne    1e4
> > > 
> > > And this simplification also made the compiler change this into a
> > > unlikely branch, which is a micro-optimization (that I will leave up to
> > > the compiler).  
> > 
> > Excellent!  That said, I think we should define in_irq_or_nmi() in
> > preempt.h, rather than hiding it in the memory allocator.  And since we're
> > doing that, we might as well make it look like the other definitions:
> > 
> > diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> > index 7eeceac52dea..af98c29abd9d 100644
> > --- a/include/linux/preempt.h
> > +++ b/include/linux/preempt.h
> > @@ -81,6 +81,7 @@
> >  #define in_interrupt() (irq_count())
> >  #define in_serving_softirq()   (softirq_count() & SOFTIRQ_OFFSET)
> >  #define in_nmi()   (preempt_count() & NMI_MASK)
> > +#define in_irq_or_nmi()        (preempt_count() & (HARDIRQ_MASK | NMI_MASK))
> >  #define in_task()  (!(preempt_count() & \
> >(NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
> >
> 
> No, that's horrible. Also, wth is this about? A memory allocator that
> needs in_nmi()? That sounds beyond broken.

It is the other way around. We want to exclude NMI and HARDIRQ from
using the per-cpu-pages (pcp) lists "order-0 cache" (they will
fall-through using the normal buddy allocator path).
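
As a rough illustration of that rule, the check can be modelled in userspace
C. The bit widths below follow my understanding of the preempt_count layout
in include/linux/preempt.h for kernels of this era (8 preempt bits, 8 softirq
bits, 4 hardirq bits, 1 NMI bit); treat them as an assumption. The MODEL_*
names, the enum and model_choose_path() are hypothetical and not kernel API.

```c
/* Assumed preempt_count bit widths, lowest field first. */
#define MODEL_PREEMPT_BITS	8
#define MODEL_SOFTIRQ_BITS	8
#define MODEL_HARDIRQ_BITS	4
#define MODEL_NMI_BITS		1

#define MODEL_SOFTIRQ_SHIFT	(MODEL_PREEMPT_BITS)
#define MODEL_HARDIRQ_SHIFT	(MODEL_SOFTIRQ_SHIFT + MODEL_SOFTIRQ_BITS)
#define MODEL_NMI_SHIFT		(MODEL_HARDIRQ_SHIFT + MODEL_HARDIRQ_BITS)

#define MODEL_MASK(bits, shift)	(((1U << (bits)) - 1) << (shift))

#define MODEL_HARDIRQ_MASK \
	MODEL_MASK(MODEL_HARDIRQ_BITS, MODEL_HARDIRQ_SHIFT) /* 0x000f0000 */
#define MODEL_NMI_MASK \
	MODEL_MASK(MODEL_NMI_BITS, MODEL_NMI_SHIFT)         /* 0x00100000 */

/* Value added to the count on entering one softirq / one hardirq level. */
#define MODEL_SOFTIRQ_OFFSET	(1U << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_OFFSET	(1U << MODEL_HARDIRQ_SHIFT)

enum model_alloc_path {
	MODEL_PATH_PCP,   /* per-cpu-pages "order-0 cache" */
	MODEL_PATH_BUDDY, /* normal buddy allocator path */
};

/* Non-zero when the count says we are in hardirq or NMI context. */
static int model_in_irq_or_nmi(unsigned int preempt_count)
{
	return (preempt_count & (MODEL_HARDIRQ_MASK | MODEL_NMI_MASK)) != 0;
}

/* Hardirq and NMI skip the pcp lists; task and softirq context use them. */
enum model_alloc_path model_choose_path(unsigned int preempt_count)
{
	return model_in_irq_or_nmi(preempt_count) ? MODEL_PATH_BUDDY
						   : MODEL_PATH_PCP;
}
```

Under these assumptions the combined mask is 0x1f0000, so the whole decision
is one AND against the count followed by one branch.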

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH] net: ipv6: Removed unnecessary parenthesis

2017-03-29 Thread Arushi Singhal
Removed parentheses on the right hand side of assignment, as they are
not required. The following coccinelle script was used to fix this
issue:

@@
local idexpression id;
expression e;
@@

id =
-(
e
-)

Signed-off-by: Arushi Singhal 
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index fdde76e8a16a..34dfdc2fb2bc 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -387,9 +387,9 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_devic
return false;
 
/* Unfragmented part is taken from the first segment. */
-   payload_len = ((head->data - skb_network_header(head)) -
+   payload_len = (head->data - skb_network_header(head)) -
   sizeof(struct ipv6hdr) + fq->q.len -
-  sizeof(struct frag_hdr));
+  sizeof(struct frag_hdr);
if (payload_len > IPV6_MAXPLEN) {
net_dbg_ratelimited("nf_ct_frag6_reasm: payload len = %d\n",
payload_len);
-- 
2.11.0



Re: [PATCH v2] netfilter: Clean up tests if NULL returned on failure

2017-03-29 Thread SIMRAN SINGHAL
On Wed, Mar 29, 2017 at 2:19 PM, SIMRAN SINGHAL
 wrote:
> On Wed, Mar 29, 2017 at 12:25 PM, Jan Engelhardt  wrote:
>>
>> On Tuesday 2017-03-28 18:23, SIMRAN SINGHAL wrote:
>>>On Tue, Mar 28, 2017 at 7:24 PM, Jan Engelhardt  wrote:
>>>> On Tuesday 2017-03-28 15:13, simran singhal wrote:
>>>>
>>>>>Some functions like kmalloc/kzalloc return NULL on failure. When NULL
>>>>>represents failure, !x is commonly used.
>>>>>
>>>>>@@ -910,7 +910,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
>>>>>   }
>>>>>
>>>>>   dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL);
>>>>>-  if (dest == NULL)
>>>>>+  if (!dest)
>>>>>       return -ENOMEM;
>>>>
>>>> This kind of transformation however is not cleanup anymore, it's really
>>>> bikeshedding and should be avoided. There are pro and cons for both
>>>> variants, and there is not really an overwhelming number of arguments
>>>> for either variant to justify the change.
>>>
>>>Sorry, but I didn't get what you are trying to convey. And particularly pros
>>>and cons of both variants.
>>
>> The ==NULL/!=NULL part sort of ensures that the left side is a pointer, which
>> is lost when just using the variable and have it implicitly convert to bool.
>
> Thanks for the explanation
>
> But, according to me we should prefer != NULL over ==NULL according to
> coding style.

Sorry, there is a typing mistake in the above.

But, in my opinion, we should prefer !var over (var == NULL) according to the
coding style.
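
For reference, the two spellings debated in this thread can be put side by
side in a small userspace sketch. The struct and function names are
hypothetical, and calloc() stands in for kzalloc(). For a pointer both tests
behave identically. The difference Jan points out only shows up if the
variable is not a pointer: with NULL defined as ((void *)0), as is usual in
C, compilers typically diagnose a comparison such as "some_int == NULL",
whereas "!some_int" compiles silently.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct model_dest {
	int weight;
};

/* Explicit comparison: documents that 'dest' is a pointer. */
int model_new_dest_explicit(struct model_dest **out)
{
	struct model_dest *dest = calloc(1, sizeof(*dest));

	if (dest == NULL)
		return -ENOMEM;

	*out = dest;
	return 0;
}

/* Short form: relies on the implicit conversion of 'dest' to a truth value. */
int model_new_dest_short(struct model_dest **out)
{
	struct model_dest *dest = calloc(1, sizeof(*dest));

	if (!dest)
		return -ENOMEM;

	*out = dest;
	return 0;
}
```

Both functions return 0 on success and -ENOMEM when the allocation fails, so
the choice between them is one of style and of how much the test itself says
about the type of its operand.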


Re: in_irq_or_nmi()

2017-03-29 Thread Peter Zijlstra
On Wed, Mar 29, 2017 at 10:59:28AM +0200, Jesper Dangaard Brouer wrote:
> On Wed, 29 Mar 2017 10:12:19 +0200
> Peter Zijlstra  wrote:
> 
> > On Mon, Mar 27, 2017 at 09:58:17AM -0700, Matthew Wilcox wrote:
> > > On Mon, Mar 27, 2017 at 05:15:00PM +0200, Jesper Dangaard Brouer wrote:  
> > > > And I also verified it worked:
> > > > 
> > > >   0.63 │       mov    __preempt_count,%eax
> > > >        │     free_hot_cold_page():
> > > >   1.25 │       test   $0x1f,%eax
> > > >        │     ↓ jne    1e4
> > > > 
> > > > And this simplification also made the compiler change this into a
> > > > unlikely branch, which is a micro-optimization (that I will leave up to
> > > > the compiler).  
> > > 
> > > Excellent!  That said, I think we should define in_irq_or_nmi() in
> > > preempt.h, rather than hiding it in the memory allocator.  And since we're
> > > doing that, we might as well make it look like the other definitions:
> > > 
> > > diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> > > index 7eeceac52dea..af98c29abd9d 100644
> > > --- a/include/linux/preempt.h
> > > +++ b/include/linux/preempt.h
> > > @@ -81,6 +81,7 @@
> > >  #define in_interrupt()   (irq_count())
> > >  #define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)
> > >  #define in_nmi() (preempt_count() & NMI_MASK)
> > > > +#define in_irq_or_nmi()  (preempt_count() & (HARDIRQ_MASK | NMI_MASK))
> > >  #define in_task()(!(preempt_count() & \
> > >  (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
> > >
> > 
> > No, that's horrible. Also, wth is this about? A memory allocator that
> > needs in_nmi()? That sounds beyond broken.
> 
> It is the other way around. We want to exclude NMI and HARDIRQ from
> using the per-cpu-pages (pcp) lists "order-0 cache" (they will
> fall-through using the normal buddy allocator path).

Any in_nmi() code arriving at the allocator is broken. No need to fix
the allocator.


[PATCH net-next v1 2/2] tipc: allow rdm/dgram socketpairs

2017-03-29 Thread Parthasarathy Bhuvaragan
From: Erik Hugne 

for socketpairs using connectionless transport, we cache
the respective node local TIPC portid to use in subsequent
calls to send() in the socket's private data.

Signed-off-by: Erik Hugne 
Signed-off-by: Parthasarathy Bhuvaragan 
---
 net/tipc/socket.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1198dddf72e8..15f6ce7bf868 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2515,9 +2515,21 @@ static int tipc_socketpair(struct socket *sock1, struct socket *sock2)
 {
struct tipc_sock *tsk2 = tipc_sk(sock2->sk);
struct tipc_sock *tsk1 = tipc_sk(sock1->sk);
-
-   tipc_sk_finish_conn(tsk1, tsk2->portid, 0);
-   tipc_sk_finish_conn(tsk2, tsk1->portid, 0);
+   u32 onode = tipc_own_addr(sock_net(sock1->sk));
+
+   tsk1->peer.family = AF_TIPC;
+   tsk1->peer.addrtype = TIPC_ADDR_ID;
+   tsk1->peer.scope = TIPC_NODE_SCOPE;
+   tsk1->peer.addr.id.ref = tsk2->portid;
+   tsk1->peer.addr.id.node = onode;
+   tsk2->peer.family = AF_TIPC;
+   tsk2->peer.addrtype = TIPC_ADDR_ID;
+   tsk2->peer.scope = TIPC_NODE_SCOPE;
+   tsk2->peer.addr.id.ref = tsk1->portid;
+   tsk2->peer.addr.id.node = onode;
+
+   tipc_sk_finish_conn(tsk1, tsk2->portid, onode);
+   tipc_sk_finish_conn(tsk2, tsk1->portid, onode);
return 0;
 }
 
@@ -2529,7 +2541,7 @@ static const struct proto_ops msg_ops = {
.release= tipc_release,
.bind   = tipc_bind,
.connect= tipc_connect,
-   .socketpair = sock_no_socketpair,
+   .socketpair = tipc_socketpair,
.accept = sock_no_accept,
.getname= tipc_getname,
.poll   = tipc_poll,
-- 
2.1.4



[PATCH net-next v1 1/2] tipc: add support for stream/seqpacket socketpairs

2017-03-29 Thread Parthasarathy Bhuvaragan
From: Erik Hugne 

sockets A and B are connected back-to-back, similar to what
AF_UNIX does.

Signed-off-by: Erik Hugne 
Signed-off-by: Parthasarathy Bhuvaragan 
---
 net/tipc/socket.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 7130e73bd42c..1198dddf72e8 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2511,6 +2511,16 @@ static int tipc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
}
 }
 
+static int tipc_socketpair(struct socket *sock1, struct socket *sock2)
+{
+   struct tipc_sock *tsk2 = tipc_sk(sock2->sk);
+   struct tipc_sock *tsk1 = tipc_sk(sock1->sk);
+
+   tipc_sk_finish_conn(tsk1, tsk2->portid, 0);
+   tipc_sk_finish_conn(tsk2, tsk1->portid, 0);
+   return 0;
+}
+
 /* Protocol switches for the various types of TIPC sockets */
 
 static const struct proto_ops msg_ops = {
@@ -2540,7 +2550,7 @@ static const struct proto_ops packet_ops = {
.release= tipc_release,
.bind   = tipc_bind,
.connect= tipc_connect,
-   .socketpair = sock_no_socketpair,
+   .socketpair = tipc_socketpair,
.accept = tipc_accept,
.getname= tipc_getname,
.poll   = tipc_poll,
@@ -2561,7 +2571,7 @@ static const struct proto_ops stream_ops = {
.release= tipc_release,
.bind   = tipc_bind,
.connect= tipc_connect,
-   .socketpair = sock_no_socketpair,
+   .socketpair = tipc_socketpair,
.accept = tipc_accept,
.getname= tipc_getname,
.poll   = tipc_poll,
-- 
2.1.4



[PATCH net-next v1 0/2] tipc: add socketpair support

2017-03-29 Thread Parthasarathy Bhuvaragan
We add socketpair support for connection-oriented sockets in
the first patch and for connectionless sockets in the second.
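
From userspace, a socketpair is used the same way regardless of the address
family. The sketch below exercises the call with AF_UNIX, which the first
patch names as the model for the behaviour. Whether the same helper succeeds
with AF_TIPC depends on running a kernel with this series applied, so that
part is an assumption and is not exercised here. The helper name is
hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a socketpair of the given family and type, send a short message
 * from one end and check that it arrives unchanged at the other.
 * Returns 0 on success, -1 on any failure.
 */
int model_socketpair_roundtrip(int family, int type)
{
	static const char msg[] = "ping";
	char buf[sizeof(msg)] = { 0 };
	int sv[2];
	int ok;

	if (socketpair(family, type, 0, sv) < 0)
		return -1;

	ok = write(sv[0], msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
	     read(sv[1], buf, sizeof(buf)) == (ssize_t)sizeof(msg) &&
	     memcmp(buf, msg, sizeof(msg)) == 0;

	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : -1;
}
```

Calling model_socketpair_roundtrip(AF_UNIX, SOCK_STREAM) covers the
connection-oriented case and model_socketpair_roundtrip(AF_UNIX, SOCK_DGRAM)
the connectionless one. With the series applied, the corresponding TIPC calls
would presumably pass AF_TIPC with SOCK_STREAM or SOCK_SEQPACKET for patch 1
and SOCK_RDM or SOCK_DGRAM for patch 2.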

Erik Hugne (2):
  tipc: add support for stream/seqpacket socketpairs
  tipc: allow rdm/dgram socketpairs

 net/tipc/socket.c | 28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)

-- 
2.1.4



Re: Bisected softirq accounting issue in v4.11-rc1~170^2~28

2017-03-29 Thread Jesper Dangaard Brouer
On Tue, 28 Mar 2017 23:11:22 +0200
Frederic Weisbecker  wrote:

> On Tue, Mar 28, 2017 at 05:23:03PM +0200, Jesper Dangaard Brouer wrote:
> > On Tue, 28 Mar 2017 16:34:36 +0200
> > Frederic Weisbecker  wrote:
> >   
> > > On Tue, Mar 28, 2017 at 10:14:03AM +0200, Jesper Dangaard Brouer wrote:  
> > > > 
> > > > (While evaluating some changes to the page allocator) I ran into an
> > > > issue with ksoftirqd getting too much CPU sched time.
> > > > 
> > > > I bisected the problem to
> > > >  a499a5a14dbd ("sched/cputime: Increment kcpustat directly on irqtime account")
> > > > 
> > > >  a499a5a14dbd1d0315a96fc62a8798059325e9e6 is the first bad commit
> > > >  commit a499a5a14dbd1d0315a96fc62a8798059325e9e6
> > > >  Author: Frederic Weisbecker 
> > > >  Date:   Tue Jan 31 04:09:32 2017 +0100
> > > > 
> > > > sched/cputime: Increment kcpustat directly on irqtime account
> > > > 
> > > > The irqtime is accounted is nsecs and stored in
> > > > cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the
> > > > accumulated amount reaches a new jiffy, this one gets accounted to 
> > > > the
> > > > kcpustat.
> > > > 
> > > > This was necessary when kcpustat was stored in cputime_t, which 
> > > > could at
> > > > worst have jiffies granularity. But now kcpustat is stored in nsecs
> > > > so this whole discretization game with temporary irqtime storage has
> > > > become unnecessary.
> > > > 
> > > > We can now directly account the irqtime to the kcpustat.
> > > > 
> > > > Signed-off-by: Frederic Weisbecker 
> > > > Cc: Benjamin Herrenschmidt 
> > > > Cc: Fenghua Yu 
> > > > Cc: Heiko Carstens 
> > > > Cc: Linus Torvalds 
> > > > Cc: Martin Schwidefsky 
> > > > Cc: Michael Ellerman 
> > > > Cc: Paul Mackerras 
> > > > Cc: Peter Zijlstra 
> > > > Cc: Rik van Riel 
> > > > Cc: Stanislaw Gruszka 
> > > > Cc: Thomas Gleixner 
> > > > Cc: Tony Luck 
> > > > Cc: Wanpeng Li 
> > > > Link: 
> > > > http://lkml.kernel.org/r/1485832191-26889-17-git-send-email-fweis...@gmail.com
> > > > Signed-off-by: Ingo Molnar 
> > > > 
> > > > The reproducer is running a userspace udp_sink[1] program, and taskset
> > > > pinning the process to the same CPU as softirq RX is running on, and
> > > > starting a UDP flood with pktgen (tool part of kernel tree:
> > > > samples/pktgen/pktgen_sample03_burst_single_flow.sh).
> > > 
> > > So that means I need to run udp_sink on the same CPU than pktgen?  
> > 
> > No, you misunderstood.  I run pktgen on another physical machine, which
> > is sending UDP packets towards my Device-Under-Test (DUT) target.  The
> > DUT-target is receiving packets and I observe which CPU the NIC is
> > delivering these packets to.  
> 
> Ah ok, so I tried to run pktgen on another machine and I get that strange 
> write error:
> 
> # ./pktgen_sample03_burst_single_flow.sh -d 192.168.1.3  -i wlan0
> ./functions.sh: line 76: echo: write error: Unknown error 524
> ERROR: Write error(1) occurred cmd: "clone_skb 10 > 
> /proc/net/pktgen/wlan0@0"
> 
> Any idea?

Yes, this interface does not support pktgen "clone_skb".  You can
supply cmdline argument "-c 0" to fix this.  But I suspect that this
interface also does not support "burst", thus you also need "-b 0".

See all cmdline args via: ./pktgen_sample03_burst_single_flow.sh -h

Why are you using a wifi interface for this kind of overload testing?
(The basic test here is making sure softirq is 100% busy, and at slow
wifi speeds it might not be possible to force ksoftirqd into this
scheduler state.)


> > 
> > E.g determine RX-CPU via mpstat command:
> >  mpstat -P ALL -u -I SCPU -I SUM 2
> > 
> > I then start udp_sink, pinned to the RX-CPU, like:
> >  sudo taskset -c 2 ./udp_sink --port 9 --count $((10**6)) --recvmsg 
> > --repeat 1000  
> 
> Ah thanks for these hints!
> 
> > > > After this commit, the udp_sink program does not get any sched CPU
> > > > time, and no packets are delivered to userspace.  (All packets are
> > > > dropped by softirq due to a full socket queue, nstat
> > > > UdpRcvbufErrors).
> > > > 
> > > > A related symptom is that ksoftirqd no longer get accounted in
> > > > top.
> > > 
> > > That's indeed what I observe. udp_sink has almost no CPU time,
> > > neither has ksoftirqd but kpktgend_0 has everything.
> > > 
> > > Finally a bug I can reproduce!  
> > 
> > Good to hear you can reproduce it! :-)  
> 
> Well, since I was generating the packets locally, maybe it didn't trigger
> the expected interrupts...

Well, you definitely didn't create the test case I was using.  I cannot
remember if the pktgen kthreads run in softirq context, but I suspect
they do. If so, you can recreate the main problem, which is a softirq
thread using 100% CPU time, which causes no other processes to get
sched time on that CPU.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer a

Re: [PATCH v2] netfilter: Clean up tests if NULL returned on failure

2017-03-29 Thread Jan Engelhardt

On Wednesday 2017-03-29 11:15, SIMRAN SINGHAL wrote:
>>   dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL);
>>-  if (dest == NULL)
>>+  if (!dest)
>>   return -ENOMEM;
>
>But I think we should prefer !var over (var == NULL), according to the
>coding style

Where does it say that?


Re: [PATCH net-next 0/7] netconf: Add support for RTM_DELNETCONF

2017-03-29 Thread Nicolas Dichtel
On 29/03/2017 at 07:32, David Miller wrote:
> From: David Ahern 
> Date: Tue, 28 Mar 2017 14:28:00 -0700
> 
>> netconf notifications are sent as devices register but not when they
>> are deleted leaving userspace caches out of sync. Add support for
>> RTM_DELNETCONF to ipv4, ipv6 and mpls.
Not sure why those notifications are needed. When an interface is set down, ipv4
route deletions are not notified. Why is it needed for netconf?

>>
>> MPLS is missing RTM_NEWNETCONF as devices are created, so add it as well.
> 
> Series applied, thanks.
> 


[PATCH net-next v3 5/5] net-next: dsa: add dsa support for Mediatek MT7530 switch

2017-03-29 Thread sean.wang
From: Sean Wang 

MT7530 is a 7-port Gigabit Ethernet switch found on Mediatek router
platforms such as MT7623A or MT7623N. It includes a 7-port Gigabit
Ethernet MAC and a 5-port Gigabit Ethernet PHY. Ports 0 to 4 are the
user ports connecting to remote devices, while ports 5 and 6 are the
CPU ports connecting to the Mediatek Ethernet GMAC.

Port 6 can communicate with the CPU via the Mediatek Ethernet GMAC
through either TRGMII or RGMII; the phy-mode property in the dt-bindings
selects which mode to use. Port 5 supports only RGMII. Currently,
however, only port 6 is supported by this DSA driver.

The driver is written with reference to qca8k and other existing DSA
drivers. Most of the essential DSA callbacks are already supported,
including tag insertion for distinguishing user ports, port control,
bridge offloading, STP setup and ethtool operations, which allows DSA to
model each user port as a standalone netdevice, as the other DSA drivers
do.

Signed-off-by: Sean Wang 
Signed-off-by: Landen Chao 
---
 drivers/net/dsa/Kconfig  |8 +
 drivers/net/dsa/Makefile |2 +-
 drivers/net/dsa/mt7530.c | 1126 ++
 drivers/net/dsa/mt7530.h |  390 
 4 files changed, 1525 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/dsa/mt7530.c
 create mode 100644 drivers/net/dsa/mt7530.h

diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index 0659846..5b322b4 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -34,4 +34,12 @@ config NET_DSA_QCA8K
  This enables support for the Qualcomm Atheros QCA8K Ethernet
  switch chips.
 
+config NET_DSA_MT7530
+   tristate "Mediatek MT7530 Ethernet switch support"
+   depends on NET_DSA
+   select NET_DSA_TAG_MTK
+   ---help---
+ This enables support for the Mediatek MT7530 Ethernet switch
+ chip.
+
 endmenu
diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile
index a3c9416..8e629c1 100644
--- a/drivers/net/dsa/Makefile
+++ b/drivers/net/dsa/Makefile
@@ -2,6 +2,6 @@ obj-$(CONFIG_NET_DSA_MV88E6060) += mv88e6060.o
 obj-$(CONFIG_NET_DSA_BCM_SF2)  += bcm-sf2.o
 bcm-sf2-objs   := bcm_sf2.o bcm_sf2_cfp.o
 obj-$(CONFIG_NET_DSA_QCA8K)+= qca8k.o
-
+obj-$(CONFIG_NET_DSA_MT7530) += mt7530.o
 obj-y  += b53/
 obj-y  += mv88e6xxx/
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
new file mode 100644
index 000..ad2e6f8
--- /dev/null
+++ b/drivers/net/dsa/mt7530.c
@@ -0,0 +1,1126 @@
+/*
+ * Mediatek MT7530 DSA Switch driver
+ * Copyright (C) 2017 Sean Wang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mt7530.h"
+
+/* String, offset, and register size in bytes if different from 4 bytes */
+static const struct mt7530_mib_desc mt7530_mib[] = {
+   MIB_DESC(1, 0x00, "TxDrop"),
+   MIB_DESC(1, 0x04, "TxCrcErr"),
+   MIB_DESC(1, 0x08, "TxUnicast"),
+   MIB_DESC(1, 0x0c, "TxMulticast"),
+   MIB_DESC(1, 0x10, "TxBroadcast"),
+   MIB_DESC(1, 0x14, "TxCollision"),
+   MIB_DESC(1, 0x18, "TxSingleCollision"),
+   MIB_DESC(1, 0x1c, "TxMultipleCollision"),
+   MIB_DESC(1, 0x20, "TxDeferred"),
+   MIB_DESC(1, 0x24, "TxLateCollision"),
+   MIB_DESC(1, 0x28, "TxExcessiveCollistion"),
+   MIB_DESC(1, 0x2c, "TxPause"),
+   MIB_DESC(1, 0x30, "TxPktSz64"),
+   MIB_DESC(1, 0x34, "TxPktSz65To127"),
+   MIB_DESC(1, 0x38, "TxPktSz128To255"),
+   MIB_DESC(1, 0x3c, "TxPktSz256To511"),
+   MIB_DESC(1, 0x40, "TxPktSz512To1023"),
+   MIB_DESC(1, 0x44, "Tx1024ToMax"),
+   MIB_DESC(2, 0x48, "TxBytes"),
+   MIB_DESC(1, 0x60, "RxDrop"),
+   MIB_DESC(1, 0x64, "RxFiltering"),
+   MIB_DESC(1, 0x6c, "RxMulticast"),
+   MIB_DESC(1, 0x70, "RxBroadcast"),
+   MIB_DESC(1, 0x74, "RxAlignErr"),
+   MIB_DESC(1, 0x78, "RxCrcErr"),
+   MIB_DESC(1, 0x7c, "RxUnderSizeErr"),
+   MIB_DESC(1, 0x80, "RxFragErr"),
+   MIB_DESC(1, 0x84, "RxOverSzErr"),
+   MIB_DESC(1, 0x88, "RxJabberErr"),
+   MIB_DESC(1, 0x8c, "RxPause"),
+   MIB_DESC(1, 0x90, "RxPktSz64"),
+   MIB_DESC(1, 0x94, "RxPktSz65To127"),
+   MIB_DESC(1, 0x98, "RxPktSz128To255"),
+   M

[PATCH net-next v3 1/5] dt-bindings: net: dsa: add Mediatek MT7530 binding

2017-03-29 Thread sean.wang
From: Sean Wang 

Add device-tree binding for Mediatek MT7530 switch.

Cc: devicet...@vger.kernel.org
Signed-off-by: Sean Wang 
Acked-by: Rob Herring 
---
 .../devicetree/bindings/net/dsa/mt7530.txt | 92 ++
 1 file changed, 92 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/mt7530.txt

diff --git a/Documentation/devicetree/bindings/net/dsa/mt7530.txt 
b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
new file mode 100644
index 000..a9bc27b
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
@@ -0,0 +1,92 @@
+Mediatek MT7530 Ethernet switch
+
+
+Required properties:
+
+- compatible: Must be compatible = "mediatek,mt7530";
+- #address-cells: Must be 1.
+- #size-cells: Must be 0.
+- mediatek,mcm: Boolean; if defined, indicates that either MT7530 is the part
+   on multi-chip module belong to MT7623A has or the remotely standalone
+   chip as the function MT7623N reference board provided for.
+- core-supply: Phandle to the regulator node necessary for the core power.
+- io-supply: Phandle to the regulator node necessary for the I/O power.
+   See Documentation/devicetree/bindings/regulator/mt6323-regulator.txt
+   for details for the regulator setup on these boards.
+
+If the property mediatek,mcm isn't defined, following property is required
+
+- reset-gpios: Should be a gpio specifier for a reset line.
+
+Else, following properties are required
+
+- resets : Phandle pointing to the system reset controller with
+   line index for the ethsys.
+- reset-names : Should be set to "mcm".
+
+Required properties for the child nodes within ports container:
+
+- reg: Port address described must be 6 for CPU port and from 0 to 5 for
+   user ports.
+- phy-mode: String, must be either "trgmii" or "rgmii" for port labeled
+"cpu".
+
+See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional
+required, optional properties and how the integrated switch subnodes must
+be specified.
+
+Example:
+
+   &mdio0 {
+   switch@0 {
+   compatible = "mediatek,mt7530";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0>;
+
+   core-supply = <&mt6323_vpa_reg>;
+   io-supply = <&mt6323_vemc3v3_reg>;
+   reset-gpios = <&pio 33 0>;
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0>;
+   port@0 {
+   reg = <0>;
+   label = "lan0";
+   };
+
+   port@1 {
+   reg = <1>;
+   label = "lan1";
+   };
+
+   port@2 {
+   reg = <2>;
+   label = "lan2";
+   };
+
+   port@3 {
+   reg = <3>;
+   label = "lan3";
+   };
+
+   port@4 {
+   reg = <4>;
+   label = "wan";
+   };
+
+   port@6 {
+   reg = <6>;
+   label = "cpu";
+   ethernet = <&gmac0>;
+   phy-mode = "trgmii";
+   fixed-link {
+   speed = <1000>;
+   full-duplex;
+   };
+   };
+   };
+   };
+   };
-- 
1.9.1



[PATCH net-next v3 0/5] net-next: dsa: add Mediatek MT7530 support

2017-03-29 Thread sean.wang
From: Sean Wang 

MT7530 is a 7-port Gigabit Ethernet switch found on Mediatek router
platforms such as MT7623A or MT7623N. It includes a 7-port Gigabit
Ethernet MAC and a 5-port Gigabit Ethernet PHY. Ports 0 to 4 are the
user ports connecting to remote devices, while ports 5 and 6 are the
CPU ports connecting to the Mediatek Ethernet GMAC.

This patch series integrates Mediatek MT7530 into DSA. It includes most
of the essential callbacks, such as tag insertion for distinguishing
ports, port control, bridge offloading, STP setup and ethtool
operations, which allows DSA to model each user port as an independent
standalone netdevice, as the other DSA drivers do.

Changes since v1:
- rebased into 4.11-rc1
- refined binding document including below five items 
- changed the type of mediatek,mcm into bool
- used reset controller binding for MCM reset and removed "mediatek,ethsys"
  property from binding
- reused CPU port's ethernet Phandle instead of creating new one and removed
  "mediatek,ethernet" property from binding
- aligned naming for GPIO reset with dsa/marvell.txt
- added phy-mode as required property child nodes within ports container
- handled gpio reset with devm_gpiod_* API
- refined comment words
- removed condition for CDM setting since the setup looks both fine for all 
cases
- allowed of_find_net_device_by_node() working with pointing the device node 
into
  real netdev instance
- fixed Kbuild warnings

Changes since v2:
- reuse readx_poll_timeout() to poll
- add proper macro instead of hard coding
- treat inconsistent cpu port as warning
- remove the usage for regmap-debugfs
- show error message when invalid id is found
- put the logic for the setup of trgmii into adjust_link()
- refine and reuse logic between port_[disable,enable], and default port setup 
- correct typo

Sean Wang (5):
  dt-bindings: net: dsa: add Mediatek MT7530 binding
  net-next: dsa: add Mediatek tag RX/TX handler
  net-next: ethernet: mediatek: add CDM able to recognize the tag for
DSA
  net-next: ethernet: mediatek: add device_node of GMAC pointing into
the netdev instance
  net-next: dsa: add dsa support for Mediatek MT7530 switch

 .../devicetree/bindings/net/dsa/mt7530.txt |   92 ++
 drivers/net/dsa/Kconfig|8 +
 drivers/net/dsa/Makefile   |2 +-
 drivers/net/dsa/mt7530.c   | 1126 
 drivers/net/dsa/mt7530.h   |  390 +++
 drivers/net/ethernet/mediatek/mtk_eth_soc.c|8 +
 drivers/net/ethernet/mediatek/mtk_eth_soc.h|4 +
 include/net/dsa.h  |1 +
 net/dsa/Kconfig|2 +
 net/dsa/Makefile   |1 +
 net/dsa/dsa.c  |3 +
 net/dsa/dsa_priv.h |3 +
 net/dsa/tag_mtk.c  |  117 ++
 13 files changed, 1756 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/mt7530.txt
 create mode 100644 drivers/net/dsa/mt7530.c
 create mode 100644 drivers/net/dsa/mt7530.h
 create mode 100644 net/dsa/tag_mtk.c

-- 
1.9.1



[PATCH net-next v3 4/5] net-next: ethernet: mediatek: add device_node of GMAC pointing into the netdev instance

2017-03-29 Thread sean.wang
From: Sean Wang 

This patch sets the corresponding GMAC device node in the netdev
instance, which allows other modules such as DSA to find the instance
through the node in the dt-bindings using the
of_find_net_device_by_node() call.

Signed-off-by: Sean Wang 
Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index c21ed99..84b09a4 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2323,6 +2323,8 @@ static int mtk_add_mac(struct mtk_eth *eth, struct 
device_node *np)
eth->netdev[id]->ethtool_ops = &mtk_ethtool_ops;
 
eth->netdev[id]->irq = eth->irq[0];
+   eth->netdev[id]->dev.of_node = np;
+
return 0;
 
 free_netdev:
-- 
1.9.1



[PATCH net-next v3 3/5] net-next: ethernet: mediatek: add CDM able to recognize the tag for DSA

2017-03-29 Thread sean.wang
From: Sean Wang 

This patch adds the setup that allows CDM to recognize packets carrying
the port-distinguishing tag. Otherwise, these tagged packets would be
handled incorrectly by CDM. The setup also works for general untagged
packets.
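
For illustration only, the read-modify-write that this patch adds to
mtk_hw_init() can be sketched in userspace C roughly as follows. The
register offset and bit come from the patch; fake_regs, reg_read(),
reg_write() and cdm_enable_stag() are hypothetical stand-ins for the
MMIO accessors and are not part of the driver.

```c
#include <stdint.h>

#define CDMQ_IG_CTRL 0x1400u      /* ingress control offset, as in the patch */
#define CDMQ_STAG_EN (1u << 0)    /* special-tag enable bit, as in the patch */

/* Hypothetical stand-in for the memory-mapped register file. */
static uint32_t fake_regs[0x2000 / 4];

uint32_t reg_read(uint32_t off)
{
	return fake_regs[off / 4];
}

void reg_write(uint32_t val, uint32_t off)
{
	fake_regs[off / 4] = val;
}

/* Set the tag-enable bit while preserving every other bit. */
void cdm_enable_stag(void)
{
	uint32_t val = reg_read(CDMQ_IG_CTRL);

	reg_write(val | CDMQ_STAG_EN, CDMQ_IG_CTRL);
}
```

Reading before writing keeps whatever other bits are already set in the
register, and calling the helper twice leaves the value unchanged.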

Signed-off-by: Sean Wang 
Signed-off-by: Landen Chao 
Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 6 ++
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 9e75768..c21ed99 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1846,6 +1846,12 @@ static int mtk_hw_init(struct mtk_eth *eth)
/* GE2, Force 1000M/FD, FC ON */
mtk_w32(eth, MAC_MCR_FIXED_LINK, MTK_MAC_MCR(1));
 
+   /* Indicates CDM to parse the MTK special tag from CPU
+* which also is working out for untag packets.
+*/
+   val = mtk_r32(eth, MTK_CDMQ_IG_CTRL);
+   mtk_w32(eth, val | MTK_CDMQ_STAG_EN, MTK_CDMQ_IG_CTRL);
+
/* Enable RX VLan Offloading */
mtk_w32(eth, 1, MTK_CDMP_EG_CTRL);
 
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 99b1c8e..996024d 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -70,6 +70,10 @@
 /* Frame Engine Interrupt Grouping Register */
 #define MTK_FE_INT_GRP 0x20
 
+/* CDMP Ingress Control Register */
+#define MTK_CDMQ_IG_CTRL   0x1400
+#define MTK_CDMQ_STAG_EN   BIT(0)
+
 /* CDMP Exgress Control Register */
 #define MTK_CDMP_EG_CTRL   0x404
 
-- 
1.9.1



[PATCH net-next v3 2/5] net-next: dsa: add Mediatek tag RX/TX handler

2017-03-29 Thread sean.wang
From: Sean Wang 

Add support for the 4-byte tag that DSA uses to distinguish ports,
allowing packets to be received and transmitted via a particular port.
The tag is inserted after the source MAC address in the Ethernet
header.
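
As a rough userspace sketch of the transmit-side layout (not the kernel
code, which works on an skb with skb_push() and memmove()), the tag
placement could look like this. mtk_tag_insert() and its flat-buffer
interface are hypothetical; the 4-byte length, the position after the
two MAC addresses and the one-hot port bit in the second tag byte follow
the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6	/* bytes in one MAC address */
#define MTK_HDR_LEN 4	/* bytes in the Mediatek tag */

/*
 * Insert the tag after the destination and source MAC addresses.
 * buf holds len bytes of frame and has room for cap bytes in total.
 * Returns the new frame length, or 0 if the arguments are invalid.
 */
size_t mtk_tag_insert(uint8_t *buf, size_t len, size_t cap, unsigned int port)
{
	uint8_t *tag = buf + 2 * ETH_ALEN;

	if (len < 2 * ETH_ALEN || cap < len + MTK_HDR_LEN || port > 5)
		return 0;

	/* Make a 4-byte hole right after the source MAC address. */
	memmove(tag + MTK_HDR_LEN, tag, len - 2 * ETH_ALEN);

	tag[0] = 0;
	tag[1] = (uint8_t)(1u << port);	/* one bit per destination port */
	tag[2] = 0;
	tag[3] = 0;

	return len + MTK_HDR_LEN;
}
```

The sketch moves the payload backwards in a flat buffer, whereas the
patch grows the headroom and moves the MAC addresses forwards; the
resulting byte layout is the same.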

Signed-off-by: Sean Wang 
Signed-off-by: Landen Chao 
Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
---
 include/net/dsa.h  |   1 +
 net/dsa/Kconfig|   2 +
 net/dsa/Makefile   |   1 +
 net/dsa/dsa.c  |   3 ++
 net/dsa/dsa_priv.h |   3 ++
 net/dsa/tag_mtk.c  | 117 +
 6 files changed, 127 insertions(+)
 create mode 100644 net/dsa/tag_mtk.c

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 4e13e69..3276547 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -31,6 +31,7 @@ enum dsa_tag_protocol {
DSA_TAG_PROTO_EDSA,
DSA_TAG_PROTO_BRCM,
DSA_TAG_PROTO_QCA,
+   DSA_TAG_PROTO_MTK,
DSA_TAG_LAST,   /* MUST BE LAST */
 };
 
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 9649238..d78789b 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -31,4 +31,6 @@ config NET_DSA_TAG_TRAILER
 config NET_DSA_TAG_QCA
bool
 
+config NET_DSA_TAG_MTK
+   bool
 endif
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 31d3437..9b1d478 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -8,3 +8,4 @@ dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o
 dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o
 dsa_core-$(CONFIG_NET_DSA_TAG_TRAILER) += tag_trailer.o
 dsa_core-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o
+dsa_core-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index b6d4f6a..617f736 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -53,6 +53,9 @@ static struct sk_buff *dsa_slave_notag_xmit(struct sk_buff 
*skb,
 #ifdef CONFIG_NET_DSA_TAG_QCA
[DSA_TAG_PROTO_QCA] = &qca_netdev_ops,
 #endif
+#ifdef CONFIG_NET_DSA_TAG_MTK
+   [DSA_TAG_PROTO_MTK] = &mtk_netdev_ops,
+#endif
[DSA_TAG_PROTO_NONE] = &none_ops,
 };
 
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 0706a51..2a31399 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -85,4 +85,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct device 
*parent,
 /* tag_qca.c */
 extern const struct dsa_device_ops qca_netdev_ops;
 
+/* tag_mtk.c */
+extern const struct dsa_device_ops mtk_netdev_ops;
+
 #endif
diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
new file mode 100644
index 000..833a9d6
--- /dev/null
+++ b/net/dsa/tag_mtk.c
@@ -0,0 +1,117 @@
+/*
+ * Mediatek DSA Tag support
+ * Copyright (C) 2017 Landen Chao 
+ *   Sean Wang 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include "dsa_priv.h"
+
+#define MTK_HDR_LEN4
+#define MTK_HDR_RECV_SOURCE_PORT_MASK  GENMASK(2, 0)
+#define MTK_HDR_XMIT_DP_BIT_MASK   GENMASK(5, 0)
+
+static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
+   struct net_device *dev)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   u8 *mtk_tag;
+
+   if (skb_cow_head(skb, MTK_HDR_LEN) < 0)
+   goto out_free;
+
+   skb_push(skb, MTK_HDR_LEN);
+
+   memmove(skb->data, skb->data + MTK_HDR_LEN, 2 * ETH_ALEN);
+
+   /* Build the tag after the MAC Source Address */
+   mtk_tag = skb->data + 2 * ETH_ALEN;
+   mtk_tag[0] = 0;
+   mtk_tag[1] = (1 << p->dp->index) & MTK_HDR_XMIT_DP_BIT_MASK;
+   mtk_tag[2] = 0;
+   mtk_tag[3] = 0;
+
+   return skb;
+
+out_free:
+   kfree_skb(skb);
+   return NULL;
+}
+
+static int mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
+  struct packet_type *pt, struct net_device *orig_dev)
+{
+   struct dsa_switch_tree *dst = dev->dsa_ptr;
+   struct dsa_switch *ds;
+   int port;
+   __be16 *phdr, hdr;
+
+   if (unlikely(!dst))
+   goto out_drop;
+
+   skb = skb_unshare(skb, GFP_ATOMIC);
+   if (!skb)
+   goto out;
+
+   if (unlikely(!pskb_may_pull(skb, MTK_HDR_LEN)))
+   goto out_drop;
+
+   /* The MTK header is added by the switch between src addr
+* and ethertype at this point, skb->data points to 2 bytes
+* after src addr so header should be 2 bytes right before.
+*/
+   phdr = (__be16 *)(skb->data - 2);
+   hdr = ntohs(*phdr);
+
+   /* Remove MTK tag and recalculate checksum. */
+   skb_pull_rcsum(skb, MTK_HDR_LEN);
+
+   memmove(skb->data - ETH_HLEN,
+   skb->da

Re: [PATCH net-next 7/8] vhost_net: try batch dequing from skb array

2017-03-29 Thread Jason Wang



On 2017-03-23 13:34, Jason Wang wrote:






+{
+if (rvq->rh != rvq->rt)
+goto out;
+
+rvq->rh = rvq->rt = 0;
+rvq->rt = skb_array_consume_batched_bh(rvq->rx_array, rvq->rxq,
+VHOST_RX_BATCH);

A comment explaining why it is -bh would be helpful.


Ok.

Thanks 


Rethinking this: it looks like -bh is not needed in this case, since no
consumer runs in bh.
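
For readers without the full patch at hand, the head/tail batching in
the quoted hunk can be sketched in plain C along these lines. This is
only an illustration under assumptions: struct src_queue,
consume_batched() and batch_next() are hypothetical names standing in
for the skb array and the vhost helpers, and locking (the -bh question
above) is left out entirely.

```c
#include <stddef.h>

#define RX_BATCH 4	/* hypothetical batch size */

/* Stand-in for the producer-side queue. */
struct src_queue {
	const int *items;
	size_t len;
	size_t pos;
};

/* Local cache of dequeued items; head == tail means it is empty. */
struct batch_cache {
	int buf[RX_BATCH];
	size_t head;
	size_t tail;
};

/* Pull up to n items out of the queue; returns how many were copied. */
static size_t consume_batched(struct src_queue *q, int *out, size_t n)
{
	size_t i = 0;

	while (i < n && q->pos < q->len)
		out[i++] = q->items[q->pos++];
	return i;
}

/* Return 1 and store the next item, or 0 if nothing is available. */
int batch_next(struct batch_cache *c, struct src_queue *q, int *item)
{
	if (c->head == c->tail) {
		/* Cache drained: refill it with one batched dequeue. */
		c->head = 0;
		c->tail = consume_batched(q, c->buf, RX_BATCH);
		if (c->tail == 0)
			return 0;
	}
	*item = c->buf[c->head++];
	return 1;
}
```

The point of the pattern is that the shared queue is touched once per
batch rather than once per item.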


Thanks


[PATCH] net: netfilter: Fix line over 80 characters.

2017-03-29 Thread Arushi Singhal
Fix the lines over 80 characters, as reported by checkpatch.pl.

Signed-off-by: Arushi Singhal 
---
 net/ipv6/netfilter/ip6_tables.c| 6 --
 net/ipv6/netfilter/ip6t_SYNPROXY.c | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index ac69ce3bfa1e..b3b94cc80544 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -81,12 +81,14 @@ ip6_packet_match(const struct sk_buff *skb,
 &ip6info->dst)))
return false;
 
-   ret = ifname_compare_aligned(indev, ip6info->iniface, 
ip6info->iniface_mask);
+   ret = ifname_compare_aligned(indev, ip6info->iniface,
+ip6info->iniface_mask);
 
if (NF_INVF(ip6info, IP6T_INV_VIA_IN, ret != 0))
return false;
 
-   ret = ifname_compare_aligned(outdev, ip6info->outiface, 
ip6info->outiface_mask);
+   ret = ifname_compare_aligned(outdev, ip6info->outiface,
+ip6info->outiface_mask);
 
if (NF_INVF(ip6info, IP6T_INV_VIA_OUT, ret != 0))
return false;
diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c 
b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index e0fa78085ad7..6b4d1837891b 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -58,7 +58,8 @@ synproxy_send_tcp(struct net *net,
fl6.daddr = niph->daddr;
fl6.fl6_sport = nth->source;
fl6.fl6_dport = nth->dest;
-   security_skb_classify_flow((struct sk_buff *)skb, 
flowi6_to_flowi(&fl6));
+   security_skb_classify_flow((struct sk_buff *)skb,
+  flowi6_to_flowi(&fl6));
dst = ip6_route_output(net, NULL, &fl6);
if (dst->error) {
dst_release(dst);
-- 
2.11.0



[net-next 06/13] i40e: remove FDIR_REQUIRES_REINIT driver flag

2017-03-29 Thread Jeff Kirsher
From: Jacob Keller 

This flag hasn't been used since commit 1e1be8f622ee ("i40e: ATR policy
change to flush the table to clean stale ATR rules").

Let's simplify things and just remove it.

Change-ID: I76279d84db8a2fd96f445b96aa413059f9256879
Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index f506e994861b..aa9ac2833edf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -389,7 +389,6 @@ struct i40e_pf {
 #define I40E_FLAG_MSIX_ENABLED BIT_ULL(3)
 #define I40E_FLAG_RSS_ENABLED  BIT_ULL(6)
 #define I40E_FLAG_VMDQ_ENABLED BIT_ULL(7)
-#define I40E_FLAG_FDIR_REQUIRES_REINIT BIT_ULL(8)
 #define I40E_FLAG_NEED_LINK_UPDATE BIT_ULL(9)
 #define I40E_FLAG_IWARP_ENABLEDBIT_ULL(10)
 #define I40E_FLAG_CLEAN_ADMINQ BIT_ULL(14)
-- 
2.12.0



[net-next 00/13][pull request] 40GbE Intel Wired LAN Driver Updates 2017-03-29

2017-03-29 Thread Jeff Kirsher
This series contains updates to i40e and i40evf only.

Preethi changes the default driver mode of operation to descriptor
write-back for VF.

Alex cleans up and addresses several issues in the way that i40e handles
private flags.  Modifies the driver to use the length of the packet
instead of the DD status bit to determine if a new descriptor is ready
to be processed.  Refactors the driver by pulling the code responsible
for fetching the receive buffer and synchronizing DMA into a single
function.  Also pulled the code responsible for handling buffer
recycling and page counting and distributed it through several functions,
so we can commonize the bits that handle either freeing or recycling the
buffers.  Cleans up the code in preparation for us adding support for
build_skb().  Changed the way we handle the maximum frame size for the
receive path so it is more consistent with other drivers.

Paul enables XL722 to use the direct read/write method since it does not
support the AQ command to read/write the control register.

Christopher fixes a case where we miss an arq element if a new one is
added before we enable interrupts and exit the loop.

Jake cleans up a pointless goto statement.  Also cleaned up a flag that
was not being used.

Carolyn does round 2 for adding a delay to the receive queue to
accommodate the hardware needs.

The following are changes since commit 0e42c72195cc1a6f7461bfc48b32dce29e1677f7:
  Merge branch 'netconf-delnetconf'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alexander Duyck (7):
  i40e: Clean up handling of private flags
  i40e/i40evf: Use length to determine if descriptor is done
  i40e/i40evf: Pull code for grabbing and syncing rx_buffer from
fetch_buffer
  i40e/i40evf: Pull out code for cleaning up Rx buffers
  i40e/i40evf: Break i40e_fetch_rx_buffer up to allow for reuse of frag
code
  i40e/i40evf: Add legacy-rx private flag to allow fallback to old Rx
flow
  i40e/i40evf: Change the way we limit the maximum frame size for Rx

Christopher N Bednarz (1):
  i40e: Check for new arq elements before leaving the adminq subtask
loop

Jacob Keller (2):
  i40e: remove a useless goto statement
  i40e: remove FDIR_REQUIRES_REINIT driver flag

Paul M Stillwell Jr (1):
  i40e: use register for XL722 control register read/write

Preethi Banala (1):
  i40evf: enforce descriptor write-back mechanism for VF

Wyborny, Carolyn (1):
  i40e: fix for queue timing delays

 drivers/net/ethernet/intel/i40e/i40e.h |  10 +-
 drivers/net/ethernet/intel/i40e/i40e_common.c  |   8 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 192 -
 drivers/net/ethernet/intel/i40e/i40e_main.c|  42 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c| 249 
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|   4 +-
 drivers/net/ethernet/intel/i40evf/i40e_common.c|   8 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  | 312 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  |  18 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h |   6 +-
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 104 +++
 drivers/net/ethernet/intel/i40evf/i40evf_main.c|  16 +-
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c|  18 +-
 13 files changed, 559 insertions(+), 428 deletions(-)

-- 
2.12.0



[net-next 03/13] i40e: use register for XL722 control register read/write

2017-03-29 Thread Jeff Kirsher
From: Paul M Stillwell Jr 

The XL722 doesn't support the AQ command to read/write the control
register so enable it to bypass the check and use the direct read/write
method.
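
As a rough sketch of the decision this patch changes (not the driver
code itself), the predicate can be written as a small standalone C
function. struct hw_info, enum mac_type and use_direct_register() are
hypothetical stand-ins for struct i40e_hw and its fields; the version
check and the X722 exception mirror the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MAC types known to the driver. */
enum mac_type { MAC_XL710, MAC_X722 };

struct hw_info {
	uint16_t api_maj_ver;	/* admin queue API major version */
	uint16_t api_min_ver;	/* admin queue API minor version */
	enum mac_type mac;
};

/*
 * True when the Rx control register must be accessed directly
 * instead of through the admin queue command.
 */
bool use_direct_register(const struct hw_info *hw)
{
	bool old_api = hw->api_maj_ver == 1 && hw->api_min_ver < 5;

	return old_api || hw->mac == MAC_X722;
}
```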

Change-ID: Iefecc737b57207485c90845af5989d5af518bf16
Signed-off-by: Paul M Stillwell Jr 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 8 ++--
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 8 ++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 95946f41002b..f9db95aa3a20 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -4963,7 +4963,9 @@ u32 i40e_read_rx_ctl(struct i40e_hw *hw, u32 reg_addr)
int retry = 5;
u32 val = 0;
 
-   use_register = (hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver < 5);
+   use_register = (((hw->aq.api_maj_ver == 1) &&
+   (hw->aq.api_min_ver < 5)) ||
+   (hw->mac.type == I40E_MAC_X722));
if (!use_register) {
 do_retry:
status = i40e_aq_rx_ctl_read_register(hw, reg_addr, &val, NULL);
@@ -5022,7 +5024,9 @@ void i40e_write_rx_ctl(struct i40e_hw *hw, u32 reg_addr, 
u32 reg_val)
bool use_register;
int retry = 5;
 
-   use_register = (hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver < 5);
+   use_register = (((hw->aq.api_maj_ver == 1) &&
+   (hw->aq.api_min_ver < 5)) ||
+   (hw->mac.type == I40E_MAC_X722));
if (!use_register) {
 do_retry:
status = i40e_aq_rx_ctl_write_register(hw, reg_addr,
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c 
b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 89dfdbca13db..626fbf1ead4d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -958,7 +958,9 @@ u32 i40evf_read_rx_ctl(struct i40e_hw *hw, u32 reg_addr)
int retry = 5;
u32 val = 0;
 
-   use_register = (hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver < 5);
+   use_register = (((hw->aq.api_maj_ver == 1) &&
+   (hw->aq.api_min_ver < 5)) ||
+   (hw->mac.type == I40E_MAC_X722));
if (!use_register) {
 do_retry:
status = i40evf_aq_rx_ctl_read_register(hw, reg_addr,
@@ -1019,7 +1021,9 @@ void i40evf_write_rx_ctl(struct i40e_hw *hw, u32 reg_addr, u32 reg_val)
bool use_register;
int retry = 5;
 
-   use_register = (hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver < 5);
+   use_register = (((hw->aq.api_maj_ver == 1) &&
+   (hw->aq.api_min_ver < 5)) ||
+   (hw->mac.type == I40E_MAC_X722));
if (!use_register) {
 do_retry:
status = i40evf_aq_rx_ctl_write_register(hw, reg_addr,
-- 
2.12.0
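
The decision this patch changes can be modelled outside the kernel. What follows is a minimal sketch, not the driver code: struct sketch_hw, enum sketch_mac_type and sketch_use_register() are hypothetical stand-ins for struct i40e_hw, the MAC type enum and the use_register expression in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real i40e definitions. */
enum sketch_mac_type { SKETCH_MAC_XL710, SKETCH_MAC_X722 };

struct sketch_hw {
    uint16_t api_maj_ver;           /* models hw->aq.api_maj_ver */
    uint16_t api_min_ver;           /* models hw->aq.api_min_ver */
    enum sketch_mac_type mac_type;  /* models hw->mac.type */
};

/* Returns true when the Rx control register should be accessed
 * directly instead of through the admin queue command.
 */
static bool sketch_use_register(const struct sketch_hw *hw)
{
    bool old_api;

    assert(hw != NULL);

    /* firmware API older than 1.5 lacks the AQ command */
    old_api = (hw->api_maj_ver == 1) && (hw->api_min_ver < 5);

    /* per the commit message, the X722 part lacks it as well */
    return old_api || (hw->mac_type == SKETCH_MAC_X722);
}
```

With this model an X722 device reporting API 1.5 still takes the direct register path, which is the behaviour the patch adds.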



[net-next 13/13] i40e: fix for queue timing delays

2017-03-29 Thread Jeff Kirsher
From: "Wyborny, Carolyn" 

This patch adds a delay to Rx queue disables to accommodate HW needs.

v2: Added missing check for disable only, additional details on the
need for the ugly delay and fixed spacing on comment.

Change-ID: I2864ca667ce5dcc2cc44f8718113b719742a46a1
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1f89e416156d..a0506e28d167 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4067,6 +4067,12 @@ static int i40e_vsi_control_rx(struct i40e_vsi *vsi, bool enable)
}
}
 
+   /* Due to HW errata, on Rx disable only, the register can indicate done
+* before it really is. Needs 50ms to be sure
+*/
+   if (!enable)
+   mdelay(50);
+
return ret;
 }
 
-- 
2.12.0



[net-next 09/13] i40e/i40evf: Pull out code for cleaning up Rx buffers

2017-03-29 Thread Jeff Kirsher
From: Alexander Duyck 

This patch pulls out the code responsible for handling buffer recycling and
page counting and distributes it through several functions.  This allows us
to commonize the bits that handle either freeing or recycling the buffers.

As far as the page count tracking, one change to the logic is that
pagecnt_bias is decremented as soon as we call i40e_get_rx_buffer.  It is
then the responsibility of the function that pulls the data to either
increment the pagecnt_bias if the buffer can be recycled as-is, or to
update page_offset so that we are pointing at the correct location for
placement of the next buffer.

Change-ID: Ibac576360cb7f0b1627f2a993d13c1a8a2bf60af
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 73 +--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 72 --
 2 files changed, 89 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f2256d8c5e35..bba41ce08124 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1294,6 +1294,8 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
bi->dma = dma;
bi->page = page;
bi->page_offset = 0;
+
+   /* initialize pagecnt_bias to 1 representing we fully own page */
bi->pagecnt_bias = 1;
 
return true;
@@ -1622,8 +1624,6 @@ static inline bool i40e_page_is_reusable(struct page *page)
  * the adapter for another receive
  *
  * @rx_buffer: buffer containing the page
- * @page: page address from rx_buffer
- * @truesize: actual size of the buffer in this page
  *
  * If page is reusable, rx_buffer->page_offset is adjusted to point to
  * an unused region in the page.
@@ -1646,14 +1646,13 @@ static inline bool i40e_page_is_reusable(struct page *page)
  *
  * In either case, if the page is reusable its refcount is increased.
  **/
-static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
-  struct page *page,
-  const unsigned int truesize)
+static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer)
 {
 #if (PAGE_SIZE >= 8192)
unsigned int last_offset = PAGE_SIZE - I40E_RXBUFFER_2048;
 #endif
-   unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
+   unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
+   struct page *page = rx_buffer->page;
 
/* Is any reuse possible? */
if (unlikely(!i40e_page_is_reusable(page)))
@@ -1661,15 +1660,9 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
 
 #if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
-   if (unlikely(page_count(page) != pagecnt_bias))
+   if (unlikely((page_count(page) - pagecnt_bias) > 1))
return false;
-
-   /* flip page offset to other buffer */
-   rx_buffer->page_offset ^= truesize;
 #else
-   /* move offset up to the next cache line */
-   rx_buffer->page_offset += truesize;
-
if (rx_buffer->page_offset > last_offset)
return false;
 #endif
@@ -1678,10 +1671,11 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
 * the pagecnt_bias and page count so that we fully restock the
 * number of references the driver holds.
 */
-   if (unlikely(pagecnt_bias == 1)) {
+   if (unlikely(!pagecnt_bias)) {
page_ref_add(page, USHRT_MAX);
rx_buffer->pagecnt_bias = USHRT_MAX;
}
+
return true;
 }
 
@@ -1689,8 +1683,8 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
  * i40e_add_rx_frag - Add contents of Rx buffer to sk_buff
  * @rx_ring: rx descriptor ring to transact packets on
  * @rx_buffer: buffer containing page to add
- * @size: packet length from rx_desc
  * @skb: sk_buff to place the data into
+ * @size: packet length from rx_desc
  *
  * This function will add the data contained in rx_buffer->page to the skb.
  * This is done either through a direct copy if the data in the buffer is
@@ -1700,10 +1694,10 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
  * The function will then update the page offset if necessary and return
  * true if the buffer can be reused by the adapter.
  **/
-static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
+static void i40e_add_rx_frag(struct i40e_ring *rx_ring,
 struct i40e_rx_buffer *rx_buffer,
-unsigned int size,
-struct sk_buff *skb)
+struct sk_buff *skb,
+unsigned int size)
 {
struct page *page = rx_buffer->page;
unsigned char *va = page_address(page) + rx_buffer->page_offset;
@@ -1723,12
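
The pagecnt_bias accounting described in the commit message can be followed with a small user-space model. This is only a sketch under stated assumptions: struct sketch_page and struct sketch_rx_buffer are hypothetical stand-ins for struct page and struct i40e_rx_buffer, only the PAGE_SIZE < 8192 branch is modelled, and the i40e_page_is_reusable() check is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins, not the kernel structures. */
struct sketch_page {
    unsigned int refcount;      /* models page_count(page) */
};

struct sketch_rx_buffer {
    struct sketch_page *page;
    unsigned int pagecnt_bias;  /* references the driver still holds */
};

/* Taking a buffer off the ring provisionally gives one reference away. */
static void sketch_get_rx_buffer(struct sketch_rx_buffer *buf)
{
    assert(buf->pagecnt_bias > 0);
    buf->pagecnt_bias--;
}

/* The data was copied out, so the buffer is recycled as-is: take the
 * reference back rather than touching the page count.
 */
static void sketch_recycle_as_is(struct sketch_rx_buffer *buf)
{
    buf->pagecnt_bias++;
}

/* Models the reuse test for pages smaller than 8K. */
static bool sketch_can_reuse(struct sketch_rx_buffer *buf)
{
    struct sketch_page *page = buf->page;

    /* besides the reference handed out for the current frame,
     * someone else still holds the page
     */
    if (page->refcount - buf->pagecnt_bias > 1)
        return false;

    /* no spare references left: restock both counters together */
    if (buf->pagecnt_bias == 0) {
        page->refcount += USHRT_MAX;
        buf->pagecnt_bias = USHRT_MAX;
    }

    return true;
}
```

The point of the bias is that the common path only touches a driver-private counter; the shared page count is updated once per USHRT_MAX uses.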

[net-next 07/13] i40e/i40evf: Use length to determine if descriptor is done

2017-03-29 Thread Jeff Kirsher
From: Alexander Duyck 

This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.

Change-ID: Iebdf9cdb36c454ef092df27199b92ad09c374231
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 24 
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 24 
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 2ca8d13baea5..012e55354043 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1757,6 +1757,7 @@ static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
  * i40e_fetch_rx_buffer - Allocate skb and populate it
  * @rx_ring: rx descriptor ring to transact packets on
  * @rx_desc: descriptor containing info written by hardware
+ * @size: size of buffer to add to skb
  *
  * This function allocates an skb on the fly, and populates it with the page
  * data from the current receive descriptor, taking care to set up the skb
@@ -1766,13 +1767,9 @@ static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
 static inline
 struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
 union i40e_rx_desc *rx_desc,
-struct sk_buff *skb)
+struct sk_buff *skb,
+unsigned int size)
 {
-   u64 local_status_error_len =
-   le64_to_cpu(rx_desc->wb.qword1.status_error_len);
-   unsigned int size =
-   (local_status_error_len & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
-   I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
struct i40e_rx_buffer *rx_buffer;
struct page *page;
 
@@ -1890,6 +1887,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 
while (likely(total_rx_packets < budget)) {
union i40e_rx_desc *rx_desc;
+   unsigned int size;
u16 vlan_tag;
u8 rx_ptype;
u64 qword;
@@ -1906,19 +1904,21 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
/* status_error_len will always be zero for unused descriptors
 * because it's cleared in cleanup, and overlaps with hdr_addr
 * which is always zero because packet split isn't used, if the
-* hardware wrote DD then it will be non-zero
+* hardware wrote DD then the length will be non-zero
 */
-   if (!i40e_test_staterr(rx_desc,
-  BIT(I40E_RX_DESC_STATUS_DD_SHIFT)))
+   qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+   size = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
+  I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
+   if (!size)
break;
 
/* This memory barrier is needed to keep us from reading
-* any other fields out of the rx_desc until we know the
-* DD bit is set.
+* any other fields out of the rx_desc until we have
+* verified the descriptor has been written back.
 */
dma_rmb();
 
-   skb = i40e_fetch_rx_buffer(rx_ring, rx_desc, skb);
+   skb = i40e_fetch_rx_buffer(rx_ring, rx_desc, skb, size);
if (!skb)
break;
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index f1a99a8dc7ea..e41eb46b02fe 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1116,6 +1116,7 @@ static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
  * i40evf_fetch_rx_buffer - Allocate skb and populate it
  * @rx_ring: rx descriptor ring to transact packets on
  * @rx_desc: descriptor containing info written by hardware
+ * @size: size of buffer to add to skb
  *
  * This function allocates an skb on the fly, and populates it with the page
  * data from the current receive descriptor, taking care to set up the skb
@@ -1125,13 +1126,9 @@ static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
 static inline
 struct sk_buff *i40evf_fetch_rx_buffer(struct i40e_ring *rx_ring,
   union i40e_rx_desc *rx_desc,
-  struct sk_buff *skb)
+  struct sk_buff *skb,
+  unsigned int size)
 {
-   u6
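
The length-based completion test can be illustrated in isolation. This is a sketch, not the driver code: the SKETCH_ shift and mask values are assumptions chosen for illustration, the qwords are taken to be already in CPU byte order, and the dma_rmb() the driver issues after the check is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position and width of the packet buffer length field. */
#define SKETCH_RXD_QW1_LENGTH_PBUF_SHIFT 38
#define SKETCH_RXD_QW1_LENGTH_PBUF_MASK \
    (0x3FFFULL << SKETCH_RXD_QW1_LENGTH_PBUF_SHIFT)

/* Extract the packet length from a written-back status qword. */
static unsigned int sketch_rx_desc_size(uint64_t qword)
{
    return (unsigned int)((qword & SKETCH_RXD_QW1_LENGTH_PBUF_MASK) >>
                          SKETCH_RXD_QW1_LENGTH_PBUF_SHIFT);
}

/* Count descriptors that are ready, stopping at the first one whose
 * length is still zero or when the budget is used up.
 */
static unsigned int sketch_count_ready(const uint64_t *ring,
                                       unsigned int count,
                                       unsigned int budget)
{
    unsigned int n = 0;

    assert(ring != NULL);

    while (n < count && n < budget) {
        if (!sketch_rx_desc_size(ring[n]))
            break;  /* not written back yet */
        n++;
    }

    return n;
}
```

Since the size has to be read anyway, using it as the readiness test replaces the separate DD-bit check, which is the saving the commit message refers to.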

[net-next 04/13] i40e: Check for new arq elements before leaving the adminq subtask loop

2017-03-29 Thread Jeff Kirsher
From: Christopher N Bednarz 

Fix a case where we miss an arq element if a new one is added before we
enable interrupts and exit the arq subtask loop. This occurs frequently
with RDMA running on Windows VF and causes long delays that prevent SMB
from establishing connections.

Change-ID: I3e1c8b2b960c12857d9b8275bea2c1563674392e
Signed-off-by: Christopher N Bednarz 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 96bedb54701c..cdf36713f4d1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6519,9 +6519,11 @@ static void i40e_clean_adminq_subtask(struct i40e_pf *pf)
 opcode);
break;
}
-   } while (pending && (i++ < pf->adminq_work_limit));
+   } while (i++ < pf->adminq_work_limit);
+
+   if (i < pf->adminq_work_limit)
+   clear_bit(__I40E_ADMINQ_EVENT_PENDING, &pf->state);
 
-   clear_bit(__I40E_ADMINQ_EVENT_PENDING, &pf->state);
/* re-enable Admin queue interrupt cause */
val = rd32(hw, I40E_PFINT_ICR0_ENA);
val |=  I40E_PFINT_ICR0_ENA_ADMINQ_MASK;
-- 
2.12.0
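
The effect of the new loop condition can be seen in a simplified model. This is a sketch under assumptions, not the driver code: the event queue is reduced to a counter, and sketch_clean_adminq() is a hypothetical name. It returns true when the pending flag may be cleared, that is, when the loop stopped because the queue was empty rather than because the work limit was reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* *queued models the number of ARQ elements waiting to be processed. */
static bool sketch_clean_adminq(unsigned int *queued,
                                unsigned int work_limit)
{
    unsigned int i = 0;

    assert(queued != NULL);

    do {
        /* models the "no work" return of the ARQ clean call */
        if (*queued == 0)
            break;

        (*queued)--;    /* handle one event */
    } while (i++ < work_limit);

    /* only report "done" if the limit was not what stopped us */
    return i < work_limit;
}
```

When the limit is hit with events still queued, the flag stays set, so the next pass of the service task picks up the remaining elements instead of waiting for another interrupt.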



[net-next 11/13] i40e/i40evf: Add legacy-rx private flag to allow fallback to old Rx flow

2017-03-29 Thread Jeff Kirsher
From: Alexander Duyck 

This patch adds a control which will allow us to toggle into and out of the
legacy Rx mode.  The legacy Rx mode is what we currently do when performing
Rx.  As I make further changes what should happen is that the driver will
fall back to the behavior for Rx as of this patch should the "legacy-rx"
flag be set to on.

Change-ID: I0342998849bbb31351cce05f6e182c99174e7751
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |   1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   5 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h |   2 +
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 104 +
 4 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index aa9ac2833edf..421ea57128d3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -430,6 +430,7 @@ struct i40e_pf {
 #define I40E_FLAG_TEMP_LINK_POLLING BIT_ULL(55)
 #define I40E_FLAG_CLIENT_L2_CHANGE BIT_ULL(56)
 #define I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE BIT_ULL(57)
+#define I40E_FLAG_LEGACY_RX BIT_ULL(58)
 
/* Tracks features that are disabled due to hw limitations.
 * If a bit is set here, it means that the corresponding
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 9b2e9cef56a4..c0c1a0cdaa5b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -226,6 +226,7 @@ static const struct i40e_priv_flags i40e_gstrings_priv_flags[] = {
I40E_PRIV_FLAG("flow-director-atr", I40E_FLAG_FD_ATR_ENABLED, 0),
I40E_PRIV_FLAG("veb-stats", I40E_FLAG_VEB_STATS_ENABLED, 0),
I40E_PRIV_FLAG("hw-atr-eviction", I40E_FLAG_HW_ATR_EVICT_CAPABLE, 0),
+   I40E_PRIV_FLAG("legacy-rx", I40E_FLAG_LEGACY_RX, 0),
 };
 
 #define I40E_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40e_gstrings_priv_flags)
@@ -4055,6 +4056,7 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
}
 
 flags_complete:
+   /* check for flags that changed */
changed_flags ^= pf->flags;
 
/* Process any additional changes needed as a result of flag changes.
@@ -4095,7 +4097,8 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
/* Issue reset to cause things to take effect, as additional bits
 * are added we will need to create a mask of bits requiring reset
 */
-   if (changed_flags & I40E_FLAG_VEB_STATS_ENABLED)
+   if ((changed_flags & I40E_FLAG_VEB_STATS_ENABLED) ||
+   ((changed_flags & I40E_FLAG_LEGACY_RX) && netif_running(dev)))
i40e_do_reset(pf, BIT(__I40E_PF_RESET_REQUESTED));
 
return 0;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index b2b48511f457..e60cbfa7e769 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -222,6 +222,7 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_CLIENT_NEEDS_L2_PARAMS BIT(17)
 #define I40EVF_FLAG_PROMISC_ON BIT(18)
 #define I40EVF_FLAG_ALLMULTI_ON BIT(19)
+#define I40EVF_FLAG_LEGACY_RX  BIT(20)
 /* duplicates for common code */
 #define I40E_FLAG_FDIR_ATR_ENABLED 0
 #define I40E_FLAG_DCB_ENABLED  0
@@ -229,6 +230,7 @@ struct i40evf_adapter {
 #define I40E_FLAG_RX_CSUM_ENABLED  I40EVF_FLAG_RX_CSUM_ENABLED
 #define I40E_FLAG_WB_ON_ITR_CAPABLE I40EVF_FLAG_WB_ON_ITR_CAPABLE
 #define I40E_FLAG_OUTER_UDP_CSUM_CAPABLE I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE
+#define I40E_FLAG_LEGACY_RX I40EVF_FLAG_LEGACY_RX
/* flags for admin queue service task */
u32 aq_required;
 #define I40EVF_FLAG_AQ_ENABLE_QUEUES   BIT(0)
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 122efbd29a19..9bb2cc7dd4e4 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -63,6 +63,29 @@ static const struct i40evf_stats i40evf_gstrings_stats[] = {
 #define I40EVF_STATS_LEN(_dev) \
(I40EVF_GLOBAL_STATS_LEN + I40EVF_QUEUE_STATS_LEN(_dev))
 
+/* For now we have one and only one private flag and it is only defined
+ * when we have support for the SKIP_CPU_SYNC DMA attribute.  Instead
+ * of leaving all this code sitting around empty we will strip it unless
+ * our one private flag is actually available.
+ */
+struct i40evf_priv_flags {
+   char flag_string[ETH_GSTRING_LEN];
+   u32 flag;
+   bool read_only;
+};
+
+#define I40EVF_PRIV_FLAG(_name, _flag, _read_only) { \
+   .flag_st
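
The reset decision in the i40e_set_priv_flags hunk above can be sketched on its own. This is an illustration under assumptions, not the driver code: the bit position used for the VEB stats flag is hypothetical (only the legacy-rx position, bit 58, appears in the diff), and sketch_reset_needed() is an invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_VEB_STATS_ENABLED (1ULL << 0)   /* hypothetical bit */
#define SKETCH_FLAG_LEGACY_RX         (1ULL << 58)  /* bit 58, as in the diff */

/* Bit 58 does not fit in 32 bits, so the model keeps flags in 64 bits. */
static_assert(SKETCH_FLAG_LEGACY_RX > UINT32_MAX,
              "legacy-rx flag needs a 64-bit flags word");

/* Returns true when toggling the private flags calls for a PF reset. */
static bool sketch_reset_needed(uint64_t old_flags, uint64_t new_flags,
                                bool netif_running)
{
    /* XOR leaves only the bits that differ between old and new */
    uint64_t changed = old_flags ^ new_flags;

    if (changed & SKETCH_FLAG_VEB_STATS_ENABLED)
        return true;

    /* legacy-rx only needs a reset while the interface is up */
    return (changed & SKETCH_FLAG_LEGACY_RX) && netif_running;
}
```

Toggling legacy-rx on an interface that is down therefore takes effect on the next open without forcing a reset.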

[net-next 10/13] i40e/i40evf: Break i40e_fetch_rx_buffer up to allow for reuse of frag code

2017-03-29 Thread Jeff Kirsher
From: Alexander Duyck 

This patch is meant to clean up the code in preparation for us adding
support for build_skb.  Specifically we deconstruct i40e_fetch_rx_buffer
into several functions so that those functions can later be reused when we
add a path for build_skb.

Specifically with this change we split out the code for adding a page to an
existing skb.

Change-ID: Iab1efbab6b8b97cb60ab9fdd0be1d37a056a154d
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 138 --
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 138 --
 2 files changed, 130 insertions(+), 146 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index bba41ce08124..ebffca0cefac 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1687,61 +1687,23 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer)
  * @size: packet length from rx_desc
  *
  * This function will add the data contained in rx_buffer->page to the skb.
- * This is done either through a direct copy if the data in the buffer is
- * less than the skb header size, otherwise it will just attach the page as
- * a frag to the skb.
+ * It will just attach the page as a frag to the skb.
  *
- * The function will then update the page offset if necessary and return
- * true if the buffer can be reused by the adapter.
+ * The function will then update the page offset.
  **/
 static void i40e_add_rx_frag(struct i40e_ring *rx_ring,
 struct i40e_rx_buffer *rx_buffer,
 struct sk_buff *skb,
 unsigned int size)
 {
-   struct page *page = rx_buffer->page;
-   unsigned char *va = page_address(page) + rx_buffer->page_offset;
 #if (PAGE_SIZE < 8192)
unsigned int truesize = I40E_RXBUFFER_2048;
 #else
-   unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
+   unsigned int truesize = SKB_DATA_ALIGN(size);
 #endif
-   unsigned int pull_len;
-
-   if (unlikely(skb_is_nonlinear(skb)))
-   goto add_tail_frag;
-
-   /* will the data fit in the skb we allocated? if so, just
-* copy it as it is pretty small anyway
-*/
-   if (size <= I40E_RX_HDR_SIZE) {
-   memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
-
-   /* page is to be freed, increase pagecnt_bias instead of
-* decreasing page count.
-*/
-   rx_buffer->pagecnt_bias++;
-   return;
-   }
-
-   /* we need the header to contain the greater of either
-* ETH_HLEN or 60 bytes if the skb->len is less than
-* 60 for skb_pad.
-*/
-   pull_len = eth_get_headlen(va, I40E_RX_HDR_SIZE);
-
-   /* align pull length to size of long to optimize
-* memcpy performance
-*/
-   memcpy(__skb_put(skb, pull_len), va, ALIGN(pull_len, sizeof(long)));
-
-   /* update all of the pointers */
-   va += pull_len;
-   size -= pull_len;
 
-add_tail_frag:
-   skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
-   (unsigned long)va & ~PAGE_MASK, size, truesize);
+   skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
+   rx_buffer->page_offset, size, truesize);
 
/* page is being used so we must update the page offset */
 #if (PAGE_SIZE < 8192)
@@ -1781,45 +1743,66 @@ static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring,
 }
 
 /**
- * i40e_fetch_rx_buffer - Allocate skb and populate it
+ * i40e_construct_skb - Allocate skb and populate it
  * @rx_ring: rx descriptor ring to transact packets on
  * @rx_buffer: rx buffer to pull data from
  * @size: size of buffer to add to skb
  *
- * This function allocates an skb on the fly, and populates it with the page
- * data from the current receive descriptor, taking care to set up the skb
- * correctly, as well as handling calling the page recycle function if
- * necessary.
+ * This function allocates an skb.  It then populates it with the page
+ * data from the current receive descriptor, taking care to set up the
+ * skb correctly.
  */
-static inline
-struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
-struct i40e_rx_buffer *rx_buffer,
-struct sk_buff *skb,
-unsigned int size)
+static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
+ struct i40e_rx_buffer *rx_buffer,
+ unsigned int size)
 {
-   if (likely(!skb)) {
-   void *page_addr = page_address(rx_buffer->page) +
- rx_buffer->page_offset;
+   void
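
The truesize and page offset handling that i40e_add_rx_frag keeps after this split can be sketched as plain arithmetic. This is a model under assumptions, not the driver code: the 64-byte cache line stands in for what SKB_DATA_ALIGN rounds to, the compile-time PAGE_SIZE test is replaced by a runtime flag, and the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_CACHE_BYTES 64u  /* assumed cache line size */

/* Round size up to a whole number of cache lines. */
static unsigned int sketch_data_align(unsigned int size)
{
    return (size + SKETCH_CACHE_BYTES - 1) & ~(SKETCH_CACHE_BYTES - 1);
}

/* Compute where the next buffer in the page starts.
 * half_page_flip models the PAGE_SIZE < 8192 case, where the page is
 * split into two fixed halves and the offset toggles between them.
 */
static unsigned int sketch_next_offset(unsigned int page_offset,
                                       unsigned int truesize,
                                       bool half_page_flip)
{
    assert(truesize != 0);

    if (half_page_flip)
        return page_offset ^ truesize;  /* flip to the other half */

    return page_offset + truesize;      /* walk forward through the page */
}
```

With a 2048-byte truesize the offset alternates between 0 and 2048, while on larger pages each frame advances the offset by its aligned size.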

[net-next 01/13] i40evf: enforce descriptor write-back mechanism for VF

2017-03-29 Thread Jeff Kirsher
From: Preethi Banala 

The current driver mode is to use a write-back mechanism for the head
register which indicates transmit completions. The VF driver needs to be
able to work on hardware that exclusively uses descriptor write-back, so
change the default driver mode of operation to descriptor write-back for
VF. In our analysis, performance wasn't significantly different with
either write-back method.

Change-ID: Ia92e4ec77c2df8dc4515c71d53746d57d77759af
Signed-off-by: Preethi Banala 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  | 64 +++---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  | 14 -
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c|  4 --
 3 files changed, 7 insertions(+), 75 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 8915c5598d20..f1a99a8dc7ea 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -137,10 +137,7 @@ u32 i40evf_get_tx_pending(struct i40e_ring *ring, bool in_sw)
 {
u32 head, tail;
 
-   if (!in_sw)
-   head = i40e_get_head(ring);
-   else
-   head = ring->next_to_clean;
+   head = ring->next_to_clean;
tail = readl(ring->tail);
 
if (head != tail)
@@ -165,7 +162,6 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 {
u16 i = tx_ring->next_to_clean;
struct i40e_tx_buffer *tx_buf;
-   struct i40e_tx_desc *tx_head;
struct i40e_tx_desc *tx_desc;
unsigned int total_bytes = 0, total_packets = 0;
unsigned int budget = vsi->work_limit;
@@ -174,8 +170,6 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
tx_desc = I40E_TX_DESC(tx_ring, i);
i -= tx_ring->count;
 
-   tx_head = I40E_TX_DESC(tx_ring, i40e_get_head(tx_ring));
-
do {
struct i40e_tx_desc *eop_desc = tx_buf->next_to_watch;
 
@@ -186,8 +180,9 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
/* prevent any other reads prior to eop_desc */
read_barrier_depends();
 
-   /* we have caught up to head, no work left to do */
-   if (tx_head == tx_desc)
+   /* if the descriptor isn't done, no work yet to do */
+   if (!(eop_desc->cmd_type_offset_bsz &
+ cpu_to_le64(I40E_TX_DESC_DTYPE_DESC_DONE)))
break;
 
/* clear next_to_watch to prevent false hangs */
@@ -464,10 +459,6 @@ int i40evf_setup_tx_descriptors(struct i40e_ring *tx_ring)
 
/* round up to nearest 4K */
tx_ring->size = tx_ring->count * sizeof(struct i40e_tx_desc);
-   /* add u32 for head writeback, align after this takes care of
-* guaranteeing this is at least one cache line in size
-*/
-   tx_ring->size += sizeof(u32);
tx_ring->size = ALIGN(tx_ring->size, 4096);
tx_ring->desc = dma_alloc_coherent(dev, tx_ring->size,
   &tx_ring->dma, GFP_KERNEL);
@@ -2012,7 +2003,6 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
u16 i = tx_ring->next_to_use;
u32 td_tag = 0;
dma_addr_t dma;
-   u16 desc_count = 1;
 
if (tx_flags & I40E_TX_FLAGS_HW_VLAN) {
td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
@@ -2048,7 +2038,6 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
tx_desc++;
i++;
-   desc_count++;
 
if (i == tx_ring->count) {
tx_desc = I40E_TX_DESC(tx_ring, 0);
@@ -2070,7 +2059,6 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
tx_desc++;
i++;
-   desc_count++;
 
if (i == tx_ring->count) {
tx_desc = I40E_TX_DESC(tx_ring, 0);
@@ -2096,46 +2084,8 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
i40e_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
-   /* write last descriptor with EOP bit */
-   td_cmd |= I40E_TX_DESC_CMD_EOP;
-
-   /* We can OR these values together as they both are checked against
-* 4 below and at this point desc_count will be used as a boolean value
-* after this if/else block.
-*/
-   desc_count |= ++tx_ring->packet_stride;
-
-   /* Algorithm to optimize tail and RS bit setting:
-* if queue is stopped
-*  mark RS bit
-*  reset packet counter
-* else if xmit_more is supported and is true
-*  advance packet counter to 4
-*  reset desc_count to 0
-*
-* if desc_count >= 4
-*  mark RS bit
-*  reset packet counter
-* if desc_count > 0
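
Descriptor write-back completion, as used by the new check in i40e_clean_tx_irq, can be modelled with a plain array. This is a sketch under assumptions, not the driver code: the DESC_DONE value is assumed, every ring entry is treated as a one-descriptor packet, and the values are taken to be in CPU byte order (the driver compares against cpu_to_le64()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TX_DESC_DTYPE_DESC_DONE 0xFULL   /* assumed value */

/* Count descriptors the hardware has marked done, starting at
 * next_to_clean and wrapping at the end of the ring.
 */
static unsigned int sketch_count_done(const uint64_t *ring,
                                      unsigned int count,
                                      unsigned int next_to_clean,
                                      unsigned int budget)
{
    unsigned int i = next_to_clean;
    unsigned int cleaned = 0;

    assert(ring != NULL && count > 0 && next_to_clean < count);

    while (cleaned < budget && cleaned < count) {
        /* if the descriptor isn't done, no work yet to do */
        if (!(ring[i] & SKETCH_TX_DESC_DTYPE_DESC_DONE))
            break;

        cleaned++;
        if (++i == count)
            i = 0;
    }

    return cleaned;
}
```

Unlike head write-back, nothing outside the descriptor ring has to be read here, which is why the patch can drop the extra u32 from the ring allocation.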

[net-next 08/13] i40e/i40evf: Pull code for grabbing and syncing rx_buffer from fetch_buffer

2017-03-29 Thread Jeff Kirsher
From: Alexander Duyck 

This patch pulls the code responsible for fetching the Rx buffer and
synchronizing DMA into a function, specifically called i40e_get_rx_buffer.

The general idea is to allow for better code reuse by pulling this out of
i40e_fetch_rx_buffer.  We dropped a couple of prefetches since the time
between the prefetch being called and the data being accessed was too small
to be useful.

Change-ID: I4885fce4b2637dbedc8e16431169d23d3d7e79b9
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 58 ---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 58 ---
 2 files changed, 68 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 012e55354043..f2256d8c5e35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1754,9 +1754,35 @@ static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
 }
 
 /**
+ * i40e_get_rx_buffer - Fetch Rx buffer and synchronize data for use
+ * @rx_ring: rx descriptor ring to transact packets on
+ * @size: size of buffer to add to skb
+ *
+ * This function will pull an Rx buffer from the ring and synchronize it
+ * for use by the CPU.
+ */
+static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring,
+const unsigned int size)
+{
+   struct i40e_rx_buffer *rx_buffer;
+
+   rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];
+   prefetchw(rx_buffer->page);
+
+   /* we are reusing so sync this buffer for CPU use */
+   dma_sync_single_range_for_cpu(rx_ring->dev,
+ rx_buffer->dma,
+ rx_buffer->page_offset,
+ size,
+ DMA_FROM_DEVICE);
+
+   return rx_buffer;
+}
+
+/**
  * i40e_fetch_rx_buffer - Allocate skb and populate it
  * @rx_ring: rx descriptor ring to transact packets on
- * @rx_desc: descriptor containing info written by hardware
+ * @rx_buffer: rx buffer to pull data from
  * @size: size of buffer to add to skb
  *
  * This function allocates an skb on the fly, and populates it with the page
@@ -1766,19 +1792,13 @@ static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
  */
 static inline
 struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
-union i40e_rx_desc *rx_desc,
+struct i40e_rx_buffer *rx_buffer,
 struct sk_buff *skb,
 unsigned int size)
 {
-   struct i40e_rx_buffer *rx_buffer;
-   struct page *page;
-
-   rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];
-   page = rx_buffer->page;
-   prefetchw(page);
-
if (likely(!skb)) {
-   void *page_addr = page_address(page) + rx_buffer->page_offset;
+   void *page_addr = page_address(rx_buffer->page) +
+ rx_buffer->page_offset;
 
/* prefetch first cache line of first page */
prefetch(page_addr);
@@ -1794,21 +1814,8 @@ struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
rx_ring->rx_stats.alloc_buff_failed++;
return NULL;
}
-
-   /* we will be copying header into skb->data in
-* pskb_may_pull so it is in our interest to prefetch
-* it now to avoid a possible cache miss
-*/
-   prefetchw(skb->data);
}
 
-   /* we are reusing so sync this buffer for CPU use */
-   dma_sync_single_range_for_cpu(rx_ring->dev,
- rx_buffer->dma,
- rx_buffer->page_offset,
- size,
- DMA_FROM_DEVICE);
-
/* pull page into skb */
if (i40e_add_rx_frag(rx_ring, rx_buffer, size, skb)) {
/* hand second half of page back to the ring */
@@ -1886,6 +1893,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
bool failure = false;
 
while (likely(total_rx_packets < budget)) {
+   struct i40e_rx_buffer *rx_buffer;
union i40e_rx_desc *rx_desc;
unsigned int size;
u16 vlan_tag;
@@ -1918,7 +1926,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 */
dma_rmb();
 
-   skb = i40e_fetch_rx_buffer(rx_ring, rx_desc, skb, size);
+   rx_buffer = i40e_get_rx_buffer(rx_ring, size);
+
+   skb = i40e_fetch_rx_buffer(rx_ring, rx_buffer, skb, size);
if (!skb)

[net-next 12/13] i40e/i40evf: Change the way we limit the maximum frame size for Rx

2017-03-29 Thread Jeff Kirsher
From: Alexander Duyck 

This patch changes the way we handle the maximum frame size for the Rx
path.  Previously we were rounding up to 2K for a 1500 MTU and then bringing
the max frame size down to MTU plus a fixed amount.  With this patch
applied what we now do is limit the maximum frame to 1.5K minus the value
for NET_IP_ALIGN for standard MTU, and for any MTU greater than 1500 we
allow up to the maximum frame size.  This makes the behavior more
consistent with the other drivers such as igb which had similar logic.  In
addition it reduces the test matrix for MTU since we only have two max
frame sizes that are handled for Rx now.

Change-ID: I23a9d3c857e7df04b0ef28c64df63e659c013f3f
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c| 26 --
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|  4 +---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  |  4 +---
 drivers/net/ethernet/intel/i40evf/i40evf.h |  4 
 drivers/net/ethernet/intel/i40evf/i40evf_main.c| 16 -
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c| 14 
 6 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1dc02c5eee1c..1f89e416156d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2995,7 +2995,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 
ring->rx_buf_len = vsi->rx_buf_len;
 
-   rx_ctx.dbuff = ring->rx_buf_len >> I40E_RXQ_CTX_DBUFF_SHIFT;
+   rx_ctx.dbuff = DIV_ROUND_UP(ring->rx_buf_len,
+   BIT_ULL(I40E_RXQ_CTX_DBUFF_SHIFT));
 
rx_ctx.base = (ring->dma / 128);
rx_ctx.qlen = ring->count;
@@ -3075,17 +3076,18 @@ static int i40e_vsi_configure_rx(struct i40e_vsi *vsi)
int err = 0;
u16 i;
 
-   if (vsi->netdev && (vsi->netdev->mtu > ETH_DATA_LEN))
-   vsi->max_frame = vsi->netdev->mtu + ETH_HLEN
-  + ETH_FCS_LEN + VLAN_HLEN;
-   else
-   vsi->max_frame = I40E_RXBUFFER_2048;
-
-   vsi->rx_buf_len = I40E_RXBUFFER_2048;
-
-   /* round up for the chip's needs */
-   vsi->rx_buf_len = ALIGN(vsi->rx_buf_len,
-   BIT_ULL(I40E_RXQ_CTX_DBUFF_SHIFT));
+   if (!vsi->netdev || (vsi->back->flags & I40E_FLAG_LEGACY_RX)) {
+   vsi->max_frame = I40E_MAX_RXBUFFER;
+   vsi->rx_buf_len = I40E_RXBUFFER_2048;
+#if (PAGE_SIZE < 8192)
+   } else if (vsi->netdev->mtu <= ETH_DATA_LEN) {
+   vsi->max_frame = I40E_RXBUFFER_1536 - NET_IP_ALIGN;
+   vsi->rx_buf_len = I40E_RXBUFFER_1536 - NET_IP_ALIGN;
+#endif
+   } else {
+   vsi->max_frame = I40E_MAX_RXBUFFER;
+   vsi->rx_buf_len = I40E_RXBUFFER_2048;
+   }
 
/* set up individual rings */
for (i = 0; i < vsi->num_queue_pairs && !err; i++)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index eb733726637f..d6609deace57 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -117,10 +117,8 @@ enum i40e_dyn_idx_t {
 
 /* Supported Rx Buffer Sizes (a multiple of 128) */
 #define I40E_RXBUFFER_256   256
+#define I40E_RXBUFFER_1536  1536  /* 128B aligned standard Ethernet frame */
 #define I40E_RXBUFFER_2048  2048
-#define I40E_RXBUFFER_3072  3072   /* For FCoE MTU of 2158 */
-#define I40E_RXBUFFER_4096  4096
-#define I40E_RXBUFFER_8192  8192
 #define I40E_MAX_RXBUFFER   9728  /* largest size for single descriptor */
 
 /* NOTE: netdev_alloc_skb reserves up to 64 bytes, NET_IP_ALIGN means we
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index aba40edb0e2e..3bb4d732e467 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -104,10 +104,8 @@ enum i40e_dyn_idx_t {
 
 /* Supported Rx Buffer Sizes (a multiple of 128) */
 #define I40E_RXBUFFER_256   256
+#define I40E_RXBUFFER_1536  1536  /* 128B aligned standard Ethernet frame */
 #define I40E_RXBUFFER_2048  2048
-#define I40E_RXBUFFER_3072  3072   /* For FCoE MTU of 2158 */
-#define I40E_RXBUFFER_4096  4096
-#define I40E_RXBUFFER_8192  8192
 #define I40E_MAX_RXBUFFER   9728  /* largest size for single descriptor */
 
 /* NOTE: netdev_alloc_skb reserves up to 64 bytes, NET_IP_ALIGN means we
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h 
b/drivers/net/ethernet/intel/i40evf/i40evf.h
index e60cbfa7e769..d61ecf655091 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -72,10 +72,6 @@ struct i40e_vsi {
 #define I40EVF_MAX_RXD 4096
 #define I40EVF_MIN_RXD 64
 #de
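
The sizing logic in the hunks above can be sketched outside the driver as follows. This is only an illustration, not the i40e code: the buffer-size constants come from the diff, while DBUFF_SHIFT = 7 (128-byte units), NET_IP_ALIGN = 2, the PAGE_SIZE < 8192 assumption and all function names are assumptions made for the sketch. It shows why a plain right shift is replaced by DIV_ROUND_UP once the buffer length is no longer a multiple of 128.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the diff above. */
#define RXBUFFER_1536   1536
#define RXBUFFER_2048   2048
#define MAX_RXBUFFER    9728
#define ETH_DATA_LEN    1500

/* Assumptions for this sketch, not taken from the patch text. */
#define NET_IP_ALIGN    2
#define DBUFF_SHIFT     7   /* buffer length programmed in 128-byte units */

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct rx_cfg {
	uint32_t max_frame;
	uint32_t rx_buf_len;
};

/* Mirrors the new if/else chain, assuming PAGE_SIZE < 8192. */
static struct rx_cfg pick_rx_cfg(bool has_netdev, bool legacy_rx, uint32_t mtu)
{
	struct rx_cfg cfg;

	if (has_netdev && !legacy_rx && mtu <= ETH_DATA_LEN) {
		/* standard MTU: 1536-byte buffer minus the IP alignment pad */
		cfg.max_frame = RXBUFFER_1536 - NET_IP_ALIGN;
		cfg.rx_buf_len = RXBUFFER_1536 - NET_IP_ALIGN;
	} else {
		/* no netdev, legacy Rx, or jumbo MTU */
		cfg.max_frame = MAX_RXBUFFER;
		cfg.rx_buf_len = RXBUFFER_2048;
	}
	return cfg;
}

/* Old computation: truncates when len is not a multiple of 128. */
static uint32_t dbuff_shift(uint32_t len)
{
	return len >> DBUFF_SHIFT;
}

/* New computation: rounds up to the next 128-byte unit. */
static uint32_t dbuff_round_up(uint32_t len)
{
	return DIV_ROUND_UP(len, 1u << DBUFF_SHIFT);
}
```

With a 1534-byte buffer the shift yields 11 units (1408 bytes), which is smaller than the buffer, while the rounded-up value is 12 units (1536 bytes).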

[net-next 05/13] i40e: remove a useless goto statement

2017-03-29 Thread Jeff Kirsher
From: Jacob Keller 

The goto found here for when in MFP mode is pointless. It jumps to the
end of a series of if blocks. However, right after this statement is
a closing '}' for this if block, which will result in the program flow
going to the exact same location as the goto statement indicates. Thus,
regardless of whether we are in MFP mode, the program flow will resume
from the same location.

This arose due to various refactoring which did not notice that this
goto became essentially a no-op.

To properly understand this diff you will need to view a larger context
than is given by default.

Change-ID: I088f73c3831aa5c4e2281380c7a3ce605594300c
Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index cdf36713f4d1..1dc02c5eee1c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5167,10 +5167,6 @@ static int i40e_init_pf_dcb(struct i40e_pf *pf)
(hw->dcbx_status == I40E_DCBX_STATUS_DISABLED)) {
dev_info(&pf->pdev->dev,
 "DCBX offload is not supported or is disabled 
for this PF.\n");
-
-   if (pf->flags & I40E_FLAG_MFP_ENABLED)
-   goto out;
-
} else {
/* When status is not DISABLED then DCBX in FW */
pf->dcbx_cap = DCB_CAP_DCBX_LLD_MANAGED |
-- 
2.12.0
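
A stripped-down model of the control flow may help to see why the removed goto was a no-op. The function names and the "steps" counter are hypothetical stand-ins; only the shape of the if/else and the position of the label follow the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the old flow of i40e_init_pf_dcb(). */
static int flow_with_goto(bool dcbx_disabled, bool mfp_enabled)
{
	int steps = 0;

	if (dcbx_disabled) {
		steps += 1;           /* dev_info(...) */
		if (mfp_enabled)
			goto out;     /* label sits right after the if/else */
	} else {
		steps += 10;          /* DCBX-in-firmware setup */
	}
out:
	return steps;
}

/* Same flow with the goto removed, as in the patch. */
static int flow_without_goto(bool dcbx_disabled, bool mfp_enabled)
{
	int steps = 0;

	(void)mfp_enabled;            /* no longer influences the flow */
	if (dcbx_disabled)
		steps += 1;           /* dev_info(...) */
	else
		steps += 10;          /* DCBX-in-firmware setup */
	return steps;
}
```

Both variants return the same value for every combination of the two inputs, which is the point the commit message makes.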



[net-next 02/13] i40e: Clean up handling of private flags

2017-03-29 Thread Jeff Kirsher
From: Alexander Duyck 

This patch cleans up and addresses several issues in the way that i40e
handles private flags. Previously the code was choosing fixed bits and
trying to match them up with strings in a somewhat haphazard way. This
resulted in the possibility for adding a new bit and causing a mismatch as
the private flags are linear bits starting at 0, and the private flags in
the driver were split up over a group specific to the PF and a group that
was global.

What this change does is define an array of structs used to represent the
private flags. Contained within the structs are the bits necessary to know
which flags to set and/or clear depending on the state of the bit. By
doing this we can add new bits in the future with minimal overhead and
avoid creating possible mis-matches should we need to remove a flag based
on compile options.

Change-ID: Ia3214ab04f0ab2f70354ac0997a135f1d01b0acd
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |   8 --
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 189 +++--
 2 files changed, 112 insertions(+), 85 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index d7e84f99eb2d..f506e994861b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -91,14 +91,6 @@
 #define I40E_QUEUE_WAIT_RETRY_LIMIT10
 #define I40E_INT_NAME_STR_LEN  (IFNAMSIZ + 16)
 
-/* Ethtool Private Flags */
-#define I40E_PRIV_FLAGS_MFP_FLAG   BIT(0)
-#define I40E_PRIV_FLAGS_LINKPOLL_FLAG  BIT(1)
-#define I40E_PRIV_FLAGS_FD_ATR BIT(2)
-#define I40E_PRIV_FLAGS_VEB_STATS  BIT(3)
-#define I40E_PRIV_FLAGS_HW_ATR_EVICT   BIT(4)
-#define I40E_PRIV_FLAGS_TRUE_PROMISC_SUPPORT   BIT(5)
-
 #define I40E_NVM_VERSION_LO_SHIFT  0
 #define I40E_NVM_VERSION_LO_MASK   (0xff << I40E_NVM_VERSION_LO_SHIFT)
 #define I40E_NVM_VERSION_HI_SHIFT  12
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index c8c566a0a6c3..9b2e9cef56a4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -207,22 +207,36 @@ static const char i40e_gstrings_test[][ETH_GSTRING_LEN] = 
{
 
 #define I40E_TEST_LEN (sizeof(i40e_gstrings_test) / ETH_GSTRING_LEN)
 
-static const char i40e_priv_flags_strings[][ETH_GSTRING_LEN] = {
-   "MFP",
-   "LinkPolling",
-   "flow-director-atr",
-   "veb-stats",
-   "hw-atr-eviction",
+struct i40e_priv_flags {
+   char flag_string[ETH_GSTRING_LEN];
+   u64 flag;
+   bool read_only;
 };
 
-#define I40E_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40e_priv_flags_strings)
+#define I40E_PRIV_FLAG(_name, _flag, _read_only) { \
+   .flag_string = _name, \
+   .flag = _flag, \
+   .read_only = _read_only, \
+}
+
+static const struct i40e_priv_flags i40e_gstrings_priv_flags[] = {
+   /* NOTE: MFP setting cannot be changed */
+   I40E_PRIV_FLAG("MFP", I40E_FLAG_MFP_ENABLED, 1),
+   I40E_PRIV_FLAG("LinkPolling", I40E_FLAG_LINK_POLLING_ENABLED, 0),
+   I40E_PRIV_FLAG("flow-director-atr", I40E_FLAG_FD_ATR_ENABLED, 0),
+   I40E_PRIV_FLAG("veb-stats", I40E_FLAG_VEB_STATS_ENABLED, 0),
+   I40E_PRIV_FLAG("hw-atr-eviction", I40E_FLAG_HW_ATR_EVICT_CAPABLE, 0),
+};
+
+#define I40E_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40e_gstrings_priv_flags)
 
 /* Private flags with a global effect, restricted to PF 0 */
-static const char i40e_gl_priv_flags_strings[][ETH_GSTRING_LEN] = {
-   "vf-true-promisc-support",
+static const struct i40e_priv_flags i40e_gl_gstrings_priv_flags[] = {
+   I40E_PRIV_FLAG("vf-true-promisc-support",
+  I40E_FLAG_TRUE_PROMISC_SUPPORT, 0),
 };
 
-#define I40E_GL_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40e_gl_priv_flags_strings)
+#define I40E_GL_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40e_gl_gstrings_priv_flags)
 
 /**
  * i40e_partition_setting_complaint - generic complaint for MFP restriction
@@ -1660,12 +1674,18 @@ static void i40e_get_strings(struct net_device *netdev, 
u32 stringset,
/* BUG_ON(p - data != I40E_STATS_LEN * ETH_GSTRING_LEN); */
break;
case ETH_SS_PRIV_FLAGS:
-   memcpy(data, i40e_priv_flags_strings,
-  I40E_PRIV_FLAGS_STR_LEN * ETH_GSTRING_LEN);
-   data += I40E_PRIV_FLAGS_STR_LEN * ETH_GSTRING_LEN;
-   if (pf->hw.pf_id == 0)
-   memcpy(data, i40e_gl_priv_flags_strings,
-  I40E_GL_PRIV_FLAGS_STR_LEN * ETH_GSTRING_LEN);
+   for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++) {
+   snprintf(p, ETH_GSTRING_LEN, "%s",
+i40e_gstrings_priv_flags[i].flag_string);
+   p += ETH_GSTRING_LEN;
+   }
+  
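
The table-driven scheme described in the commit message can be sketched as below. The get/set part of the real patch is cut off above, so the two helpers are an assumption about how such a table would typically be consumed, and the FLAG_* bit positions are made up for the example (the real I40E_FLAG_* values differ).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical driver flag bits, scattered on purpose. */
#define FLAG_MFP          (1ULL << 3)
#define FLAG_LINK_POLLING (1ULL << 9)
#define FLAG_VEB_STATS    (1ULL << 17)

struct priv_flag {
	const char *name;
	uint64_t flag;
	bool read_only;
};

/* Entry i of this table is ethtool private-flag bit i. */
static const struct priv_flag priv_flags[] = {
	{ "MFP",         FLAG_MFP,          true  },
	{ "LinkPolling", FLAG_LINK_POLLING, false },
	{ "veb-stats",   FLAG_VEB_STATS,    false },
};

/* Driver flags -> linear ethtool bits. */
static uint32_t get_priv_flags(uint64_t drv_flags)
{
	uint32_t ret = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(priv_flags); i++)
		if (drv_flags & priv_flags[i].flag)
			ret |= 1u << i;
	return ret;
}

/* Linear ethtool bits -> driver flags; read-only entries are left alone. */
static uint64_t set_priv_flags(uint64_t drv_flags, uint32_t requested)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(priv_flags); i++) {
		if (priv_flags[i].read_only)
			continue;
		if (requested & (1u << i))
			drv_flags |= priv_flags[i].flag;
		else
			drv_flags &= ~priv_flags[i].flag;
	}
	return drv_flags;
}
```

Because the string and the driver bit live in the same array entry, adding or removing a flag cannot shift the name table out of step with the bit positions.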

[PATCH] net: netfilter: Replace explicit NULL comparison with ! operator

2017-03-29 Thread Arushi Singhal
Replace explicit NULL comparison with ! operator to simplify code.

Signed-off-by: Arushi Singhal 
---
 net/netfilter/ipvs/ip_vs_ctl.c |  8 ++---
 net/netfilter/ipvs/ip_vs_proto.c   |  8 ++---
 net/netfilter/nf_conntrack_broadcast.c |  2 +-
 net/netfilter/nf_conntrack_core.c  |  2 +-
 net/netfilter/nf_conntrack_ecache.c|  4 +--
 net/netfilter/nf_conntrack_helper.c|  4 +--
 net/netfilter/nf_conntrack_proto.c |  4 +--
 net/netfilter/nf_log.c |  2 +-
 net/netfilter/nf_nat_redirect.c|  2 +-
 net/netfilter/nf_tables_api.c  | 62 +-
 net/netfilter/nfnetlink_log.c  |  6 ++--
 net/netfilter/nfnetlink_queue.c|  8 ++---
 net/netfilter/nft_compat.c |  4 +--
 net/netfilter/nft_ct.c | 10 +++---
 net/netfilter/nft_dynset.c | 14 
 net/netfilter/nft_log.c| 14 
 net/netfilter/nft_lookup.c |  2 +-
 net/netfilter/nft_payload.c|  4 +--
 net/netfilter/nft_set_hash.c   |  4 +--
 net/netfilter/x_tables.c   |  8 ++---
 net/netfilter/xt_TCPMSS.c  |  4 +--
 net/netfilter/xt_addrtype.c|  2 +-
 net/netfilter/xt_connlimit.c   |  2 +-
 net/netfilter/xt_conntrack.c   |  2 +-
 net/netfilter/xt_hashlimit.c   |  4 +--
 net/netfilter/xt_recent.c  |  6 ++--
 26 files changed, 96 insertions(+), 96 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5aeb0dde6ccc..32daa0b3797e 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -983,7 +983,7 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct 
ip_vs_dest_user_kern *udest)
dest = ip_vs_lookup_dest(svc, udest->af, &daddr, dport);
rcu_read_unlock();
 
-   if (dest != NULL) {
+   if (dest) {
IP_VS_DBG(1, "%s(): dest already exists\n", __func__);
return -EEXIST;
}
@@ -994,7 +994,7 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct 
ip_vs_dest_user_kern *udest)
 */
dest = ip_vs_trash_get_dest(svc, udest->af, &daddr, dport);
 
-   if (dest != NULL) {
+   if (dest) {
IP_VS_DBG_BUF(3, "Get destination %s:%u from trash, "
  "dest->refcnt=%d, service %u/%s:%u\n",
  IP_VS_DBG_ADDR(udest->af, &daddr), ntohs(dport),
@@ -1299,7 +1299,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct 
ip_vs_service_user_kern *u,
 
 
  out_err:
-   if (svc != NULL) {
+   if (svc) {
ip_vs_unbind_scheduler(svc, sched);
ip_vs_service_free(svc);
}
@@ -2453,7 +2453,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user 
*user, unsigned int len)
 
switch (cmd) {
case IP_VS_SO_SET_ADD:
-   if (svc != NULL)
+   if (svc)
ret = -EEXIST;
else
ret = ip_vs_add_service(ipvs, &usvc, &svc);
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 8ae480715cea..6ee7fec2ef47 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -53,7 +53,7 @@ static int __used __init register_ip_vs_protocol(struct 
ip_vs_protocol *pp)
pp->next = ip_vs_proto_table[hash];
ip_vs_proto_table[hash] = pp;
 
-   if (pp->init != NULL)
+   if (pp->init)
pp->init(pp);
 
return 0;
@@ -77,7 +77,7 @@ register_ip_vs_proto_netns(struct netns_ipvs *ipvs, struct 
ip_vs_protocol *pp)
ipvs->proto_data_table[hash] = pd;
atomic_set(&pd->appcnt, 0); /* Init app counter */
 
-   if (pp->init_netns != NULL) {
+   if (pp->init_netns) {
int ret = pp->init_netns(ipvs, pd);
if (ret) {
/* unlink an free proto data */
@@ -102,7 +102,7 @@ static int unregister_ip_vs_protocol(struct ip_vs_protocol 
*pp)
for (; *pp_p; pp_p = &(*pp_p)->next) {
if (*pp_p == pp) {
*pp_p = pp->next;
-   if (pp->exit != NULL)
+   if (pp->exit)
pp->exit(pp);
return 0;
}
@@ -124,7 +124,7 @@ unregister_ip_vs_proto_netns(struct netns_ipvs *ipvs, 
struct ip_vs_proto_data *p
for (; *pd_p; pd_p = &(*pd_p)->next) {
if (*pd_p == pd) {
*pd_p = pd->next;
-   if (pd->pp->exit_netns != NULL)
+   if (pd->pp->exit_netns)
pd->pp->exit_netns(ipvs, pd);
kfree(pd);
return 0;
diff --git a/net/netfilter/nf_conntrack_broadcast.c 
b/net/netfilter/nf_conntrack_broadcast.c
index 4e99cca61612..a016d47e5a80 100644
--- a/net/netfilter/nf_conntrack_broadcast.c

[PATCH] net: ipv6: Removed unnecessary parenthesis

2017-03-29 Thread Arushi Singhal
Removed parentheses on the right hand side of assignment, as they are
not required. The following coccinelle script was used to fix this
issue:

@@
local idexpression id;
expression e;
@@

id =
-(
e
-)

Signed-off-by: Arushi Singhal 
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c 
b/net/ipv6/netfilter/nf_conntrack_reasm.c
index fdde76e8a16a..34dfdc2fb2bc 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -387,9 +387,9 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff 
*prev,  struct net_devic
return false;
 
/* Unfragmented part is taken from the first segment. */
-   payload_len = ((head->data - skb_network_header(head)) -
+   payload_len = (head->data - skb_network_header(head)) -
   sizeof(struct ipv6hdr) + fq->q.len -
-  sizeof(struct frag_hdr));
+  sizeof(struct frag_hdr);
if (payload_len > IPV6_MAXPLEN) {
net_dbg_ratelimited("nf_ct_frag6_reasm: payload len = %d\n",
payload_len);
-- 
2.11.0
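
As a small check of the claim that the outer parentheses are not required, the sketch below evaluates the payload length both ways. The helper names are hypothetical, unfrag_len stands for head->data - skb_network_header(head), and the two header sizes are the usual 40-byte IPv6 header and 8-byte fragment header written as plain int constants.

```c
#define IPV6_HDR_LEN 40   /* sizeof(struct ipv6hdr) */
#define FRAG_HDR_LEN  8   /* sizeof(struct frag_hdr) */

/* Original form: the whole right-hand side wrapped in parentheses. */
static int payload_len_parens(int unfrag_len, int q_len)
{
	int payload_len;

	payload_len = ((unfrag_len) -
		       IPV6_HDR_LEN + q_len -
		       FRAG_HDR_LEN);
	return payload_len;
}

/* Patched form: assignment has lower precedence than + and -. */
static int payload_len_plain(int unfrag_len, int q_len)
{
	int payload_len;

	payload_len = (unfrag_len) -
		      IPV6_HDR_LEN + q_len -
		      FRAG_HDR_LEN;
	return payload_len;
}
```

For an unfragmentable part of 48 bytes (IPv6 header plus fragment header) and 1000 bytes of reassembled data, both forms give 48 - 40 + 1000 - 8 = 1000.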



Re: [PATCH net-next] net: mpls: Update lfib_nlmsg_size to skip deleted nexthops

2017-03-29 Thread Robert Shearman

On 28/03/17 23:19, David Ahern wrote:

A recent commit skips nexthops in a route if the device has been
deleted. Update lfib_nlmsg_size accordingly.

Reported-by: Roopa Prabhu 
Signed-off-by: David Ahern 


Acked-by: Robert Shearman 


Re: [kernel-hardening] [PATCH net-next v6 06/11] seccomp,landlock: Handle Landlock events per process hierarchy

2017-03-29 Thread Djalal Harouni
On Wed, Mar 29, 2017 at 1:46 AM, Mickaël Salaün  wrote:
> The seccomp(2) syscall can be used by a task to apply a Landlock rule to
> itself. As a seccomp filter, a Landlock rule is enforced for the current
> task and all its future children. A rule is immutable and a task can
> only add new restricting rules to itself, forming a chain of rules.
>
> A Landlock rule is tied to a Landlock event. If the use of a kernel
> object is allowed by the other Linux security mechanisms (e.g. DAC,
> capabilities, other LSM), then a Landlock event related to this kind of
> object is triggered. The chain of rules for this event is then
> evaluated. Each rule returns a 32-bit value which can deny the use of a
> kernel object with a non-zero value. If every rule of the chain returns
> zero, then the use of the object is allowed.
>
> Changes since v5:
> * remove struct landlock_node and use a similar inheritance mechanism
>   as seccomp-bpf (requested by Andy Lutomirski)
> * rename SECCOMP_ADD_LANDLOCK_RULE to SECCOMP_APPEND_LANDLOCK_RULE
> * rename file manager.c to providers.c
> * add comments
> * typo and cosmetic fixes
>
> Changes since v4:
> * merge manager and seccomp patches
> * return -EFAULT in seccomp(2) when user_bpf_fd is null to easily check
>   if Landlock is supported
> * only allow a process with the global CAP_SYS_ADMIN to use Landlock
>   (will be lifted in the future)
> * add an early check to exit as soon as possible if the current process
>   does not have Landlock rules
>
> Changes since v3:
> * remove the hard link with seccomp (suggested by Andy Lutomirski and
>   Kees Cook):
>   * remove the cookie which could imply multiple evaluation of Landlock
> rules
>   * remove the origin field in struct landlock_data
> * remove documentation fix (merged upstream)
> * rename the new seccomp command to SECCOMP_ADD_LANDLOCK_RULE
> * internal renaming
> * split commit
> * new design to be able to inherit on the fly the parent rules
>
> Changes since v2:
> * Landlock programs can now be run without seccomp filter but for any
>   syscall (from the process) or interruption
> * move Landlock related functions and structs into security/landlock/*
>   (to manage cgroups as well)
> * fix seccomp filter handling: run Landlock programs for each of their
>   legitimate seccomp filter
> * properly clean up all seccomp results
> * cosmetic changes to ease the understanding
> * fix some ifdef
>
> Signed-off-by: Mickaël Salaün 
> Cc: Alexei Starovoitov 
> Cc: Andrew Morton 
> Cc: Andy Lutomirski 
> Cc: James Morris 
> Cc: Kees Cook 
> Cc: Serge E. Hallyn 
> Cc: Will Drewry 
> Link: 
> https://lkml.kernel.org/r/c10a503d-5e35-7785-2f3d-25ed8dd63...@digikod.net
> ---
>  include/linux/landlock.h  |  36 +++
>  include/linux/seccomp.h   |   8 ++
>  include/uapi/linux/seccomp.h  |   1 +
>  kernel/fork.c |  14 ++-
>  kernel/seccomp.c  |   8 ++
>  security/landlock/Makefile|   2 +-
>  security/landlock/hooks.c |  37 +++
>  security/landlock/hooks.h |   5 +
>  security/landlock/init.c  |   3 +-
>  security/landlock/providers.c | 232 
> ++
>  10 files changed, 342 insertions(+), 4 deletions(-)
>  create mode 100644 security/landlock/providers.c
>
> diff --git a/include/linux/landlock.h b/include/linux/landlock.h
> index 53013dc374fe..c40ee78e86e0 100644
> --- a/include/linux/landlock.h
> +++ b/include/linux/landlock.h
> @@ -12,6 +12,9 @@
>  #define _LINUX_LANDLOCK_H
>  #ifdef CONFIG_SECURITY_LANDLOCK
>
> +#include  /* _LANDLOCK_SUBTYPE_EVENT_LAST */
> +#include  /* atomic_t */
> +
>  /*
>   * This is not intended for the UAPI headers. Each userland software should use
>   * a static minimal version for the required features as explained in the
> @@ -19,5 +22,38 @@
>   */
>  #define LANDLOCK_VERSION 1
>
> +struct landlock_rule {
> +   atomic_t usage;
> +   struct landlock_rule *prev;
> +   struct bpf_prog *prog;
> +};
> +
> +/**
> + * struct landlock_events - Landlock event rules enforced on a thread
> + *
> + * This is used for low performance impact when forking a process. Instead of
> + * copying the full array and incrementing the usage of each entry, only
> + * create a pointer to &struct landlock_events and increment its usage. When
> + * appending a new rule, if &struct landlock_events is shared with other tasks,
> + * then duplicate it and append the rule to this new &struct landlock_events.
> + *
> + * @usage: reference count to manage the object lifetime. When a thread needs to
> + *         add Landlock rules and if @usage is greater than 1, then the thread
> + *         must duplicate &struct landlock_events to not change the children's
> + *         rules as well.
> + * @rules: array of non-NULL &struct landlock_rule pointers
> + */
> +struct landlock_events {
> +   atomic_t usage;
> +   struct landlock_rule *rules[_LANDLOCK_SUBTYPE_EVENT_LAST];
> +};
> +
> +void put_landlock_events(stru
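
The sharing scheme documented in the kernel-doc comment above (share on fork, duplicate on append when shared) can be sketched in userspace roughly as follows. This is an assumption-laden illustration, not the patch's implementation: a plain int replaces atomic_t, an integer id replaces the bpf_prog pointer, per-rule reference counts are left out, and EVENT_COUNT and the function names are invented.

```c
#include <stdlib.h>
#include <string.h>

#define EVENT_COUNT 2   /* hypothetical stand-in for _LANDLOCK_SUBTYPE_EVENT_LAST */

struct rule {
	struct rule *prev;   /* older rule in the chain */
	int id;              /* stand-in for struct bpf_prog *prog */
};

struct events {
	int usage;           /* plain int here; the patch uses atomic_t */
	struct rule *rules[EVENT_COUNT];
};

/* fork(): share the parent's rule set instead of copying it. */
static struct events *events_get(struct events *ev)
{
	if (ev)
		ev->usage++;
	return ev;
}

/* Append a rule for one event, duplicating the set first if it is shared. */
static struct events *events_append(struct events *ev, int event,
				    struct rule *r)
{
	if (!ev) {
		ev = calloc(1, sizeof(*ev));
		if (!ev)
			return NULL;
		ev->usage = 1;
	} else if (ev->usage > 1) {
		struct events *copy = malloc(sizeof(*copy));

		if (!copy)
			return NULL;
		memcpy(copy, ev, sizeof(*copy));
		copy->usage = 1;
		ev->usage--;     /* drop this task's reference to the shared set */
		ev = copy;
	}
	r->prev = ev->rules[event];   /* chain onto the inherited rules */
	ev->rules[event] = r;
	return ev;
}
```

After a fork the child only holds a counted pointer; the first append in the child produces a private copy whose new rule still chains back to the parent's rules, so the parent's view is unchanged.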

[PATCH] ezchip: nps_enet: check if napi has been completed

2017-03-29 Thread Vlad Zakharov
After a new NAPI_STATE_MISSED state was added to NAPI we can get into
this state and in such case we have to reschedule NAPI as some work is
still pending and we have to process it. napi_complete_done() function
returns false if we have to reschedule something (e.g. in case we were
in MISSED state) as the current polling has not been completed yet.

nps_enet driver hasn't been verifying the return value of
napi_complete_done() and has been forcibly enabling interrupts. That is
not correct as we should not enable interrupts before we have processed
all scheduled work. As a result we were getting trapped in the interrupt
handler chain as we were never able to disable ethernet
interrupts again.

So this patch makes nps_enet_poll() func verify return value of
napi_complete_done() and enable interrupts only in case all scheduled
work has been completed.

Signed-off-by: Vlad Zakharov 
---
 drivers/net/ethernet/ezchip/nps_enet.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ezchip/nps_enet.c 
b/drivers/net/ethernet/ezchip/nps_enet.c
index 992ebe9..f819843 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -189,11 +189,9 @@ static int nps_enet_poll(struct napi_struct *napi, int 
budget)
 
nps_enet_tx_handler(ndev);
work_done = nps_enet_rx_handler(ndev);
-   if (work_done < budget) {
+   if ((work_done < budget) && napi_complete_done(napi, work_done)) {
u32 buf_int_enable_value = 0;
 
-   napi_complete_done(napi, work_done);
-
/* set tx_done and rx_rdy bits */
buf_int_enable_value |= NPS_ENET_ENABLE << RX_RDY_SHIFT;
buf_int_enable_value |= NPS_ENET_ENABLE << TX_DONE_SHIFT;
-- 
2.7.4
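
The poll pattern the patch moves to can be modelled without any kernel headers. Everything below is a hypothetical stand-in (struct fake_napi, fake_napi_complete_done(), poll_fixed()); the only behaviour taken from the commit message is that the completion helper returns false when polling has to be rescheduled, and that interrupts must only be re-enabled when it returns true.

```c
#include <stdbool.h>

/* Hypothetical model of the NAPI state relevant here. */
struct fake_napi {
	bool missed;        /* models NAPI_STATE_MISSED */
	bool irq_enabled;   /* models the device interrupt enable bits */
	int  reschedules;
	int  complete_calls;
};

/* Stand-in for napi_complete_done(): false means "rescheduled". */
static bool fake_napi_complete_done(struct fake_napi *n)
{
	n->complete_calls++;
	if (n->missed) {
		n->missed = false;
		n->reschedules++;
		return false;
	}
	return true;
}

/* Shape of the fixed nps_enet_poll(): enable IRQs only on true completion. */
static int poll_fixed(struct fake_napi *n, int work_done, int budget)
{
	if ((work_done < budget) && fake_napi_complete_done(n))
		n->irq_enabled = true;
	return work_done;
}
```

In the missed case the interrupt stays masked and the poll runs again, which avoids the situation described above where interrupts were re-enabled while work was still pending.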



Re: [PATCH net-next 7/8] vhost_net: try batch dequing from skb array

2017-03-29 Thread Pankaj Gupta
Hi Jason,

> 
> On 2017-03-23 13:34, Jason Wang wrote:
> >
> >
> >>
> >>> +{
> >>> +if (rvq->rh != rvq->rt)
> >>> +goto out;
> >>> +
> >>> +rvq->rh = rvq->rt = 0;
> >>> +rvq->rt = skb_array_consume_batched_bh(rvq->rx_array, rvq->rxq,
> >>> +VHOST_RX_BATCH);
> >> A comment explaining why it is -bh would be helpful.
> >
> > Ok.
> >
> > Thanks
> 
> Thinking about this again, it looks like -bh is not needed in this case since
> no consumer runs in bh.

In that case, do we need the other variants of the "ptr_ring_consume_batched_*()"
functions?
Are we planning to use them in the future?

> 
> Thanks
> 
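
For readers following the rh/rt discussion, the batching idea in the quoted hunk can be sketched as below: a local array is refilled from the shared ring only when it has been fully handed out. The types, RX_BATCH and consume_batched() are hypothetical stand-ins (the latter for a batched consume helper such as the one quoted), so this is a model of the technique rather than the vhost_net code.

```c
#define RX_BATCH 4   /* hypothetical; stands in for VHOST_RX_BATCH */

/* Stand-in for the shared skb array: a simple consumable list of ints. */
struct src {
	const int *items;
	int len;
	int pos;
};

struct rx_queue {
	int batch[RX_BATCH];
	int head;   /* rh: next cached item to hand out */
	int tail;   /* rt: number of cached items */
};

/* Stand-in for a batched consume: copy up to n items, return the count. */
static int consume_batched(struct src *s, int *out, int n)
{
	int got = 0;

	while (got < n && s->pos < s->len)
		out[got++] = s->items[s->pos++];
	return got;
}

/* Next item, refilling the local batch only when it is empty; -1 if none. */
static int rx_next(struct rx_queue *q, struct src *s)
{
	if (q->head == q->tail) {
		q->head = 0;
		q->tail = consume_batched(s, q->batch, RX_BATCH);
		if (q->tail == 0)
			return -1;
	}
	return q->batch[q->head++];
}
```

The shared structure is touched once per batch instead of once per packet, which is where the saving comes from.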


Re: [PATCH net-next 7/8] vhost_net: try batch dequing from skb array

2017-03-29 Thread Jason Wang



On 2017-03-29 18:46, Pankaj Gupta wrote:

Hi Jason,


On 2017-03-23 13:34, Jason Wang wrote:



+{
+if (rvq->rh != rvq->rt)
+goto out;
+
+rvq->rh = rvq->rt = 0;
+rvq->rt = skb_array_consume_batched_bh(rvq->rx_array, rvq->rxq,
+VHOST_RX_BATCH);

A comment explaining why it is -bh would be helpful.

Ok.

Thanks

Thinking about this again, it looks like -bh is not needed in this case since
no consumer runs in bh.

In that case, do we need the other variants of the "ptr_ring_consume_batched_*()"
functions?
Are we planning to use them in the future?


I think we'd better keep them, since they serve as helpers. You can see
that not all the helpers in ptr_ring have real users, but they were
prepared for future use.


Thanks




Thanks





Re: [PATCH v2] net: veth: use new api ethtool_{get|set}_link_ksettings

2017-03-29 Thread Xin Long
On Wed, Mar 29, 2017 at 2:24 PM, Philippe Reynes  wrote:
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes 
> ---
> Changelog:
> v2:
> - avoid useless initialization to zero (thanks Xin Long)
>
Reviewed-by: Xin Long 

>  drivers/net/veth.c |   19 +++
>  1 files changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 8c39d6d..3171036 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -45,18 +45,13 @@ struct veth_priv {
> { "peer_ifindex" },
>  };
>
> -static int veth_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
> +static int veth_get_link_ksettings(struct net_device *dev,
> +  struct ethtool_link_ksettings *cmd)
>  {
> -   cmd->supported  = 0;
> -   cmd->advertising= 0;
> -   ethtool_cmd_speed_set(cmd, SPEED_10000);
> -   cmd->duplex = DUPLEX_FULL;
> -   cmd->port   = PORT_TP;
> -   cmd->phy_address= 0;
> -   cmd->transceiver= XCVR_INTERNAL;
> -   cmd->autoneg= AUTONEG_DISABLE;
> -   cmd->maxtxpkt   = 0;
> -   cmd->maxrxpkt   = 0;
> +   cmd->base.speed = SPEED_10000;
> +   cmd->base.duplex= DUPLEX_FULL;
> +   cmd->base.port  = PORT_TP;
> +   cmd->base.autoneg   = AUTONEG_DISABLE;
> return 0;
>  }
>
> @@ -95,12 +90,12 @@ static void veth_get_ethtool_stats(struct net_device *dev,
>  }
>
>  static const struct ethtool_ops veth_ethtool_ops = {
> -   .get_settings   = veth_get_settings,
> .get_drvinfo= veth_get_drvinfo,
> .get_link   = ethtool_op_get_link,
> .get_strings= veth_get_strings,
> .get_sset_count = veth_get_sset_count,
> .get_ethtool_stats  = veth_get_ethtool_stats,
> +   .get_link_ksettings = veth_get_link_ksettings,
>  };
>
>  static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
> --
> 1.7.4.4
>
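
A userspace sketch of the converted callback is given below to show why the explicit zero assignments could be dropped. It assumes that the caller hands in a zeroed structure, and it uses a cut-down, hypothetical struct instead of the real struct ethtool_link_ksettings; the speed is assumed to be SPEED_10000, and the numeric constants are the values from the Linux uapi ethtool header.

```c
#include <stdint.h>
#include <string.h>

/* Values as defined in the Linux uapi ethtool header. */
#define SPEED_10000      10000
#define DUPLEX_FULL      0x01
#define PORT_TP          0x00
#define AUTONEG_DISABLE  0x00

/* Cut-down, hypothetical stand-in for struct ethtool_link_ksettings. */
struct link_settings_base {
	uint32_t speed;
	uint8_t  duplex;
	uint8_t  port;
	uint8_t  phy_address;
	uint8_t  autoneg;
};

struct link_ksettings {
	struct link_settings_base base;
};

/* Shape of the new callback: only the non-default fields are written. */
static int sketch_get_link_ksettings(struct link_ksettings *cmd)
{
	cmd->base.speed   = SPEED_10000;
	cmd->base.duplex  = DUPLEX_FULL;
	cmd->base.port    = PORT_TP;
	cmd->base.autoneg = AUTONEG_DISABLE;
	return 0;
}

/* Stand-in for the caller, assumed to zero the structure first. */
static int sketch_core_get(struct link_ksettings *out)
{
	memset(out, 0, sizeof(*out));
	return sketch_get_link_ksettings(out);
}
```

Fields such as phy_address end up as zero without being assigned in the callback, which matches the v2 changelog note about avoiding useless initialization.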


Re: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif

2017-03-29 Thread Sven Eckelmann
On Wednesday, 29 March 2017 09:49:21 CEST Johannes Berg wrote:
> > But I could be completely wrong about it. It would therefore be
> > interesting for me to know who would be responsible to start the
> > queues when ieee80211_do_open rejected it for IBSS.
> 
> Well, once ieee80211_offchannel_return() is called, that should do the
> needful and end up in ieee80211_propagate_queue_wake().
> 
> Can you check what the IBSS vif's queues are (vif->hw_queue[...])?

I've just dumped the data in ieee80211_propagate_queue_wake and checked when 
the function returns. The test patch (sorry, really ugly debug printk stuff) 
is attached. The most interesting part is that

if (local->ops->wake_tx_queue)
	return;

evaluates to true. The rest of the function is therefore always skipped
for ath9k.

This was noticed when looking at the debug output:

root@lede:/# dmesg|grep ieee80211_propagate_queue_wake
[   20.865005] ieee80211_propagate_queue_wake:248 queue 
[   20.870839] ieee80211_propagate_queue_wake:248 queue 0001
[   20.876661] ieee80211_propagate_queue_wake:248 queue 0002
[   20.882487] ieee80211_propagate_queue_wake:248 queue 0003
[   21.794795] ieee80211_propagate_queue_wake:248 queue 
[   21.800629] ieee80211_propagate_queue_wake:248 queue 0001
[   21.806452] ieee80211_propagate_queue_wake:248 queue 0002
[   21.812278] ieee80211_propagate_queue_wake:248 queue 0003
[   21.830078] ieee80211_propagate_queue_wake:248 queue 
[   21.835918] ieee80211_propagate_queue_wake:248 queue 0001
[   21.841740] ieee80211_propagate_queue_wake:248 queue 0002
[   21.847566] ieee80211_propagate_queue_wake:248 queue 0003
[   23.320814] ieee80211_propagate_queue_wake:248 queue 
[   23.326643] ieee80211_propagate_queue_wake:248 queue 0001
[   23.332469] ieee80211_propagate_queue_wake:248 queue 0002
[   23.338294] ieee80211_propagate_queue_wake:248 queue 0003
[   41.930942] ieee80211_propagate_queue_wake:248 queue 
[   41.940709] ieee80211_propagate_queue_wake:248 queue 0002
[   46.949087] ieee80211_propagate_queue_wake:248 queue 
[   82.999021] ieee80211_propagate_queue_wake:248 queue 

Removing this is enough to fix the problem. And now you will probably say 
"hey, this is not my code". And this is the reason why I have now CC'ed the 
author of 80a83cfc434b ("mac80211: skip netdev queue control with software 
queuing"). This change in ieee80211_propagate_queue_wake is basically breaking 
the (delayed) startup of the ibss netdev queue [1] when the device was offchan 
during the ieee80211_do_open of the ibss interface.

Not sure whether removing it in ieee80211_propagate_queue_wake will have other 
odd side effects with software queuing. Maybe Michal Kazior can tell us if it 
is safe to remove it.

> However, I also don't understand the difference between encrypted and
> unencrypted here.

My best guess is timing. LEDE is not using wpa_supplicant when encryption is 
disabled.

Kind regards,
Sven

[1] https://lkml.kernel.org/r/1978424.XTv2Qph05K@bentobox

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 036fa1d..9a1079f 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -517,6 +517,10 @@ int ieee80211_do_open(struct wireless_dev *wdev, bool coming_up)
 	u32 changed = 0;
 	int res;
 	u32 hw_reconf_flags = 0;
+	const char *ifname = "unknown";
+
+	if (sdata->dev)
+		ifname = sdata->dev->name;
 
 	switch (sdata->vif.type) {
 	case NL80211_IFTYPE_WDS:
@@ -745,11 +749,14 @@ int ieee80211_do_open(struct wireless_dev *wdev, bool coming_up)
 	if (sdata->vif.type == NL80211_IFTYPE_MONITOR ||
 	sdata->vif.type == NL80211_IFTYPE_AP_VLAN) {
 		/* XXX: for AP_VLAN, actually track AP queues */
+		
+		printk("%s:%u netif_tx_start_all_queues %s\n", __func__, __LINE__, ifname);
 		netif_tx_start_all_queues(dev);
 	} else if (dev) {
 		unsigned long flags;
 		int n_acs = IEEE80211_NUM_ACS;
 		int ac;
+		int started = 0;
 
 		if (local->hw.queues < IEEE80211_NUM_ACS)
 			n_acs = 1;
@@ -762,11 +769,20 @@ int ieee80211_do_open(struct wireless_dev *wdev, bool coming_up)
 int ac_queue = sdata->vif.hw_queue[ac];
 
 if (local->queue_stop_reasons[ac_queue] == 0 &&
-skb_queue_empty(&local->pending[ac_queue]))
+skb_queue_empty(&local->pending[ac_queue])) {
+		//printk("%s:%u netif_start_subqueue type %u %s\n", __func__, __LINE__, sdata->vif.type, ifname);
 	netif_start_subqueue(dev, ac);
+	started = 1;
+} else {
+		printk("%s:%u NOT netif_start_subqueue type %u stop_reasons %d queue_empty %d %s\n", __func__, __LINE__, sdata->vif.type, local->queue_stop_reasons[ac_queue], skb_queue_empty(&local->pending[ac_queue]), ifname);
+}
 			}
+		} else {
+			printk("%s:%u NOT netif_start_subqueue type %u cab_queue %d stop_reasons %d queue_empty %d %s\n", __func__, __LINE__, sdata->vif.type, 

Re: [net-next 03/13] i40e: use register for XL722 control register read/write

2017-03-29 Thread Sergei Shtylyov

Hello!

On 3/29/2017 1:12 PM, Jeff Kirsher wrote:


From: Paul M Stillwell Jr 

The XL722 doesn't support the AQ command to read/write the control
register so enable it to bypass the check and use the direct read/write
method.

Change-ID: Iefecc737b57207485c90845af5989d5af518bf16
Signed-off-by: Paul M Stillwell Jr 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 8 ++--
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 8 ++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 95946f41002b..f9db95aa3a20 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -4963,7 +4963,9 @@ u32 i40e_read_rx_ctl(struct i40e_hw *hw, u32 reg_addr)
int retry = 5;
u32 val = 0;

-   use_register = (hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver < 5);
+   use_register = (((hw->aq.api_maj_ver == 1) &&
+   (hw->aq.api_min_ver < 5)) ||
+   (hw->mac.type == I40E_MAC_X722));


   The () around the right operand of the assignment are not really needed. Well,
neither are the ones around the operands of || and &&...



if (!use_register) {
 do_retry:
status = i40e_aq_rx_ctl_read_register(hw, reg_addr, &val, NULL);
@@ -5022,7 +5024,9 @@ void i40e_write_rx_ctl(struct i40e_hw *hw, u32 reg_addr, 
u32 reg_val)
bool use_register;
int retry = 5;

-   use_register = (hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver < 5);
+   use_register = (((hw->aq.api_maj_ver == 1) &&
+   (hw->aq.api_min_ver < 5)) ||
+   (hw->mac.type == I40E_MAC_X722));


   Same here...


if (!use_register) {
 do_retry:
status = i40e_aq_rx_ctl_write_register(hw, reg_addr,
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c 
b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 89dfdbca13db..626fbf1ead4d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -958,7 +958,9 @@ u32 i40evf_read_rx_ctl(struct i40e_hw *hw, u32 reg_addr)
int retry = 5;
u32 val = 0;

-   use_register = (hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver < 5);
+   use_register = (((hw->aq.api_maj_ver == 1) &&
+   (hw->aq.api_min_ver < 5)) ||
+   (hw->mac.type == I40E_MAC_X722));


   And here...


if (!use_register) {
 do_retry:
status = i40evf_aq_rx_ctl_read_register(hw, reg_addr,
@@ -1019,7 +1021,9 @@ void i40evf_write_rx_ctl(struct i40e_hw *hw, u32 
reg_addr, u32 reg_val)
bool use_register;
int retry = 5;

-   use_register = (hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver < 5);
+   use_register = (((hw->aq.api_maj_ver == 1) &&
+   (hw->aq.api_min_ver < 5)) ||
+   (hw->mac.type == I40E_MAC_X722));


   And here...


if (!use_register) {
 do_retry:
status = i40evf_aq_rx_ctl_write_register(hw, reg_addr,


MBR, Sergei
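
The remark about the parentheses can be checked with a small standalone
sketch. The function names and the is_x722 parameter are hypothetical
stand-ins for the hw->aq and hw->mac.type fields; only C operator precedence
is being illustrated here, not the driver itself.

```c
#include <stdbool.h>

/* Fully parenthesized form, mirroring the expression in the patch. */
bool use_register_parens(int maj, int min, bool is_x722)
{
	bool use_register;

	use_register = (((maj == 1) && (min < 5)) || (is_x722));
	return use_register;
}

/*
 * Reduced form: == and < bind tighter than &&, and || binds tighter
 * than =, so most of the parentheses can go. The pair around the &&
 * group is kept only because some compilers warn about && inside ||.
 */
bool use_register_plain(int maj, int min, bool is_x722)
{
	bool use_register;

	use_register = (maj == 1 && min < 5) || is_x722;
	return use_register;
}
```

Both functions return the same value for every input, so the choice between
them is purely a matter of style.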



Re: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif

2017-03-29 Thread Johannes Berg

> if (local->ops->wake_tx_queue)
>   return;
> 
> evaluates to true. The rest of the function is therefore always
> skipped for ath9k.

Ahh, yes, ok.

> Removing this is enough to fix the problem. And now you will probably
> say "hey, this is not my code". And this is the reason why I have now
> CC'ed the author of 80a83cfc434b ("mac80211: skip netdev queue
> control with software queuing"). This change in
> ieee80211_propagate_queue_wake is basically breaking 
> the (delayed) startup of the ibss netdev queue [1] when the device
> was offchan during the ieee80211_do_open of the ibss interface.
> 
> Not sure whether removing it in ieee80211_propagate_queue_wake will
> have other odd side effects with software queuing. Maybe Michal
> Kazior can tell us if it is safe to remove it.

No, it's the other way around.

Michal's patches correctly added a test for this to
__ieee80211_stop_queue(), the only missing thing is that this test
should also be in ieee80211_do_open() like this:

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 40813dd3301c..5bb0c5012819 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -718,7 +718,8 @@ int ieee80211_do_open(struct wireless_dev *wdev, bool 
coming_up)
ieee80211_recalc_ps(local);
 
if (sdata->vif.type == NL80211_IFTYPE_MONITOR ||
-   sdata->vif.type == NL80211_IFTYPE_AP_VLAN) {
+   sdata->vif.type == NL80211_IFTYPE_AP_VLAN ||
+   local->ops->wake_tx_queue) {
/* XXX: for AP_VLAN, actually track AP queues */
netif_tx_start_all_queues(dev);
} else if (dev) {

johannes
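
The interaction discussed in this thread can be modelled in a few lines of
plain C. This is only a toy sketch: the structs and functions (toy_local,
toy_do_open() and so on) are hypothetical and merely mimic the control flow
described above, not the real mac80211 code.

```c
#include <stdbool.h>

/* Toy stand-ins for ieee80211_local and a netdev; not the real structures. */
struct toy_local {
	bool has_wake_tx_queue;	/* driver implements ->wake_tx_queue */
	bool queues_stopped;	/* e.g. the device is currently off-channel */
};

struct toy_netdev {
	bool tx_started;
};

/* Mimics the early return in ieee80211_propagate_queue_wake(). */
void toy_propagate_queue_wake(const struct toy_local *local,
			      struct toy_netdev *dev)
{
	if (local->has_wake_tx_queue)
		return;		/* netdev queue control is skipped */
	dev->tx_started = true;
}

/* Open path; with_fix selects the extra wake_tx_queue test from the patch. */
void toy_do_open(const struct toy_local *local, struct toy_netdev *dev,
		 bool with_fix)
{
	if (with_fix && local->has_wake_tx_queue) {
		/* Software queuing: netdev queues are never stopped. */
		dev->tx_started = true;
	} else if (!local->queues_stopped) {
		dev->tx_started = true;
	}
	/* Otherwise the queues stay stopped until a later wake. */
}
```

Without the extra test, an interface opened while the queues are stopped
relies on a later wake that never reaches the netdev when wake_tx_queue is
set. With the test, the open path starts the queues directly.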


Re: [PATCH ethtool] ethtool: Support for configurable RSS hash function

2017-03-29 Thread Gal Pressman
On 27/03/2017 21:02, John W. Linville wrote:
> On Sat, Mar 25, 2017 at 02:50:47PM -0700, Jakub Kicinski wrote:
>> On Wed,  8 Mar 2017 16:03:51 +0200, Gal Pressman wrote:
>>> This ethtool patch adds support to set and get the current RSS hash
>>> function for the device through the new hfunc mask field in the
>>> ethtool_rxfh struct. Kernel supported hash function names are queried
>>> with ETHTOOL_GSTRINGS - each string is corresponding with a bit in hfunc
>>> mask according to its index in the string-set.
>>>
>>> Signed-off-by: Eyal Perry 
>>> Signed-off-by: Gal Pressman 
>>> Reviewed-by: Saeed Mahameed 
>> Hi John,
>>
>> It seems you have applied both my earlier patch with get support and
>> this:
>>
>> adbaa18b9bc1 ("ethtool: Support for configurable RSS hash function")
>> b932835d2302 ("ethtool: print hash function with ethtool 
>> -x|--show-rxfh-indir")
>>
>> Now we print the RSS function twice:
>>
>> RX flow hash indirection table for em4 with 4 RX ring(s):
>> 0:  [...]
>> RSS hash function: toeplitz  <--- from my adbaa18b9bc1
>> RSS hash key:
>> Operation not supported
>> RSS hash function:   <--- from this patch
>> toeplitz: on
>> xor: off
>> crc32: off
>>
>> Reverting my patch is probably the easiest way forward, although I find
>> it more concise and easier to parse in test scripts :)
> Thanks for pointing-out this issue! I apologize for my own confusion.
>
> As you suggest, I will be reverting your patch.
>
> Thanks,
>
> John
126464e4da18 ('Revert "ethtool: Support for configurable RSS hash function"')

Seems like you ended up reverting my patch instead of Jakub's?
We lost the set hfunc functionality.
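
For readers unfamiliar with the hfunc mask described in the patch
description, here is a minimal sketch of the index-to-bit mapping. The helper
names are hypothetical and this is not ethtool's code; it only assumes that
string number i in the string set corresponds to bit i of the mask, as the
commit message states.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Returns true if the hash function at string-set index idx is enabled. */
bool hfunc_enabled(uint32_t hfunc_mask, unsigned int idx)
{
	return idx < 32 && (hfunc_mask & (UINT32_C(1) << idx)) != 0;
}

/* Prints one "name: on|off" line per string, in the style quoted above. */
void print_hfuncs(const char *const names[], unsigned int count,
		  uint32_t hfunc_mask)
{
	unsigned int i;

	printf("RSS hash function:\n");
	for (i = 0; i < count; i++)
		printf("    %s: %s\n", names[i],
		       hfunc_enabled(hfunc_mask, i) ? "on" : "off");
}
```

With the names toeplitz, xor and crc32 at indices 0, 1 and 2, a mask of 0x1
reports only toeplitz as on.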


Re: [PATCH] tcp: possible race between tcp_done() and tcp_poll()

2017-03-29 Thread Sergei Shtylyov

Hello!

On 3/29/2017 8:22 AM, Seiichi Ikarashi wrote:


Similar to commit a4d258036ed9b2a1811.


   Commit citing is standardized: it should specify 12-digit (at least) SHA1 
and the commit summary line enclosed in ("").



Between receiving a packet and tcp_poll(), sk->sk_err is protected by memory
barriers but sk->sk_shutdown and sk->sk_state are not. So possibly,
POLLIN|POLLRDNORM|POLLRDHUP might not be set even when receiving a RST packet.

Signed-off-by: Seiichi Ikarashi 



There should be a --- line before the diffstat.


 net/ipv4/tcp.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)


[...]

MBR, Sergei



Re: [PATCH net-next 8/8] vhost_net: use lockless peeking for skb array during busy polling

2017-03-29 Thread Michael S. Tsirkin
On Tue, Mar 21, 2017 at 12:04:47PM +0800, Jason Wang wrote:
> For the socket that exports its skb array, we can use lockless polling
> to avoid touching spinlock during busy polling.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/net.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 53f09f2..41153a3 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -551,10 +551,13 @@ static int peek_head_len(struct vhost_net_virtqueue 
> *rvq, struct sock *sk)
>   return len;
>  }
>  
> -static int sk_has_rx_data(struct sock *sk)
> +static int sk_has_rx_data(struct vhost_net_virtqueue *rvq, struct sock *sk)
>  {
>   struct socket *sock = sk->sk_socket;
>  
> + if (rvq->rx_array)
> + return !__skb_array_empty(rvq->rx_array);
> +
>   if (sock->ops->peek_len)
>   return sock->ops->peek_len(sock);
>  

I don't see which patch adds __skb_array_empty.

> @@ -579,7 +582,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net 
> *net,
>   endtime = busy_clock() + vq->busyloop_timeout;
>  
>   while (vhost_can_busy_poll(&net->dev, endtime) &&
> -!sk_has_rx_data(sk) &&
> +!sk_has_rx_data(rvq, sk) &&
>  vhost_vq_avail_empty(&net->dev, vq))
>   cpu_relax();
>  
> -- 
> 2.7.4


Re: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif

2017-03-29 Thread Sven Eckelmann
On Wednesday, 29 March 2017 13:53:06 CEST Johannes Berg wrote:
[...]
> > Not sure whether removing it in ieee80211_propagate_queue_wake will
> > have other odd side effects with software queuing. Maybe Michal
> > Kazior can tell us if it is safe to remove it.
> 
> No, it's the other way around.
> 
> Michal's patches correctly added a test for this to
> __ieee80211_stop_queue(), the only missing thing is that this test
> should also be in ieee80211_do_open() like this:
> 
> diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
> index 40813dd3301c..5bb0c5012819 100644
> --- a/net/mac80211/iface.c
> +++ b/net/mac80211/iface.c
> @@ -718,7 +718,8 @@ int ieee80211_do_open(struct wireless_dev *wdev, bool 
> coming_up)
>   ieee80211_recalc_ps(local);
>  
>   if (sdata->vif.type == NL80211_IFTYPE_MONITOR ||
> - sdata->vif.type == NL80211_IFTYPE_AP_VLAN) {
> + sdata->vif.type == NL80211_IFTYPE_AP_VLAN ||
> + local->ops->wake_tx_queue) {
>   /* XXX: for AP_VLAN, actually track AP queues */
>   netif_tx_start_all_queues(dev);
>   } else if (dev) {

Yes, this also works.

Kind regards,
Sven



[PATCH v2] net: stmmac: dwmac-rk: Add handling for RGMII_ID/RXID/TXID

2017-03-29 Thread Wadim Egorov
ATM dwmac-rk will always set and enable its internal delay lines.
Using PHY internal delays in combination with the phy-mode
rgmii-id/rxid/txid was not possible. Only rgmii was supported.

Now we can disable rockchip's gmac delay lines and also use
rgmii-id/rxid/txid.

Tested only with a RK3288 based board.

Signed-off-by: Wadim Egorov 
---
Changes in v2: Added parentheses around both expressions in DELAY_ENABLE
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 53 ++
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index e5db6ac..f0df519 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -74,6 +74,10 @@ struct rk_priv_data {
 #define GRF_BIT(nr) (BIT(nr) | BIT(nr+16))
 #define GRF_CLR_BIT(nr) (BIT(nr+16))
 
+#define DELAY_ENABLE(soc, tx, rx) \
+   (((tx) ? soc##_GMAC_TXCLK_DLY_ENABLE : soc##_GMAC_TXCLK_DLY_DISABLE) | \
+((rx) ? soc##_GMAC_RXCLK_DLY_ENABLE : soc##_GMAC_RXCLK_DLY_DISABLE))
+
 #define RK3228_GRF_MAC_CON0 0x0900
 #define RK3228_GRF_MAC_CON1 0x0904
 
@@ -115,8 +119,7 @@ static void rk3228_set_to_rgmii(struct rk_priv_data 
*bsp_priv,
regmap_write(bsp_priv->grf, RK3228_GRF_MAC_CON1,
 RK3228_GMAC_PHY_INTF_SEL_RGMII |
 RK3228_GMAC_RMII_MODE_CLR |
-RK3228_GMAC_RXCLK_DLY_ENABLE |
-RK3228_GMAC_TXCLK_DLY_ENABLE);
+DELAY_ENABLE(RK3228, tx_delay, rx_delay));
 
regmap_write(bsp_priv->grf, RK3228_GRF_MAC_CON0,
 RK3228_GMAC_CLK_RX_DL_CFG(rx_delay) |
@@ -232,8 +235,7 @@ static void rk3288_set_to_rgmii(struct rk_priv_data 
*bsp_priv,
 RK3288_GMAC_PHY_INTF_SEL_RGMII |
 RK3288_GMAC_RMII_MODE_CLR);
regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON3,
-RK3288_GMAC_RXCLK_DLY_ENABLE |
-RK3288_GMAC_TXCLK_DLY_ENABLE |
+DELAY_ENABLE(RK3288, tx_delay, rx_delay) |
 RK3288_GMAC_CLK_RX_DL_CFG(rx_delay) |
 RK3288_GMAC_CLK_TX_DL_CFG(tx_delay));
 }
@@ -460,8 +462,7 @@ static void rk3366_set_to_rgmii(struct rk_priv_data 
*bsp_priv,
 RK3366_GMAC_PHY_INTF_SEL_RGMII |
 RK3366_GMAC_RMII_MODE_CLR);
regmap_write(bsp_priv->grf, RK3366_GRF_SOC_CON7,
-RK3366_GMAC_RXCLK_DLY_ENABLE |
-RK3366_GMAC_TXCLK_DLY_ENABLE |
+DELAY_ENABLE(RK3366, tx_delay, rx_delay) |
 RK3366_GMAC_CLK_RX_DL_CFG(rx_delay) |
 RK3366_GMAC_CLK_TX_DL_CFG(tx_delay));
 }
@@ -572,8 +573,7 @@ static void rk3368_set_to_rgmii(struct rk_priv_data 
*bsp_priv,
 RK3368_GMAC_PHY_INTF_SEL_RGMII |
 RK3368_GMAC_RMII_MODE_CLR);
regmap_write(bsp_priv->grf, RK3368_GRF_SOC_CON16,
-RK3368_GMAC_RXCLK_DLY_ENABLE |
-RK3368_GMAC_TXCLK_DLY_ENABLE |
+DELAY_ENABLE(RK3368, tx_delay, rx_delay) |
 RK3368_GMAC_CLK_RX_DL_CFG(rx_delay) |
 RK3368_GMAC_CLK_TX_DL_CFG(tx_delay));
 }
@@ -684,8 +684,7 @@ static void rk3399_set_to_rgmii(struct rk_priv_data 
*bsp_priv,
 RK3399_GMAC_PHY_INTF_SEL_RGMII |
 RK3399_GMAC_RMII_MODE_CLR);
regmap_write(bsp_priv->grf, RK3399_GRF_SOC_CON6,
-RK3399_GMAC_RXCLK_DLY_ENABLE |
-RK3399_GMAC_TXCLK_DLY_ENABLE |
+DELAY_ENABLE(RK3399, tx_delay, rx_delay) |
 RK3399_GMAC_CLK_RX_DL_CFG(rx_delay) |
 RK3399_GMAC_CLK_TX_DL_CFG(tx_delay));
 }
@@ -985,14 +984,29 @@ static int rk_gmac_powerup(struct rk_priv_data *bsp_priv)
return ret;
 
/*rmii or rgmii*/
-   if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RGMII) {
+   switch (bsp_priv->phy_iface) {
+   case PHY_INTERFACE_MODE_RGMII:
dev_info(dev, "init for RGMII\n");
bsp_priv->ops->set_to_rgmii(bsp_priv, bsp_priv->tx_delay,
bsp_priv->rx_delay);
-   } else if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RMII) {
+   break;
+   case PHY_INTERFACE_MODE_RGMII_ID:
+   dev_info(dev, "init for RGMII_ID\n");
+   bsp_priv->ops->set_to_rgmii(bsp_priv, 0, 0);
+   break;
+   case PHY_INTERFACE_MODE_RGMII_RXID:
+   dev_info(dev, "init for RGMII_RXID\n");
+   bsp_priv->ops->set_to_rgmii(bsp_priv, bsp_priv->tx_delay, 0);
+   break;
+   case PHY_INTERFACE_MODE_RGMII_TXID:
+   dev_info(dev, "init for RGMII_TXID\n");
+   bsp_priv->ops->set_to_rgmii(bsp_priv, 0, bsp_priv->rx_delay);
+ 
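
To make the DELAY_ENABLE macro easier to follow, here is a standalone sketch
of the same construct. The SoC prefix DEMO and its bit positions are made up
for illustration, and BIT() is redefined locally so that the snippet builds
outside the kernel. The only assumption taken from the patch is that a
non-zero delay value enables the corresponding delay line and zero disables
it.

```c
#include <stdint.h>

#define BIT(nr)			(UINT32_C(1) << (nr))
/* The upper halfword selects which lower-halfword bits get written. */
#define GRF_BIT(nr)		(BIT(nr) | BIT((nr) + 16))
#define GRF_CLR_BIT(nr)		(BIT((nr) + 16))

/* Hypothetical SoC "DEMO"; the bit positions are invented for this sketch. */
#define DEMO_GMAC_TXCLK_DLY_ENABLE	GRF_BIT(0)
#define DEMO_GMAC_TXCLK_DLY_DISABLE	GRF_CLR_BIT(0)
#define DEMO_GMAC_RXCLK_DLY_ENABLE	GRF_BIT(1)
#define DEMO_GMAC_RXCLK_DLY_DISABLE	GRF_CLR_BIT(1)

/* Same shape as the macro in the patch: soc## pastes the SoC prefix. */
#define DELAY_ENABLE(soc, tx, rx) \
	(((tx) ? soc##_GMAC_TXCLK_DLY_ENABLE : soc##_GMAC_TXCLK_DLY_DISABLE) | \
	 ((rx) ? soc##_GMAC_RXCLK_DLY_ENABLE : soc##_GMAC_RXCLK_DLY_DISABLE))

/* Register value for the given TX/RX delay settings (0 means "off"). */
uint32_t demo_delay_bits(int tx_delay, int rx_delay)
{
	return DELAY_ENABLE(DEMO, tx_delay, rx_delay);
}
```

The parentheses around each conditional expression matter: without them,
a ? b : c | d ? e : f would parse as a ? b : ((c | d) ? e : f), because |
binds tighter than ?:.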

[PATCH 2/8] netfilter: nfnl_cthelper: fix runtime expectation policy updates

2017-03-29 Thread Pablo Neira Ayuso
We only allow runtime updates of expectation policies for timeout and
maximum number of expectations, otherwise reject the update.

Signed-off-by: Pablo Neira Ayuso 
Acked-by: Liping Zhang 
---
 net/netfilter/nfnetlink_cthelper.c | 86 +-
 1 file changed, 84 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nfnetlink_cthelper.c 
b/net/netfilter/nfnetlink_cthelper.c
index 3cd41d105407..90f291e27eb1 100644
--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -256,6 +256,89 @@ nfnl_cthelper_create(const struct nlattr * const tb[],
 }
 
 static int
+nfnl_cthelper_update_policy_one(const struct nf_conntrack_expect_policy 
*policy,
+   struct nf_conntrack_expect_policy *new_policy,
+   const struct nlattr *attr)
+{
+   struct nlattr *tb[NFCTH_POLICY_MAX + 1];
+   int err;
+
+   err = nla_parse_nested(tb, NFCTH_POLICY_MAX, attr,
+  nfnl_cthelper_expect_pol);
+   if (err < 0)
+   return err;
+
+   if (!tb[NFCTH_POLICY_NAME] ||
+   !tb[NFCTH_POLICY_EXPECT_MAX] ||
+   !tb[NFCTH_POLICY_EXPECT_TIMEOUT])
+   return -EINVAL;
+
+   if (nla_strcmp(tb[NFCTH_POLICY_NAME], policy->name))
+   return -EBUSY;
+
+   new_policy->max_expected =
+   ntohl(nla_get_be32(tb[NFCTH_POLICY_EXPECT_MAX]));
+   new_policy->timeout =
+   ntohl(nla_get_be32(tb[NFCTH_POLICY_EXPECT_TIMEOUT]));
+
+   return 0;
+}
+
+static int nfnl_cthelper_update_policy_all(struct nlattr *tb[],
+  struct nf_conntrack_helper *helper)
+{
+   struct nf_conntrack_expect_policy new_policy[helper->expect_class_max + 
1];
+   struct nf_conntrack_expect_policy *policy;
+   int i, err;
+
+   /* Check first that all policy attributes are well-formed, so we don't
+* leave things in inconsistent state on errors.
+*/
+   for (i = 0; i < helper->expect_class_max + 1; i++) {
+
+   if (!tb[NFCTH_POLICY_SET + i])
+   return -EINVAL;
+
+   err = nfnl_cthelper_update_policy_one(&helper->expect_policy[i],
+ &new_policy[i],
+ tb[NFCTH_POLICY_SET + i]);
+   if (err < 0)
+   return err;
+   }
+   /* Now we can safely update them. */
+   for (i = 0; i < helper->expect_class_max + 1; i++) {
+   policy = (struct nf_conntrack_expect_policy *)
+   &helper->expect_policy[i];
+   policy->max_expected = new_policy[i].max_expected;
+   policy->timeout = new_policy[i].timeout;
+   }
+
+   return 0;
+}
+
+static int nfnl_cthelper_update_policy(struct nf_conntrack_helper *helper,
+  const struct nlattr *attr)
+{
+   struct nlattr *tb[NFCTH_POLICY_SET_MAX + 1];
+   unsigned int class_max;
+   int err;
+
+   err = nla_parse_nested(tb, NFCTH_POLICY_SET_MAX, attr,
+  nfnl_cthelper_expect_policy_set);
+   if (err < 0)
+   return err;
+
+   if (!tb[NFCTH_POLICY_SET_NUM])
+   return -EINVAL;
+
+   class_max = ntohl(nla_get_be32(tb[NFCTH_POLICY_SET_NUM]));
+   if (helper->expect_class_max + 1 != class_max)
+   return -EBUSY;
+
+   return nfnl_cthelper_update_policy_all(tb, helper);
+}
+
+static int
 nfnl_cthelper_update(const struct nlattr * const tb[],
 struct nf_conntrack_helper *helper)
 {
@@ -265,8 +348,7 @@ nfnl_cthelper_update(const struct nlattr * const tb[],
return -EBUSY;
 
if (tb[NFCTH_POLICY]) {
-   ret = nfnl_cthelper_parse_expect_policy(helper,
-   tb[NFCTH_POLICY]);
+   ret = nfnl_cthelper_update_policy(helper, tb[NFCTH_POLICY]);
if (ret < 0)
return ret;
}
-- 
2.1.4
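
The validate-first, commit-later structure of the update path can be shown
in isolation. The sketch below is not the kernel code: struct demo_policy,
the fixed scratch size and the "timeout must be non-zero" rule are all
hypothetical, chosen only to show that the live array stays untouched when
any entry is rejected, and that each slot receives its own validated values.

```c
#include <stddef.h>

struct demo_policy {
	unsigned int max_expected;
	unsigned int timeout;
};

/*
 * Two-phase update: validate every request into a scratch array first,
 * then copy entry i of the scratch array into entry i of the live array.
 * Returns 0 on success, -1 if any request is invalid (live is untouched).
 */
int demo_update_all(struct demo_policy *live, const struct demo_policy *req,
		    size_t n)
{
	struct demo_policy scratch[8];
	size_t i;

	if (n > 8)
		return -1;

	for (i = 0; i < n; i++) {
		if (req[i].timeout == 0)	/* hypothetical validity rule */
			return -1;
		scratch[i] = req[i];
	}

	/* Nothing can fail from here on, so the update is all-or-nothing. */
	for (i = 0; i < n; i++) {
		live[i].max_expected = scratch[i].max_expected;
		live[i].timeout = scratch[i].timeout;
	}
	return 0;
}
```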



[PATCH 1/8] netfilter: nfnl_cthelper: fix incorrect helper->expect_class_max

2017-03-29 Thread Pablo Neira Ayuso
From: Liping Zhang 

The helper->expect_class_max must be set to the total number of
expect_policy minus 1, since we will use the statement "if (class >
helper->expect_class_max)" to validate the CTA_EXPECT_CLASS attr in
ctnetlink_alloc_expect.

So for compatibility, set the helper->expect_class_max to the
NFCTH_POLICY_SET_NUM attr's value minus 1.

Also: it's invalid when the NFCTH_POLICY_SET_NUM attr's value is zero.
1. this will result in "expect_policy = kzalloc(0, GFP_KERNEL);";
2. we cannot set the helper->expect_class_max to a proper value.

So if nla_get_be32(tb[NFCTH_POLICY_SET_NUM]) is zero, report -EINVAL to
the userspace.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nfnetlink_cthelper.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/nfnetlink_cthelper.c 
b/net/netfilter/nfnetlink_cthelper.c
index de8782345c86..3cd41d105407 100644
--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -161,6 +161,7 @@ nfnl_cthelper_parse_expect_policy(struct 
nf_conntrack_helper *helper,
int i, ret;
struct nf_conntrack_expect_policy *expect_policy;
struct nlattr *tb[NFCTH_POLICY_SET_MAX+1];
+   unsigned int class_max;
 
ret = nla_parse_nested(tb, NFCTH_POLICY_SET_MAX, attr,
   nfnl_cthelper_expect_policy_set);
@@ -170,19 +171,18 @@ nfnl_cthelper_parse_expect_policy(struct 
nf_conntrack_helper *helper,
if (!tb[NFCTH_POLICY_SET_NUM])
return -EINVAL;
 
-   helper->expect_class_max =
-   ntohl(nla_get_be32(tb[NFCTH_POLICY_SET_NUM]));
-
-   if (helper->expect_class_max != 0 &&
-   helper->expect_class_max > NF_CT_MAX_EXPECT_CLASSES)
+   class_max = ntohl(nla_get_be32(tb[NFCTH_POLICY_SET_NUM]));
+   if (class_max == 0)
+   return -EINVAL;
+   if (class_max > NF_CT_MAX_EXPECT_CLASSES)
return -EOVERFLOW;
 
expect_policy = kzalloc(sizeof(struct nf_conntrack_expect_policy) *
-   helper->expect_class_max, GFP_KERNEL);
+   class_max, GFP_KERNEL);
if (expect_policy == NULL)
return -ENOMEM;
 
-   for (i=0; i<helper->expect_class_max; i++) {
+   for (i = 0; i < class_max; i++) {
if (!tb[NFCTH_POLICY_SET+i])
goto err;
 
@@ -191,6 +191,8 @@ nfnl_cthelper_parse_expect_policy(struct 
nf_conntrack_helper *helper,
if (ret < 0)
goto err;
}
+
+   helper->expect_class_max = class_max - 1;
helper->expect_policy = expect_policy;
return 0;
 err:
@@ -377,10 +379,10 @@ nfnl_cthelper_dump_policy(struct sk_buff *skb,
goto nla_put_failure;
 
if (nla_put_be32(skb, NFCTH_POLICY_SET_NUM,
-htonl(helper->expect_class_max)))
+htonl(helper->expect_class_max + 1)))
goto nla_put_failure;
 
-   for (i=0; i<helper->expect_class_max; i++) {
+   for (i = 0; i < helper->expect_class_max + 1; i++) {
nest_parms2 = nla_nest_start(skb,
(NFCTH_POLICY_SET+i) | NLA_F_NESTED);
if (nest_parms2 == NULL)
-- 
2.1.4
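
The off-by-one convention described in the commit message can be sketched as
follows. The function names are hypothetical and the limit of 4 classes is an
assumption made for the example; the point is only that the stored value is
the highest valid class index, that is, the user-supplied count minus 1.

```c
#include <errno.h>

#define DEMO_MAX_EXPECT_CLASSES 4	/* assumed limit, for illustration */

/*
 * Converts a user-supplied policy count into the stored "highest valid
 * class index". Returns 0 on success or a negative errno value; on error
 * *expect_class_max is left unchanged.
 */
int demo_set_class_max(unsigned int count, unsigned int *expect_class_max)
{
	if (count == 0)
		return -EINVAL;
	if (count > DEMO_MAX_EXPECT_CLASSES)
		return -EOVERFLOW;

	*expect_class_max = count - 1;
	return 0;
}

/* Mirrors the "class > expect_class_max" check quoted in the commit log. */
int demo_class_valid(unsigned int class, unsigned int expect_class_max)
{
	return class <= expect_class_max;
}
```

With a count of 2, the stored maximum is 1, so classes 0 and 1 are accepted
and class 2 is rejected.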



[PATCH 0/8] Netfilter fixes for net

2017-03-29 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains a rather large update with Netfilter
fixes, specifically targeted to incorrect RCU usage in several spots and
the userspace conntrack helper infrastructure (nfnetlink_cthelper),
more specifically they are:

1) expect_class_max is incorrectly set via cthelper, as in-kernel semantics
   mandate that this represents the number of expectation classes minus 1.
   Patch from Liping Zhang.

2) Expectation policy updates via cthelper are currently broken for several
   reasons: This code allows illegal changes in the policy such as changing
   the number of expectation classes, it is leaking the updated policy and
   such update occurs with no RCU protection at all. Fix this by adding a
   new nfnl_cthelper_update_policy() that describes what is really legal on
   the update path.

3) Fix several memory leaks in cthelper, from Jeffy Chen.

4) synchronize_rcu() is missing in the removal path of several modules,
   this may lead to races since a CPU may still be running code that has
   just been removed. Also from Liping Zhang.

5) Don't use the helper hashtable from cthelper, it is not safe to walk
   over those bits without the helper mutex. Fix this by introducing a
   new independent list for userspace helpers. From Liping Zhang.

6) nf_ct_extend_unregister() needs synchronize_rcu() to make sure no
   packets are walking on any conntrack extension that is gone after
   module removal, again from Liping.

7) nf_nat_snmp may crash if we fail to unregister the helper due to
   accidental leftover code, from Gao Feng.

8) Fix leak in nfnetlink_queue with secctx support, from Liping Zhang.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit db7f00b8dba6d687b6ab1f2e9309acfd214fcb4b:

  tcp: tcp_get_info() should read tcp_time_stamp later (2017-03-16 21:37:13 
-0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 77c1c03c5b8ef28e55bb0aff29b1e006037ca645:

  netfilter: nfnetlink_queue: fix secctx memory leak (2017-03-29 12:20:50 +0200)


Gao Feng (1):
  netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register

Jeffy Chen (1):
  netfilter: nfnl_cthelper: Fix memory leak

Liping Zhang (5):
  netfilter: nfnl_cthelper: fix incorrect helper->expect_class_max
  netfilter: invoke synchronize_rcu after set the _hook_ to NULL
  netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
  netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister
  netfilter: nfnetlink_queue: fix secctx memory leak

Pablo Neira Ayuso (1):
  netfilter: nfnl_cthelper: fix runtime expectation policy updates

 net/ipv4/netfilter/nf_nat_snmp_basic.c |  20 +--
 net/netfilter/nf_conntrack_ecache.c|   2 +
 net/netfilter/nf_conntrack_extend.c|  13 +-
 net/netfilter/nf_conntrack_netlink.c   |   1 +
 net/netfilter/nf_nat_core.c|   2 +
 net/netfilter/nfnetlink_cthelper.c | 287 +
 net/netfilter/nfnetlink_cttimeout.c|   2 +-
 net/netfilter/nfnetlink_queue.c|   9 +-
 8 files changed, 206 insertions(+), 130 deletions(-)


[PATCH 5/8] netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table

2017-03-29 Thread Pablo Neira Ayuso
From: Liping Zhang 

The nf_ct_helper_hash table is protected by nf_ct_helper_mutex, while
nfct_helper operation is protected by nfnl_lock(NFNL_SUBSYS_CTHELPER).
So it's possible that one CPU is walking the nf_ct_helper_hash for
cthelper add/get/del, another cpu is doing nf_conntrack_helpers_unregister
at the same time. This is dangerous, and may cause a use-after-free error.

Note, delete operation will flush all cthelpers added via nfnetlink, so
using RCU for protection is not easy.

Now introduce a dummy list to record all the cthelpers added via
nfnetlink, then we can walk the dummy list instead of walking the
nf_ct_helper_hash. Also, keep nfnl_cthelper_dump_table unchanged, it
may be invoked without nfnl_lock(NFNL_SUBSYS_CTHELPER) held.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nfnetlink_cthelper.c | 177 +
 1 file changed, 81 insertions(+), 96 deletions(-)

diff --git a/net/netfilter/nfnetlink_cthelper.c 
b/net/netfilter/nfnetlink_cthelper.c
index 2b987d2a77bc..d45558178da5 100644
--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -32,6 +32,13 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Pablo Neira Ayuso ");
 MODULE_DESCRIPTION("nfnl_cthelper: User-space connection tracking helpers");
 
+struct nfnl_cthelper {
+   struct list_head list;
+   struct nf_conntrack_helper  helper;
+};
+
+static LIST_HEAD(nfnl_cthelper_list);
+
 static int
 nfnl_userspace_cthelper(struct sk_buff *skb, unsigned int protoff,
struct nf_conn *ct, enum ip_conntrack_info ctinfo)
@@ -205,14 +212,16 @@ nfnl_cthelper_create(const struct nlattr * const tb[],
 struct nf_conntrack_tuple *tuple)
 {
struct nf_conntrack_helper *helper;
+   struct nfnl_cthelper *nfcth;
int ret;
 
if (!tb[NFCTH_TUPLE] || !tb[NFCTH_POLICY] || !tb[NFCTH_PRIV_DATA_LEN])
return -EINVAL;
 
-   helper = kzalloc(sizeof(struct nf_conntrack_helper), GFP_KERNEL);
-   if (helper == NULL)
+   nfcth = kzalloc(sizeof(*nfcth), GFP_KERNEL);
+   if (nfcth == NULL)
return -ENOMEM;
+   helper = &nfcth->helper;
 
ret = nfnl_cthelper_parse_expect_policy(helper, tb[NFCTH_POLICY]);
if (ret < 0)
@@ -249,11 +258,12 @@ nfnl_cthelper_create(const struct nlattr * const tb[],
if (ret < 0)
goto err2;
 
+   list_add_tail(&nfcth->list, &nfnl_cthelper_list);
return 0;
 err2:
kfree(helper->expect_policy);
 err1:
-   kfree(helper);
+   kfree(nfcth);
return ret;
 }
 
@@ -379,7 +389,8 @@ static int nfnl_cthelper_new(struct net *net, struct sock 
*nfnl,
const char *helper_name;
struct nf_conntrack_helper *cur, *helper = NULL;
struct nf_conntrack_tuple tuple;
-   int ret = 0, i;
+   struct nfnl_cthelper *nlcth;
+   int ret = 0;
 
if (!tb[NFCTH_NAME] || !tb[NFCTH_TUPLE])
return -EINVAL;
@@ -390,31 +401,22 @@ static int nfnl_cthelper_new(struct net *net, struct sock 
*nfnl,
if (ret < 0)
return ret;
 
-   rcu_read_lock();
-   for (i = 0; i < nf_ct_helper_hsize && !helper; i++) {
-   hlist_for_each_entry_rcu(cur, &nf_ct_helper_hash[i], hnode) {
+   list_for_each_entry(nlcth, &nfnl_cthelper_list, list) {
+   cur = &nlcth->helper;
 
-   /* skip non-userspace conntrack helpers. */
-   if (!(cur->flags & NF_CT_HELPER_F_USERSPACE))
-   continue;
+   if (strncmp(cur->name, helper_name, NF_CT_HELPER_NAME_LEN))
+   continue;
 
-   if (strncmp(cur->name, helper_name,
-   NF_CT_HELPER_NAME_LEN) != 0)
-   continue;
+   if ((tuple.src.l3num != cur->tuple.src.l3num ||
+tuple.dst.protonum != cur->tuple.dst.protonum))
+   continue;
 
-   if ((tuple.src.l3num != cur->tuple.src.l3num ||
-tuple.dst.protonum != cur->tuple.dst.protonum))
-   continue;
+   if (nlh->nlmsg_flags & NLM_F_EXCL)
+   return -EEXIST;
 
-   if (nlh->nlmsg_flags & NLM_F_EXCL) {
-   ret = -EEXIST;
-   goto err;
-   }
-   helper = cur;
-   break;
-   }
+   helper = cur;
+   break;
}
-   rcu_read_unlock();
 
if (helper == NULL)
ret = nfnl_cthelper_create(tb, &tuple);
@@ -422,9 +424,6 @@ static int nfnl_cthelper_new(struct net *net, struct sock 
*nfnl,
ret = nfnl_cthelper_update(tb, helper);
 
return ret;
-err:
-   rcu_read_unlock();
-  
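
The idea of wrapping each helper in a private list entry can be sketched in
userspace C. Everything below is hypothetical (toy_helper, toy_cthelper,
toy_find) and uses a hand-rolled singly linked list instead of the kernel's
list_head; it only illustrates that lookups walk the private list and that
the wrapper can be recovered from the embedded helper.

```c
#include <stddef.h>
#include <string.h>

struct toy_helper {
	char name[16];
};

/* Wrapper owning the helper plus a link in a private, singly linked list. */
struct toy_cthelper {
	struct toy_cthelper *next;
	struct toy_helper helper;
};

/* Recover the wrapper from a pointer to its embedded helper. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk only the private list; helpers registered elsewhere are never seen. */
struct toy_helper *toy_find(struct toy_cthelper *head, const char *name)
{
	struct toy_cthelper *cur;

	for (cur = head; cur != NULL; cur = cur->next)
		if (strncmp(cur->helper.name, name,
			    sizeof(cur->helper.name)) == 0)
			return &cur->helper;
	return NULL;
}
```

Because the list contains only entries created through this interface, a
single lock covering that interface is enough to protect the walk.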

[PATCH 4/8] netfilter: invoke synchronize_rcu after set the _hook_ to NULL

2017-03-29 Thread Pablo Neira Ayuso
From: Liping Zhang 

Otherwise, another CPU may access the invalid pointer. For example:
      CPU0                    CPU1
        -                     rcu_read_lock();
        -                     pfunc = _hook_;
  _hook_ = NULL;                -
  mod unload                    -
        -                     pfunc(); // invalid, panic
        -                     rcu_read_unlock();

So we must call synchronize_rcu() to wait for the RCU readers to finish.

Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
by later nf_conntrack_helper_unregister, but I'm inclined to add an
explicit synchronize_rcu after setting the nf_nat_snmp_hook to NULL. Depending
on such obscure assumptions is not a good idea.

Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object,
so in cttimeout_exit, invoking rcu_barrier() is not necessary at all,
remove it too.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 1 +
 net/netfilter/nf_conntrack_ecache.c| 2 ++
 net/netfilter/nf_conntrack_netlink.c   | 1 +
 net/netfilter/nf_nat_core.c| 2 ++
 net/netfilter/nfnetlink_cttimeout.c| 2 +-
 5 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c 
b/net/ipv4/netfilter/nf_nat_snmp_basic.c
index c9b52c361da2..5a8f7c360887 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
@@ -1304,6 +1304,7 @@ static int __init nf_nat_snmp_basic_init(void)
 static void __exit nf_nat_snmp_basic_fini(void)
 {
RCU_INIT_POINTER(nf_nat_snmp_hook, NULL);
+   synchronize_rcu();
nf_conntrack_helper_unregister(&snmp_trap_helper);
 }
 
diff --git a/net/netfilter/nf_conntrack_ecache.c 
b/net/netfilter/nf_conntrack_ecache.c
index da9df2d56e66..22fc32143e9c 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -290,6 +290,7 @@ void nf_conntrack_unregister_notifier(struct net *net,
BUG_ON(notify != new);
RCU_INIT_POINTER(net->ct.nf_conntrack_event_cb, NULL);
mutex_unlock(&nf_ct_ecache_mutex);
+   /* synchronize_rcu() is called from ctnetlink_exit. */
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_unregister_notifier);
 
@@ -326,6 +327,7 @@ void nf_ct_expect_unregister_notifier(struct net *net,
BUG_ON(notify != new);
RCU_INIT_POINTER(net->ct.nf_expect_event_cb, NULL);
mutex_unlock(&nf_ct_ecache_mutex);
+   /* synchronize_rcu() is called from ctnetlink_exit. */
 }
 EXPORT_SYMBOL_GPL(nf_ct_expect_unregister_notifier);
 
diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index 6806b5e73567..908d858034e4 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -3442,6 +3442,7 @@ static void __exit ctnetlink_exit(void)
 #ifdef CONFIG_NETFILTER_NETLINK_GLUE_CT
RCU_INIT_POINTER(nfnl_ct_hook, NULL);
 #endif
+   synchronize_rcu();
 }
 
 module_init(ctnetlink_init);
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 94b14c5a8b17..82802e4a6640 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -903,6 +903,8 @@ static void __exit nf_nat_cleanup(void)
 #ifdef CONFIG_XFRM
RCU_INIT_POINTER(nf_nat_decode_session_hook, NULL);
 #endif
+   synchronize_rcu();
+
for (i = 0; i < NFPROTO_NUMPROTO; i++)
kfree(nf_nat_l4protos[i]);
 
diff --git a/net/netfilter/nfnetlink_cttimeout.c 
b/net/netfilter/nfnetlink_cttimeout.c
index 139e0867e56e..47d6656c9119 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -646,8 +646,8 @@ static void __exit cttimeout_exit(void)
 #ifdef CONFIG_NF_CONNTRACK_TIMEOUT
RCU_INIT_POINTER(nf_ct_timeout_find_get_hook, NULL);
RCU_INIT_POINTER(nf_ct_timeout_put_hook, NULL);
+   synchronize_rcu();
 #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */
-   rcu_barrier();
 }
 
 module_init(cttimeout_init);
-- 
2.1.4



[PATCH 8/8] netfilter: nfnetlink_queue: fix secctx memory leak

2017-03-29 Thread Pablo Neira Ayuso
From: Liping Zhang 

We must call security_release_secctx to free the memory returned by
security_secid_to_secctx, otherwise memory may be leaked forever.

Fixes: ef493bd930ae ("netfilter: nfnetlink_queue: add security context 
information")
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nfnetlink_queue.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 3ee0b8a000a4..933509ebf3d3 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -443,7 +443,7 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
skb = alloc_skb(size, GFP_ATOMIC);
if (!skb) {
skb_tx_error(entskb);
-   return NULL;
+   goto nlmsg_failure;
}
 
nlh = nlmsg_put(skb, 0, 0,
@@ -452,7 +452,7 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
if (!nlh) {
skb_tx_error(entskb);
kfree_skb(skb);
-   return NULL;
+   goto nlmsg_failure;
}
nfmsg = nlmsg_data(nlh);
nfmsg->nfgen_family = entry->state.pf;
@@ -598,12 +598,17 @@ nfqnl_build_packet_message(struct net *net, struct 
nfqnl_instance *queue,
}
 
nlh->nlmsg_len = skb->len;
+   if (seclen)
+   security_release_secctx(secdata, seclen);
return skb;
 
 nla_put_failure:
skb_tx_error(entskb);
kfree_skb(skb);
net_err_ratelimited("nf_queue: error creating packet message\n");
+nlmsg_failure:
+   if (seclen)
+   security_release_secctx(secdata, seclen);
return NULL;
 }
 
-- 
2.1.4



[PATCH 7/8] netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register

2017-03-29 Thread Pablo Neira Ayuso
From: Gao Feng 

In the commit 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack snmp
helper"), the snmp_helper is replaced by nf_nat_snmp_hook. So the
snmp_helper is never registered. But the init error path still tries to
unregister the snmp_helper, which could cause a panic.

Now remove the useless snmp_helper and the unregister call in the
error handler.

Fixes: 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack snmp helper")
Signed-off-by: Gao Feng 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c 
b/net/ipv4/netfilter/nf_nat_snmp_basic.c
index 5a8f7c360887..53e49f5011d3 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
@@ -1260,16 +1260,6 @@ static const struct nf_conntrack_expect_policy 
snmp_exp_policy = {
.timeout= 180,
 };
 
-static struct nf_conntrack_helper snmp_helper __read_mostly = {
-   .me = THIS_MODULE,
-   .help   = help,
-   .expect_policy  = &snmp_exp_policy,
-   .name   = "snmp",
-   .tuple.src.l3num= AF_INET,
-   .tuple.src.u.udp.port   = cpu_to_be16(SNMP_PORT),
-   .tuple.dst.protonum = IPPROTO_UDP,
-};
-
 static struct nf_conntrack_helper snmp_trap_helper __read_mostly = {
.me = THIS_MODULE,
.help   = help,
@@ -1288,17 +1278,10 @@ static struct nf_conntrack_helper snmp_trap_helper 
__read_mostly = {
 
 static int __init nf_nat_snmp_basic_init(void)
 {
-   int ret = 0;
-
BUG_ON(nf_nat_snmp_hook != NULL);
RCU_INIT_POINTER(nf_nat_snmp_hook, help);
 
-   ret = nf_conntrack_helper_register(&snmp_trap_helper);
-   if (ret < 0) {
-   nf_conntrack_helper_unregister(&snmp_helper);
-   return ret;
-   }
-   return ret;
+   return nf_conntrack_helper_register(&snmp_trap_helper);
 }
 
 static void __exit nf_nat_snmp_basic_fini(void)
-- 
2.1.4



[PATCH 3/8] netfilter: nfnl_cthelper: Fix memory leak

2017-03-29 Thread Pablo Neira Ayuso
From: Jeffy Chen 

We have memory leaks of nf_conntrack_helper & expect_policy.

Signed-off-by: Jeffy Chen 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nfnetlink_cthelper.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nfnetlink_cthelper.c 
b/net/netfilter/nfnetlink_cthelper.c
index 90f291e27eb1..2b987d2a77bc 100644
--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -216,7 +216,7 @@ nfnl_cthelper_create(const struct nlattr * const tb[],
 
ret = nfnl_cthelper_parse_expect_policy(helper, tb[NFCTH_POLICY]);
if (ret < 0)
-   goto err;
+   goto err1;
 
strncpy(helper->name, nla_data(tb[NFCTH_NAME]), NF_CT_HELPER_NAME_LEN);
helper->data_len = ntohl(nla_get_be32(tb[NFCTH_PRIV_DATA_LEN]));
@@ -247,10 +247,12 @@ nfnl_cthelper_create(const struct nlattr * const tb[],
 
ret = nf_conntrack_helper_register(helper);
if (ret < 0)
-   goto err;
+   goto err2;
 
return 0;
-err:
+err2:
+   kfree(helper->expect_policy);
+err1:
kfree(helper);
return ret;
 }
@@ -696,6 +698,8 @@ static int nfnl_cthelper_del(struct net *net, struct sock 
*nfnl,
 
found = true;
nf_conntrack_helper_unregister(cur);
+   kfree(cur->expect_policy);
+   kfree(cur);
}
}
/* Make sure we return success if we flush and there is no helpers */
@@ -759,6 +763,8 @@ static void __exit nfnl_cthelper_exit(void)
continue;
 
nf_conntrack_helper_unregister(cur);
+   kfree(cur->expect_policy);
+   kfree(cur);
}
}
 }
-- 
2.1.4
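
The err1/err2 labels in the patch above follow the usual staged-unwind pattern: each label releases only what had been allocated by the time of the failure. A minimal userspace sketch of that pattern follows. The struct layouts and the parse_policy(), register_helper(), helper_create() and helper_destroy() names are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct policy {
	int max_expected;
};

struct helper {
	struct policy *expect_policy;
};

/* Illustrative step that allocates the nested object; fails on request. */
static int parse_policy(struct helper *h, int fail)
{
	if (fail)
		return -EINVAL;
	h->expect_policy = calloc(1, sizeof(*h->expect_policy));
	return h->expect_policy ? 0 : -ENOMEM;
}

/* Illustrative registration step; it allocates nothing itself. */
static int register_helper(struct helper *h, int fail)
{
	(void)h;
	return fail ? -EBUSY : 0;
}

static int helper_create(struct helper **out, int fail_parse, int fail_register)
{
	struct helper *h;
	int ret;

	h = calloc(1, sizeof(*h));
	if (!h)
		return -ENOMEM;

	ret = parse_policy(h, fail_parse);
	if (ret < 0)
		goto err1;	/* only the helper itself exists so far */

	ret = register_helper(h, fail_register);
	if (ret < 0)
		goto err2;	/* the policy exists too and must be freed */

	*out = h;
	return 0;
err2:
	free(h->expect_policy);
err1:
	free(h);
	return ret;
}

/* Teardown mirrors the delete/exit paths: free the policy, then the helper. */
static void helper_destroy(struct helper *h)
{
	free(h->expect_policy);
	free(h);
}
```

The teardown function corresponds to the two kfree() calls the patch adds after nf_conntrack_helper_unregister().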



[PATCH 6/8] netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister

2017-03-29 Thread Pablo Neira Ayuso
From: Liping Zhang 

If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after setting nf_ct_ext_types[id] to
NULL, so it's possible that we may access an invalid pointer.

But actually, most of the ct extends are built-in, so the problem listed
above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
and NF_CT_EXT_SYNPROXY.

For _EXT_NAT, the panic will not happen, since adding the nat extend and
unregistering the nat extend are located in the same file(nf_nat_core.c),
this means that after the nat module is removed, we cannot add the nat
extend too.

For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
synproxy extend unregister will be done by synproxy_core_exit. So after
nf_synproxy_core.ko is removed, we may still try to add the synproxy
extend, then kernel panic may happen.

I know it's very hard to reproduce this issue, but I can play a tricky
game to make it happen very easily :)

Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
  # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook.
Also note, in the userspace we only add a 20s' delay, then
reinject the syn packet to the kernel:
  # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
Step 3. Using "nc 2.2.2.2 1234" to connect the server.
Step 4. Now remove the nf_synproxy_core.ko quickly:
  # iptables -F FORWARD
  # rmmod ipt_SYNPROXY
  # rmmod nf_synproxy_core
Step 5. After 20s' delay, the syn packet is reinjected to the kernel.

Now you will see the panic like this:
  kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
  Call Trace:
   ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
   init_conntrack+0x12b/0x600 [nf_conntrack]
   nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
   ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
   nf_reinject+0x104/0x270
   nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
   ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
   ? nla_parse+0xa0/0x100
   nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
   [...]

One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
introduce nf_conntrack_synproxy.c and only do ct extend register and
unregister in it, similar to nf_conntrack_timeout.c.

But having such an obscure restriction on nf_ct_extend_unregister is not a
good idea, so we should invoke synchronize_rcu after setting nf_ct_ext_types
to NULL, and check for the NULL pointer in __nf_ct_ext_add_length. Then
it will be easier to add new ct extends in the future.

Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is no longer
necessary; remove it too.

Signed-off-by: Liping Zhang 
Acked-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_extend.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_conntrack_extend.c 
b/net/netfilter/nf_conntrack_extend.c
index 02bcf00c2492..008299b7f78f 100644
--- a/net/netfilter/nf_conntrack_extend.c
+++ b/net/netfilter/nf_conntrack_extend.c
@@ -53,7 +53,11 @@ nf_ct_ext_create(struct nf_ct_ext **ext, enum nf_ct_ext_id 
id,
 
rcu_read_lock();
t = rcu_dereference(nf_ct_ext_types[id]);
-   BUG_ON(t == NULL);
+   if (!t) {
+   rcu_read_unlock();
+   return NULL;
+   }
+
off = ALIGN(sizeof(struct nf_ct_ext), t->align);
len = off + t->len + var_alloc_len;
alloc_size = t->alloc_size + var_alloc_len;
@@ -88,7 +92,10 @@ void *__nf_ct_ext_add_length(struct nf_conn *ct, enum 
nf_ct_ext_id id,
 
rcu_read_lock();
t = rcu_dereference(nf_ct_ext_types[id]);
-   BUG_ON(t == NULL);
+   if (!t) {
+   rcu_read_unlock();
+   return NULL;
+   }
 
newoff = ALIGN(old->len, t->align);
newlen = newoff + t->len + var_alloc_len;
@@ -175,6 +182,6 @@ void nf_ct_extend_unregister(struct nf_ct_ext_type *type)
RCU_INIT_POINTER(nf_ct_ext_types[type->id], NULL);
update_alloc_size(type);
mutex_unlock(&nf_ct_ext_type_mutex);
-   rcu_barrier(); /* Wait for completion of call_rcu()'s */
+   synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(nf_ct_extend_unregister);
-- 
2.1.4



Re: [PATCH net-next] rtnl: Add support for netdev event to link messages

2017-03-29 Thread Vlad Yasevich
[ resending to list.  hit the wrong reply button last time ]

On 03/27/2017 06:58 PM, David Miller wrote:
> From: Vladislav Yasevich 
> Date: Sat, 25 Mar 2017 21:59:47 -0400
> 
>> RTNL currently generates notifications on some netdev notifier events.
>> However, user space has no idea what changed.  All it sees is the
>> data and has to infer what has changed.  For some events that is not
>> possible.
>>
>> This patch adds a new field to RTM_NEWLINK message called IFLA_EVENT
>> that would have an encoding of which event triggered this
>> notification.  Currently, only 2 events (NETDEV_NOTIFY_PEERS and
>> NETDEV_MTUCHANGED) are supported.  These events could be interesting
>> in the virt space to trigger additional configuration commands to VMs.
>> Other events of interest may be added later.
>>
>> Signed-off-by: Vladislav Yasevich 
> 
> At what point do we start providing the metadata for the changed
> values as well?  You'd probably need to provide both the old and
> new values to cover all cases.

I don't think that would be possible because of when events are triggered.
We send these notifications after all the changes have already been made, so
it might be tough to carry old data.

Looking at just the two events I am supporting in this patch, we could actually
supply the old mtu data through a NETDEV_PRECHANGEMTU event, if it is necessary.
For the use cases I am looking at, it isn't useful, but it is easy enough to add.

> 
>> @@ -4044,6 +4076,7 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct 
>> netlink_callback *cb)
>>  return skb->len;
>>  }
>>  
>> +
>>  /* Process one rtnetlink message. */
>>  
>>  static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> 
> Please don't add more empty lines between functions, one is enough.
> 

Sorry, got left-over after moving the code around.  Will remove when 
resubmitting.

-vlad


[PATCH] virtio_net: fix mergeable bufs error handling

2017-03-29 Thread Michael S. Tsirkin
On xdp error we try to free head_skb without having
initialized it, that's clearly bogus.

Fixes: f600b6905015 ("virtio_net: Add XDP support")
Cc: John Fastabend 
Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11773d6..e0fb3707 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -570,7 +570,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
u16 num_buf;
struct page *page;
int offset;
-   struct sk_buff *head_skb, *curr_skb;
+   struct sk_buff *head_skb = NULL, *curr_skb;
struct bpf_prog *xdp_prog;
unsigned int truesize;
 
-- 
MST
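
The one-line fix above relies on a shared error label being safe to reach before any buffer has been allocated. A minimal userspace sketch of the same idea follows, assuming a hypothetical struct buf in place of struct sk_buff and a hypothetical receive() function. It uses free(), which the C standard defines as a no-op for NULL.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct buf {
	int len;
};

/*
 * Sketch of a receive path with a shared error label. 'head' starts as
 * NULL so the error path frees a defined value even when the failure
 * happens before any buffer was allocated.
 */
static struct buf *receive(int fail_early, int fail_late)
{
	struct buf *head = NULL;

	if (fail_early)
		goto err;	/* nothing allocated yet */

	head = calloc(1, sizeof(*head));
	if (!head)
		goto err;

	if (fail_late)
		goto err;	/* head is allocated and must be released */

	head->len = 1;
	return head;
err:
	free(head);		/* free(NULL) is a no-op */
	return NULL;
}
```

Without the initialiser, the early jump to err would pass an indeterminate pointer to free(), which is undefined behaviour.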


[PATCH] virtio_net: enable big packets for large MTU values

2017-03-29 Thread Michael S. Tsirkin
If one enables e.g. jumbo frames without mergeable
buffers, packets won't fit in 1500 byte buffers
we use. Switch to big packet mode instead.
TODO: make sizing more exact, possibly extend small
packet mode to use larger pages.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e0fb3707..9dc31dc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2428,6 +2428,10 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->mtu = mtu;
dev->max_mtu = mtu;
}
+
+   /* TODO: size buffers correctly in this case. */
+   if (dev->mtu > ETH_DATA_LEN)
+   vi->big_packets = true;
}
 
if (vi->any_header_sg)
-- 
MST


Re: [PATCH net-next v2] net: mvneta: set rx mode during resume if interface is running

2017-03-29 Thread Andrew Lunn
On Wed, Mar 29, 2017 at 04:47:19PM +0800, Jisheng Zhang wrote:
> I found a bug by:
> 
> 0. boot and start dhcp client
> 1. echo mem > /sys/power/state
> 2. resume back immediately
> 3. don't touch dhcp client to renew the lease
> 4. ping the gateway. No acks
> 
> Usually, after step2, the DHCP lease isn't expired, so in theory we
> should resume all back. But in fact, it doesn't. It turns out
> the rx mode isn't resumed correctly. This patch fixes it by adding
> mvneta_set_rx_mode(dev) in the resume hook if interface is running.
> 
> Signed-off-by: Jisheng Zhang 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next] net: mvneta: add RGMII_RXID and RGMII_TXID support

2017-03-29 Thread Andrew Lunn
On Wed, Mar 29, 2017 at 04:42:26PM +0800, Jisheng Zhang wrote:
> RGMII_RXID and RGMII_TX_ID share the same GMAC CTRL setting as RGMII
> or RGMII_ID.
> 
> Signed-off-by: Jisheng Zhang 

Looks reasonable.

Reviewed-by: Andrew Lunn 

Andrew


Re: Bisected softirq accounting issue in v4.11-rc1~170^2~28

2017-03-29 Thread Frederic Weisbecker
On Wed, Mar 29, 2017 at 11:30:30AM +0200, Jesper Dangaard Brouer wrote:
> On Tue, 28 Mar 2017 23:11:22 +0200
> Frederic Weisbecker  wrote:
> 
> > On Tue, Mar 28, 2017 at 05:23:03PM +0200, Jesper Dangaard Brouer wrote:
> > > On Tue, 28 Mar 2017 16:34:36 +0200
> > > Frederic Weisbecker  wrote:
> > >   
> > > > On Tue, Mar 28, 2017 at 10:14:03AM +0200, Jesper Dangaard Brouer wrote: 
> > > >  
> > > > > 
> > > > > (While evaluating some changes to the page allocator) I ran into an
> > > > > issue with ksoftirqd getting too much CPU sched time.
> > > > > 
> > > > > I bisected the problem to
> > > > >  a499a5a14dbd ("sched/cputime: Increment kcpustat directly on irqtime 
> > > > > account")
> > > > > 
> > > > >  a499a5a14dbd1d0315a96fc62a8798059325e9e6 is the first bad commit
> > > > >  commit a499a5a14dbd1d0315a96fc62a8798059325e9e6
> > > > >  Author: Frederic Weisbecker 
> > > > >  Date:   Tue Jan 31 04:09:32 2017 +0100
> > > > > 
> > > > > sched/cputime: Increment kcpustat directly on irqtime account
> > > > > 
> > > > > > The irqtime is accounted in nsecs and stored in
> > > > > cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the
> > > > > accumulated amount reaches a new jiffy, this one gets accounted 
> > > > > to the
> > > > > kcpustat.
> > > > > 
> > > > > This was necessary when kcpustat was stored in cputime_t, which 
> > > > > could at
> > > > > worst have jiffies granularity. But now kcpustat is stored in 
> > > > > nsecs
> > > > > so this whole discretization game with temporary irqtime storage 
> > > > > has
> > > > > become unnecessary.
> > > > > 
> > > > > We can now directly account the irqtime to the kcpustat.
> > > > > 
> > > > > Signed-off-by: Frederic Weisbecker 
> > > > > Cc: Benjamin Herrenschmidt 
> > > > > Cc: Fenghua Yu 
> > > > > Cc: Heiko Carstens 
> > > > > Cc: Linus Torvalds 
> > > > > Cc: Martin Schwidefsky 
> > > > > Cc: Michael Ellerman 
> > > > > Cc: Paul Mackerras 
> > > > > Cc: Peter Zijlstra 
> > > > > Cc: Rik van Riel 
> > > > > Cc: Stanislaw Gruszka 
> > > > > Cc: Thomas Gleixner 
> > > > > Cc: Tony Luck 
> > > > > Cc: Wanpeng Li 
> > > > > Link: 
> > > > > http://lkml.kernel.org/r/1485832191-26889-17-git-send-email-fweis...@gmail.com
> > > > > Signed-off-by: Ingo Molnar 
> > > > > 
> > > > > The reproducer is running a userspace udp_sink[1] program, and taskset
> > > > > pinning the process to the same CPU as softirq RX is running on, and
> > > > > starting a UDP flood with pktgen (tool part of kernel tree:
> > > > > samples/pktgen/pktgen_sample03_burst_single_flow.sh).
> > > > 
> > > > So that means I need to run udp_sink on the same CPU as pktgen?
> > > 
> > > No, you misunderstood.  I run pktgen on another physical machine, which
> > > is sending UDP packets towards my Device-Under-Test (DUT) target.  The
> > > DUT-target is receiving packets and I observe which CPU the NIC is
> > > delivering these packets to.  
> > 
> > Ah ok, so I tried to run pktgen on another machine and I get that strange 
> > write error:
> > 
> > # ./pktgen_sample03_burst_single_flow.sh -d 192.168.1.3  -i wlan0
> > ./functions.sh: ligne 76 : echo: erreur d'écriture : Erreur inconnue 524
> > ERROR: Write error(1) occurred cmd: "clone_skb 10 > 
> > /proc/net/pktgen/wlan0@0"
> > 
> > Any idea?
> 
> Yes, this interface does not support pktgen "clone_skb".  You can
> supply cmdline argument "-c 0" to fix this.  But I suspect that this
> interface also does not support "burst", thus you also need "-b 0".
> 
> See all cmdline args via: ./pktgen_sample03_burst_single_flow.sh -h
> 
> Why are you using a wifi interface for this kind of overload testing?
> (the basic test here is making sure softirq is busy 100%, and at slow
> wifi speeds this might not be possible to force ksoftirqd into this
> scheduler state)

What? I need to raise from the couch and plug an ethernet cable?? ;-) ;-)

More seriously you're right, wifi probably won't be enough to trigger
the desired storm on the destination interface. I'm going to try with eth0,
that should also fix the clone_skb issues.

> > > > > After this commit, the udp_sink program does not get any sched CPU
> > > > > time, and no packets are delivered to userspace.  (All packets are
> > > > > dropped by softirq due to a full socket queue, nstat
> > > > > UdpRcvbufErrors).
> > > > > 
> > > > > A related symptom is that ksoftirqd no longer get accounted in
> > > > > top.
> > > > 
> > > > That's indeed what I observe. udp_sink has almost no CPU time,
> > > > neither has ksoftirqd but kpktgend_0 has everything.
> > > > 
> > > > Finally a bug I can reproduce!  
> > > 
> > > Good to hear you can reproduce it! :-)  
> > 
> > Well, since I was generating the packets locally, maybe it didn't trigger
> > the expected interrupts...
> 
> Well, you definitely didn't create the test case I was using.  I cannot
> remember 

Re: [RFC PATCH tip/master 0/3] kprobes: tracing: kretprobe_instance dynamic allocation

2017-03-29 Thread Frank Ch. Eigler

mhiramat wrote:

> Here is a correction of patches to introduce kretprobe_instance
> dynamic allocation for avoiding kretprobe silently miss-hits.
> [...]

Thanks, this looks automatically useful also to systemtap users.

- FChE


[PATCH] net: netfilter: remove unused variable

2017-03-29 Thread Arushi Singhal
This patch uses the following coccinelle script to remove
a variable that was simply used to store the return
value of a function call before returning it:

@@
identifier len,f;
@@

-int len;
 ... when != len
 when strict
-len =
+return
f(...);
-return len;

Signed-off-by: Arushi Singhal 
---
 net/netfilter/ipvs/ip_vs_ftp.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index d30c327bb578..9e1e682610ef 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -482,11 +482,9 @@ static struct pernet_operations ip_vs_ftp_ops = {
 
 static int __init ip_vs_ftp_init(void)
 {
-   int rv;
 
-   rv = register_pernet_subsys(&ip_vs_ftp_ops);
/* rcu_barrier() is called by netns on error */
-   return rv;
+   return register_pernet_subsys(&ip_vs_ftp_ops);
 }
 
 /*
-- 
2.11.0



Re: [PATCH 3/5] net/packet: fix overflow in check for tp_frame_nr

2017-03-29 Thread Craig Gallek
On Tue, Mar 28, 2017 at 1:19 PM, Andrey Konovalov  wrote:
> On Tue, Mar 28, 2017 at 5:54 PM, Craig Gallek  wrote:
>> On Tue, Mar 28, 2017 at 10:00 AM, Andrey Konovalov
>>  wrote:
>>> When calculating rb->frames_per_block * req->tp_block_nr the result
>>> can overflow.
>>>
>>> Add a check that tp_block_size * tp_block_nr <= UINT_MAX.
>>>
>>> Since frames_per_block <= tp_block_size, the expression would
>>> never overflow.
>>>
>>> Signed-off-by: Andrey Konovalov 
>>> ---
>>>  net/packet/af_packet.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>>> index 506348abdf2f..c5c43fff8c01 100644
>>> --- a/net/packet/af_packet.c
>>> +++ b/net/packet/af_packet.c
>>> @@ -4197,6 +4197,9 @@ static int packet_set_ring(struct sock *sk, union 
>>> tpacket_req_u *req_u,
>>> goto out;
>>> if (unlikely(req->tp_frame_size == 0))
>>> goto out;
>>> +   if (unlikely((u64)req->tp_block_size * req->tp_block_nr >
>>> +   UINT_MAX))
>>> +   goto out;
>> So this may be pedantic, but really the only guarantee that you have
>> for the 'unsigned int' type of these fields is that they are _at
>> least_ 16 bits.  There is no guarantee on the upper bound size, so
>> casting to a u64 will be problematic on a compiler that happens to use
>> 64 bits for 'unsigned int'.  I'm not aware of any that use greater
>> than 32 bits right now and using one that does may very well break
>> other things in the kernel, but here we are...  Perhaps a alternative
>> fix would be to do the multiplication into an 'unsigned int' type and
>> ensure that the result is larger than each of the original two values?
>
> I don't mind changing the check, but I've never encountered such compilers.
>
> Would this alternative work? It doesn't seem obvious.
>
> Other alternatives that I see for this check are:
>
> 1. req->tp_block_size > UINT_MAX / req->tp_block_nr
>
> 2. (req->tp_block_size * req->tp_block_nr) / req->tp_block_nr !=
> req->tp_block_size
>
> I'm not sure which one is better.
I'm by no means the style expert here, but I would go with whichever
makes the intention of the check (preventing overflow) most obvious.
Maybe #1 in your example?  I'm also not sure what the acceptable
assumptions about the size of 'int' are in the kernel code.  I'm sure
there's a thread out there with Linus expressing a strong feeling one
way or another, but I haven't found it yet ;)

>
>>
>> The real issue is that explicit size types should have been used in
>> this userspace structure.
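
The three candidate checks discussed in this thread can be compared side by side. The sketch below assumes 32-bit operands (uint32_t stands in for the 'unsigned int' fields) and uses hypothetical function names; the zero guards are an addition for the sketch, not something taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alternative 1 from the thread: divide instead of multiplying. */
static bool mul_overflows_div(uint32_t a, uint32_t b)
{
	return b != 0 && a > UINT32_MAX / b;
}

/* Alternative 2: multiply with wraparound, then divide back. */
static bool mul_overflows_roundtrip(uint32_t a, uint32_t b)
{
	return b != 0 && (uint32_t)(a * b) / b != a;
}

/*
 * The "result must not be smaller than either operand" idea. It misses
 * some overflows: 0x10000 * 0x18000 wraps to 0x80000000, which is still
 * larger than both inputs.
 */
static bool mul_overflows_compare(uint32_t a, uint32_t b)
{
	uint32_t p = a * b;

	return a != 0 && b != 0 && (p < a || p < b);
}
```

For a = 0x10000 and b = 0x18000 the true product is 2^32 + 2^31, so the first two checks report an overflow while the comparison check does not. That suggests the comparison is not sufficient for multiplication, even though the analogous test is valid for unsigned addition.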


Re: [PATCH net-next v6 01/11] bpf: Add eBPF program subtype and is_valid_subtype() verifier

2017-03-29 Thread kbuild test robot
Hi Mickaël,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-LSM-Toward-unprivileged-sandboxing/20170329-211258
config: i386-randconfig-x002-201713 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   kernel/bpf/syscall.c: In function 'bpf_prog_load':
>> kernel/bpf/syscall.c:886:25: warning: cast to pointer from integer of 
>> different size [-Wint-to-pointer-cast]
  size = check_user_buf((void __user *)attr->prog_subtype,
^

vim +886 kernel/bpf/syscall.c

   870  
   871  prog->orig_prog = NULL;
   872  prog->jited = 0;
   873  
   874  atomic_set(&prog->aux->refcnt, 1);
   875  prog->gpl_compatible = is_gpl ? 1 : 0;
   876  
   877  /* find program type: socket_filter vs tracing_filter */
   878  err = find_prog_type(type, prog);
   879  if (err < 0)
   880  goto free_prog;
   881  
   882  /* copy eBPF program subtype from user space */
   883  if (attr->prog_subtype) {
   884  __u32 size;
   885  
 > 886  size = check_user_buf((void __user *)attr->prog_subtype,
   887attr->prog_subtype_size,
   888sizeof(prog->subtype));
   889  if (size < 0) {
   890  err = size;
   891  goto free_prog;
   892  }
   893  /* prog->subtype is __GFP_ZERO */
   894  if (copy_from_user(&prog->subtype,

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


[PATCH net v2 0/3] net/packet: fix multiple overflow issues in ring buffers

2017-03-29 Thread Andrey Konovalov
This patchset addresses multiple overflows and signedness-related issues
in packet socket ring buffers.

Changes in v2:
- remove cleanup patches, will send in a separate patchset
- use a > UINT_MAX / b to check for a * b overflow

Andrey Konovalov (3):
  net/packet: fix overflow in check for priv area size
  net/packet: fix overflow in check for tp_frame_nr
  net/packet: fix overflow in check for tp_reserve

 net/packet/af_packet.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

-- 
2.12.2.564.g063fe858b8-goog



[PATCH net v2 3/3] net/packet: fix overflow in check for tp_reserve

2017-03-29 Thread Andrey Konovalov
When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.

Fix by checking that tp_reserve <= INT_MAX on assign.

Signed-off-by: Andrey Konovalov 
---
 net/packet/af_packet.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 3ac286ebb2f4..8489beff5c25 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3665,6 +3665,8 @@ packet_setsockopt(struct socket *sock, int level, int 
optname, char __user *optv
return -EBUSY;
if (copy_from_user(&val, optval, sizeof(val)))
return -EFAULT;
+   if (val > INT_MAX)
+   return -EINVAL;
po->tp_reserve = val;
return 0;
}
-- 
2.12.2.564.g063fe858b8-goog
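
To see why bounding tp_reserve at assignment time helps, consider a later check of the form frame_size < hdrlen + reserve done in unsigned arithmetic. The sketch below uses uint32_t stand-ins and hypothetical function names; it is an illustration of the wraparound, not the kernel code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Models a later sanity check on the frame size. The sum is computed in
 * 32-bit unsigned arithmetic, so a huge 'reserve' wraps it to a small value.
 */
static bool frame_too_small(uint32_t frame_size, uint32_t hdrlen,
			    uint32_t reserve)
{
	return frame_size < hdrlen + reserve;
}

/* Models the setsockopt path with the added bound on the stored value. */
static int set_reserve(uint32_t *dst, uint32_t val)
{
	if (val > INT32_MAX)
		return -EINVAL;
	*dst = val;
	return 0;
}
```

With hdrlen = 32 and reserve = 0xFFFFFFF0 the sum wraps to 16, so a 2048-byte frame passes the size check even though the reserve alone is far larger than the frame. Rejecting such values when they are set keeps the later sum from wrapping for any realistic header length.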



[PATCH net v2 1/3] net/packet: fix overflow in check for priv area size

2017-03-29 Thread Andrey Konovalov
Subtracting tp_sizeof_priv from tp_block_size and casting to int
to check whether one is less than the other doesn't always work
(both of them are unsigned ints).

Compare them as is instead.

Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as
it can overflow inside BLK_PLUS_PRIV otherwise.

Signed-off-by: Andrey Konovalov 
---
 net/packet/af_packet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a0dbe7ca8f72..2323ee35dc09 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4193,8 +4193,8 @@ static int packet_set_ring(struct sock *sk, union 
tpacket_req_u *req_u,
if (unlikely(!PAGE_ALIGNED(req->tp_block_size)))
goto out;
if (po->tp_version >= TPACKET_V3 &&
-   (int)(req->tp_block_size -
- BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
+   req->tp_block_size <=
+ BLK_PLUS_PRIV((u64)req_u->req3.tp_sizeof_priv))
goto out;
if (unlikely(req->tp_frame_size < po->tp_hdrlen +
po->tp_reserve))
-- 
2.12.2.564.g063fe858b8-goog
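
The difference between the old and new forms of the check can be shown with a small model. In the sketch below HYPOTHETICAL_HDR_LEN is an assumed constant standing in for what BLK_PLUS_PRIV() adds (the real macro also aligns the size), and the function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed header size for the sketch; not the kernel's actual value. */
#define HYPOTHETICAL_HDR_LEN 48u

/* Old form: subtract in 32-bit unsigned arithmetic, then view as signed. */
static bool too_small_old(uint32_t block_size, uint32_t priv)
{
	return (int32_t)(block_size - (HYPOTHETICAL_HDR_LEN + priv)) <= 0;
}

/* New form: widen before adding, then compare the values directly. */
static bool too_small_new(uint32_t block_size, uint32_t priv)
{
	return block_size <= HYPOTHETICAL_HDR_LEN + (uint64_t)priv;
}
```

With block_size = 0x1000 and priv = 0xFFFFF000 the unsigned difference wraps to 0x1FD0, which is positive when viewed as a signed value, so the old form accepts a private area far larger than the block. The widened comparison rejects it.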



[PATCH net v2 2/3] net/packet: fix overflow in check for tp_frame_nr

2017-03-29 Thread Andrey Konovalov
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow.

Add a check that tp_block_size * tp_block_nr <= UINT_MAX.

Since frames_per_block <= tp_block_size, the expression would
never overflow.

Signed-off-by: Andrey Konovalov 
---
 net/packet/af_packet.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 2323ee35dc09..3ac286ebb2f4 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4205,6 +4205,8 @@ static int packet_set_ring(struct sock *sk, union 
tpacket_req_u *req_u,
rb->frames_per_block = req->tp_block_size / req->tp_frame_size;
if (unlikely(rb->frames_per_block == 0))
goto out;
+   if (unlikely(req->tp_block_size > UINT_MAX / req->tp_block_nr))
+   goto out;
if (unlikely((rb->frames_per_block * req->tp_block_nr) !=
req->tp_frame_nr))
goto out;
-- 
2.12.2.564.g063fe858b8-goog



Re: [PATCH net-next 0/7] netconf: Add support for RTM_DELNETCONF

2017-03-29 Thread David Ahern
On 3/29/17 3:36 AM, Nicolas Dichtel wrote:
> Le 29/03/2017 à 07:32, David Miller a écrit :
>> From: David Ahern 
>> Date: Tue, 28 Mar 2017 14:28:00 -0700
>>
>>> netconf notifications are sent as devices register but not when they
>>> are deleted leaving userspace caches out of sync. Add support for
>>> RTM_DELNETCONF to ipv4, ipv6 and mpls.
> Not sure why those notifications are needed. When an interface is set down, 
> ipv4
> route deletion are not notified. Why is it needed for netconf?
> 

We carry a patch to send notifications for route deletes. In general, it
makes management of libnl caches much easier when delete notifications
are sent for each type. Without it, a link delete requires walking the
other caches removing objects.


Re: [Outreachy kernel] [PATCH] net: netfilter: remove unused variable

2017-03-29 Thread Julia Lawall


On Wed, 29 Mar 2017, Arushi Singhal wrote:

> This patch uses the following coccinelle script to remove
> a variable that was simply used to store the return
> value of a function call before returning it:
>
> @@
> identifier len,f;
> @@
>
> -int len;
>  ... when != len
>  when strict
> -len =
> +return
> f(...);
> -return len;
>
> Signed-off-by: Arushi Singhal 
> ---
>  net/netfilter/ipvs/ip_vs_ftp.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
> index d30c327bb578..9e1e682610ef 100644
> --- a/net/netfilter/ipvs/ip_vs_ftp.c
> +++ b/net/netfilter/ipvs/ip_vs_ftp.c
> @@ -482,11 +482,9 @@ static struct pernet_operations ip_vs_ftp_ops = {
>
>  static int __init ip_vs_ftp_init(void)
>  {
> - int rv;
>
> - rv = register_pernet_subsys(&ip_vs_ftp_ops);
>   /* rcu_barrier() is called by netns on error */
> - return rv;
> + return register_pernet_subsys(&ip_vs_ftp_ops);

It looks like you end up with an unnecessary blank line at the beginning
of the function.

julia

>  }
>
>  /*
> --
> 2.11.0
>


[PATCH] Add hardware PTP support.

2017-03-29 Thread Rafal Ozieblo
This patch is based on original Harini's patch and Andrei's patch,
implemented in a separate file to ease the review/maintenance
and integration with other platforms.

In case that macb is compiled as a module, it has been renamed to
cadence-macb.ko to avoid naming confusion in Makefile.

This driver does support GEM-GXL:
- Enable HW time stamp
- Register ptp clock framework
- Initialize PTP related registers
- Updated dma buffer descriptor read/write mechanism
- HW time stamp on the PTP Ethernet packets are received using the
  SO_TIMESTAMPING API. Where timers are obtained from the dma buffer
  descriptors
- Added tsu_clk to device tree

Note: Patch on net-next, on March 15th.

Signed-off-by: Rafal Ozieblo 
---
 Documentation/devicetree/bindings/net/macb.txt |   1 +
 drivers/net/ethernet/cadence/Kconfig   |  10 +-
 drivers/net/ethernet/cadence/Makefile  |   7 +-
 drivers/net/ethernet/cadence/macb.c| 237 ++--
 drivers/net/ethernet/cadence/macb.h| 176 +-
 drivers/net/ethernet/cadence/macb_ptp.c| 724 +
 6 files changed, 1109 insertions(+), 46 deletions(-)
 create mode 100755 drivers/net/ethernet/cadence/macb_ptp.c

diff --git a/Documentation/devicetree/bindings/net/macb.txt 
b/Documentation/devicetree/bindings/net/macb.txt
index 1506e94..27966ae 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -22,6 +22,7 @@ Required properties:
Required elements: 'pclk', 'hclk'
Optional elements: 'tx_clk'
Optional elements: 'rx_clk' applies to cdns,zynqmp-gem
+   Optional elements: 'tsu_clk'
 - clocks: Phandles to input clocks.
 
 Optional properties for PHY child node:
diff --git a/drivers/net/ethernet/cadence/Kconfig 
b/drivers/net/ethernet/cadence/Kconfig
index 608bea1..427d65a 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -29,7 +29,15 @@ config MACB
  support for the MACB/GEM chip.
 
  To compile this driver as a module, choose M here: the module
- will be called macb.
+ will be macb.
+
+config MACB_USE_HWSTAMP
+   bool "Use IEEE 1588 hwstamp"
+   depends on MACB
+   default y
+   imply PTP_1588_CLOCK
+   ---help---
+ Enable IEEE 1588 Precision Time Protocol (PTP) support for MACB.
 
 config MACB_PCI
tristate "Cadence PCI MACB/GEM support"
diff --git a/drivers/net/ethernet/cadence/Makefile 
b/drivers/net/ethernet/cadence/Makefile
index 4ba7559..a7f6e04 100644
--- a/drivers/net/ethernet/cadence/Makefile
+++ b/drivers/net/ethernet/cadence/Makefile
@@ -1,6 +1,11 @@
 #
 # Makefile for the Atmel network device drivers.
 #
+cadence-macb-y := macb.o
 
-obj-$(CONFIG_MACB) += macb.o
+ifeq ($(CONFIG_MACB_USE_HWSTAMP),y)
+cadence-macb-y += macb_ptp.o
+endif
+
+obj-$(CONFIG_MACB) += cadence-macb.o
 obj-$(CONFIG_MACB_PCI) += macb_pci.o
diff --git a/drivers/net/ethernet/cadence/macb.c 
b/drivers/net/ethernet/cadence/macb.c
index 30606b1..32af94e 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -79,33 +79,84 @@
 #define MACB_HALT_TIMEOUT  1230
 
 /* DMA buffer descriptor might be different size
- * depends on hardware configuration.
+ * depends on hardware configuration:
+ *
+ * 1. dma address width 32 bits:
+ *word 1: 32 bit address of Data Buffer
+ *word 2: control
+ *
+ * 2. dma address width 64 bits:
+ *word 1: 32 bit address of Data Buffer
+ *word 2: control
+ *word 3: upper 32 bit address of Data Buffer
+ *word 4: unused
+ *
+ * 3. dma address width 32 bits with hardware timestamping:
+ *word 1: 32 bit address of Data Buffer
+ *word 2: control
+ *word 3: timestamp word 1
+ *word 4: timestamp word 2
+ *
+ * 4. dma address width 64 bits with hardware timestamping:
+ *word 1: 32 bit address of Data Buffer
+ *word 2: control
+ *word 3: upper 32 bit address of Data Buffer
+ *word 4: unused
+ *word 5: timestamp word 1
+ *word 6: timestamp word 2
  */
 static unsigned int macb_dma_desc_get_size(struct macb *bp)
 {
-#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
-   if (bp->hw_dma_cap == HW_DMA_CAP_64B)
-   return sizeof(struct macb_dma_desc) + sizeof(struct 
macb_dma_desc_64);
+#ifdef MACB_EXT_DESC
+   unsigned int desc_size;
+
+   switch (bp->hw_dma_cap) {
+   case HW_DMA_CAP_64B:
+   desc_size = sizeof(struct macb_dma_desc)
+   + sizeof(struct macb_dma_desc_64);
+   break;
+   case HW_DMA_CAP_PTP:
+   desc_size = sizeof(struct macb_dma_desc)
+   + sizeof(struct macb_dma_desc_ptp);
+   break;
+   case HW_DMA_CAP_64B_PTP:
+   desc_size = sizeof(struct macb_dma_desc)
+   + sizeof(struct macb_dma_desc_64)
+   + sizeof(struct macb_dma_desc_ptp);
+   break;
+   de

Re: [PATCH net-next 0/7] netconf: Add support for RTM_DELNETCONF

2017-03-29 Thread Nicolas Dichtel
Le 29/03/2017 à 16:13, David Ahern a écrit :
> On 3/29/17 3:36 AM, Nicolas Dichtel wrote:
>> Le 29/03/2017 à 07:32, David Miller a écrit :
>>> From: David Ahern 
>>> Date: Tue, 28 Mar 2017 14:28:00 -0700
>>>
>>>> netconf notifications are sent as devices register but not when they
>>>> are deleted leaving userspace caches out of sync. Add support for
>>>> RTM_DELNETCONF to ipv4, ipv6 and mpls.
>> Not sure why those notifications are needed. When an interface is set down,
>> ipv4 route deletions are not notified. Why is it needed for netconf?
>>
> 
> We carry a patch to send notifications for route deletes. In general, it
> makes management of libnl caches much easier when delete notifications
> are sent for each type. Without it, a link delete requires walking the
> other caches removing objects.
> 
David rejected this kind of patch several times, but maybe he changed his mind.


Re: [PATCH] ezchip: nps_enet: check if napi has been completed

2017-03-29 Thread Eric Dumazet
On Wed, 2017-03-29 at 13:41 +0300, Vlad Zakharov wrote:
> After a new NAPI_STATE_MISSED state was added to NAPI we can get into
> this state and in such case we have to reschedule NAPI as some work is
> still pending and we have to process it. napi_complete_done() function
> returns false if we have to reschedule something (e.g. in case we were
> in MISSED state) as the current polling has not been completed yet.
> 
> nps_enet driver hasn't been verifying the return value of
> napi_complete_done() and has been forcibly enabling interrupts. That is
> not correct as we should not enable interrupts before we have processed
> all scheduled work. As a result we were getting trapped in interrupt
> handler chain as we had never been able to disable ethernet
> interrupts again.
> 
> So this patch makes nps_enet_poll() func verify return value of
> napi_complete_done() and enable interrupts only in case all scheduled
> work has been completed.
> 
> Signed-off-by: Vlad Zakharov 
> ---
>  drivers/net/ethernet/ezchip/nps_enet.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ezchip/nps_enet.c 
> b/drivers/net/ethernet/ezchip/nps_enet.c
> index 992ebe9..f819843 100644
> --- a/drivers/net/ethernet/ezchip/nps_enet.c
> +++ b/drivers/net/ethernet/ezchip/nps_enet.c
> @@ -189,11 +189,9 @@ static int nps_enet_poll(struct napi_struct *napi, int 
> budget)
>  
>   nps_enet_tx_handler(ndev);
>   work_done = nps_enet_rx_handler(ndev);
> - if (work_done < budget) {
> + if ((work_done < budget) && napi_complete_done(napi, work_done)) {
>   u32 buf_int_enable_value = 0;
>  
> - napi_complete_done(napi, work_done);
> -
>   /* set tx_done and rx_rdy bits */
>   buf_int_enable_value |= NPS_ENET_ENABLE << RX_RDY_SHIFT;
>   buf_int_enable_value |= NPS_ENET_ENABLE << TX_DONE_SHIFT;

Seems fine, but looking at this driver, it looks like it has some races,
trying to be a bit too smart.

nps_enet_irq_handler() really should be simpler, or the risk of missing
an interrupt might be high.

diff --git a/drivers/net/ethernet/ezchip/nps_enet.c 
b/drivers/net/ethernet/ezchip/nps_enet.c
index 
992ebe973d25bfbccff7b5c42dc1801ea41fc9ea..03885ac0c0f845805eadb4659302b5c11bb250f6
 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -233,14 +233,11 @@ static irqreturn_t nps_enet_irq_handler(s32 irq, void 
*dev_instance)
 {
struct net_device *ndev = dev_instance;
struct nps_enet_priv *priv = netdev_priv(ndev);
-   u32 rx_ctrl_value = nps_enet_reg_get(priv, NPS_ENET_REG_RX_CTL);
-   u32 rx_ctrl_cr = (rx_ctrl_value & RX_CTL_CR_MASK) >> RX_CTL_CR_SHIFT;
 
-   if (nps_enet_is_tx_pending(priv) || rx_ctrl_cr)
-   if (likely(napi_schedule_prep(&priv->napi))) {
-   nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
-   __napi_schedule(&priv->napi);
-   }
+   if (likely(napi_schedule_prep(&priv->napi))) {
+   nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
+   __napi_schedule(&priv->napi);
+   }
 
return IRQ_HANDLED;
 }
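
For readers following the thread, here is a minimal user-space sketch of the
contract discussed above: interrupts are re-enabled only when the poll used
less than its budget and the completion call reports that no further work
was signalled. Every name here (mock_napi, mock_dev, mock_complete_done,
mock_poll) is a hypothetical stand-in, not a kernel API, and the "missed"
flag only approximates what the patch description says about
NAPI_STATE_MISSED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct mock_napi {
	bool scheduled;	/* a poll is queued or running */
	bool missed;	/* an interrupt arrived while polling */
};

struct mock_dev {
	struct mock_napi napi;
	bool irq_enabled;
	int pending;	/* packets waiting to be processed */
};

/* Models only the return-value contract described in the thread:
 * false means "more work was signalled, stay scheduled".
 */
static bool mock_complete_done(struct mock_napi *napi)
{
	if (napi->missed) {
		napi->missed = false;
		return false;
	}
	napi->scheduled = false;
	return true;
}

static int mock_poll(struct mock_dev *dev, int budget)
{
	int work_done;

	assert(budget > 0);
	work_done = dev->pending < budget ? dev->pending : budget;
	dev->pending -= work_done;

	/* Re-enable interrupts only if polling has really finished. */
	if (work_done < budget && mock_complete_done(&dev->napi))
		dev->irq_enabled = true;

	return work_done;
}
```

With the "missed" flag set, mock_poll() leaves irq_enabled false and the
mock instance stays scheduled, which is the case the original driver code
did not handle.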






[PATCH v2] net: netfilter: remove unused variable

2017-03-29 Thread Arushi Singhal
This patch uses the following coccinelle script to remove
a variable that was simply used to store the return
value of a function call before returning it:

@@
identifier len,f;
@@

-int len;
 ... when != len
 when strict
-len =
+return
f(...);
-return len;

Signed-off-by: Arushi Singhal 
---
changes in v2
   - remove extra blank line.

 net/netfilter/ipvs/ip_vs_ftp.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index d30c327bb578..c93c93762354 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -482,11 +482,8 @@ static struct pernet_operations ip_vs_ftp_ops = {
 
 static int __init ip_vs_ftp_init(void)
 {
-   int rv;
-
-   rv = register_pernet_subsys(&ip_vs_ftp_ops);
/* rcu_barrier() is called by netns on error */
-   return rv;
+   return register_pernet_subsys(&ip_vs_ftp_ops);
 }
 
 /*
-- 
2.11.0



Re: [PATCH ethtool] ethtool: Support for configurable RSS hash function

2017-03-29 Thread John W. Linville
On Wed, Mar 29, 2017 at 02:56:38PM +0300, Gal Pressman wrote:
> On 27/03/2017 21:02, John W. Linville wrote:
> > On Sat, Mar 25, 2017 at 02:50:47PM -0700, Jakub Kicinski wrote:
> >> On Wed,  8 Mar 2017 16:03:51 +0200, Gal Pressman wrote:
> >>> This ethtool patch adds support to set and get the current RSS hash
> >>> function for the device through the new hfunc mask field in the
> >>> ethtool_rxfh struct. Kernel supported hash function names are queried
> >>> with ETHTOOL_GSTRINGS - each string corresponds to a bit in hfunc
> >>> mask according to its index in the string-set.
> >>>
> >>> Signed-off-by: Eyal Perry 
> >>> Signed-off-by: Gal Pressman 
> >>> Reviewed-by: Saeed Mahameed 
> >> Hi John,
> >>
> >> It seems you have applied both my earlier patch with get support and
> >> this:
> >>
> >> adbaa18b9bc1 ("ethtool: Support for configurable RSS hash function")
> >> b932835d2302 ("ethtool: print hash function with ethtool 
> >> -x|--show-rxfh-indir")

Damn it! I was bamboozled...

> >> Now we print the RSS function twice:
> >>
> >> RX flow hash indirection table for em4 with 4 RX ring(s):
> >> 0:  [...]
> >> RSS hash function: toeplitz  <--- from my adbaa18b9bc1

This should have said "<--- from my b932835d2302"...

> >> RSS hash key:
> >> Operation not supported
> >> RSS hash function:   <--- from this patch
> >> toeplitz: on
> >> xor: off
> >> crc32: off
> >>
> >> Reverting my patch is probably the easiest way forward, although I find
> >> it more concise and easier to parse in test scripts :)
> > Thanks for pointing-out this issue! I apologize for my own confusion.
> >
> > As you suggest, I will be reverting your patch.
> >
> > Thanks,
> >
> > John
> 126464e4da18 ('Revert "ethtool: Support for configurable RSS hash function"')
> 
> Seems like you ended up reverting my patch instead of Jakub's?
> We lost the set hfunc functionality.

Confused by the swapped commit IDs above, I dutifully reverted
"adbaa18b9bc1"...ugh...

Alright, I'll have to tighten-up my game a bit! From here I'll revert
b932835d2302 and reinstate adbaa18b9bc1, hopefully bringing this
little misadventure to a close.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com			might be all we have.  Be ready.


Re: [PATCH net-next v6 01/11] bpf: Add eBPF program subtype and is_valid_subtype() verifier (fwd)

2017-03-29 Thread Julia Lawall
size is declared unsigned int, so the "size < 0" test can never be true.

julia

-- Forwarded message --
Date: Wed, 29 Mar 2017 23:06:01 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: Re: [PATCH net-next v6 01/11] bpf: Add eBPF program subtype and
is_valid_subtype() verifier

In-Reply-To: <20170328234650.19695-2-...@digikod.net>
TO: "Mickaël Salaün" 

Hi Mickaël,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-LSM-Toward-unprivileged-sandboxing/20170329-211258
:: branch date: 2 hours ago
:: commit date: 2 hours ago

>> kernel/bpf/syscall.c:1041:5-9: WARNING: Unsigned expression compared with zero: size < 0

git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 07d282aef4f60235407284c0be81d01e352e040b
vim +1041 kernel/bpf/syscall.c

f4324551 Daniel Mack        2016-11-23  1025  		return -EINVAL;
f4324551 Daniel Mack        2016-11-23  1026  	}
f4324551 Daniel Mack        2016-11-23  1027
7f677633 Alexei Starovoitov 2017-02-10  1028  	return ret;
f4324551 Daniel Mack        2016-11-23  1029  }
f4324551 Daniel Mack        2016-11-23  1030  #endif /* CONFIG_CGROUP_BPF */
f4324551 Daniel Mack        2016-11-23  1031
99c55f7d Alexei Starovoitov 2014-09-26  1032  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
99c55f7d Alexei Starovoitov 2014-09-26  1033  {
99c55f7d Alexei Starovoitov 2014-09-26  1034  	union bpf_attr attr = {};
99c55f7d Alexei Starovoitov 2014-09-26  1035  	int err;
99c55f7d Alexei Starovoitov 2014-09-26  1036
1be7f75d Alexei Starovoitov 2015-10-07  1037  	if (!capable(CAP_SYS_ADMIN) && sysctl_unprivileged_bpf_disabled)
99c55f7d Alexei Starovoitov 2014-09-26  1038  		return -EPERM;
99c55f7d Alexei Starovoitov 2014-09-26  1039
07d282ae Mickaël Salaün     2017-03-29  1040  	size = check_user_buf((void __user *)uattr, size, sizeof(attr));
07d282ae Mickaël Salaün     2017-03-29 @1041  	if (size < 0)
07d282ae Mickaël Salaün     2017-03-29  1042  		return size;
99c55f7d Alexei Starovoitov 2014-09-26  1043
99c55f7d Alexei Starovoitov 2014-09-26  1044  	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
99c55f7d Alexei Starovoitov 2014-09-26  1045  	if (copy_from_user(&attr, uattr, size) != 0)
99c55f7d Alexei Starovoitov 2014-09-26  1046  		return -EFAULT;
99c55f7d Alexei Starovoitov 2014-09-26  1047
99c55f7d Alexei Starovoitov 2014-09-26  1048  	switch (cmd) {
99c55f7d Alexei Starovoitov 2014-09-26  1049  	case BPF_MAP_CREATE:

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
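
The warning in this report comes down to storing a possibly negative return
value in an unsigned variable. A minimal sketch of one way to avoid that,
assuming a helper that returns either a length or a negative errno: the
names check_buf_size(), get_copy_len() and error_as_unsigned() are
hypothetical and are not taken from the patch under review.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes to copy, or a
 * negative errno. The limits are assumptions for illustration.
 */
static long check_buf_size(unsigned int user_size, size_t max_size)
{
	if (user_size == 0)
		return -EINVAL;
	if (user_size > max_size)
		return -E2BIG;
	return (long)user_size;
}

/* Shows what happens when an error code lands in an unsigned variable:
 * it becomes a large positive number, so a "< 0" test never fires.
 */
static unsigned int error_as_unsigned(int err)
{
	return (unsigned int)err;
}

/* Keep the result in a signed variable, test it, and only then
 * narrow it to the unsigned length.
 */
static int get_copy_len(unsigned int size, size_t max_size, unsigned int *len)
{
	long ret;

	assert(len != NULL);
	ret = check_buf_size(size, max_size);
	if (ret < 0)
		return (int)ret;

	*len = (unsigned int)ret;
	return 0;
}
```

The design choice is simply to separate the "error or length" value from the
length itself, so the sign check is done on a signed type.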

[PATCH net-next] drivers: add explicit interrupt.h includes

2017-03-29 Thread Florian Westphal
These files all use functions declared in interrupt.h, but currently rely
on implicit inclusion of this file (via netns/xfrm.h).

That won't work anymore when the flow cache is removed so include that
header where needed.

Signed-off-by: Florian Westphal 
---
 drivers/infiniband/hw/nes/nes.h   | 1 +
 drivers/net/dsa/mv88e6xxx/global2.c   | 1 +
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 1 +
 drivers/net/ethernet/amd/xgbe/xgbe-i2c.c  | 1 +
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 1 +
 drivers/net/ethernet/broadcom/bgmac.c | 1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 ++
 drivers/net/ethernet/cavium/liquidio/lio_main.c   | 1 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c| 1 +
 drivers/net/ethernet/ezchip/nps_enet.c| 1 +
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  | 1 +
 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c   | 1 +
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c | 1 +
 drivers/net/wireless/st/cw1200/cw1200_sdio.c  | 1 +
 drivers/usb/gadget/function/f_ncm.c   | 1 +
 net/mac802154/ieee802154_i.h  | 1 +
 net/smc/smc_ib.h  | 1 +
 17 files changed, 18 insertions(+)

diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
index 85acd0843b50..3f9e56e8b379 100644
--- a/drivers/infiniband/hw/nes/nes.h
+++ b/drivers/infiniband/hw/nes/nes.h
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/dsa/mv88e6xxx/global2.c 
b/drivers/net/dsa/mv88e6xxx/global2.c
index 0b8601f8536e..132559d46b95 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.c
+++ b/drivers/net/dsa/mv88e6xxx/global2.c
@@ -13,6 +13,7 @@
  * (at your option) any later version.
  */
 
+#include 
 #include 
 #include "mv88e6xxx.h"
 #include "global2.h"
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 54593e03d821..c772420fa41c 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -118,6 +118,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c
index 0c7088a426e9..417bdb5982a9 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c
@@ -115,6 +115,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
index 4c5b90eea4af..b672d9249539 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
@@ -114,6 +114,7 @@
  * THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index e1a24ee6ab8b..ba4d2e145bb9 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -11,6 +11,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index c7a5b84a5cb2..3cb07778a690 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -18,6 +18,8 @@
 #define DRV_VER_MIN7
 #define DRV_VER_UPD0
 
+#include 
+
 struct tx_bd {
__le32 tx_bd_len_flags_type;
#define TX_BD_TYPE  (0x3f << 0)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index b22291906fcc..a8426d3d05d0 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -16,6 +16,7 @@
  * NONINFRINGEMENT.  See the GNU General Public License for more details.
  ***/
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 89fd81abab9a..174d748b5928 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -16,6 +16,7 @@
  * NONINFRINGEMENT.  See the GNU General Public License for more details.
  ***/
 #include 
+#include 
 #include 
 #include 
 #include "liquidio_common.h"
diff --git a/drivers/net/ethernet/ezchip/nps_enet.c 
b/drivers/net/ethernet/ezchip/nps_enet.c
index 992ebe973d25..70165fcbff9c 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ 

Re: [PATCH net-next v6 04/11] landlock: Add LSM hooks related to filesystem

2017-03-29 Thread kbuild test robot
Hi Mickaël,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-LSM-Toward-unprivileged-sandboxing/20170329-211258
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from security/landlock/hooks.c:12:0:
   security/landlock/hooks.c: In function 'landlock_decide':
>> arch/x86/include/asm/processor.h:824:39: error: implicit declaration of 
>> function 'task_stack_page' [-Werror=implicit-function-declaration]
 unsigned long __ptr = (unsigned long)task_stack_page(task); \
  ^
>> security/landlock/hooks.c:107:41: note: in expansion of macro 'task_pt_regs'
  .syscall_nr = syscall_get_nr(current, task_pt_regs(current)),
^~~~
   security/landlock/hooks.c:104:26: warning: unused variable 'ctx' 
[-Wunused-variable]
 struct landlock_context ctx = {
 ^~~
   security/landlock/hooks.c:102:6: warning: unused variable 'event_idx' 
[-Wunused-variable]
 u32 event_idx = get_index(event);
 ^
   cc1: some warnings being treated as errors

vim +/task_stack_page +824 arch/x86/include/asm/processor.h

2f66dcc9 include/asm-x86/processor.h      Glauber de Oliveira Costa 2008-01-30  818   * Therefore beware: accessing the ss/esp fields of the
2f66dcc9 include/asm-x86/processor.h      Glauber de Oliveira Costa 2008-01-30  819   * "struct pt_regs" is possible, but they may contain the
2f66dcc9 include/asm-x86/processor.h      Glauber de Oliveira Costa 2008-01-30  820   * completely wrong values.
2f66dcc9 include/asm-x86/processor.h      Glauber de Oliveira Costa 2008-01-30  821   */
2f66dcc9 include/asm-x86/processor.h      Glauber de Oliveira Costa 2008-01-30  822  #define task_pt_regs(task) \
2f66dcc9 include/asm-x86/processor.h      Glauber de Oliveira Costa 2008-01-30  823  ({ \
5c39403e arch/x86/include/asm/processor.h Denys Vlasenko            2015-03-13 @824  	unsigned long __ptr = (unsigned long)task_stack_page(task); \
5c39403e arch/x86/include/asm/processor.h Denys Vlasenko            2015-03-13  825  	__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \
5c39403e arch/x86/include/asm/processor.h Denys Vlasenko            2015-03-13  826  	((struct pt_regs *)__ptr) - 1; \
2f66dcc9 include/asm-x86/processor.h      Glauber de Oliveira Costa 2008-01-30  827  })

:: The code at line 824 was first introduced by commit
:: 5c39403e004bec75ce0c549541be5479595d6ad0 x86/asm/entry: Simplify 
task_pt_regs() macro definition

:: TO: Denys Vlasenko 
:: CC: Ingo Molnar 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




[PATCH net] ibmvnic: Remove debugfs support

2017-03-29 Thread Nathan Fontenot
The debugfs support in the ibmvnic driver is not, and never has been,
supported. Just remove it.

The work done in the debugfs code for the driver was part of the original
spec for the ibmvnic driver. The corresponding support for this from the
server side was never supported and has been dropped.

Signed-off-by: Nathan Fontenot 
---
 drivers/net/ethernet/ibm/ibmvnic.c |  628 
 drivers/net/ethernet/ibm/ibmvnic.h |   30 --
 2 files changed, 658 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 3d73182..1e8ba78 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -65,7 +65,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -615,20 +614,10 @@ static void ibmvnic_release_resources(struct 
ibmvnic_adapter *adapter)
release_sub_crqs(adapter);
ibmvnic_release_crq_queue(adapter);
 
-   if (adapter->debugfs_dir && !IS_ERR(adapter->debugfs_dir))
-   debugfs_remove_recursive(adapter->debugfs_dir);
-
if (adapter->stats_token)
dma_unmap_single(dev, adapter->stats_token,
 sizeof(struct ibmvnic_statistics),
 DMA_FROM_DEVICE);
-
-   if (adapter->ras_comps)
-   dma_free_coherent(dev, adapter->ras_comp_num *
- sizeof(struct ibmvnic_fw_component),
- adapter->ras_comps, adapter->ras_comps_tok);
-
-   kfree(adapter->ras_comp_int);
 }
 
 static int ibmvnic_close(struct net_device *netdev)
@@ -2332,57 +2321,6 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
kfree(error_buff);
 }
 
-static void handle_dump_size_rsp(union ibmvnic_crq *crq,
-struct ibmvnic_adapter *adapter)
-{
-   int len = be32_to_cpu(crq->request_dump_size_rsp.len);
-   struct ibmvnic_inflight_cmd *inflight_cmd;
-   struct device *dev = &adapter->vdev->dev;
-   union ibmvnic_crq newcrq;
-   unsigned long flags;
-
-   /* allocate and map buffer */
-   adapter->dump_data = kmalloc(len, GFP_KERNEL);
-   if (!adapter->dump_data) {
-   complete(&adapter->fw_done);
-   return;
-   }
-
-   adapter->dump_data_token = dma_map_single(dev, adapter->dump_data, len,
- DMA_FROM_DEVICE);
-
-   if (dma_mapping_error(dev, adapter->dump_data_token)) {
-   if (!firmware_has_feature(FW_FEATURE_CMO))
-   dev_err(dev, "Couldn't map dump data\n");
-   kfree(adapter->dump_data);
-   complete(&adapter->fw_done);
-   return;
-   }
-
-   inflight_cmd = kmalloc(sizeof(*inflight_cmd), GFP_ATOMIC);
-   if (!inflight_cmd) {
-   dma_unmap_single(dev, adapter->dump_data_token, len,
-DMA_FROM_DEVICE);
-   kfree(adapter->dump_data);
-   complete(&adapter->fw_done);
-   return;
-   }
-
-   memset(&newcrq, 0, sizeof(newcrq));
-   newcrq.request_dump.first = IBMVNIC_CRQ_CMD;
-   newcrq.request_dump.cmd = REQUEST_DUMP;
-   newcrq.request_dump.ioba = cpu_to_be32(adapter->dump_data_token);
-   newcrq.request_dump.len = cpu_to_be32(adapter->dump_data_size);
-
-   memcpy(&inflight_cmd->crq, &newcrq, sizeof(newcrq));
-
-   spin_lock_irqsave(&adapter->inflight_lock, flags);
-   list_add_tail(&inflight_cmd->list, &adapter->inflight);
-   spin_unlock_irqrestore(&adapter->inflight_lock, flags);
-
-   ibmvnic_send_crq(adapter, &newcrq);
-}
-
 static void handle_error_indication(union ibmvnic_crq *crq,
struct ibmvnic_adapter *adapter)
 {
@@ -2563,7 +2501,6 @@ static int handle_login_rsp(union ibmvnic_crq 
*login_rsp_crq,
struct device *dev = &adapter->vdev->dev;
struct ibmvnic_login_rsp_buffer *login_rsp = adapter->login_rsp_buf;
struct ibmvnic_login_buffer *login = adapter->login_buf;
-   union ibmvnic_crq crq;
int i;
 
dma_unmap_single(dev, adapter->login_buf_token, adapter->login_buf_sz,
@@ -2598,11 +2535,6 @@ static int handle_login_rsp(union ibmvnic_crq 
*login_rsp_crq,
}
complete(&adapter->init_done);
 
-   memset(&crq, 0, sizeof(crq));
-   crq.request_ras_comp_num.first = IBMVNIC_CRQ_CMD;
-   crq.request_ras_comp_num.cmd = REQUEST_RAS_COMP_NUM;
-   ibmvnic_send_crq(adapter, &crq);
-
return 0;
 }
 
@@ -2838,476 +2770,6 @@ static void handle_query_cap_rsp(union ibmvnic_crq *crq,
}
 }
 
-static void handle_control_ras_rsp(union ibmvnic_crq *crq,
-  struct ibmvnic_adapter *adapter)
-{
-   u8 correlator = crq->control_ras_rsp.correlator;
-   struct device *dev = &adapter->vdev->dev;
-   bool found = false;
-   

[PATCH][V2] VSOCK: remove unnecessary ternary operator on return value

2017-03-29 Thread Colin King
From: Colin Ian King 

Rather than assigning the positive errno values to ret and then
checking whether it is positive and flipping the sign, just return
the negated errno value directly.

Detected by CoverityScan, CID#986649 ("Logically Dead Code")

Signed-off-by: Colin Ian King 
---
 net/vmw_vsock/vmci_transport.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 4be4fbbc0b50..10ae7823a19d 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -96,31 +96,23 @@ static int PROTOCOL_OVERRIDE = -1;
 
 static s32 vmci_transport_error_to_vsock_error(s32 vmci_error)
 {
-   int err;
-
switch (vmci_error) {
case VMCI_ERROR_NO_MEM:
-   err = ENOMEM;
-   break;
+   return -ENOMEM;
case VMCI_ERROR_DUPLICATE_ENTRY:
case VMCI_ERROR_ALREADY_EXISTS:
-   err = EADDRINUSE;
-   break;
+   return -EADDRINUSE;
case VMCI_ERROR_NO_ACCESS:
-   err = EPERM;
-   break;
+   return -EPERM;
case VMCI_ERROR_NO_RESOURCES:
-   err = ENOBUFS;
-   break;
+   return -ENOBUFS;
case VMCI_ERROR_INVALID_RESOURCE:
-   err = EHOSTUNREACH;
-   break;
+   return -EHOSTUNREACH;
case VMCI_ERROR_INVALID_ARGS:
default:
-   err = EINVAL;
+   break;
}
-
-   return err > 0 ? -err : err;
+   return -EINVAL;
 }
 
 static u32 vmci_transport_peer_rid(u32 peer_cid)
-- 
2.11.0
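
As an illustration of the pattern in the patch above, here is a small
stand-alone sketch that maps subsystem error codes to negative errno values
by returning from each case directly, with no temporary and no sign flip at
the end. The demo_error enum and its values are hypothetical and are not the
real VMCI constants.

```c
#include <assert.h>	/* used by the checks mentioned below */
#include <errno.h>

/* Hypothetical error codes for illustration; not the real VMCI values. */
enum demo_error {
	DEMO_ERROR_INVALID_ARGS = -1,
	DEMO_ERROR_NO_MEM = -2,
	DEMO_ERROR_ALREADY_EXISTS = -3,
	DEMO_ERROR_NO_ACCESS = -4,
};

/* Each case returns the negative errno directly. */
static int demo_error_to_errno(int demo_error)
{
	switch (demo_error) {
	case DEMO_ERROR_NO_MEM:
		return -ENOMEM;
	case DEMO_ERROR_ALREADY_EXISTS:
		return -EADDRINUSE;
	case DEMO_ERROR_NO_ACCESS:
		return -EPERM;
	case DEMO_ERROR_INVALID_ARGS:
	default:
		break;
	}
	/* Unknown or invalid-argument codes fall through to here. */
	return -EINVAL;
}
```

For instance, assert(demo_error_to_errno(DEMO_ERROR_NO_MEM) == -ENOMEM)
holds, and any code not listed in the switch maps to -EINVAL.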



Re: [PATCH net v2 1/3] net/packet: fix overflow in check for priv area size

2017-03-29 Thread Eric Dumazet
On Wed, 2017-03-29 at 16:11 +0200, Andrey Konovalov wrote:
> Subtracting tp_sizeof_priv from tp_block_size and casting to int
> to check whether one is less than the other doesn't always work
> (both of them are unsigned ints).
> 
> Compare them as is instead.
> 
> Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as
> it can overflow inside BLK_PLUS_PRIV otherwise.
> 
> Signed-off-by: Andrey Konovalov 
> ---

Acked-by: Eric Dumazet 
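
To make the failure mode concrete, here is a user-space sketch comparing the
two styles of check described in the commit message. HDR_LEN is a
hypothetical fixed header size standing in for whatever BLK_PLUS_PRIV adds,
and both function names are made up for this example; this is not the kernel
code itself.

```c
#include <assert.h>	/* used by the checks mentioned below */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed header size; an assumption for illustration. */
#define HDR_LEN 48u

/* Old style: subtract in 32 bits, cast to signed, test the sign.
 * A huge priv_size can wrap the arithmetic and hide the problem.
 */
static bool block_too_small_cast(uint32_t block_size, uint32_t priv_size)
{
	return (int32_t)(block_size - (HDR_LEN + priv_size)) <= 0;
}

/* New style: widen to 64 bits first and compare the values as they are. */
static bool block_too_small_wide(uint32_t block_size, uint32_t priv_size)
{
	return (uint64_t)block_size <= (uint64_t)HDR_LEN + priv_size;
}
```

With block_size = 4096 and priv_size = 0xFFFFFF00, the cast-based check
reports that the block is large enough, while the widened comparison
correctly rejects it; the same happens with priv_size = 0xFFFFFFF0, where
the 32-bit sum itself wraps around.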




Re: [PATCH net v2 3/3] net/packet: fix overflow in check for tp_reserve

2017-03-29 Thread Eric Dumazet
On Wed, 2017-03-29 at 16:11 +0200, Andrey Konovalov wrote:
> When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.
> 
> Fix by checking that tp_reserve <= INT_MAX on assign.
> 
> Signed-off-by: Andrey Konovalov 
> ---

Acked-by: Eric Dumazet 

Thanks !




Re: [PATCH net v2 2/3] net/packet: fix overflow in check for tp_frame_nr

2017-03-29 Thread Eric Dumazet
On Wed, 2017-03-29 at 16:11 +0200, Andrey Konovalov wrote:
> When calculating rb->frames_per_block * req->tp_block_nr the result
> can overflow.
> 
> Add a check that tp_block_size * tp_block_nr <= UINT_MAX.
> 
> Since frames_per_block <= tp_block_size, the expression would
> never overflow.
> 
> Signed-off-by: Andrey Konovalov 
> ---

Acked-by: Eric Dumazet 




[PATCH] Add checks for kmalloc allocation failures

2017-03-29 Thread Colin King
From: Colin Ian King 

Ensure we don't end up with null pointer dereferences by checking
for allocation failures.  Allocate by sizeof(*ptr) rather than
the type to fix checkpatch warnings.  Also merge multiple lines into
one line for the kmalloc call.

Detected by CoverityScan, CID#1422435 ("Dereference null return value")

Signed-off-by: Colin Ian King 
---
 drivers/net/ieee802154/ca8210.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index 53fa87bfede0..25fd3b04b3c0 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -634,6 +634,8 @@ static int ca8210_test_int_driver_write(
dev_dbg(&priv->spi->dev, "%#03x\n", buf[i]);
 
fifo_buffer = kmalloc(len, GFP_KERNEL);
+   if (!fifo_buffer)
+   return -ENOMEM;
memcpy(fifo_buffer, buf, len);
kfifo_in(&test->up_fifo, &fifo_buffer, 4);
wake_up_interruptible(&priv->test.readq);
@@ -759,10 +761,10 @@ static void ca8210_rx_done(struct cas_control *cas_ctl)
&priv->spi->dev,
"Resetting MAC...\n");
 
-   mlme_reset_wpc = kmalloc(
-   sizeof(struct work_priv_container),
-   GFP_KERNEL
-   );
+   mlme_reset_wpc = kmalloc(sizeof(*mlme_reset_wpc),
+GFP_KERNEL);
+   if (!mlme_reset_wpc)
+   goto finish;
INIT_WORK(
&mlme_reset_wpc->work,
ca8210_mlme_reset_worker
@@ -925,10 +927,10 @@ static int ca8210_spi_transfer(
 
dev_dbg(&spi->dev, "ca8210_spi_transfer called\n");
 
-   cas_ctl = kmalloc(
-   sizeof(struct cas_control),
-   GFP_ATOMIC
-   );
+   cas_ctl = kmalloc(sizeof(*cas_ctl), GFP_ATOMIC);
+   if (!cas_ctl)
+   return -ENOMEM;
+
cas_ctl->priv = priv;
memset(cas_ctl->tx_buf, SPI_IDLE, CA8210_SPI_BUF_SIZE);
memset(cas_ctl->tx_in_buf, SPI_IDLE, CA8210_SPI_BUF_SIZE);
-- 
2.11.0



Re: [PATCH net-next v6 11/11] landlock: Add user and kernel documentation for Landlock

2017-03-29 Thread kbuild test robot
Hi Mickaël,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-LSM-Toward-unprivileged-sandboxing/20170329-211258
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   include/linux/init.h:1: warning: no structured comments found
   kernel/sched/core.c:2085: warning: No description found for parameter 'rf'
   kernel/sched/core.c:2085: warning: Excess function parameter 'cookie' 
description in 'try_to_wake_up_local'
   include/linux/kthread.h:26: warning: Excess function parameter '...' 
description in 'kthread_create'
   kernel/sys.c:1: warning: no structured comments found
   include/linux/device.h:969: warning: No description found for parameter 
'dma_ops'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
   include/linux/iio/iio.h:597: warning: No description found for parameter 
'trig_readonly'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 
'indio_dev'
   include/linux/iio/trigger.h:151: warning: No description found for parameter 
'trig'
   include/linux/device.h:970: warning: No description found for parameter 
'dma_ops'
   drivers/regulator/core.c:1467: warning: Excess function parameter 'ret' 
description in 'regulator_dev_lookup'
   include/drm/drm_drv.h:438: warning: No description found for parameter 'open'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'preclose'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'postclose'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'lastclose'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'set_busid'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'irq_handler'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'irq_preinstall'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'irq_postinstall'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'irq_uninstall'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'debugfs_init'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'debugfs_cleanup'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_open_object'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_close_object'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'prime_handle_to_fd'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'prime_fd_to_handle'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_export'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_import'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_pin'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_unpin'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_res_obj'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_get_sg_table'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_import_sg_table'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_vmap'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_vunmap'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_prime_mmap'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'gem_vm_ops'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'major'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'minor'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'patchlevel'
   include/drm/drm_drv.h:438: warning: No description found for parameter 'name'
   include/drm/drm_drv.h:438: warning: No description found for parameter 'desc'
   include/drm/drm_drv.h:438: warning: No description found for parameter 'date'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'driver_features'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'ioctls'
   include/drm/drm_drv.h:438: warning: No description found for parameter 
'num_ioctls'
   include/drm/drm_drv.h:438: warning: No description 

Re: [PATCH] virtio_net: enable big packets for large MTU values

2017-03-29 Thread Michael S. Tsirkin
On Wed, Mar 29, 2017 at 03:38:09PM +0300, Michael S. Tsirkin wrote:
> If one enables e.g. jumbo frames without mergeable
> buffers, packets won't fit in 1500 byte buffers
> we use. Switch to big packet mode instead.
> TODO: make sizing more exact, possibly extend small
> packet mode to use larger pages.
> 
> Signed-off-by: Michael S. Tsirkin 

Fixes: 14de9d114a82 ("virtio-net: Add initial MTU advice feature")

> ---
>  drivers/net/virtio_net.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e0fb3707..9dc31dc 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2428,6 +2428,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>   dev->mtu = mtu;
>   dev->max_mtu = mtu;
>   }
> +
> + /* TODO: size buffers correctly in this case. */
> + if (dev->mtu > ETH_DATA_LEN)
> + vi->big_packets = true;
>   }
>  
>   if (vi->any_header_sg)
> -- 
> MST

