date:20150929

[PATCH net-next 1/4] phylib: Add phy_set_max_speed helper

2015-09-29 Thread Simon Horman

Add a helper to allow ethernet drivers to limit the speed of a phy
(that they are attached to).

This mainly involves factoring out the business-end of
of_set_phy_supported() and exporting a new symbol.

This code seems to be open coded in several places, in several different
variants.

It is envisaged that this will be used in situations where setting the
"max-speed" property in DT is not appropriate, e.g. because the maximum
speed is not a property of the phy hardware.
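
For readers without a kernel tree at hand, the capping logic that this helper factors out can be modelled in plain userspace C. This is only a sketch under stated assumptions: the DEMO_* bit values, struct demo_phy and demo_set_max_speed() are hypothetical stand-ins, not phylib symbols, and -EINVAL is used because -ENOTSUPP is not available in userspace headers. Unlike the patch, the sketch leaves the mask untouched when the speed is unknown.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the kernel's PHY_*_FEATURES masks. */
#define DEMO_DEFAULT_FEATURES 0x01u
#define DEMO_10BT_FEATURES    0x02u
#define DEMO_100BT_FEATURES   0x04u
#define DEMO_1000BT_FEATURES  0x08u

struct demo_phy {
	uint32_t supported;
	uint32_t advertising;
};

/* Cap the supported mask at max_speed (10, 100 or 1000 Mbit/s). */
static int demo_set_max_speed(struct demo_phy *phy, unsigned int max_speed)
{
	/* Start from the defaults only, then add every speed up to the cap. */
	uint32_t mask = phy->supported & DEMO_DEFAULT_FEATURES;

	switch (max_speed) {
	case 1000:
		mask |= DEMO_1000BT_FEATURES;
		/* fall through */
	case 100:
		mask |= DEMO_100BT_FEATURES;
		/* fall through */
	case 10:
		mask |= DEMO_10BT_FEATURES;
		break;
	default:
		return -EINVAL; /* unknown speed: leave the masks alone */
	}

	phy->supported = mask;
	phy->advertising = mask; /* advertise exactly what is supported */
	return 0;
}
```

The deliberate fall-through is what makes a cap of 100 also enable the 10 Mbit/s bits, so a caller only has to name the highest speed it wants.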

Signed-off-by: Simon Horman 

---
v2
* First post

v3
* As suggested by Florian Fainelli
  - Do not check for !IS_ENABLED(CONFIG_OF_MDIO) in __set_phy_supported().
    This is already done in of_set_phy_supported() and is not relevant to
    phy_set_max_speed().
  - Return -ENOTSUPP if 'max_speed' is not a known value
* As suggested by Sergei Shtylyov
  - White-space and comment enhancements.

v4
* No change
---
 drivers/net/phy/phy_device.c | 59 ++--
 include/linux/phy.h  |  1 +
 2 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index f761288abe66..383389146099 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1239,6 +1239,44 @@ static int gen10g_resume(struct phy_device *phydev)
 	return 0;
 }
 
+static int __set_phy_supported(struct phy_device *phydev, u32 max_speed)
+{
+	/* The default values for phydev->supported are provided by the PHY
+	 * driver "features" member, we want to reset to sane defaults first
+	 * before supporting higher speeds.
+	 */
+	phydev->supported &= PHY_DEFAULT_FEATURES;
+
+	switch (max_speed) {
+	default:
+		return -ENOTSUPP;
+	case SPEED_1000:
+		phydev->supported |= PHY_1000BT_FEATURES;
+		/* fall through */
+	case SPEED_100:
+		phydev->supported |= PHY_100BT_FEATURES;
+		/* fall through */
+	case SPEED_10:
+		phydev->supported |= PHY_10BT_FEATURES;
+	}
+
+	return 0;
+}
+
+int phy_set_max_speed(struct phy_device *phydev, u32 max_speed)
+{
+	int err;
+
+	err = __set_phy_supported(phydev, max_speed);
+	if (err)
+		return err;
+
+	phydev->advertising = phydev->supported;
+
+	return 0;
+}
+EXPORT_SYMBOL(phy_set_max_speed);
+
 static void of_set_phy_supported(struct phy_device *phydev)
 {
 	struct device_node *node = phydev->dev.of_node;
@@ -1250,25 +1288,8 @@ static void of_set_phy_supported(struct phy_device *phydev)
 	if (!node)
 		return;
 
-	if (!of_property_read_u32(node, "max-speed", &max_speed)) {
-		/* The default values for phydev->supported are provided by the PHY
-		 * driver "features" member, we want to reset to sane defaults fist
-		 * before supporting higher speeds.
-		 */
-		phydev->supported &= PHY_DEFAULT_FEATURES;
-
-		switch (max_speed) {
-		default:
-			return;
-
-		case SPEED_1000:
-			phydev->supported |= PHY_1000BT_FEATURES;
-		case SPEED_100:
-			phydev->supported |= PHY_100BT_FEATURES;
-		case SPEED_10:
-			phydev->supported |= PHY_10BT_FEATURES;
-		}
-	}
+	if (!of_property_read_u32(node, "max-speed", &max_speed))
+		__set_phy_supported(phydev, max_speed);
 }
 
 /**
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 4a4e3a092337..4c477e6ece33 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -798,6 +798,7 @@ int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd);
 int phy_start_interrupts(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
 void phy_device_free(struct phy_device *phydev);
+int phy_set_max_speed(struct phy_device *phydev, u32 max_speed);
 
 int phy_register_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask,
   int (*run)(struct phy_device *));
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next 4/4] ravb: Add support for r8a7795 SoC

2015-09-29 Thread Simon Horman

From: Kazuya Mizuguchi 

This patch supports the r8a7795 SoC by:
- Using two interrupts
  + One for E-MAC
  + One for everything else
  + Both can be handled by the existing common interrupt handler, which
affords a simpler update to support the new SoC. In future some
consideration may be given to implementing multiple interrupt handlers
- Limiting the phy speed to 100Mbit/s for the new SoC;
  at this time it is not clear how this restriction may be lifted
  but I hope it will be possible as more information comes to light
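
As a rough illustration of how the per-SoC behaviour listed above hangs off the compatible string, here is a userspace sketch. It is not the driver code: struct demo_match, demo_match() and demo_main_irq_name() are hypothetical names, the table lookup merely imitates what an OF match table provides, and the "ch22" name is taken from the diff in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_chip_id { DEMO_GEN2, DEMO_GEN3 };

struct demo_match {
	const char *compatible;
	enum demo_chip_id chip_id;
};

/* Same three compatible strings as the match table added by the patch. */
static const struct demo_match demo_match_table[] = {
	{ "renesas,etheravb-r8a7790", DEMO_GEN2 },
	{ "renesas,etheravb-r8a7794", DEMO_GEN2 },
	{ "renesas,etheravb-r8a7795", DEMO_GEN3 },
};

/* Return the matching entry, or NULL when the string is unknown. */
static const struct demo_match *demo_match(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(demo_match_table) / sizeof(demo_match_table[0]); i++)
		if (strcmp(demo_match_table[i].compatible, compatible) == 0)
			return &demo_match_table[i];
	return NULL;
}

/* Name of the main interrupt to look up, or NULL to use interrupt index 0. */
static const char *demo_main_irq_name(enum demo_chip_id id)
{
	return id == DEMO_GEN3 ? "ch22" : NULL;
}
```

Checking the lookup result for NULL before using it is a choice made in this sketch; it keeps an unexpected compatible string from being dereferenced.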

Signed-off-by: Kazuya Mizuguchi 
[horms: reworked]
Signed-off-by: Simon Horman 

---
v0 [Kazuya Mizuguchi]

v1 [Simon Horman]
* Updated patch subject

v2 [Simon Horman]
* Reworked based on extensive feedback from
  Geert Uytterhoeven and Sergei Shtylyov.
* Broke binding update out into separate patch

v3 [Simon Horman]
* Check new return value of phy_set_max_speed()

v4
* No change
---
 drivers/net/ethernet/renesas/ravb.h  |  7 
 drivers/net/ethernet/renesas/ravb_main.c | 63 
 2 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
index a157ff6a..0623fff932e4 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -766,6 +766,11 @@ struct ravb_ptp {
struct ravb_ptp_perout perout[N_PER_OUT];
 };
 
+enum ravb_chip_id {
+   RCAR_GEN2,
+   RCAR_GEN3,
+};
+
 struct ravb_private {
struct net_device *ndev;
struct platform_device *pdev;
@@ -806,6 +811,8 @@ struct ravb_private {
int msg_enable;
int speed;
int duplex;
+   int emac_irq;
+   enum ravb_chip_id chip_id;
 
unsigned no_avb_link:1;
unsigned avb_link_active_low:1;
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 4ca093d033f8..8cc5ec5ed19a 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -889,6 +889,22 @@ static int ravb_phy_init(struct net_device *ndev)
return -ENOENT;
}
 
+   /* This driver only support 10/100Mbit speeds on Gen3
+* at this time.
+*/
+   if (priv->chip_id == RCAR_GEN3) {
+   int err;
+
+   err = phy_set_max_speed(phydev, SPEED_100);
+   if (err) {
+   netdev_err(ndev, "failed to limit PHY to 100Mbit/s\n");
+   phy_disconnect(phydev);
+   return err;
+   }
+
+   netdev_info(ndev, "limited PHY to 100Mbit/s\n");
+   }
+
netdev_info(ndev, "attached PHY %d (IRQ %d) to driver %s\n",
phydev->addr, phydev->irq, phydev->drv->name);
 
@@ -1197,6 +1213,15 @@ static int ravb_open(struct net_device *ndev)
goto out_napi_off;
}
 
+   if (priv->chip_id == RCAR_GEN3) {
+   error = request_irq(priv->emac_irq, ravb_interrupt,
+   IRQF_SHARED, ndev->name, ndev);
+   if (error) {
+   netdev_err(ndev, "cannot request IRQ\n");
+   goto out_free_irq;
+   }
+   }
+
/* Device init */
error = ravb_dmac_init(ndev);
if (error)
@@ -1220,6 +1245,7 @@ out_ptp_stop:
ravb_ptp_stop(ndev);
 out_free_irq:
free_irq(ndev->irq, ndev);
+   free_irq(priv->emac_irq, ndev);
 out_napi_off:
napi_disable(&priv->napi[RAVB_NC]);
napi_disable(&priv->napi[RAVB_BE]);
@@ -1625,10 +1651,20 @@ static int ravb_mdio_release(struct ravb_private *priv)
return 0;
 }
 
+static const struct of_device_id ravb_match_table[] = {
+   { .compatible = "renesas,etheravb-r8a7790", .data = (void *)RCAR_GEN2 },
+   { .compatible = "renesas,etheravb-r8a7794", .data = (void *)RCAR_GEN2 },
+   { .compatible = "renesas,etheravb-r8a7795", .data = (void *)RCAR_GEN3 },
+   { }
+};
+MODULE_DEVICE_TABLE(of, ravb_match_table);
+
 static int ravb_probe(struct platform_device *pdev)
 {
struct device_node *np = pdev->dev.of_node;
+   const struct of_device_id *match;
struct ravb_private *priv;
+   enum ravb_chip_id chip_id;
struct net_device *ndev;
int error, irq, q;
struct resource *res;
@@ -1657,7 +1693,14 @@ static int ravb_probe(struct platform_device *pdev)
/* The Ether-specific entries in the device structure. */
ndev->base_addr = res->start;
ndev->dma = -1;
-   irq = platform_get_irq(pdev, 0);
+
+   match = of_match_device(of_match_ptr(ravb_match_table), &pdev->dev);
+   chip_id = (enum ravb_chip_id)match->data;
+
+   if (chip_id == RCAR_GEN3)
+   irq = platform_get_irq_byname(pdev, "ch22");
+   else
+   irq = platform_get_irq(pdev, 0);
if (irq < 0) {
error = irq;
goto o

[PATCH net-next 0/4] ravb: Add support for r8a7795 SoC

2015-09-29 Thread Simon Horman

Dave,

please consider this series for net-next.
It enhances the ravb driver to support the r8a7795 SoC.

Changes:

* Dropped RFC prefix
* Details in changelog of individual patches

Base:

* net-next/master

Availability:

To aid review of this in conjunction with other EtherAVB changes
the following branches are available in my renesas tree on kernel.org.

* me/r8a7795-ravb-driver-v4: this series
* me/r8a7795-ravb-pfc-v2: r8a7795 sh-pfc update for EthernetAVB
* me/r8a7795-ravb-integration-v4: enable EthernetAVB on r8a7795
* me/r8a7795-ravb-driver-and-integration-v4.runtime:
  the above three branches with their runtime dependencies

Kazuya Mizuguchi (3):
  ravb: Provide dev parameter to DMA API
  ravb: Document binding for r8a7795 SoC
  ravb: Add support for r8a7795 SoC

Simon Horman (1):
  phylib: Add phy_set_max_speed helper

 .../devicetree/bindings/net/renesas,ravb.txt   |  69 --
 drivers/net/ethernet/renesas/ravb.h|   7 ++
 drivers/net/ethernet/renesas/ravb_main.c   | 101 +++--
 drivers/net/phy/phy_device.c   |  59 
 include/linux/phy.h|   1 +
 5 files changed, 184 insertions(+), 53 deletions(-)

-- 
2.1.4


[PATCH net-next 2/4] ravb: Provide dev parameter to DMA API

2015-09-29 Thread Simon Horman

From: Kazuya Mizuguchi 

This patch is in preparation for using this driver on arm64 where the
implementation of __dma_alloc_coherent fails if a device parameter is not
provided.

Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Yoshihiro Shimoda 
Signed-off-by: Masaru Nagai 
[horms: squashed into a single patch]
Signed-off-by: Simon Horman 

---
* [horms]
  I have only tested this on arm64 using r8a7795/salvator-x.

v0 [Kazuya Mizuguchi, Yoshihiro Shimoda, Masaru Nagai]

v1 [Simon Horman]
* Squashed into a single patch

v2 [Simon Horman]
* No change

v4
* No change
---
 drivers/net/ethernet/renesas/ravb_main.c | 38 
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 450899e9cea2..4ca093d033f8 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -201,7 +201,7 @@ static void ravb_ring_free(struct net_device *ndev, int q)
if (priv->rx_ring[q]) {
ring_size = sizeof(struct ravb_ex_rx_desc) *
(priv->num_rx_ring[q] + 1);
-   dma_free_coherent(NULL, ring_size, priv->rx_ring[q],
+   dma_free_coherent(ndev->dev.parent, ring_size, priv->rx_ring[q],
  priv->rx_desc_dma[q]);
priv->rx_ring[q] = NULL;
}
@@ -209,7 +209,7 @@ static void ravb_ring_free(struct net_device *ndev, int q)
if (priv->tx_ring[q]) {
ring_size = sizeof(struct ravb_tx_desc) *
(priv->num_tx_ring[q] * NUM_TX_DESC + 1);
-   dma_free_coherent(NULL, ring_size, priv->tx_ring[q],
+   dma_free_coherent(ndev->dev.parent, ring_size, priv->tx_ring[q],
  priv->tx_desc_dma[q]);
priv->tx_ring[q] = NULL;
}
@@ -240,13 +240,13 @@ static void ravb_ring_format(struct net_device *ndev, int q)
rx_desc = &priv->rx_ring[q][i];
/* The size of the buffer should be on 16-byte boundary. */
rx_desc->ds_cc = cpu_to_le16(ALIGN(PKT_BUF_SZ, 16));
-   dma_addr = dma_map_single(&ndev->dev, priv->rx_skb[q][i]->data,
+   dma_addr = dma_map_single(ndev->dev.parent, priv->rx_skb[q][i]->data,
  ALIGN(PKT_BUF_SZ, 16),
  DMA_FROM_DEVICE);
/* We just set the data size to 0 for a failed mapping which
 * should prevent DMA from happening...
 */
-   if (dma_mapping_error(&ndev->dev, dma_addr))
+   if (dma_mapping_error(ndev->dev.parent, dma_addr))
rx_desc->ds_cc = cpu_to_le16(0);
rx_desc->dptr = cpu_to_le32(dma_addr);
rx_desc->die_dt = DT_FEMPTY;
@@ -309,7 +309,7 @@ static int ravb_ring_init(struct net_device *ndev, int q)
 
/* Allocate all RX descriptors. */
ring_size = sizeof(struct ravb_ex_rx_desc) * (priv->num_rx_ring[q] + 1);
-   priv->rx_ring[q] = dma_alloc_coherent(NULL, ring_size,
+   priv->rx_ring[q] = dma_alloc_coherent(ndev->dev.parent, ring_size,
  &priv->rx_desc_dma[q],
  GFP_KERNEL);
if (!priv->rx_ring[q])
@@ -320,7 +320,7 @@ static int ravb_ring_init(struct net_device *ndev, int q)
/* Allocate all TX descriptors. */
ring_size = sizeof(struct ravb_tx_desc) *
(priv->num_tx_ring[q] * NUM_TX_DESC + 1);
-   priv->tx_ring[q] = dma_alloc_coherent(NULL, ring_size,
+   priv->tx_ring[q] = dma_alloc_coherent(ndev->dev.parent, ring_size,
  &priv->tx_desc_dma[q],
  GFP_KERNEL);
if (!priv->tx_ring[q])
@@ -443,7 +443,7 @@ static int ravb_tx_free(struct net_device *ndev, int q)
size = le16_to_cpu(desc->ds_tagl) & TX_DS;
/* Free the original skb. */
if (priv->tx_skb[q][entry / NUM_TX_DESC]) {
-   dma_unmap_single(&ndev->dev, le32_to_cpu(desc->dptr),
+   dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
 size, DMA_TO_DEVICE);
/* Last packet descriptor? */
if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) {
@@ -546,7 +546,7 @@ static bool ravb_rx(struct net_device *ndev, int *quota, int q)
 
skb = priv->rx_skb[q][entry];
priv->rx_skb[q][entry] = NULL;
-   dma_unmap_single(&ndev->dev, le32_to_cpu(desc->dptr),
+   dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
 ALIGN(PKT_BUF_SZ, 16),

[PATCH net-next 3/4] ravb: Document binding for r8a7795 SoC

2015-09-29 Thread Simon Horman

From: Kazuya Mizuguchi 

This patch updates the ravb binding to support the r8a7795 SoC by:
- Adding a compat string for the new hardware
- Adding 25 named interrupts to binding for the new SoC;
  older SoCs continue to use a single multiplexed interrupt

The example is also updated to reflect the r8a7795 as this is the
more complex case.
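
To make the naming scheme concrete, the "ch%u" interrupt names for channels 0 to 24 can be generated as in the following sketch. It is an illustration only: demo_irq_name() and DEMO_NUM_CHANNELS are hypothetical names, and the channel count of 25 is taken from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NUM_CHANNELS 25 /* channels 0..24, per the binding text */

/*
 * Write "ch<N>" into buf.
 * Returns 0 on success, -1 for an out-of-range channel or a short buffer.
 */
static int demo_irq_name(unsigned int channel, char *buf, size_t len)
{
	int n;

	if (channel >= DEMO_NUM_CHANNELS)
		return -1;

	n = snprintf(buf, len, "ch%u", channel);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A name built this way is the string a driver would hand to a by-name interrupt lookup, as opposed to the index-based lookup that older SoCs with a single multiplexed interrupt keep using.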

Based on work by Kazuya Mizuguchi and others.

Signed-off-by: Simon Horman 
Acked-by: Geert Uytterhoeven 

---
v2
* First post; broken out of a driver update patch
* As discussed with Geert Uytterhoeven and Sergei Shtylyov
  - Binding: Make all interrupts mandatory as named-interrupts of
the form ch%u

v3
* As suggested by Geert Uytterhoeven
  - Reword description of interrupts and interrupt-names to
make things clearer. It is now based to some extent on
spi-rspi.txt and renesas,usb-dmac.txt.
* As suggested by Sergei Shtylyov
  - Drop phy-reset-gpio from example
* Added power-domains to example

v4
* As suggested by Geert Uytterhoeven
  - grammar fix for interrupt-names description
* Add ack
---
 .../devicetree/bindings/net/renesas,ravb.txt   | 69 +++---
 1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index 1fd8831437bf..b486f3f5f6a3 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -6,8 +6,12 @@ interface contains.
 Required properties:
 - compatible: "renesas,etheravb-r8a7790" if the device is a part of R8A7790 
SoC.
  "renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC.
+ "renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC.
 - reg: offset and length of (1) the register block and (2) the stream buffer.
-- interrupts: interrupt specifier for the sole interrupt.
+- interrupts: A list of interrupt-specifiers, one for each entry in
+ interrupt-names.
+ If interrupt-names is not present, an interrupt specifier
+ for a single muxed interrupt.
 - phy-mode: see ethernet.txt file in the same directory.
 - phy-handle: see ethernet.txt file in the same directory.
 - #address-cells: number of address cells for the MDIO bus, must be equal to 1.
@@ -18,6 +22,12 @@ Required properties:
 Optional properties:
 - interrupt-parent: the phandle for the interrupt controller that services
interrupts for this device.
+- interrupt-names: A list of interrupt names.
+  For the R8A7795 SoC this property is mandatory;
+  it should include one entry per channel, named "ch%u",
+  where %u is the channel number ranging from 0 to 24.
+  For other SoCs this property is optional; if present
+  it should contain "mux" for a single muxed interrupt.
 - pinctrl-names: pin configuration state name ("default").
 - renesas,no-ether-link: boolean, specify when a board does not provide a 
proper
 AVB_LINK signal.
@@ -27,13 +37,46 @@ Optional properties:
 Example:
 
ethernet@e680 {
-   compatible = "renesas,etheravb-r8a7790";
-   reg = <0 0xe680 0 0x800>, <0 0xee0e8000 0 0x4000>;
+   compatible = "renesas,etheravb-r8a7795";
+   reg = <0 0xe680 0 0x800>, <0 0xe6a0 0 0x1>;
interrupt-parent = <&gic>;
-   interrupts = <0 163 IRQ_TYPE_LEVEL_HIGH>;
-   clocks = <&mstp8_clks R8A7790_CLK_ETHERAVB>;
-   phy-mode = "rmii";
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   interrupt-names = "ch0", "ch1", "ch2", "ch3",
+ "ch4", "ch5", "ch6", "ch7",
+ "ch8", "ch9", "ch10", "ch11",
+ "ch12", "ch13", "ch14", "ch15",
+ "ch16", "ch17", "ch18", "ch19",
+ "ch20", "ch21", "ch22", "ch23",
+ "ch24";
+   clocks = <&mstp8_clks R8A7795_CLK_ETHERAVB>;
+   power-domains = <&cpg_clocks>;
+   phy-mode = "rgmii-id";

Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-29 Thread Mathias Krause

On 29 September 2015 at 21:09, Jason Baron  wrote:
> However, if we call connect on socket 's', to connect to a new socket 'o2', we
> drop the reference on the original socket 'o'. Thus, we can now close socket
> 'o' without unregistering from epoll. Then, when we either close the ep
> or unregister 'o', we end up with this list corruption. Thus, this is not a
> race per se, but can be triggered sequentially.

Sounds profound, but the reproducer calls connect() only once per
socket. So there is no "connect to a new socket", is there?
But whatever, see below.

> Linus explains the general case in the context the signalfd stuff here:
> https://lkml.org/lkml/2013/10/14/634

I also found that posting while looking for similar bug reports. I also
found this one: https://lkml.org/lkml/2014/5/15/532

> So this may be the case that we've been chasing here for a while...

That bug has been triggerable since commit 3c73419c09 "af_unix: fix 'poll for
write'/ connected DGRAM sockets". That's v2.6.26-rc7, as noted in the
reproducer.

>
> In any case, we could fix with that same POLLFREE mechanism, the simplest
> would be something like:
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..d499f81 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -392,6 +392,9 @@ static void unix_sock_destructor(struct sock *sk)
> pr_debug("UNIX %p is destroyed, %ld are still alive.\n", sk,
> atomic_long_read(&unix_nr_socks));
>  #endif
> +   /* make sure we remove from epoll */
> +   wake_up_poll(&u->peer_wait, POLLFREE);
> +   synchronize_sched();
>  }
>
>  static void unix_release_sock(struct sock *sk, int embrion)
>
> I'm not suggesting we apply that, but that fixed the supplied test case.
> We could enhance the above, to avoid the free'ing latency there by doing
> the SLAB_DESTROY_BY_RCU for unix sockets. But I'm not convinced
> that this wouldn't be still broken for select()/poll() as well. I think
> we can be in a select() call for socket 's', and if we remove socket
> 'o' from it in the meantime (by doing a connect() on s to somewhere else
> and a close on 'o'), I think we can still crash there. So POLLFREE would
> have to be extended. I tried to hit this with select() but could not,
> but I think if I tried harder I could.
>
> Instead of going further down that route, perhaps something like below
> might be better. The basic idea would be to do away with the 'other'
> poll call in unix_dgram_poll(), and instead revert back to a registering
> on a single wait queue. We add a new wait queue to unix sockets such
> that we can register it with a remote other on connect(). Then we can
> use the wakeup from the remote to wake up the registered unix socket.
> Probably better explained with the patch below. Note I didn't add to
> the remote for SOCK_STREAM, since the poll() routine there doesn't do
> the double wait queue registering:
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 4a167b3..9698aff 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -62,6 +62,7 @@ struct unix_sock {
>  #define UNIX_GC_CANDIDATE  0
>  #define UNIX_GC_MAYBE_CYCLE 1
> struct socket_wq peer_wq;
> +   wait_queue_t wait;
>  };
>  #define unix_sk(__sk) ((struct unix_sock *)__sk)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..9e0692a 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -420,6 +420,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
> skpair = unix_peer(sk);
>
> if (skpair != NULL) {
> +   if (sk->sk_type != SOCK_STREAM)
> +   remove_wait_queue(&unix_sk(skpair)->peer_wait, &u->wait);
> if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) {
> unix_state_lock(skpair);
> /* No more writes */
> @@ -636,6 +638,16 @@ static struct proto unix_proto = {
>   */
>  static struct lock_class_key af_unix_sk_receive_queue_lock_key;
>
> +static int peer_wake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +   struct unix_sock *u;
> +
> +   u = container_of(wait, struct unix_sock, wait);
> +   wake_up_interruptible_sync_poll(sk_sleep(&u->sk), key);
> +
> +   return 0;
> +}
> +
>  static struct sock *unix_create1(struct net *net, struct socket *sock, int 
> kern)
>  {
> struct sock *sk = NULL;
> @@ -664,6 +676,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
> INIT_LIST_HEAD(&u->link);
> mutex_init(&u->readlock); /* single task reading lock */
> init_waitqueue_head(&u->peer_wait);
> +   init_waitqueue_func_entry(&u->wait, peer_wake);
> unix_insert_socket(unix_sockets_unbound(sk), sk);
>  out:
> if (sk == NULL)
> @@ -1030,7 +1043,10 @@ restart:
>  */
> if (unix_peer(sk)) {
> struct sock *old_peer = unix
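
The relay idea in the quoted proposal (register one wait entry per socket on the peer's wait queue, and have its callback wake the registering socket) can be modelled in userspace roughly as below. This is a sketch under stated assumptions, not kernel code: all demo_* names are hypothetical, the wait queue is a bare singly linked list, there is no locking, and a counter stands in for waking the socket's own sleepers.

```c
#include <assert.h>
#include <stddef.h>

struct demo_wait_entry {
	void (*func)(struct demo_wait_entry *entry); /* called on wakeup */
	struct demo_wait_entry *next;
};

struct demo_wait_queue {
	struct demo_wait_entry *head;
};

struct demo_sock {
	struct demo_wait_queue peer_wait; /* sockets connected to us register here */
	struct demo_wait_entry relay;     /* our entry on the peer's peer_wait */
	int wakeups;                      /* stands in for waking our own sleepers */
};

/* Relay callback: map the entry back to its socket and "wake" that socket. */
static void demo_peer_wake(struct demo_wait_entry *entry)
{
	struct demo_sock *s = (struct demo_sock *)
		((char *)entry - offsetof(struct demo_sock, relay));

	s->wakeups++;
}

/* On connect, hook our relay entry onto the peer's queue. */
static void demo_connect(struct demo_sock *s, struct demo_sock *peer)
{
	s->relay.func = demo_peer_wake;
	s->relay.next = peer->peer_wait.head;
	peer->peer_wait.head = &s->relay;
}

/* On disconnect or release, unhook it so the peer never touches us again. */
static void demo_disconnect(struct demo_sock *s, struct demo_sock *peer)
{
	struct demo_wait_entry **pp;

	for (pp = &peer->peer_wait.head; *pp; pp = &(*pp)->next) {
		if (*pp == &s->relay) {
			*pp = s->relay.next;
			s->relay.next = NULL;
			return;
		}
	}
}

/* Wake everything registered on a queue. */
static void demo_wake_up(struct demo_wait_queue *q)
{
	struct demo_wait_entry *e;

	for (e = q->head; e; e = e->next)
		e->func(e);
}
```

The point of the model is the lifetime rule: the only thing left on the peer's queue belongs to the connecting socket, and that socket removes it explicitly, so a poller never ends up registered on a queue owned by a socket it no longer references.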

Re: [PATCH net] net: add pfmemalloc check in sk_add_backlog()

2015-09-29 Thread David Miller

From: Eric Dumazet 
Date: Tue, 29 Sep 2015 18:52:25 -0700

> From: Eric Dumazet 
> 
> Greg reported crashes hitting the following check in __sk_backlog_rcv()
> 
>   BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); 
> 
> The pfmemalloc bit is currently checked in sk_filter().
> 
> This works correctly for TCP, because sk_filter() is run in
> tcp_v[46]_rcv() before hitting the prequeue or backlog checks.
> 
> For UDP or other protocols, this does not work, because sk_filter()
> is run from sock_queue_rcv_skb(), which might be called _after_ backlog
> queuing if the socket is owned by the user by the time the packet is
> processed by the softirq handler.
> 
> Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
> Signed-off-by: Eric Dumazet 
> Reported-by: Greg Thelen 

Applied, thanks Eric.

Re: [PATCH] net: macb: fix two typos

2015-09-29 Thread David Miller

From: Geliang Tang 
Date: Tue, 29 Sep 2015 19:31:32 -0700

> Just fix two typos in code comments.
> 
> Signed-off-by: Geliang Tang 

Applied, thanks.

Re: [PATCH net-next] netfilter: remove dead code

2015-09-29 Thread David Miller

From: Florian Westphal 
Date: Wed, 30 Sep 2015 02:45:07 +0200

> Flavio Leitner  wrote:
>> Remove __nf_conntrack_find() from headers.
>> Fixes: dcd93ed4cd1 ("netfilter: nf_conntrack: remove dead code")
> 
> For the record: netfilter patches should go to
> netfilter-de...@vger.kernel.org .
> 
> That being said, in this case I doubt Pablo minds if David takes this
> directly, patch is obviously correct[tm] :)

I don't want to create any unnecessary merge hassles, so please
resubmit this properly to netfilter-devel, thanks.

Re: [PATCH] net: Initialize flow flags in input path

2015-09-29 Thread David Miller

From: David Ahern 
Date: Tue, 29 Sep 2015 19:07:07 -0700

> The fib_table_lookup tracepoint found 2 places where the flowi4_flags is
> not initialized.
> 
> Signed-off-by: David Ahern 

Applied, thanks David.

Re: [PATCH 00/90] Netfilter/IPVS updates for net-next

2015-09-29 Thread David Miller

From: Pablo Neira Ayuso 
Date: Tue, 29 Sep 2015 21:25:21 +0200

> Hi David,
> 
> The following pull request contains Netfilter/IPVS updates for net-next
> containing 90 patches from Eric Biederman.
> 
> The main goal of this batch is to avoid recurrent lookups for the netns
> pointer, that happens over and over again in our Netfilter/IPVS code. The idea
> consists of passing netns pointer from the hook state to the relevant functions
> and objects where this may be needed.
> 
> You can find more information on the IPVS updates from Simon Horman's commit
> merge message:
> 
> c3456026adc0 ("Merge tag 'ipvs2-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next").
> 
> Exceptionally, this time, I'm not posting the patches again on netdev, Eric
> already Cc'ed this mailing list in the original submission. If you need me to
> make, just let me know.

Yeah that's appropriate in this situation.

Pulled, thanks Pablo.

Re: [PATCH net] net: dsa: fix preparation of a port STP update

2015-09-29 Thread David Miller

From: Vivien Didelot 
Date: Tue, 29 Sep 2015 14:17:54 -0400

> Because of the default 0 value of ret in dsa_slave_port_attr_set, a
> driver may return -EOPNOTSUPP from the commit phase of a STP state,
> which triggers a WARN() from switchdev.
> 
> This happened on a 6185 switch which does not support hardware bridging.
> 
> Fixes: 3563606258cf ("switchdev: convert STP update to switchdev attr set")
> Reported-by: Andrew Lunn 
> Signed-off-by: Vivien Didelot 

Applied and queued up for -stable, thanks.

Re: [PATCH net-next] net: dsa: fix preparation of a port STP update

2015-09-29 Thread David Miller

From: Vivien Didelot 
Date: Tue, 29 Sep 2015 12:38:36 -0400

> Because of the default 0 value of ret in dsa_slave_port_attr_set, a
> driver may return -EOPNOTSUPP from the commit phase of a STP state,
> which triggers a WARN() from switchdev.
> 
> This happened on a 6185 switch which does not support hardware bridging.
> 
> Reported-by: Andrew Lunn 
> Signed-off-by: Vivien Didelot 

Applied.

Re: [PATCH net-next v2] net: Add support for filtering neigh dump by master device

2015-09-29 Thread David Miller

From: David Ahern 
Date: Tue, 29 Sep 2015 09:32:03 -0700

> Add support for filtering neighbor dumps by master device by adding
> the NDA_MASTER attribute to the dump request. A new netlink flag,
> NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
> request and output is filtered as requested.
> 
> Signed-off-by: David Ahern 
> ---
> v2
> - added NLM_F_DUMP_FILTERED flag for userspace feedback that request is
>   supported
> 
> This method works for other filters as well and other dump commands.
> Works fine for all combinations of new and old kernel and new and old ip:
> 1. new ip command on old kernel, NDA_MASTER attribute is ignored
> 2. old ip command on new kernel, NDA_MASTER attribute is not present
> 3. new ip on new kernel ... goodness ensues by limiting data to
>only what user wants

Applied, thanks.
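
The compatibility matrix in the quoted changelog suggests a simple userspace check: if the reply carries the new flag, trust the kernel's filtering, otherwise filter locally. The sketch below assumes the flag shows up in the nlmsg_flags of the dump replies and uses the value 0x20 that NLM_F_DUMP_FILTERED has in <linux/netlink.h>; the DEMO_* names and demo_filter_mode() are hypothetical and are not part of iproute2.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed to mirror NLM_F_DUMP_FILTERED from <linux/netlink.h>;
 * redefined locally so the sketch builds without kernel headers.
 */
#define DEMO_NLM_F_DUMP_FILTERED 0x20

enum demo_filter_mode {
	DEMO_FILTER_IN_USERSPACE, /* old kernel: attribute ignored, filter locally */
	DEMO_KERNEL_FILTERED      /* new kernel: dump already limited to the master */
};

/* Decide how to treat a dump reply based on its netlink header flags. */
static enum demo_filter_mode demo_filter_mode(uint16_t nlmsg_flags)
{
	if (nlmsg_flags & DEMO_NLM_F_DUMP_FILTERED)
		return DEMO_KERNEL_FILTERED;
	return DEMO_FILTER_IN_USERSPACE;
}
```

Falling back to local filtering when the flag is absent is what keeps a new ip binary correct on an old kernel, which is case 1 in the list above.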

Re: [PATCH net-next] tcp: fix tcp_v6_md5_do_lookup prototype

2015-09-29 Thread Eric Dumazet

On Tue, 2015-09-29 at 21:24 -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> tcp_v6_md5_do_lookup() now takes a const socket, even if
> CONFIG_TCP_MD5SIG is not set.
> 
> Fixes: b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument")
> From: Eric Dumazet 

Signed-off-by: Eric Dumazet 

> Reported-by: kbuild test robot 
> ---
>  net/ipv6/tcp_ipv6.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 0ac64f47f8821ce7da103ecc7391ba7e..2ae95e1d03e1c0d5149c9f6fa7cf94d9 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -82,7 +82,7 @@ static const struct inet_connection_sock_af_ops ipv6_specific;
>  static const struct tcp_sock_af_ops tcp_sock_ipv6_specific;
>  static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific;
>  #else
> -static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(struct sock *sk,
> +static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(const struct sock *sk,
>  const struct in6_addr *addr)
>  {
>   return NULL;
> 



Re: [PATCH net-next] tcp: fix tcp_v6_md5_do_lookup prototype

2015-09-29 Thread David Miller

From: Eric Dumazet 
Date: Tue, 29 Sep 2015 21:24:05 -0700

> From: Eric Dumazet 
> 
> tcp_v6_md5_do_lookup() now takes a const socket, even if
> CONFIG_TCP_MD5SIG is not set.
> 
> Fixes: b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument")
> From: Eric Dumazet 
> Reported-by: kbuild test robot 

Applied, thanks Eric.

Re: [PATCH v2 net-next 0/6] net: switchdev: use specific switchdev_obj_*

2015-09-29 Thread David Miller

From: Vivien Didelot 
Date: Tue, 29 Sep 2015 12:07:12 -0400

> This patchset changes switchdev add, del, dump operations from this:
 ...
> to something similar to the notifier_call callback of a notifier_block:
 ...
> This allows the caller to pass and expect back a specific switchdev_obj_*
> structure (e.g. switchdev_obj_fdb) instead of the generic switchdev_obj one.
> 
> This will simplify pushing the callback function down to the drivers.
> 
> The first 3 patches get rid of the dev parameter of the dump callback, since it
> is not always needed (e.g. vlan_dump) and some drivers (such as DSA drivers)
> may not have easy access to it.
> 
> Patches 4 and 5 implement the change in the switchdev operations and its 
> users.
> 
> Patch 6 extracts the inner switchdev_obj_* structures from switchdev_obj and
> removes this last one.
> 
> v2: fix error spotted by kbuild (extra ';' inline switchdev_port_obj_dump).

Series applied, thanks Vivien.

Re: [PATCH net-next v2] net: Add support for filtering neigh dump by master device

2015-09-29 Thread roopa


On 9/29/15, 9:32 AM, David Ahern wrote:
> Add support for filtering neighbor dumps by master device by adding
> the NDA_MASTER attribute to the dump request. A new netlink flag,
> NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
> request and output is filtered as requested.
>
> Signed-off-by: David Ahern

Acked-by: Roopa Prabhu 


[PATCH net-next] tcp: fix tcp_v6_md5_do_lookup prototype

2015-09-29 Thread Eric Dumazet

From: Eric Dumazet 

tcp_v6_md5_do_lookup() now takes a const socket, even if
CONFIG_TCP_MD5SIG is not set.

Fixes: b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument")
From: Eric Dumazet 
Reported-by: kbuild test robot 
---
 net/ipv6/tcp_ipv6.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0ac64f47f8821ce7da103ecc7391ba7e..2ae95e1d03e1c0d5149c9f6fa7cf94d9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -82,7 +82,7 @@ static const struct inet_connection_sock_af_ops ipv6_specific;
 static const struct tcp_sock_af_ops tcp_sock_ipv6_specific;
 static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific;
 #else
-static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(struct sock *sk,
+static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(const struct sock *sk,
   const struct in6_addr *addr)
 {
return NULL;



Re: [net-next:master 414/428] net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type

2015-09-29 Thread Eric Dumazet

On Tue, 2015-09-29 at 21:12 -0700, David Miller wrote:
> From: Eric Dumazet 
> Date: Tue, 29 Sep 2015 21:10:28 -0700
> 
> > Thanks, probably a matter of applying this patch.
> 
> Looks obvious enough, please submit this formally, thanks.

Sure ! I am compiling ;)




Re: [PATCH bluetooth-next 1/4] netlink: add nla_get for le32 and le64

2015-09-29 Thread David Miller

From: Marcel Holtmann 
Date: Tue, 29 Sep 2015 18:08:32 +0200

> do you have any objections to me taking this change through the 
> bluetooth-next tree?

No objections.

Re: [PATCH] testptp: Silence compiler warnings on ppc64

2015-09-29 Thread David Miller

From: Thomas Huth 
Date: Tue, 29 Sep 2015 17:45:28 +0200

> When compiling Documentation/ptp/testptp.c the following compiler
> warnings are printed out:
 ...
> This happens because __s64 is by default defined as "long" on ppc64,
> not as "long long". However, to fix these warnings, it's possible to
> define the __SANE_USERSPACE_TYPES__ so that __s64 gets defined to
> "long long" on ppc64, too.
> 
> Signed-off-by: Thomas Huth 

Applied, thanks.
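
As an illustration of the approach described in the quoted commit message, the
sketch below defines __SANE_USERSPACE_TYPES__ before including the type
headers, so that __s64 can be printed with %lld. It assumes the Linux UAPI
headers are installed; format_s64 is a hypothetical helper, not part of
testptp.c. On architectures where __s64 is already "long long" the define
should simply have no effect.

```c
/* Must come before any header that pulls in <linux/types.h>. */
#define __SANE_USERSPACE_TYPES__
#include <linux/types.h>
#include <stdio.h>

/* Hypothetical helper: formats a __s64 with %lld and returns the number of
 * characters written (excluding the terminating NUL).
 */
int format_s64(char *buf, size_t len, __s64 value)
{
	return snprintf(buf, len, "%lld", value);
}
```

Without the define, a ppc64 build would see __s64 as "long" and the compiler
would flag the %lld format as a mismatch.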

Re: [PATCH v2] net/mlx4: Handle return codes in mlx4_qp_attach_common

2015-09-29 Thread David Miller

From: Robb Manes 
Date: Tue, 29 Sep 2015 11:03:37 -0400

> Both new_steering_entry() and existing_steering_entry() return values
> based on their success or failure, but currently they fall through
> silently.  This can make troubleshooting difficult, as we were unable
> to tell which one of these two functions returned errors or
> specifically what code was returned.  This patch remedies that
> situation by passing the return codes to err, which is returned by
> mlx4_qp_attach_common() itself.
> 
> This also addresses a leak in the call to mlx4_bitmap_free() as well.
> 
> Signed-off-by: Robb Manes 

Applied, thank you.
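
The error-propagation idea in the quoted description can be sketched in plain
C as follows. This is not the mlx4 code: new_entry_like, existing_entry_like
and attach_like are hypothetical names, and the error values are chosen only
for illustration.

```c
#include <errno.h>

/* Hypothetical helpers standing in for new_steering_entry() and
 * existing_steering_entry(); each returns 0 or a negative errno.
 */
int new_entry_like(int fail)
{
	return fail ? -ENOMEM : 0;
}

int existing_entry_like(int fail)
{
	return fail ? -EINVAL : 0;
}

/* Stores the helper's result in err and returns it, so the caller can tell
 * which helper failed and with what code instead of seeing success.
 */
int attach_like(int is_new, int fail)
{
	int err;

	if (is_new)
		err = new_entry_like(fail);
	else
		err = existing_entry_like(fail);

	return err;
}
```

With this shape, attach_like(1, 1) reports -ENOMEM and attach_like(0, 1)
reports -EINVAL, which is the kind of distinction the commit message says was
previously lost.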

Re: [net-next:master 414/428] net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type

2015-09-29 Thread David Miller

From: Eric Dumazet 
Date: Tue, 29 Sep 2015 21:10:28 -0700

> Thanks, probably a matter of applying this patch.

Looks obvious enough, please submit this formally, thanks.

Re: [PATCH net-next] dsa: mv88e6xxx: Fix unsigned/signed issue

2015-09-29 Thread David Miller

From: Andrew Lunn 
Date: Tue, 29 Sep 2015 01:53:48 +0200

> commit dea870242a9c ("dsa: mv88e6xxx: Allow speed/duplex of port to be
> configured") leads to the following static checker warning:
> 
> drivers/net/dsa/mv88e6xxx.c:585 mv88e6xxx_adjust_link()
> warn: unsigned 'ret' is never less than zero.
> 
> drivers/net/dsa/mv88e6xxx.c
>573  void mv88e6xxx_adjust_link(struct dsa_switch *ds, int port,
>574 struct phy_device *phydev)
>575  {
>576  struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
>577  u32 ret, reg;
>578
>579  if (!phy_is_pseudo_fixed_link(phydev))
>580  return;
>581
>582  mutex_lock(&ps->smi_mutex);
>583
>584  ret = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
>585  if (ret < 0)
> 
> Make ret an int, which is the return type for _mv88e6xxx_reg_read()
> 
> Reported-by: Dan Carpenter 
> Signed-off-by: Andrew Lunn 

Applied, thanks.
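
A small userspace sketch shows why the static checker complained. The
function names are hypothetical and reg_read_like only imitates a register
read that returns a negative error code; the unsigned variable mirrors the
"u32 ret" in the quoted listing.

```c
#include <stdint.h>

/* Hypothetical register read: returns a value, or a negative error code. */
int reg_read_like(int fail)
{
	return fail ? -5 : 0x1234;
}

/* Buggy variant: the result is stored in an unsigned variable, so a
 * negative error code becomes a large positive number and the check can
 * never fire. Compilers and static checkers may warn about this line.
 */
int detects_error_unsigned(int fail)
{
	uint32_t ret = (uint32_t)reg_read_like(fail);

	return ret < 0;	/* always false for an unsigned type */
}

/* Fixed variant: ret has the same signed type as the function's return. */
int detects_error_signed(int fail)
{
	int ret = reg_read_like(fail);

	return ret < 0;
}
```

Only the signed variant notices the failure, which matches the fix of making
ret an int.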

Re: [PATCH 0/5] net: m68k: Allow modular build

2015-09-29 Thread David Miller

From: Geert Uytterhoeven 
Date: Tue, 29 Sep 2015 10:24:01 +0200

> This patch series makes the remaining m68k Ethernet drivers modular.
> It's an alternative to the last 3 patches of Paul Gortmaker's series
> "[PATCH net-next 0/6] make non-modular code explicitly non-modular".
> 
> Note that "[PATCH 5/5] net: macmace: Allow modular build" depends on
> "[PATCH 4/5] m68k/mac: Export Peripheral System Controller (PSC) base
> address to modules". Feel free to take the dependency through the netdev
> tree to avoid modular build breakage.
> 
> This was compile-tested only (mac_defconfig + allmodconfig) due to lack
> of hardware.

Series applied, thanks Geert.

Re: [PATCH net-next v2 RESEND] BNX2: fix a Null Pointer for stats_blk

2015-09-29 Thread David Miller

From: Weidong Wang 
Date: Tue, 29 Sep 2015 11:18:18 +0800

> @@ -839,11 +828,12 @@ bnx2_free_mem(struct bnx2 *bp)
>  }
> 
>  static int
> -bnx2_alloc_mem(struct bnx2 *bp)
> +bnx2_alloc_stats_blk(struct net_device *dev)
>  {
> - int i, status_blk_size, err;
> + int i, status_blk_size;
>   struct bnx2_napi *bnapi;
>   void *status_blk;
> + struct bnx2 *bp = netdev_priv(dev);
> 
>   /* Combine status and statistics blocks into one allocation. */
>   status_blk_size = L1_CACHE_ALIGN(sizeof(struct status_block));

This function is not just allocating the stats block, it's allocating
a whole bunch of other things too.

Only allocate the stats block at probe time, not the NAPI et al. stuff
as well.  That can safely stay in the open/close paths.

Re: [net-next:master 414/428] net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type

2015-09-29 Thread Eric Dumazet

On Wed, 2015-09-30 at 12:01 +0800, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
> master
> head:   e6934f3ec00b04234acb24a1a2c28af59763d3b5
> commit: a00e74442bac5ad19a929d097370da7e07540ea6 [414/428] tcp/dccp: constify 
> send_synack and send_reset socket argument
> config: avr32-atngw100_defconfig (attached as .config)
> reproduce:
>   wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout a00e74442bac5ad19a929d097370da7e07540ea6
>   # save the attached .config to linux build tree
>   make.cross ARCH=avr32 
> 
> All warnings (new ones prefixed by >>):
> 
>net/ipv6/tcp_ipv6.c: In function 'tcp_v6_reqsk_send_ack':
> >> net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 
> >> 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type
>net/ipv6/tcp_ipv6.c:926: warning: passing argument 1 of 
> 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type
> 
> vim +/tcp_v6_md5_do_lookup +930 net/ipv6/tcp_ipv6.c
> 
> 9c76a114b Wang Yufen   2014-03-29  914
> tcptw->tw_ts_recent, tw->tw_bound_dev_if, tcp_twsk_md5_key(tcptw),
> 21858cd02 Florent Fourcot  2015-05-16  915
> tw->tw_tclass, cpu_to_be32(tw->tw_flowlabel));
> ^1da177e4 Linus Torvalds   2005-04-16  916  
> 8feaf0c0a Arnaldo Carvalho de Melo 2005-08-09  917inet_twsk_put(tw);
> ^1da177e4 Linus Torvalds   2005-04-16  918  }
> ^1da177e4 Linus Torvalds   2005-04-16  919  
> a00e74442 Eric Dumazet 2015-09-29  920  static void 
> tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
> 6edafaaf6 Gui Jianfeng 2008-08-06  921
>   struct request_sock *req)
> ^1da177e4 Linus Torvalds   2005-04-16  922  {
> 3a19ce0ee Daniel Lee   2014-05-11  923/* sk->sk_state == 
> TCP_LISTEN -> for regular TCP_SYN_RECV
> 3a19ce0ee Daniel Lee   2014-05-11  924 * sk->sk_state == 
> TCP_SYN_RECV -> for Fast Open.
> 3a19ce0ee Daniel Lee   2014-05-11  925 */
> 0f85feae6 Eric Dumazet 2014-12-09  926tcp_v6_send_ack(sk, 
> skb, (sk->sk_state == TCP_LISTEN) ?
> 3a19ce0ee Daniel Lee   2014-05-11  927
> tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt,
> 0f85feae6 Eric Dumazet 2014-12-09  928
> tcp_rsk(req)->rcv_nxt, req->rcv_wnd,
> 0f85feae6 Eric Dumazet 2014-12-09  929
> tcp_time_stamp, req->ts_recent, sk->sk_bound_dev_if,
> 1d13a96c7 Florent Fourcot  2014-01-16 @930
> tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
> 1d13a96c7 Florent Fourcot  2014-01-16  931  0, 0);
> ^1da177e4 Linus Torvalds   2005-04-16  932  }
> ^1da177e4 Linus Torvalds   2005-04-16  933  
> ^1da177e4 Linus Torvalds   2005-04-16  934  
> ^1da177e4 Linus Torvalds   2005-04-16  935  static struct sock 
> *tcp_v6_hnd_req(struct sock *sk, struct sk_buff *skb)
> ^1da177e4 Linus Torvalds   2005-04-16  936  {
> aa8223c7b Arnaldo Carvalho de Melo 2007-04-10  937const struct tcphdr *th 
> = tcp_hdr(skb);
> 52452c542 Eric Dumazet 2015-03-19  938struct request_sock 
> *req;
> 
> :: The code at line 930 was first introduced by commit
> :: 1d13a96c74fc4802a775189ddb58bc6469ffdaa3 ipv6: tcp: fix flowlabel 
> value in ACK messages send from TIME_WAIT
> 
> :: TO: Florent Fourcot 
> :: CC: David S. Miller 
> 
> ---

Thanks, probably a matter of applying this patch.

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0ac64f47f8821ce7da103ecc7391ba7e..2ae95e1d03e1c0d5149c9f6fa7cf94d9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -82,7 +82,7 @@ static const struct inet_connection_sock_af_ops ipv6_specific;
 static const struct tcp_sock_af_ops tcp_sock_ipv6_specific;
 static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific;
 #else
-static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(struct sock *sk,
+static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(const struct sock *sk,
   const struct in6_addr *addr)
 {
return NULL;



Re: [PATCH] dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port

2015-09-29 Thread David Miller

From: Andrew Lunn 
Date: Tue, 29 Sep 2015 01:50:56 +0200

> Frames destined to an unknown address must be forwarded to the CPU
> port. Otherwise incoming ARP, dhcp leases, etc, do not work.
> 
> Signed-off-by: Andrew Lunn 

Applied.

Re: [PATCH 0/5] y2038 conversion for ntp/pps and sfc driver

2015-09-29 Thread David Miller

From: Arnd Bergmann 
Date: Mon, 28 Sep 2015 22:21:27 +0200

> When trying to build a kernel with time_t commented out, I found that
> the ntp subsystem still relies on timespec for its pps handling.
> 
> This series addresses this and converts all the code to use timespec64
> instead, step by step. There is one device driver that interacts with
> this code directly (rather than only through the ptp subsystem), so
> I have to convert that driver at the same time.
> 
> The patches should ideally stay together as a series, but they do
> span multiple subsystems, so I'm also looking for the right person
> to merge them.

I'm happy with this going via a tree other than mine, and for the
networking bits:

Acked-by: David S. Miller 

Re: [PATCH v3 net-next 00/11] net: L3 master device

2015-09-29 Thread David Miller

From: David Ahern 
Date: Tue, 29 Sep 2015 20:07:09 -0700

> v3
> - added license header to l3mdev.c
> 
> - export symbols in l3mdev.c for use with GPL modules
> 
> - removed netdevice header from l3mdev.h (not needed) and fixed
>   typo in comment

Series applied, thanks David.


[net-next:master 414/428] net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type

2015-09-29 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   e6934f3ec00b04234acb24a1a2c28af59763d3b5
commit: a00e74442bac5ad19a929d097370da7e07540ea6 [414/428] tcp/dccp: constify send_synack and send_reset socket argument
config: avr32-atngw100_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout a00e74442bac5ad19a929d097370da7e07540ea6
  # save the attached .config to linux build tree
  make.cross ARCH=avr32

All warnings (new ones prefixed by >>):

   net/ipv6/tcp_ipv6.c: In function 'tcp_v6_reqsk_send_ack':
>> net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type
   net/ipv6/tcp_ipv6.c:926: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type

vim +/tcp_v6_md5_do_lookup +930 net/ipv6/tcp_ipv6.c

9c76a114b Wang Yufen   2014-03-29  914  tcptw->tw_ts_recent, tw->tw_bound_dev_if, tcp_twsk_md5_key(tcptw),
21858cd02 Florent Fourcot  2015-05-16  915  tw->tw_tclass, cpu_to_be32(tw->tw_flowlabel));
^1da177e4 Linus Torvalds   2005-04-16  916  
8feaf0c0a Arnaldo Carvalho de Melo 2005-08-09  917  inet_twsk_put(tw);
^1da177e4 Linus Torvalds   2005-04-16  918  }
^1da177e4 Linus Torvalds   2005-04-16  919  
a00e74442 Eric Dumazet 2015-09-29  920  static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
6edafaaf6 Gui Jianfeng 2008-08-06  921    struct request_sock *req)
^1da177e4 Linus Torvalds   2005-04-16  922  {
3a19ce0ee Daniel Lee   2014-05-11  923  /* sk->sk_state == TCP_LISTEN -> for regular TCP_SYN_RECV
3a19ce0ee Daniel Lee   2014-05-11  924   * sk->sk_state == TCP_SYN_RECV -> for Fast Open.
3a19ce0ee Daniel Lee   2014-05-11  925   */
0f85feae6 Eric Dumazet 2014-12-09  926  tcp_v6_send_ack(sk, skb, (sk->sk_state == TCP_LISTEN) ?
3a19ce0ee Daniel Lee   2014-05-11  927  tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt,
0f85feae6 Eric Dumazet 2014-12-09  928  tcp_rsk(req)->rcv_nxt, req->rcv_wnd,
0f85feae6 Eric Dumazet 2014-12-09  929  tcp_time_stamp, req->ts_recent, sk->sk_bound_dev_if,
1d13a96c7 Florent Fourcot  2014-01-16 @930  tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
1d13a96c7 Florent Fourcot  2014-01-16  931  0, 0);
^1da177e4 Linus Torvalds   2005-04-16  932  }
^1da177e4 Linus Torvalds   2005-04-16  933  
^1da177e4 Linus Torvalds   2005-04-16  934  
^1da177e4 Linus Torvalds   2005-04-16  935  static struct sock *tcp_v6_hnd_req(struct sock *sk, struct sk_buff *skb)
^1da177e4 Linus Torvalds   2005-04-16  936  {
aa8223c7b Arnaldo Carvalho de Melo 2007-04-10  937  const struct tcphdr *th = tcp_hdr(skb);
52452c542 Eric Dumazet 2015-03-19  938  struct request_sock *req;

:: The code at line 930 was first introduced by commit
:: 1d13a96c74fc4802a775189ddb58bc6469ffdaa3 ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT

:: TO: Florent Fourcot 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



[PATCH net-next 07/11] net: Remove the now unused vrf_ptr

2015-09-29 Thread David Ahern

Signed-off-by: David Ahern 
---
 drivers/net/vrf.c | 32 ++--
 include/linux/netdevice.h |  2 --
 include/net/vrf.h |  6 --
 3 files changed, 2 insertions(+), 38 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 72f1892ebad0..df872f4efb0d 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -396,18 +396,15 @@ static void __vrf_insert_slave(struct slave_queue *queue, struct slave *slave)
 
 static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 {
-   struct net_vrf_dev *vrf_ptr = kmalloc(sizeof(*vrf_ptr), GFP_KERNEL);
struct slave *slave = kzalloc(sizeof(*slave), GFP_KERNEL);
struct net_vrf *vrf = netdev_priv(dev);
struct slave_queue *queue = &vrf->queue;
int ret = -ENOMEM;
 
-   if (!slave || !vrf_ptr)
+   if (!slave)
goto out_fail;
 
slave->dev = port_dev;
-   vrf_ptr->ifindex = dev->ifindex;
-   vrf_ptr->tb_id = vrf->tb_id;
 
/* register the packet handler for slave ports */
ret = netdev_rx_handler_register(port_dev, vrf_handle_frame, dev);
@@ -424,7 +421,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 
port_dev->flags |= IFF_SLAVE;
__vrf_insert_slave(queue, slave);
-   rcu_assign_pointer(port_dev->vrf_ptr, vrf_ptr);
cycle_netdev(port_dev);
 
return 0;
@@ -432,7 +428,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 out_unregister:
netdev_rx_handler_unregister(port_dev);
 out_fail:
-   kfree(vrf_ptr);
kfree(slave);
return ret;
 }
@@ -448,21 +443,15 @@ static int vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 /* inverse of do_vrf_add_slave */
 static int do_vrf_del_slave(struct net_device *dev, struct net_device *port_dev)
 {
-   struct net_vrf_dev *vrf_ptr = rtnl_dereference(port_dev->vrf_ptr);
struct net_vrf *vrf = netdev_priv(dev);
struct slave_queue *queue = &vrf->queue;
struct slave *slave;
 
-   RCU_INIT_POINTER(port_dev->vrf_ptr, NULL);
-
netdev_upper_dev_unlink(port_dev, dev);
port_dev->flags &= ~IFF_SLAVE;
 
netdev_rx_handler_unregister(port_dev);
 
-   /* after netdev_rx_handler_unregister for synchronize_rcu */
-   kfree(vrf_ptr);
-
cycle_netdev(port_dev);
 
slave = __vrf_find_slave_dev(queue, port_dev);
@@ -601,10 +590,6 @@ static int vrf_validate(struct nlattr *tb[], struct nlattr *data[])
 
 static void vrf_dellink(struct net_device *dev, struct list_head *head)
 {
-   struct net_vrf_dev *vrf_ptr = rtnl_dereference(dev->vrf_ptr);
-
-   RCU_INIT_POINTER(dev->vrf_ptr, NULL);
-   kfree_rcu(vrf_ptr, rcu);
unregister_netdevice_queue(dev, head);
 }
 
@@ -612,7 +597,6 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
   struct nlattr *tb[], struct nlattr *data[])
 {
struct net_vrf *vrf = netdev_priv(dev);
-   struct net_vrf_dev *vrf_ptr;
int err;
 
if (!data || !data[IFLA_VRF_TABLE])
@@ -622,24 +606,13 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
 
dev->priv_flags |= IFF_L3MDEV_MASTER;
 
-   err = -ENOMEM;
-   vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL);
-   if (!vrf_ptr)
-   goto out_fail;
-
-   vrf_ptr->ifindex = dev->ifindex;
-   vrf_ptr->tb_id = vrf->tb_id;
-
err = register_netdevice(dev);
if (err < 0)
goto out_fail;
 
-   rcu_assign_pointer(dev->vrf_ptr, vrf_ptr);
-
return 0;
 
 out_fail:
-   kfree(vrf_ptr);
free_netdev(dev);
return err;
 }
@@ -683,10 +656,9 @@ static int vrf_device_event(struct notifier_block *unused,
 
/* only care about unregister events to drop slave references */
if (event == NETDEV_UNREGISTER) {
-   struct net_vrf_dev *vrf_ptr = rtnl_dereference(dev->vrf_ptr);
struct net_device *vrf_dev;
 
-   if (!vrf_ptr || netif_is_l3_master(dev))
+   if (netif_is_l3_master(dev))
goto out;
 
vrf_dev = netdev_master_upper_dev_get(dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c7f14794fe14..72bf9e37a2f0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1427,7 +1427,6 @@ enum netdev_priv_flags {
  * @dn_ptr:DECnet specific data
  * @ip6_ptr:   IPv6 specific data
  * @ax25_ptr:  AX.25 specific data
- * @vrf_ptr:   VRF specific data
  * @ieee80211_ptr: IEEE 802.11 specific data, assign before registering
  *
  * @last_rx:   Time of last Rx
@@ -1649,7 +1648,6 @@ struct net_device {
struct dn_dev __rcu *dn_ptr;
struct inet6_dev __rcu  *ip6_ptr;
void*ax25_ptr;
-   struct net_

[PATCH net-next 05/11] net: Replace vrf_dev_table and friends

2015-09-29 Thread David Ahern

Replace calls to vrf_dev_table and friends with l3mdev_fib_table
and kin.

Signed-off-by: David Ahern 
---
 include/net/vrf.h   | 80 -
 net/ipv4/af_inet.c  |  4 +--
 net/ipv4/fib_frontend.c |  7 ++---
 3 files changed, 5 insertions(+), 86 deletions(-)

diff --git a/include/net/vrf.h b/include/net/vrf.h
index 874a6c9e4217..b05b96646e2a 100644
--- a/include/net/vrf.h
+++ b/include/net/vrf.h
@@ -34,66 +34,6 @@ struct net_vrf {
 
 
 #if IS_ENABLED(CONFIG_NET_VRF)
-/* called with rcu_read_lock */
-static inline u32 vrf_dev_table_rcu(const struct net_device *dev)
-{
-   u32 tb_id = 0;
-
-   if (dev) {
-   struct net_vrf_dev *vrf_ptr;
-
-   vrf_ptr = rcu_dereference(dev->vrf_ptr);
-   if (vrf_ptr)
-   tb_id = vrf_ptr->tb_id;
-   }
-   return tb_id;
-}
-
-static inline u32 vrf_dev_table(const struct net_device *dev)
-{
-   u32 tb_id;
-
-   rcu_read_lock();
-   tb_id = vrf_dev_table_rcu(dev);
-   rcu_read_unlock();
-
-   return tb_id;
-}
-
-static inline u32 vrf_dev_table_ifindex(struct net *net, int ifindex)
-{
-   struct net_device *dev;
-   u32 tb_id = 0;
-
-   if (!ifindex)
-   return 0;
-
-   rcu_read_lock();
-
-   dev = dev_get_by_index_rcu(net, ifindex);
-   if (dev)
-   tb_id = vrf_dev_table_rcu(dev);
-
-   rcu_read_unlock();
-
-   return tb_id;
-}
-
-/* called with rtnl */
-static inline u32 vrf_dev_table_rtnl(const struct net_device *dev)
-{
-   u32 tb_id = 0;
-
-   if (dev) {
-   struct net_vrf_dev *vrf_ptr;
-
-   vrf_ptr = rtnl_dereference(dev->vrf_ptr);
-   if (vrf_ptr)
-   tb_id = vrf_ptr->tb_id;
-   }
-   return tb_id;
-}
-
 /* caller has already checked netif_is_l3_master(dev) */
 static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev)
 {
@@ -108,26 +48,6 @@ static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev)
 }
 
 #else
-static inline u32 vrf_dev_table_rcu(const struct net_device *dev)
-{
-   return 0;
-}
-
-static inline u32 vrf_dev_table(const struct net_device *dev)
-{
-   return 0;
-}
-
-static inline u32 vrf_dev_table_ifindex(struct net *net, int ifindex)
-{
-   return 0;
-}
-
-static inline u32 vrf_dev_table_rtnl(const struct net_device *dev)
-{
-   return 0;
-}
-
 static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev)
 {
return ERR_PTR(-ENETUNREACH);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8a556643b874..0df3f0527648 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -119,7 +119,7 @@
 #ifdef CONFIG_IP_MROUTE
 #include 
 #endif
-#include <net/vrf.h>
+#include <net/l3mdev.h>
 
 
 /* The inetsw table contains everything that inet_create needs to
@@ -450,7 +450,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
goto out;
}
 
-   tb_id = vrf_dev_table_ifindex(net, sk->sk_bound_dev_if) ? : tb_id;
+   tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id;
chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id);
 
/* Not specified by any standard per-se, however it breaks too
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index b901b344f22d..fac172370276 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -45,7 +45,6 @@
 #include 
 #include 
 #include 
-#include <net/vrf.h>
 #include 
 #include 
 
@@ -256,7 +255,7 @@ EXPORT_SYMBOL(inet_addr_type);
 unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev,
__be32 addr)
 {
-   u32 rt_table = vrf_dev_table(dev) ? : RT_TABLE_LOCAL;
+   u32 rt_table = l3mdev_fib_table(dev) ? : RT_TABLE_LOCAL;
 
return __inet_dev_addr_type(net, dev, addr, rt_table);
 }
@@ -269,7 +268,7 @@ unsigned int inet_addr_type_dev_table(struct net *net,
  const struct net_device *dev,
  __be32 addr)
 {
-   u32 rt_table = vrf_dev_table(dev) ? : RT_TABLE_LOCAL;
+   u32 rt_table = l3mdev_fib_table(dev) ? : RT_TABLE_LOCAL;
 
return __inet_dev_addr_type(net, NULL, addr, rt_table);
 }
@@ -804,7 +803,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifaddr *ifa)
 {
struct net *net = dev_net(ifa->ifa_dev->dev);
-   u32 tb_id = vrf_dev_table_rtnl(ifa->ifa_dev->dev);
+   u32 tb_id = l3mdev_fib_table(ifa->ifa_dev->dev);
struct fib_table *tb;
struct fib_config cfg = {
.fc_protocol = RTPROT_KERNEL,
-- 
1.9.1


[PATCH net-next 08/11] net: Remove vrf header file

2015-09-29 Thread David Ahern

Move remaining structs to VRF driver and delete the vrf header file.

Signed-off-by: David Ahern 
---
 MAINTAINERS   |  1 -
 drivers/net/vrf.c | 16 +++-
 include/net/vrf.h | 29 -
 3 files changed, 15 insertions(+), 31 deletions(-)
 delete mode 100644 include/net/vrf.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3f2d7a9d0bbf..fa43fa2f30e4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11273,7 +11273,6 @@ M:  Shrijeet Mukherjee 
 L: netdev@vger.kernel.org
 S: Maintained
 F: drivers/net/vrf.c
-F: include/net/vrf.h
 F: Documentation/networking/vrf.txt
 
 VT1211 HARDWARE MONITOR DRIVER
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index df872f4efb0d..64f2ab663ffe 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include <net/vrf.h>
 #include 
 
 #define DRV_NAME   "vrf"
@@ -45,6 +44,21 @@
 #define vrf_master_get_rcu(dev) \
((struct net_device *)rcu_dereference(dev->rx_handler_data))
 
+struct slave {
+   struct list_headlist;
+   struct net_device   *dev;
+};
+
+struct slave_queue {
+   struct list_headall_slaves;
+};
+
+struct net_vrf {
+   struct slave_queue  queue;
+   struct rtable   *rth;
+   u32 tb_id;
+};
+
 struct pcpu_dstats {
u64 tx_pkts;
u64 tx_bytes;
diff --git a/include/net/vrf.h b/include/net/vrf.h
deleted file mode 100644
index e83fc38770dd..
--- a/include/net/vrf.h
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * include/net/net_vrf.h - adds vrf dev structure definitions
- * Copyright (c) 2015 Cumulus Networks
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#ifndef __LINUX_NET_VRF_H
-#define __LINUX_NET_VRF_H
-
-struct slave {
-   struct list_headlist;
-   struct net_device   *dev;
-};
-
-struct slave_queue {
-   struct list_headall_slaves;
-};
-
-struct net_vrf {
-   struct slave_queue  queue;
-   struct rtable   *rth;
-   u32 tb_id;
-};
-
-#endif /* __LINUX_NET_VRF_H */
-- 
1.9.1


[PATCH net-next 03/11] net: Add support for l3mdev ops to VRF driver

2015-09-29 Thread David Ahern

Signed-off-by: David Ahern 
---
 drivers/net/Kconfig |  1 +
 drivers/net/vrf.c   | 29 +
 2 files changed, 30 insertions(+)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index d18eb607bee6..b9ebd0d18a52 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -299,6 +299,7 @@ config NLMON
 config NET_VRF
tristate "Virtual Routing and Forwarding (Lite)"
depends on IP_MULTIPLE_TABLES && IPV6_MULTIPLE_TABLES
+   depends on NET_L3_MASTER_DEV
---help---
  This option enables the support for mapping interfaces into VRF's. The
  support enables VRF devices.
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 2d7418e0b908..72f1892ebad0 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include <net/l3mdev.h>
 
 #define DRV_NAME   "vrf"
 #define DRV_VERSION"1.0"
@@ -529,6 +530,33 @@ static const struct net_device_ops vrf_netdev_ops = {
.ndo_del_slave  = vrf_del_slave,
 };
 
+static u32 vrf_fib_table(const struct net_device *dev)
+{
+   struct net_vrf *vrf = netdev_priv(dev);
+
+   return vrf->tb_id;
+}
+
+static struct rtable *vrf_get_rtable(const struct net_device *dev,
+const struct flowi4 *fl4)
+{
+   struct rtable *rth = NULL;
+
+   if (!(fl4->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
+   struct net_vrf *vrf = netdev_priv(dev);
+
+   rth = vrf->rth;
+   atomic_inc(&rth->dst.__refcnt);
+   }
+
+   return rth;
+}
+
+static const struct l3mdev_ops vrf_l3mdev_ops = {
+   .l3mdev_fib_table   = vrf_fib_table,
+   .l3mdev_get_rtable  = vrf_get_rtable,
+};
+
 static void vrf_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
 {
@@ -546,6 +574,7 @@ static void vrf_setup(struct net_device *dev)
 
/* Initialize the device structure. */
dev->netdev_ops = &vrf_netdev_ops;
+   dev->l3mdev_ops = &vrf_l3mdev_ops;
dev->ethtool_ops = &vrf_ethtool_ops;
dev->destructor = free_netdev;
 
-- 
1.9.1


[PATCH v3 net-next 00/11] net: L3 master device

2015-09-29 Thread David Ahern

The VRF device is essentially a Layer 3 master device used to associate
netdevices with a specific routing table and to influence FIB lookups
via 'ip rules' and controlling the oif/iif used for the lookup.

This series generalizes the VRF into an L3 master device, l3mdev. Similar
to switchdev, it has a Kconfig option and a separate set of operations
in net_device, allowing it to be completely compiled out if not wanted.
The l3mdev methods rely on the 'master' aspect and on the use of
netdev_master_upper_dev_get_rcu to retrieve the master device from a
given netdevice if it is enslaved to an L3_MASTER.

The VRF device is converted to use the l3mdev operations. At the end the
vrf_ptr is no longer needed and is removed, as are all direct references
to VRF. The end result is a much simpler implementation for VRF.
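
For illustration only (not part of the series): a minimal userspace sketch
of the dispatch idea described above, where a lookup on a device resolves
to the operations of its L3 master. All mock_* names and struct layouts
are hypothetical stand-ins, and the lookup order is an assumption; the
real code lives in net/l3mdev/l3mdev.c and runs under RCU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct mock_dev;

struct mock_l3mdev_ops {
	uint32_t (*fib_table)(const struct mock_dev *dev);
};

struct mock_dev {
	int ifindex;
	int is_l3_master;                  /* models IFF_L3MDEV_MASTER */
	const struct mock_dev *master;     /* models the upper "master" link */
	const struct mock_l3mdev_ops *ops; /* models dev->l3mdev_ops */
	uint32_t tb_id;                    /* models the VRF private table id */
};

/* Models vrf_fib_table(): the master reports its own table id. */
static uint32_t mock_vrf_fib_table(const struct mock_dev *dev)
{
	return dev->tb_id;
}

static const struct mock_l3mdev_ops mock_vrf_ops = {
	.fib_table = mock_vrf_fib_table,
};

/* Resolve the FIB table for dev: use dev itself if it is an L3 master,
 * otherwise its master if that is an L3 master; return 0 when neither
 * applies or the operation is not implemented.
 */
static uint32_t mock_fib_table(const struct mock_dev *dev)
{
	const struct mock_dev *m = NULL;

	if (!dev)
		return 0;

	if (dev->is_l3_master)
		m = dev;
	else if (dev->master && dev->master->is_l3_master)
		m = dev->master;

	if (m && m->ops && m->ops->fib_table)
		return m->ops->fib_table(m);

	return 0;
}
```

A device with no L3 master resolves to table 0 here, which mirrors the
stubs in l3mdev.h when CONFIG_NET_L3_MASTER_DEV is not set.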

Thanks to Nikolay for suggestions (e.g., use of the master linkage, which
is the key to making this work) and to Roopa, Andy and Shrijeet for
early reviews.

v3
- added license header to l3mdev.c

- export symbols in l3mdev.c for use with GPL modules

- removed netdevice header from l3mdev.h (not needed) and fixed
  typo in comment

v2
- rebased to top of net-next
- addressed Nik's comments (checking master, removing extra lines, and
  flipping the order of patches 1 and 2)

Changes since RFC:
- Changed IFF_L3MDEV to IFF_L3MDEV_MASTER after Nikolay pointed out a problem
  with my flag changes (uniquely identifying a L3MDEV master device versus an
  enslaved device like a bond that will also be a master device)
- Rolled in icmp fix for panic when flipping from vrf functions to l3mdev
- Moved netif_is_l3_master check into l3mdev_get_rtable

David Ahern (11):
  net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER
  net: Introduce L3 Master device abstraction
  net: Add support for l3mdev ops to VRF driver
  net: Replace vrf_master_ifindex{,_rcu} with l3mdev equivalents
  net: Replace vrf_dev_table and friends
  net: Replace calls to vrf_dev_get_rth
  net: Remove the now unused vrf_ptr
  net: Remove vrf header file
  net: Move netif_index_is_l3_master to l3mdev.h
  net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRC
  net: Add netif_is_l3_slave

 MAINTAINERS   |   8 ++-
 drivers/net/Kconfig   |   1 +
 drivers/net/vrf.c |  89 +--
 include/linux/netdevice.h |  43 ---
 include/net/flow.h|   2 +-
 include/net/l3mdev.h  | 149 ++
 include/net/route.h   |   5 +-
 include/net/vrf.h | 178 --
 net/Kconfig   |   1 +
 net/Makefile  |   3 +
 net/ipv4/af_inet.c|   4 +-
 net/ipv4/fib_frontend.c   |  12 ++--
 net/ipv4/icmp.c   |   8 +--
 net/ipv4/ip_fragment.c|   6 +-
 net/ipv4/ip_output.c  |   2 +-
 net/ipv4/route.c  |  15 ++--
 net/ipv4/udp.c|   4 +-
 net/ipv4/xfrm4_policy.c   |   8 +--
 net/ipv6/xfrm6_policy.c   |   8 +--
 net/l3mdev/Kconfig|  10 +++
 net/l3mdev/Makefile   |   5 ++
 net/l3mdev/l3mdev.c   |  92 
 22 files changed, 369 insertions(+), 284 deletions(-)
 create mode 100644 include/net/l3mdev.h
 delete mode 100644 include/net/vrf.h
 create mode 100644 net/l3mdev/Kconfig
 create mode 100644 net/l3mdev/Makefile
 create mode 100644 net/l3mdev/l3mdev.c

-- 
1.9.1


[PATCH net-next 09/11] net: Move netif_index_is_l3_master to l3mdev.h

2015-09-29 Thread David Ahern

Change CONFIG dependency to CONFIG_NET_L3_MASTER_DEV as well.

Signed-off-by: David Ahern 
---
 include/linux/netdevice.h | 21 -
 include/net/l3mdev.h  | 24 
 include/net/route.h   |  1 +
 3 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 72bf9e37a2f0..b9450784ae06 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3840,27 +3840,6 @@ static inline bool netif_is_ovs_master(const struct net_device *dev)
return dev->priv_flags & IFF_OPENVSWITCH;
 }
 
-static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
-{
-   bool rc = false;
-
-#if IS_ENABLED(CONFIG_NET_VRF)
-   struct net_device *dev;
-
-   if (ifindex == 0)
-   return false;
-
-   rcu_read_lock();
-
-   dev = dev_get_by_index_rcu(net, ifindex);
-   if (dev)
-   rc = netif_is_l3_master(dev);
-
-   rcu_read_unlock();
-#endif
-   return rc;
-}
-
 /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */
 static inline void netif_keep_dst(struct net_device *dev)
 {
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index e382c777bab8..87cee05a0a17 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -81,6 +81,25 @@ static inline struct rtable *l3mdev_get_rtable(const struct net_device *dev,
return NULL;
 }
 
+static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
+{
+   struct net_device *dev;
+   bool rc = false;
+
+   if (ifindex == 0)
+   return false;
+
+   rcu_read_lock();
+
+   dev = dev_get_by_index_rcu(net, ifindex);
+   if (dev)
+   rc = netif_is_l3_master(dev);
+
+   rcu_read_unlock();
+
+   return rc;
+}
+
 #else
 
 static inline int l3mdev_master_ifindex_rcu(struct net_device *dev)
@@ -120,6 +139,11 @@ static inline struct rtable *l3mdev_get_rtable(const struct net_device *dev,
return NULL;
 }
 
+static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
+{
+   return false;
+}
+
 #endif
 
 #endif /* _NET_L3MDEV_H_ */
diff --git a/include/net/route.h b/include/net/route.h
index a565d0dad12c..e211dc167db1 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
1.9.1


[PATCH net-next 06/11] net: Replace calls to vrf_dev_get_rth

2015-09-29 Thread David Ahern

Replace calls to vrf_dev_get_rth with l3mdev_get_rtable.
The check on the flow flags is handled in the l3mdev operation.
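
For illustration only: a userspace sketch of what moving the flag check
into the operation means for the caller. The rt_* names, the flag value
and the NULL check on the cached route are assumptions made for this
sketch, not the kernel definitions.

```c
#include <stddef.h>

#define RT_SKETCH_FLAG_SRC 0x08	/* hypothetical stand-in for FLOWI_FLAG_VRFSRC */

struct rt_sketch_rtable {
	int refcnt;			/* models dst.__refcnt */
};

struct rt_sketch_flow {
	unsigned int flags;		/* models flowi4_flags */
};

struct rt_sketch_dev {
	int is_l3_master;		/* models netif_is_l3_master() */
	struct rt_sketch_rtable *rth;	/* models the route cached by the VRF */
};

/* Return the cached route with a reference taken, or NULL when the device
 * is not an L3 master or the flow asks to skip the cached route. The
 * caller only needs to test the result for NULL.
 */
static struct rt_sketch_rtable *
rt_sketch_get_rtable(const struct rt_sketch_dev *dev,
		     const struct rt_sketch_flow *fl)
{
	if (!dev->is_l3_master || !dev->rth)
		return NULL;

	if (fl->flags & RT_SKETCH_FLAG_SRC)
		return NULL;

	dev->rth->refcnt++;
	return dev->rth;
}
```

With this shape the call site collapses to "rth = get_rtable(dev, fl4);
if (rth) goto out;", which is what the route.c hunk below does.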

Signed-off-by: David Ahern 
---
 include/net/vrf.h | 22 --
 net/ipv4/route.c  |  8 +++-
 2 files changed, 3 insertions(+), 27 deletions(-)

diff --git a/include/net/vrf.h b/include/net/vrf.h
index b05b96646e2a..5bba1535ba73 100644
--- a/include/net/vrf.h
+++ b/include/net/vrf.h
@@ -32,26 +32,4 @@ struct net_vrf {
u32 tb_id;
 };
 
-
-#if IS_ENABLED(CONFIG_NET_VRF)
-/* caller has already checked netif_is_l3_master(dev) */
-static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev)
-{
-   struct rtable *rth = ERR_PTR(-ENETUNREACH);
-   struct net_vrf *vrf = netdev_priv(dev);
-
-   if (vrf) {
-   rth = vrf->rth;
-   atomic_inc(&rth->dst.__refcnt);
-   }
-   return rth;
-}
-
-#else
-static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev)
-{
-   return ERR_PTR(-ENETUNREACH);
-}
-#endif
-
 #endif /* __LINUX_NET_VRF_H */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ba47c45c..1441de1550e6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -112,7 +112,6 @@
 #endif
 #include 
 #include 
-#include 
 #include 
 
 #define RT_FL_TOS(oldflp4) \
@@ -2125,11 +2124,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
fl4->saddr = inet_select_addr(dev_out, 0,
  RT_SCOPE_HOST);
}
-   if (netif_is_l3_master(dev_out) &&
-   !(fl4->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
-   rth = vrf_dev_get_rth(dev_out);
+
+   rth = l3mdev_get_rtable(dev_out, fl4);
+   if (rth)
goto out;
-   }
}
 
if (!fl4->daddr) {
-- 
1.9.1


[PATCH net-next 02/11] net: Introduce L3 Master device abstraction

2015-09-29 Thread David Ahern

L3 master devices allow users of the abstraction to influence FIB lookups
for enslaved devices. The current API provides a means for the master device
to return a specific FIB table for an enslaved device, to return an
rtable/custom dst, and to influence the OIF used for FIB lookups.

Signed-off-by: David Ahern 
---
 MAINTAINERS   |   7 +++
 include/linux/netdevice.h |   3 ++
 include/net/l3mdev.h  | 125 ++
 net/Kconfig   |   1 +
 net/Makefile  |   3 ++
 net/l3mdev/Kconfig|  10 
 net/l3mdev/Makefile   |   5 ++
 net/l3mdev/l3mdev.c   |  92 ++
 8 files changed, 246 insertions(+)
 create mode 100644 include/net/l3mdev.h
 create mode 100644 net/l3mdev/Kconfig
 create mode 100644 net/l3mdev/Makefile
 create mode 100644 net/l3mdev/l3mdev.c

diff --git a/MAINTAINERS b/MAINTAINERS
index bcd263de4827..3f2d7a9d0bbf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6095,6 +6095,13 @@ F:   Documentation/auxdisplay/ks0108
 F: drivers/auxdisplay/ks0108.c
 F: include/linux/ks0108.h
 
+L3MDEV
+M: David Ahern 
+L: netdev@vger.kernel.org
+S: Maintained
+F: net/l3mdev
+F: include/net/l3mdev.h
+
 LAPB module
 L: linux-...@vger.kernel.org
 S: Orphan
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 99c33e83822f..c7f14794fe14 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1587,6 +1587,9 @@ struct net_device {
 #ifdef CONFIG_NET_SWITCHDEV
const struct switchdev_ops *switchdev_ops;
 #endif
+#ifdef CONFIG_NET_L3_MASTER_DEV
+   const struct l3mdev_ops *l3mdev_ops;
+#endif
 
const struct header_ops *header_ops;
 
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
new file mode 100644
index ..e382c777bab8
--- /dev/null
+++ b/include/net/l3mdev.h
@@ -0,0 +1,125 @@
+/*
+ * include/net/l3mdev.h - L3 master device API
+ * Copyright (c) 2015 Cumulus Networks
+ * Copyright (c) 2015 David Ahern 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _NET_L3MDEV_H_
+#define _NET_L3MDEV_H_
+
+/**
+ * struct l3mdev_ops - l3mdev operations
+ *
+ * @l3mdev_fib_table: Get FIB table id to use for lookups
+ *
+ * @l3mdev_get_rtable: Get cached IPv4 rtable (dst_entry) for device
+ */
+
+struct l3mdev_ops {
+   u32 (*l3mdev_fib_table)(const struct net_device *dev);
+   struct rtable * (*l3mdev_get_rtable)(const struct net_device *dev,
+const struct flowi4 *fl4);
+};
+
+#ifdef CONFIG_NET_L3_MASTER_DEV
+
+int l3mdev_master_ifindex_rcu(struct net_device *dev);
+static inline int l3mdev_master_ifindex(struct net_device *dev)
+{
+   int ifindex;
+
+   rcu_read_lock();
+   ifindex = l3mdev_master_ifindex_rcu(dev);
+   rcu_read_unlock();
+
+   return ifindex;
+}
+
+/* get index of an interface to use for FIB lookups. For devices
+ * enslaved to an L3 master device FIB lookups are based on the
+ * master index
+ */
+static inline int l3mdev_fib_oif_rcu(struct net_device *dev)
+{
+   return l3mdev_master_ifindex_rcu(dev) ? : dev->ifindex;
+}
+
+static inline int l3mdev_fib_oif(struct net_device *dev)
+{
+   int oif;
+
+   rcu_read_lock();
+   oif = l3mdev_fib_oif_rcu(dev);
+   rcu_read_unlock();
+
+   return oif;
+}
+
+u32 l3mdev_fib_table_rcu(const struct net_device *dev);
+u32 l3mdev_fib_table_by_index(struct net *net, int ifindex);
+static inline u32 l3mdev_fib_table(const struct net_device *dev)
+{
+   u32 tb_id;
+
+   rcu_read_lock();
+   tb_id = l3mdev_fib_table_rcu(dev);
+   rcu_read_unlock();
+
+   return tb_id;
+}
+
+static inline struct rtable *l3mdev_get_rtable(const struct net_device *dev,
+  const struct flowi4 *fl4)
+{
+   if (netif_is_l3_master(dev) && dev->l3mdev_ops->l3mdev_get_rtable)
+   return dev->l3mdev_ops->l3mdev_get_rtable(dev, fl4);
+
+   return NULL;
+}
+
+#else
+
+static inline int l3mdev_master_ifindex_rcu(struct net_device *dev)
+{
+   return 0;
+}
+static inline int l3mdev_master_ifindex(struct net_device *dev)
+{
+   return 0;
+}
+
+static inline int l3mdev_fib_oif_rcu(struct net_device *dev)
+{
+   return dev ? dev->ifindex : 0;
+}
+static inline int l3mdev_fib_oif(struct net_device *dev)
+{
+   return dev ? dev->ifindex : 0;
+}
+
+static inline u32 l3mdev_fib_table_rcu(const struct net_device *dev)
+{
+   return 0;
+}
+static inline u32 l3mdev_fib_table(const struct net_device *dev)
+{
+   return 0;
+}
+static inline u32 l3mdev_fib_table_by_index(struct net *net, int ifindex)
+{
+   return 0;
+}
+
+static inline struct rtable *l3mdev_get_rtable(c

[PATCH net-next 01/11] net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER

2015-09-29 Thread David Ahern

Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the
netif_is_vrf and netif_index_is_vrf macros.

Signed-off-by: David Ahern 
---
 drivers/net/vrf.c |  6 +++---
 include/linux/netdevice.h | 14 +++---
 include/net/route.h   |  2 +-
 include/net/vrf.h |  4 ++--
 net/ipv4/ip_output.c  |  2 +-
 net/ipv4/route.c  |  2 +-
 net/ipv4/udp.c|  2 +-
 7 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 4ecb3a3e516a..2d7418e0b908 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -438,7 +438,7 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 
 static int vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 {
-   if (netif_is_vrf(port_dev) || vrf_is_slave(port_dev))
+   if (netif_is_l3_master(port_dev) || vrf_is_slave(port_dev))
return -EINVAL;
 
return do_vrf_add_slave(dev, port_dev);
@@ -591,7 +591,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
 
vrf->tb_id = nla_get_u32(data[IFLA_VRF_TABLE]);
 
-   dev->priv_flags |= IFF_VRF_MASTER;
+   dev->priv_flags |= IFF_L3MDEV_MASTER;
 
err = -ENOMEM;
vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL);
@@ -657,7 +657,7 @@ static int vrf_device_event(struct notifier_block *unused,
struct net_vrf_dev *vrf_ptr = rtnl_dereference(dev->vrf_ptr);
struct net_device *vrf_dev;
 
-   if (!vrf_ptr || netif_is_vrf(dev))
+   if (!vrf_ptr || netif_is_l3_master(dev))
goto out;
 
vrf_dev = netdev_master_upper_dev_get(dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d2ffeafc9998..99c33e83822f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1258,7 +1258,7 @@ struct net_device_ops {
  * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
  * change when it's running
  * @IFF_MACVLAN: Macvlan device
- * @IFF_VRF_MASTER: device is a VRF master
+ * @IFF_L3MDEV_MASTER: device is an L3 master device
  * @IFF_NO_QUEUE: device can run without qdisc attached
  * @IFF_OPENVSWITCH: device is a Open vSwitch master
  */
@@ -1283,7 +1283,7 @@ enum netdev_priv_flags {
IFF_XMIT_DST_RELEASE_PERM   = 1<<17,
IFF_IPVLAN_MASTER   = 1<<18,
IFF_IPVLAN_SLAVE= 1<<19,
-   IFF_VRF_MASTER  = 1<<20,
+   IFF_L3MDEV_MASTER   = 1<<20,
IFF_NO_QUEUE= 1<<21,
IFF_OPENVSWITCH = 1<<22,
 };
@@ -1308,7 +1308,7 @@ enum netdev_priv_flags {
 #define IFF_XMIT_DST_RELEASE_PERM  IFF_XMIT_DST_RELEASE_PERM
 #define IFF_IPVLAN_MASTER  IFF_IPVLAN_MASTER
 #define IFF_IPVLAN_SLAVE   IFF_IPVLAN_SLAVE
-#define IFF_VRF_MASTER IFF_VRF_MASTER
+#define IFF_L3MDEV_MASTER  IFF_L3MDEV_MASTER
 #define IFF_NO_QUEUE   IFF_NO_QUEUE
 #define IFF_OPENVSWITCHIFF_OPENVSWITCH
 
@@ -3824,9 +3824,9 @@ static inline bool netif_supports_nofcs(struct net_device *dev)
return dev->priv_flags & IFF_SUPP_NOFCS;
 }
 
-static inline bool netif_is_vrf(const struct net_device *dev)
+static inline bool netif_is_l3_master(const struct net_device *dev)
 {
-   return dev->priv_flags & IFF_VRF_MASTER;
+   return dev->priv_flags & IFF_L3MDEV_MASTER;
 }
 
 static inline bool netif_is_bridge_master(const struct net_device *dev)
@@ -3839,7 +3839,7 @@ static inline bool netif_is_ovs_master(const struct net_device *dev)
return dev->priv_flags & IFF_OPENVSWITCH;
 }
 
-static inline bool netif_index_is_vrf(struct net *net, int ifindex)
+static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
 {
bool rc = false;
 
@@ -3853,7 +3853,7 @@ static inline bool netif_index_is_vrf(struct net *net, int ifindex)
 
dev = dev_get_by_index_rcu(net, ifindex);
if (dev)
-   rc = netif_is_vrf(dev);
+   rc = netif_is_l3_master(dev);
 
rcu_read_unlock();
 #endif
diff --git a/include/net/route.h b/include/net/route.h
index d1bd90bb3187..a565d0dad12c 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -256,7 +256,7 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32
if (inet_sk(sk)->transparent)
flow_flags |= FLOWI_FLAG_ANYSRC;
 
-   if (netif_index_is_vrf(sock_net(sk), oif))
+   if (netif_index_is_l3_master(sock_net(sk), oif))
flow_flags |= FLOWI_FLAG_VRFSRC | FLOWI_FLAG_SKIP_NH_OIF;
 
flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
diff --git a/include/net/vrf.h b/include/net/vrf.h
index 593e6094ddd4..34bb3f69def2 100644
--- a/include/net/vrf.h
+++ b/include/net/vrf.h
@@ -43,7 +43,7 @@ static inline int vrf_master_ifindex_rcu(cons

[PATCH net-next 04/11] net: Replace vrf_master_ifindex{,_rcu} with l3mdev equivalents

2015-09-29 Thread David Ahern

Replace calls to vrf_master_ifindex_rcu and vrf_master_ifindex with either
l3mdev_master_ifindex_rcu or l3mdev_master_ifindex.

The pattern:
oif = vrf_master_ifindex(dev) ? : dev->ifindex;
is replaced with
oif = l3mdev_fib_oif(dev);

And remove the now unused vrf macros.
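
For illustration only: a userspace sketch of the replaced pattern. The
oif_sketch_* names and the master_ifindex field are hypothetical; the
GNU "a ? : b" shorthand used in the kernel is spelled out here in
standard C.

```c
/* Hypothetical stand-in for a netdevice; master_ifindex is 0 when the
 * device is not enslaved to an L3 master.
 */
struct oif_sketch_dev {
	int ifindex;
	int master_ifindex;
};

/* Models l3mdev_master_ifindex(): 0 means "no L3 master". */
static int oif_sketch_master_ifindex(const struct oif_sketch_dev *dev)
{
	return dev->master_ifindex;
}

/* Models l3mdev_fib_oif(): prefer the master's index for FIB lookups and
 * fall back to the device's own index. dev must not be NULL.
 */
static int oif_sketch_fib_oif(const struct oif_sketch_dev *dev)
{
	int master = oif_sketch_master_ifindex(dev);

	return master ? master : dev->ifindex;
}
```

Folding the fallback into one helper keeps call sites from repeating the
conditional.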

Signed-off-by: David Ahern 
---
 include/net/vrf.h   | 41 -
 net/ipv4/fib_frontend.c |  5 +++--
 net/ipv4/icmp.c |  8 
 net/ipv4/ip_fragment.c  |  6 +++---
 net/ipv4/route.c|  7 ---
 net/ipv4/xfrm4_policy.c |  8 +++-
 net/ipv6/xfrm6_policy.c |  8 +++-
 7 files changed, 20 insertions(+), 63 deletions(-)

diff --git a/include/net/vrf.h b/include/net/vrf.h
index 34bb3f69def2..874a6c9e4217 100644
--- a/include/net/vrf.h
+++ b/include/net/vrf.h
@@ -34,37 +34,6 @@ struct net_vrf {
 
 
 #if IS_ENABLED(CONFIG_NET_VRF)
-/* called with rcu_read_lock() */
-static inline int vrf_master_ifindex_rcu(const struct net_device *dev)
-{
-   struct net_vrf_dev *vrf_ptr;
-   int ifindex = 0;
-
-   if (!dev)
-   return 0;
-
-   if (netif_is_l3_master(dev)) {
-   ifindex = dev->ifindex;
-   } else {
-   vrf_ptr = rcu_dereference(dev->vrf_ptr);
-   if (vrf_ptr)
-   ifindex = vrf_ptr->ifindex;
-   }
-
-   return ifindex;
-}
-
-static inline int vrf_master_ifindex(const struct net_device *dev)
-{
-   int ifindex;
-
-   rcu_read_lock();
-   ifindex = vrf_master_ifindex_rcu(dev);
-   rcu_read_unlock();
-
-   return ifindex;
-}
-
 /* called with rcu_read_lock */
 static inline u32 vrf_dev_table_rcu(const struct net_device *dev)
 {
@@ -139,16 +108,6 @@ static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev)
 }
 
 #else
-static inline int vrf_master_ifindex_rcu(const struct net_device *dev)
-{
-   return 0;
-}
-
-static inline int vrf_master_ifindex(const struct net_device *dev)
-{
-   return 0;
-}
-
 static inline u32 vrf_dev_table_rcu(const struct net_device *dev)
 {
return 0;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 6fcbd215cdbc..b901b344f22d 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #ifndef CONFIG_IP_MULTIPLE_TABLES
@@ -332,7 +333,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
bool dev_match;
 
fl4.flowi4_oif = 0;
-   fl4.flowi4_iif = vrf_master_ifindex_rcu(dev);
+   fl4.flowi4_iif = l3mdev_master_ifindex_rcu(dev);
if (!fl4.flowi4_iif)
fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
fl4.daddr = src;
@@ -366,7 +367,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
if (nh->nh_dev == dev) {
dev_match = true;
break;
-   } else if (vrf_master_ifindex_rcu(nh->nh_dev) == dev->ifindex) {
+   } else if (l3mdev_master_ifindex_rcu(nh->nh_dev) == dev->ifindex) {
dev_match = true;
break;
}
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index e5eb8ac4089d..6b96dee2800b 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -96,7 +96,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 /*
  * Build xmit assembly blocks
@@ -309,7 +309,7 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
 
rc = false;
if (icmp_global_allow()) {
-   int vif = vrf_master_ifindex(dst->dev);
+   int vif = l3mdev_master_ifindex(dst->dev);
struct inet_peer *peer;
 
peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, vif, 1);
@@ -427,7 +427,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
fl4.flowi4_mark = mark;
fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
fl4.flowi4_proto = IPPROTO_ICMP;
-   fl4.flowi4_oif = vrf_master_ifindex(skb->dev);
+   fl4.flowi4_oif = l3mdev_master_ifindex(skb->dev);
security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
rt = ip_route_output_key(net, &fl4);
if (IS_ERR(rt))
@@ -461,7 +461,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
fl4->flowi4_proto = IPPROTO_ICMP;
fl4->fl4_icmp_type = type;
fl4->fl4_icmp_code = code;
-   fl4->flowi4_oif = vrf_master_ifindex(skb_in->dev);
+   fl4->flowi4_oif = l3mdev_master_ifindex(skb_in->dev);
 
security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
rt = __ip_route_output_key(net, fl4);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fa7f15305f9a..9772b789adf3 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -48,7 +48,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 /* NOTE. Logic of IP defragmentation is para

Re: [PATCH net-next 2/2] openvswitch: netlink attributes for IPv6 tunneling

2015-09-29 Thread Jesse Gross

On Tue, Sep 29, 2015 at 10:52 AM, Jiri Benc  wrote:
> When compat code for tunnel configuration is used, IPv6 tun_info will be
> rejected by ovs_tunnel_get_egress_info. As the consequence, only the new way
> of tunnel config supports IPv6.

This appears to me to be a bug in the existing code.
ovs_tunnel_get_egress_info() as a general mechanism is still in use
and should work with both the old and new configuration methods.
However, I agree that it doesn't look like it will work currently with
tunnel devices. I think we need to fix this rather than making it more
broken.

[PATCH] net: macb: fix two typos

2015-09-29 Thread Geliang Tang

Just fix two typos in code comments.

Signed-off-by: Geliang Tang 
---
 drivers/net/ethernet/cadence/macb.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 6e1faea..866b128 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -267,9 +267,9 @@
 #define MACB_BEX_SIZE  1
 #define MACB_RM9200_BNQ_OFFSET 4 /* AT91RM9200 only */
 #define MACB_RM9200_BNQ_SIZE   1 /* AT91RM9200 only */
-#define MACB_COMP_OFFSET   5 /* Trnasmit complete */
+#define MACB_COMP_OFFSET   5 /* Transmit complete */
 #define MACB_COMP_SIZE 1
-#define MACB_UND_OFFSET6 /* Trnasmit under run */
+#define MACB_UND_OFFSET6 /* Transmit under run */
 #define MACB_UND_SIZE  1
 
 /* Bitfields in RSR */
-- 
1.9.1



[PATCH] net: Initialize flow flags in input path

2015-09-29 Thread David Ahern

The fib_table_lookup tracepoint revealed two places where flowi4_flags is
not initialized.

Signed-off-by: David Ahern 
---
 net/ipv4/fib_frontend.c | 1 +
 net/ipv4/route.c| 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 6fcbd215cdbc..690bcbc59f26 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -340,6 +340,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
fl4.flowi4_tos = tos;
fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
fl4.flowi4_tun_key.tun_id = 0;
+   fl4.flowi4_flags = 0;
 
no_addr = idev->ifa_list == NULL;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8c84a6664b30..13ac8d012aa7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1743,6 +1743,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
fl4.flowi4_mark = skb->mark;
fl4.flowi4_tos = tos;
fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
+   fl4.flowi4_flags = 0;
fl4.daddr = daddr;
fl4.saddr = saddr;
err = fib_lookup(net, &fl4, &res, 0);
-- 
1.9.1


Re: [ovs-dev] [PATCH net-next 1/2] openvswitch: add tunnel protocol to sw_flow_key

2015-09-29 Thread Jesse Gross

On Tue, Sep 29, 2015 at 10:52 AM, Jiri Benc  wrote:
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 5c030a4d7338..03ba070c3256 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -643,6 +643,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
> }
>
> SW_FLOW_KEY_PUT(match, tun_key.tun_flags, tun_flags, is_mask);
> +   SW_FLOW_KEY_PUT(match, tun_proto, AF_INET, is_mask);

I don't think this is right in the case of the mask. It will cause the
mask to be the value AF_INET - instead you want to set the mask to
be 0xff.
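
To illustrate the point (a userspace sketch, not the openvswitch code):
the key_sketch_* names are hypothetical and tun_proto is assumed to be a
single byte. Writing the value AF_INET (2) into the mask leaves only one
bit significant, so a different protocol value that shares that bit would
still match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flow key and its mask. */
struct key_sketch {
	uint8_t tun_proto;
};

struct key_sketch_match {
	struct key_sketch key;
	struct key_sketch mask;
};

/* Store the protocol in the key, or an all-ones mask when filling the
 * mask, so that every bit of tun_proto is significant.
 */
static void key_sketch_put_tun_proto(struct key_sketch_match *m,
				     uint8_t proto, bool is_mask)
{
	if (is_mask)
		m->mask.tun_proto = 0xff;
	else
		m->key.tun_proto = proto;
}

/* Masked comparison of a packet's protocol against the stored key. */
static bool key_sketch_matches(const struct key_sketch_match *m,
			       uint8_t pkt_proto)
{
	return (pkt_proto & m->mask.tun_proto) ==
	       (m->key.tun_proto & m->mask.tun_proto);
}
```

With the mask set to 2 instead of 0xff, a packet carrying 10 (the Linux
value of AF_INET6) passes the comparison, since 10 & 2 == 2.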

[PATCH net] net: add pfmemalloc check in sk_add_backlog()

2015-09-29 Thread Eric Dumazet

From: Eric Dumazet 

Greg reported crashes hitting the following check in __sk_backlog_rcv()

BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); 

The pfmemalloc bit is currently checked in sk_filter().

This works correctly for TCP, because sk_filter() is run in
tcp_v[46]_rcv() before hitting the prequeue or backlog checks.

For UDP or other protocols, this does not work, because sk_filter()
is run from sock_queue_rcv_skb(), which might be called _after_ backlog
queuing if the socket is owned by the user by the time the packet is
processed by the softirq handler.
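
For illustration only: a userspace sketch of the order of checks that the
fix establishes at backlog-queuing time. The bl_sketch_* names, fields
and the simplified queue-full test are assumptions made for this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for sk_buff and sock. */
struct bl_sketch_skb {
	bool pfmemalloc;		/* models skb_pfmemalloc(skb) */
	unsigned int truesize;
};

struct bl_sketch_sock {
	bool memalloc;			/* models sock_flag(sk, SOCK_MEMALLOC) */
	unsigned int backlog_len;
	unsigned int limit;
};

/* Queue skb on the backlog. Reject it with -ENOBUFS when the queue is over
 * its limit, and with -ENOMEM when it came from the pfmemalloc reserves
 * but the socket is not a SOCK_MEMALLOC socket.
 */
static int bl_sketch_add_backlog(struct bl_sketch_sock *sk,
				 const struct bl_sketch_skb *skb)
{
	if (sk->backlog_len > sk->limit)
		return -ENOBUFS;

	if (skb->pfmemalloc && !sk->memalloc)
		return -ENOMEM;

	sk->backlog_len += skb->truesize;
	return 0;
}
```

Rejecting the skb before it is queued means the backlog receive path never
sees a pfmemalloc skb on a socket without SOCK_MEMALLOC, which is the
condition the reported BUG_ON checks.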

Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
Signed-off-by: Eric Dumazet 
Reported-by: Greg Thelen 
---
 include/net/sock.h |8 
 1 file changed, 8 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7aa78440559a..e23717013a4e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -828,6 +828,14 @@ static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *s
if (sk_rcvqueues_full(sk, limit))
return -ENOBUFS;
 
+   /*
+* If the skb was allocated from pfmemalloc reserves, only
+* allow SOCK_MEMALLOC sockets to use it as this socket is
+* helping free memory
+*/
+   if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
+   return -ENOMEM;
+
__sk_add_backlog(sk, skb);
sk->sk_backlog.len += skb->truesize;
return 0;



Re: [PATCH net-next] netfilter: remove dead code

2015-09-29 Thread Florian Westphal

Flavio Leitner  wrote:
> Remove __nf_conntrack_find() from headers.
> Fixes: dcd93ed4cd1 ("netfilter: nf_conntrack: remove dead code")

For the record: netfilter patches should go to
netfilter-de...@vger.kernel.org .

That being said, in this case I doubt Pablo minds if David takes this
directly, patch is obviously correct[tm] :)

[PATCH net-next] netfilter: remove dead code

2015-09-29 Thread Flavio Leitner

Remove __nf_conntrack_find() from headers.
Fixes: dcd93ed4cd1 ("netfilter: nf_conntrack: remove dead code")

Signed-off-by: Flavio Leitner 
---
 include/net/netfilter/nf_conntrack.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index d642f68..fde4068 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -183,10 +183,6 @@ void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls);
 
 void nf_ct_free_hashtable(void *hash, unsigned int size);
 
-struct nf_conntrack_tuple_hash *
-__nf_conntrack_find(struct net *net, u16 zone,
-   const struct nf_conntrack_tuple *tuple);
-
 int nf_conntrack_hash_check_insert(struct nf_conn *ct);
 bool nf_ct_delete(struct nf_conn *ct, u32 pid, int report);
 
-- 
2.4.3


[PATCH v2] Documentation: improve line discipline method descriptions

2015-09-29 Thread Tilman Schmidt

Mention that the ldisc open method must set tty->receive_room, and
that many methods are optional. Add description of receive_buf2 method.

Signed-off-by: Tilman Schmidt 
---
 Documentation/serial/tty.txt | 60 
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/Documentation/serial/tty.txt b/Documentation/serial/tty.txt
index 973c8ad..bc3842d 100644
--- a/Documentation/serial/tty.txt
+++ b/Documentation/serial/tty.txt
@@ -39,8 +39,13 @@ TTY side interfaces:
 open() -   Called when the line discipline is attached to
the terminal. No other call into the line
discipline for this tty will occur until it
-   completes successfully. Returning an error will
-   prevent the ldisc from being attached. Can sleep.
+   completes successfully. Should initialize any
+   state needed by the ldisc, and set receive_room
+   in the tty_struct to the maximum amount of data
+   the line discipline is willing to accept from the
+   driver with a single call to receive_buf().
+   Returning an error will prevent the ldisc from
+   being attached. Can sleep.
 
 close()-   This is called on a terminal when the line
discipline is being unplugged. At the point of
@@ -52,9 +57,16 @@ hangup() -   Called when the tty line is hung up.
No further calls into the ldisc code will occur.
The return value is ignored. Can sleep.
 
-write()-   A process is writing data through the line
-   discipline.  Multiple write calls are serialized
-   by the tty layer for the ldisc.  May sleep. 
+read() -   (optional) A process requests reading data from
+   the line. Multiple read calls may occur in parallel
+   and the ldisc must deal with serialization issues.
+   If not defined, the process will receive an EIO
+   error. May sleep.
+
+write()-   (optional) A process requests writing data to the
+   line. Multiple write calls are serialized by the
+   tty layer for the ldisc. If not defined, the
+   process will receive an EIO error. May sleep.
 
 flush_buffer() -   (optional) May be called at any point between
open and close, and instructs the line discipline
@@ -69,27 +81,33 @@ set_termios()   -   (optional) Called on termios structure changes.
termios semaphore so allowed to sleep. Serialized
against itself only.
 
-read() -   Move data from the line discipline to the user.
-   Multiple read calls may occur in parallel and the
-   ldisc must deal with serialization issues. May 
-   sleep.
-
-poll() -   Check the status for the poll/select calls. Multiple
-   poll calls may occur in parallel. May sleep.
+poll() -   (optional) Check the status for the poll/select
+   calls. Multiple poll calls may occur in parallel.
+   May sleep.
 
-ioctl()-   Called when an ioctl is handed to the tty layer
-   that might be for the ldisc. Multiple ioctl calls
-   may occur in parallel. May sleep. 
+ioctl()-   (optional) Called when an ioctl is handed to the
+   tty layer that might be for the ldisc. Multiple
+   ioctl calls may occur in parallel. May sleep.
 
-compat_ioctl() -   Called when a 32 bit ioctl is handed to the tty layer
-   that might be for the ldisc. Multiple ioctl calls
-   may occur in parallel. May sleep.
+compat_ioctl() -   (optional) Called when a 32 bit ioctl is handed
+   to the tty layer that might be for the ldisc.
+   Multiple ioctl calls may occur in parallel.
+   May sleep.
 
 Driver Side Interfaces:
 
-receive_buf()  -   Hand buffers of bytes from the driver to the ldisc
-   for processing. Semantics currently rather
-   mysterious 8(
+receive_buf()  -   (optional) Called by the low-level driver to hand
+   a buffer of received bytes to the ldisc for
+   processing. The number of bytes is guaranteed not
+   to exceed the current value of tty->receive_room.
+   All bytes must be processed.
+
+receive_buf2() -   (optional) Called by

Re: [PATCH net-next 00/14] tcp: listener refactoring preparations

2015-09-29 Thread David Miller

From: Eric Dumazet 
Date: Tue, 29 Sep 2015 07:42:38 -0700

> This patch series makes changes to TCP/DCCP stacks so that
> we can switch listener code to lockless mode.
> 
> This is done by marking const the listener socket in all
> appropriate paths.
> 
> FastOpen code had to be changed to not dynamically allocate
> a very small structure to make code simpler for following changes.

Series applied, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] 802.1AD: Flow handling, actions, vlan parsing and netlink attributes

2015-09-29 Thread Thomas F Herbert


On 9/29/15 6:56 PM, Pravin Shelar wrote:

> On Fri, Sep 25, 2015 at 3:35 PM, Thomas F Herbert
>  wrote:
>> Pravin,
>>
>> Another comment and question. Please see inline below.
>>
>> Thanks,
>>
>> --Tom
>>
>> On 9/24/15 7:42 PM, Pravin Shelar wrote:
>>>
>>> On Thu, Sep 24, 2015 at 10:58 AM, Thomas F Herbert
>>>  wrote:
>>>>
>>>> Add support for 802.1ad including the ability to push and pop double
>>>> tagged vlans. Add support for 802.1ad to netlink parsing and flow
>>>> conversion. Uses double nested encap attributes to represent double
>>>> tagged vlan. Inner TPID encoded along with ctci in nested attributes.
>>>>
>>>> Signed-off-by: Thomas F Herbert 
>>>> ---
>>>>   net/openvswitch/flow.c |  83 +
>>>>   net/openvswitch/flow.h |   5 ++
>>>>   net/openvswitch/flow_netlink.c | 166 ++---
>>>>   3 files changed, 230 insertions(+), 24 deletions(-)
>>>>
> ...
>
>>>> @@ -1320,6 +1437,7 @@ static int __ovs_nla_put_key(const struct
>>>> sw_flow_key *swkey,
>>>>   {
>>>>  struct ovs_key_ethernet *eth_key;
>>>>  struct nlattr *nla, *encap;
>>>> +   struct nlattr *in_encap = NULL;
>>>>
>>>>  if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
>>>>  goto nla_put_failure;
>>>> @@ -1368,17 +1486,42 @@ static int __ovs_nla_put_key(const struct
>>>> sw_flow_key *swkey,
>>>>  ether_addr_copy(eth_key->eth_src, output->eth.src);
>>>>  ether_addr_copy(eth_key->eth_dst, output->eth.dst);
>>>>
>>>> -   if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
>>>> +   if (swkey->eth.tci || eth_type_vlan(swkey->eth.type)) {
>>>>  __be16 eth_type;
>>>> -   eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0x);
>>>> +
>>>> +   if (swkey->eth.cvlan.ctci ||
>>>> +   eth_type_vlan(swkey->eth.cvlan.c_tpid))
>>>> +   eth_type = !is_mask ? htons(ETH_P_8021AD) :
>>>> + htons(0x);
>>>> +   else
>>>> +   eth_type = !is_mask ? htons(ETH_P_8021Q) :
>>>> + htons(0x);
>>>> +
>>>
>>> Here we can directly dump output->eth.type to netlink. No need to
>>> check for inner encap.
>>
>> The eth.type is set to the inner encapsulated protocol not to the tpid. We
>> don't "know" what the outer tpid so I assume it is 802.1Q. To address this
>> situation, do you think I should add the outer tpid to sw_flow_key?
>> Also see comment above in flow.h.
>>
>
> With the addition of nested vlan, we need to add outer tpid. This will
> simplify vlan netlink serialization too.

Yes, thanks. I agree that this is the sensible approach.


Re: [PATCH net-next v2 00/11] net: L3 master device

2015-09-29 Thread David Ahern

On 9/29/15 5:23 PM, David Miller wrote:

> From: David Ahern 
> Date: Mon, 28 Sep 2015 10:16:50 -0700
>
>> v2
>> - rebased to top of net-next
>>
>> - addressed Nik's comments (checking master, removing extra lines, and
>>   flipping the order of patches 1 and 2)
>
> This still needs some work:
>
> ERROR: "l3mdev_master_ifindex_rcu" [net/ipv6/ipv6.ko] undefined!
> scripts/Makefile.modpost:90: recipe for target '__modpost' failed
> make[1]: *** [__modpost] Error 1
> Makefile:1095: recipe for target 'modules' failed
> make: *** [modules] Error 2

Ugh. All of my builds have CONFIG_IPV6=y. Will kick out a v3 later.

Re: [PATCH net] skbuff: Fix skb checksum partial check.

2015-09-29 Thread David Miller

From: Pravin B Shelar 
Date: Mon, 28 Sep 2015 17:24:25 -0700

> Earlier patch 6ae459bda tried to detect void checksum partial
> skb by comparing pull length to checksum offset. But it does
> not work for all cases since checksum-offset depends on
> updates to skb->data.
> 
> Following patch fixes it by validating checksum start offset
> after skb-data pointer is updated. Negative value of checksum
> offset start means there is no need to checksum.
> 
> Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
> Reported-by: Andrew Vagin 
> Signed-off-by: Pravin B Shelar 
> ---
> This patch and 6ae459bda need to be backported to stable.

Applied and both queued up for -stable, thanks.

[PATCH net-next v2 0/3] BPF updates

2015-09-29 Thread Daniel Borkmann

Some minor updates to {cls,act}_bpf to retrieve routing realms
and to make skb->priority writable.

Thanks!

v1 -> v2:
 - Dropped preclassify patch for now from the series as the
   rest is pretty much independent of it
 - Rest unchanged, only rebased and already posted Acked-by's kept

Daniel Borkmann (3):
  ebpf: migrate bpf_prog's flags to bitfield
  sched, bpf: add helper for retrieving routing realms
  sched, bpf: make skb->priority writable

 arch/arm/net/bpf_jit_32.c   |  2 +-
 arch/arm64/net/bpf_jit_comp.c   |  2 +-
 arch/mips/net/bpf_jit.c |  2 +-
 arch/powerpc/net/bpf_jit_comp.c |  2 +-
 arch/s390/net/bpf_jit_comp.c|  2 +-
 arch/sparc/net/bpf_jit_comp.c   |  2 +-
 arch/x86/net/bpf_jit_comp.c |  2 +-
 include/linux/filter.h  |  7 +--
 include/uapi/linux/bpf.h|  7 +++
 kernel/bpf/core.c   |  4 
 kernel/bpf/syscall.c|  6 --
 net/core/filter.c   | 33 ++---
 net/sched/cls_bpf.c |  8 ++--
 13 files changed, 63 insertions(+), 16 deletions(-)

-- 
1.9.3


[PATCH net-next v2 3/3] sched, bpf: make skb->priority writable

2015-09-29 Thread Daniel Borkmann

{cls,act}_bpf can now set the skb->priority from an eBPF program based
on various criteria, so that for example classful qdiscs like multiq can
update the skb's priority during enqueue time and further push it down
into subsequent qdiscs.
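
As a rough illustration of what such a program body might look like, the
following plain C sketch sets a priority from the packet mark. It is not
an eBPF program and does not use the kernel headers: struct demo_sk_buff
is a hypothetical stand-in for the two struct __sk_buff fields involved,
and the priority values and the mark bit are made up for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct __sk_buff fields used below. */
struct demo_sk_buff {
    uint32_t mark;
    uint32_t priority;
};

/* Illustrative priority values; not taken from the patch. */
#define DEMO_PRIO_BULK        0u
#define DEMO_PRIO_INTERACTIVE 6u

/*
 * Body of a would-be classifier: derive the priority from bit 0 of the
 * mark and write it back, which is the store this patch permits.
 */
void demo_cls_set_priority(struct demo_sk_buff *skb)
{
    if (skb->mark & 0x1)
        skb->priority = DEMO_PRIO_INTERACTIVE;
    else
        skb->priority = DEMO_PRIO_BULK;
}
```

In a real {cls,act}_bpf program the same assignment would be made on the
struct __sk_buff context pointer handed to the program.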

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 45c69ce..53a5036 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1721,6 +1721,7 @@ static bool tc_cls_act_is_valid_access(int off, int size,
switch (off) {
case offsetof(struct __sk_buff, mark):
case offsetof(struct __sk_buff, tc_index):
+   case offsetof(struct __sk_buff, priority):
case offsetof(struct __sk_buff, cb[0]) ...
offsetof(struct __sk_buff, cb[4]):
break;
@@ -1762,8 +1763,12 @@ static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
case offsetof(struct __sk_buff, priority):
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, priority) != 4);
 
-   *insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg,
- offsetof(struct sk_buff, priority));
+   if (type == BPF_WRITE)
+   *insn++ = BPF_STX_MEM(BPF_W, dst_reg, src_reg,
+ offsetof(struct sk_buff, priority));
+   else
+   *insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg,
+ offsetof(struct sk_buff, priority));
break;
 
case offsetof(struct __sk_buff, ingress_ifindex):
-- 
1.9.3


[PATCH net-next v2 1/3] ebpf: migrate bpf_prog's flags to bitfield

2015-09-29 Thread Daniel Borkmann

As we need to add further flags to the bpf_prog structure, let's migrate
both bools to a bitfield representation. The size of the base structure
(excluding insns) remains unchanged at 40 bytes.

Also add tags for the kmemchecker, so that it doesn't throw false
positives. Even in case gcc would generate suboptimal code, it's not
being accessed in performance critical paths.
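
The size argument can be checked with a small user-space sketch. The two
structs below are hypothetical, cut-down models with only three members
each (they are not the real struct bpf_prog), and the result depends on
the ABI; on common ABIs where bool is one byte, both layouts come out the
same size because the two bools and the two-bit bitfield both fit in the
16 bits following pages.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down model of the layout before the patch: two separate bools. */
struct demo_prog_bools {
    uint16_t pages;
    bool     jited;
    bool     gpl_compatible;
    uint32_t len;
};

/* Cut-down model of the layout after the patch: flags share one u16. */
struct demo_prog_bits {
    uint16_t pages;
    uint16_t jited:1,
             gpl_compatible:1;
    uint32_t len;
};

size_t demo_size_bools(void) { return sizeof(struct demo_prog_bools); }
size_t demo_size_bits(void)  { return sizeof(struct demo_prog_bits); }
```

The bitfield variant leaves 14 spare bits for further flags without
growing the structure, which is the motivation given above.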

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 arch/arm/net/bpf_jit_32.c   | 2 +-
 arch/arm64/net/bpf_jit_comp.c   | 2 +-
 arch/mips/net/bpf_jit.c | 2 +-
 arch/powerpc/net/bpf_jit_comp.c | 2 +-
 arch/s390/net/bpf_jit_comp.c| 2 +-
 arch/sparc/net/bpf_jit_comp.c   | 2 +-
 arch/x86/net/bpf_jit_comp.c | 2 +-
 include/linux/filter.h  | 6 --
 kernel/bpf/core.c   | 4 
 kernel/bpf/syscall.c| 4 ++--
 net/core/filter.c   | 2 +-
 11 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 876060b..0df5fd5 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1047,7 +1047,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
 
set_memory_ro((unsigned long)header, header->pages);
fp->bpf_func = (void *)ctx.target;
-   fp->jited = true;
+   fp->jited = 1;
 out:
kfree(ctx.offsets);
return;
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index c047598..a44e529 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -744,7 +744,7 @@ void bpf_int_jit_compile(struct bpf_prog *prog)
 
set_memory_ro((unsigned long)header, header->pages);
prog->bpf_func = (void *)ctx.image;
-   prog->jited = true;
+   prog->jited = 1;
 out:
kfree(ctx.offset);
 }
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 0c4a133..77cb273 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -1251,7 +1251,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
 
fp->bpf_func = (void *)ctx.target;
-   fp->jited = true;
+   fp->jited = 1;
 
 out:
kfree(ctx.offsets);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 17cea18..0478216 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -679,7 +679,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
((u64 *)image)[1] = local_paca->kernel_toc;
 #endif
fp->bpf_func = (void *)image;
-   fp->jited = true;
+   fp->jited = 1;
}
 out:
kfree(addrs);
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index eeda051..9a0c4c2 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1310,7 +1310,7 @@ void bpf_int_jit_compile(struct bpf_prog *fp)
if (jit.prg_buf) {
set_memory_ro((unsigned long)header, header->pages);
fp->bpf_func = (void *) jit.prg_buf;
-   fp->jited = true;
+   fp->jited = 1;
}
 free_addrs:
kfree(jit.addrs);
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index f8b9f71..22564f5 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -812,7 +812,7 @@ cond_branch:f_offset = addrs[i + filter[i].jf];
if (image) {
bpf_flush_icache(image, image + proglen);
fp->bpf_func = (void *)image;
-   fp->jited = true;
+   fp->jited = 1;
}
 out:
kfree(addrs);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 70efcd0..7599197 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1109,7 +1109,7 @@ void bpf_int_jit_compile(struct bpf_prog *prog)
bpf_flush_icache(header, image + proglen);
set_memory_ro((unsigned long)header, header->pages);
prog->bpf_func = (void *)image;
-   prog->jited = true;
+   prog->jited = 1;
}
 out:
kfree(addrs);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fa2cab9..bad618f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -326,8 +326,10 @@ struct bpf_binary_header {
 
 struct bpf_prog {
u16 pages;  /* Number of allocated pages */
-   bool jited;  /* Is our filter JIT'ed? */
-   bool gpl_compatible; /* Is our filter GPL compatible? */
+   kmemcheck_bitfield_begin(meta);
+   u16 jited:1,/* Is our filter JIT'ed? */
+   gpl_compatible:1; /* Is filter GPL compatible? */
+   kmemcheck_bitfield_end(meta);
u32 len;/* Number of filter blocks */
e

[PATCH net-next v2 2/3] sched, bpf: add helper for retrieving routing realms

2015-09-29 Thread Daniel Borkmann

Using routing realms as part of the classifier is quite useful; it
can be viewed as a tag for one or multiple routing entries (think of
an analogy to net_cls cgroup for processes), set by user space routing
daemons or via iproute2 as an indicator for traffic classifiers and
later on processed in the eBPF program.

Unlike actions, the classifier can inspect device flags and enable
netif_keep_dst() if necessary. tc actions don't have that possibility,
but in case people know what they are doing, it can be used from there
as well (e.g. via devs that must keep dsts by design anyway).

If a realm is set, the handler returns the non-zero realm. User space
can set the full 32bit realm for the dst.
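
The return convention can be modelled in user space as follows. This is
only a sketch: struct demo_dst and struct demo_skb are hypothetical
stand-ins for the kernel's dst_entry and sk_buff, and demo_classify() is
an invented caller showing how a zero result ("no realm") might be mapped
to a fallback class.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for dst_entry and sk_buff. */
struct demo_dst { uint32_t tclassid; };
struct demo_skb { const struct demo_dst *dst; };

/* Same shape as the helper in the patch: 0 means "no realm set". */
uint32_t demo_get_route_realm(const struct demo_skb *skb)
{
    return skb->dst ? skb->dst->tclassid : 0;
}

/* Invented caller: use the realm as class, else fall back to a default. */
uint32_t demo_classify(const struct demo_skb *skb, uint32_t default_class)
{
    uint32_t realm = demo_get_route_realm(skb);

    return realm ? realm : default_class;
}
```

Because 0 doubles as "no dst" and "no realm", a caller cannot tell those
two cases apart; the sketch treats both as "use the default".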

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/linux/filter.h   |  3 ++-
 include/uapi/linux/bpf.h |  7 +++
 kernel/bpf/syscall.c |  2 ++
 net/core/filter.c| 22 ++
 net/sched/cls_bpf.c  |  8 ++--
 5 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index bad618f..3d5fd24 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -328,7 +328,8 @@ struct bpf_prog {
u16 pages;  /* Number of allocated pages */
kmemcheck_bitfield_begin(meta);
u16 jited:1,/* Is our filter JIT'ed? */
-   gpl_compatible:1; /* Is filter GPL compatible? */
+   gpl_compatible:1, /* Is filter GPL compatible? */
+   dst_needed:1;   /* Do we need dst entry? */
kmemcheck_bitfield_end(meta);
u32 len;/* Number of filter blocks */
enum bpf_prog_type  type;   /* Type of BPF program */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4ec0b54..564f1f0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -280,6 +280,13 @@ enum bpf_func_id {
 * Return: TC_ACT_REDIRECT
 */
BPF_FUNC_redirect,
+
+   /**
+* bpf_get_route_realm(skb) - retrieve a dst's tclassid
+* @skb: pointer to skb
+* Return: realm if != 0
+*/
+   BPF_FUNC_get_route_realm,
__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2190ab1..5f35f42 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -402,6 +402,8 @@ static void fixup_bpf_calls(struct bpf_prog *prog)
 */
BUG_ON(!prog->aux->ops->get_func_proto);
 
+   if (insn->imm == BPF_FUNC_get_route_realm)
+   prog->dst_needed = 1;
if (insn->imm == BPF_FUNC_tail_call) {
/* mark bpf_tail_call as different opcode
 * to avoid conditional branch in
diff --git a/net/core/filter.c b/net/core/filter.c
index 04664ac..45c69ce 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * sk_filter - run a packet through a socket filter
@@ -1478,6 +1479,25 @@ static const struct bpf_func_proto bpf_get_cgroup_classid_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+static u64 bpf_get_route_realm(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+#ifdef CONFIG_IP_ROUTE_CLASSID
+   const struct dst_entry *dst;
+
+   dst = skb_dst((struct sk_buff *) (unsigned long) r1);
+   if (dst)
+   return dst->tclassid;
+#endif
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_get_route_realm_proto = {
+   .func   = bpf_get_route_realm,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+};
+
 static u64 bpf_skb_vlan_push(u64 r1, u64 r2, u64 vlan_tci, u64 r4, u64 r5)
 {
struct sk_buff *skb = (struct sk_buff *) (long) r1;
@@ -1648,6 +1668,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return bpf_get_skb_set_tunnel_key_proto();
case BPF_FUNC_redirect:
return &bpf_redirect_proto;
+   case BPF_FUNC_get_route_realm:
+   return &bpf_get_route_realm_proto;
default:
return sk_filter_func_proto(func_id);
}
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 7eeffaf6..5faaa54 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -262,7 +262,8 @@ static int cls_bpf_prog_from_ops(struct nlattr **tb, struct cls_bpf_prog *prog)
return 0;
 }
 
-static int cls_bpf_prog_from_efd(struct nlattr **tb, struct cls_bpf_prog *prog)
+static int cls_bpf_prog_from_efd(struct nlattr **tb, struct cls_bpf_prog *prog,
+const struct tcf_proto *tp)
 {
struct bpf_prog *fp;
char *name = NULL;
@@ -294,6 +295,9 @@ static int cls_bpf_prog_from_efd(struc

Re: Poor IPv6 TCP performance in 4.3-rc3

2015-09-29 Thread Fabio Estevam

Hi Russell,

On Tue, Sep 29, 2015 at 8:32 PM, Russell King - ARM Linux
 wrote:
> Hi,
>
> I'm seeing really poor IPv6 performance compared to IPv4.  I've
> checked using two different ARM platforms - an iMX6 platform using
> the FEC driver, and an Armada 38x using mvneta.

Does this patch help?
https://patchwork.ozlabs.org/patch/523632/

It was suggested in the following thread:
https://lkml.org/lkml/2015/9/29/258

and it seems to have fixed the performance issue.

Regards,

Fabio Estevam

Poor IPv6 TCP performance in 4.3-rc3

2015-09-29 Thread Russell King - ARM Linux

Hi,

I'm seeing really poor IPv6 performance compared to IPv4.  I've
checked using two different ARM platforms - an iMX6 platform using
the FEC driver, and an Armada 38x using mvneta.

The following was captured using iperf between the target system
and my laptop.  The problem only occurs one-way.  The 4.3-rc3
platform is running iperf in server mode, the laptop is in client
mode.

Armada 38x:
ipv6: [  4]  0.0-23.9 sec   170 KBytes  58.3 Kbits/sec
ipv4: [  4]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec

iMX6Q:
ipv6: [  4]  0.0-11.1 sec   640 KBytes   474 Kbits/sec
ipv4: [  4]  0.0-10.0 sec   655 MBytes   549 Mbits/sec

iMX6D with 4.2:
ipv6: [  4]  0.0-10.0 sec   685 MBytes   574 Mbits/sec
ipv4: [  4]  0.0-10.0 sec   696 MBytes   583 Mbits/sec

It looks like there's an IPv6 regression between 4.2 and 4.3-rc3.

Turning GRO off on Armada 38x gives:
ipv6: [  4]  0.0-10.0 sec  1.08 GBytes   923 Mbits/sec
ipv4: [  5]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec

I haven't started to debug yet, but I thought I'd post a heads-up in
case it's a known problem.  I'll try to get some packet logs on
Thursday, and I'll try to bisect.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: [net-next PATCH 0/3] Minor IPv4 routing cleanups

2015-09-29 Thread David Miller

From: Alexander Duyck 
Date: Mon, 28 Sep 2015 11:10:25 -0700

> These patches just contain some minor cleanups to address a few minor
> issues.  The first and the third mostly just improve readability.  The
> second patch should improve the performance for multicast destination
> addresses that do not have a localhost source IP address by avoiding some
> unnecessary dereferences.

Series applied, thanks.

Re: [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv

2015-09-29 Thread Florian Westphal

Tom Herbert  wrote:
> Call before performing NF_HOOK and routing in order to perform address
> translation in the receive path.
> 
> Signed-off-by: Tom Herbert 
> ---
>  net/ipv6/ip6_input.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 9075acf..06dac55 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, 
> struct packet_type *pt
>   /* Must drop socket now because of tproxy. */
>   skb_orphan(skb);
>  
> + /* Translate destination address before routing */
> + xfrm6_xlat_addr(skb);
> +

Ugh.  Yet another hook :-(
One would think we have enough by now.

In any case, I still think this ILA translation stuff should either
go into xtables (NPT-ish), nftables, or into tc if nft is unusable for
whatever reason.  Judging by where this hook is placed, nf hooks
would work just fine.

If the iptables traverser has too high a cost (unfortunately, the
xtables design enforces counters and iface name matching even if they
are unwanted or unneeded, for instance), maybe nft would perform better
in that regard.

Re: [PATCH net-next v2 00/11] net: L3 master device

2015-09-29 Thread David Miller

From: David Ahern 
Date: Mon, 28 Sep 2015 10:16:50 -0700

> v2
> - rebased to top of net-next
> 
> - addressed Nik's comments (checking master, removing extra lines, and
>   flipping the order of patches 1 and 2)

This still needs some work:

ERROR: "l3mdev_master_ifindex_rcu" [net/ipv6/ipv6.ko] undefined!
scripts/Makefile.modpost:90: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
Makefile:1095: recipe for target 'modules' failed
make: *** [modules] Error 2

Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr

2015-09-29 Thread kbuild test robot

Hi Tom,

[auto build test results on next-20150929 -- if it's inappropriate base, please ignore]

reproduce:
  # apt-get install sparse
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/ipv6/ila/ila_xlat.c:218:24: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:269:32: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:275:25: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:279:25: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:315:31: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:329:32: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:201:23: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:514:31: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:184:23: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c: In function 'ila_xlat_fini':
   net/ipv6/ila/ila_xlat.c:636:6: warning: unused variable 'i' [-Wunused-variable]
 int i;
 ^

vim +218 net/ipv6/ila/ila_xlat.c

   202  }
   203  
   204  return NULL;
   205  }
   206  
   207  static inline void ila_release(struct ila_map *ila)
   208  {
   209  kfree_rcu(ila, rcu);
   210  }
   211  
   212  static void ila_free_cb(void *ptr, void *arg)
   213  {
   214  struct ila_map *ila = (struct ila_map *)ptr, *next;
   215  
   216  /* Assume rcu_readlock held */
   217  while (ila) {
 > 218  next = rcu_access_pointer(ila->next);
   219  ila_release(ila);
   220  ila = next;
   221  }
   222  }
   223  
   224  static int ila_add_mapping(struct net *net, struct ila_xlat_params *p)
   225  {
   226  struct ila_net *ilan = net_generic(net, ila_net_id);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Re: question about potential integer truncation in mwifiex_set_wapi_ie and mwifiex_set_wps_ie

2015-09-29 Thread James Cameron

On Tue, Sep 29, 2015 at 05:21:28PM +0200, PaX Team wrote:
> hi all,
> 
> in drivers/net/wireless/mwifiex/sta_ioctl.c the following functions
> 
>   mwifiex_set_wpa_ie_helper
>   mwifiex_set_wapi_ie
>   mwifiex_set_wps_ie
> 
> can truncate the incoming ie_len argument from u16 to u8 when it gets
> stored in mwifiex_private.wpa_ie_len, mwifiex_private.wapi_ie_len and
> mwifiex_private.wps_ie_len, respectively. based on some light code
> reading it seems a length value of 256 is valid (IEEE_MAX_IE_SIZE and
> MWIFIEX_MAX_VSIE_LEN seem to limit it) and thus would get truncated
> to 0 when stored in those u8 fields. the question is whether this is
> intentional or a bug somewhere.

i agree, while there is a test to ensure ie_len is not greater than
256, there is a possibility that it will be exactly 256, which means
256 bytes will be given to memcpy but
mwifiex_private.{wpa,wapi,wps}_ie_len will be zero.
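
The truncation can be reproduced in isolation with a small user-space
sketch. The struct and function names here are hypothetical and only
model the length fields; DEMO_MAX_IE_SIZE is assumed to mirror the
256-byte bound discussed above and is not copied from the driver.

```c
#include <stdint.h>

/* Assumed to mirror the 256-byte bound discussed in this thread. */
#define DEMO_MAX_IE_SIZE 256

/* Hypothetical model of one length field at both widths. */
struct demo_ie_len {
    uint8_t  len_u8;   /* width of the existing field */
    uint16_t len_u16;  /* width after the suggested change */
};

/* Returns 0 on success, -1 if ie_len exceeds the bound. */
int demo_store_ie_len(struct demo_ie_len *d, uint16_t ie_len)
{
    if (ie_len > DEMO_MAX_IE_SIZE)
        return -1;

    d->len_u8 = (uint8_t)ie_len;  /* 256 wraps to 0 here */
    d->len_u16 = ie_len;          /* 256 is kept as-is */
    return 0;
}
```

A length of exactly 256 passes the bound check, yet the u8 copy ends up
as 0 while the u16 copy keeps the real value.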

i suggest changing the lengths to u16.  not tested.

diff --git a/drivers/net/wireless/mwifiex/main.h b/drivers/net/wireless/mwifiex/main.h
index fe12560..b66e9a7 100644
--- a/drivers/net/wireless/mwifiex/main.h
+++ b/drivers/net/wireless/mwifiex/main.h
@@ -512,14 +512,14 @@ struct mwifiex_private {
struct mwifiex_wep_key wep_key[NUM_WEP_KEYS];
u16 wep_key_curr_index;
u8 wpa_ie[256];
-   u8 wpa_ie_len;
+   u16 wpa_ie_len;
u8 wpa_is_gtk_set;
struct host_cmd_ds_802_11_key_material aes_key;
struct host_cmd_ds_802_11_key_material_v2 aes_key_v2;
u8 wapi_ie[256];
-   u8 wapi_ie_len;
+   u16 wapi_ie_len;
u8 *wps_ie;
-   u8 wps_ie_len;
+   u16 wps_ie_len;
u8 wmm_required;
u8 wmm_enabled;
u8 wmm_qosinfo;

-- 
James Cameron
http://quozl.linux.org.au/

Re: [PATCH net 3/7] openvswitch: Fix skb leak in ovs_fragment()

2015-09-29 Thread Joe Stringer

On 29 September 2015 at 15:48, Rustad, Mark D  wrote:
>> On Sep 29, 2015, at 3:39 PM, Joe Stringer  wrote:
>>
>> @@ -728,8 +727,14 @@ static void ovs_fragment(struct vport *vport, struct 
>> sk_buff *skb, u16 mru,
>>   WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
>> ovs_vport_name(vport), ntohs(ethertype), mru,
>> vport->dev->mtu);
>> - kfree_skb(skb);
>> + goto out;
>>   }
>> +
>> + skb = NULL;
>> +
>> +out:
>> + if (skb)
>> + kfree_skb(skb);
>> }
>>
>> static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
>
> Wouldn't that hunk be better as:
>
> @@ -728,8 +727,13 @@ static void ovs_fragment(struct vport *vport, struct 
> sk_buff *skb, u16 mru,
> WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, 
> MTU=%d.",
>   ovs_vport_name(vport), ntohs(ethertype), mru,
>   vport->dev->mtu);
> -   kfree_skb(skb);
> +   goto out;
> }
> +
> +   return;
> +
> +out:
> +   kfree_skb(skb);
> }
>
> static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
>
> --
> Mark Rustad, Networking Division, Intel Corporation

Sure thing, I'll roll this change in to a v2 when the rest of the
series is reviewed.
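
The control flow agreed on above (return early once the buffer has been
handed off, and reach the cleanup label only on failure) can be sketched
in user space like this. All names are hypothetical: demo_consume()
stands in for whatever takes ownership of the skb on the success path,
demo_free() stands in for kfree_skb(), and the function returns an int
only so the two paths can be told apart, whereas ovs_fragment() is void.

```c
#include <stdlib.h>

static int demo_frees;  /* how many buffers the error path released */

/* Stand-in for kfree_skb(): release the buffer and count the call. */
void demo_free(void *buf)
{
    demo_frees++;
    free(buf);
}

/* Stand-in for the success path, which takes ownership of the buffer. */
void demo_consume(void *buf)
{
    free(buf);
}

int demo_free_count(void)
{
    return demo_frees;
}

/* Returns 0 if buf was handed off, -1 if it was dropped. */
int demo_fragment(void *buf, int supported)
{
    if (!supported)
        goto out;

    demo_consume(buf);
    return 0;  /* ownership passed on, so skip the cleanup label */

out:
    demo_free(buf);
    return -1;
}
```

Compared with setting the pointer to NULL and testing it at the label,
the early return keeps the cleanup unconditional and makes the ownership
transfer visible at the point where it happens.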

Re: [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function

2015-09-29 Thread David Ahern


Hi Tom:

On 9/29/15 4:17 PM, Tom Herbert wrote:

> This patch adds xfrm6_xlat_addr which is called in the data path
> to perform address translation (primarily for the receive path). Modules
> may register their own callback to perform a translation-- this
> registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del.
> xfrm6_xlat_addr allows translation of addresses for an sk_buff.

Seems like a stretch to lump this into xfrms. You have a separate
genl-based config as opposed to the netlink xfrm API, and you are calling
the xlat_addr function directly in ipv6_rcv as opposed to via some policy
with dst_ops-driven redirection. Why call this an xfrm?


Re: [PATCH 3/3] 802.1AD: Flow handling, actions, vlan parsing and netlink attributes

2015-09-29 Thread Pravin Shelar

On Fri, Sep 25, 2015 at 3:35 PM, Thomas F Herbert
 wrote:
> Pravin,
>
> Another comment and question. Please see inline below.
>
> Thanks,
>
> --Tom
>
> On 9/24/15 7:42 PM, Pravin Shelar wrote:
>>
>> On Thu, Sep 24, 2015 at 10:58 AM, Thomas F Herbert
>>  wrote:
>>>
>>> Add support for 802.1ad including the ability to push and pop double
>>> tagged vlans. Add support for 802.1ad to netlink parsing and flow
>>> conversion. Uses double nested encap attributes to represent double
>>> tagged vlan. Inner TPID encoded along with ctci in nested attributes.
>>>
>>> Signed-off-by: Thomas F Herbert 
>>> ---
>>>   net/openvswitch/flow.c |  83 +
>>>   net/openvswitch/flow.h |   5 ++
>>>   net/openvswitch/flow_netlink.c | 166 ++---
>>>   3 files changed, 230 insertions(+), 24 deletions(-)
>>>
...

>>> @@ -1320,6 +1437,7 @@ static int __ovs_nla_put_key(const struct
>>> sw_flow_key *swkey,
>>>   {
>>>  struct ovs_key_ethernet *eth_key;
>>>  struct nlattr *nla, *encap;
>>> +   struct nlattr *in_encap = NULL;
>>>
>>>  if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
>>>  goto nla_put_failure;
>>> @@ -1368,17 +1486,42 @@ static int __ovs_nla_put_key(const struct
>>> sw_flow_key *swkey,
>>>  ether_addr_copy(eth_key->eth_src, output->eth.src);
>>>  ether_addr_copy(eth_key->eth_dst, output->eth.dst);
>>>
>>> -   if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
>>> +   if (swkey->eth.tci || eth_type_vlan(swkey->eth.type)) {
>>>  __be16 eth_type;
>>> -   eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0x);
>>> +
>>> +   if (swkey->eth.cvlan.ctci ||
>>> +   eth_type_vlan(swkey->eth.cvlan.c_tpid))
>>> +   eth_type = !is_mask ? htons(ETH_P_8021AD) :
>>> + htons(0x);
>>> +   else
>>> +   eth_type = !is_mask ? htons(ETH_P_8021Q) :
>>> + htons(0x);
>>> +
>>
>> Here we can directly dump output->eth.type to netlink. No need to
>> check for inner encap.
>
> The eth.type is set to the inner encapsulated protocol, not to the tpid. We
> don't "know" what the outer tpid is, so I assume it is 802.1Q. To address this
> situation, do you think I should add the outer tpid to sw_flow_key?
> Also see comment above in flow.h.
>

With the addition of nested vlan, we need to add outer tpid. This will
simplify vlan netlink serialization too.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH] Fix false positives in can_checksum_protocol()

2015-09-29 Thread Tom Herbert

On Tue, Sep 29, 2015 at 12:12 AM, David Woodhouse  wrote:
> On Mon, 2015-09-28 at 20:04 -0700, Tom Herbert wrote:
>>
>> > I've been pondering a bit of a redesign in this space.  I think the
>> > skb struct should be explicit in its instructions to hardware for
>> > which offloads to do for each packet.
>> >
>> > In this way, the stack would be *directly* telling the drivers what to
>> > do (and what not to do), solving all sorts of bugs and really improving
>> > driver reliability and implementation.
>> >
>> Doesn't CHECKSUM_PARTIAL with csum_offset and csum_start already tell
>> the driver unambiguously what to do wrt checksum offload?
>
> Right. That's precisely what we *do* have. But as things stand, we
> can't *use* it to its full capability.
>
> It's fine for decent devices which can handle such explicit
> instructions (advertised by the NETIF_F_HW_CSUM feature).
>
> The problem is the crappy devices that can *only* checksum UDP and TCP
> frames, advertised with the NETIF_F_IP{V6,}_CSUM features. We make a
> primitive attempt *not* to feed arbitrary checksum requests to such
> hardware. But we fail — we end up feeding *all* Legacy IP packets to a
> NETIF_F_IP_CSUM device, and *all* IPv6 packets to a NETIF_F_IPV6_CSUM
> device, regardless of whether they're *actually* TCP or UDP packets.
>
Please look at ixgbe_tx_csum in the ixgbe driver. This is one example of how
a driver can determine whether the checksum being offloaded is TCP or
UDP. The bug in this driver is that skb_checksum_help is not called
for a protocol the driver isn't looking for. In particular, I believe
this driver will probably send packets with invalid checksums when
TCP/UDP is used with IPv6 packets that contain extension headers.

Tom
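
The check described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the names (struct pkt_info, enum csum_action, pick_csum_action) are hypothetical and not taken from ixgbe, and the device is assumed to checksum only plain TCP and UDP and to be unable to skip IPv6 extension headers. In a real driver the software branch is where skb_checksum_help() would be called.

```c
#include <assert.h>
#include <stdbool.h>

/* IANA protocol numbers, defined locally to keep the sketch self-contained. */
enum { L4_TCP = 6, L4_UDP = 17, L4_ICMPV6 = 58 };

/* Hypothetical summary of what a driver learned while parsing the headers. */
struct pkt_info {
	bool is_ipv6;
	bool has_ext_hdrs;	/* IPv6 extension headers precede the L4 header */
	int l4_proto;		/* final L4 protocol number */
};

enum csum_action {
	CSUM_OFFLOAD,		/* hand the checksum to the hardware */
	CSUM_SOFTWARE,		/* compute it in software before transmit */
};

static enum csum_action pick_csum_action(const struct pkt_info *p)
{
	/* Assumption: the hardware cannot walk IPv6 extension headers. */
	if (p->is_ipv6 && p->has_ext_hdrs)
		return CSUM_SOFTWARE;

	switch (p->l4_proto) {
	case L4_TCP:
	case L4_UDP:
		return CSUM_OFFLOAD;
	default:
		/* Anything the device does not understand falls back. */
		return CSUM_SOFTWARE;
	}
}
```

The point of the default branch is that an unknown protocol is never silently handed to hardware that would produce a wrong checksum.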

> That's the problem I'm trying to solve. And then we *can* make full use
> of the generic checksum offload (I had it working for ICMPv6 at one
> point: http://lists.openwall.net/netdev/2013/01/14/38 ).
>
> --
> David Woodhouse                            Open Source Technology Centre
> david.woodho...@intel.com                              Intel Corporation
>

Re: [PATCH net 3/7] openvswitch: Fix skb leak in ovs_fragment()

2015-09-29 Thread Rustad, Mark D

> On Sep 29, 2015, at 3:39 PM, Joe Stringer  wrote:
> 
> @@ -728,8 +727,14 @@ static void ovs_fragment(struct vport *vport, struct 
> sk_buff *skb, u16 mru,
>   WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
> ovs_vport_name(vport), ntohs(ethertype), mru,
> vport->dev->mtu);
> - kfree_skb(skb);
> + goto out;
>   }
> +
> + skb = NULL;
> +
> +out:
> + if (skb)
> + kfree_skb(skb);
> }
> 
> static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,

Wouldn't that hunk be better as:

@@ -728,8 +727,13 @@ static void ovs_fragment(struct vport *vport, struct 
sk_buff *skb, u16 mru,
WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
  ovs_vport_name(vport), ntohs(ethertype), mru,
  vport->dev->mtu);
-   kfree_skb(skb);
+   goto out;
}
+
+   return;
+
+out:
+   kfree_skb(skb);
}

static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,

--
Mark Rustad, Networking Division, Intel Corporation
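
A small userspace model of the control flow suggested above, assuming a buffer whose ownership passes to a consumer on success. All names here (fragment_model, consume, buf_free) are hypothetical stand-ins, not kernel functions. The success path returns before the label, so the label only ever frees a buffer that is still owned by the caller and no NULL check is needed.

```c
#include <assert.h>
#include <stdlib.h>

static int frees;	/* buffers released by the error path */
static int consumed;	/* buffers handed over to the consumer */

static void buf_free(void *buf)
{
	free(buf);
	frees++;
}

/* Takes ownership of buf, as the fragmentation helpers do with the skb. */
static void consume(void *buf)
{
	free(buf);
	consumed++;
}

/* Hypothetical stand-in for ovs_fragment(): every failure jumps to 'out'. */
static void fragment_model(void *buf, int hdr_len, int max_hdr_len,
			   int have_ops)
{
	if (hdr_len > max_hdr_len)
		goto out;

	if (!have_ops)
		goto out;

	consume(buf);
	return;		/* buf is no longer ours, so skip the cleanup */

out:
	buf_free(buf);
}
```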




Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr

2015-09-29 Thread kbuild test robot

Hi Tom,

[auto build test results on next-20150929 -- if it's inappropriate base, please 
ignore]

config: m68k-sun3_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout c505336670b5c681c0a36053a68591e0f9074245
  # save the attached .config to linux build tree
  make.cross ARCH=m68k 

All error/warnings (new ones prefixed by >>):

>> ERROR: "xfrm6_xlat_addr_fini" [net/ipv6/ipv6.ko] undefined!
>> ERROR: "xfrm6_xlat_addr_init" [net/ipv6/ipv6.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



[PATCH net 6/7] openvswitch: Extend ct_state match field to 32 bits

2015-09-29 Thread Joe Stringer

The ct_state field was initially added as an 8-bit field; however, six of
the bits are already being used and use cases are already starting to
appear that may push the limits of this field. This patch extends the
field to 32 bits while retaining the internal representation of 8 bits.
This should cover forward compatibility of the ABI for the foreseeable
future.

This patch also reorders the OVS_CS_F_* bits to be sequential.

Suggested-by: Jarno Rajahalme 
Signed-off-by: Joe Stringer 
---
 include/uapi/linux/openvswitch.h | 8 
 net/openvswitch/conntrack.c  | 2 +-
 net/openvswitch/conntrack.h  | 4 ++--
 net/openvswitch/flow_netlink.c   | 8 
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 7cbb9d5..f121af5 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -323,7 +323,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_MPLS,  /* array of struct ovs_key_mpls.
 * The implementation may restrict
 * the accepted length of the array. */
-   OVS_KEY_ATTR_CT_STATE,  /* u8 bitmask of OVS_CS_F_* */
+   OVS_KEY_ATTR_CT_STATE,  /* u32 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
@@ -449,9 +449,9 @@ struct ovs_key_ct_label {
 #define OVS_CS_F_ESTABLISHED   0x02 /* Part of an existing connection. */
 #define OVS_CS_F_RELATED   0x04 /* Related to an established
 * connection. */
-#define OVS_CS_F_INVALID   0x20 /* Could not track connection. */
-#define OVS_CS_F_REPLY_DIR 0x40 /* Flow is in the reply direction. */
-#define OVS_CS_F_TRACKED   0x80 /* Conntrack has occurred. */
+#define OVS_CS_F_REPLY_DIR 0x08 /* Flow is in the reply direction. */
+#define OVS_CS_F_INVALID   0x10 /* Could not track connection. */
+#define OVS_CS_F_TRACKED   0x20 /* Conntrack has occurred. */
 
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 8c5d482c..167cf43 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -167,7 +167,7 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key)
 
 int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
 {
-   if (nla_put_u8(skb, OVS_KEY_ATTR_CT_STATE, key->ct.state))
+   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, key->ct.state))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index c658d95..7a125422 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -35,7 +35,7 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key);
 int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
 void ovs_ct_free_action(const struct nlattr *a);
 
-static inline bool ovs_ct_state_supported(u8 state)
+static inline bool ovs_ct_state_supported(u32 state)
 {
return !(state & ~(OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED |
 OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR |
@@ -53,7 +53,7 @@ static inline bool ovs_ct_verify(struct net *net, int attr)
return false;
 }
 
-static inline bool ovs_ct_state_supported(u8 state)
+static inline bool ovs_ct_state_supported(u32 state)
 {
return false;
 }
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index c4917c9..292eb13 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -291,7 +291,7 @@ size_t ovs_key_attr_size(void)
+ nla_total_size(4)   /* OVS_KEY_ATTR_SKB_MARK */
+ nla_total_size(4)   /* OVS_KEY_ATTR_DP_HASH */
+ nla_total_size(4)   /* OVS_KEY_ATTR_RECIRC_ID */
-   + nla_total_size(1)   /* OVS_KEY_ATTR_CT_STATE */
+   + nla_total_size(4)   /* OVS_KEY_ATTR_CT_STATE */
+ nla_total_size(2)   /* OVS_KEY_ATTR_CT_ZONE */
+ nla_total_size(4)   /* OVS_KEY_ATTR_CT_MARK */
+ nla_total_size(16)  /* OVS_KEY_ATTR_CT_LABELS */
@@ -349,7 +349,7 @@ static const struct ovs_len_tbl 
ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_TUNNEL]= { .len = OVS_ATTR_NESTED,
 .next = ovs_tunnel_key_lens, },
[OVS_KEY_ATTR_MPLS]  = { .len = sizeof(struct ovs_key_mpls) },
-   [OVS_KEY_ATTR_CT_STATE]  = { .len = sizeof(u8) },
+   [OVS_KEY_ATTR_CT_STATE]  = { .len = sizeof(u32) },
[OVS_KEY_ATTR_CT_ZONE]   = { .len = sizeof(u16) },
[OVS_KEY_ATTR_CT_MARK]   = { .len = sizeof(u32) },
[OV
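
The idea in this patch, a 32-bit field on the wire backed by an 8-bit internal representation, can be sketched in userspace as below. This is an illustration only: the CS_F_* macros are local copies using the reordered values from this patch (CS_F_NEW as 0x01 is an assumption, since that line is not part of the diff), and ct_state_from_wire() is a hypothetical helper, not a function in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local copies of the reordered bit values; CS_F_NEW is assumed. */
#define CS_F_NEW		0x01
#define CS_F_ESTABLISHED	0x02
#define CS_F_RELATED		0x04
#define CS_F_REPLY_DIR		0x08
#define CS_F_INVALID		0x10
#define CS_F_TRACKED		0x20

#define CS_F_SUPPORTED	(CS_F_NEW | CS_F_ESTABLISHED | CS_F_RELATED | \
			 CS_F_REPLY_DIR | CS_F_INVALID | CS_F_TRACKED)

/*
 * Accept a 32-bit value from userspace, but only store 8 bits.
 * Narrowing is safe because every bit outside the supported set,
 * including all bits above bit 7, has already been rejected.
 */
static int ct_state_from_wire(uint32_t wire, uint8_t *out)
{
	if (wire & ~(uint32_t)CS_F_SUPPORTED)
		return -EINVAL;

	*out = (uint8_t)wire;
	return 0;
}
```

Because the check runs before the narrowing cast, new flags can later be assigned to higher bits without an older kernel silently truncating them.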

[PATCH net 1/7] openvswitch: Make LABELS name more consistent

2015-09-29 Thread Joe Stringer

Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name
for these to be consistent with conntrack.

Fixes: c2ac667 "openvswitch: Allow matching on conntrack label"
Signed-off-by: Joe Stringer 
---
 include/uapi/linux/openvswitch.h |  4 ++--
 net/openvswitch/actions.c|  2 +-
 net/openvswitch/conntrack.c  | 10 +-
 net/openvswitch/flow_netlink.c   | 14 +++---
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 32e07d8..9afcd60 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -326,7 +326,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_STATE,  /* u8 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
-   OVS_KEY_ATTR_CT_LABEL,  /* 16-octet connection tracking label */
+   OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -633,7 +633,7 @@ enum ovs_ct_attr {
OVS_CT_ATTR_FLAGS,  /* u8 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
-   OVS_CT_ATTR_LABEL,  /* label to associate with this connection. */
+   OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */
OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
   related connections. */
__OVS_CT_ATTR_MAX
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 315f533..e23a61c 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -968,7 +968,7 @@ static int execute_masked_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_CT_STATE:
case OVS_KEY_ATTR_CT_ZONE:
case OVS_KEY_ATTR_CT_MARK:
-   case OVS_KEY_ATTR_CT_LABEL:
+   case OVS_KEY_ATTR_CT_LABELS:
err = -EINVAL;
break;
}
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 002a755..8c5d482c 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -179,7 +179,7 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct 
sk_buff *skb)
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
-   nla_put(skb, OVS_KEY_ATTR_CT_LABEL, sizeof(key->ct.label),
+   nla_put(skb, OVS_KEY_ATTR_CT_LABELS, sizeof(key->ct.label),
&key->ct.label))
return -EMSGSIZE;
 
@@ -545,7 +545,7 @@ static const struct ovs_ct_len_tbl 
ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
.maxlen = sizeof(struct md_mark) },
-   [OVS_CT_ATTR_LABEL] = { .minlen = sizeof(struct md_label),
+   [OVS_CT_ATTR_LABELS]= { .minlen = sizeof(struct md_label),
.maxlen = sizeof(struct md_label) },
[OVS_CT_ATTR_HELPER]= { .minlen = 1,
.maxlen = NF_CT_HELPER_NAME_LEN }
@@ -593,7 +593,7 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
 #endif
 #ifdef CONFIG_NF_CONNTRACK_LABELS
-   case OVS_CT_ATTR_LABEL: {
+   case OVS_CT_ATTR_LABELS: {
struct md_label *label = nla_data(a);
 
info->label = *label;
@@ -633,7 +633,7 @@ bool ovs_ct_verify(struct net *net, enum ovs_key_attr attr)
attr == OVS_KEY_ATTR_CT_MARK)
return true;
if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
-   attr == OVS_KEY_ATTR_CT_LABEL) {
+   attr == OVS_KEY_ATTR_CT_LABELS) {
struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
 
return ovs_net->xt_label;
@@ -711,7 +711,7 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info 
*ct_info,
&ct_info->mark))
return -EMSGSIZE;
if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
-   nla_put(skb, OVS_CT_ATTR_LABEL, sizeof(ct_info->label),
+   nla_put(skb, OVS_CT_ATTR_LABELS, sizeof(ct_info->label),
&ct_info->label))
return -EMSGSIZE;
if (ct_info->helper) {
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 5c030a4..ea82cd5 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -294,7 +294,7 @@ size_t ovs_key_attr_size(void)
+ nla_total_size(1)   /* OVS_KEY_ATTR_CT_STATE */
+ nla_total_size(2)   /* OVS_KEY_ATTR_CT_ZONE */
+ nla_total_size(4)   /* OVS_KEY_ATTR_CT_MARK */
-

[PATCH net 4/7] openvswitch: Ensure flow is valid before executing ct

2015-09-29 Thread Joe Stringer

The ct action uses parts of the flow key, so we need to ensure that it
is valid before executing that action.

Fixes: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer 
---
 net/openvswitch/actions.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index e1afbd1..9a88f15 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1104,6 +1104,12 @@ static int do_execute_actions(struct datapath *dp, 
struct sk_buff *skb,
break;
 
case OVS_ACTION_ATTR_CT:
+   if (!is_flow_key_valid(key)) {
+   err = ovs_flow_key_update(skb, key);
+   if (err)
+   return err;
+   }
+
err = ovs_ct_execute(ovs_dp_get_net(dp), skb, key,
 nla_data(a));
 
-- 
2.1.4
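
The pattern added by this patch, refreshing a cached key before an action that depends on it, can be modelled in userspace roughly as follows. Every name here (key_model, pkt_model, key_update, ct_execute_model) is hypothetical; the real code uses is_flow_key_valid() and ovs_flow_key_update() as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>

struct pkt_model {
	int l4_port;	/* what the packet currently contains */
};

struct key_model {
	bool valid;	/* cleared when an earlier action changed the packet */
	int l4_port;	/* cached copy extracted from the packet */
};

/* Re-extract the key from the packet and mark it valid again. */
static int key_update(const struct pkt_model *pkt, struct key_model *key)
{
	key->l4_port = pkt->l4_port;
	key->valid = true;
	return 0;
}

/* An action that reads the key must not run on stale data. */
static int ct_execute_model(const struct pkt_model *pkt,
			    struct key_model *key, int *seen_port)
{
	if (!key->valid) {
		int err = key_update(pkt, key);

		if (err)
			return err;
	}

	*seen_port = key->l4_port;
	return 0;
}
```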

[PATCH net 2/7] openvswitch: Fix typos in CT headers

2015-09-29 Thread Joe Stringer

These comments hadn't caught up to their implementations; fix them.

Fixes: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer 
---
 include/uapi/linux/openvswitch.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 9afcd60..7cbb9d5 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -630,7 +630,7 @@ struct ovs_action_hash {
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
-   OVS_CT_ATTR_FLAGS,  /* u8 bitmask of OVS_CT_F_*. */
+   OVS_CT_ATTR_FLAGS,  /* u32 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */
@@ -705,7 +705,7 @@ enum ovs_action_attr {
   * data immediately followed by a mask.
   * The data must be zero for the unmasked
   * bits. */
-   OVS_ACTION_ATTR_CT,   /* One nested OVS_CT_ATTR_* . */
+   OVS_ACTION_ATTR_CT,   /* Nested OVS_CT_ATTR_* . */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
-- 
2.1.4

[PATCH net 5/7] openvswitch: Reject ct_state unsupported bits

2015-09-29 Thread Joe Stringer

Previously, if userspace specified ct_state bits in the flow key which
are currently undefined (and therefore unsupported), then they would be
ignored. This could cause unexpected behaviour in future if userspace is
extended to support additional bits but attempts to communicate with the
current version of the kernel. This patch rectifies the situation by
rejecting such ct_state bits.

Fixes: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer 
---
 net/openvswitch/conntrack.h| 12 
 net/openvswitch/flow_netlink.c |  6 ++
 2 files changed, 18 insertions(+)

diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index 43f5dd7..c658d95 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -34,6 +34,13 @@ int ovs_ct_execute(struct net *, struct sk_buff *, struct 
sw_flow_key *,
 void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
 int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
 void ovs_ct_free_action(const struct nlattr *a);
+
+static inline bool ovs_ct_state_supported(u8 state)
+{
+   return !(state & ~(OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED |
+OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR |
+OVS_CS_F_INVALID | OVS_CS_F_TRACKED));
+}
 #else
 #include 
 
@@ -46,6 +53,11 @@ static inline bool ovs_ct_verify(struct net *net, int attr)
return false;
 }
 
+static inline bool ovs_ct_state_supported(u8 state)
+{
+   return false;
+}
+
 static inline int ovs_ct_copy_action(struct net *net, const struct nlattr *nla,
 const struct sw_flow_key *key,
 struct sw_flow_actions **acts, bool log)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ea82cd5..c4917c9 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -816,6 +816,12 @@ static int metadata_from_nlattrs(struct net *net, struct 
sw_flow_match *match,
ovs_ct_verify(net, OVS_KEY_ATTR_CT_STATE)) {
u8 ct_state = nla_get_u8(a[OVS_KEY_ATTR_CT_STATE]);
 
+   if (!is_mask && !ovs_ct_state_supported(ct_state)) {
+   OVS_NLERR(log, "ct_state flags %02x unsupported",
+ ct_state);
+   return -EINVAL;
+   }
+
SW_FLOW_KEY_PUT(match, ct.state, ct_state, is_mask);
*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_STATE);
}
-- 
2.1.4
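
A standalone sketch of the check this patch introduces, written for userspace. The CS_F_* macros are local copies using the bit values in effect at this point in the series (CS_F_NEW as 0x01 is an assumption, since it does not appear in the diffs), and ct_state_supported() mirrors the shape of ovs_ct_state_supported() without being the kernel function itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag values before the later reordering patch. */
#define CS_F_NEW		0x01
#define CS_F_ESTABLISHED	0x02
#define CS_F_RELATED		0x04
#define CS_F_INVALID		0x20
#define CS_F_REPLY_DIR		0x40
#define CS_F_TRACKED		0x80

#define CS_F_SUPPORTED	(CS_F_NEW | CS_F_ESTABLISHED | CS_F_RELATED | \
			 CS_F_INVALID | CS_F_REPLY_DIR | CS_F_TRACKED)

/* True only if no bit outside the known set is present. */
static bool ct_state_supported(uint8_t state)
{
	return !(state & ~CS_F_SUPPORTED);
}
```

With these values the two undefined bits are 0x08 and 0x10, so a key that sets either of them is rejected rather than silently ignored.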

[PATCH net 7/7] openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT

2015-09-29 Thread Joe Stringer

Previously, the CT_ATTR_FLAGS attribute, when nested under the
OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the
semantics of the ct action. It's more extensible to just represent each
flag as a nested attribute, and this requires no additional error
checking to reject flags that aren't currently supported.

Suggested-by: Ben Pfaff 
Signed-off-by: Joe Stringer 
---
 include/uapi/linux/openvswitch.h | 14 --
 net/openvswitch/conntrack.c  | 20 +---
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index f121af5..e14563e 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -618,7 +618,9 @@ struct ovs_action_hash {
 
 /**
  * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action.
- * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags.
+ * @OVS_CT_ATTR_COMMIT: If present, commits the connection to the conntrack
+ * table. This allows future packets for the same connection to be identified
+ * as 'established' or 'related'.
  * @OVS_CT_ATTR_ZONE: u16 connection tracking zone.
  * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
  * mask, the corresponding bit in the value is copied to the connection
@@ -630,7 +632,7 @@ struct ovs_action_hash {
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
-   OVS_CT_ATTR_FLAGS,  /* u32 bitmask of OVS_CT_F_*. */
+   OVS_CT_ATTR_COMMIT, /* No argument, commits connection. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */
@@ -641,14 +643,6 @@ enum ovs_ct_attr {
 
 #define OVS_CT_ATTR_MAX (__OVS_CT_ATTR_MAX - 1)
 
-/*
- * OVS_CT_ATTR_FLAGS flags - bitmask of %OVS_CT_F_*
- * @OVS_CT_F_COMMIT: Commits the flow to the conntrack table. This allows
- * future packets for the same connection to be identified as 'established'
- * or 'related'.
- */
-#define OVS_CT_F_COMMIT0x01
-
 /**
  * enum ovs_action_attr - Action types.
  *
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 167cf43..effa78c 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -42,12 +42,18 @@ struct md_label {
struct ovs_key_ct_label mask;
 };
 
+/* Flags for performing connection tracking.
+ *
+ * CT_F_COMMIT: Commits the flow to the conntrack table.
+ */
+#define CT_F_COMMITBIT(0)
+
 /* Conntrack action context for execution. */
 struct ovs_conntrack_info {
struct nf_conntrack_helper *helper;
struct nf_conntrack_zone zone;
struct nf_conn *ct;
-   u32 flags;
+   u8 flags;   /* bitmask of CT_F_*. */
u16 family;
struct md_mark mark;
struct md_label label;
@@ -493,7 +499,7 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
return err;
}
 
-   if (info->flags & OVS_CT_F_COMMIT)
+   if (info->flags & CT_F_COMMIT)
err = ovs_ct_commit(net, key, info, skb);
else
err = ovs_ct_lookup(net, key, info, skb);
@@ -539,8 +545,7 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info 
*info, const char *name,
 }
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
-   [OVS_CT_ATTR_FLAGS] = { .minlen = sizeof(u32),
-   .maxlen = sizeof(u32) },
+   [OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
@@ -576,8 +581,8 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
 
switch (type) {
-   case OVS_CT_ATTR_FLAGS:
-   info->flags = nla_get_u32(a);
+   case OVS_CT_ATTR_COMMIT:
+   info->flags |= CT_F_COMMIT;
break;
 #ifdef CONFIG_NF_CONNTRACK_ZONES
case OVS_CT_ATTR_ZONE:
@@ -701,7 +706,8 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info 
*ct_info,
if (!start)
return -EMSGSIZE;
 
-   if (nla_put_u32(skb, OVS_CT_ATTR_FLAGS, ct_info->flags))
+   if (ct_info->flags & CT_F_COMMIT &&
+   nla_put_flag(skb, OVS_CT_ATTR_COMMIT))
return -EMSGSIZE;
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
nla_put_u16(skb, OVS_CT_ATTR_ZONE, ct_info->zone.id))
-- 
2.1.4
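
The design choice in this patch, one attribute per flag instead of a bitmask payload, can be illustrated with a userspace sketch. The types and names below (ct_attr_model, attr_model, ct_info_model, parse_ct_model) are hypothetical and only mirror the shape of the kernel code; real netlink attributes are parsed with the nla_* helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute numbering, shaped like enum ovs_ct_attr. */
enum ct_attr_model {
	CT_ATTR_UNSPEC,
	CT_ATTR_COMMIT,		/* no payload; presence means "commit" */
	CT_ATTR_ZONE,		/* u16 payload */
	__CT_ATTR_MAX
};
#define CT_ATTR_MAX (__CT_ATTR_MAX - 1)

#define CT_F_COMMIT (1u << 0)	/* internal flag, never sent on the wire */

struct attr_model {
	int type;
	uint16_t value;		/* ignored for flag attributes */
};

struct ct_info_model {
	uint8_t flags;
	uint16_t zone;
};

static int parse_ct_model(const struct attr_model *attrs, size_t n,
			  struct ct_info_model *info)
{
	info->flags = 0;
	info->zone = 0;

	for (size_t i = 0; i < n; i++) {
		int type = attrs[i].type;

		/* Unknown attributes are rejected by the range check alone. */
		if (type <= CT_ATTR_UNSPEC || type > CT_ATTR_MAX)
			return -EINVAL;

		switch (type) {
		case CT_ATTR_COMMIT:
			info->flags |= CT_F_COMMIT;
			break;
		case CT_ATTR_ZONE:
			info->zone = attrs[i].value;
			break;
		}
	}

	return 0;
}
```

A future flag would get its own attribute type, and an older parser would reject it through the same range check with no extra bitmask validation.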

[PATCH net 3/7] openvswitch: Fix skb leak in ovs_fragment()

2015-09-29 Thread Joe Stringer

If ovs_fragment() was unable to fragment the skb due to an L2 header
that exceeds the supported length, skbs would be leaked. Fix the bug.

Fixes: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer 
---
 net/openvswitch/actions.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index e23a61c..e1afbd1 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -684,7 +684,7 @@ static void ovs_fragment(struct vport *vport, struct 
sk_buff *skb, u16 mru,
 {
if (skb_network_offset(skb) > MAX_L2_LEN) {
OVS_NLERR(1, "L2 header too long to fragment");
-   return;
+   goto out;
}
 
if (ethertype == htons(ETH_P_IP)) {
@@ -708,8 +708,7 @@ static void ovs_fragment(struct vport *vport, struct 
sk_buff *skb, u16 mru,
struct rt6_info ovs_rt;
 
if (!v6ops) {
-   kfree_skb(skb);
-   return;
+   goto out;
}
 
prepare_frag(vport, skb);
@@ -728,8 +727,14 @@ static void ovs_fragment(struct vport *vport, struct 
sk_buff *skb, u16 mru,
WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
  ovs_vport_name(vport), ntohs(ethertype), mru,
  vport->dev->mtu);
-   kfree_skb(skb);
+   goto out;
}
+
+   skb = NULL;
+
+out:
+   if (skb)
+   kfree_skb(skb);
 }
 
 static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
-- 
2.1.4

[PATCH net 0/7] OVS conntrack fixes for net

2015-09-29 Thread Joe Stringer

The userspace side of the Open vSwitch conntrack changes is currently
undergoing review, which has highlighted some minor bugs in the existing
conntrack implementation in the kernel, as well as pointing out some
future-proofing that can be done on the interface to reduce the need for
additional compatibility code in future.

The biggest changes here are to the userspace API for the ct_state match
field and the CT action. This series proposes firstly to extend the ct_state
match field to 32 bits, while rejecting any currently unsupported bits.
Secondly, rather than representing CT action flags within a 32-bit field,
simply use the presence of a netlink attribute to represent the single flag
that is defined today. This also serves to reject unsupported ct action flag bits.

Joe Stringer (7):
  openvswitch: Make LABELS name more consistent
  openvswitch: Fix typos in CT headers
  openvswitch: Fix skb leak in ovs_fragment()
  openvswitch: Ensure flow is valid before executing ct
  openvswitch: Reject ct_state unsupported bits
  openvswitch: Extend ct_state match field to 32 bits
  openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT

 include/uapi/linux/openvswitch.h | 28 +++-
 net/openvswitch/actions.c| 21 -
 net/openvswitch/conntrack.c  | 32 +++-
 net/openvswitch/conntrack.h  | 12 
 net/openvswitch/flow_netlink.c   | 26 --
 5 files changed, 74 insertions(+), 45 deletions(-)

-- 
2.1.4

Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr

2015-09-29 Thread kbuild test robot

Hi Tom,

[auto build test results on next-20150929 -- if it's inappropriate base, please 
ignore]

config: xtensa-allyesconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout c505336670b5c681c0a36053a68591e0f9074245
  # save the attached .config to linux build tree
  make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   net/ipv6/ila/ila_xlat.c: In function 'ila_xlat_fini':
>> net/ipv6/ila/ila_xlat.c:636:6: warning: unused variable 'i' 
>> [-Wunused-variable]
 int i;
 ^

vim +/i +636 net/ipv6/ila/ila_xlat.c

   620  ila_nl_ops);
   621  if (ret < 0)
   622  goto unregister;
   623  
   624  xfrm6_xlat_addr_add(&ila_xlat);
   625  
   626  return 0;
   627  
   628  unregister:
   629  unregister_pernet_device(&ila_net_ops);
   630  exit:
   631  return ret;
   632  }
   633  
   634  void ila_xlat_fini(void)
   635  {
 > 636  int i;
   637  
   638  xfrm6_xlat_addr_del(&ila_xlat);
   639  genl_unregister_family(&ila_nl_family);
   640  unregister_pernet_device(&ila_net_ops);
   641  }
   642  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



[PATCH 3/3] fm10k: use napi_schedule_irqoff()

2015-09-29 Thread Alexander Duyck

The fm10k_msix_clean_rings function runs from hard interrupt context or
with interrupts already disabled in netpoll.

It can use napi_schedule_irqoff() instead of napi_schedule().

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 74be792f3f1b..5fbffbaefe32 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -846,7 +846,7 @@ static irqreturn_t fm10k_msix_clean_rings(int 
__always_unused irq, void *data)
struct fm10k_q_vector *q_vector = data;
 
if (q_vector->rx.count || q_vector->tx.count)
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }

[PATCH 0/3] use napi_schedule_irqoff()

2015-09-29 Thread Alexander Duyck

This patch set is meant to replace the calls to napi_schedule with
napi_schedule_irqoff as this should help to reduce the interrupt overhead
slightly by removing the unneeded call to local_irq_save and
local_irq_restore.

---

Alexander Duyck (3):
  ixgbe/ixgbevf: use napi_schedule_irqoff()
  i40e/i40evf: use napi_schedule_irqoff()
  fm10k: use napi_schedule_irqoff()


 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  |2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   |6 --
 drivers/net/ethernet/intel/i40evf/i40evf_main.c   |2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |4 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |2 +-
 5 files changed, 9 insertions(+), 7 deletions(-)
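
The saving described in this cover letter can be shown with a toy userspace model. Nothing below is kernel code: irqs_enabled simulates the CPU interrupt flag, flag_ops counts save and restore operations, and the model_* functions are hypothetical stand-ins for napi_schedule() and napi_schedule_irqoff(). The real functions also test and set NAPI state bits, which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true;	/* simulated CPU interrupt flag */
static int flag_ops;			/* save/restore operations performed */
static int scheduled;			/* poll requests queued */

static bool model_irq_save(void)
{
	bool old = irqs_enabled;

	irqs_enabled = false;
	flag_ops++;
	return old;
}

static void model_irq_restore(bool old)
{
	irqs_enabled = old;
	flag_ops++;
}

/* Safe from any context: protects the queueing step itself. */
static void model_schedule(void)
{
	bool flags = model_irq_save();

	scheduled++;
	model_irq_restore(flags);
}

/* Only valid when the caller already runs with interrupts off. */
static void model_schedule_irqoff(void)
{
	assert(!irqs_enabled);
	scheduled++;
}
```

Called from a context where interrupts are already off, the first variant performs two redundant flag operations per call and the second performs none.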

[PATCH 2/3] i40e/i40evf: use napi_schedule_irqoff()

2015-09-29 Thread Alexander Duyck

The i40e_intr and i40e/i40evf_msix_clean_rings functions run from hard
interrupt context or with interrupts already disabled in netpoll.

They can use napi_schedule_irqoff() instead of napi_schedule().

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c |6 --
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 484226e0365d..3cc97d4f5f70 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3281,7 +3281,7 @@ static irqreturn_t i40e_msix_clean_rings(int irq, void 
*data)
if (!q_vector->tx.ring && !q_vector->rx.ring)
return IRQ_HANDLED;
 
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }
@@ -3450,6 +3450,8 @@ static irqreturn_t i40e_intr(int irq, void *data)
 
/* only q0 is used in MSI/Legacy mode, and none are used in MSIX */
if (icr0 & I40E_PFINT_ICR0_QUEUE_0_MASK) {
+   struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
+   struct i40e_q_vector *q_vector = vsi->q_vectors[0];
 
/* temporarily disable queue cause for NAPI processing */
u32 qval = rd32(hw, I40E_QINT_RQCTL(0));
@@ -3462,7 +3464,7 @@ static irqreturn_t i40e_intr(int irq, void *data)
wr32(hw, I40E_QINT_TQCTL(0), qval);
 
if (!test_bit(__I40E_DOWN, &pf->state))
-   napi_schedule(&pf->vsi[pf->lan_vsi]->q_vectors[0]->napi);
+   napi_schedule_irqoff(&q_vector->napi);
}
 
if (icr0 & I40E_PFINT_ICR0_ADMINQ_MASK) {
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 5e1336321c2f..4b3db099f58c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -334,7 +334,7 @@ static irqreturn_t i40evf_msix_clean_rings(int irq, void 
*data)
if (!q_vector->tx.ring && !q_vector->rx.ring)
return IRQ_HANDLED;
 
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
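
The difference between the two scheduling calls discussed above can be
illustrated with a small userspace model. This is only a sketch, not kernel
code: struct model_cpu and both model_* functions are hypothetical names, and
the "interrupt flag" is a plain boolean. It assumes that the regular variant
saves and restores the local interrupt state around the queueing step, while
the _irqoff variant relies on the caller already running with interrupts
disabled.

```c
#include <stdbool.h>

/* Hypothetical userspace model of one CPU; not a kernel structure. */
struct model_cpu {
	bool irqs_enabled;     /* stand-in for the CPU interrupt flag */
	unsigned int flag_ops; /* number of save/restore operations performed */
	unsigned int queued;   /* number of poll requests queued */
};

/* Models napi_schedule(): usable in any context, so it saves and
 * restores the interrupt state around the queueing step.
 */
static void model_napi_schedule(struct model_cpu *cpu)
{
	bool saved = cpu->irqs_enabled;  /* local_irq_save() analogue */

	cpu->irqs_enabled = false;
	cpu->flag_ops++;
	cpu->queued++;
	cpu->irqs_enabled = saved;       /* local_irq_restore() analogue */
	cpu->flag_ops++;
}

/* Models napi_schedule_irqoff(): the caller guarantees interrupts are
 * already off, so the save/restore pair is skipped. Returns false if
 * that precondition is violated.
 */
static bool model_napi_schedule_irqoff(struct model_cpu *cpu)
{
	if (cpu->irqs_enabled)
		return false;
	cpu->queued++;
	return true;
}
```

In this model a hard interrupt handler (interrupts off) can queue a poll
through the _irqoff path without touching the flag at all, which is the
saving the patch is after.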

[PATCH 1/3] ixgbe/ixgbevf: use napi_schedule_irqoff()

2015-09-29 Thread Alexander Duyck

The ixgbe_intr and ixgbe/ixgbevf_msix_clean_rings functions run from hard
interrupt context or with interrupts already disabled in netpoll.

They can use napi_schedule_irqoff() instead of napi_schedule().

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |4 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 693f2da33569..67dc916c94d6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2754,7 +2754,7 @@ static irqreturn_t ixgbe_msix_clean_rings(int irq, void *data)
/* EIAM disabled interrupts (on this vector) for us */
 
if (q_vector->rx.ring || q_vector->tx.ring)
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }
@@ -2948,7 +2948,7 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
ixgbe_ptp_check_pps_event(adapter, eicr);
 
/* would disable interrupts here but EIAM disabled it */
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
/*
 * re-enable link(maybe) and non-queue interrupts, no flush.
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 592ff237d692..f1c5f3372667 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1288,7 +1288,7 @@ static irqreturn_t ixgbevf_msix_clean_rings(int irq, void *data)
 
/* EIAM disabled interrupts (on this vector) for us */
if (q_vector->rx.ring || q_vector->tx.ring)
-   napi_schedule(&q_vector->napi);
+   napi_schedule_irqoff(&q_vector->napi);
 
return IRQ_HANDLED;
 }


[PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr

2015-09-29 Thread Tom Herbert

This patch sets up a hook for xfrm6_xlat_addr. This provides a way to
perform ILA translation before early demux which can be a significant
performance advantage over LWT which would occur later.

The implementation entails a rhashtable which is used to do the locator
lookup. The rhash table is configured via new netlink commands.

Signed-off-by: Tom Herbert 
---
 include/uapi/linux/ila.h  |  22 ++
 net/ipv6/Kconfig  |   1 +
 net/ipv6/ila/Makefile |   2 +-
 net/ipv6/ila/ila.h|   2 +
 net/ipv6/ila/ila_common.c |   8 +
 net/ipv6/ila/ila_xlat.c   | 642 ++
 6 files changed, 676 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv6/ila/ila_xlat.c

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 7ed9e67..abde7bb 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -3,13 +3,35 @@
 #ifndef _UAPI_LINUX_ILA_H
 #define _UAPI_LINUX_ILA_H
 
+/* NETLINK_GENERIC related info */
+#define ILA_GENL_NAME  "ila"
+#define ILA_GENL_VERSION   0x1
+
 enum {
ILA_ATTR_UNSPEC,
ILA_ATTR_LOCATOR,   /* u64 */
+   ILA_ATTR_IDENTIFIER,/* u64 */
+   ILA_ATTR_LOCATOR_MATCH, /* u64 */
+   ILA_ATTR_IFINDEX,   /* s32 */
+   ILA_ATTR_DIR,   /* u32 */
 
__ILA_ATTR_MAX,
 };
 
 #define ILA_ATTR_MAX   (__ILA_ATTR_MAX - 1)
 
+enum {
+   ILA_CMD_UNSPEC,
+   ILA_CMD_ADD,
+   ILA_CMD_DEL,
+   ILA_CMD_GET,
+
+   __ILA_CMD_MAX,
+};
+
+#define ILA_CMD_MAX (__ILA_CMD_MAX - 1)
+
+#define ILA_DIR_IN (1 << 0)
+#define ILA_DIR_OUT (1 << 1)
+
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 6e8ca06..c972497 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -95,6 +95,7 @@ config IPV6_MIP6
 config IPV6_ILA
tristate "IPv6: Identifier Locator Addressing (ILA)"
select LWTUNNEL
+   select INET6_XFRM_XLAT_ADDR
---help---
  Support for IPv6 Identifier Locator Addressing (ILA).
 
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 31d136b..4b32e59 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_IPV6_ILA) += ila.o
 
-ila-objs := ila_common.o ila_lwt.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index b94081f..28542cb 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -42,5 +42,7 @@ void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p);
 
 int ila_lwt_init(void);
 void ila_lwt_fini(void);
+int ila_xlat_init(void);
+void ila_xlat_fini(void);
 
 #endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index 1a1e1e0..cde7b96 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -80,12 +80,20 @@ static int __init ila_init(void)
if (ret)
goto fail_lwt;
 
+   ret = ila_xlat_init();
+   if (ret)
+   goto fail_xlat;
+
+   return 0;
+fail_xlat:
+   ila_lwt_fini();
 fail_lwt:
return ret;
 }
 
 static void __exit ila_fini(void)
 {
+   ila_xlat_fini();
ila_lwt_fini();
 }
 
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
new file mode 100644
index 000..cd6135b
--- /dev/null
+++ b/net/ipv6/ila/ila_xlat.c
@@ -0,0 +1,642 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ila.h"
+
+struct ila_xlat_params {
+   struct ila_params ip;
+   __be64 identifier;
+   int ifindex;
+   unsigned int dir;
+};
+
+struct ila_map {
+   struct ila_xlat_params p;
+   struct rhash_head node;
+   struct ila_map *next;
+   struct rcu_head rcu;
+};
+
+static unsigned int ila_net_id;
+
+struct ila_net {
+   struct rhashtable rhash_table;
+   spinlock_t *locks; /* Bucket locks for entry manipulation */
+   unsigned int locks_mask;
+};
+
+#define LOCKS_PER_CPU 10
+
+static int alloc_ila_locks(struct ila_net *ilan, gfp_t gfp)
+{
+   unsigned int i, size;
+   unsigned int nr_pcpus = num_possible_cpus();
+
+   nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL);
+   size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU);
+
+   if (sizeof(spinlock_t) != 0) {
+#ifdef CONFIG_NUMA
+   if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+   gfp == GFP_KERNEL)
+   ilan->locks = vmalloc(size * sizeof(spinlock_t));
+   else
+#endif
+   ilan->locks = kmalloc_array(size, sizeof(spinlock_t),
+   gfp);
+   if (!ilan->locks)
+   return -ENOMEM;
+   for (i = 0; i < size; i++)
+   spin_lock_init(&ilan->locks[i]);
+   }
+   ilan->locks_mask = size - 1;
+
+   return 0;
+}
+
+static u

[PATCH net-next 3/6] netlink: add a start callback for starting a netlink dump

2015-09-29 Thread Tom Herbert

The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert 
---
 include/linux/netlink.h  |  2 ++
 include/net/genetlink.h  |  2 ++
 net/netlink/af_netlink.c |  4 
 net/netlink/genetlink.c  | 16 
 4 files changed, 24 insertions(+)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 639e9b8..0b41959 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -131,6 +131,7 @@ netlink_skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 struct netlink_callback {
struct sk_buff  *skb;
const struct nlmsghdr   *nlh;
+   int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff * skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
@@ -153,6 +154,7 @@ struct nlmsghdr *
 __nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int flags);
 
 struct netlink_dump_control {
+   int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *skb, struct netlink_callback *);
int (*done)(struct netlink_callback *);
void *data;
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 1b6b6dc..43c0e77 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -114,6 +114,7 @@ static inline void genl_info_net_set(struct genl_info *info, struct net *net)
  * @flags: flags
  * @policy: attribute validation policy
  * @doit: standard command callback
+ * @start: start callback for dumps
  * @dumpit: callback for dumpers
  * @done: completion callback for dumps
  * @ops_list: operations list
@@ -122,6 +123,7 @@ struct genl_ops {
const struct nla_policy *policy;
int(*doit)(struct sk_buff *skb,
   struct genl_info *info);
+   int(*start)(struct netlink_callback *cb);
int(*dumpit)(struct sk_buff *skb,
 struct netlink_callback *cb);
int(*done)(struct netlink_callback *cb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 8f060d7..c8c43ac 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2905,6 +2905,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
cb = &nlk->cb;
memset(cb, 0, sizeof(*cb));
+   cb->start = control->start;
cb->dump = control->dump;
cb->done = control->done;
cb->nlh = nlh;
@@ -2917,6 +2918,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
mutex_unlock(nlk->cb_mutex);
 
+   if (cb->start)
+   cb->start(cb);
+
ret = netlink_dump(sk);
sock_put(sk);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 75724a9..5fd08c0 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -513,6 +513,20 @@ void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 }
 EXPORT_SYMBOL(genlmsg_put);
 
+static int genl_lock_start(struct netlink_callback *cb)
+{
+   /* our ops are always const - netlink API doesn't propagate that */
+   const struct genl_ops *ops = cb->data;
+   int rc = 0;
+
+   if (ops->start) {
+   genl_lock();
+   rc = ops->start(cb);
+   genl_unlock();
+   }
+   return rc;
+}
+
 static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
/* our ops are always const - netlink API doesn't propagate that */
@@ -577,6 +591,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
.module = family->module,
/* we have const, but the netlink API doesn't */
.data = (void *)ops,
+   .start = genl_lock_start,
.dump = genl_lock_dumpit,
.done = genl_lock_done,
};
@@ -588,6 +603,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
} else {
struct netlink_dump_control c = {
.module = family->module,
+   .start = ops->start,
.dump = ops->dumpit,
.done = ops->done,
};
-- 
2.4.6
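
The start/dump/done lifecycle described in this patch can be sketched in
userspace as follows. This is a simplified model under stated assumptions:
struct model_cb, model_run_dump() and the example_* callbacks are hypothetical
names, the whole dump runs in one loop (the real netlink code spreads dump
calls across receive operations), and, mirroring the af_netlink.c hunk above,
the return value of start is not acted upon.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct netlink_callback. */
struct model_cb {
	int (*start)(struct model_cb *cb);
	int (*dump)(struct model_cb *cb);  /* returns > 0 while data remains */
	int (*done)(struct model_cb *cb);
	void *ctx;                         /* context owned by the callbacks */
};

/* Drive one dump: optional start, repeated dump, then done.
 * Returns the number of dump rounds that produced data.
 */
static int model_run_dump(struct model_cb *cb)
{
	int rounds = 0;

	if (cb->start)
		cb->start(cb);
	while (cb->dump(cb) > 0)
		rounds++;
	if (cb->done)
		cb->done(cb);
	return rounds;
}

/* Example callbacks: the context is a countdown of remaining records. */
static int remaining;

static int example_start(struct model_cb *cb)
{
	remaining = 3;
	cb->ctx = &remaining;
	return 0;
}

static int example_dump(struct model_cb *cb)
{
	int *left = cb->ctx;

	if (*left == 0)
		return 0;
	(*left)--;
	return 1;
}

static int example_done(struct model_cb *cb)
{
	cb->ctx = NULL;  /* release the context set up by start */
	return 0;
}
```

The point of the sketch is the pairing: whatever start sets up in the
callback context is available to every dump round and is torn down in done.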


[PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv

2015-09-29 Thread Tom Herbert

Call before performing NF_HOOK and routing in order to perform address
translation in the receive path.

Signed-off-by: Tom Herbert 
---
 net/ipv6/ip6_input.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9075acf..06dac55 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
/* Must drop socket now because of tproxy. */
skb_orphan(skb);
 
+   /* Translate destination address before routing */
+   xfrm6_xlat_addr(skb);
+
return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
   net, NULL, skb, dev, NULL,
   ip6_rcv_finish);
-- 
2.4.6


[PATCH net-next 4/6] xfrm: Add xfrm6 address translation function

2015-09-29 Thread Tom Herbert

This patch adds xfrm6_xlat_addr which is called in the data path
to perform address translation (primarily for the receive path). Modules
may register their own callback to perform a translation-- this
registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del.
xfrm6_xlat_addr allows translation of addresses for an sk_buff.

Signed-off-by: Tom Herbert 
---
 include/net/xfrm.h | 25 ++
 net/ipv6/Kconfig   |  4 +++
 net/ipv6/Makefile  |  1 +
 net/ipv6/xfrm6_policy.c|  7 +
 net/ipv6/xfrm6_xlat_addr.c | 66 ++
 5 files changed, 103 insertions(+)
 create mode 100644 net/ipv6/xfrm6_xlat_addr.c

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index fd17610..ea05c4e 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -607,6 +607,31 @@ struct xfrm_mgr {
 int xfrm_register_km(struct xfrm_mgr *km);
 int xfrm_unregister_km(struct xfrm_mgr *km);
 
+struct xfrm6_xlat_addr {
+   int (*xlat)(struct sk_buff *skb);
+   struct list_head list;
+};
+
+#ifdef CONFIG_INET6_XFRM_XLAT_ADDR
+void xfrm6_xlat_addr_add(struct xfrm6_xlat_addr *xla);
+void xfrm6_xlat_addr_del(struct xfrm6_xlat_addr *xla);
+int xfrm6_xlat_addr(struct sk_buff *skb);
+int xfrm6_xlat_addr_init(void);
+void xfrm6_xlat_addr_fini(void);
+#else
+static inline int xfrm6_xlat_addr(struct sk_buff *skb)
+{
+   return 0;
+}
+
+static inline int xfrm6_xlat_addr_init(void)
+{
+   return 0;
+}
+
+static inline void xfrm6_xlat_addr_fini(void) { }
+#endif
+
 struct xfrm_tunnel_skb_cb {
union {
struct inet_skb_parm h4;
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 983bb99..6e8ca06 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -153,6 +153,10 @@ config INET6_XFRM_MODE_ROUTEOPTIMIZATION
---help---
  Support for MIPv6 route optimization mode.
 
+config INET6_XFRM_XLAT_ADDR
+   select XFRM
+   bool
+
 config IPV6_VTI
 tristate "Virtual (secure) IPv6: tunneling"
select IPV6_TUNNEL
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2fbd90b..c719d6f 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TRANSPORT) += xfrm6_mode_transport.o
 obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
 obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
+obj-$(CONFIG_INET6_XFRM_XLAT_ADDR) += xfrm6_xlat_addr.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
 obj-$(CONFIG_IPV6_ILA) += ila/
 obj-$(CONFIG_NETFILTER)+= netfilter/
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 30caa28..81b9079 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -390,11 +390,17 @@ int __init xfrm6_init(void)
if (ret)
goto out_state;
 
+   ret = xfrm6_xlat_addr_init();
+   if (ret)
+   goto out_protocol;
+
 #ifdef CONFIG_SYSCTL
register_pernet_subsys(&xfrm6_net_ops);
 #endif
 out:
return ret;
+out_protocol:
+   xfrm6_protocol_fini();
 out_state:
xfrm6_state_fini();
 out_policy:
@@ -407,6 +413,7 @@ void xfrm6_fini(void)
 #ifdef CONFIG_SYSCTL
unregister_pernet_subsys(&xfrm6_net_ops);
 #endif
+   xfrm6_xlat_addr_fini();
xfrm6_protocol_fini();
xfrm6_policy_fini();
xfrm6_state_fini();
diff --git a/net/ipv6/xfrm6_xlat_addr.c b/net/ipv6/xfrm6_xlat_addr.c
new file mode 100644
index 000..dd2199a
--- /dev/null
+++ b/net/ipv6/xfrm6_xlat_addr.c
@@ -0,0 +1,66 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct list_head xfrm6_xlat_addr_head __read_mostly;
+static DEFINE_SPINLOCK(xfrm6_xlat_addr_lock);
+
+void xfrm6_xlat_addr_add(struct xfrm6_xlat_addr *xla)
+{
+   spin_lock(&xfrm6_xlat_addr_lock);
+   list_add_rcu(&xla->list, &xfrm6_xlat_addr_head);
+   spin_unlock(&xfrm6_xlat_addr_lock);
+}
+EXPORT_SYMBOL(xfrm6_xlat_addr_add);
+
+void xfrm6_xlat_addr_del(struct xfrm6_xlat_addr *xla)
+{
+   struct xfrm6_xlat_addr *tmp;
+
+   spin_lock(&xfrm6_xlat_addr_lock);
+
+   list_for_each_entry_rcu(tmp, &xfrm6_xlat_addr_head, list) {
+   if (xla == tmp) {
+   list_del_rcu(&xla->list);
+   goto out;
+   }
+   }
+
+   pr_warn("xfrm6_xlat_addr_del: %p not found\n", xla);
+out:
+   spin_unlock(&xfrm6_xlat_addr_lock);
+}
+EXPORT_SYMBOL(xfrm6_xlat_addr_del);
+
+int xfrm6_xlat_addr(struct sk_buff *skb)
+{
+   struct xfrm6_xlat_addr *xla;
+   int err = 0;
+
+   rcu_read_lock();
+
+   list_for_each_entry_rcu(xla, &xfrm6_xlat_addr_head, list) {
+   err = xla->xlat(skb);
+   if (err < 0)
+   break;
+   }
+
+   rcu_read_unlock();
+
+   return err;
+}
+EXPORT_SYMBOL(xfrm6_xlat_addr);
+
+int __init xfrm6_xlat_addr_init(void)
+{
+   INIT_LIST_HEAD(&xfrm6_xlat_addr_head);
+
+
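
The registration scheme described in this message (modules add a callback,
and the data path runs each registered callback until one fails) can be
sketched in userspace like this. It is a model only: all model_* and
example_* names are hypothetical, struct model_pkt stands in for an sk_buff,
and the spinlock and RCU protection used in the patch are left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: just a destination value. */
struct model_pkt {
	unsigned long long daddr_hi;
};

struct model_xlat {
	int (*xlat)(struct model_pkt *pkt);
	struct model_xlat *next;
};

static struct model_xlat *model_head;

/* Register a translator at the head of the list (no locking here). */
static void model_xlat_add(struct model_xlat *x)
{
	x->next = model_head;
	model_head = x;
}

/* Unregister a translator; returns 0 on success, -1 if not found. */
static int model_xlat_del(struct model_xlat *x)
{
	struct model_xlat **pp;

	for (pp = &model_head; *pp; pp = &(*pp)->next) {
		if (*pp == x) {
			*pp = x->next;
			return 0;
		}
	}
	return -1;
}

/* Run every registered translator, stopping at the first error. */
static int model_xlat_addr(struct model_pkt *pkt)
{
	struct model_xlat *x;
	int err = 0;

	for (x = model_head; x; x = x->next) {
		err = x->xlat(pkt);
		if (err < 0)
			break;
	}
	return err;
}

/* Example translator: rewrite one made-up prefix to another. */
static int example_xlat(struct model_pkt *pkt)
{
	if (pkt->daddr_hi == 0x1111ULL)
		pkt->daddr_hi = 0x2222ULL;
	return 0;
}
```

With no translator registered the packet passes through untouched, which
matches the inline stub returning 0 when the config option is disabled.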

[PATCH net-next 1/6] ila: Create net/ipv6/ila directory

2015-09-29 Thread Tom Herbert

Create ila directory in preparation for supporting other hooks in the
kernel than LWT for doing ILA. This includes:
  - Moving ila.c to ila/ila_lwt.c
  - Splitting out some common functions into ila_common.c

Signed-off-by: Tom Herbert 
---
 net/ipv6/Makefile |   2 +-
 net/ipv6/ila.c| 229 --
 net/ipv6/ila/Makefile |   7 ++
 net/ipv6/ila/ila.h|  46 ++
 net/ipv6/ila/ila_common.c |  95 +++
 net/ipv6/ila/ila_lwt.c| 152 ++
 6 files changed, 301 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c

diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2c900c7..2fbd90b 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
 obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
-obj-$(CONFIG_IPV6_ILA) += ila.o
+obj-$(CONFIG_IPV6_ILA) += ila/
 obj-$(CONFIG_NETFILTER)+= netfilter/
 
 obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
diff --git a/net/ipv6/ila.c b/net/ipv6/ila.c
deleted file mode 100644
index 678d2df..000
--- a/net/ipv6/ila.c
+++ /dev/null
@@ -1,229 +0,0 @@
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct ila_params {
-   __be64 locator;
-   __be64 locator_match;
-   __wsum csum_diff;
-};
-
-static inline struct ila_params *ila_params_lwtunnel(
-   struct lwtunnel_state *lwstate)
-{
-   return (struct ila_params *)lwstate->data;
-}
-
-static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
-{
-   __be32 diff[] = {
-   ~from[0], ~from[1], to[0], to[1],
-   };
-
-   return csum_partial(diff, sizeof(diff), 0);
-}
-
-static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
-{
-   if (*(__be64 *)&ip6h->daddr == p->locator_match)
-   return p->csum_diff;
-   else
-   return compute_csum_diff8((__be32 *)&ip6h->daddr,
- (__be32 *)&p->locator);
-}
-
-static void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
-{
-   __wsum diff;
-   struct ipv6hdr *ip6h = ipv6_hdr(skb);
-   size_t nhoff = sizeof(struct ipv6hdr);
-
-   /* First update checksum */
-   switch (ip6h->nexthdr) {
-   case NEXTHDR_TCP:
-   if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr)))) {
-   struct tcphdr *th = (struct tcphdr *)
-   (skb_network_header(skb) + nhoff);
-
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(&th->check, skb,
-   diff, true);
-   }
-   break;
-   case NEXTHDR_UDP:
-   if (likely(pskb_may_pull(skb, nhoff + sizeof(struct udphdr)))) {
-   struct udphdr *uh = (struct udphdr *)
-   (skb_network_header(skb) + nhoff);
-
-   if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(&uh->check, skb,
-   diff, true);
-   if (!uh->check)
-   uh->check = CSUM_MANGLED_0;
-   }
-   }
-   break;
-   case NEXTHDR_ICMP:
-   if (likely(pskb_may_pull(skb,
-nhoff + sizeof(struct icmp6hdr)))) {
-   struct icmp6hdr *ih = (struct icmp6hdr *)
-   (skb_network_header(skb) + nhoff);
-
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(&ih->icmp6_cksum, skb,
-   diff, true);
-   }
-   break;
-   }
-
-   /* Now change destination address */
-   *(__be64 *)&ip6h->daddr = p->locator;
-}
-
-static int ila_output(struct sock *sk, struct sk_buff *skb)
-{
-   struct dst_entry *dst = skb_dst(skb);
-
-   if (skb->protocol != htons(ETH_P_IPV6))
-   goto drop;
-
-   update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
-
-   return dst->lwtstate->orig_output(sk, skb);
-
-drop:
-   kfree_skb(skb);
-   return -EINVAL;
-}
-
-static int ila_input(struct sk_buff *skb)
-{
-   struct ds

[PATCH net-next 2/6] rhashtable: add function to replace an element

2015-09-29 Thread Tom Herbert

Add the rhashtable_replace_fast function. This replaces one object in
the table with another atomically. The hashes of the new and old objects
must be equal.

Signed-off-by: Tom Herbert 
---
 include/linux/rhashtable.h | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..77deece 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -819,4 +819,86 @@ out:
return err;
 }
 
+/* Internal function, please use rhashtable_replace_fast() instead */
+static inline int __rhashtable_replace_fast(
+   struct rhashtable *ht, struct bucket_table *tbl,
+   struct rhash_head *obj_old, struct rhash_head *obj_new,
+   const struct rhashtable_params params)
+{
+   struct rhash_head __rcu **pprev;
+   struct rhash_head *he;
+   spinlock_t *lock;
+   unsigned int hash;
+   int err = -ENOENT;
+
+   /* Minimally, the old and new objects must have same hash
+* (which should mean identifiers are the same).
+*/
+   hash = rht_head_hashfn(ht, tbl, obj_old, params);
+   if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
+   return -EINVAL;
+
+   lock = rht_bucket_lock(tbl, hash);
+
+   spin_lock_bh(lock);
+
+   pprev = &tbl->buckets[hash];
+   rht_for_each(he, tbl, hash) {
+   if (he != obj_old) {
+   pprev = &he->next;
+   continue;
+   }
+
+   rcu_assign_pointer(obj_new->next, obj_old->next);
+   rcu_assign_pointer(*pprev, obj_new);
+   err = 0;
+   break;
+   }
+
+   spin_unlock_bh(lock);
+
+   return err;
+}
+
+/**
+ * rhashtable_replace_fast - replace an object in hash table
+ * @ht:hash table
+ * @obj_old:   pointer to hash head inside object being replaced
+ * @obj_new:   pointer to hash head inside object which is new
+ * @params:hash table parameters
+ *
+ * Replacing an object doesn't affect the number of elements in the hash table
+ * or bucket, so we don't need to worry about shrinking or expanding the
+ * table here.
+ *
+ * Returns zero on success, -ENOENT if the entry could not be found,
+ * -EINVAL if hash is not the same for the old and new objects.
+ */
+static inline int rhashtable_replace_fast(
+   struct rhashtable *ht, struct rhash_head *obj_old,
+   struct rhash_head *obj_new,
+   const struct rhashtable_params params)
+{
+   struct bucket_table *tbl;
+   int err;
+
+   rcu_read_lock();
+
+   tbl = rht_dereference_rcu(ht->tbl, ht);
+
+   /* Because we have already taken (and released) the bucket
+* lock in old_tbl, if we find that future_tbl is not yet
+* visible then that guarantees the entry to still be in
+* the old tbl if it exists.
+*/
+   while ((err = __rhashtable_replace_fast(ht, tbl, obj_old,
+   obj_new, params)) &&
+  (tbl = rht_dereference_rcu(tbl->future_tbl, ht)))
+   ;
+
+   rcu_read_unlock();
+
+   return err;
+}
+
 #endif /* _LINUX_RHASHTABLE_H */
-- 
2.4.6
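
The core of the replace operation (walk the bucket chain with a pointer to
the previous link, then swap the old object for the new one) can be sketched
in userspace as below. This is a simplified model, not the kernel code: the
model_* names, the fixed bucket count and the modulo hash are assumptions for
illustration, and the bucket lock and rcu_assign_pointer() ordering used in
the patch are omitted.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_BUCKETS 8

/* Hypothetical object: a key plus a chain link, loosely modelled on an
 * object embedding struct rhash_head.
 */
struct model_obj {
	unsigned int key;
	int value;
	struct model_obj *next;
};

struct model_table {
	struct model_obj *buckets[MODEL_BUCKETS];
};

static unsigned int model_hash(const struct model_obj *obj)
{
	return obj->key % MODEL_BUCKETS;
}

static void model_insert(struct model_table *t, struct model_obj *obj)
{
	unsigned int hash = model_hash(obj);

	obj->next = t->buckets[hash];
	t->buckets[hash] = obj;
}

/* Replace obj_old with obj_new in place. Returns 0 on success,
 * -EINVAL if the two hashes differ, -ENOENT if obj_old is not linked.
 */
static int model_replace(struct model_table *t, struct model_obj *obj_old,
			 struct model_obj *obj_new)
{
	unsigned int hash = model_hash(obj_old);
	struct model_obj **pprev;

	if (hash != model_hash(obj_new))
		return -EINVAL;

	for (pprev = &t->buckets[hash]; *pprev; pprev = &(*pprev)->next) {
		if (*pprev != obj_old)
			continue;
		obj_new->next = obj_old->next;  /* inherit the chain tail */
		*pprev = obj_new;               /* single pointer swap */
		return 0;
	}
	return -ENOENT;
}
```

Because the element count of the bucket does not change, the model needs no
resize logic, which is the same observation the kernel-doc comment makes.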


[PATCH net-next 0/6] ila: Optimization to preserve value of early demux

2015-09-29 Thread Tom Herbert

In the current implementation of ILA, LWT is used to perform
translation on both the input and output paths. This is functional,
however there is a big performance hit in the receive path. Early
demux occurs before the routing lookup (a hit actually obviates the
route lookup). Therefore the stack currently performs early
demux before translation so that a local connection with ILA
addresses is never matched. Note that this issue is not just
with ILA, but pretty much any translated or encapsulated packet
handled by LWT would miss the opportunity for early demux. Solving
the general problem seems non-trivial since we would need to move
the route lookup before early demux, thereby negating its value.

This patch set addresses the issue for ILA by adding a fast locator
lookup that occurs before early demux. This is done by creating an
XFRM hook to perform address translation early in the receive path.
For the backend we implement an rhashtable that contains identifier
to locator mappings. The table also allows more specific matches
that include original locator and interface.

This patch set:
 - Add an rhashtable function to atomically replace an element.
   This is useful to implement sub-trees from a table entry
   without needing to use a special anchor structure as the
   table entry.
 - Add a start callback for starting a netlink dump.
 - Creates an ila directory under net/ipv6 and moves ila.c to it.
   ila.c is split into ila_common.c and ila_lwt.c.
 - Implement a table to do identifier->locator mapping. This is
   an rhashtable.
 - Configuration for the table with netlink.
 - Add XFRM xlat_addr facility. This includes a callback registration
   function and hook to call registered callbacks.
 - Call xfrm6_xlat_addr from ipv6_rcv before NF_HOOK and routing.

Testing:
   Running 200 netperf TCP_RR streams

No ILA, baseline
   85.72% CPU utilization
   1861945 tps
   93/163/330 50/90/99% latencies

ILA before fix (LWT on both input and output)
   83.47% CPU utilization
   16583186 tps (-11% from baseline)
   107/183/338 50/90/99% latencies

ILA after fix (hook for input)
   84.97% CPU utilization
   1833948 tps (-1.5% from baseline)
   95/164/331 50/90/99% latencies

Hacked DNPT to do ILA
   80.94% CPU utilization
   1683315 tps (-10% from baseline)
   104/179/350 50/90/99% latencies

Tom Herbert (6):
  ila: Create net/ipv6/ila directory
  rhashtable: add function to replace an element
  netlink: add a start callback for starting a netlink dump
  xfrm: Add xfrm6 address translation function
  ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  ila: Add support for xfrm6_xlat_addr

 include/linux/netlink.h|   2 +
 include/linux/rhashtable.h |  82 ++
 include/net/genetlink.h|   2 +
 include/net/xfrm.h |  25 ++
 include/uapi/linux/ila.h   |  22 ++
 net/ipv6/Kconfig   |   5 +
 net/ipv6/Makefile  |   3 +-
 net/ipv6/ila.c | 229 
 net/ipv6/ila/Makefile  |   7 +
 net/ipv6/ila/ila.h |  48 
 net/ipv6/ila/ila_common.c  | 103 
 net/ipv6/ila/ila_lwt.c | 152 +++
 net/ipv6/ila/ila_xlat.c| 642 +
 net/ipv6/ip6_input.c   |   3 +
 net/ipv6/xfrm6_policy.c|   7 +
 net/ipv6/xfrm6_xlat_addr.c |  66 +
 net/netlink/af_netlink.c   |   4 +
 net/netlink/genetlink.c|  16 ++
 18 files changed, 1188 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_xlat.c
 create mode 100644 net/ipv6/xfrm6_xlat_addr.c

-- 
2.4.6
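
The percentages in the test results above are relative to the no-ILA baseline
of 1861945 tps. As a quick check of that arithmetic, here is a minimal sketch;
the helper name pct_vs_baseline is hypothetical and not part of the patch set.

```c
/* Percentage change of a measurement relative to a baseline.
 * A negative result means the measurement is below the baseline.
 */
static double pct_vs_baseline(double baseline, double measured)
{
	return (measured - baseline) / baseline * 100.0;
}
```

For the "ILA after fix" row, pct_vs_baseline(1861945, 1833948) is about -1.5.
For the "Hacked DNPT" row, pct_vs_baseline(1861945, 1683315) is about -9.6,
which the message rounds to -10%.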


Re: [net-next PATCH v2] netpoll: Drop budget parameter from NAPI polling call hierarchy

2015-09-29 Thread David Miller

From: Alexander Duyck 
Date: Mon, 28 Sep 2015 09:16:17 -0700

> For some reason we were carrying the budget value around between the
> various calls to napi->poll.  If for example one of the drivers called had
> a bug in which it returned a non-zero value for work this could result in
> the budget value becoming negative.
> 
> Rather than carry around a value of budget that is 0 or less we can instead
> just loop through and pass 0 to each napi->poll call.  If any driver
> returns a value for work done that is non-zero then we can report that
> driver and continue rather than allowing a bad actor to make the budget
> value negative and pass that negative value to napi->poll.
> 
> Note, the only actual change here is that instead of letting budget become
> negative we are keeping it at 0 regardless of the value returned for work
> since it should not be possible for the polling routine to do any actual
> work with a budget of 0.  So if the polling routine returns a non-0 value
> we are just reporting it and continuing with a budget of 0 rather than
> letting that work value be subtracted from the budget of 0.
> 
> Signed-off-by: Alexander Duyck 

Applied, thanks.
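
The behaviour described in the quoted commit message (every poll routine is
called with a budget of 0, and a routine that still reports work is flagged
rather than allowed to drive the budget negative) can be sketched in userspace
as follows. All names here (struct model_napi, model_poll_all, good_poll,
buggy_poll) are hypothetical; the real code reports the offending driver,
which this model reduces to a counter.

```c
#include <stddef.h>

/* Hypothetical stand-in for a NAPI instance with its poll routine. */
struct model_napi {
	const char *name;
	int (*poll)(struct model_napi *napi, int budget);
};

/* Poll every instance with a budget of 0 and count the ones that
 * nevertheless claim to have done work. The budget is never decremented,
 * so a misbehaving driver cannot push it below zero.
 */
static int model_poll_all(struct model_napi *napis, size_t n)
{
	int offenders = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int work = napis[i].poll(&napis[i], 0);

		if (work != 0)
			offenders++;  /* stand-in for reporting the driver */
	}
	return offenders;
}

/* A well-behaved poll routine: no work with a zero budget. */
static int good_poll(struct model_napi *napi, int budget)
{
	(void)napi;
	return budget > 0 ? 1 : 0;
}

/* A buggy poll routine that ignores the budget. */
static int buggy_poll(struct model_napi *napi, int budget)
{
	(void)napi;
	(void)budget;
	return 5;
}
```

Note that the instance polled after the buggy one still receives a budget of
exactly 0, which is the property the patch is meant to guarantee.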

Re: [PATCH] net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set

2015-09-29 Thread David Miller

From: David Ahern 
Date: Mon, 28 Sep 2015 10:12:13 -0700

> Wolfgang reported that IPv6 stack is ignoring oif in output route lookups:
 ...
> The stack does consider the oif but a mismatch in rt6_device_match is not
> considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags.
> 
> Cc: Wolfgang Nothdurft 
> Signed-off-by: David Ahern 

Applied, thank you.

Re: RESEND: [PATCH v3 net-next] sky2: use random address if EEPROM is bad

2015-09-29 Thread David Miller

From: Liviu Dudau 
Date: Mon, 28 Sep 2015 17:51:51 +0100

> On some embedded systems the EEPROM does not contain a valid MAC address.
> In that case it is better to fallback to a generated mac address and
> let init scripts fix the value later.
> 
> Reported-by: Liviu Dudau 
> Signed-off-by: Stephen Hemminger 
> [Changed handcoded setup to use eth_hw_addr_random() and to save new address 
> into HW]
> Signed-off-by: Liviu Dudau 

Applied, thanks.

Re: [PATCH v2 1/1] net sysfs: Print link speed as signed integer

2015-09-29 Thread David Miller

From: Alexander Stein 
Date: Mon, 28 Sep 2015 15:05:33 +0200

> Otherwise 4294967295 (MBit/s) (-1) will be printed when there is no link.
> Documentation/ABI/testing/sysfs-class-net does not state if this shall be
> signed or unsigned.
> Also remove the now unused variable fmt_udec.
> 
> Signed-off-by: Alexander Stein 

Applied, thanks.
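
The symptom in the quoted description is easy to reproduce in userspace. The
sketch below assumes a 32-bit unsigned int and uses a local constant,
MODEL_SPEED_UNKNOWN, to mirror the -1 "no link" value mentioned above; the
format_speed_* helpers are hypothetical and are not the sysfs code itself.

```c
#include <stdio.h>

/* Local stand-in for the "unknown speed" value of -1. */
#define MODEL_SPEED_UNKNOWN (-1)

/* Old behaviour: the value is printed with an unsigned format. */
static int format_speed_unsigned(char *buf, size_t len, unsigned int speed)
{
	return snprintf(buf, len, "%u", speed);
}

/* New behaviour: the value is printed with a signed format. */
static int format_speed_signed(char *buf, size_t len, int speed)
{
	return snprintf(buf, len, "%d", speed);
}
```

Passing (unsigned int)MODEL_SPEED_UNKNOWN to the unsigned helper yields the
string 4294967295, while the signed helper yields -1 for the same value.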

Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type

2015-09-29 Thread Florian Westphal

Daniel Mack  wrote:
> Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the
> input demux is complete and the final destination socket (if any)
> has been determined.
> 
> This helps filtering packets based on information stored in the
> destination socket, such as cgroup controller supplied net class IDs.

This still seems like the 'x y' problem ("want to do X, think Y is
correct solution; ask about Y, but that's a strange thing to do").

There is nothing that this offers over INPUT *except* that sk is
available.  But there is zero benefit as far as I am concerned --
why would you want to do any meaningful filtering based on the sk at
that point...?

Drop?  Makes no sense, else application would not be running in the first
place.
Allowing response packets?  Can already do that with conntrack.

So the only 'benefit' is that netcls id is available; but
a) why is that even needed and
b) is such a huge sledgehammer just for net cgroup accounting
worth it?

Another question is what other strange things come up once we would
open this door.

> listening on a specific task, the resulting error code that is sent
> back to the remote peer can't be controlled with rules in
> NF_INET_LOCAL_SOCKET_IN chains.

Right, and that makes this even weirder.

For deterministic ingress filtering you can only rely on what
is contained in the packet.

Re: [PATCH] bna: fix error handling

2015-09-29 Thread David Miller

From: Andrzej Hajda 
Date: Mon, 28 Sep 2015 10:49:48 +0200

> Several functions can return negative value in case of error,
> so their return type should be fixed as well as type of variables
> to which this value is assigned.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
> 
> Signed-off-by: Andrzej Hajda 

Applied, thanks.
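
[Editor's illustration, not part of the patch] The class of bug described above can be sketched in a few lines of userspace C: once a negative errno is stored in an unsigned variable it becomes a large positive number, so a later "is this an error?" check does not fire. All names here (fake_alloc_slot, is_negative and the two wrappers) are hypothetical and are not taken from the bna driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: returns a negative errno on failure, a slot index otherwise. */
static int fake_alloc_slot(bool fail)
{
	return fail ? -ENOMEM : 3;
}

/* Error check as a caller would write it, on a type wide enough for either input. */
static bool is_negative(long long v)
{
	return v < 0;
}

/* Buggy pattern: the result is kept in an unsigned variable. */
static bool error_seen_unsigned(bool fail)
{
	unsigned int ret = (unsigned int)fake_alloc_slot(fail);

	/* -ENOMEM has wrapped to a large positive value, so this looks like success. */
	return is_negative(ret);
}

/* Fixed pattern: the variable type matches the signed return type. */
static bool error_seen_signed(bool fail)
{
	int ret = fake_alloc_slot(fail);

	return is_negative(ret);
}
```

With fail set to true, only the signed variant reports the error; the unsigned variant silently treats the failure as a valid value.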

Re: [PATCH] netpoll: Drop budget parameter from NAPI polling call hierarchy

2015-09-29 Thread David Miller

From: Alexander Duyck 
Date: Sun, 27 Sep 2015 15:58:56 -0700

> On 09/26/2015 10:36 PM, David Miller wrote:
>> From: Alexander Duyck 
>> Date: Tue, 22 Sep 2015 14:56:08 -0700
>>
>>> Rather than carry around a value of budget that is 0 or less we can
>>> instead
>>> just loop through and pass 0 to each napi->poll call.  If any driver
>>> returns a value for work done that is non-zero then we can report that
>>> driver and continue rather than allowing a bad actor to make the
>>> budget
>>> value negative and pass that negative value to napi->poll.
>> Unfortunately we have drivers that won't do any TX work if the budget
>> is zero.
> 
> Well that is what we are doing right now.  The fact is the call starts
> out with a budget of 0, and it is somewhat hidden from the call since
> the budget is assigned a value of 0 in netpoll_poll_dev. That is one
> of the things I was wanting to address because that is clear as mud
> from looking at poll_one_napi.  Based on the code you would assume
> budget starts out as a non-zero value and it doesn't.

I see, thanks for explaining.
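
[Editor's illustration, not part of the thread] The approach described in the quoted commit message (pass a budget of 0 to every poll routine and report any driver that claims work anyway) can be modelled in userspace roughly as follows. This is a simplified sketch, not the netpoll code: struct napi_sketch and every function below are hypothetical stand-ins, and the real poll_one_napi has additional state checks that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a NAPI instance: a name and a poll callback only. */
struct napi_sketch {
	const char *name;
	int (*poll)(struct napi_sketch *napi, int budget);
};

/* Well-behaved driver: with a zero budget it reports no work done. */
static int good_poll(struct napi_sketch *napi, int budget)
{
	(void)napi;
	return budget > 0 ? budget : 0;
}

/* Misbehaving driver: reports work even though the budget was zero. */
static int bad_poll(struct napi_sketch *napi, int budget)
{
	(void)napi;
	(void)budget;
	return 1;
}

/* Poll every instance with budget 0; report and count those returning non-zero. */
static size_t poll_all_zero_budget(struct napi_sketch *list, size_t n)
{
	size_t offenders = 0;

	for (size_t i = 0; i < n; i++) {
		int work = list[i].poll(&list[i], 0);

		if (work != 0) {
			fprintf(stderr, "%s exceeded zero budget (work=%d)\n",
				list[i].name, work);
			offenders++;
		}
	}
	return offenders;
}
```

Because the budget is a constant 0 at each call site, no driver's return value can push a running budget negative and leak into the next call, which is the situation the commit message wants to rule out.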

Re: [PATCH v4 0/2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-09-29 Thread David Miller

From: Aaron Conole 
Date: Sat, 26 Sep 2015 18:50:41 -0400

> This patch set implements a bugfix for kernel.org bugzilla #12323, allowing
> MSG_PEEK to return all queued data on the unix domain socket, not just the
> data contained in a single SKB. 
> 
> This is the v3 version of this patch, which includes a suggested modification
> by Eric Dumazet to convert the unix_sk() conversion macro to a static inline
> function. These patches are independent and can be applied separately.
> 
> This set was tested over a 24-hour period, utilizing a loop continually 
> executing the bugzilla issue attached python code. It was instrumented with
> a pr_err_once() ([   13.798683] unix: went there at least one time).
> 
> v2->v3:
>  - Added Eric Dumazet's suggestion for #define to static inline
>  - Fixed an issue calling unix_state_lock() with an invalid argument
> 
> v3->v4:
>  - Eliminated an XXX comment
>  - Changed from goto unlock to explicit unix_state_unlock() and break

Series applied, thanks.
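
[Editor's illustration, not part of the series] A minimal userspace probe for the behaviour discussed above might look like the sketch below. It is not the Python reproducer attached to the bugzilla entry; the function name peek_two_writes and the payload strings are assumptions made for this example. It queues two separate writes on an AF_UNIX stream socket, peeks, and then drains the socket with an ordinary recv().

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the byte count reported by recv(MSG_PEEK), or -1 on error.
 * *drained receives the byte count of the ordinary recv() that follows.
 */
static ssize_t peek_two_writes(char *peeked, size_t len, ssize_t *drained)
{
	int sv[2];
	char scratch[64];
	ssize_t n = -1;

	*drained = -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Two separate sends, so the data is queued as two writes. */
	if (send(sv[0], "hello", 5, 0) == 5 && send(sv[0], "world", 5, 0) == 5) {
		n = recv(sv[1], peeked, len, MSG_PEEK);
		/* Peeking must not consume data: a normal read still sees it. */
		*drained = recv(sv[1], scratch, sizeof(scratch), 0);
	}

	close(sv[0]);
	close(sv[1]);
	return n;
}
```

How many bytes the peek reports depends on the running kernel: per the cover letter, a kernel without this series returns only the data from a single SKB, while a kernel with it should return all queued data that fits in the buffer.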
