[PATCH net-next 1/4] phylib: Add phy_set_max_speed helper
Add a helper to allow ethernet drivers to limit the speed of a phy (that
they are attached to).

This mainly involves factoring out the business-end of
of_set_phy_supported() and exporting a new symbol.

This code seems to be open coded in several places, in several different
variants.

It is envisaged that this will be used in situations where setting the
"max-speed" property in DT is not appropriate, e.g. because the maximum
speed is not a property of the phy hardware.

Signed-off-by: Simon Horman

---
v2
* First post

v3
* As suggested by Florian Fainelli
  - Do not check for !IS_ENABLED(CONFIG_OF_MDIO) in __set_phy_supported.
    This is already done in of_set_phy_supported() and is not relevant
    to phy_set_max_speed()
  - Return -ENOTSUPP if 'max_speed' is an unknown value
* As suggested by Sergei Shtylyov
  - White-space and comment enhancements.

v4
* No change
---
 drivers/net/phy/phy_device.c | 59 ++--
 include/linux/phy.h          |  1 +
 2 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index f761288abe66..383389146099 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1239,6 +1239,44 @@ static int gen10g_resume(struct phy_device *phydev)
 	return 0;
 }
 
+static int __set_phy_supported(struct phy_device *phydev, u32 max_speed)
+{
+	/* The default values for phydev->supported are provided by the PHY
+	 * driver "features" member, we want to reset to sane defaults first
+	 * before supporting higher speeds.
+	 */
+	phydev->supported &= PHY_DEFAULT_FEATURES;
+
+	switch (max_speed) {
+	default:
+		return -ENOTSUPP;
+	case SPEED_1000:
+		phydev->supported |= PHY_1000BT_FEATURES;
+		/* fall through */
+	case SPEED_100:
+		phydev->supported |= PHY_100BT_FEATURES;
+		/* fall through */
+	case SPEED_10:
+		phydev->supported |= PHY_10BT_FEATURES;
+	}
+
+	return 0;
+}
+
+int phy_set_max_speed(struct phy_device *phydev, u32 max_speed)
+{
+	int err;
+
+	err = __set_phy_supported(phydev, max_speed);
+	if (err)
+		return err;
+
+	phydev->advertising = phydev->supported;
+
+	return 0;
+}
+EXPORT_SYMBOL(phy_set_max_speed);
+
 static void of_set_phy_supported(struct phy_device *phydev)
 {
 	struct device_node *node = phydev->dev.of_node;
@@ -1250,25 +1288,8 @@ static void of_set_phy_supported(struct phy_device *phydev)
 	if (!node)
 		return;
 
-	if (!of_property_read_u32(node, "max-speed", &max_speed)) {
-		/* The default values for phydev->supported are provided by the PHY
-		 * driver "features" member, we want to reset to sane defaults fist
-		 * before supporting higher speeds.
-		 */
-		phydev->supported &= PHY_DEFAULT_FEATURES;
-
-		switch (max_speed) {
-		default:
-			return;
-
-		case SPEED_1000:
-			phydev->supported |= PHY_1000BT_FEATURES;
-		case SPEED_100:
-			phydev->supported |= PHY_100BT_FEATURES;
-		case SPEED_10:
-			phydev->supported |= PHY_10BT_FEATURES;
-		}
-	}
+	if (!of_property_read_u32(node, "max-speed", &max_speed))
+		__set_phy_supported(phydev, max_speed);
 }
 
 /**
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 4a4e3a092337..4c477e6ece33 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -798,6 +798,7 @@ int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd);
 int phy_start_interrupts(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
 void phy_device_free(struct phy_device *phydev);
+int phy_set_max_speed(struct phy_device *phydev, u32 max_speed);
 
 int phy_register_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask,
 		       int (*run)(struct phy_device *));
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
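For anyone who wants to poke at the fall-through logic of
__set_phy_supported() without a kernel tree, here is a rough userspace
sketch. Everything prefixed mock_/MOCK_ is hypothetical: the bit values
are made up and are not the kernel's PHY_*_FEATURES masks, and
-EOPNOTSUPP stands in for the kernel-internal -ENOTSUPP, which userspace
errno.h does not provide.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the kernel's real masks differ. */
#define MOCK_10BT_FEATURES    0x1u
#define MOCK_100BT_FEATURES   0x2u
#define MOCK_1000BT_FEATURES  0x4u
#define MOCK_DEFAULT_FEATURES 0x8u	/* stands in for autoneg, pause, ... */

#define MOCK_SPEED_10   10u
#define MOCK_SPEED_100  100u
#define MOCK_SPEED_1000 1000u

/* Hypothetical stand-in for the two struct phy_device fields used here. */
struct mock_phy {
	uint32_t supported;
	uint32_t advertising;
};

int mock_set_max_speed(struct mock_phy *phy, uint32_t max_speed)
{
	assert(phy != NULL);

	/* Reset to the defaults first, as the patch does. */
	phy->supported &= MOCK_DEFAULT_FEATURES;

	switch (max_speed) {
	default:
		/* Unknown speed: report an error, as in the v3 change. */
		return -EOPNOTSUPP;
	case MOCK_SPEED_1000:
		phy->supported |= MOCK_1000BT_FEATURES;
		/* fall through */
	case MOCK_SPEED_100:
		phy->supported |= MOCK_100BT_FEATURES;
		/* fall through */
	case MOCK_SPEED_10:
		phy->supported |= MOCK_10BT_FEATURES;
	}

	/* Only advertise what is still supported. */
	phy->advertising = phy->supported;
	return 0;
}
```

Because each case falls through to the slower ones, capping at 100Mbit/s
leaves both the 100 and 10 bits set while the 1000 bit stays cleared.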
[PATCH net-next 4/4] ravb: Add support for r8a7795 SoC
From: Kazuya Mizuguchi This patch supports the r8a7795 SoC by: - Using two interrupts + One for E-MAC + One for everything else + Both can be handled by the existing common interrupt handler, which affords a simpler update to support the new SoC. In future some consideration may be given to implementing multiple interrupt handlers - Limiting the phy speed to 100Mbit/s for the new SoC; at this time it is not clear how this restriction may be lifted but I hope it will be possible as more information comes to light Signed-off-by: Kazuya Mizuguchi [horms: reworked] Signed-off-by: Simon Horman --- v0 [Kazuya Mizuguchi] v1 [Simon Horman] * Updated patch subject v2 [Simon Horman] * Reworked based on extensive feedback from Geert Uytterhoeven and Sergei Shtylyov. * Broke binding update out into separate patch v3 [Simon Horman] * Check new return value of phy_set_max_speed() v4 * No change --- drivers/net/ethernet/renesas/ravb.h | 7 drivers/net/ethernet/renesas/ravb_main.c | 63 2 files changed, 62 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h index a157ff6a..0623fff932e4 100644 --- a/drivers/net/ethernet/renesas/ravb.h +++ b/drivers/net/ethernet/renesas/ravb.h @@ -766,6 +766,11 @@ struct ravb_ptp { struct ravb_ptp_perout perout[N_PER_OUT]; }; +enum ravb_chip_id { + RCAR_GEN2, + RCAR_GEN3, +}; + struct ravb_private { struct net_device *ndev; struct platform_device *pdev; @@ -806,6 +811,8 @@ struct ravb_private { int msg_enable; int speed; int duplex; + int emac_irq; + enum ravb_chip_id chip_id; unsigned no_avb_link:1; unsigned avb_link_active_low:1; diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 4ca093d033f8..8cc5ec5ed19a 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -889,6 +889,22 @@ static int ravb_phy_init(struct net_device *ndev) return -ENOENT; } + /* This driver only support 10/100Mbit 
speeds on Gen3 +* at this time. +*/ + if (priv->chip_id == RCAR_GEN3) { + int err; + + err = phy_set_max_speed(phydev, SPEED_100); + if (err) { + netdev_err(ndev, "failed to limit PHY to 100Mbit/s\n"); + phy_disconnect(phydev); + return err; + } + + netdev_info(ndev, "limited PHY to 100Mbit/s\n"); + } + netdev_info(ndev, "attached PHY %d (IRQ %d) to driver %s\n", phydev->addr, phydev->irq, phydev->drv->name); @@ -1197,6 +1213,15 @@ static int ravb_open(struct net_device *ndev) goto out_napi_off; } + if (priv->chip_id == RCAR_GEN3) { + error = request_irq(priv->emac_irq, ravb_interrupt, + IRQF_SHARED, ndev->name, ndev); + if (error) { + netdev_err(ndev, "cannot request IRQ\n"); + goto out_free_irq; + } + } + /* Device init */ error = ravb_dmac_init(ndev); if (error) @@ -1220,6 +1245,7 @@ out_ptp_stop: ravb_ptp_stop(ndev); out_free_irq: free_irq(ndev->irq, ndev); + free_irq(priv->emac_irq, ndev); out_napi_off: napi_disable(&priv->napi[RAVB_NC]); napi_disable(&priv->napi[RAVB_BE]); @@ -1625,10 +1651,20 @@ static int ravb_mdio_release(struct ravb_private *priv) return 0; } +static const struct of_device_id ravb_match_table[] = { + { .compatible = "renesas,etheravb-r8a7790", .data = (void *)RCAR_GEN2 }, + { .compatible = "renesas,etheravb-r8a7794", .data = (void *)RCAR_GEN2 }, + { .compatible = "renesas,etheravb-r8a7795", .data = (void *)RCAR_GEN3 }, + { } +}; +MODULE_DEVICE_TABLE(of, ravb_match_table); + static int ravb_probe(struct platform_device *pdev) { struct device_node *np = pdev->dev.of_node; + const struct of_device_id *match; struct ravb_private *priv; + enum ravb_chip_id chip_id; struct net_device *ndev; int error, irq, q; struct resource *res; @@ -1657,7 +1693,14 @@ static int ravb_probe(struct platform_device *pdev) /* The Ether-specific entries in the device structure. 
*/ ndev->base_addr = res->start; ndev->dma = -1; - irq = platform_get_irq(pdev, 0); + + match = of_match_device(of_match_ptr(ravb_match_table), &pdev->dev); + chip_id = (enum ravb_chip_id)match->data; + + if (chip_id == RCAR_GEN3) + irq = platform_get_irq_byname(pdev, "ch22"); + else + irq = platform_get_irq(pdev, 0); if (irq < 0) { error = irq; goto o
[PATCH net-next 0/4] ravb: Add support for r8a7795 SoC
Dave,

please consider this series for net-next. It enhances the ravb driver to
support the r8a7795 SoC.

Changes:
* Dropped RFC prefix
* Details in changelog of individual patches

Base:
* net-next/master

Availability:

To aid review of this in conjunction with other EtherAVB changes the
following branches are available in my renesas tree on kernel.org.

* me/r8a7795-ravb-driver-v4: this series
* me/r8a7795-ravb-pfc-v2: r8a7795 sh-pfc update for EthernetAVB
* me/r8a7795-ravb-integration-v4: enable EthernetAVB on r8a7795
* me/r8a7795-ravb-driver-and-integration-v4.runtime:
  the above three branches with their runtime dependencies

Kazuya Mizuguchi (3):
  ravb: Provide dev parameter to DMA API
  ravb: Document binding for r8a7795 SoC
  ravb: Add support for r8a7795 SoC

Simon Horman (1):
  phylib: Add phy_set_max_speed helper

 .../devicetree/bindings/net/renesas,ravb.txt | 69 --
 drivers/net/ethernet/renesas/ravb.h          |   7 ++
 drivers/net/ethernet/renesas/ravb_main.c     | 101 +++--
 drivers/net/phy/phy_device.c                 |  59
 include/linux/phy.h                          |   1 +
 5 files changed, 184 insertions(+), 53 deletions(-)

-- 
2.1.4
[PATCH net-next 2/4] ravb: Provide dev parameter to DMA API
From: Kazuya Mizuguchi This patch is in preparation for using this driver on arm64 where the implementation of __dma_alloc_coherent fails if a device parameter is not provided. Signed-off-by: Kazuya Mizuguchi Signed-off-by: Yoshihiro Shimoda Signed-off-by: Masaru Nagai [horms: squashed into a single patch] Signed-off-by: Simon Horman --- * [horms] I have only tested this on arm64 using r8a7795/salvator-x. v0 [Kazuya Mizuguchi, Yoshihiro Shimoda, Masaru Nagai] v1 [Simon Horman] * Squashed into a single patch v2 [Simon Horman] * No change v4 * No change --- drivers/net/ethernet/renesas/ravb_main.c | 38 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 450899e9cea2..4ca093d033f8 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -201,7 +201,7 @@ static void ravb_ring_free(struct net_device *ndev, int q) if (priv->rx_ring[q]) { ring_size = sizeof(struct ravb_ex_rx_desc) * (priv->num_rx_ring[q] + 1); - dma_free_coherent(NULL, ring_size, priv->rx_ring[q], + dma_free_coherent(ndev->dev.parent, ring_size, priv->rx_ring[q], priv->rx_desc_dma[q]); priv->rx_ring[q] = NULL; } @@ -209,7 +209,7 @@ static void ravb_ring_free(struct net_device *ndev, int q) if (priv->tx_ring[q]) { ring_size = sizeof(struct ravb_tx_desc) * (priv->num_tx_ring[q] * NUM_TX_DESC + 1); - dma_free_coherent(NULL, ring_size, priv->tx_ring[q], + dma_free_coherent(ndev->dev.parent, ring_size, priv->tx_ring[q], priv->tx_desc_dma[q]); priv->tx_ring[q] = NULL; } @@ -240,13 +240,13 @@ static void ravb_ring_format(struct net_device *ndev, int q) rx_desc = &priv->rx_ring[q][i]; /* The size of the buffer should be on 16-byte boundary. 
*/ rx_desc->ds_cc = cpu_to_le16(ALIGN(PKT_BUF_SZ, 16)); - dma_addr = dma_map_single(&ndev->dev, priv->rx_skb[q][i]->data, + dma_addr = dma_map_single(ndev->dev.parent, priv->rx_skb[q][i]->data, ALIGN(PKT_BUF_SZ, 16), DMA_FROM_DEVICE); /* We just set the data size to 0 for a failed mapping which * should prevent DMA from happening... */ - if (dma_mapping_error(&ndev->dev, dma_addr)) + if (dma_mapping_error(ndev->dev.parent, dma_addr)) rx_desc->ds_cc = cpu_to_le16(0); rx_desc->dptr = cpu_to_le32(dma_addr); rx_desc->die_dt = DT_FEMPTY; @@ -309,7 +309,7 @@ static int ravb_ring_init(struct net_device *ndev, int q) /* Allocate all RX descriptors. */ ring_size = sizeof(struct ravb_ex_rx_desc) * (priv->num_rx_ring[q] + 1); - priv->rx_ring[q] = dma_alloc_coherent(NULL, ring_size, + priv->rx_ring[q] = dma_alloc_coherent(ndev->dev.parent, ring_size, &priv->rx_desc_dma[q], GFP_KERNEL); if (!priv->rx_ring[q]) @@ -320,7 +320,7 @@ static int ravb_ring_init(struct net_device *ndev, int q) /* Allocate all TX descriptors. */ ring_size = sizeof(struct ravb_tx_desc) * (priv->num_tx_ring[q] * NUM_TX_DESC + 1); - priv->tx_ring[q] = dma_alloc_coherent(NULL, ring_size, + priv->tx_ring[q] = dma_alloc_coherent(ndev->dev.parent, ring_size, &priv->tx_desc_dma[q], GFP_KERNEL); if (!priv->tx_ring[q]) @@ -443,7 +443,7 @@ static int ravb_tx_free(struct net_device *ndev, int q) size = le16_to_cpu(desc->ds_tagl) & TX_DS; /* Free the original skb. */ if (priv->tx_skb[q][entry / NUM_TX_DESC]) { - dma_unmap_single(&ndev->dev, le32_to_cpu(desc->dptr), + dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr), size, DMA_TO_DEVICE); /* Last packet descriptor? */ if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) { @@ -546,7 +546,7 @@ static bool ravb_rx(struct net_device *ndev, int *quota, int q) skb = priv->rx_skb[q][entry]; priv->rx_skb[q][entry] = NULL; - dma_unmap_single(&ndev->dev, le32_to_cpu(desc->dptr), + dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr), ALIGN(PKT_BUF_SZ, 16),
[PATCH net-next 3/4] ravb: Document binding for r8a7795 SoC
From: Kazuya Mizuguchi This patch updates the ravb binding to support the r8a7795 SoC by: - Adding a compat string for the new hardware - Adding 25 named interrupts to binding for the new SoC; older SoCs continue to use a single multiplexed interrupt The example is also updated to reflect the r8a7795 as this is the more complex case. Based on work by Kazuya Mizuguchi and others. Signed-off-by: Simon Horman Acked-by: Geert Uytterhoeven --- v2 * First post; broken out of a driver update patch * As discussed with Geert Uytterhoeven and Sergei Shtylyov - Binding: Make all interrupts mandatory as named-interrupts of the form ch%u v3 * A suggested by Geert Uytterhoeven - Reword description of interrupts and interrupt-names to make things clearer. It is now based to some extent on spi-rspi.txt and renesas,usb-dmac.txt. * As suggested by Sergei Shtylyov - Drop phy-reset-gpio from example * Added power-domains to example v4 * A suggested by Geert Uytterhoeven - grammar fix for interrupt-names description * Add ack --- .../devicetree/bindings/net/renesas,ravb.txt | 69 +++--- 1 file changed, 62 insertions(+), 7 deletions(-) diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt index 1fd8831437bf..b486f3f5f6a3 100644 --- a/Documentation/devicetree/bindings/net/renesas,ravb.txt +++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt @@ -6,8 +6,12 @@ interface contains. Required properties: - compatible: "renesas,etheravb-r8a7790" if the device is a part of R8A7790 SoC. "renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC. + "renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC. - reg: offset and length of (1) the register block and (2) the stream buffer. -- interrupts: interrupt specifier for the sole interrupt. +- interrupts: A list of interrupt-specifiers, one for each entry in + interrupt-names. 
+ If interrupt-names is not present, an interrupt specifier + for a single muxed interrupt. - phy-mode: see ethernet.txt file in the same directory. - phy-handle: see ethernet.txt file in the same directory. - #address-cells: number of address cells for the MDIO bus, must be equal to 1. @@ -18,6 +22,12 @@ Required properties: Optional properties: - interrupt-parent: the phandle for the interrupt controller that services interrupts for this device. +- interrupt-names: A list of interrupt names. + For the R8A7795 SoC this property is mandatory; + it should include one entry per channel, named "ch%u", + where %u is the channel number ranging from 0 to 24. + For other SoCs this property is optional; if present + it should contain "mux" for a single muxed interrupt. - pinctrl-names: pin configuration state name ("default"). - renesas,no-ether-link: boolean, specify when a board does not provide a proper AVB_LINK signal. @@ -27,13 +37,46 @@ Optional properties: Example: ethernet@e680 { - compatible = "renesas,etheravb-r8a7790"; - reg = <0 0xe680 0 0x800>, <0 0xee0e8000 0 0x4000>; + compatible = "renesas,etheravb-r8a7795"; + reg = <0 0xe680 0 0x800>, <0 0xe6a0 0 0x1>; interrupt-parent = <&gic>; - interrupts = <0 163 IRQ_TYPE_LEVEL_HIGH>; - clocks = <&mstp8_clks R8A7790_CLK_ETHERAVB>; - phy-mode = "rmii"; + interrupts = , +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +; + interrupt-names = "ch0", "ch1", "ch2", "ch3", + "ch4", "ch5", "ch6", "ch7", + "ch8", "ch9", "ch10", "ch11", + "ch12", "ch13", "ch14", "ch15", + "ch16", "ch17", "ch18", "ch19", + "ch20", "ch21", "ch22", "ch23", + "ch24"; + clocks = <&mstp8_clks R8A7795_CLK_ETHERAVB>; + power-domains = <&cpg_clocks>; + phy-mode = "rgmii-id";
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 29 September 2015 at 21:09, Jason Baron wrote: > However, if we call connect on socket 's', to connect to a new socket 'o2', we > drop the reference on the original socket 'o'. Thus, we can now close socket > 'o' without unregistering from epoll. Then, when we either close the ep > or unregister 'o', we end up with this list corruption. Thus, this is not a > race per se, but can be triggered sequentially. Sounds profound, but the reproducers calls connect only once per socket. So there is no "connect to a new socket", no? But w/e, see below. > Linus explains the general case in the context the signalfd stuff here: > https://lkml.org/lkml/2013/10/14/634 I also found that posting while looking for similar bug reports. Also found that one: https://lkml.org/lkml/2014/5/15/532 > So this may be the case that we've been chasing here for a while... That bug triggers since commit 3c73419c09 "af_unix: fix 'poll for write'/ connected DGRAM sockets". That's v2.6.26-rc7, as noted in the reproducer. > > In any case, we could fix with that same POLLFREE mechansim, the simplest > would be something like: > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 03ee4d3..d499f81 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -392,6 +392,9 @@ static void unix_sock_destructor(struct sock *sk) > pr_debug("UNIX %p is destroyed, %ld are still alive.\n", sk, > atomic_long_read(&unix_nr_socks)); > #endif > + /* make sure we remove from epoll */ > + wake_up_poll(&u->peer_wait, POLLFREE); > + synchronize_sched(); > } > > static void unix_release_sock(struct sock *sk, int embrion) > > I'm not suggesting we apply that, but that fixed the supplied test case. > We could enhance the above, to avoid the free'ing latency there by doing > the SLAB_DESTROY_BY_RCU for unix sockets. But I'm not convinced > that this wouldn't be still broken for select()/poll() as well. 
I think > we can be in a select() call for socket 's', and if we remove socket > 'o' from it in the meantime (by doing a connect() on s to somewhere else > and a close on 'o'), I think we can still crash there. So POLLFREE would > have to be extended. I tried to hit this with select() but could not, > but I think if I tried harder I could. > > Instead of going further down that route, perhaps something like below > might be better. The basic idea would be to do away with the 'other' > poll call in unix_dgram_poll(), and instead revert back to a registering > on a single wait queue. We add a new wait queue to unix sockets such > that we can register it with a remote other on connect(). Then we can > use the wakeup from the remote to wake up the registered unix socket. > Probably better explained with the patch below. Note I didn't add to > the remote for SOCK_STREAM, since the poll() routine there doesn't do > the double wait queue registering: > > diff --git a/include/net/af_unix.h b/include/net/af_unix.h > index 4a167b3..9698aff 100644 > --- a/include/net/af_unix.h > +++ b/include/net/af_unix.h > @@ -62,6 +62,7 @@ struct unix_sock { > #define UNIX_GC_CANDIDATE 0 > #define UNIX_GC_MAYBE_CYCLE1 > struct socket_wqpeer_wq; > + wait_queue_twait; > }; > #define unix_sk(__sk) ((struct unix_sock *)__sk) > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 03ee4d3..9e0692a 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -420,6 +420,8 @@ static void unix_release_sock(struct sock *sk, int > embrion) > skpair = unix_peer(sk); > > if (skpair != NULL) { > + if (sk->sk_type != SOCK_STREAM) > + remove_wait_queue(&unix_sk(skpair)->peer_wait, > &u->wait); > if (sk->sk_type == SOCK_STREAM || sk->sk_type == > SOCK_SEQPACKET) { > unix_state_lock(skpair); > /* No more writes */ > @@ -636,6 +638,16 @@ static struct proto unix_proto = { > */ > static struct lock_class_key af_unix_sk_receive_queue_lock_key; > > +static int peer_wake(wait_queue_t *wait, 
unsigned mode, int sync, void *key) > +{ > + struct unix_sock *u; > + > + u = container_of(wait, struct unix_sock, wait); > + wake_up_interruptible_sync_poll(sk_sleep(&u->sk), key); > + > + return 0; > +} > + > static struct sock *unix_create1(struct net *net, struct socket *sock, int > kern) > { > struct sock *sk = NULL; > @@ -664,6 +676,7 @@ static struct sock *unix_create1(struct net *net, struct > socket *sock, int kern) > INIT_LIST_HEAD(&u->link); > mutex_init(&u->readlock); /* single task reading lock */ > init_waitqueue_head(&u->peer_wait); > + init_waitqueue_func_entry(&u->wait, peer_wake); > unix_insert_socket(unix_sockets_unbound(sk), sk); > out: > if (sk == NULL) > @@ -1030,7 +1043,10 @@ restart: > */ > if (unix_peer(sk)) { > struct sock *old_peer = unix
Re: [PATCH net] net: add pfmemalloc check in sk_add_backlog()
From: Eric Dumazet
Date: Tue, 29 Sep 2015 18:52:25 -0700

> From: Eric Dumazet
>
> Greg reported crashes hitting the following check in __sk_backlog_rcv()
>
> 	BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
>
> The pfmemalloc bit is currently checked in sk_filter().
>
> This works correctly for TCP, because sk_filter() is run in
> tcp_v[46]_rcv() before hitting the prequeue or backlog checks.
>
> For UDP or other protocols, this does not work, because sk_filter()
> is run from sock_queue_rcv_skb(), which might be called _after_ backlog
> queuing if the socket is owned by the user by the time the packet is
> processed by the softirq handler.
>
> Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
> Signed-off-by: Eric Dumazet
> Reported-by: Greg Thelen

Applied, thanks Eric.
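The patch itself is not quoted in this reply, so the following is only a
guess at the shape of check the subject line describes, written as a
userspace sketch. struct mock_skb, struct mock_sock and
mock_add_backlog() are hypothetical reductions, not the kernel's
struct sk_buff, struct sock or sk_add_backlog().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct mock_skb {
	bool pfmemalloc;	/* buffer was allocated from emergency reserves */
};

/* Hypothetical, heavily reduced stand-in for struct sock. */
struct mock_sock {
	bool memalloc;		/* models sock_flag(sk, SOCK_MEMALLOC) */
	int backlog_len;	/* number of packets queued on the backlog */
};

/* Refuse reserve-backed buffers at queuing time, so that whatever
 * processes the backlog later never sees one on an ordinary socket.
 */
int mock_add_backlog(struct mock_sock *sk, const struct mock_skb *skb)
{
	assert(sk != NULL && skb != NULL);

	if (skb->pfmemalloc && !sk->memalloc)
		return -ENOMEM;

	sk->backlog_len++;
	return 0;
}
```

The point of checking at enqueue time is that it covers protocols whose
filter step only runs after the backlog has been drained.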
Re: [PATCH] net: macb: fix two typos
From: Geliang Tang
Date: Tue, 29 Sep 2015 19:31:32 -0700

> Just fix two typos in code comments.
>
> Signed-off-by: Geliang Tang

Applied, thanks.
Re: [PATCH net-next] netfilter: remove dead code
From: Florian Westphal
Date: Wed, 30 Sep 2015 02:45:07 +0200

> Flavio Leitner wrote:
>> Remove __nf_conntrack_find() from headers.
>> Fixes: dcd93ed4cd1 ("netfilter: nf_conntrack: remove dead code")
>
> For the record: netfilter patches should go to
> netfilter-de...@vger.kernel.org .
>
> That being said, in this case I doubt Pablo minds if David takes this
> directly, patch is obviously correct[tm] :)

I don't want to create any unnecessary merge hassles, so please
resubmit this properly to netfilter-devel, thanks.
Re: [PATCH] net: Initialize flow flags in input path
From: David Ahern
Date: Tue, 29 Sep 2015 19:07:07 -0700

> The fib_table_lookup tracepoint found 2 places where the flowi4_flags is
> not initialized.
>
> Signed-off-by: David Ahern

Applied, thanks David.
Re: [PATCH 00/90] Netfilter/IPVS updates for net-next
From: Pablo Neira Ayuso
Date: Tue, 29 Sep 2015 21:25:21 +0200

> Hi David,
>
> The following pull request contains Netfilter/IPVS updates for net-next
> containing 90 patches from Eric Biederman.
>
> The main goal of this batch is to avoid recurrent lookups for the netns
> pointer, that happens over and over again in our Netfilter/IPVS code. The idea
> consists of passing netns pointer from the hook state to the relevant functions
> and objects where this may be needed.
>
> You can find more information on the IPVS updates from Simon Horman's commit
> merge message:
>
> c3456026adc0 ("Merge tag 'ipvs2-for-v4.4' of
> https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next").
>
> Exceptionally, this time, I'm not posting the patches again on netdev, Eric
> already Cc'ed this mailing list in the original submission. If you need me to
> make, just let me know.

Yeah that's appropriate in this situation.

Pulled, thanks Pablo.
Re: [PATCH net] net: dsa: fix preparation of a port STP update
From: Vivien Didelot
Date: Tue, 29 Sep 2015 14:17:54 -0400

> Because of the default 0 value of ret in dsa_slave_port_attr_set, a
> driver may return -EOPNOTSUPP from the commit phase of a STP state,
> which triggers a WARN() from switchdev.
>
> This happened on a 6185 switch which does not support hardware bridging.
>
> Fixes: 3563606258cf ("switchdev: convert STP update to switchdev attr set")
> Reported-by: Andrew Lunn
> Signed-off-by: Vivien Didelot

Applied and queued up for -stable, thanks.
Re: [PATCH net-next] net: dsa: fix preparation of a port STP update
From: Vivien Didelot
Date: Tue, 29 Sep 2015 12:38:36 -0400

> Because of the default 0 value of ret in dsa_slave_port_attr_set, a
> driver may return -EOPNOTSUPP from the commit phase of a STP state,
> which triggers a WARN() from switchdev.
>
> This happened on a 6185 switch which does not support hardware bridging.
>
> Reported-by: Andrew Lunn
> Signed-off-by: Vivien Didelot

Applied.
Re: [PATCH net-next v2] net: Add support for filtering neigh dump by master device
From: David Ahern
Date: Tue, 29 Sep 2015 09:32:03 -0700

> Add support for filtering neighbor dumps by master device by adding
> the NDA_MASTER attribute to the dump request. A new netlink flag,
> NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
> request and output is filtered as requested.
>
> Signed-off-by: David Ahern
> ---
> v2
> - added NLM_F_DUMP_FILTERED flag for userspace feedback that request is
>   supported
>
> This method works for other filters as well and other dump commands.
> Works fine for all combinations of new and old kernel and new and old ip:
> 1. new ip command on old kernel, NDA_MASTER attribute is ignored
> 2. old ip command on new kernel, NDA_MASTER attribute is not present
> 3. new ip on new kernel ... goodness ensues by limiting data to
>    only what user wants

Applied, thanks.
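As a rough illustration of what the description above implies on the
userspace side, here is a sketch that builds an RTM_GETNEIGH dump
request carrying NDA_MASTER and checks a reply header for
NLM_F_DUMP_FILTERED. struct neigh_dump_req and both function names are
made up for this example, the fallback value for NLM_F_DUMP_FILTERED is
an assumption for older headers, and no socket I/O is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

#ifndef NLM_F_DUMP_FILTERED
#define NLM_F_DUMP_FILTERED 0x20	/* assumed value for headers lacking it */
#endif

/* Hypothetical request layout: header, ndmsg, room for one u32 attribute. */
struct neigh_dump_req {
	struct nlmsghdr nlh;
	struct ndmsg ndm;
	char attrs[RTA_SPACE(sizeof(__u32))];
};

/* Fill in a neighbour dump request filtered by master device ifindex.
 * Returns the total message length to hand to send().
 */
size_t build_neigh_dump_req(struct neigh_dump_req *req,
			    unsigned int master_ifindex)
{
	struct rtattr *rta;
	__u32 idx = master_ifindex;

	assert(req != NULL && master_ifindex != 0);

	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
	req->nlh.nlmsg_type = RTM_GETNEIGH;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->ndm.ndm_family = AF_UNSPEC;

	/* Append the NDA_MASTER attribute right after the ndmsg. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nlh.nlmsg_len));
	rta->rta_type = NDA_MASTER;
	rta->rta_len = RTA_LENGTH(sizeof(idx));
	memcpy(RTA_DATA(rta), &idx, sizeof(idx));

	req->nlh.nlmsg_len = NLMSG_ALIGN(req->nlh.nlmsg_len) + rta->rta_len;
	return req->nlh.nlmsg_len;
}

/* Non-zero if the kernel signalled that it applied the filter. */
int dump_was_filtered(const struct nlmsghdr *reply)
{
	return (reply->nlmsg_flags & NLM_F_DUMP_FILTERED) != 0;
}
```

If the flag is absent from the reply (case 1 in the list above), the
caller presumably has to fall back to filtering the dump itself.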
Re: [PATCH net-next] tcp: fix tcp_v6_md5_do_lookup prototype
On Tue, 2015-09-29 at 21:24 -0700, Eric Dumazet wrote:
> From: Eric Dumazet
>
> tcp_v6_md5_do_lookup() now takes a const socket, even if
> CONFIG_TCP_MD5SIG is not set.
>
> Fixes: b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument")
> From: Eric Dumazet

Signed-off-by: Eric Dumazet

> Reported-by: kbuild test robot
> ---
>  net/ipv6/tcp_ipv6.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 0ac64f47f8821ce7da103ecc7391ba7e..2ae95e1d03e1c0d5149c9f6fa7cf94d9 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -82,7 +82,7 @@ static const struct inet_connection_sock_af_ops ipv6_specific;
>  static const struct tcp_sock_af_ops tcp_sock_ipv6_specific;
>  static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific;
>  #else
> -static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(struct sock *sk,
> +static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(const struct sock *sk,
>  						   const struct in6_addr *addr)
>  {
>  	return NULL;
Re: [PATCH net-next] tcp: fix tcp_v6_md5_do_lookup prototype
From: Eric Dumazet
Date: Tue, 29 Sep 2015 21:24:05 -0700

> From: Eric Dumazet
>
> tcp_v6_md5_do_lookup() now takes a const socket, even if
> CONFIG_TCP_MD5SIG is not set.
>
> Fixes: b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument")
> From: Eric Dumazet
> Reported-by: kbuild test robot

Applied, thanks Eric.
Re: [PATCH v2 net-next 0/6] net: switchdev: use specific switchdev_obj_*
From: Vivien Didelot
Date: Tue, 29 Sep 2015 12:07:12 -0400

> This patchset changes switchdev add, del, dump operations from this:
 ...
> to something similar to the notifier_call callback of a notifier_block:
 ...
> This allows the caller to pass and expect back a specific switchdev_obj_*
> structure (e.g. switchdev_obj_fdb) instead of the generic switchdev_obj one.
>
> This will simplify pushing the callback function down to the drivers.
>
> The first 3 patches get rid of the dev parameter of the dump callback, since it
> is not always needed (e.g. vlan_dump) and some drivers (such as DSA drivers)
> may not have easy access to it.
>
> Patches 4 and 5 implement the change in the switchdev operations and its users.
>
> Patch 6 extracts the inner switchdev_obj_* structures from switchdev_obj and
> removes this last one.
>
> v2: fix error spotted by kbuild (extra ';' inline switchdev_port_obj_dump).

Series applied, thanks Vivien.
Re: [PATCH net-next v2] net: Add support for filtering neigh dump by master device
On 9/29/15, 9:32 AM, David Ahern wrote:
> Add support for filtering neighbor dumps by master device by adding
> the NDA_MASTER attribute to the dump request. A new netlink flag,
> NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
> request and output is filtered as requested.
>
> Signed-off-by: David Ahern

Acked-by: Roopa Prabhu
[PATCH net-next] tcp: fix tcp_v6_md5_do_lookup prototype
From: Eric Dumazet

tcp_v6_md5_do_lookup() now takes a const socket, even if
CONFIG_TCP_MD5SIG is not set.

Fixes: b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument")
From: Eric Dumazet
Reported-by: kbuild test robot
---
 net/ipv6/tcp_ipv6.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0ac64f47f8821ce7da103ecc7391ba7e..2ae95e1d03e1c0d5149c9f6fa7cf94d9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -82,7 +82,7 @@ static const struct inet_connection_sock_af_ops ipv6_specific;
 static const struct tcp_sock_af_ops tcp_sock_ipv6_specific;
 static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific;
 #else
-static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(struct sock *sk,
+static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(const struct sock *sk,
 						   const struct in6_addr *addr)
 {
 	return NULL;
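For readers who want to see the warning in isolation, here is a minimal userspace sketch of the same situation. The demo_* names are hypothetical stand-ins, not kernel symbols. Once a caller only holds a const pointer, every helper it calls, including the stub built when the feature is compiled out, has to accept a const pointer too, otherwise the compiler reports that the call "discards qualifiers from pointer target type".

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct sock and struct tcp_md5sig_key. */
struct demo_sock { int state; };
struct demo_key  { int id; };

/*
 * Stub variant, as used when the feature is compiled out. Taking a
 * "const struct demo_sock *" lets const-only callers use it without a
 * cast; with a plain "struct demo_sock *" parameter the call in
 * demo_send_ack() would trigger the qualifier warning.
 */
static struct demo_key *demo_md5_lookup(const struct demo_sock *sk)
{
	(void)sk;	/* unused in the stub */
	return NULL;
}

/* A caller that was constified first and only holds a const pointer. */
static int demo_send_ack(const struct demo_sock *sk)
{
	return demo_md5_lookup(sk) == NULL;
}
```

Dropping the const from the parameter of demo_md5_lookup() should reproduce the class of warning quoted in this thread.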
Re: [net-next:master 414/428] net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type
On Tue, 2015-09-29 at 21:12 -0700, David Miller wrote:
> From: Eric Dumazet
> Date: Tue, 29 Sep 2015 21:10:28 -0700
>
> > Thanks, probably a matter of applying this patch.
>
> Looks obvious enough, please submit this formally, thanks.

Sure ! I am compiling ;)
Re: [PATCH bluetooth-next 1/4] netlink: add nla_get for le32 and le64
From: Marcel Holtmann
Date: Tue, 29 Sep 2015 18:08:32 +0200

> do you have any objections to me taking this change through the
> bluetooth-next tree?

No objections.
Re: [PATCH] testptp: Silence compiler warnings on ppc64
From: Thomas Huth
Date: Tue, 29 Sep 2015 17:45:28 +0200

> When compiling Documentation/ptp/testptp.c the following compiler
> warnings are printed out:
...
> This happens because __s64 is by default defined as "long" on ppc64,
> not as "long long". However, to fix these warnings, it's possible to
> define the __SANE_USERSPACE_TYPES__ so that __s64 gets defined to
> "long long" on ppc64, too.
>
> Signed-off-by: Thomas Huth

Applied, thanks.
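A minimal sketch of the approach described above, assuming a Linux userspace build with the kernel UAPI headers installed. The helper name format_offset is hypothetical and is not taken from testptp.c. The macro has to be defined before the first kernel header is pulled in.

```c
/*
 * Must precede any <linux/...> or <asm/...> include. On ppc64 it makes
 * the UAPI headers define __s64 as "long long" rather than "long"; on
 * architectures where __s64 is already "long long" it has no effect.
 */
#define __SANE_USERSPACE_TYPES__
#include <linux/types.h>
#include <stdio.h>

/* Format a 64-bit offset with %lld without a format-string warning. */
int format_offset(char *buf, size_t len, __s64 offset)
{
	return snprintf(buf, len, "%lld", offset);
}
```

An alternative that does not depend on the macro is to cast the value to (long long) at each printf call site; the patch discussed here takes the macro route instead.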
Re: [PATCH v2] net/mlx4: Handle return codes in mlx4_qp_attach_common
From: Robb Manes
Date: Tue, 29 Sep 2015 11:03:37 -0400

> Both new_steering_entry() and existing_steering_entry() return values
> based on their success or failure, but currently they fall through
> silently. This can make troubleshooting difficult, as we were unable
> to tell which one of these two functions returned errors or
> specifically what code was returned. This patch remedies that
> situation by passing the return codes to err, which is returned by
> mlx4_qp_attach_common() itself.
>
> This also addresses a leak in the call to mlx4_bitmap_free() as well.
>
> Signed-off-by: Robb Manes

Applied, thank you.
Re: [net-next:master 414/428] net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type
From: Eric Dumazet
Date: Tue, 29 Sep 2015 21:10:28 -0700

> Thanks, probably a matter of applying this patch.

Looks obvious enough, please submit this formally, thanks.
Re: [PATCH net-next] dsa: mv88e6xxx: Fix unsigned/signed issue
From: Andrew Lunn
Date: Tue, 29 Sep 2015 01:53:48 +0200

> commit dea870242a9c ("dsa: mv88e6xxx: Allow speed/duplex of port to be
> configured") leads to the following static checker warning:
>
>   drivers/net/dsa/mv88e6xxx.c:585 mv88e6xxx_adjust_link()
>   warn: unsigned 'ret' is never less than zero.
>
> drivers/net/dsa/mv88e6xxx.c
>    573  void mv88e6xxx_adjust_link(struct dsa_switch *ds, int port,
>    574                             struct phy_device *phydev)
>    575  {
>    576          struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
>    577          u32 ret, reg;
>    578
>    579          if (!phy_is_pseudo_fixed_link(phydev))
>    580                  return;
>    581
>    582          mutex_lock(&ps->smi_mutex);
>    583
>    584          ret = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
>    585          if (ret < 0)
>
> Make ret an int, which is the return type for _mv88e6xxx_reg_read()
>
> Reported-by: Dan Carpenter
> Signed-off-by: Andrew Lunn

Applied, thanks.
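The checker warning above can be reproduced outside the kernel. The following is a hedged sketch with hypothetical demo_* and read_* names; demo_reg_read() merely stands in for a register accessor that returns either a value or a negative error code.

```c
#include <stdint.h>

/* Hypothetical register read: a 16-bit value, or a negative error code. */
static int demo_reg_read(int reg)
{
	if (reg < 0)
		return -22;	/* stands in for -EINVAL */
	return 0x1234;
}

/*
 * Buggy pattern: the result is kept in an unsigned variable, so -22
 * becomes 4294967274 and the "< 0" test can never be true. Compilers
 * and static checkers may flag this comparison.
 */
static int read_buggy(int reg)
{
	uint32_t ret = (uint32_t)demo_reg_read(reg);

	if (ret < 0)
		return -1;
	return 0;
}

/* Fixed pattern: keep the accessor's own return type for the check. */
static int read_fixed(int reg)
{
	int ret = demo_reg_read(reg);

	if (ret < 0)
		return ret;
	return 0;
}
```

With an invalid register, read_buggy() reports success while read_fixed() propagates the error, which is the behaviour difference the one-line type change restores.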
Re: [PATCH 0/5] net: m68k: Allow modular build
From: Geert Uytterhoeven
Date: Tue, 29 Sep 2015 10:24:01 +0200

> This patch series makes the remaining m68k Ethernet drivers modular.
> It's an alternative to the last 3 patches of Paul Gortmaker's series
> "[PATCH net-next 0/6] make non-modular code explicitly non-modular".
>
> Note that "[PATCH 5/5] net: macmace: Allow modular build" depends on
> "[PATCH 4/5] m68k/mac: Export Peripheral System Controller (PSC) base
> address to modules". Feel free to take the dependency through the netdev
> tree to avoid modular build breakage.
>
> This was compile-tested only (mac_defconfig + allmodconfig) due to lack
> of hardware.

Series applied, thanks Geert.
Re: [PATCH net-next v2 RESEND] BNX2: fix a Null Pointer for stats_blk
From: Weidong Wang
Date: Tue, 29 Sep 2015 11:18:18 +0800

> @@ -839,11 +828,12 @@ bnx2_free_mem(struct bnx2 *bp)
>  }
>
>  static int
> -bnx2_alloc_mem(struct bnx2 *bp)
> +bnx2_alloc_stats_blk(struct net_device *dev)
>  {
> -	int i, status_blk_size, err;
> +	int i, status_blk_size;
>  	struct bnx2_napi *bnapi;
>  	void *status_blk;
> +	struct bnx2 *bp = netdev_priv(dev);
>
>  	/* Combine status and statistics blocks into one allocation. */
>  	status_blk_size = L1_CACHE_ALIGN(sizeof(struct status_block));

This function is not just allocating the stats block, it's allocating
a whole bunch of other things too.

Only allocate the stats block at probe time, not the NAPI et al. stuff
as well. That can safely stay in the open/close paths.
Re: [net-next:master 414/428] net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type
On Wed, 2015-09-30 at 12:01 +0800, kbuild test robot wrote: > tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git > master > head: e6934f3ec00b04234acb24a1a2c28af59763d3b5 > commit: a00e74442bac5ad19a929d097370da7e07540ea6 [414/428] tcp/dccp: constify > send_synack and send_reset socket argument > config: avr32-atngw100_defconfig (attached as .config) > reproduce: > wget > https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross > -O ~/bin/make.cross > chmod +x ~/bin/make.cross > git checkout a00e74442bac5ad19a929d097370da7e07540ea6 > # save the attached .config to linux build tree > make.cross ARCH=avr32 > > All warnings (new ones prefixed by >>): > >net/ipv6/tcp_ipv6.c: In function 'tcp_v6_reqsk_send_ack': > >> net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of > >> 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type >net/ipv6/tcp_ipv6.c:926: warning: passing argument 1 of > 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type > > vim +/tcp_v6_md5_do_lookup +930 net/ipv6/tcp_ipv6.c > > 9c76a114b Wang Yufen 2014-03-29 914 > tcptw->tw_ts_recent, tw->tw_bound_dev_if, tcp_twsk_md5_key(tcptw), > 21858cd02 Florent Fourcot 2015-05-16 915 > tw->tw_tclass, cpu_to_be32(tw->tw_flowlabel)); > ^1da177e4 Linus Torvalds 2005-04-16 916 > 8feaf0c0a Arnaldo Carvalho de Melo 2005-08-09 917inet_twsk_put(tw); > ^1da177e4 Linus Torvalds 2005-04-16 918 } > ^1da177e4 Linus Torvalds 2005-04-16 919 > a00e74442 Eric Dumazet 2015-09-29 920 static void > tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, > 6edafaaf6 Gui Jianfeng 2008-08-06 921 > struct request_sock *req) > ^1da177e4 Linus Torvalds 2005-04-16 922 { > 3a19ce0ee Daniel Lee 2014-05-11 923/* sk->sk_state == > TCP_LISTEN -> for regular TCP_SYN_RECV > 3a19ce0ee Daniel Lee 2014-05-11 924 * sk->sk_state == > TCP_SYN_RECV -> for Fast Open. 
> 3a19ce0ee Daniel Lee 2014-05-11 925 */ > 0f85feae6 Eric Dumazet 2014-12-09 926tcp_v6_send_ack(sk, > skb, (sk->sk_state == TCP_LISTEN) ? > 3a19ce0ee Daniel Lee 2014-05-11 927 > tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt, > 0f85feae6 Eric Dumazet 2014-12-09 928 > tcp_rsk(req)->rcv_nxt, req->rcv_wnd, > 0f85feae6 Eric Dumazet 2014-12-09 929 > tcp_time_stamp, req->ts_recent, sk->sk_bound_dev_if, > 1d13a96c7 Florent Fourcot 2014-01-16 @930 > tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr), > 1d13a96c7 Florent Fourcot 2014-01-16 9310, 0); > ^1da177e4 Linus Torvalds 2005-04-16 932 } > ^1da177e4 Linus Torvalds 2005-04-16 933 > ^1da177e4 Linus Torvalds 2005-04-16 934 > ^1da177e4 Linus Torvalds 2005-04-16 935 static struct sock > *tcp_v6_hnd_req(struct sock *sk, struct sk_buff *skb) > ^1da177e4 Linus Torvalds 2005-04-16 936 { > aa8223c7b Arnaldo Carvalho de Melo 2007-04-10 937const struct tcphdr *th > = tcp_hdr(skb); > 52452c542 Eric Dumazet 2015-03-19 938struct request_sock > *req; > > :: The code at line 930 was first introduced by commit > :: 1d13a96c74fc4802a775189ddb58bc6469ffdaa3 ipv6: tcp: fix flowlabel > value in ACK messages send from TIME_WAIT > > :: TO: Florent Fourcot > :: CC: David S. Miller > > --- Thanks, probably a matter of applying this patch. 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0ac64f47f8821ce7da103ecc7391ba7e..2ae95e1d03e1c0d5149c9f6fa7cf94d9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -82,7 +82,7 @@ static const struct inet_connection_sock_af_ops ipv6_specific;
 static const struct tcp_sock_af_ops tcp_sock_ipv6_specific;
 static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific;
 #else
-static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(struct sock *sk,
+static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(const struct sock *sk,
 						   const struct in6_addr *addr)
 {
 	return NULL;
Re: [PATCH] dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port
From: Andrew Lunn
Date: Tue, 29 Sep 2015 01:50:56 +0200

> Frames destined to an unknown address must be forwarded to the CPU
> port. Otherwise incoming ARP, dhcp leases, etc, do not work.
>
> Signed-off-by: Andrew Lunn

Applied.
Re: [PATCH 0/5] y2038 conversion for ntp/pps and sfc driver
From: Arnd Bergmann
Date: Mon, 28 Sep 2015 22:21:27 +0200

> When trying to build a kernel with time_t commented out, I found that
> the ntp subsystem still relies on timespec for its pps handling.
>
> This series addresses this and converts all the code to use timespec64
> instead, step by step. There is one device driver that interacts with
> this code directly (rather than only through the ptp subsystem), so
> I have to convert that driver at the same time.
>
> The patches should ideally stay together as a series, but they do
> span multiple subsystems, so I'm also looking for the right person
> to merge them.

I'm happy with this going via a tree other than mine, and for the
networking bits:

Acked-by: David S. Miller
Re: [PATCH v3 net-next 00/11] net: L3 master device
From: David Ahern
Date: Tue, 29 Sep 2015 20:07:09 -0700

> v3
> - added license header to l3mdev.c
>
> - export symbols in l3mdev.c for use with GPL modules
>
> - removed netdevice header from l3mdev.h (not needed) and fixed
>   typo in comment

Series applied, thanks David.
[net-next:master 414/428] net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: e6934f3ec00b04234acb24a1a2c28af59763d3b5 commit: a00e74442bac5ad19a929d097370da7e07540ea6 [414/428] tcp/dccp: constify send_synack and send_reset socket argument config: avr32-atngw100_defconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout a00e74442bac5ad19a929d097370da7e07540ea6 # save the attached .config to linux build tree make.cross ARCH=avr32 All warnings (new ones prefixed by >>): net/ipv6/tcp_ipv6.c: In function 'tcp_v6_reqsk_send_ack': >> net/ipv6/tcp_ipv6.c:930: warning: passing argument 1 of >> 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type net/ipv6/tcp_ipv6.c:926: warning: passing argument 1 of 'tcp_v6_md5_do_lookup' discards qualifiers from pointer target type vim +/tcp_v6_md5_do_lookup +930 net/ipv6/tcp_ipv6.c 9c76a114b Wang Yufen 2014-03-29 914 tcptw->tw_ts_recent, tw->tw_bound_dev_if, tcp_twsk_md5_key(tcptw), 21858cd02 Florent Fourcot 2015-05-16 915 tw->tw_tclass, cpu_to_be32(tw->tw_flowlabel)); ^1da177e4 Linus Torvalds 2005-04-16 916 8feaf0c0a Arnaldo Carvalho de Melo 2005-08-09 917 inet_twsk_put(tw); ^1da177e4 Linus Torvalds 2005-04-16 918 } ^1da177e4 Linus Torvalds 2005-04-16 919 a00e74442 Eric Dumazet 2015-09-29 920 static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, 6edafaaf6 Gui Jianfeng 2008-08-06 921 struct request_sock *req) ^1da177e4 Linus Torvalds 2005-04-16 922 { 3a19ce0ee Daniel Lee 2014-05-11 923 /* sk->sk_state == TCP_LISTEN -> for regular TCP_SYN_RECV 3a19ce0ee Daniel Lee 2014-05-11 924 * sk->sk_state == TCP_SYN_RECV -> for Fast Open. 3a19ce0ee Daniel Lee 2014-05-11 925 */ 0f85feae6 Eric Dumazet 2014-12-09 926 tcp_v6_send_ack(sk, skb, (sk->sk_state == TCP_LISTEN) ? 
3a19ce0ee Daniel Lee 2014-05-11 927 tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt, 0f85feae6 Eric Dumazet 2014-12-09 928 tcp_rsk(req)->rcv_nxt, req->rcv_wnd, 0f85feae6 Eric Dumazet 2014-12-09 929 tcp_time_stamp, req->ts_recent, sk->sk_bound_dev_if, 1d13a96c7 Florent Fourcot 2014-01-16 @930 tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr), 1d13a96c7 Florent Fourcot 2014-01-16 931 0, 0); ^1da177e4 Linus Torvalds 2005-04-16 932 } ^1da177e4 Linus Torvalds 2005-04-16 933 ^1da177e4 Linus Torvalds 2005-04-16 934 ^1da177e4 Linus Torvalds 2005-04-16 935 static struct sock *tcp_v6_hnd_req(struct sock *sk, struct sk_buff *skb) ^1da177e4 Linus Torvalds 2005-04-16 936 { aa8223c7b Arnaldo Carvalho de Melo 2007-04-10 937 const struct tcphdr *th = tcp_hdr(skb); 52452c542 Eric Dumazet 2015-03-19 938 struct request_sock *req; :: The code at line 930 was first introduced by commit :: 1d13a96c74fc4802a775189ddb58bc6469ffdaa3 ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT :: TO: Florent Fourcot :: CC: David S. Miller --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
[PATCH net-next 07/11] net: Remove the now unused vrf_ptr
Signed-off-by: David Ahern --- drivers/net/vrf.c | 32 ++-- include/linux/netdevice.h | 2 -- include/net/vrf.h | 6 -- 3 files changed, 2 insertions(+), 38 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 72f1892ebad0..df872f4efb0d 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -396,18 +396,15 @@ static void __vrf_insert_slave(struct slave_queue *queue, struct slave *slave) static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev) { - struct net_vrf_dev *vrf_ptr = kmalloc(sizeof(*vrf_ptr), GFP_KERNEL); struct slave *slave = kzalloc(sizeof(*slave), GFP_KERNEL); struct net_vrf *vrf = netdev_priv(dev); struct slave_queue *queue = &vrf->queue; int ret = -ENOMEM; - if (!slave || !vrf_ptr) + if (!slave) goto out_fail; slave->dev = port_dev; - vrf_ptr->ifindex = dev->ifindex; - vrf_ptr->tb_id = vrf->tb_id; /* register the packet handler for slave ports */ ret = netdev_rx_handler_register(port_dev, vrf_handle_frame, dev); @@ -424,7 +421,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev) port_dev->flags |= IFF_SLAVE; __vrf_insert_slave(queue, slave); - rcu_assign_pointer(port_dev->vrf_ptr, vrf_ptr); cycle_netdev(port_dev); return 0; @@ -432,7 +428,6 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev) out_unregister: netdev_rx_handler_unregister(port_dev); out_fail: - kfree(vrf_ptr); kfree(slave); return ret; } @@ -448,21 +443,15 @@ static int vrf_add_slave(struct net_device *dev, struct net_device *port_dev) /* inverse of do_vrf_add_slave */ static int do_vrf_del_slave(struct net_device *dev, struct net_device *port_dev) { - struct net_vrf_dev *vrf_ptr = rtnl_dereference(port_dev->vrf_ptr); struct net_vrf *vrf = netdev_priv(dev); struct slave_queue *queue = &vrf->queue; struct slave *slave; - RCU_INIT_POINTER(port_dev->vrf_ptr, NULL); - netdev_upper_dev_unlink(port_dev, dev); port_dev->flags &= ~IFF_SLAVE; netdev_rx_handler_unregister(port_dev); - /* after 
netdev_rx_handler_unregister for synchronize_rcu */ - kfree(vrf_ptr); - cycle_netdev(port_dev); slave = __vrf_find_slave_dev(queue, port_dev); @@ -601,10 +590,6 @@ static int vrf_validate(struct nlattr *tb[], struct nlattr *data[]) static void vrf_dellink(struct net_device *dev, struct list_head *head) { - struct net_vrf_dev *vrf_ptr = rtnl_dereference(dev->vrf_ptr); - - RCU_INIT_POINTER(dev->vrf_ptr, NULL); - kfree_rcu(vrf_ptr, rcu); unregister_netdevice_queue(dev, head); } @@ -612,7 +597,6 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[]) { struct net_vrf *vrf = netdev_priv(dev); - struct net_vrf_dev *vrf_ptr; int err; if (!data || !data[IFLA_VRF_TABLE]) @@ -622,24 +606,13 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev, dev->priv_flags |= IFF_L3MDEV_MASTER; - err = -ENOMEM; - vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL); - if (!vrf_ptr) - goto out_fail; - - vrf_ptr->ifindex = dev->ifindex; - vrf_ptr->tb_id = vrf->tb_id; - err = register_netdevice(dev); if (err < 0) goto out_fail; - rcu_assign_pointer(dev->vrf_ptr, vrf_ptr); - return 0; out_fail: - kfree(vrf_ptr); free_netdev(dev); return err; } @@ -683,10 +656,9 @@ static int vrf_device_event(struct notifier_block *unused, /* only care about unregister events to drop slave references */ if (event == NETDEV_UNREGISTER) { - struct net_vrf_dev *vrf_ptr = rtnl_dereference(dev->vrf_ptr); struct net_device *vrf_dev; - if (!vrf_ptr || netif_is_l3_master(dev)) + if (netif_is_l3_master(dev)) goto out; vrf_dev = netdev_master_upper_dev_get(dev); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c7f14794fe14..72bf9e37a2f0 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1427,7 +1427,6 @@ enum netdev_priv_flags { * @dn_ptr:DECnet specific data * @ip6_ptr: IPv6 specific data * @ax25_ptr: AX.25 specific data - * @vrf_ptr: VRF specific data * @ieee80211_ptr: IEEE 802.11 specific data, 
assign before registering * * @last_rx: Time of last Rx @@ -1649,7 +1648,6 @@ struct net_device { struct dn_dev __rcu *dn_ptr; struct inet6_dev __rcu *ip6_ptr; void*ax25_ptr; - struct net_
[PATCH net-next 05/11] net: Replace vrf_dev_table and friends
Replace calls to vrf_dev_table and friends with l3mdev_fib_table and kin. Signed-off-by: David Ahern --- include/net/vrf.h | 80 - net/ipv4/af_inet.c | 4 +-- net/ipv4/fib_frontend.c | 7 ++--- 3 files changed, 5 insertions(+), 86 deletions(-) diff --git a/include/net/vrf.h b/include/net/vrf.h index 874a6c9e4217..b05b96646e2a 100644 --- a/include/net/vrf.h +++ b/include/net/vrf.h @@ -34,66 +34,6 @@ struct net_vrf { #if IS_ENABLED(CONFIG_NET_VRF) -/* called with rcu_read_lock */ -static inline u32 vrf_dev_table_rcu(const struct net_device *dev) -{ - u32 tb_id = 0; - - if (dev) { - struct net_vrf_dev *vrf_ptr; - - vrf_ptr = rcu_dereference(dev->vrf_ptr); - if (vrf_ptr) - tb_id = vrf_ptr->tb_id; - } - return tb_id; -} - -static inline u32 vrf_dev_table(const struct net_device *dev) -{ - u32 tb_id; - - rcu_read_lock(); - tb_id = vrf_dev_table_rcu(dev); - rcu_read_unlock(); - - return tb_id; -} - -static inline u32 vrf_dev_table_ifindex(struct net *net, int ifindex) -{ - struct net_device *dev; - u32 tb_id = 0; - - if (!ifindex) - return 0; - - rcu_read_lock(); - - dev = dev_get_by_index_rcu(net, ifindex); - if (dev) - tb_id = vrf_dev_table_rcu(dev); - - rcu_read_unlock(); - - return tb_id; -} - -/* called with rtnl */ -static inline u32 vrf_dev_table_rtnl(const struct net_device *dev) -{ - u32 tb_id = 0; - - if (dev) { - struct net_vrf_dev *vrf_ptr; - - vrf_ptr = rtnl_dereference(dev->vrf_ptr); - if (vrf_ptr) - tb_id = vrf_ptr->tb_id; - } - return tb_id; -} - /* caller has already checked netif_is_l3_master(dev) */ static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev) { @@ -108,26 +48,6 @@ static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev) } #else -static inline u32 vrf_dev_table_rcu(const struct net_device *dev) -{ - return 0; -} - -static inline u32 vrf_dev_table(const struct net_device *dev) -{ - return 0; -} - -static inline u32 vrf_dev_table_ifindex(struct net *net, int ifindex) -{ - return 0; -} - -static inline u32 
vrf_dev_table_rtnl(const struct net_device *dev) -{ - return 0; -} - static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev) { return ERR_PTR(-ENETUNREACH); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 8a556643b874..0df3f0527648 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -119,7 +119,7 @@ #ifdef CONFIG_IP_MROUTE #include #endif -#include +#include /* The inetsw table contains everything that inet_create needs to @@ -450,7 +450,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) goto out; } - tb_id = vrf_dev_table_ifindex(net, sk->sk_bound_dev_if) ? : tb_id; + tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id; chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id); /* Not specified by any standard per-se, however it breaks too diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index b901b344f22d..fac172370276 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -45,7 +45,6 @@ #include #include #include -#include #include #include @@ -256,7 +255,7 @@ EXPORT_SYMBOL(inet_addr_type); unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev, __be32 addr) { - u32 rt_table = vrf_dev_table(dev) ? : RT_TABLE_LOCAL; + u32 rt_table = l3mdev_fib_table(dev) ? : RT_TABLE_LOCAL; return __inet_dev_addr_type(net, dev, addr, rt_table); } @@ -269,7 +268,7 @@ unsigned int inet_addr_type_dev_table(struct net *net, const struct net_device *dev, __be32 addr) { - u32 rt_table = vrf_dev_table(dev) ? : RT_TABLE_LOCAL; + u32 rt_table = l3mdev_fib_table(dev) ? 
: RT_TABLE_LOCAL; return __inet_dev_addr_type(net, NULL, addr, rt_table); } @@ -804,7 +803,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifaddr *ifa) { struct net *net = dev_net(ifa->ifa_dev->dev); - u32 tb_id = vrf_dev_table_rtnl(ifa->ifa_dev->dev); + u32 tb_id = l3mdev_fib_table(ifa->ifa_dev->dev); struct fib_table *tb; struct fib_config cfg = { .fc_protocol = RTPROT_KERNEL, -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.htm
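The replaced call sites keep the "x ? : y" idiom, which is a GNU C extension rather than standard C: it yields x when x is non-zero and y otherwise, evaluating x only once. A small userspace sketch follows, with hypothetical demo_* names; DEMO_RT_TABLE_LOCAL mirrors the value 255 that RT_TABLE_LOCAL has in the rtnetlink UAPI header.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_RT_TABLE_LOCAL 255u	/* same value as RT_TABLE_LOCAL */

/* Hypothetical stand-in for a net_device that may carry a table id. */
struct demo_dev { uint32_t tb_id; };

/* Returns the device's table id, or 0 when there is none. */
static uint32_t demo_fib_table(const struct demo_dev *dev)
{
	return dev ? dev->tb_id : 0;
}

/* Falls back to the local table when the lookup returns 0. */
static uint32_t demo_pick_table(const struct demo_dev *dev)
{
	/* GNU extension: the middle operand is omitted on purpose. */
	return demo_fib_table(dev) ? : DEMO_RT_TABLE_LOCAL;
}
```

This builds with GCC and Clang in their default modes; a strictly pedantic build would warn about the omitted operand, and the portable spelling would store the lookup result in a temporary first.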
[PATCH net-next 08/11] net: Remove vrf header file
Move remaining structs to VRF driver and delete the vrf header file. Signed-off-by: David Ahern --- MAINTAINERS | 1 - drivers/net/vrf.c | 16 +++- include/net/vrf.h | 29 - 3 files changed, 15 insertions(+), 31 deletions(-) delete mode 100644 include/net/vrf.h diff --git a/MAINTAINERS b/MAINTAINERS index 3f2d7a9d0bbf..fa43fa2f30e4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11273,7 +11273,6 @@ M: Shrijeet Mukherjee L: netdev@vger.kernel.org S: Maintained F: drivers/net/vrf.c -F: include/net/vrf.h F: Documentation/networking/vrf.txt VT1211 HARDWARE MONITOR DRIVER diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index df872f4efb0d..64f2ab663ffe 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -34,7 +34,6 @@ #include #include #include -#include #include #define DRV_NAME "vrf" @@ -45,6 +44,21 @@ #define vrf_master_get_rcu(dev) \ ((struct net_device *)rcu_dereference(dev->rx_handler_data)) +struct slave { + struct list_headlist; + struct net_device *dev; +}; + +struct slave_queue { + struct list_headall_slaves; +}; + +struct net_vrf { + struct slave_queue queue; + struct rtable *rth; + u32 tb_id; +}; + struct pcpu_dstats { u64 tx_pkts; u64 tx_bytes; diff --git a/include/net/vrf.h b/include/net/vrf.h deleted file mode 100644 index e83fc38770dd.. --- a/include/net/vrf.h +++ /dev/null @@ -1,29 +0,0 @@ -/* - * include/net/net_vrf.h - adds vrf dev structure definitions - * Copyright (c) 2015 Cumulus Networks - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. 
- */ - -#ifndef __LINUX_NET_VRF_H -#define __LINUX_NET_VRF_H - -struct slave { - struct list_headlist; - struct net_device *dev; -}; - -struct slave_queue { - struct list_headall_slaves; -}; - -struct net_vrf { - struct slave_queue queue; - struct rtable *rth; - u32 tb_id; -}; - -#endif /* __LINUX_NET_VRF_H */ -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 03/11] net: Add support for l3mdev ops to VRF driver
Signed-off-by: David Ahern --- drivers/net/Kconfig | 1 + drivers/net/vrf.c | 29 + 2 files changed, 30 insertions(+) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index d18eb607bee6..b9ebd0d18a52 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -299,6 +299,7 @@ config NLMON config NET_VRF tristate "Virtual Routing and Forwarding (Lite)" depends on IP_MULTIPLE_TABLES && IPV6_MULTIPLE_TABLES + depends on NET_L3_MASTER_DEV ---help--- This option enables the support for mapping interfaces into VRF's. The support enables VRF devices. diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 2d7418e0b908..72f1892ebad0 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -35,6 +35,7 @@ #include #include #include +#include #define DRV_NAME "vrf" #define DRV_VERSION"1.0" @@ -529,6 +530,33 @@ static const struct net_device_ops vrf_netdev_ops = { .ndo_del_slave = vrf_del_slave, }; +static u32 vrf_fib_table(const struct net_device *dev) +{ + struct net_vrf *vrf = netdev_priv(dev); + + return vrf->tb_id; +} + +static struct rtable *vrf_get_rtable(const struct net_device *dev, +const struct flowi4 *fl4) +{ + struct rtable *rth = NULL; + + if (!(fl4->flowi4_flags & FLOWI_FLAG_VRFSRC)) { + struct net_vrf *vrf = netdev_priv(dev); + + rth = vrf->rth; + atomic_inc(&rth->dst.__refcnt); + } + + return rth; +} + +static const struct l3mdev_ops vrf_l3mdev_ops = { + .l3mdev_fib_table = vrf_fib_table, + .l3mdev_get_rtable = vrf_get_rtable, +}; + static void vrf_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) { @@ -546,6 +574,7 @@ static void vrf_setup(struct net_device *dev) /* Initialize the device structure. */ dev->netdev_ops = &vrf_netdev_ops; + dev->l3mdev_ops = &vrf_l3mdev_ops; dev->ethtool_ops = &vrf_ethtool_ops; dev->destructor = free_netdev; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 net-next 00/11] net: L3 master device
The VRF device is essentially a Layer 3 master device used to associate
netdevices with a specific routing table and to influence FIB lookups
via 'ip rules' and controlling the oif/iif used for the lookup.

This series generalizes the VRF into L3 master device, l3mdev. Similar
to switchdev it has a Kconfig option and separate set of operations
in net_device allowing it to be completely compiled out if not wanted.
The l3mdev methods rely on the 'master' aspect and use of
netdev_master_upper_dev_get_rcu to retrieve the master device from
a given netdevice if it is enslaved to an L3_MASTER.

The VRF device is converted to use the l3mdev operations. At the end the
vrf_ptr is no longer needed and is removed, as are all direct references
to VRF. The end result is a much simpler implementation for VRF.

Thanks to Nikolay for suggestions (e.g., use of the master linkage which
is the key to making this work) and to Roopa, Andy and Shrijeet for
early reviews.

v3
- added license header to l3mdev.c

- export symbols in l3mdev.c for use with GPL modules

- removed netdevice header from l3mdev.h (not needed) and fixed
  typo in comment

v2
- rebased to top of net-next
- addressed Nik's comments (checking master, removing extra lines, and
  flipping the order of patches 1 and 2)

Changes since RFC:
- Changed IFF_L3MDEV to IFF_L3MDEV_MASTER after Nikolay pointed out
  a problem with my flag changes (uniquely identifying a L3MDEV master
  device versus an enslaved device like a bond that will also be a
  master device)
- Rolled in icmp fix for panic when flipping from vrf functions to l3mdev
- Moved netif_is_l3_master check into l3mdev_get_rtable

David Ahern (11):
  net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER
  net: Introduce L3 Master device abstraction
  net: Add support for l3mdev ops to VRF driver
  net: Replace vrf_master_ifindex{,_rcu} with l3mdev equivalents
  net: Replace vrf_dev_table and friends
  net: Replace calls to vrf_dev_get_rth
  net: Remove the now unused vrf_ptr
  net: Remove vrf header file
  net: Move netif_index_is_l3_master to l3mdev.h
  net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRC
  net: Add netif_is_l3_slave

 MAINTAINERS               |   8 ++-
 drivers/net/Kconfig       |   1 +
 drivers/net/vrf.c         |  89 +--
 include/linux/netdevice.h |  43 ---
 include/net/flow.h        |   2 +-
 include/net/l3mdev.h      | 149 ++
 include/net/route.h       |   5 +-
 include/net/vrf.h         | 178 --
 net/Kconfig               |   1 +
 net/Makefile              |   3 +
 net/ipv4/af_inet.c        |   4 +-
 net/ipv4/fib_frontend.c   |  12 ++--
 net/ipv4/icmp.c           |   8 +--
 net/ipv4/ip_fragment.c    |   6 +-
 net/ipv4/ip_output.c      |   2 +-
 net/ipv4/route.c          |  15 ++--
 net/ipv4/udp.c            |   4 +-
 net/ipv4/xfrm4_policy.c   |   8 +--
 net/ipv6/xfrm6_policy.c   |   8 +--
 net/l3mdev/Kconfig        |  10 +++
 net/l3mdev/Makefile       |   5 ++
 net/l3mdev/l3mdev.c       |  92
 22 files changed, 369 insertions(+), 284 deletions(-)
 create mode 100644 include/net/l3mdev.h
 delete mode 100644 include/net/vrf.h
 create mode 100644 net/l3mdev/Kconfig
 create mode 100644 net/l3mdev/Makefile
 create mode 100644 net/l3mdev/l3mdev.c

--
1.9.1
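To illustrate the shape of the abstraction described in this cover letter, here is a simplified userspace sketch. It is not the code from net/l3mdev/l3mdev.c. All demo_* names are hypothetical; the is_l3_master field stands in for the IFF_L3MDEV_MASTER flag and the master pointer stands in for the upper-device linkage. The idea shown is that generic code asks a per-device ops table for the FIB table id, consulting the device itself when it is an L3 master and its master otherwise, and falls back to 0 when no L3 master is involved.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_netdev;

/* Hypothetical analogue of an l3mdev ops table: one callback per query. */
struct demo_l3mdev_ops {
	uint32_t (*fib_table)(const struct demo_netdev *dev);
};

struct demo_netdev {
	int is_l3_master;			/* stands in for IFF_L3MDEV_MASTER */
	const struct demo_netdev *master;	/* upper device, if enslaved */
	const struct demo_l3mdev_ops *ops;	/* only set on an L3 master */
	uint32_t tb_id;				/* driver-private table id */
};

/* Driver side: a VRF-like device simply reports its own table id. */
static uint32_t demo_vrf_fib_table(const struct demo_netdev *dev)
{
	return dev->tb_id;
}

static const struct demo_l3mdev_ops demo_vrf_ops = {
	.fib_table = demo_vrf_fib_table,
};

/*
 * Generic side: resolve the table for any device. A return value of 0
 * means "no L3 master device is involved".
 */
static uint32_t demo_l3mdev_fib_table(const struct demo_netdev *dev)
{
	const struct demo_netdev *m;

	if (!dev)
		return 0;

	m = dev->is_l3_master ? dev : dev->master;
	if (m && m->is_l3_master && m->ops && m->ops->fib_table)
		return m->ops->fib_table(m);

	return 0;
}
```

Because callers only see the generic helper, a build without the feature can replace it with an inline stub that returns 0, which is the compile-out property the cover letter compares to switchdev.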
[PATCH net-next 09/11] net: Move netif_index_is_l3_master to l3mdev.h
Change CONFIG dependency to CONFIG_NET_L3_MASTER_DEV as well.

Signed-off-by: David Ahern
---
 include/linux/netdevice.h | 21 -
 include/net/l3mdev.h      | 24
 include/net/route.h       |  1 +
 3 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 72bf9e37a2f0..b9450784ae06 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3840,27 +3840,6 @@ static inline bool netif_is_ovs_master(const struct net_device *dev)
 	return dev->priv_flags & IFF_OPENVSWITCH;
 }

-static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
-{
-	bool rc = false;
-
-#if IS_ENABLED(CONFIG_NET_VRF)
-	struct net_device *dev;
-
-	if (ifindex == 0)
-		return false;
-
-	rcu_read_lock();
-
-	dev = dev_get_by_index_rcu(net, ifindex);
-	if (dev)
-		rc = netif_is_l3_master(dev);
-
-	rcu_read_unlock();
-#endif
-	return rc;
-}
-
 /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */
 static inline void netif_keep_dst(struct net_device *dev)
 {
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index e382c777bab8..87cee05a0a17 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -81,6 +81,25 @@ static inline struct rtable *l3mdev_get_rtable(const struct net_device *dev,
 	return NULL;
 }

+static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
+{
+	struct net_device *dev;
+	bool rc = false;
+
+	if (ifindex == 0)
+		return false;
+
+	rcu_read_lock();
+
+	dev = dev_get_by_index_rcu(net, ifindex);
+	if (dev)
+		rc = netif_is_l3_master(dev);
+
+	rcu_read_unlock();
+
+	return rc;
+}
+
 #else

 static inline int l3mdev_master_ifindex_rcu(struct net_device *dev)
@@ -120,6 +139,11 @@ static inline struct rtable *l3mdev_get_rtable(const struct net_device *dev,
 	return NULL;
 }

+static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
+{
+	return false;
+}
+
 #endif

 #endif /* _NET_L3MDEV_H_ */
diff --git a/include/net/route.h b/include/net/route.h
index a565d0dad12c..e211dc167db1 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
--
1.9.1
[PATCH net-next 06/11] net: Replace calls to vrf_dev_get_rth
Replace calls to vrf_dev_get_rth with l3mdev_get_rtable.

The check on the flow flags is handled in the l3mdev operation.

Signed-off-by: David Ahern
---
 include/net/vrf.h | 22 --
 net/ipv4/route.c  |  8 +++-
 2 files changed, 3 insertions(+), 27 deletions(-)

diff --git a/include/net/vrf.h b/include/net/vrf.h
index b05b96646e2a..5bba1535ba73 100644
--- a/include/net/vrf.h
+++ b/include/net/vrf.h
@@ -32,26 +32,4 @@ struct net_vrf {
 	u32 tb_id;
 };

-
-#if IS_ENABLED(CONFIG_NET_VRF)
-/* caller has already checked netif_is_l3_master(dev) */
-static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev)
-{
-	struct rtable *rth = ERR_PTR(-ENETUNREACH);
-	struct net_vrf *vrf = netdev_priv(dev);
-
-	if (vrf) {
-		rth = vrf->rth;
-		atomic_inc(&rth->dst.__refcnt);
-	}
-	return rth;
-}
-
-#else
-static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev)
-{
-	return ERR_PTR(-ENETUNREACH);
-}
-#endif
-
 #endif /* __LINUX_NET_VRF_H */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ba47c45c..1441de1550e6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -112,7 +112,6 @@
 #endif
 #include
 #include
-#include
 #include

 #define RT_FL_TOS(oldflp4) \
@@ -2125,11 +2124,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 				fl4->saddr = inet_select_addr(dev_out, 0,
 							      RT_SCOPE_HOST);
 		}
-		if (netif_is_l3_master(dev_out) &&
-		    !(fl4->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
-			rth = vrf_dev_get_rth(dev_out);
+
+		rth = l3mdev_get_rtable(dev_out, fl4);
+		if (rth)
 			goto out;
-		}
 	}

 	if (!fl4->daddr) {
--
1.9.1
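The shape of this change, moving the flow-flag check out of the caller and into a per-device operation, can be illustrated with a small userspace model. All names below (struct l3_ops, get_rtable(), vrf_get_rtable(), FLOW_FLAG_SRC) are hypothetical stand-ins, and the extra NULL check on the ops pointer is an addition made for the sketch rather than something taken from the patch.

```c
#include <stddef.h>

struct flow { unsigned int flags; };
struct route { int id; };
struct dev;

/* Hypothetical, trimmed-down analogue of struct l3mdev_ops. */
struct l3_ops {
	struct route *(*get_rtable)(const struct dev *dev,
				    const struct flow *fl);
};

struct dev {
	int is_l3_master;
	const struct l3_ops *ops;
	struct route *cached;
};

#define FLOW_FLAG_SRC 0x1 /* models FLOWI_FLAG_VRFSRC */

/* Example driver op: return the cached route unless the flow opts out. */
static struct route *vrf_get_rtable(const struct dev *dev,
				    const struct flow *fl)
{
	if (fl->flags & FLOW_FLAG_SRC)
		return NULL;
	return dev->cached;
}

/* Generic helper: NULL means "no special route, continue the lookup". */
static struct route *get_rtable(const struct dev *dev, const struct flow *fl)
{
	if (dev->is_l3_master && dev->ops && dev->ops->get_rtable)
		return dev->ops->get_rtable(dev, fl);
	return NULL;
}
```

With this split the caller only needs to test the returned pointer, which mirrors the simplified "if (rth) goto out;" in the hunk above.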
[PATCH net-next 02/11] net: Introduce L3 Master device abstraction
L3 master devices allow users of the abstraction to influence FIB lookups for enslaved devices. Current API provides a means for the master device to return a specific FIB table for an enslaved device, to return an rtable/custom dst and influence the OIF used for fib lookups. Signed-off-by: David Ahern --- MAINTAINERS | 7 +++ include/linux/netdevice.h | 3 ++ include/net/l3mdev.h | 125 ++ net/Kconfig | 1 + net/Makefile | 3 ++ net/l3mdev/Kconfig| 10 net/l3mdev/Makefile | 5 ++ net/l3mdev/l3mdev.c | 92 ++ 8 files changed, 246 insertions(+) create mode 100644 include/net/l3mdev.h create mode 100644 net/l3mdev/Kconfig create mode 100644 net/l3mdev/Makefile create mode 100644 net/l3mdev/l3mdev.c diff --git a/MAINTAINERS b/MAINTAINERS index bcd263de4827..3f2d7a9d0bbf 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6095,6 +6095,13 @@ F: Documentation/auxdisplay/ks0108 F: drivers/auxdisplay/ks0108.c F: include/linux/ks0108.h +L3MDEV +M: David Ahern +L: netdev@vger.kernel.org +S: Maintained +F: net/l3mdev +F: include/net/l3mdev.h + LAPB module L: linux-...@vger.kernel.org S: Orphan diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 99c33e83822f..c7f14794fe14 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1587,6 +1587,9 @@ struct net_device { #ifdef CONFIG_NET_SWITCHDEV const struct switchdev_ops *switchdev_ops; #endif +#ifdef CONFIG_NET_L3_MASTER_DEV + const struct l3mdev_ops *l3mdev_ops; +#endif const struct header_ops *header_ops; diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h new file mode 100644 index ..e382c777bab8 --- /dev/null +++ b/include/net/l3mdev.h @@ -0,0 +1,125 @@ +/* + * include/net/l3mdev.h - L3 master device API + * Copyright (c) 2015 Cumulus Networks + * Copyright (c) 2015 David Ahern + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or 
+ * (at your option) any later version. + */ +#ifndef _NET_L3MDEV_H_ +#define _NET_L3MDEV_H_ + +/** + * struct l3mdev_ops - l3mdev operations + * + * @l3mdev_fib_table: Get FIB table id to use for lookups + * + * @l3mdev_get_rtable: Get cached IPv4 rtable (dst_entry) for device + */ + +struct l3mdev_ops { + u32 (*l3mdev_fib_table)(const struct net_device *dev); + struct rtable * (*l3mdev_get_rtable)(const struct net_device *dev, +const struct flowi4 *fl4); +}; + +#ifdef CONFIG_NET_L3_MASTER_DEV + +int l3mdev_master_ifindex_rcu(struct net_device *dev); +static inline int l3mdev_master_ifindex(struct net_device *dev) +{ + int ifindex; + + rcu_read_lock(); + ifindex = l3mdev_master_ifindex_rcu(dev); + rcu_read_unlock(); + + return ifindex; +} + +/* get index of an interface to use for FIB lookups. For devices + * enslaved to an L3 master device FIB lookups are based on the + * master index + */ +static inline int l3mdev_fib_oif_rcu(struct net_device *dev) +{ + return l3mdev_master_ifindex_rcu(dev) ? 
: dev->ifindex; +} + +static inline int l3mdev_fib_oif(struct net_device *dev) +{ + int oif; + + rcu_read_lock(); + oif = l3mdev_fib_oif_rcu(dev); + rcu_read_unlock(); + + return oif; +} + +u32 l3mdev_fib_table_rcu(const struct net_device *dev); +u32 l3mdev_fib_table_by_index(struct net *net, int ifindex); +static inline u32 l3mdev_fib_table(const struct net_device *dev) +{ + u32 tb_id; + + rcu_read_lock(); + tb_id = l3mdev_fib_table_rcu(dev); + rcu_read_unlock(); + + return tb_id; +} + +static inline struct rtable *l3mdev_get_rtable(const struct net_device *dev, + const struct flowi4 *fl4) +{ + if (netif_is_l3_master(dev) && dev->l3mdev_ops->l3mdev_get_rtable) + return dev->l3mdev_ops->l3mdev_get_rtable(dev, fl4); + + return NULL; +} + +#else + +static inline int l3mdev_master_ifindex_rcu(struct net_device *dev) +{ + return 0; +} +static inline int l3mdev_master_ifindex(struct net_device *dev) +{ + return 0; +} + +static inline int l3mdev_fib_oif_rcu(struct net_device *dev) +{ + return dev ? dev->ifindex : 0; +} +static inline int l3mdev_fib_oif(struct net_device *dev) +{ + return dev ? dev->ifindex : 0; +} + +static inline u32 l3mdev_fib_table_rcu(const struct net_device *dev) +{ + return 0; +} +static inline u32 l3mdev_fib_table(const struct net_device *dev) +{ + return 0; +} +static inline u32 l3mdev_fib_table_by_index(struct net *net, int ifindex) +{ + return 0; +} + +static inline struct rtable *l3mdev_get_rtable(c
[PATCH net-next 01/11] net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER
Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the netif_is_vrf and netif_index_is_vrf macros. Signed-off-by: David Ahern --- drivers/net/vrf.c | 6 +++--- include/linux/netdevice.h | 14 +++--- include/net/route.h | 2 +- include/net/vrf.h | 4 ++-- net/ipv4/ip_output.c | 2 +- net/ipv4/route.c | 2 +- net/ipv4/udp.c| 2 +- 7 files changed, 16 insertions(+), 16 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 4ecb3a3e516a..2d7418e0b908 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -438,7 +438,7 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev) static int vrf_add_slave(struct net_device *dev, struct net_device *port_dev) { - if (netif_is_vrf(port_dev) || vrf_is_slave(port_dev)) + if (netif_is_l3_master(port_dev) || vrf_is_slave(port_dev)) return -EINVAL; return do_vrf_add_slave(dev, port_dev); @@ -591,7 +591,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev, vrf->tb_id = nla_get_u32(data[IFLA_VRF_TABLE]); - dev->priv_flags |= IFF_VRF_MASTER; + dev->priv_flags |= IFF_L3MDEV_MASTER; err = -ENOMEM; vrf_ptr = kmalloc(sizeof(*dev->vrf_ptr), GFP_KERNEL); @@ -657,7 +657,7 @@ static int vrf_device_event(struct notifier_block *unused, struct net_vrf_dev *vrf_ptr = rtnl_dereference(dev->vrf_ptr); struct net_device *vrf_dev; - if (!vrf_ptr || netif_is_vrf(dev)) + if (!vrf_ptr || netif_is_l3_master(dev)) goto out; vrf_dev = netdev_master_upper_dev_get(dev); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d2ffeafc9998..99c33e83822f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1258,7 +1258,7 @@ struct net_device_ops { * @IFF_LIVE_ADDR_CHANGE: device supports hardware address * change when it's running * @IFF_MACVLAN: Macvlan device - * @IFF_VRF_MASTER: device is a VRF master + * @IFF_L3MDEV_MASTER: device is an L3 master device * @IFF_NO_QUEUE: device can run without qdisc attached * @IFF_OPENVSWITCH: device is a Open vSwitch 
master */ @@ -1283,7 +1283,7 @@ enum netdev_priv_flags { IFF_XMIT_DST_RELEASE_PERM = 1<<17, IFF_IPVLAN_MASTER = 1<<18, IFF_IPVLAN_SLAVE= 1<<19, - IFF_VRF_MASTER = 1<<20, + IFF_L3MDEV_MASTER = 1<<20, IFF_NO_QUEUE= 1<<21, IFF_OPENVSWITCH = 1<<22, }; @@ -1308,7 +1308,7 @@ enum netdev_priv_flags { #define IFF_XMIT_DST_RELEASE_PERM IFF_XMIT_DST_RELEASE_PERM #define IFF_IPVLAN_MASTER IFF_IPVLAN_MASTER #define IFF_IPVLAN_SLAVE IFF_IPVLAN_SLAVE -#define IFF_VRF_MASTER IFF_VRF_MASTER +#define IFF_L3MDEV_MASTER IFF_L3MDEV_MASTER #define IFF_NO_QUEUE IFF_NO_QUEUE #define IFF_OPENVSWITCHIFF_OPENVSWITCH @@ -3824,9 +3824,9 @@ static inline bool netif_supports_nofcs(struct net_device *dev) return dev->priv_flags & IFF_SUPP_NOFCS; } -static inline bool netif_is_vrf(const struct net_device *dev) +static inline bool netif_is_l3_master(const struct net_device *dev) { - return dev->priv_flags & IFF_VRF_MASTER; + return dev->priv_flags & IFF_L3MDEV_MASTER; } static inline bool netif_is_bridge_master(const struct net_device *dev) @@ -3839,7 +3839,7 @@ static inline bool netif_is_ovs_master(const struct net_device *dev) return dev->priv_flags & IFF_OPENVSWITCH; } -static inline bool netif_index_is_vrf(struct net *net, int ifindex) +static inline bool netif_index_is_l3_master(struct net *net, int ifindex) { bool rc = false; @@ -3853,7 +3853,7 @@ static inline bool netif_index_is_vrf(struct net *net, int ifindex) dev = dev_get_by_index_rcu(net, ifindex); if (dev) - rc = netif_is_vrf(dev); + rc = netif_is_l3_master(dev); rcu_read_unlock(); #endif diff --git a/include/net/route.h b/include/net/route.h index d1bd90bb3187..a565d0dad12c 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -256,7 +256,7 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32 if (inet_sk(sk)->transparent) flow_flags |= FLOWI_FLAG_ANYSRC; - if (netif_index_is_vrf(sock_net(sk), oif)) + if (netif_index_is_l3_master(sock_net(sk), oif)) flow_flags |= FLOWI_FLAG_VRFSRC | 
FLOWI_FLAG_SKIP_NH_OIF; flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE, diff --git a/include/net/vrf.h b/include/net/vrf.h index 593e6094ddd4..34bb3f69def2 100644 --- a/include/net/vrf.h +++ b/include/net/vrf.h @@ -43,7 +43,7 @@ static inline int vrf_master_ifindex_rcu(cons
[PATCH net-next 04/11] net: Replace vrf_master_ifindex{,_rcu} with l3mdev equivalents
Replace calls to vrf_master_ifindex_rcu and vrf_master_ifindex with either l3mdev_master_ifindex_rcu or l3mdev_master_ifindex. The pattern: oif = vrf_master_ifindex(dev) ? : dev->ifindex; is replaced with oif = l3mdev_fib_oif(dev); And remove the now unused vrf macros. Signed-off-by: David Ahern --- include/net/vrf.h | 41 - net/ipv4/fib_frontend.c | 5 +++-- net/ipv4/icmp.c | 8 net/ipv4/ip_fragment.c | 6 +++--- net/ipv4/route.c| 7 --- net/ipv4/xfrm4_policy.c | 8 +++- net/ipv6/xfrm6_policy.c | 8 +++- 7 files changed, 20 insertions(+), 63 deletions(-) diff --git a/include/net/vrf.h b/include/net/vrf.h index 34bb3f69def2..874a6c9e4217 100644 --- a/include/net/vrf.h +++ b/include/net/vrf.h @@ -34,37 +34,6 @@ struct net_vrf { #if IS_ENABLED(CONFIG_NET_VRF) -/* called with rcu_read_lock() */ -static inline int vrf_master_ifindex_rcu(const struct net_device *dev) -{ - struct net_vrf_dev *vrf_ptr; - int ifindex = 0; - - if (!dev) - return 0; - - if (netif_is_l3_master(dev)) { - ifindex = dev->ifindex; - } else { - vrf_ptr = rcu_dereference(dev->vrf_ptr); - if (vrf_ptr) - ifindex = vrf_ptr->ifindex; - } - - return ifindex; -} - -static inline int vrf_master_ifindex(const struct net_device *dev) -{ - int ifindex; - - rcu_read_lock(); - ifindex = vrf_master_ifindex_rcu(dev); - rcu_read_unlock(); - - return ifindex; -} - /* called with rcu_read_lock */ static inline u32 vrf_dev_table_rcu(const struct net_device *dev) { @@ -139,16 +108,6 @@ static inline struct rtable *vrf_dev_get_rth(const struct net_device *dev) } #else -static inline int vrf_master_ifindex_rcu(const struct net_device *dev) -{ - return 0; -} - -static inline int vrf_master_ifindex(const struct net_device *dev) -{ - return 0; -} - static inline u32 vrf_dev_table_rcu(const struct net_device *dev) { return 0; diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 6fcbd215cdbc..b901b344f22d 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -46,6 +46,7 @@ #include #include #include 
+#include #include #ifndef CONFIG_IP_MULTIPLE_TABLES @@ -332,7 +333,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, bool dev_match; fl4.flowi4_oif = 0; - fl4.flowi4_iif = vrf_master_ifindex_rcu(dev); + fl4.flowi4_iif = l3mdev_master_ifindex_rcu(dev); if (!fl4.flowi4_iif) fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX; fl4.daddr = src; @@ -366,7 +367,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, if (nh->nh_dev == dev) { dev_match = true; break; - } else if (vrf_master_ifindex_rcu(nh->nh_dev) == dev->ifindex) { + } else if (l3mdev_master_ifindex_rcu(nh->nh_dev) == dev->ifindex) { dev_match = true; break; } diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index e5eb8ac4089d..6b96dee2800b 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -96,7 +96,7 @@ #include #include #include -#include +#include /* * Build xmit assembly blocks @@ -309,7 +309,7 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt, rc = false; if (icmp_global_allow()) { - int vif = vrf_master_ifindex(dst->dev); + int vif = l3mdev_master_ifindex(dst->dev); struct inet_peer *peer; peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, vif, 1); @@ -427,7 +427,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) fl4.flowi4_mark = mark; fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos); fl4.flowi4_proto = IPPROTO_ICMP; - fl4.flowi4_oif = vrf_master_ifindex(skb->dev); + fl4.flowi4_oif = l3mdev_master_ifindex(skb->dev); security_skb_classify_flow(skb, flowi4_to_flowi(&fl4)); rt = ip_route_output_key(net, &fl4); if (IS_ERR(rt)) @@ -461,7 +461,7 @@ static struct rtable *icmp_route_lookup(struct net *net, fl4->flowi4_proto = IPPROTO_ICMP; fl4->fl4_icmp_type = type; fl4->fl4_icmp_code = code; - fl4->flowi4_oif = vrf_master_ifindex(skb_in->dev); + fl4->flowi4_oif = l3mdev_master_ifindex(skb_in->dev); security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4)); rt = __ip_route_output_key(net, fl4); diff --git 
a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index fa7f15305f9a..9772b789adf3 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -48,7 +48,7 @@ #include #include #include -#include +#include /* NOTE. Logic of IP defragmentation is para
Re: [PATCH net-next 2/2] openvswitch: netlink attributes for IPv6 tunneling
On Tue, Sep 29, 2015 at 10:52 AM, Jiri Benc wrote:
> When compat code for tunnel configuration is used, IPv6 tun_info will be
> rejected by ovs_tunnel_get_egress_info. As the consequence, only the new way
> of tunnel config supports IPv6.

This appears to me to be a bug in the existing code.
ovs_tunnel_get_egress_info() as a general mechanism is still in use
and should work with both the old and new configuration methods.
However, I agree that it doesn't look like it will work currently with
tunnel devices. I think we need to fix this rather than making it more
broken.
[PATCH] net: macb: fix two typos
Just fix two typos in code comments.

Signed-off-by: Geliang Tang
---
 drivers/net/ethernet/cadence/macb.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 6e1faea..866b128 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -267,9 +267,9 @@
 #define MACB_BEX_SIZE		1
 #define MACB_RM9200_BNQ_OFFSET	4 /* AT91RM9200 only */
 #define MACB_RM9200_BNQ_SIZE	1 /* AT91RM9200 only */
-#define MACB_COMP_OFFSET	5 /* Trnasmit complete */
+#define MACB_COMP_OFFSET	5 /* Transmit complete */
 #define MACB_COMP_SIZE		1
-#define MACB_UND_OFFSET		6 /* Trnasmit under run */
+#define MACB_UND_OFFSET		6 /* Transmit under run */
 #define MACB_UND_SIZE		1

 /* Bitfields in RSR */
--
1.9.1
[PATCH] net: Initialize flow flags in input path
The fib_table_lookup tracepoint found 2 places where the flowi4_flags is
not initialized.

Signed-off-by: David Ahern
---
 net/ipv4/fib_frontend.c | 1 +
 net/ipv4/route.c        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 6fcbd215cdbc..690bcbc59f26 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -340,6 +340,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	fl4.flowi4_tos = tos;
 	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
 	fl4.flowi4_tun_key.tun_id = 0;
+	fl4.flowi4_flags = 0;

 	no_addr = idev->ifa_list == NULL;

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8c84a6664b30..13ac8d012aa7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1743,6 +1743,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	fl4.flowi4_mark = skb->mark;
 	fl4.flowi4_tos = tos;
 	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
+	fl4.flowi4_flags = 0;
 	fl4.daddr = daddr;
 	fl4.saddr = saddr;
 	err = fib_lookup(net, &fl4, &res, 0);
--
1.9.1
Re: [ovs-dev] [PATCH net-next 1/2] openvswitch: add tunnel protocol to sw_flow_key
On Tue, Sep 29, 2015 at 10:52 AM, Jiri Benc wrote:
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 5c030a4d7338..03ba070c3256 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -643,6 +643,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
>  	}
>
>  	SW_FLOW_KEY_PUT(match, tun_key.tun_flags, tun_flags, is_mask);
> +	SW_FLOW_KEY_PUT(match, tun_proto, AF_INET, is_mask);

I don't think this is right in the case of the mask. It will cause the
mask to be the value AF_INET; instead you want to set the mask to be
0xff.
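To make the concern concrete, here is a small userspace model of key/mask matching. It is only a sketch: struct key, put_tun_proto() and matches() are hypothetical and much simpler than the real sw_flow_key handling, and the address-family values 2 and 10 are the Linux values of AF_INET and AF_INET6. It shows that a mask equal to AF_INET lets an IPv6 value match an IPv4 key, while a mask of 0xff gives an exact match.

```c
#include <stdint.h>

#define AF_INET_VAL  2  /* AF_INET on Linux */
#define AF_INET6_VAL 10 /* AF_INET6 on Linux */

struct key { uint8_t tun_proto; };

/* Store a value in either the key or the mask (hypothetical helper). */
static void put_tun_proto(struct key *key, struct key *mask,
			  uint8_t value, int is_mask)
{
	if (is_mask)
		mask->tun_proto = value;
	else
		key->tun_proto = value;
}

/* A packet matches when its masked bits equal the flow's masked bits. */
static int matches(const struct key *flow, const struct key *mask,
		   const struct key *pkt)
{
	return (pkt->tun_proto & mask->tun_proto) ==
	       (flow->tun_proto & mask->tun_proto);
}
```

With the mask set to 2, the value 10 (binary 1010) still has bit 1 set, so it passes the comparison; with the mask set to 0xff every bit is compared and only the value 2 matches.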
[PATCH net] net: add pfmemalloc check in sk_add_backlog()
From: Eric Dumazet

Greg reported crashes hitting the following check in __sk_backlog_rcv()

	BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));

The pfmemalloc bit is currently checked in sk_filter().

This works correctly for TCP, because sk_filter() is run in
tcp_v[46]_rcv() before hitting the prequeue or backlog checks.

For UDP or other protocols, this does not work, because sk_filter()
is run from sock_queue_rcv_skb(), which might be called _after_ backlog
queuing if the socket is owned by the user by the time the packet is
processed by the softirq handler.

Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
Signed-off-by: Eric Dumazet
Reported-by: Greg Thelen
---
 include/net/sock.h | 8
 1 file changed, 8 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7aa78440559a..e23717013a4e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -828,6 +828,14 @@ static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *s
 	if (sk_rcvqueues_full(sk, limit))
 		return -ENOBUFS;

+	/*
+	 * If the skb was allocated from pfmemalloc reserves, only
+	 * allow SOCK_MEMALLOC sockets to use it as this socket is
+	 * helping free memory
+	 */
+	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
+		return -ENOMEM;
+
 	__sk_add_backlog(sk, skb);
 	sk->sk_backlog.len += skb->truesize;
 	return 0;
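The ordering of the checks in the patched sk_add_backlog() can be modelled in userspace roughly as below. This is a sketch under simplifying assumptions: struct pkt, struct sock_model and add_backlog() are hypothetical, and the queue-full test is reduced to a single length comparison.

```c
#include <errno.h>
#include <stdbool.h>

struct pkt {
	bool pfmemalloc;       /* models skb_pfmemalloc(skb) */
	unsigned int truesize;
};

struct sock_model {
	bool memalloc;         /* models sock_flag(sk, SOCK_MEMALLOC) */
	unsigned int backlog_len;
	unsigned int limit;
};

/* Queue a packet on the backlog, refusing reserve-backed packets on
 * sockets that are not helping to free memory.
 */
static int add_backlog(struct sock_model *sk, const struct pkt *skb)
{
	if (sk->backlog_len > sk->limit)
		return -ENOBUFS;

	if (skb->pfmemalloc && !sk->memalloc)
		return -ENOMEM;

	sk->backlog_len += skb->truesize;
	return 0;
}
```

Rejecting the packet before it is queued means the later backlog processing never sees a pfmemalloc packet on an ordinary socket, which is the condition the BUG_ON() asserts.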
Re: [PATCH net-next] netfilter: remove dead code
Flavio Leitner wrote:
> Remove __nf_conntrack_find() from headers.
> Fixes: dcd93ed4cd1 ("netfilter: nf_conntrack: remove dead code")

For the record: netfilter patches should go to
netfilter-de...@vger.kernel.org .

That being said, in this case I doubt Pablo minds if David takes
this directly, patch is obviously correct[tm] :)
[PATCH net-next] netfilter: remove dead code
Remove __nf_conntrack_find() from headers.

Fixes: dcd93ed4cd1 ("netfilter: nf_conntrack: remove dead code")
Signed-off-by: Flavio Leitner
---
 include/net/netfilter/nf_conntrack.h | 4
 1 file changed, 4 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index d642f68..fde4068 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -183,10 +183,6 @@ void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls);

 void nf_ct_free_hashtable(void *hash, unsigned int size);

-struct nf_conntrack_tuple_hash *
-__nf_conntrack_find(struct net *net, u16 zone,
-		    const struct nf_conntrack_tuple *tuple);
-
 int nf_conntrack_hash_check_insert(struct nf_conn *ct);
 bool nf_ct_delete(struct nf_conn *ct, u32 pid, int report);

--
2.4.3
[PATCH v2] Documentation: improve line discipline method descriptions
Mention that the ldisc open method must set tty->receive_room, and that many methods are optional. Add description of receive_buf2 method. Signed-off-by: Tilman Schmidt --- Documentation/serial/tty.txt | 60 1 file changed, 39 insertions(+), 21 deletions(-) diff --git a/Documentation/serial/tty.txt b/Documentation/serial/tty.txt index 973c8ad..bc3842d 100644 --- a/Documentation/serial/tty.txt +++ b/Documentation/serial/tty.txt @@ -39,8 +39,13 @@ TTY side interfaces: open() - Called when the line discipline is attached to the terminal. No other call into the line discipline for this tty will occur until it - completes successfully. Returning an error will - prevent the ldisc from being attached. Can sleep. + completes successfully. Should initialize any + state needed by the ldisc, and set receive_room + in the tty_struct to the maximum amount of data + the line discipline is willing to accept from the + driver with a single call to receive_buf(). + Returning an error will prevent the ldisc from + being attached. Can sleep. close()- This is called on a terminal when the line discipline is being unplugged. At the point of @@ -52,9 +57,16 @@ hangup() - Called when the tty line is hung up. No further calls into the ldisc code will occur. The return value is ignored. Can sleep. -write()- A process is writing data through the line - discipline. Multiple write calls are serialized - by the tty layer for the ldisc. May sleep. +read() - (optional) A process requests reading data from + the line. Multiple read calls may occur in parallel + and the ldisc must deal with serialization issues. + If not defined, the process will receive an EIO + error. May sleep. + +write()- (optional) A process requests writing data to the + line. Multiple write calls are serialized by the + tty layer for the ldisc. If not defined, the + process will receive an EIO error. May sleep. 
flush_buffer() - (optional) May be called at any point between open and close, and instructs the line discipline @@ -69,27 +81,33 @@ set_termios() - (optional) Called on termios structure changes. termios semaphore so allowed to sleep. Serialized against itself only. -read() - Move data from the line discipline to the user. - Multiple read calls may occur in parallel and the - ldisc must deal with serialization issues. May - sleep. - -poll() - Check the status for the poll/select calls. Multiple - poll calls may occur in parallel. May sleep. +poll() - (optional) Check the status for the poll/select + calls. Multiple poll calls may occur in parallel. + May sleep. -ioctl()- Called when an ioctl is handed to the tty layer - that might be for the ldisc. Multiple ioctl calls - may occur in parallel. May sleep. +ioctl()- (optional) Called when an ioctl is handed to the + tty layer that might be for the ldisc. Multiple + ioctl calls may occur in parallel. May sleep. -compat_ioctl() - Called when a 32 bit ioctl is handed to the tty layer - that might be for the ldisc. Multiple ioctl calls - may occur in parallel. May sleep. +compat_ioctl() - (optional) Called when a 32 bit ioctl is handed + to the tty layer that might be for the ldisc. + Multiple ioctl calls may occur in parallel. + May sleep. Driver Side Interfaces: -receive_buf() - Hand buffers of bytes from the driver to the ldisc - for processing. Semantics currently rather - mysterious 8( +receive_buf() - (optional) Called by the low-level driver to hand + a buffer of received bytes to the ldisc for + processing. The number of bytes is guaranteed not + to exceed the current value of tty->receive_room. + All bytes must be processed. + +receive_buf2() - (optional) Called by
Re: [PATCH net-next 00/14] tcp: listener refactoring preparations
From: Eric Dumazet
Date: Tue, 29 Sep 2015 07:42:38 -0700

> This patch series makes changes to TCP/DCCP stacks so that
> we can switch listener code to lockless mode.
>
> This is done by marking const the listener socket in all
> appropriate paths.
>
> FastOpen code had to be changed to not dynamically allocate
> a very small structure to make code simpler for following changes.

Series applied, thanks Eric.
Re: [PATCH 3/3] 802.1AD: Flow handling, actions, vlan parsing and netlink attributes
On 9/29/15 6:56 PM, Pravin Shelar wrote:
> On Fri, Sep 25, 2015 at 3:35 PM, Thomas F Herbert wrote:
>> Pravin,
>>
>> Another comment and question. Please see inline below.
>>
>> Thanks,
>> --Tom
>>
>> On 9/24/15 7:42 PM, Pravin Shelar wrote:
>>> On Thu, Sep 24, 2015 at 10:58 AM, Thomas F Herbert wrote:
>>>> Add support for 802.1ad including the ability to push and pop double
>>>> tagged vlans. Add support for 802.1ad to netlink parsing and flow
>>>> conversion. Uses double nested encap attributes to represent double
>>>> tagged vlan. Inner TPID encoded along with ctci in nested attributes.
>>>>
>>>> Signed-off-by: Thomas F Herbert
>>>> ---
>>>>  net/openvswitch/flow.c         |  83 +
>>>>  net/openvswitch/flow.h         |   5 ++
>>>>  net/openvswitch/flow_netlink.c | 166 ++---
>>>>  3 files changed, 230 insertions(+), 24 deletions(-)
>>>>
>>> ...
>>>> @@ -1320,6 +1437,7 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
>>>>  {
>>>>  	struct ovs_key_ethernet *eth_key;
>>>>  	struct nlattr *nla, *encap;
>>>> +	struct nlattr *in_encap = NULL;
>>>>
>>>>  	if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
>>>>  		goto nla_put_failure;
>>>> @@ -1368,17 +1486,42 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
>>>>  	ether_addr_copy(eth_key->eth_src, output->eth.src);
>>>>  	ether_addr_copy(eth_key->eth_dst, output->eth.dst);
>>>>
>>>> -	if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
>>>> +	if (swkey->eth.tci || eth_type_vlan(swkey->eth.type)) {
>>>>  		__be16 eth_type;
>>>> -		eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0x);
>>>> +
>>>> +		if (swkey->eth.cvlan.ctci ||
>>>> +		    eth_type_vlan(swkey->eth.cvlan.c_tpid))
>>>> +			eth_type = !is_mask ? htons(ETH_P_8021AD) :
>>>> +				htons(0x);
>>>> +		else
>>>> +			eth_type = !is_mask ? htons(ETH_P_8021Q) :
>>>> +				htons(0x);
>>>> +
>>>
>>> Here we can directly dump output->eth.type to netlink. No need to
>>> check for inner encap.
>>
>> The eth.type is set to the inner encapsulated protocol not to the tpid.
>> We don't "know" what the outer tpid is, so I assume it is 802.1Q. To
>> address this situation, do you think I should add the outer tpid to
>> sw_flow_key? Also see comment above in flow.h.
>
> With the addition of nested vlan, we need to add outer tpid. This will
> simplify vlan netlink serialization too.

Yes, thanks. I agree that this is the sensible approach.
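The approach agreed on above, keeping the outer TPID in the flow key instead of assuming 802.1Q, could look roughly like the following userspace sketch. The struct vlan_key layout and the helper names are hypothetical and are not taken from the posted patch; only the two TPID values (0x8100 for 802.1Q and 0x88A8 for 802.1ad) are standard.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define TPID_8021Q  0x8100 /* customer VLAN tag */
#define TPID_8021AD 0x88A8 /* service (outer) VLAN tag */

/* Hypothetical key layout that records both TPIDs explicitly.
 * All fields are in network byte order.
 */
struct vlan_key {
	uint16_t outer_tpid;
	uint16_t outer_tci;
	uint16_t inner_tpid;
	uint16_t inner_tci;
};

/* True if the (network byte order) ethertype is a VLAN TPID. */
static bool is_vlan_tpid(uint16_t ethertype_be)
{
	return ethertype_be == htons(TPID_8021Q) ||
	       ethertype_be == htons(TPID_8021AD);
}

/* A frame is double tagged when both recorded TPIDs are VLAN types. */
static bool is_double_tagged(const struct vlan_key *k)
{
	return is_vlan_tpid(k->outer_tpid) && is_vlan_tpid(k->inner_tpid);
}
```

With the outer TPID stored, serialization can emit the recorded value directly rather than choosing between ETH_P_8021AD and ETH_P_8021Q based on whether an inner tag is present.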
Re: [PATCH net-next v2 00/11] net: L3 master device
On 9/29/15 5:23 PM, David Miller wrote:
> From: David Ahern
> Date: Mon, 28 Sep 2015 10:16:50 -0700
>
>> v2
>> - rebased to top of net-next
>> - addressed Niks comments (checking master, removing extra lines, and
>>   flipping the order of patches 1 and 2)
>
> This still needs some work:
>
> ERROR: "l3mdev_master_ifindex_rcu" [net/ipv6/ipv6.ko] undefined!
> scripts/Makefile.modpost:90: recipe for target '__modpost' failed
> make[1]: *** [__modpost] Error 1
> Makefile:1095: recipe for target 'modules' failed
> make: *** [modules] Error 2

ugh. All of my builds have CONFIG_IPV6=y. Will kick out a v3 later.
Re: [PATCH net] skbuff: Fix skb checksum partial check.
From: Pravin B Shelar Date: Mon, 28 Sep 2015 17:24:25 -0700 > Earlier patch 6ae459bda tried to detect void checksum partial > skb by comparing pull length to checksum offset. But it does > not work for all cases since checksum-offset depends on > updates to skb->data. > > Following patch fixes it by validating checksum start offset > after skb->data pointer is updated. Negative value of checksum > offset start means there is no need to checksum. > > Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull") > Reported-by: Andrew Vagin > Signed-off-by: Pravin B Shelar > --- > This and 6ae459bda patches needs to be backported to stable. Applied and both queued up for -stable, thanks.
[PATCH net-next v2 0/3] BPF updates
Some minor updates to {cls,act}_bpf to retrieve routing realms and to make skb->priority writable. Thanks!

v1 -> v2:
- Dropped preclassify patch for now from the series as the rest is pretty much independent of it
- Rest unchanged, only rebased and already posted Acked-by's kept

Daniel Borkmann (3):
  ebpf: migrate bpf_prog's flags to bitfield
  sched, bpf: add helper for retrieving routing realms
  sched, bpf: make skb->priority writable

 arch/arm/net/bpf_jit_32.c | 2 +-
 arch/arm64/net/bpf_jit_comp.c | 2 +-
 arch/mips/net/bpf_jit.c | 2 +-
 arch/powerpc/net/bpf_jit_comp.c | 2 +-
 arch/s390/net/bpf_jit_comp.c | 2 +-
 arch/sparc/net/bpf_jit_comp.c | 2 +-
 arch/x86/net/bpf_jit_comp.c | 2 +-
 include/linux/filter.h | 7 +--
 include/uapi/linux/bpf.h | 7 +++
 kernel/bpf/core.c | 4
 kernel/bpf/syscall.c | 6 --
 net/core/filter.c | 33 ++---
 net/sched/cls_bpf.c | 8 ++--
 13 files changed, 63 insertions(+), 16 deletions(-)

-- 1.9.3
[PATCH net-next v2 3/3] sched, bpf: make skb->priority writable
{cls,act}_bpf can now set the skb->priority from an eBPF program based on various criteria, so that for example classful qdiscs like multiq can update the skb's priority during enqueue time and further push it down into subsequent qdiscs. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- net/core/filter.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 45c69ce..53a5036 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1721,6 +1721,7 @@ static bool tc_cls_act_is_valid_access(int off, int size, switch (off) { case offsetof(struct __sk_buff, mark): case offsetof(struct __sk_buff, tc_index): + case offsetof(struct __sk_buff, priority): case offsetof(struct __sk_buff, cb[0]) ... offsetof(struct __sk_buff, cb[4]): break; @@ -1762,8 +1763,12 @@ static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg, case offsetof(struct __sk_buff, priority): BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, priority) != 4); - *insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg, - offsetof(struct sk_buff, priority)); + if (type == BPF_WRITE) + *insn++ = BPF_STX_MEM(BPF_W, dst_reg, src_reg, + offsetof(struct sk_buff, priority)); + else + *insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg, + offsetof(struct sk_buff, priority)); break; case offsetof(struct __sk_buff, ingress_ifindex): -- 1.9.3
[PATCH net-next v2 1/3] ebpf: migrate bpf_prog's flags to bitfield
As we need to add further flags to the bpf_prog structure, let's migrate both bools to a bitfield representation. The size of the base structure (excluding insns) remains unchanged at 40 bytes. Also add tags for the kmemchecker, so that it doesn't throw false positives. Even in case gcc would generate suboptimal code, it's not being accessed in performance critical paths. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- arch/arm/net/bpf_jit_32.c | 2 +- arch/arm64/net/bpf_jit_comp.c | 2 +- arch/mips/net/bpf_jit.c | 2 +- arch/powerpc/net/bpf_jit_comp.c | 2 +- arch/s390/net/bpf_jit_comp.c| 2 +- arch/sparc/net/bpf_jit_comp.c | 2 +- arch/x86/net/bpf_jit_comp.c | 2 +- include/linux/filter.h | 6 -- kernel/bpf/core.c | 4 kernel/bpf/syscall.c| 4 ++-- net/core/filter.c | 2 +- 11 files changed, 18 insertions(+), 12 deletions(-) diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c index 876060b..0df5fd5 100644 --- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -1047,7 +1047,7 @@ void bpf_jit_compile(struct bpf_prog *fp) set_memory_ro((unsigned long)header, header->pages); fp->bpf_func = (void *)ctx.target; - fp->jited = true; + fp->jited = 1; out: kfree(ctx.offsets); return; diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index c047598..a44e529 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -744,7 +744,7 @@ void bpf_int_jit_compile(struct bpf_prog *prog) set_memory_ro((unsigned long)header, header->pages); prog->bpf_func = (void *)ctx.image; - prog->jited = true; + prog->jited = 1; out: kfree(ctx.offset); } diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c index 0c4a133..77cb273 100644 --- a/arch/mips/net/bpf_jit.c +++ b/arch/mips/net/bpf_jit.c @@ -1251,7 +1251,7 @@ void bpf_jit_compile(struct bpf_prog *fp) bpf_jit_dump(fp->len, alloc_size, 2, ctx.target); fp->bpf_func = (void *)ctx.target; - fp->jited = true; + fp->jited = 1; out: kfree(ctx.offsets); diff 
--git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index 17cea18..0478216 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -679,7 +679,7 @@ void bpf_jit_compile(struct bpf_prog *fp) ((u64 *)image)[1] = local_paca->kernel_toc; #endif fp->bpf_func = (void *)image; - fp->jited = true; + fp->jited = 1; } out: kfree(addrs); diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index eeda051..9a0c4c2 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -1310,7 +1310,7 @@ void bpf_int_jit_compile(struct bpf_prog *fp) if (jit.prg_buf) { set_memory_ro((unsigned long)header, header->pages); fp->bpf_func = (void *) jit.prg_buf; - fp->jited = true; + fp->jited = 1; } free_addrs: kfree(jit.addrs); diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c index f8b9f71..22564f5 100644 --- a/arch/sparc/net/bpf_jit_comp.c +++ b/arch/sparc/net/bpf_jit_comp.c @@ -812,7 +812,7 @@ cond_branch:f_offset = addrs[i + filter[i].jf]; if (image) { bpf_flush_icache(image, image + proglen); fp->bpf_func = (void *)image; - fp->jited = true; + fp->jited = 1; } out: kfree(addrs); diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 70efcd0..7599197 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1109,7 +1109,7 @@ void bpf_int_jit_compile(struct bpf_prog *prog) bpf_flush_icache(header, image + proglen); set_memory_ro((unsigned long)header, header->pages); prog->bpf_func = (void *)image; - prog->jited = true; + prog->jited = 1; } out: kfree(addrs); diff --git a/include/linux/filter.h b/include/linux/filter.h index fa2cab9..bad618f 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -326,8 +326,10 @@ struct bpf_binary_header { struct bpf_prog { u16 pages; /* Number of allocated pages */ - booljited; /* Is our filter JIT'ed? */ - boolgpl_compatible; /* Is our filter GPL compatible? 
*/ + kmemcheck_bitfield_begin(meta); + u16 jited:1,/* Is our filter JIT'ed? */ + gpl_compatible:1; /* Is filter GPL compatible? */ + kmemcheck_bitfield_end(meta); u32 len;/* Number of filter blocks */ e
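The size argument in the commit message can be checked in isolation. What follows is only a sketch under stated assumptions: struct prog_bools and struct prog_bits are hypothetical miniatures, not the real struct bpf_prog, and the uint16_t bit-field type relies on a compiler extension that GCC and Clang accept. It shows that swapping two one-byte bools for a 16-bit bitfield leaves the offset of the following member, and the overall size, unchanged on common ABIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the old layout: two bools follow a u16. */
struct prog_bools {
	uint16_t pages;
	bool jited;
	bool gpl_compatible;
	uint32_t len;
};

/* Hypothetical miniature of the new layout: flags share one u16 bitfield. */
struct prog_bits {
	uint16_t pages;
	uint16_t jited:1,
		 gpl_compatible:1;	/* 14 bits remain for future flags */
	uint32_t len;
};

/* Assumes sizeof(bool) == 1, as on the usual Linux ABIs. */
static_assert(sizeof(struct prog_bools) == sizeof(struct prog_bits),
	      "bitfield migration must not grow the structure");
static_assert(offsetof(struct prog_bools, len) ==
	      offsetof(struct prog_bits, len),
	      "members after the flags must not move");

/* Assigning 1 instead of true works the same way for a 1-bit field. */
int flags_roundtrip(void)
{
	struct prog_bits p = { .pages = 1, .jited = 1, .gpl_compatible = 0 };

	return p.jited == 1 && p.gpl_compatible == 0;
}
```

The two static assertions fail at compile time if the layouts diverge, which is the kind of check a later flag addition could reuse.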
[PATCH net-next v2 2/3] sched, bpf: add helper for retrieving routing realms
Using routing realms as part of the classifier is quite useful, it can be viewed as a tag for one or multiple routing entries (think of an analogy to net_cls cgroup for processes), set by user space routing daemons or via iproute2 as an indicator for traffic classifiers and later on processed in the eBPF program. Unlike actions, the classifier can inspect device flags and enable netif_keep_dst() if necessary. tc actions don't have that possibility, but in case people know what they are doing, it can be used from there as well (e.g. via devs that must keep dsts by design anyway). If a realm is set, the handler returns the non-zero realm. User space can set the full 32bit realm for the dst. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- include/linux/filter.h | 3 ++- include/uapi/linux/bpf.h | 7 +++ kernel/bpf/syscall.c | 2 ++ net/core/filter.c| 22 ++ net/sched/cls_bpf.c | 8 ++-- 5 files changed, 39 insertions(+), 3 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index bad618f..3d5fd24 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -328,7 +328,8 @@ struct bpf_prog { u16 pages; /* Number of allocated pages */ kmemcheck_bitfield_begin(meta); u16 jited:1,/* Is our filter JIT'ed? */ - gpl_compatible:1; /* Is filter GPL compatible? */ + gpl_compatible:1, /* Is filter GPL compatible? */ + dst_needed:1; /* Do we need dst entry? 
*/ kmemcheck_bitfield_end(meta); u32 len;/* Number of filter blocks */ enum bpf_prog_type type; /* Type of BPF program */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 4ec0b54..564f1f0 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -280,6 +280,13 @@ enum bpf_func_id { * Return: TC_ACT_REDIRECT */ BPF_FUNC_redirect, + + /** +* bpf_get_route_realm(skb) - retrieve a dst's tclassid +* @skb: pointer to skb +* Return: realm if != 0 +*/ + BPF_FUNC_get_route_realm, __BPF_FUNC_MAX_ID, }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 2190ab1..5f35f42 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -402,6 +402,8 @@ static void fixup_bpf_calls(struct bpf_prog *prog) */ BUG_ON(!prog->aux->ops->get_func_proto); + if (insn->imm == BPF_FUNC_get_route_realm) + prog->dst_needed = 1; if (insn->imm == BPF_FUNC_tail_call) { /* mark bpf_tail_call as different opcode * to avoid conditional branch in diff --git a/net/core/filter.c b/net/core/filter.c index 04664ac..45c69ce 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -49,6 +49,7 @@ #include #include #include +#include /** * sk_filter - run a packet through a socket filter @@ -1478,6 +1479,25 @@ static const struct bpf_func_proto bpf_get_cgroup_classid_proto = { .arg1_type = ARG_PTR_TO_CTX, }; +static u64 bpf_get_route_realm(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5) +{ +#ifdef CONFIG_IP_ROUTE_CLASSID + const struct dst_entry *dst; + + dst = skb_dst((struct sk_buff *) (unsigned long) r1); + if (dst) + return dst->tclassid; +#endif + return 0; +} + +static const struct bpf_func_proto bpf_get_route_realm_proto = { + .func = bpf_get_route_realm, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, +}; + static u64 bpf_skb_vlan_push(u64 r1, u64 r2, u64 vlan_tci, u64 r4, u64 r5) { struct sk_buff *skb = (struct sk_buff *) (long) r1; @@ -1648,6 +1668,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id) return 
bpf_get_skb_set_tunnel_key_proto(); case BPF_FUNC_redirect: return &bpf_redirect_proto; + case BPF_FUNC_get_route_realm: + return &bpf_get_route_realm_proto; default: return sk_filter_func_proto(func_id); } diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 7eeffaf6..5faaa54 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -262,7 +262,8 @@ static int cls_bpf_prog_from_ops(struct nlattr **tb, struct cls_bpf_prog *prog) return 0; } -static int cls_bpf_prog_from_efd(struct nlattr **tb, struct cls_bpf_prog *prog) +static int cls_bpf_prog_from_efd(struct nlattr **tb, struct cls_bpf_prog *prog, +const struct tcf_proto *tp) { struct bpf_prog *fp; char *name = NULL; @@ -294,6 +295,9 @@ static int cls_bpf_prog_from_efd(struc
Re: Poor IPv6 TCP performance in 4.3-rc3
Hi Russell, On Tue, Sep 29, 2015 at 8:32 PM, Russell King - ARM Linux wrote: > Hi, > > I'm seeing really poor IPv6 performance compared to IPv4. I've > checked using two different ARM platforms - an iMX6 platform using > the FEC driver, and an Armada 38x using mvneta. Does this patch help? https://patchwork.ozlabs.org/patch/523632/ It was suggested in the following thread: https://lkml.org/lkml/2015/9/29/258 and it seems to have fixed the performance issue. Regards, Fabio Estevam
Poor IPv6 TCP performance in 4.3-rc3
Hi,

I'm seeing really poor IPv6 performance compared to IPv4. I've checked using two different ARM platforms - an iMX6 platform using the FEC driver, and an Armada 38x using mvneta.

The following was captured using iperf between the target system and my laptop. The problem only occurs one-way. The 4.3-rc3 platform is running iperf in server mode, the laptop is in client mode.

Armada 38x:
ipv6: [ 4] 0.0-23.9 sec 170 KBytes 58.3 Kbits/sec
ipv4: [ 4] 0.0-10.0 sec 1.09 GBytes 936 Mbits/sec

iMX6Q:
ipv6: [ 4] 0.0-11.1 sec 640 KBytes 474 Kbits/sec
ipv4: [ 4] 0.0-10.0 sec 655 MBytes 549 Mbits/sec

iMX6D with 4.2:
ipv6: [ 4] 0.0-10.0 sec 685 MBytes 574 Mbits/sec
ipv4: [ 4] 0.0-10.0 sec 696 MBytes 583 Mbits/sec

It looks like there's an IPv6 regression between 4.2 and 4.3-rc3.

Turning GRO off on Armada 38x gives:
ipv6: [ 4] 0.0-10.0 sec 1.08 GBytes 923 Mbits/sec
ipv4: [ 5] 0.0-10.0 sec 1.09 GBytes 936 Mbits/sec

I haven't started to debug yet, but I thought I'd post a heads-up in case it's a known problem. I'll try to get some packet logs on Thursday, and I'll try to bisect.

-- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: [net-next PATCH 0/3] Minor IPv4 routing cleanups
From: Alexander Duyck Date: Mon, 28 Sep 2015 11:10:25 -0700 > These patches just contain some minor cleanups to address a few minor > issues. The first and the third mostly just improve readability. The > second patch should improve the performance for multicast destination > addresses that do not have a localhost source IP address by avoiding some > unnecessary dereferences. Series applied, thanks.
Re: [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv
Tom Herbert wrote: > Call before performing NF_HOOK and routing in order to perform address > translation in the receive path. > > Signed-off-by: Tom Herbert > --- > net/ipv6/ip6_input.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c > index 9075acf..06dac55 100644 > --- a/net/ipv6/ip6_input.c > +++ b/net/ipv6/ip6_input.c > @@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, > struct packet_type *pt > /* Must drop socket now because of tproxy. */ > skb_orphan(skb); > > + /* Translate destination address before routing */ > + xfrm6_xlat_addr(skb); > + Ugh. Yet another hook :-( One would think we have enough by now. In any case, I still think this ILA translation stuff should either go into xtables (NPT-ish), nftables, or into tc if nft is unusable for whatever reason. Judging by where this hook is placed, nf hooks would work just fine. If the iptables traverser has too high cost (unfortunately, xtables design enforces counters and iface name matching even if it's not wanted/unneeded for instance), maybe nft would perform better in that regard.
Re: [PATCH net-next v2 00/11] net: L3 master device
From: David Ahern
Date: Mon, 28 Sep 2015 10:16:50 -0700

> v2
> - rebased to top of net-next
>
> - addressed Nik's comments (checking master, removing extra lines, and
>   flipping the order of patches 1 and 2)

This still needs some work:

ERROR: "l3mdev_master_ifindex_rcu" [net/ipv6/ipv6.ko] undefined!
scripts/Makefile.modpost:90: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
Makefile:1095: recipe for target 'modules' failed
make: *** [modules] Error 2
Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr
Hi Tom, [auto build test results on next-20150929 -- if it's inappropriate base, please ignore] reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> net/ipv6/ila/ila_xlat.c:218:24: sparse: incompatible types in comparison >> expression (different address spaces) net/ipv6/ila/ila_xlat.c:269:32: sparse: incompatible types in comparison expression (different address spaces) net/ipv6/ila/ila_xlat.c:275:25: sparse: incompatible types in comparison expression (different address spaces) net/ipv6/ila/ila_xlat.c:279:25: sparse: incompatible types in comparison expression (different address spaces) net/ipv6/ila/ila_xlat.c:315:31: sparse: incompatible types in comparison expression (different address spaces) net/ipv6/ila/ila_xlat.c:329:32: sparse: incompatible types in comparison expression (different address spaces) net/ipv6/ila/ila_xlat.c:201:23: sparse: incompatible types in comparison expression (different address spaces) net/ipv6/ila/ila_xlat.c:514:31: sparse: incompatible types in comparison expression (different address spaces) net/ipv6/ila/ila_xlat.c:184:23: sparse: incompatible types in comparison expression (different address spaces) net/ipv6/ila/ila_xlat.c: In function 'ila_xlat_fini': net/ipv6/ila/ila_xlat.c:636:6: warning: unused variable 'i' [-Wunused-variable] int i; ^ vim +218 net/ipv6/ila/ila_xlat.c 202 } 203 204 return NULL; 205 } 206 207 static inline void ila_release(struct ila_map *ila) 208 { 209 kfree_rcu(ila, rcu); 210 } 211 212 static void ila_free_cb(void *ptr, void *arg) 213 { 214 struct ila_map *ila = (struct ila_map *)ptr, *next; 215 216 /* Assume rcu_readlock held */ 217 while (ila) { > 218 next = rcu_access_pointer(ila->next); 219 ila_release(ila); 220 ila = next; 221 } 222 } 223 224 static int ila_add_mapping(struct net *net, struct ila_xlat_params *p) 225 { 226 struct ila_net *ilan = net_generic(net, ila_net_id); --- 0-DAY kernel test infrastructureOpen 
Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: question about potential integer truncation in mwifiex_set_wapi_ie and mwifiex_set_wps_ie
On Tue, Sep 29, 2015 at 05:21:28PM +0200, PaX Team wrote: > hi all, > > in drivers/net/wireless/mwifiex/sta_ioctl.c the following functions > > mwifiex_set_wpa_ie_helper > mwifiex_set_wapi_ie > mwifiex_set_wps_ie > > can truncate the incoming ie_len argument from u16 to u8 when it gets > stored in mwifiex_private.wpa_ie_len, mwifiex_private.wapi_ie_len and > mwifiex_private.wps_ie_len, respectively. based on some light code > reading it seems a length value of 256 is valid (IEEE_MAX_IE_SIZE and > MWIFIEX_MAX_VSIE_LEN seem to limit it) and thus would get truncated > to 0 when stored in those u8 fields. the question is whether this is > intentional or a bug somewhere. i agree, while there is a test to ensure ie_len is not greater than 256, there is a possibility that it will be exactly 256, which means 256 bytes will be given to memcpy but mwifiex_private.{wpa,wapi,wps}_ie_len will be zero. i suggest changing the lengths to u16. not tested. diff --git a/drivers/net/wireless/mwifiex/main.h b/drivers/net/wireless/mwifiex/main.h index fe12560..b66e9a7 100644 --- a/drivers/net/wireless/mwifiex/main.h +++ b/drivers/net/wireless/mwifiex/main.h @@ -512,14 +512,14 @@ struct mwifiex_private { struct mwifiex_wep_key wep_key[NUM_WEP_KEYS]; u16 wep_key_curr_index; u8 wpa_ie[256]; - u8 wpa_ie_len; + u16 wpa_ie_len; u8 wpa_is_gtk_set; struct host_cmd_ds_802_11_key_material aes_key; struct host_cmd_ds_802_11_key_material_v2 aes_key_v2; u8 wapi_ie[256]; - u8 wapi_ie_len; + u16 wapi_ie_len; u8 *wps_ie; - u8 wps_ie_len; + u16 wps_ie_len; u8 wmm_required; u8 wmm_enabled; u8 wmm_qosinfo; -- James Cameron http://quozl.linux.org.au/
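The boundary case raised in this thread is easy to reproduce outside the driver. The sketch below is not mwifiex code: store_len_u8() and store_len_u16() are hypothetical stand-ins for assigning ie_len to a u8 or u16 length field. It only illustrates that 256 does not fit in eight bits.

```c
#include <assert.h>
#include <stdint.h>

/* Models storing a u16 length into a u8 field: only the low 8 bits survive. */
uint8_t store_len_u8(uint16_t ie_len)
{
	return (uint8_t)ie_len;
}

/* Models the suggested fix: a u16 field keeps the full value. */
uint16_t store_len_u16(uint16_t ie_len)
{
	return ie_len;
}

/* 255 round-trips, 256 wraps to 0 in a u8 but is preserved in a u16. */
void truncation_example(void)
{
	assert(store_len_u8(255) == 255);
	assert(store_len_u8(256) == 0);
	assert(store_len_u16(256) == 256);
}
```

With a u8 field, a 256-byte IE would be copied in full while its recorded length reads as zero, which is the mismatch described above.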
Re: [PATCH net 3/7] openvswitch: Fix skb leak in ovs_fragment()
On 29 September 2015 at 15:48, Rustad, Mark D wrote: >> On Sep 29, 2015, at 3:39 PM, Joe Stringer wrote: >> >> @@ -728,8 +727,14 @@ static void ovs_fragment(struct vport *vport, struct >> sk_buff *skb, u16 mru, >> WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.", >> ovs_vport_name(vport), ntohs(ethertype), mru, >> vport->dev->mtu); >> - kfree_skb(skb); >> + goto out; >> } >> + >> + skb = NULL; >> + >> +out: >> + if (skb) >> + kfree_skb(skb); >> } >> >> static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, > > Wouldn't that hunk be better as: > > @@ -728,8 +727,13 @@ static void ovs_fragment(struct vport *vport, struct > sk_buff *skb, u16 mru, > WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, > MTU=%d.", > ovs_vport_name(vport), ntohs(ethertype), mru, > vport->dev->mtu); > - kfree_skb(skb); > + goto out; > } > + > + return; > + > +out: > + kfree_skb(skb); > } > > static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, > > -- > Mark Rustad, Networking Division, Intel Corporation Sure thing, I'll roll this change in to a v2 when the rest of the series is reviewed.
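The shape agreed on here (return early on success, free only on the error label) can be tried in user space. This is a minimal sketch, assuming hypothetical names throughout: struct pkt, pkt_new(), consume() and fragment() stand in for the skb, its allocation, the successful fragmentation path and ovs_fragment() respectively. None of them are kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb; not a kernel type. */
struct pkt {
	int len;
};

static int dropped;	/* packets freed on the error path */

static struct pkt *pkt_new(int len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p)
		p->len = len;
	return p;
}

/* On success the consumer takes ownership and frees the packet. */
static void consume(struct pkt *p)
{
	free(p);
}

static void fragment(struct pkt *p, int mtu)
{
	if (!p)
		return;

	if (p->len > mtu)
		goto err;

	consume(p);
	return;		/* ownership handed off, nothing left to free */

err:
	dropped++;
	free(p);
}

/* Assumes the two small allocations succeed. */
void fragment_example(void)
{
	dropped = 0;
	fragment(pkt_new(100), 1500);	/* fits: consumed, not dropped */
	assert(dropped == 0);
	fragment(pkt_new(9000), 1500);	/* too large: freed at err */
	assert(dropped == 1);
}
```

Compared with setting the pointer to NULL and testing it at the label, the early return keeps the free unconditional, so the error label has a single job.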
Re: [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function
Hi Tom: On 9/29/15 4:17 PM, Tom Herbert wrote: This patch adds xfrm6_xlat_addr which is called in the data path to perform address translation (primarily for the receive path). Modules may register their own callback to perform a translation-- this registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del. xfrm6_xlat_addr allows translation of addresses for an sk_buff. Seems like a stretch to lump this into xfrms. You have a separate genl based config as opposed to the netlink xfrm API and you are calling the xlat_addr function directly in ip6_rcv as opposed to via some policy with dst_ops driven redirection. Why call this a xfrm?
Re: [PATCH 3/3] 802.1AD: Flow handling, actions, vlan parsing and netlink attributes
On Fri, Sep 25, 2015 at 3:35 PM, Thomas F Herbert wrote: > Pravin, > > Another comment and question. Please see inline below. > > Thanks, > > --Tom > > On 9/24/15 7:42 PM, Pravin Shelar wrote: >> >> On Thu, Sep 24, 2015 at 10:58 AM, Thomas F Herbert >> wrote: >>> >>> Add support for 802.1ad including the ability to push and pop double >>> tagged vlans. Add support for 802.1ad to netlink parsing and flow >>> conversion. Uses double nested encap attributes to represent double >>> tagged vlan. Inner TPID encoded along with ctci in nested attributes. >>> >>> Signed-off-by: Thomas F Herbert >>> --- >>> net/openvswitch/flow.c | 83 + >>> net/openvswitch/flow.h | 5 ++ >>> net/openvswitch/flow_netlink.c | 166 >>> ++--- >>> 3 files changed, 230 insertions(+), 24 deletions(-) >>> ... >>> @@ -1320,6 +1437,7 @@ static int __ovs_nla_put_key(const struct >>> sw_flow_key *swkey, >>> { >>> struct ovs_key_ethernet *eth_key; >>> struct nlattr *nla, *encap; >>> + struct nlattr *in_encap = NULL; >>> >>> if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id)) >>> goto nla_put_failure; >>> @@ -1368,17 +1486,42 @@ static int __ovs_nla_put_key(const struct >>> sw_flow_key *swkey, >>> ether_addr_copy(eth_key->eth_src, output->eth.src); >>> ether_addr_copy(eth_key->eth_dst, output->eth.dst); >>> >>> - if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) { >>> + if (swkey->eth.tci || eth_type_vlan(swkey->eth.type)) { >>> __be16 eth_type; >>> - eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0x); >>> + >>> + if (swkey->eth.cvlan.ctci || >>> + eth_type_vlan(swkey->eth.cvlan.c_tpid)) >>> + eth_type = !is_mask ? htons(ETH_P_8021AD) : >>> + htons(0x); >>> + else >>> + eth_type = !is_mask ? htons(ETH_P_8021Q) : >>> + htons(0x); >>> + >> >> Here we can directly dump output->eth.type to netlink. No need to >> check for inner encap. > > The eth.type is set to the inner encapsulated protocol, not to the tpid. We > don't "know" what the outer tpid is, so I assume it is 802.1Q. 
To address this > situation, do you think I should add the outer tpid to sw_flow_key? > Also see comment above in flow.h. > With the addition of nested vlan, we need to add outer tpid. This will simplify vlan netlink serialization too.
Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
On Tue, Sep 29, 2015 at 12:12 AM, David Woodhouse wrote: > On Mon, 2015-09-28 at 20:04 -0700, Tom Herbert wrote: >> >> > I've been pondering a bit of a redesign in this space. I think the >> > skb struct should be explicit in its instructions to hardware for >> > which offloads to do for each packet. >> > >> > In this way, the stack would be *directly* telling the drivers what to >> > do (and what not to do), solving all sorts of bugs and really improving >> > driver reliability and implementation. >> > >> Doesn't CHECKSUM_PARTIAL with csum_offset and csum_start already tell >> the driver unambiguously what to do wrt checksum offload? > > Right. That's precisely what we *do* have. But as things stand, we > can't *use* it to its full capability. > > It's fine for decent devices which can handle such explicit > instructions (advertised by the NETIF_F_HW_CSUM feature). > > The problem is the crappy devices that can *only* checksum UDP and TCP > frames, advertised with the NETIF_F_IP{V6,}_CSUM features. We make a > primitive attempt *not* to feed arbitrary checksum requests to such > hardware. But we fail — we end up feeding *all* Legacy IP packets to a > NETIF_F_IP_CSUM device, and *all* IPv6 packets to a NETIF_F_IPV6_CSUM > device, regardless of whether they're *actually* TCP or UDP packets. > Please look at ixgbe_tx_csum in ixgbe driver. This one example of how a driver can determine whether the checksum being offloaded is TCP or UDP. The bug in this driver is that skb_checksum_help is not called for a protocol the driver isn't looking for. In particular, I believe this driver will probably send packets with invalid checksums when TCP/UDP is used with IPv6 packets that contain extension headers. Tom > That's the problem I'm trying to solve. And then we *can* make full use > of the generic checksum offload (I had it working for ICMPv6 at one > point: http://lists.openwall.net/netdev/2013/01/14/38 ). 
> > -- > David Woodhouse, Open Source Technology Centre > david.woodho...@intel.com Intel Corporation
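The driver-side decision discussed in this thread can be sketched as a small pure function. Everything below is a hypothetical model, not ixgbe code: struct tx_pkt, enum csum_action and pick_csum_action() are invented for illustration, and CSUM_SOFTWARE merely stands for the point where a real driver would fall back to skb_checksum_help(). The assumption is a device that can only checksum TCP and UDP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers. */
enum { PROTO_TCP = 6, PROTO_UDP = 17, PROTO_ICMPV6 = 58 };

enum csum_action {
	CSUM_NONE,		/* nothing requested */
	CSUM_HW_OFFLOAD,	/* device can finish the checksum */
	CSUM_SOFTWARE,		/* models falling back to skb_checksum_help() */
};

/* Hypothetical summary of what a driver knows about an outgoing packet. */
struct tx_pkt {
	bool csum_partial;	/* stack asked for checksum completion */
	uint8_t l4_proto;	/* protocol that owns the checksum field */
};

/* For a device limited to TCP/UDP: never hand it anything else. */
enum csum_action pick_csum_action(const struct tx_pkt *pkt)
{
	if (!pkt->csum_partial)
		return CSUM_NONE;

	switch (pkt->l4_proto) {
	case PROTO_TCP:
	case PROTO_UDP:
		return CSUM_HW_OFFLOAD;
	default:
		return CSUM_SOFTWARE;
	}
}

void csum_example(void)
{
	struct tx_pkt tcp = { .csum_partial = true, .l4_proto = PROTO_TCP };
	struct tx_pkt icmp6 = { .csum_partial = true, .l4_proto = PROTO_ICMPV6 };

	assert(pick_csum_action(&tcp) == CSUM_HW_OFFLOAD);
	assert(pick_csum_action(&icmp6) == CSUM_SOFTWARE);
}
```

The default branch is the part the thread says is missing in practice: without it, a packet the hardware cannot handle would go out with an unfinished checksum.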
Re: [PATCH net 3/7] openvswitch: Fix skb leak in ovs_fragment()
> On Sep 29, 2015, at 3:39 PM, Joe Stringer wrote: > > @@ -728,8 +727,14 @@ static void ovs_fragment(struct vport *vport, struct > sk_buff *skb, u16 mru, > WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.", > ovs_vport_name(vport), ntohs(ethertype), mru, > vport->dev->mtu); > - kfree_skb(skb); > + goto out; > } > + > + skb = NULL; > + > +out: > + if (skb) > + kfree_skb(skb); > } > > static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, Wouldn't that hunk be better as: @@ -728,8 +727,13 @@ static void ovs_fragment(struct vport *vport, struct sk_buff *skb, u16 mru, WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.", ovs_vport_name(vport), ntohs(ethertype), mru, vport->dev->mtu); - kfree_skb(skb); + goto out; } + + return; + +out: + kfree_skb(skb); } static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, -- Mark Rustad, Networking Division, Intel Corporation
Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr
Hi Tom,

[auto build test results on next-20150929 -- if it's inappropriate base, please ignore]

config: m68k-sun3_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout c505336670b5c681c0a36053a68591e0f9074245
  # save the attached .config to linux build tree
  make.cross ARCH=m68k

All error/warnings (new ones prefixed by >>):

>> ERROR: "xfrm6_xlat_addr_fini" [net/ipv6/ipv6.ko] undefined!
>> ERROR: "xfrm6_xlat_addr_init" [net/ipv6/ipv6.ko] undefined!

---
0-DAY kernel test infrastructure, Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[PATCH net 6/7] openvswitch: Extend ct_state match field to 32 bits
The ct_state field was initially added as an 8-bit field, however six of the bits are already being used and use cases are already starting to appear that may push the limits of this field. This patch extends the field to 32 bits while retaining the internal representation of 8 bits. This should cover forward compatibility of the ABI for the foreseeable future. This patch also reorders the OVS_CS_F_* bits to be sequential. Suggested-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- include/uapi/linux/openvswitch.h | 8 net/openvswitch/conntrack.c | 2 +- net/openvswitch/conntrack.h | 4 ++-- net/openvswitch/flow_netlink.c | 8 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 7cbb9d5..f121af5 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -323,7 +323,7 @@ enum ovs_key_attr { OVS_KEY_ATTR_MPLS, /* array of struct ovs_key_mpls. * The implementation may restrict * the accepted length of the array. */ - OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */ + OVS_KEY_ATTR_CT_STATE, /* u32 bitmask of OVS_CS_F_* */ OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */ OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */ OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */ @@ -449,9 +449,9 @@ struct ovs_key_ct_label { #define OVS_CS_F_ESTABLISHED 0x02 /* Part of an existing connection. */ #define OVS_CS_F_RELATED 0x04 /* Related to an established * connection. */ -#define OVS_CS_F_INVALID 0x20 /* Could not track connection. */ -#define OVS_CS_F_REPLY_DIR 0x40 /* Flow is in the reply direction. */ -#define OVS_CS_F_TRACKED 0x80 /* Conntrack has occurred. */ +#define OVS_CS_F_REPLY_DIR 0x08 /* Flow is in the reply direction. */ +#define OVS_CS_F_INVALID 0x10 /* Could not track connection. */ +#define OVS_CS_F_TRACKED 0x20 /* Conntrack has occurred. */ /** * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands. 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 8c5d482c..167cf43 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -167,7 +167,7 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key) int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb) { - if (nla_put_u8(skb, OVS_KEY_ATTR_CT_STATE, key->ct.state)) + if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, key->ct.state)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) && diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h index c658d95..7a125422 100644 --- a/net/openvswitch/conntrack.h +++ b/net/openvswitch/conntrack.h @@ -35,7 +35,7 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key); int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb); void ovs_ct_free_action(const struct nlattr *a); -static inline bool ovs_ct_state_supported(u8 state) +static inline bool ovs_ct_state_supported(u32 state) { return !(state & ~(OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED | OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR | @@ -53,7 +53,7 @@ static inline bool ovs_ct_verify(struct net *net, int attr) return false; } -static inline bool ovs_ct_state_supported(u8 state) +static inline bool ovs_ct_state_supported(u32 state) { return false; } diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index c4917c9..292eb13 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -291,7 +291,7 @@ size_t ovs_key_attr_size(void) + nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */ + nla_total_size(4) /* OVS_KEY_ATTR_DP_HASH */ + nla_total_size(4) /* OVS_KEY_ATTR_RECIRC_ID */ - + nla_total_size(1) /* OVS_KEY_ATTR_CT_STATE */ + + nla_total_size(4) /* OVS_KEY_ATTR_CT_STATE */ + nla_total_size(2) /* OVS_KEY_ATTR_CT_ZONE */ + nla_total_size(4) /* OVS_KEY_ATTR_CT_MARK */ + nla_total_size(16) /* OVS_KEY_ATTR_CT_LABELS */ @@ -349,7 +349,7 @@ static const struct 
ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { [OVS_KEY_ATTR_TUNNEL]= { .len = OVS_ATTR_NESTED, .next = ovs_tunnel_key_lens, }, [OVS_KEY_ATTR_MPLS] = { .len = sizeof(struct ovs_key_mpls) }, - [OVS_KEY_ATTR_CT_STATE] = { .len = sizeof(u8) }, + [OVS_KEY_ATTR_CT_STATE] = { .len = sizeof(u32) }, [OVS_KEY_ATTR_CT_ZONE] = { .len = sizeof(u16) }, [OVS_KEY_ATTR_CT_MARK] = { .len = sizeof(u32) }, [OV
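For readers following the series, the combined effect of this patch and 5/7 can be sketched in plain userspace C: the value arrives as 32 bits, unsupported bits are rejected, and only then is it narrowed to the 8-bit internal representation. This is an illustration under stated assumptions, not kernel code: the flag values are copied from the reordered defines in the hunk above, OVS_CS_F_NEW is assumed to be 0x01 (it is not shown in the hunk), and ct_state_from_wire() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as reordered by the patch; OVS_CS_F_NEW is an assumption. */
#define OVS_CS_F_NEW         0x01
#define OVS_CS_F_ESTABLISHED 0x02
#define OVS_CS_F_RELATED     0x04
#define OVS_CS_F_REPLY_DIR   0x08
#define OVS_CS_F_INVALID     0x10
#define OVS_CS_F_TRACKED     0x20

#define CT_STATE_SUPPORTED (OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED | \
			    OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR | \
			    OVS_CS_F_INVALID | OVS_CS_F_TRACKED)

/* True only if no bit outside the supported set is present. */
static bool ct_state_supported(uint32_t state)
{
	return !(state & ~(uint32_t)CT_STATE_SUPPORTED);
}

/*
 * Hypothetical helper: validate the 32-bit wire value first, then
 * narrow it to the 8-bit internal form. Returns 0 on success, -1 if
 * any unsupported bit is set.
 */
static int ct_state_from_wire(uint32_t wire, uint8_t *internal)
{
	if (!ct_state_supported(wire))
		return -1;
	*internal = (uint8_t)wire;
	return 0;
}
```

Validating before narrowing matters: a bit above bit 7 would otherwise be silently dropped by the cast instead of being rejected.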
[PATCH net 1/7] openvswitch: Make LABELS name more consistent
Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name for these to be consistent with conntrack. Fixes: c2ac667 "openvswitch: Allow matching on conntrack label" Signed-off-by: Joe Stringer --- include/uapi/linux/openvswitch.h | 4 ++-- net/openvswitch/actions.c| 2 +- net/openvswitch/conntrack.c | 10 +- net/openvswitch/flow_netlink.c | 14 +++--- 4 files changed, 15 insertions(+), 15 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 32e07d8..9afcd60 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -326,7 +326,7 @@ enum ovs_key_attr { OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */ OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */ OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */ - OVS_KEY_ATTR_CT_LABEL, /* 16-octet connection tracking label */ + OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */ #ifdef __KERNEL__ OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */ @@ -633,7 +633,7 @@ enum ovs_ct_attr { OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */ OVS_CT_ATTR_ZONE, /* u16 zone id. */ OVS_CT_ATTR_MARK, /* mark to associate with this connection. */ - OVS_CT_ATTR_LABEL, /* label to associate with this connection. */ + OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */ OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of related connections. 
*/ __OVS_CT_ATTR_MAX diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 315f533..e23a61c 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -968,7 +968,7 @@ static int execute_masked_set_action(struct sk_buff *skb, case OVS_KEY_ATTR_CT_STATE: case OVS_KEY_ATTR_CT_ZONE: case OVS_KEY_ATTR_CT_MARK: - case OVS_KEY_ATTR_CT_LABEL: + case OVS_KEY_ATTR_CT_LABELS: err = -EINVAL; break; } diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 002a755..8c5d482c 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -179,7 +179,7 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) && - nla_put(skb, OVS_KEY_ATTR_CT_LABEL, sizeof(key->ct.label), + nla_put(skb, OVS_KEY_ATTR_CT_LABELS, sizeof(key->ct.label), &key->ct.label)) return -EMSGSIZE; @@ -545,7 +545,7 @@ static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = { .maxlen = sizeof(u16) }, [OVS_CT_ATTR_MARK] = { .minlen = sizeof(struct md_mark), .maxlen = sizeof(struct md_mark) }, - [OVS_CT_ATTR_LABEL] = { .minlen = sizeof(struct md_label), + [OVS_CT_ATTR_LABELS]= { .minlen = sizeof(struct md_label), .maxlen = sizeof(struct md_label) }, [OVS_CT_ATTR_HELPER]= { .minlen = 1, .maxlen = NF_CT_HELPER_NAME_LEN } @@ -593,7 +593,7 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info, } #endif #ifdef CONFIG_NF_CONNTRACK_LABELS - case OVS_CT_ATTR_LABEL: { + case OVS_CT_ATTR_LABELS: { struct md_label *label = nla_data(a); info->label = *label; @@ -633,7 +633,7 @@ bool ovs_ct_verify(struct net *net, enum ovs_key_attr attr) attr == OVS_KEY_ATTR_CT_MARK) return true; if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) && - attr == OVS_KEY_ATTR_CT_LABEL) { + attr == OVS_KEY_ATTR_CT_LABELS) { struct ovs_net *ovs_net = net_generic(net, ovs_net_id); return ovs_net->xt_label; @@ -711,7 +711,7 @@ int ovs_ct_action_to_attr(const struct 
ovs_conntrack_info *ct_info, &ct_info->mark)) return -EMSGSIZE; if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) && - nla_put(skb, OVS_CT_ATTR_LABEL, sizeof(ct_info->label), + nla_put(skb, OVS_CT_ATTR_LABELS, sizeof(ct_info->label), &ct_info->label)) return -EMSGSIZE; if (ct_info->helper) { diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 5c030a4..ea82cd5 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -294,7 +294,7 @@ size_t ovs_key_attr_size(void) + nla_total_size(1) /* OVS_KEY_ATTR_CT_STATE */ + nla_total_size(2) /* OVS_KEY_ATTR_CT_ZONE */ + nla_total_size(4) /* OVS_KEY_ATTR_CT_MARK */ -
[PATCH net 4/7] openvswitch: Ensure flow is valid before executing ct
The ct action uses parts of the flow key, so we need to ensure that it is valid before executing that action. Fixes: 7f8a436 "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer --- net/openvswitch/actions.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index e1afbd1..9a88f15 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -1104,6 +1104,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, break; case OVS_ACTION_ATTR_CT: + if (!is_flow_key_valid(key)) { + err = ovs_flow_key_update(skb, key); + if (err) + return err; + } + err = ovs_ct_execute(ovs_dp_get_net(dp), skb, key, nla_data(a)); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net 2/7] openvswitch: Fix typos in CT headers
These comments hadn't caught up to their implementations, fix them. Fixes: 7f8a436 "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer --- include/uapi/linux/openvswitch.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 9afcd60..7cbb9d5 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -630,7 +630,7 @@ struct ovs_action_hash { */ enum ovs_ct_attr { OVS_CT_ATTR_UNSPEC, - OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */ + OVS_CT_ATTR_FLAGS, /* u32 bitmask of OVS_CT_F_*. */ OVS_CT_ATTR_ZONE, /* u16 zone id. */ OVS_CT_ATTR_MARK, /* mark to associate with this connection. */ OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */ @@ -705,7 +705,7 @@ enum ovs_action_attr { * data immediately followed by a mask. * The data must be zero for the unmasked * bits. */ - OVS_ACTION_ATTR_CT, /* One nested OVS_CT_ATTR_* . */ + OVS_ACTION_ATTR_CT, /* Nested OVS_CT_ATTR_* . */ __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted * from userspace. */ -- 2.1.4
[PATCH net 5/7] openvswitch: Reject ct_state unsupported bits
Previously, if userspace specified ct_state bits in the flow key which are currently undefined (and therefore unsupported), then they would be ignored. This could cause unexpected behaviour in future if userspace is extended to support additional bits but attempts to communicate with the current version of the kernel. This patch rectifies the situation by rejecting such ct_state bits. Fixes: 7f8a436 "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer --- net/openvswitch/conntrack.h| 12 net/openvswitch/flow_netlink.c | 6 ++ 2 files changed, 18 insertions(+) diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h index 43f5dd7..c658d95 100644 --- a/net/openvswitch/conntrack.h +++ b/net/openvswitch/conntrack.h @@ -34,6 +34,13 @@ int ovs_ct_execute(struct net *, struct sk_buff *, struct sw_flow_key *, void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key); int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb); void ovs_ct_free_action(const struct nlattr *a); + +static inline bool ovs_ct_state_supported(u8 state) +{ + return !(state & ~(OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED | +OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR | +OVS_CS_F_INVALID | OVS_CS_F_TRACKED)); +} #else #include @@ -46,6 +53,11 @@ static inline bool ovs_ct_verify(struct net *net, int attr) return false; } +static inline bool ovs_ct_state_supported(u8 state) +{ + return false; +} + static inline int ovs_ct_copy_action(struct net *net, const struct nlattr *nla, const struct sw_flow_key *key, struct sw_flow_actions **acts, bool log) diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index ea82cd5..c4917c9 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -816,6 +816,12 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match, ovs_ct_verify(net, OVS_KEY_ATTR_CT_STATE)) { u8 ct_state = nla_get_u8(a[OVS_KEY_ATTR_CT_STATE]); + if (!is_mask && 
!ovs_ct_state_supported(ct_state)) { + OVS_NLERR(log, "ct_state flags %02x unsupported", + ct_state); + return -EINVAL; + } + SW_FLOW_KEY_PUT(match, ct.state, ct_state, is_mask); *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_STATE); } -- 2.1.4
[PATCH net 7/7] openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT
Previously, the CT_ATTR_FLAGS attribute, when nested under the OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the semantics of the ct action. It's more extensible to just represent each flag as a nested attribute, and this requires no additional error checking to reject flags that aren't currently supported. Suggested-by: Ben Pfaff Signed-off-by: Joe Stringer --- include/uapi/linux/openvswitch.h | 14 -- net/openvswitch/conntrack.c | 20 +--- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index f121af5..e14563e 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -618,7 +618,9 @@ struct ovs_action_hash { /** * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action. - * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags. + * @OVS_CT_ATTR_COMMIT: If present, commits the connection to the conntrack + * table. This allows future packets for the same connection to be identified + * as 'established' or 'related'. * @OVS_CT_ATTR_ZONE: u16 connection tracking zone. * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the * mask, the corresponding bit in the value is copied to the connection @@ -630,7 +632,7 @@ struct ovs_action_hash { */ enum ovs_ct_attr { OVS_CT_ATTR_UNSPEC, - OVS_CT_ATTR_FLAGS, /* u32 bitmask of OVS_CT_F_*. */ + OVS_CT_ATTR_COMMIT, /* No argument, commits connection. */ OVS_CT_ATTR_ZONE, /* u16 zone id. */ OVS_CT_ATTR_MARK, /* mark to associate with this connection. */ OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */ @@ -641,14 +643,6 @@ enum ovs_ct_attr { #define OVS_CT_ATTR_MAX (__OVS_CT_ATTR_MAX - 1) -/* - * OVS_CT_ATTR_FLAGS flags - bitmask of %OVS_CT_F_* - * @OVS_CT_F_COMMIT: Commits the flow to the conntrack table. This allows - * future packets for the same connection to be identified as 'established' - * or 'related'. 
- */ -#define OVS_CT_F_COMMIT 0x01 - /** * enum ovs_action_attr - Action types. * diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 167cf43..effa78c 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -42,12 +42,18 @@ struct md_label { struct ovs_key_ct_label mask; }; +/* Flags for performing connection tracking. + * + * CT_F_COMMIT: Commits the flow to the conntrack table. + */ +#define CT_F_COMMIT BIT(0) + /* Conntrack action context for execution. */ struct ovs_conntrack_info { struct nf_conntrack_helper *helper; struct nf_conntrack_zone zone; struct nf_conn *ct; - u32 flags; + u8 flags; /* bitmask of CT_F_*. */ u16 family; struct md_mark mark; struct md_label label; @@ -493,7 +499,7 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb, return err; } - if (info->flags & OVS_CT_F_COMMIT) + if (info->flags & CT_F_COMMIT) err = ovs_ct_commit(net, key, info, skb); else err = ovs_ct_lookup(net, key, info, skb); @@ -539,8 +545,7 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name, } static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = { - [OVS_CT_ATTR_FLAGS] = { .minlen = sizeof(u32), - .maxlen = sizeof(u32) }, + [OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 }, [OVS_CT_ATTR_ZONE] = { .minlen = sizeof(u16), .maxlen = sizeof(u16) }, [OVS_CT_ATTR_MARK] = { .minlen = sizeof(struct md_mark), @@ -576,8 +581,8 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info, } switch (type) { - case OVS_CT_ATTR_FLAGS: - info->flags = nla_get_u32(a); + case OVS_CT_ATTR_COMMIT: + info->flags |= CT_F_COMMIT; break; #ifdef CONFIG_NF_CONNTRACK_ZONES case OVS_CT_ATTR_ZONE: @@ -701,7 +706,8 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info *ct_info, if (!start) return -EMSGSIZE; - if (nla_put_u32(skb, OVS_CT_ATTR_FLAGS, ct_info->flags)) + if (ct_info->flags & CT_F_COMMIT && + nla_put_flag(skb, OVS_CT_ATTR_COMMIT)) return -EMSGSIZE; if
(IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) && nla_put_u16(skb, OVS_CT_ATTR_ZONE, ct_info->zone.id)) -- 2.1.4
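The commit message's point, that a flag carried as a zero-length attribute needs no separate "unsupported bits" check, can be illustrated with a small userspace sketch. Everything here is hypothetical scaffolding: struct attr is a simplified stand-in for a netlink attribute (type and payload length only), and the CT_ATTR_* names and parse_ct_attrs() are invented for the example. An unknown flag is simply an unknown attribute type, which the parser already rejects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified attribute: type plus payload length. */
struct attr {
	uint16_t type;
	uint16_t len;
};

/* Hypothetical attribute types, loosely modelled on OVS_CT_ATTR_*. */
enum {
	CT_ATTR_UNSPEC,
	CT_ATTR_COMMIT,	/* no payload: presence alone sets the flag */
	CT_ATTR_ZONE,	/* u16 payload */
};

#define CT_F_COMMIT (1u << 0)

/*
 * Walk the attributes and build the internal flags byte. Returns 0 on
 * success, -1 on an unknown type or a payload of the wrong length.
 */
static int parse_ct_attrs(const struct attr *attrs, size_t n, uint8_t *flags)
{
	size_t i;

	*flags = 0;
	for (i = 0; i < n; i++) {
		switch (attrs[i].type) {
		case CT_ATTR_COMMIT:
			if (attrs[i].len != 0)
				return -1;
			*flags |= CT_F_COMMIT;
			break;
		case CT_ATTR_ZONE:
			if (attrs[i].len != sizeof(uint16_t))
				return -1;
			break;
		default:
			/* Unknown "flag" == unknown attribute: rejected. */
			return -1;
		}
	}
	return 0;
}
```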
[PATCH net 3/7] openvswitch: Fix skb leak in ovs_fragment()
If ovs_fragment() was unable to fragment the skb due to an L2 header that exceeds the supported length, skbs would be leaked. Fix the bug. Fixes: 7f8a436 "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer --- net/openvswitch/actions.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index e23a61c..e1afbd1 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -684,7 +684,7 @@ static void ovs_fragment(struct vport *vport, struct sk_buff *skb, u16 mru, { if (skb_network_offset(skb) > MAX_L2_LEN) { OVS_NLERR(1, "L2 header too long to fragment"); - return; + goto out; } if (ethertype == htons(ETH_P_IP)) { @@ -708,8 +708,7 @@ static void ovs_fragment(struct vport *vport, struct sk_buff *skb, u16 mru, struct rt6_info ovs_rt; if (!v6ops) { - kfree_skb(skb); - return; + goto out; } prepare_frag(vport, skb); @@ -728,8 +727,14 @@ static void ovs_fragment(struct vport *vport, struct sk_buff *skb, u16 mru, WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.", ovs_vport_name(vport), ntohs(ethertype), mru, vport->dev->mtu); - kfree_skb(skb); + goto out; } + + skb = NULL; + +out: + if (skb) + kfree_skb(skb); } static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, -- 2.1.4
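The control flow the fix introduces (every error path jumps to a single label, and the success path clears the pointer once ownership has moved on) can be sketched outside the kernel as follows. This is a minimal illustration, not the ovs_fragment() code: struct buf, MAX_HDR_LEN, deliver() and the two counters are hypothetical stand-ins for the skb, MAX_L2_LEN and the fragmentation path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_HDR_LEN 64	/* hypothetical limit, stands in for MAX_L2_LEN */

struct buf {
	size_t hdr_len;
};

static int dropped;	/* buffers freed on an error path */
static int delivered;	/* buffers handed to the consumer */

/* Hypothetical consumer: takes ownership of b and releases it itself. */
static void deliver(struct buf *b)
{
	delivered++;
	free(b);
}

/* Always consumes b: either hands it on or frees it at the "out" label. */
static void fragment(struct buf *b)
{
	if (b->hdr_len > MAX_HDR_LEN)
		goto out;	/* error: the common exit frees the buffer */

	deliver(b);
	b = NULL;		/* ownership passed on, nothing left to free */

out:
	if (b) {
		dropped++;
		free(b);
	}
}

/* Small allocation helper for the example. */
static struct buf *buf_alloc(size_t hdr_len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b)
		b->hdr_len = hdr_len;
	return b;
}
```

With one exit point, a newly added early return cannot forget the free, which is the class of bug the patch addresses.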
[PATCH net 0/7] OVS conntrack fixes for net
The userspace side of the Open vSwitch conntrack changes is currently undergoing review, which has highlighted some minor bugs in the existing conntrack implementation in the kernel, as well as some future-proofing that can be done on the interface to reduce the need for additional compatibility code in future. The biggest changes here are to the userspace API for the ct_state match field and the CT action. Firstly, this series extends the ct_state match field to 32 bits and rejects any currently unsupported bits. Secondly, rather than representing CT action flags within a 32-bit field, it uses the presence of a netlink attribute to represent the single flag that is defined today. This also serves to reject unsupported ct action flag bits. Joe Stringer (7): openvswitch: Make LABELS name more consistent openvswitch: Fix typos in CT headers openvswitch: Fix skb leak in ovs_fragment() openvswitch: Ensure flow is valid before executing ct openvswitch: Reject ct_state unsupported bits openvswitch: Extend ct_state match field to 32 bits openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT include/uapi/linux/openvswitch.h | 28 +++- net/openvswitch/actions.c| 21 - net/openvswitch/conntrack.c | 32 +++- net/openvswitch/conntrack.h | 12 net/openvswitch/flow_netlink.c | 26 -- 5 files changed, 74 insertions(+), 45 deletions(-) -- 2.1.4
Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr
Hi Tom, [auto build test results on next-20150929 -- if it's inappropriate base, please ignore] config: xtensa-allyesconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout c505336670b5c681c0a36053a68591e0f9074245 # save the attached .config to linux build tree make.cross ARCH=xtensa All warnings (new ones prefixed by >>): net/ipv6/ila/ila_xlat.c: In function 'ila_xlat_fini': >> net/ipv6/ila/ila_xlat.c:636:6: warning: unused variable 'i' [-Wunused-variable] int i; ^ vim +/i +636 net/ipv6/ila/ila_xlat.c 620 ila_nl_ops); 621 if (ret < 0) 622 goto unregister; 623 624 xfrm6_xlat_addr_add(&ila_xlat); 625 626 return 0; 627 628 unregister: 629 unregister_pernet_device(&ila_net_ops); 630 exit: 631 return ret; 632 } 633 634 void ila_xlat_fini(void) 635 { > 636 int i; 637 638 xfrm6_xlat_addr_del(&ila_xlat); 639 genl_unregister_family(&ila_nl_family); 640 unregister_pernet_device(&ila_net_ops); 641 } 642 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
[PATCH 3/3] fm10k: use napi_schedule_irqoff()
The fm10k_msix_clean_rings function runs from hard interrupt context or with interrupts already disabled in netpoll. It can use napi_schedule_irqoff() instead of napi_schedule() Signed-off-by: Alexander Duyck --- drivers/net/ethernet/intel/fm10k/fm10k_pci.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c index 74be792f3f1b..5fbffbaefe32 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c @@ -846,7 +846,7 @@ static irqreturn_t fm10k_msix_clean_rings(int __always_unused irq, void *data) struct fm10k_q_vector *q_vector = data; if (q_vector->rx.count || q_vector->tx.count) - napi_schedule(&q_vector->napi); + napi_schedule_irqoff(&q_vector->napi); return IRQ_HANDLED; }
[PATCH 0/3] use napi_schedule_irqoff()
This patch set is meant to replace the calls to napi_schedule with napi_schedule_irqoff as this should help to reduce the interrupt overhead slightly by removing the unneeded call to local_irq_save and local_irq_restore. --- Alexander Duyck (3): ixgbe/ixgbevf: use napi_schedule_irqoff() i40e/i40evf: use napi_schedule_irqoff() fm10k: use napi_schedule_irqoff() drivers/net/ethernet/intel/fm10k/fm10k_pci.c |2 +- drivers/net/ethernet/intel/i40e/i40e_main.c |6 -- drivers/net/ethernet/intel/i40evf/i40evf_main.c |2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |4 ++-- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |2 +- 5 files changed, 9 insertions(+), 7 deletions(-) --
[PATCH 2/3] i40e/i40evf: use napi_schedule_irqoff()
The i40e_intr and i40e/i40evf_msix_clean_rings functions run from hard interrupt context or with interrupts already disabled in netpoll. They can use napi_schedule_irqoff() instead of napi_schedule() Signed-off-by: Alexander Duyck --- drivers/net/ethernet/intel/i40e/i40e_main.c |6 -- drivers/net/ethernet/intel/i40evf/i40evf_main.c |2 +- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 484226e0365d..3cc97d4f5f70 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3281,7 +3281,7 @@ static irqreturn_t i40e_msix_clean_rings(int irq, void *data) if (!q_vector->tx.ring && !q_vector->rx.ring) return IRQ_HANDLED; - napi_schedule(&q_vector->napi); + napi_schedule_irqoff(&q_vector->napi); return IRQ_HANDLED; } @@ -3450,6 +3450,8 @@ static irqreturn_t i40e_intr(int irq, void *data) /* only q0 is used in MSI/Legacy mode, and none are used in MSIX */ if (icr0 & I40E_PFINT_ICR0_QUEUE_0_MASK) { + struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi]; + struct i40e_q_vector *q_vector = vsi->q_vectors[0]; /* temporarily disable queue cause for NAPI processing */ u32 qval = rd32(hw, I40E_QINT_RQCTL(0)); @@ -3462,7 +3464,7 @@ static irqreturn_t i40e_intr(int irq, void *data) wr32(hw, I40E_QINT_TQCTL(0), qval); if (!test_bit(__I40E_DOWN, &pf->state)) - napi_schedule(&pf->vsi[pf->lan_vsi]->q_vectors[0]->napi); + napi_schedule_irqoff(&q_vector->napi); } if (icr0 & I40E_PFINT_ICR0_ADMINQ_MASK) { diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c index 5e1336321c2f..4b3db099f58c 100644 --- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c +++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c @@ -334,7 +334,7 @@ static irqreturn_t i40evf_msix_clean_rings(int irq, void *data) if (!q_vector->tx.ring && !q_vector->rx.ring) return IRQ_HANDLED; - napi_schedule(&q_vector->napi); + 
napi_schedule_irqoff(&q_vector->napi); return IRQ_HANDLED; }
[PATCH 1/3] ixgbe/ixgbevf: use napi_schedule_irqoff()
The ixgbe_intr and ixgbe/ixgbevf_msix_clean_rings functions run from hard interrupt context or with interrupts already disabled in netpoll. They can use napi_schedule_irqoff() instead of napi_schedule() Signed-off-by: Alexander Duyck --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |4 ++-- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 693f2da33569..67dc916c94d6 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -2754,7 +2754,7 @@ static irqreturn_t ixgbe_msix_clean_rings(int irq, void *data) /* EIAM disabled interrupts (on this vector) for us */ if (q_vector->rx.ring || q_vector->tx.ring) - napi_schedule(&q_vector->napi); + napi_schedule_irqoff(&q_vector->napi); return IRQ_HANDLED; } @@ -2948,7 +2948,7 @@ static irqreturn_t ixgbe_intr(int irq, void *data) ixgbe_ptp_check_pps_event(adapter, eicr); /* would disable interrupts here but EIAM disabled it */ - napi_schedule(&q_vector->napi); + napi_schedule_irqoff(&q_vector->napi); /* * re-enable link(maybe) and non-queue interrupts, no flush. diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index 592ff237d692..f1c5f3372667 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -1288,7 +1288,7 @@ static irqreturn_t ixgbevf_msix_clean_rings(int irq, void *data) /* EIAM disabled interrupts (on this vector) for us */ if (q_vector->rx.ring || q_vector->tx.ring) - napi_schedule(&q_vector->napi); + napi_schedule_irqoff(&q_vector->napi); return IRQ_HANDLED; }
[PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr
This patch sets up a hook for xfrm6_xlat_addr. This provides a way to perform ILA translation before early demux which can be a significant performance advantage over LWT which would occur later. The implementation entails a rhashtable which is used to do the locator lookup. The rhash table is configured via new netlink commands. Signed-off-by: Tom Herbert --- include/uapi/linux/ila.h | 22 ++ net/ipv6/Kconfig | 1 + net/ipv6/ila/Makefile | 2 +- net/ipv6/ila/ila.h| 2 + net/ipv6/ila/ila_common.c | 8 + net/ipv6/ila/ila_xlat.c | 642 ++ 6 files changed, 676 insertions(+), 1 deletion(-) create mode 100644 net/ipv6/ila/ila_xlat.c diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h index 7ed9e67..abde7bb 100644 --- a/include/uapi/linux/ila.h +++ b/include/uapi/linux/ila.h @@ -3,13 +3,35 @@ #ifndef _UAPI_LINUX_ILA_H #define _UAPI_LINUX_ILA_H +/* NETLINK_GENERIC related info */ +#define ILA_GENL_NAME "ila" +#define ILA_GENL_VERSION 0x1 + enum { ILA_ATTR_UNSPEC, ILA_ATTR_LOCATOR, /* u64 */ + ILA_ATTR_IDENTIFIER,/* u64 */ + ILA_ATTR_LOCATOR_MATCH, /* u64 */ + ILA_ATTR_IFINDEX, /* s32 */ + ILA_ATTR_DIR, /* u32 */ __ILA_ATTR_MAX, }; #define ILA_ATTR_MAX (__ILA_ATTR_MAX - 1) +enum { + ILA_CMD_UNSPEC, + ILA_CMD_ADD, + ILA_CMD_DEL, + ILA_CMD_GET, + + __ILA_CMD_MAX, +}; + +#define ILA_CMD_MAX (__ILA_CMD_MAX - 1) + +#define ILA_DIR_IN (1 << 0) +#define ILA_DIR_OUT (1 << 1) + #endif /* _UAPI_LINUX_ILA_H */ diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig index 6e8ca06..c972497 100644 --- a/net/ipv6/Kconfig +++ b/net/ipv6/Kconfig @@ -95,6 +95,7 @@ config IPV6_MIP6 config IPV6_ILA tristate "IPv6: Identifier Locator Addressing (ILA)" select LWTUNNEL + select INET6_XFRM_XLAT_ADDR ---help--- Support for IPv6 Identifier Locator Addressing (ILA). 
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile index 31d136b..4b32e59 100644 --- a/net/ipv6/ila/Makefile +++ b/net/ipv6/ila/Makefile @@ -4,4 +4,4 @@ obj-$(CONFIG_IPV6_ILA) += ila.o -ila-objs := ila_common.o ila_lwt.o +ila-objs := ila_common.o ila_lwt.o ila_xlat.o diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h index b94081f..28542cb 100644 --- a/net/ipv6/ila/ila.h +++ b/net/ipv6/ila/ila.h @@ -42,5 +42,7 @@ void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p); int ila_lwt_init(void); void ila_lwt_fini(void); +int ila_xlat_init(void); +void ila_xlat_fini(void); #endif /* __ILA_H */ diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c index 1a1e1e0..cde7b96 100644 --- a/net/ipv6/ila/ila_common.c +++ b/net/ipv6/ila/ila_common.c @@ -80,12 +80,20 @@ static int __init ila_init(void) if (ret) goto fail_lwt; + ret = ila_xlat_init(); + if (ret) + goto fail_xlat; + + return 0; +fail_xlat: + ila_lwt_fini(); fail_lwt: return ret; } static void __exit ila_fini(void) { + ila_xlat_fini(); ila_lwt_fini(); } diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c new file mode 100644 index 000..cd6135b --- /dev/null +++ b/net/ipv6/ila/ila_xlat.c @@ -0,0 +1,642 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "ila.h" + +struct ila_xlat_params { + struct ila_params ip; + __be64 identifier; + int ifindex; + unsigned int dir; +}; + +struct ila_map { + struct ila_xlat_params p; + struct rhash_head node; + struct ila_map *next; + struct rcu_head rcu; +}; + +static unsigned int ila_net_id; + +struct ila_net { + struct rhashtable rhash_table; + spinlock_t *locks; /* Bucket locks for entry manipulation */ + unsigned int locks_mask; +}; + +#define LOCKS_PER_CPU 10 + +static int alloc_ila_locks(struct ila_net *ilan, gfp_t gfp) +{ + unsigned int i, size; + unsigned int nr_pcpus = num_possible_cpus(); + + nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL); + size = roundup_pow_of_two(nr_pcpus * 
LOCKS_PER_CPU); + + if (sizeof(spinlock_t) != 0) { +#ifdef CONFIG_NUMA + if (size * sizeof(spinlock_t) > PAGE_SIZE && + gfp == GFP_KERNEL) + ilan->locks = vmalloc(size * sizeof(spinlock_t)); + else +#endif + ilan->locks = kmalloc_array(size, sizeof(spinlock_t), + gfp); + if (!ilan->locks) + return -ENOMEM; + for (i = 0; i < size; i++) + spin_lock_init(&ilan->locks[i]); + } + ilan->locks_mask = size - 1; + + return 0; +} + +static u
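The sizing arithmetic in alloc_ila_locks() can be checked with a small userspace sketch: the CPU count is capped at 32, multiplied by LOCKS_PER_CPU, and rounded up to a power of two so that size - 1 is usable as a mask. roundup_pow2() is a stand-in for the kernel's roundup_pow_of_two(), and ila_lock_index() is an assumption about how locks_mask would be applied to a hash, since the message is cut off before any lookup code appears.

```c
#include <assert.h>

#define LOCKS_PER_CPU 10

/* Userspace stand-in for roundup_pow_of_two(); n must be non-zero. */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Number of bucket locks for a given CPU count, capped at 32 CPUs. */
static unsigned int ila_lock_count(unsigned int nr_cpus)
{
	if (nr_cpus > 32)
		nr_cpus = 32;
	return roundup_pow2(nr_cpus * LOCKS_PER_CPU);
}

/*
 * Assumed use of the mask: pick a lock from a hash value. The AND is
 * equivalent to a modulo only because the count is a power of two.
 */
static unsigned int ila_lock_index(unsigned int hash, unsigned int locks_mask)
{
	return hash & locks_mask;
}
```

For example, 4 CPUs give 40 candidate locks, rounded up to 64 with a mask of 63; 64 CPUs are capped to 32, giving 320 rounded up to 512.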
[PATCH net-next 3/6] netlink: add a start callback for starting a netlink dump
The start callback allows the caller to set up a context for the dump
callbacks. Presumably, the context can then be destroyed in the done
callback.

Signed-off-by: Tom Herbert
---
 include/linux/netlink.h  |  2 ++
 include/net/genetlink.h  |  2 ++
 net/netlink/af_netlink.c |  4 ++++
 net/netlink/genetlink.c  | 16 ++++++++++++++++
 4 files changed, 24 insertions(+)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 639e9b8..0b41959 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -131,6 +131,7 @@ netlink_skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 struct netlink_callback {
 	struct sk_buff *skb;
 	const struct nlmsghdr *nlh;
+	int (*start)(struct netlink_callback *);
 	int (*dump)(struct sk_buff * skb,
 		    struct netlink_callback *cb);
 	int (*done)(struct netlink_callback *cb);
@@ -153,6 +154,7 @@ struct nlmsghdr *
 __nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int flags);
 
 struct netlink_dump_control {
+	int (*start)(struct netlink_callback *);
 	int (*dump)(struct sk_buff *skb, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
 	void *data;
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 1b6b6dc..43c0e77 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -114,6 +114,7 @@ static inline void genl_info_net_set(struct genl_info *info, struct net *net)
  * @flags: flags
  * @policy: attribute validation policy
  * @doit: standard command callback
+ * @start: start callback for dumps
  * @dumpit: callback for dumpers
  * @done: completion callback for dumps
  * @ops_list: operations list
@@ -122,6 +123,7 @@ struct genl_ops {
 	const struct nla_policy *policy;
 	int (*doit)(struct sk_buff *skb,
 		    struct genl_info *info);
+	int (*start)(struct netlink_callback *cb);
 	int (*dumpit)(struct sk_buff *skb,
 		      struct netlink_callback *cb);
 	int (*done)(struct netlink_callback *cb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 8f060d7..c8c43ac 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2905,6 +2905,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
 	cb = &nlk->cb;
 	memset(cb, 0, sizeof(*cb));
+	cb->start = control->start;
 	cb->dump = control->dump;
 	cb->done = control->done;
 	cb->nlh = nlh;
@@ -2917,6 +2918,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
 	mutex_unlock(nlk->cb_mutex);
 
+	if (cb->start)
+		cb->start(cb);
+
 	ret = netlink_dump(sk);
 	sock_put(sk);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 75724a9..5fd08c0 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -513,6 +513,20 @@ void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 }
 EXPORT_SYMBOL(genlmsg_put);
 
+static int genl_lock_start(struct netlink_callback *cb)
+{
+	/* our ops are always const - netlink API doesn't propagate that */
+	const struct genl_ops *ops = cb->data;
+	int rc = 0;
+
+	if (ops->start) {
+		genl_lock();
+		rc = ops->start(cb);
+		genl_unlock();
+	}
+	return rc;
+}
+
 static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	/* our ops are always const - netlink API doesn't propagate that */
@@ -577,6 +591,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 				.module = family->module,
 				/* we have const, but the netlink API doesn't */
 				.data = (void *)ops,
+				.start = genl_lock_start,
 				.dump = genl_lock_dumpit,
 				.done = genl_lock_done,
 			};
@@ -588,6 +603,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 		} else {
 			struct netlink_dump_control c = {
 				.module = family->module,
+				.start = ops->start,
 				.dump = ops->dumpit,
 				.done = ops->done,
 			};
-- 
2.4.6
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
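The start/dump/done sequence this patch describes can be modelled in plain userspace C. The following is only a sketch under stated assumptions: struct dump_cb, struct cursor, run_dump() and the demo_* callbacks are hypothetical stand-ins, not kernel types or functions. One deliberate difference: the patch as posted calls cb->start(cb) without checking the result, whereas this sketch aborts the dump when start fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct netlink_callback (hypothetical). */
struct dump_cb {
	int (*start)(struct dump_cb *cb);
	int (*dump)(struct dump_cb *cb);
	int (*done)(struct dump_cb *cb);
	void *ctx;	/* per-dump context, owned by start/done */
	int emitted;	/* number of records produced so far */
};

/* Hypothetical per-dump state: a cursor over three records. */
struct cursor {
	int next;
	int limit;
};

/* start: allocate the context once, before the first dump call. */
int demo_start(struct dump_cb *cb)
{
	struct cursor *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->next = 0;
	c->limit = 3;
	cb->ctx = c;
	return 0;
}

/* dump: emit one record per call; return 0 once nothing is left. */
int demo_dump(struct dump_cb *cb)
{
	struct cursor *c = cb->ctx;

	if (c->next >= c->limit)
		return 0;
	c->next++;
	cb->emitted++;
	return 1;
}

/* done: destroy the context created by start. */
int demo_done(struct dump_cb *cb)
{
	free(cb->ctx);
	cb->ctx = NULL;
	return 0;
}

/* Drive one dump: optional start, dump until it returns <= 0, then done. */
int run_dump(struct dump_cb *cb)
{
	int rc;

	if (cb->start) {
		rc = cb->start(cb);
		if (rc)
			return rc;	/* assumption of this sketch, see above */
	}
	while ((rc = cb->dump(cb)) > 0)
		;
	if (cb->done)
		cb->done(cb);
	return rc;
}
```

The point of the extra callback is that the dump function no longer has to detect "first call" itself in order to set up its state.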
[PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv
Call xfrm6_xlat_addr before performing NF_HOOK and routing so that
address translation happens early in the receive path.

Signed-off-by: Tom Herbert
---
 net/ipv6/ip6_input.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9075acf..06dac55 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
 	/* Must drop socket now because of tproxy. */
 	skb_orphan(skb);
 
+	/* Translate destination address before routing */
+	xfrm6_xlat_addr(skb);
+
 	return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
 		       net, NULL, skb, dev, NULL,
 		       ip6_rcv_finish);
-- 
2.4.6
[PATCH net-next 4/6] xfrm: Add xfrm6 address translation function
This patch adds xfrm6_xlat_addr which is called in the data path to perform address translation (primarily for the receive path). Modules may register their own callback to perform a translation-- this registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del. xfrm6_xlat_addr allows translation of addresses for an sk_buff. Signed-off-by: Tom Herbert --- include/net/xfrm.h | 25 ++ net/ipv6/Kconfig | 4 +++ net/ipv6/Makefile | 1 + net/ipv6/xfrm6_policy.c| 7 + net/ipv6/xfrm6_xlat_addr.c | 66 ++ 5 files changed, 103 insertions(+) create mode 100644 net/ipv6/xfrm6_xlat_addr.c diff --git a/include/net/xfrm.h b/include/net/xfrm.h index fd17610..ea05c4e 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -607,6 +607,31 @@ struct xfrm_mgr { int xfrm_register_km(struct xfrm_mgr *km); int xfrm_unregister_km(struct xfrm_mgr *km); +struct xfrm6_xlat_addr { + int (*xlat)(struct sk_buff *skb); + struct list_head list; +}; + +#ifdef CONFIG_INET6_XFRM_XLAT_ADDR +void xfrm6_xlat_addr_add(struct xfrm6_xlat_addr *xla); +void xfrm6_xlat_addr_del(struct xfrm6_xlat_addr *xla); +int xfrm6_xlat_addr(struct sk_buff *skb); +int xfrm6_xlat_addr_init(void); +void xfrm6_xlat_addr_fini(void); +#else +static inline int xfrm6_xlat_addr(struct sk_buff *skb) +{ + return 0; +} + +static inline int xfrm6_xlat_addr_init(void) +{ + return 0; +} + +static inline void xfrm6_xlat_addr_fini(void) { } +#endif + struct xfrm_tunnel_skb_cb { union { struct inet_skb_parm h4; diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig index 983bb99..6e8ca06 100644 --- a/net/ipv6/Kconfig +++ b/net/ipv6/Kconfig @@ -153,6 +153,10 @@ config INET6_XFRM_MODE_ROUTEOPTIMIZATION ---help--- Support for MIPv6 route optimization mode. 
+config INET6_XFRM_XLAT_ADDR + select XFRM + bool + config IPV6_VTI tristate "Virtual (secure) IPv6: tunneling" select IPV6_TUNNEL diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile index 2fbd90b..c719d6f 100644 --- a/net/ipv6/Makefile +++ b/net/ipv6/Makefile @@ -33,6 +33,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TRANSPORT) += xfrm6_mode_transport.o obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o +obj-$(CONFIG_INET6_XFRM_XLAT_ADDR) += xfrm6_xlat_addr.o obj-$(CONFIG_IPV6_MIP6) += mip6.o obj-$(CONFIG_IPV6_ILA) += ila/ obj-$(CONFIG_NETFILTER)+= netfilter/ diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 30caa28..81b9079 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -390,11 +390,17 @@ int __init xfrm6_init(void) if (ret) goto out_state; + ret = xfrm6_xlat_addr_init(); + if (ret) + goto out_protocol; + #ifdef CONFIG_SYSCTL register_pernet_subsys(&xfrm6_net_ops); #endif out: return ret; +out_protocol: + xfrm6_protocol_fini(); out_state: xfrm6_state_fini(); out_policy: @@ -407,6 +413,7 @@ void xfrm6_fini(void) #ifdef CONFIG_SYSCTL unregister_pernet_subsys(&xfrm6_net_ops); #endif + xfrm6_xlat_addr_fini(); xfrm6_protocol_fini(); xfrm6_policy_fini(); xfrm6_state_fini(); diff --git a/net/ipv6/xfrm6_xlat_addr.c b/net/ipv6/xfrm6_xlat_addr.c new file mode 100644 index 000..dd2199a --- /dev/null +++ b/net/ipv6/xfrm6_xlat_addr.c @@ -0,0 +1,66 @@ +#include +#include +#include +#include +#include + +static struct list_head xfrm6_xlat_addr_head __read_mostly; +static DEFINE_SPINLOCK(xfrm6_xlat_addr_lock); + +void xfrm6_xlat_addr_add(struct xfrm6_xlat_addr *xla) +{ + spin_lock(&xfrm6_xlat_addr_lock); + list_add_rcu(&xla->list, &xfrm6_xlat_addr_head); + spin_unlock(&xfrm6_xlat_addr_lock); +} +EXPORT_SYMBOL(xfrm6_xlat_addr_add); + +void xfrm6_xlat_addr_del(struct xfrm6_xlat_addr *xla) +{ + struct xfrm6_xlat_addr *tmp; 
+ + spin_lock(&xfrm6_xlat_addr_lock); + + list_for_each_entry_rcu(tmp, &xfrm6_xlat_addr_head, list) { + if (xla == tmp) { + list_del_rcu(&xla->list); + goto out; + } + } + + pr_warn("xfrm6_xlat_addr_del: %p not found\n", xla); +out: + spin_unlock(&xfrm6_xlat_addr_lock); +} +EXPORT_SYMBOL(xfrm6_xlat_addr_del); + +int xfrm6_xlat_addr(struct sk_buff *skb) +{ + struct xfrm6_xlat_addr *xla; + int err = 0; + + rcu_read_lock(); + + list_for_each_entry_rcu(xla, &xfrm6_xlat_addr_head, list) { + err = xla->xlat(skb); + if (err < 0) + break; + } + + rcu_read_unlock(); + + return err; +} +EXPORT_SYMBOL(xfrm6_xlat_addr); + +int __init xfrm6_xlat_addr_init(void) +{ + INIT_LIST_HEAD(&xfrm6_xlat_addr_head); + +
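The registration scheme in this patch (add a callback, delete a callback, call every registered callback until one fails) can be illustrated without any kernel infrastructure. This is a minimal userspace sketch, assuming a plain singly linked list and an int in place of the sk_buff; struct xlat_hook, hook_add(), hook_del(), run_hooks() and the two example translators are hypothetical names. It has no locking or RCU, which the kernel code needs because hooks run concurrently with registration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct xfrm6_xlat_addr; the "packet" is an int. */
struct xlat_hook {
	int (*xlat)(int *pkt);
	struct xlat_hook *next;
};

static struct xlat_hook *hooks;	/* head of the registered-hook list */

/* Register a hook at the head of the list, as list_add does. */
void hook_add(struct xlat_hook *h)
{
	h->next = hooks;
	hooks = h;
}

/* Unlink a hook; returns 0 if it was registered, -1 if not found. */
int hook_del(struct xlat_hook *h)
{
	struct xlat_hook **pp;

	for (pp = &hooks; *pp; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			return 0;
		}
	}
	return -1;
}

/* Call each hook in list order; stop at the first negative return. */
int run_hooks(int *pkt)
{
	struct xlat_hook *h;
	int err = 0;

	for (h = hooks; h; h = h->next) {
		err = h->xlat(pkt);
		if (err < 0)
			break;
	}
	return err;
}

/* Two hypothetical translators, used only for illustration. */
int add_one(int *pkt)
{
	*pkt += 1;
	return 0;
}

int reject(int *pkt)
{
	(void)pkt;
	return -1;
}
```

Because new hooks go to the head of the list, the most recently registered hook runs first, and a failing hook prevents the older ones from seeing the packet.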
[PATCH net-next 1/6] ila: Create net/ipv6/ila directory
Create ila directory in preparation for supporting other hooks in the kernel than LWT for doing ILA. This includes: - Moving ila.c to ila/ila_lwt.c - Splitting out some common functions into ila_common.c Signed-off-by: Tom Herbert --- net/ipv6/Makefile | 2 +- net/ipv6/ila.c| 229 -- net/ipv6/ila/Makefile | 7 ++ net/ipv6/ila/ila.h| 46 ++ net/ipv6/ila/ila_common.c | 95 +++ net/ipv6/ila/ila_lwt.c| 152 ++ 6 files changed, 301 insertions(+), 230 deletions(-) delete mode 100644 net/ipv6/ila.c create mode 100644 net/ipv6/ila/Makefile create mode 100644 net/ipv6/ila/ila.h create mode 100644 net/ipv6/ila/ila_common.c create mode 100644 net/ipv6/ila/ila_lwt.c diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile index 2c900c7..2fbd90b 100644 --- a/net/ipv6/Makefile +++ b/net/ipv6/Makefile @@ -34,7 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o obj-$(CONFIG_IPV6_MIP6) += mip6.o -obj-$(CONFIG_IPV6_ILA) += ila.o +obj-$(CONFIG_IPV6_ILA) += ila/ obj-$(CONFIG_NETFILTER)+= netfilter/ obj-$(CONFIG_IPV6_VTI) += ip6_vti.o diff --git a/net/ipv6/ila.c b/net/ipv6/ila.c deleted file mode 100644 index 678d2df..000 --- a/net/ipv6/ila.c +++ /dev/null @@ -1,229 +0,0 @@ -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -struct ila_params { - __be64 locator; - __be64 locator_match; - __wsum csum_diff; -}; - -static inline struct ila_params *ila_params_lwtunnel( - struct lwtunnel_state *lwstate) -{ - return (struct ila_params *)lwstate->data; -} - -static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to) -{ - __be32 diff[] = { - ~from[0], ~from[1], to[0], to[1], - }; - - return csum_partial(diff, sizeof(diff), 0); -} - -static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p) -{ - if (*(__be64 *)&ip6h->daddr == p->locator_match) - 
return p->csum_diff; - else - return compute_csum_diff8((__be32 *)&ip6h->daddr, - (__be32 *)&p->locator); -} - -static void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p) -{ - __wsum diff; - struct ipv6hdr *ip6h = ipv6_hdr(skb); - size_t nhoff = sizeof(struct ipv6hdr); - - /* First update checksum */ - switch (ip6h->nexthdr) { - case NEXTHDR_TCP: - if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr { - struct tcphdr *th = (struct tcphdr *) - (skb_network_header(skb) + nhoff); - - diff = get_csum_diff(ip6h, p); - inet_proto_csum_replace_by_diff(&th->check, skb, - diff, true); - } - break; - case NEXTHDR_UDP: - if (likely(pskb_may_pull(skb, nhoff + sizeof(struct udphdr { - struct udphdr *uh = (struct udphdr *) - (skb_network_header(skb) + nhoff); - - if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) { - diff = get_csum_diff(ip6h, p); - inet_proto_csum_replace_by_diff(&uh->check, skb, - diff, true); - if (!uh->check) - uh->check = CSUM_MANGLED_0; - } - } - break; - case NEXTHDR_ICMP: - if (likely(pskb_may_pull(skb, -nhoff + sizeof(struct icmp6hdr { - struct icmp6hdr *ih = (struct icmp6hdr *) - (skb_network_header(skb) + nhoff); - - diff = get_csum_diff(ip6h, p); - inet_proto_csum_replace_by_diff(&ih->icmp6_cksum, skb, - diff, true); - } - break; - } - - /* Now change destination address */ - *(__be64 *)&ip6h->daddr = p->locator; -} - -static int ila_output(struct sock *sk, struct sk_buff *skb) -{ - struct dst_entry *dst = skb_dst(skb); - - if (skb->protocol != htons(ETH_P_IPV6)) - goto drop; - - update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate)); - - return dst->lwtstate->orig_output(sk, skb); - -drop: - kfree_skb(skb); - return -EINVAL; -} - -static int ila_input(struct sk_buff *skb) -{ - struct ds
[PATCH net-next 2/6] rhashtable: add function to replace an element
Add the rhashtable_replace_fast function. This replaces one object in
the table with another atomically. The hashes of the new and old objects
must be equal.

Signed-off-by: Tom Herbert
---
 include/linux/rhashtable.h | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..77deece 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -819,4 +819,86 @@ out:
 	return err;
 }
 
+/* Internal function, please use rhashtable_replace_fast() instead */
+static inline int __rhashtable_replace_fast(
+	struct rhashtable *ht, struct bucket_table *tbl,
+	struct rhash_head *obj_old, struct rhash_head *obj_new,
+	const struct rhashtable_params params)
+{
+	struct rhash_head __rcu **pprev;
+	struct rhash_head *he;
+	spinlock_t *lock;
+	unsigned int hash;
+	int err = -ENOENT;
+
+	/* Minimally, the old and new objects must have same hash
+	 * (which should mean identifiers are the same).
+	 */
+	hash = rht_head_hashfn(ht, tbl, obj_old, params);
+	if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
+		return -EINVAL;
+
+	lock = rht_bucket_lock(tbl, hash);
+
+	spin_lock_bh(lock);
+
+	pprev = &tbl->buckets[hash];
+	rht_for_each(he, tbl, hash) {
+		if (he != obj_old) {
+			pprev = &he->next;
+			continue;
+		}
+
+		rcu_assign_pointer(obj_new->next, obj_old->next);
+		rcu_assign_pointer(*pprev, obj_new);
+		err = 0;
+		break;
+	}
+
+	spin_unlock_bh(lock);
+
+	return err;
+}
+
+/**
+ * rhashtable_replace_fast - replace an object in hash table
+ * @ht:		hash table
+ * @obj_old:	pointer to hash head inside object being replaced
+ * @obj_new:	pointer to hash head inside object which is new
+ * @params:	hash table parameters
+ *
+ * Replacing an object doesn't affect the number of elements in the hash table
+ * or bucket, so we don't need to worry about shrinking or expanding the
+ * table here.
+ *
+ * Returns zero on success, -ENOENT if the entry could not be found,
+ * -EINVAL if hash is not the same for the old and new objects.
+ */
+static inline int rhashtable_replace_fast(
+	struct rhashtable *ht, struct rhash_head *obj_old,
+	struct rhash_head *obj_new,
+	const struct rhashtable_params params)
+{
+	struct bucket_table *tbl;
+	int err;
+
+	rcu_read_lock();
+
+	tbl = rht_dereference_rcu(ht->tbl, ht);
+
+	/* Because we have already taken (and released) the bucket
+	 * lock in old_tbl, if we find that future_tbl is not yet
+	 * visible then that guarantees the entry to still be in
+	 * the old tbl if it exists.
+	 */
+	while ((err = __rhashtable_replace_fast(ht, tbl, obj_old,
+						obj_new, params)) &&
+	       (tbl = rht_dereference_rcu(tbl->future_tbl, ht)))
+		;
+
+	rcu_read_unlock();
+
+	return err;
+}
+
 #endif /* _LINUX_RHASHTABLE_H */
-- 
2.4.6
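The core of the replace operation is the pprev walk over one bucket chain: find the old entry, point the new entry at the old entry's successor, then redirect the predecessor's link. Below is a simplified userspace sketch of just that step. struct node and chain_replace() are hypothetical names, an integer key comparison stands in for the hash check, and there is no bucket lock or rcu_assign_pointer(), both of which the real code relies on for concurrent readers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct rhash_head: one link in a bucket chain. */
struct node {
	struct node *next;
	int key;
};

/*
 * Replace old_n with new_n in the chain rooted at *bucket.
 * Returns 0 on success, -ENOENT if old_n is not in the chain, and
 * -EINVAL if the keys differ (standing in for the hash comparison).
 */
int chain_replace(struct node **bucket, struct node *old_n,
		  struct node *new_n)
{
	struct node **pprev = bucket;
	struct node *he;

	if (old_n->key != new_n->key)
		return -EINVAL;

	for (he = *bucket; he; he = he->next) {
		if (he != old_n) {
			pprev = &he->next;	/* remember the link to patch */
			continue;
		}
		new_n->next = old_n->next;	/* new entry adopts the successor */
		*pprev = new_n;			/* predecessor now skips old_n */
		return 0;
	}
	return -ENOENT;
}
```

The order of the two assignments matters in the kernel version: a reader that follows *pprev must always find an entry whose next pointer is already valid.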
[PATCH net-next 0/6] ila: Optimization to preserve value of early demux
In the current implementation of ILA, LWT is used to perform translation
on both the input and output paths. This is functional, but there is a
big performance hit in the receive path. Early demux occurs before the
routing lookup (a hit actually obviates the route lookup). Therefore the
stack currently performs early demux before translation, so a local
connection with ILA addresses is never matched. Note that this issue is
not just with ILA: pretty much any translated or encapsulated packet
handled by LWT would miss the opportunity for early demux. Solving the
general problem seems nontrivial, since we would need to move the route
lookup before early demux, thereby negating its value.

This patch set addresses the issue for ILA by adding a fast locator
lookup that occurs before early demux. This is done by creating an XFRM
hook to perform address translation early in the receive path. For the
backend we implement an rhashtable that contains identifier-to-locator
mappings. The table also allows more specific matches that include
original locator and interface.

This patch set:
 - Adds an rhashtable function to atomically replace an element.
   This is useful to implement sub-trees from a table entry
   without needing to use a special anchor structure as the
   table entry.
 - Adds a start callback for starting a netlink dump.
 - Creates an ila directory under net/ipv6 and moves ila.c to it.
   ila.c is split into ila_common.c and ila_lwt.c.
 - Implements a table to do identifier->locator mapping. This is
   an rhashtable.
 - Adds configuration for the table with netlink.
 - Adds the XFRM xlat_addr facility. This includes a callback
   registration function and a hook to call registered callbacks.
 - Calls xfrm6_xlat_addr from ipv6_rcv before NF_HOOK and routing.

Testing:

Running 200 netperf TCP_RR streams

No ILA, baseline
   85.72% CPU utilization
   1861945 tps
   93/163/330 50/90/99% latencies

ILA before fix (LWT on both input and output)
   83.47% CPU utilization
   16583186 tps (-11% from baseline)
   107/183/338 50/90/99% latencies

ILA after fix (hook for input)
   84.97% CPU utilization
   1833948 tps (-1.5% from baseline)
   95/164/331 50/90/99% latencies

Hacked DNPT to do ILA
   80.94% CPU utilization
   1683315 tps (-10% from baseline)
   104/179/350 50/90/99% latencies

Tom Herbert (6):
  ila: Create net/ipv6/ila directory
  rhashtable: add function to replace an element
  netlink: add a start callback for starting a netlink dump
  xfrm: Add xfrm6 address translation function
  ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  ila: Add support for xfrm6_xlat_addr

 include/linux/netlink.h    |   2 +
 include/linux/rhashtable.h |  82 ++
 include/net/genetlink.h    |   2 +
 include/net/xfrm.h         |  25 ++
 include/uapi/linux/ila.h   |  22 ++
 net/ipv6/Kconfig           |   5 +
 net/ipv6/Makefile          |   3 +-
 net/ipv6/ila.c             | 229
 net/ipv6/ila/Makefile      |   7 +
 net/ipv6/ila/ila.h         |  48
 net/ipv6/ila/ila_common.c  | 103
 net/ipv6/ila/ila_lwt.c     | 152 +++
 net/ipv6/ila/ila_xlat.c    | 642 +
 net/ipv6/ip6_input.c       |   3 +
 net/ipv6/xfrm6_policy.c    |   7 +
 net/ipv6/xfrm6_xlat_addr.c |  66 +
 net/netlink/af_netlink.c   |   4 +
 net/netlink/genetlink.c    |  16 ++
 18 files changed, 1188 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_xlat.c
 create mode 100644 net/ipv6/xfrm6_xlat_addr.c

-- 
2.4.6
Re: [net-next PATCH v2] netpoll: Drop budget parameter from NAPI polling call hierarchy
From: Alexander Duyck
Date: Mon, 28 Sep 2015 09:16:17 -0700

> For some reason we were carrying the budget value around between the
> various calls to napi->poll. If for example one of the drivers called had
> a bug in which it returned a non-zero value for work this could result in
> the budget value becoming negative.
>
> Rather than carry around a value of budget that is 0 or less we can instead
> just loop through and pass 0 to each napi->poll call. If any driver
> returns a value for work done that is non-zero then we can report that
> driver and continue rather than allowing a bad actor to make the budget
> value negative and pass that negative value to napi->poll.
>
> Note, the only actual change here is that instead of letting budget become
> negative we are keeping it at 0 regardless of the value returned for work
> since it should not be possible for the polling routine to do any actual
> work with a budget of 0. So if the polling routine returns a non-0 value
> we are just reporting it and continuing with a budget of 0 rather than
> letting that work value be subtracted from the budget of 0.
>
> Signed-off-by: Alexander Duyck

Applied, thanks.
Re: [PATCH] net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set
From: David Ahern
Date: Mon, 28 Sep 2015 10:12:13 -0700

> Wolfgang reported that IPv6 stack is ignoring oif in output route lookups:
 ...
> The stack does consider the oif but a mismatch in rt6_device_match is not
> considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags.
>
> Cc: Wolfgang Nothdurft
> Signed-off-by: David Ahern

Applied, thank you.
Re: RESEND: [PATCH v3 net-next] sky2: use random address if EEPROM is bad
From: Liviu Dudau
Date: Mon, 28 Sep 2015 17:51:51 +0100

> On some embedded systems the EEPROM does not contain a valid MAC address.
> In that case it is better to fallback to a generated mac address and
> let init scripts fix the value later.
>
> Reported-by: Liviu Dudau
> Signed-off-by: Stephen Hemminger
> [Changed handcoded setup to use eth_hw_addr_random() and to save new address
> into HW]
> Signed-off-by: Liviu Dudau

Applied, thanks.
Re: [PATCH v2 1/1] net sysfs: Print link speed as signed integer
From: Alexander Stein
Date: Mon, 28 Sep 2015 15:05:33 +0200

> Otherwise 4294967295 (MBit/s) (-1) will be printed when there is no link.
> Documentation/ABI/testing/sysfs-class-net does not state if this shall be
> signed or unsigned.
> Also remove the now unused variable fmt_udec.
>
> Signed-off-by: Alexander Stein

Applied, thanks.
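The 4294967295 mentioned in the quoted changelog is what -1 looks like when formatted as an unsigned 32-bit value. A short userspace illustration, assuming a 32-bit unsigned int; format_unsigned() and format_signed() are hypothetical helpers, not the sysfs code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a speed value with an unsigned conversion, as before the patch. */
int format_unsigned(char *buf, size_t len, unsigned int speed)
{
	return snprintf(buf, len, "%u", speed);
}

/* Format the same value with a signed conversion, as after the patch. */
int format_signed(char *buf, size_t len, int speed)
{
	return snprintf(buf, len, "%d", speed);
}
```

Passing (unsigned int)-1 to the first helper yields "4294967295" under the stated assumption, while passing -1 to the second yields "-1".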
Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
Daniel Mack wrote:
> Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the
> input demux is complete and the final destination socket (if any)
> has been determined.
>
> This helps filtering packets based on information stored in the
> destination socket, such as cgroup controller supplied net class IDs.

This still seems like the 'x y' problem ("want to do X, think Y is
correct solution; ask about Y, but that's a strange thing to do").

There is nothing that this offers over INPUT *except* that sk is
available. But there is zero benefit as far as I am concerned -- why
would you want to do any meaningful filtering based on the sk at that
point...?

Drop? Makes no sense, else application would not be running in the
first place.
Allowing response packets? Can already do that with conntrack.

So the only 'benefit' is that netcls id is available; but
a) why is that even needed and
b) is such a huge sledgehammer just for net cgroup accounting worth it?

Another question is what other strange things come up once we would
open this door.

> listening on a specific task, the resulting error code that is sent
> back to the remote peer can't be controlled with rules in
> NF_INET_LOCAL_SOCKET_IN chains.

Right, and that makes this even weirder.

For deterministic ingress filtering you can only rely on what is
contained in the packet.
Re: [PATCH] bna: fix error handling
From: Andrzej Hajda
Date: Mon, 28 Sep 2015 10:49:48 +0200

> Several functions can return negative value in case of error,
> so their return type should be fixed as well as type of variables
> to which this value is assigned.
>
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
>
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
>
> Signed-off-by: Andrzej Hajda

Applied, thanks.
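The class of bug described in the quoted changelog can be shown in a few lines: once a negative error code is stored in an unsigned variable it becomes a large positive number, so a later "is it negative?" check can never fire. The sketch below is generic and not taken from the bna driver; might_fail(), stored_unsigned() and stored_signed() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper that reports failure as a negative errno. */
int might_fail(int fail)
{
	return fail ? -EINVAL : 0;
}

/* Buggy pattern: the negative code is kept in an unsigned variable. */
unsigned int stored_unsigned(int fail)
{
	unsigned int ret = (unsigned int)might_fail(fail);

	return ret;	/* -EINVAL has wrapped to a large positive value */
}

/* Fixed pattern: the variable has the same signed type as the callee. */
int stored_signed(int fail)
{
	int ret = might_fail(fail);

	return ret;	/* still negative, so "ret < 0" works */
}
```

With the unsigned variant a caller testing for a negative result sees success, which is why both the return types and the receiving variables have to change together.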
Re: [PATCH] netpoll: Drop budget parameter from NAPI polling call hierarchy
From: Alexander Duyck
Date: Sun, 27 Sep 2015 15:58:56 -0700

> On 09/26/2015 10:36 PM, David Miller wrote:
>> From: Alexander Duyck
>> Date: Tue, 22 Sep 2015 14:56:08 -0700
>>
>>> Rather than carry around a value of budget that is 0 or less we can
>>> instead
>>> just loop through and pass 0 to each napi->poll call. If any driver
>>> returns a value for work done that is non-zero then we can report that
>>> driver and continue rather than allowing a bad actor to make the
>>> budget
>>> value negative and pass that negative value to napi->poll.
>> Unfortunately we have drivers that won't do any TX work if the budget
>> is zero.
>
> Well that is what we are doing right now. The fact is the call starts
> out with a budget of 0, and it is somewhat hidden from the call since
> the budget is assigned a value of 0 in netpoll_poll_dev. That is one
> of the things I was wanting do address because that is clear as mud
> from looking at poll_one_napi. Based on the code you would assume
> budget starts out as a non-zero value and it doesn't.

I see, thanks for explaining.
Re: [PATCH v4 0/2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
From: Aaron Conole
Date: Sat, 26 Sep 2015 18:50:41 -0400

> This patch set implements a bugfix for kernel.org bugzilla #12323, allowing
> MSG_PEEK to return all queued data on the unix domain socket, not just the
> data contained in a single SKB.
>
> This is the v3 version of this patch, which includes a suggested modification
> by Eric Dumazet to convert the unix_sk() conversion macro to a static inline
> function. These patches are independent and can be applied separately.
>
> This set was tested over a 24-hour period, utilizing a loop continually
> executing the bugzilla issue attached python code. It was instrumented with
> a pr_err_once() ([ 13.798683] unix: went there at least one time).
>
> v2->v3:
> - Added Eric Dumazet's suggestion for #define to static inline
> - Fixed an issue calling unix_state_lock() with an invalid argument
>
> v3->v4:
> - Eliminated an XXX comment
> - Changed from goto unlock to explicit unix_state_unlock() and break

Series applied, thanks.
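The behaviour under discussion can be observed from userspace with two separate writes followed by a single peek. This is a hedged sketch, not the reproducer attached to the bugzilla entry: peek_two_writes() is a hypothetical helper, and it assumes each send() on an AF_UNIX stream socket is queued as its own buffer on the receiving side. On a kernel with the fix the peek is expected to report both writes; without it, only the first.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send "hello" and "world" as two separate writes on a stream
 * socketpair, then peek at the receive queue.
 * Returns the byte count reported by MSG_PEEK, or -1 on error;
 * buf receives the peeked bytes.
 */
ssize_t peek_two_writes(char *buf, size_t len)
{
	int sv[2];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	if (send(sv[0], "hello", 5, 0) != 5 ||
	    send(sv[0], "world", 5, 0) != 5) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* MSG_PEEK copies queued data without removing it from the queue. */
	n = recv(sv[1], buf, len, MSG_PEEK);

	close(sv[0]);
	close(sv[1]);
	return n;
}
```

A caller with a 16-byte buffer would compare the return value against 5 and 10 to tell the two behaviours apart.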