Re: [net-next PATCH V3 1/3] net: adjust napi_consume_skb to handle none-NAPI callers
On Thu, 10 Mar 2016 20:21:55 +0300 Sergei Shtylyov wrote:

> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -801,9 +801,9 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
> >  	if (unlikely(!skb))
> >  		return;
> >
> > -	/* if budget is 0 assume netpoll w/ IRQs disabled */
> > +	/* Zero budget indicate none-NAPI context called us, like netpoll */
>
> Non-NAPI?

Okay, I'll send a V4. Hope there are no more nitpicking changes...

I'll also adjust the subj none-NAPI -> non-NAPI, and hope that does not
disturb patchwork.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
[PATCHv2 (net.git) 1/2] Revert "stmmac: Fix 'eth0: No PHY found' regression"
This reverts commit 88f8b1bb41c6208f81b6a480244533ded7b59493
due to problems on the GeekBox and Banana Pi M1 boards when they are
connected to a real transceiver instead of a switch via fixed-link.

Signed-off-by: Giuseppe Cavallaro
Cc: Gabriel Fernandez
Cc: Andreas Färber
Cc: Frank Schäfer
Cc: Dinh Nguyen
Cc: David S. Miller
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 11 ++++++++++-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  9 +--------
 include/linux/stmmac.h                             |  1 -
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index efb54f3..0faf163 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -199,12 +199,21 @@ int stmmac_mdio_register(struct net_device *ndev)
 	struct stmmac_priv *priv = netdev_priv(ndev);
 	struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
 	int addr, found;
-	struct device_node *mdio_node = priv->plat->mdio_node;
+	struct device_node *mdio_node = NULL;
+	struct device_node *child_node = NULL;
 
 	if (!mdio_bus_data)
 		return 0;
 
 	if (IS_ENABLED(CONFIG_OF)) {
+		for_each_child_of_node(priv->device->of_node, child_node) {
+			if (of_device_is_compatible(child_node,
+						    "snps,dwmac-mdio")) {
+				mdio_node = child_node;
+				break;
+			}
+		}
+
 		if (mdio_node) {
 			netdev_dbg(ndev, "FOUND MDIO subnode\n");
 		} else {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 4514ba7..6a52fa1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -110,7 +110,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 	struct device_node *np = pdev->dev.of_node;
 	struct plat_stmmacenet_data *plat;
 	struct stmmac_dma_cfg *dma_cfg;
-	struct device_node *child_node = NULL;
 
 	plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL);
 	if (!plat)
@@ -141,19 +140,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 		plat->phy_node = of_node_get(np);
 	}
 
-	for_each_child_of_node(np, child_node)
-		if (of_device_is_compatible(child_node, "snps,dwmac-mdio")) {
-			plat->mdio_node = child_node;
-			break;
-		}
-
 	/* "snps,phy-addr" is not a standard property. Mark it as deprecated
 	 * and warn of its use. Remove this when phy node support is added.
 	 */
 	if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
 		dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
 
-	if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node)
+	if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
 		plat->mdio_bus_data = NULL;
 	else
 		plat->mdio_bus_data =
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 881a79d..eead8ab 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -100,7 +100,6 @@ struct plat_stmmacenet_data {
 	int interface;
 	struct stmmac_mdio_bus_data *mdio_bus_data;
 	struct device_node *phy_node;
-	struct device_node *mdio_node;
 	struct stmmac_dma_cfg *dma_cfg;
 	int clk_csr;
 	int has_gmac;
--
1.7.4.4
[PATCHv2 (net.git) 2/2] stmmac: fix MDIO settings
The phy_bus_name field was initially added to manipulate the driver
name, but recently it was only used to manage the fixed-link case and
to take some run-time decisions in the main driver (for example, to
skip EEE). This patch therefore uses is_pseudo_fixed_link instead and
removes phy_bus_name, which is no longer needed.

The driver can manage MDIO registration via phy-handle, dwmac-mdio, or
its own parameters, e.g. snps,phy-addr. This patch takes care of all
these possible configurations and fixes MDIO registration both when
there is a real transceiver and when there is a switch (which needs to
be managed via fixed-link).

Signed-off-by: Giuseppe Cavallaro
Reviewed-by: Andreas Färber
Tested-by: Frank Schäfer
Cc: Gabriel Fernandez
Cc: Dinh Nguyen
Cc: David S. Miller
---
V2: use is_pseudo_fixed_link

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 11 +++--------
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 14 +++++---------
 include/linux/stmmac.h                             |  1 -
 3 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c21015b..389d7d0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -271,7 +271,6 @@ static void stmmac_eee_ctrl_timer(unsigned long arg)
  */
 bool stmmac_eee_init(struct stmmac_priv *priv)
 {
-	char *phy_bus_name = priv->plat->phy_bus_name;
 	unsigned long flags;
 	bool ret = false;
 
@@ -283,7 +282,7 @@ bool stmmac_eee_init(struct stmmac_priv *priv)
 		goto out;
 
 	/* Never init EEE in case of a switch is attached */
-	if (phy_bus_name && (!strcmp(phy_bus_name, "fixed")))
+	if (priv->phydev->is_pseudo_fixed_link)
 		goto out;
 
 	/* MAC core supports the EEE feature. */
@@ -820,12 +819,8 @@ static int stmmac_init_phy(struct net_device *dev)
 		phydev = of_phy_connect(dev, priv->plat->phy_node,
 					&stmmac_adjust_link, 0, interface);
 	} else {
-		if (priv->plat->phy_bus_name)
-			snprintf(bus_id, MII_BUS_ID_SIZE, "%s-%x",
-				 priv->plat->phy_bus_name, priv->plat->bus_id);
-		else
-			snprintf(bus_id, MII_BUS_ID_SIZE, "stmmac-%x",
-				 priv->plat->bus_id);
+		snprintf(bus_id, MII_BUS_ID_SIZE, "stmmac-%x",
+			 priv->plat->bus_id);
 
 		snprintf(phy_id_fmt, MII_BUS_ID_SIZE + 3, PHY_ID_FMT, bus_id,
 			 priv->plat->phy_addr);
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 6a52fa1..ed33920 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -138,7 +138,11 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 			return ERR_PTR(-ENODEV);
 
 		plat->phy_node = of_node_get(np);
-	}
+	} else
+		plat->mdio_bus_data =
+			devm_kzalloc(&pdev->dev,
+				     sizeof(struct stmmac_mdio_bus_data),
+				     GFP_KERNEL);
 
 	/* "snps,phy-addr" is not a standard property. Mark it as deprecated
 	 * and warn of its use. Remove this when phy node support is added.
@@ -146,14 +150,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 	if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
 		dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
 
-	if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
-		plat->mdio_bus_data = NULL;
-	else
-		plat->mdio_bus_data =
-			devm_kzalloc(&pdev->dev,
-				     sizeof(struct stmmac_mdio_bus_data),
-				     GFP_KERNEL);
-
 	of_property_read_u32(np, "tx-fifo-depth", &plat->tx_fifo_size);
 
 	of_property_read_u32(np, "rx-fifo-depth", &plat->rx_fifo_size);
 
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index eead8ab..1b4884c 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -94,7 +94,6 @@ struct stmmac_dma_cfg {
 };
 
 struct plat_stmmacenet_data {
-	char *phy_bus_name;
 	int bus_id;
 	int phy_addr;
 	int interface;
--
1.7.4.4
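After this patch the MDIO bus name no longer depends on phy_bus_name. The identifiers that stmmac_init_phy() builds can be sketched in userspace as below. This is only an illustration: BUS_ID_LEN, make_bus_id() and make_phy_id() are hypothetical names, and the "%s:%02x" format is an assumption about what PHY_ID_FMT expands to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size; the driver sizes its buffer with MII_BUS_ID_SIZE. */
#define BUS_ID_LEN 32

/* After the patch the bus name is always "stmmac-<bus_id in hex>". */
static void make_bus_id(char *buf, size_t len, unsigned int bus_id)
{
	snprintf(buf, len, "stmmac-%x", bus_id);
}

/* Assumes PHY_ID_FMT is "%s:%02x": bus name, then a 2-digit hex PHY address. */
static void make_phy_id(char *buf, size_t len, const char *bus,
			unsigned int addr)
{
	snprintf(buf, len, "%s:%02x", bus, addr);
}
```

For bus 0 and PHY address 1 this yields "stmmac-0" and "stmmac-0:01".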
[PATCHv2 (net.git) 0/2] stmmac: MDIO fixes
These two patches fix the recent regressions, caused by broken MDIO
management, that were raised when testing stmmac on some platforms.

V2: use is_pseudo_fixed_link

Giuseppe Cavallaro (2):
  Revert "stmmac: Fix 'eth0: No PHY found' regression"
  stmmac: fix MDIO settings

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 11 +++--------
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 11 ++++++++++-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 21 +++++----------------
 include/linux/stmmac.h                             |  2 --
 4 files changed, 18 insertions(+), 27 deletions(-)

--
1.7.4.4
RE: [PATCH v2] can: rcar_canfd: Add Renesas R-Car CAN FD driver
Hi Oliver, Marc,

> On 03/08/2016 01:48 PM, Ramesh Shanmugasundaram wrote:
>
> >> In fact you provided a CAN driver which is "CAN-FD-only".
> >
> > Yes. That's the status of current submission.
>
> (...)
>
> > I did try this option earlier but there are two problems with this
> > method.
> >
> > 1) Below configuration is not possible
> >
> > ip link set can0 up type can bitrate 100 dbitrate 100 fd on
> >
> > "fd on" -> This is not allowed because CAN_CTRLMODE_FD bit is not set in
> > ctrlmode_supported.
> >
> > 2) If I ignore "fd on", my interface MTU stays as CAN_MTU only. If I
> > have to change the MTU alone to CANFD_MTU using another netlink message,
> > it again checks ctrlmode_supported where it would fail. I have the option
> > of providing my own change_mtu function & ignore this check but two
> > configuration messages are required for my driver alone :-(.
> >
> > Both these anomalies are addressed with the current check I have.
>
> Oh - you are right with complaining about this inconsistency.
>
> Can you check my RFC patch for Linux stable I just sent on the mailing
> list?
>
> http://marc.info/?l=linux-can=145745724917976=2

As we are fixing this issue in CAN dev.c, in the next (v3) version of
the patch I'll remove this check from ndo_open, set the CAN_CTRLMODE_FD
flag in ctrlmode, and remove the flag from ctrlmode_supported.

Are there any further comments on the v2 patch?

Thanks,
Ramesh
RE: [PATCH net v2 0/2] qlcnic fixes
>-----Original Message-----
>From: David Miller [mailto:da...@davemloft.net]
>Sent: Friday, March 11, 2016 2:47 AM
>To: Rajesh Borundia
>Cc: netdev; Dept-GE Linux NIC Dev <gelinuxnic...@qlogic.com>
>Subject: Re: [PATCH net v2 0/2] qlcnic fixes
>
>From: Rajesh Borundia
>Date: Tue, 8 Mar 2016 02:39:56 -0500
>
>> This series adds following fixes.
>>
>> o While processing mailbox if driver gets a spurious mailbox
>>   interrupt it leads into premature completion of a next
>>   mailbox request. Added a guard against this by checking current
>>   state of mailbox and ignored spurious interrupt.
>>   Added a stats counter to record this condition.
>>
>> v2:
>>
>> o Added patch that removes usage of atomic_t as we are not implementing
>>   atomicity by using atomic_t value.
>>
>> Please apply these fixes to net.
>
>As explained in other list postings, 'net' is basically closed for this
>release cycle,
>so I applied this series to 'net-next'.
>
Thanks.

>Let me know if you'd like me to therefore queue these changes up for -stable.
>
Please queue the changes for stable.

>Thanks.
[PATCH] kcm: fix variable type
skb_splice_bits() can return negative values, so its result should be
assigned to a signed variable to allow correct error checking.

The problem has been detected using the semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci.

Signed-off-by: Andrzej Hajda
---
 net/kcm/kcmsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 40662d73..0b68ba7 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1483,7 +1483,7 @@ static ssize_t kcm_splice_read(struct socket *sock, loff_t *ppos,
 	long timeo;
 	struct kcm_rx_msg *rxm;
 	int err = 0;
-	size_t copied;
+	ssize_t copied;
 	struct sk_buff *skb;
 
 	/* Only support splice for SOCKSEQPACKET */
--
1.9.1
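A userspace sketch of why the type of `copied` matters. fake_splice() is a hypothetical stand-in for skb_splice_bits() that returns either a byte count or a negative errno. Stored in a size_t, the error silently becomes an enormous byte count; stored in an ssize_t, it stays negative, so a `copied < 0` check can catch it.

```c
#include <assert.h>
#include <stdint.h>     /* SIZE_MAX */
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Hypothetical stand-in for skb_splice_bits(): bytes spliced or a negative errno. */
static int fake_splice(int fail)
{
	return fail ? -11 /* e.g. -EAGAIN on Linux */ : 128;
}

/* Old pattern: the result lands in a size_t, so an error wraps to a huge count. */
static size_t copied_unsigned(int fail)
{
	size_t copied = fake_splice(fail);

	return copied;
}

/* Fixed pattern: ssize_t keeps the sign, so "copied < 0" detects the error. */
static int splice_failed(int fail)
{
	ssize_t copied = fake_splice(fail);

	return copied < 0;
}
```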
[PATCH net v2] r8169: Remove unnecessary phy reset for pcie nic when setting link speed.
For PCIe NICs, after the link speed has been set and there is no link,
the driver does not need to reset the PHY until the link comes up. On
some PCIe NICs, doing so also resets the PHY's speed-down counter and
prevents the PHY from automatically speeding down.

This patch fixes the issue reported at the following link:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1547151

Signed-off-by: Chunhao Lin
---
 drivers/net/ethernet/realtek/r8169.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index dd2cf37..94f08f1 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1999,7 +1999,8 @@ static int rtl8169_set_speed(struct net_device *dev,
 		goto out;
 
 	if (netif_running(dev) && (autoneg == AUTONEG_ENABLE) &&
-	    (advertising & ADVERTISED_1000baseT_Full)) {
+	    (advertising & ADVERTISED_1000baseT_Full) &&
+	    !pci_is_pcie(tp->pci_dev)) {
 		mod_timer(&tp->timer, jiffies + RTL8169_PHY_TIMEOUT);
 	}
 out:
--
1.9.1
[PATCH v3 net-next 1/2] net: hns: fix return value of the function about rss
Both .get_rxfh and .set_rxfh always return 0; they should return the
result from the hardware when getting or setting RSS. The RSS functions
should also use the correct data types.

Signed-off-by: Kejian Yan
---
change log:
PATCH v3:
 - This patch removes the unused variable 'ret' to fix the build warning
PATCH v2:
 - This patch fixes the comments provided by Andy Shevchenko
   Link: https://lkml.org/lkml/2016/3/10/266
PATCH v1:
 - first submit
   Link: https://lkml.org/lkml/2016/3/9/978
---
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c  | 14 ++++----------
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index d4f92ed..d07db1f 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -799,7 +799,7 @@ static int hns_ae_set_rss(struct hnae_handle *handle, const u32 *indir,
 
 	/* set the RSS Hash Key if specififed by the user */
 	if (key)
-		hns_ppe_set_rss_key(ppe_cb, (int *)key);
+		hns_ppe_set_rss_key(ppe_cb, (u32 *)key);
 
 	/* update the shadow RSS table with user specified qids */
 	memcpy(ppe_cb->rss_indir_table, indir, HNS_PPEV2_RSS_IND_TBL_SIZE);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index f302ef9..811ef35 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -27,7 +27,7 @@ void hns_ppe_set_tso_enable(struct hns_ppe_cb *ppe_cb, u32 value)
 void hns_ppe_set_rss_key(struct hns_ppe_cb *ppe_cb,
 			 const u32 rss_key[HNS_PPEV2_RSS_KEY_NUM])
 {
-	int key_item = 0;
+	u32 key_item = 0;
 
 	for (key_item = 0; key_item < HNS_PPEV2_RSS_KEY_NUM; key_item++)
 		dsaf_write_dev(ppe_cb, PPEV2_RSS_KEY_REG + key_item * 0x4,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 3c4a3bc..01b65eb 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -1178,7 +1178,7 @@ hns_get_rss_key_size(struct net_device *netdev)
 	if (AE_IS_VER1(priv->enet_ver)) {
 		netdev_err(netdev,
 			   "RSS feature is not supported on this hardware\n");
-		return -EOPNOTSUPP;
+		return (u32)-EOPNOTSUPP;
 	}
 
 	ops = priv->ae_handle->dev->ops;
@@ -1197,7 +1197,7 @@ hns_get_rss_indir_size(struct net_device *netdev)
 	if (AE_IS_VER1(priv->enet_ver)) {
 		netdev_err(netdev,
 			   "RSS feature is not supported on this hardware\n");
-		return -EOPNOTSUPP;
+		return (u32)-EOPNOTSUPP;
 	}
 
 	ops = priv->ae_handle->dev->ops;
@@ -1211,7 +1211,6 @@ hns_get_rss(struct net_device *netdev, u32 *indir, u8 *key, u8 *hfunc)
 {
 	struct hns_nic_priv *priv = netdev_priv(netdev);
 	struct hnae_ae_ops *ops;
-	int ret;
 
 	if (AE_IS_VER1(priv->enet_ver)) {
 		netdev_err(netdev,
@@ -1224,9 +1223,7 @@ hns_get_rss(struct net_device *netdev, u32 *indir, u8 *key, u8 *hfunc)
 	if (!indir)
 		return 0;
 
-	ret = ops->get_rss(priv->ae_handle, indir, key, hfunc);
-
-	return 0;
+	return ops->get_rss(priv->ae_handle, indir, key, hfunc);
 }
 
 static int
@@ -1235,7 +1232,6 @@ hns_set_rss(struct net_device *netdev, const u32 *indir, const u8 *key,
 {
 	struct hns_nic_priv *priv = netdev_priv(netdev);
 	struct hnae_ae_ops *ops;
-	int ret;
 
 	if (AE_IS_VER1(priv->enet_ver)) {
 		netdev_err(netdev,
@@ -1252,9 +1248,7 @@ hns_set_rss(struct net_device *netdev, const u32 *indir, const u8 *key,
 	if (!indir)
 		return 0;
 
-	ret = ops->set_rss(priv->ae_handle, indir, key, hfunc);
-
-	return 0;
+	return ops->set_rss(priv->ae_handle, indir, key, hfunc);
 }
 
 static struct ethtool_ops hns_ethtool_ops = {
--
1.9.1
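The two behaviours touched by this patch can be reproduced in plain C. hw_get_rss(), get_rss_before() and get_rss_after() are hypothetical helpers modelled on the hunks above, not the driver's code. The sketch only shows that (u32)-EOPNOTSUPP wraps to a large positive value, and that the old code discarded the hardware op's result; how the ethtool core interprets the wrapped size is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* A negative errno forced through a u32 return type, as in (u32)-EOPNOTSUPP:
 * the value wraps modulo 2^32 and no longer looks negative to the caller. */
static uint32_t unsupported_size(void)
{
	return (uint32_t)-EOPNOTSUPP;
}

/* Hypothetical hardware op that may fail with a negative errno. */
static int hw_get_rss(int hw_ret)
{
	return hw_ret;
}

/* Before the patch: the op's result is stored, then ignored. */
static int get_rss_before(int hw_ret)
{
	int ret = hw_get_rss(hw_ret);

	(void)ret;	/* the original code triggered an unused-variable warning here */
	return 0;
}

/* After the patch: the op's result is propagated to the caller. */
static int get_rss_after(int hw_ret)
{
	return hw_get_rss(hw_ret);
}
```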
[PATCH v3 net-next 0/2] net: hns: get and set RSS indirection table by using ethtool
When ethtool is used to retrieve or configure the receive flow hash
indirection table, it first calls .get_rxnfc to get the number of
rings. This patchset therefore implements .get_rxnfc, and fixes a bug
that prevented the whole table from being retrieved.

---
change log:
PATCH v3:
 - This patchset fixes the build warning and error
PATCH v2:
 - This patchset fixes the comments provided by Andy Shevchenko
PATCH v1:
 - first submit

Kejian Yan (2):
  net: hns: fix return value of the function about rss
  net: hns: fixes a bug of RSS

 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c |  8 +++++---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c  | 28 ++++++++++++++++++++--------
 3 files changed, 26 insertions(+), 12 deletions(-)

--
1.9.1
[PATCH v3 net-next 2/2] net: hns: fixes a bug of RSS
To get the receive flow hash indirection table, ethtool first calls
.get_rxnfc to get the number of rings, so this patch implements
.get_rxnfc. Also, since rss_indir_table is an array of u32, the
memcpy() length has to be multiplied by the width of the element type.

Signed-off-by: Kejian Yan
---
change log:
PATCH v3:
 - This patch modifies the return value of .get_rxnfc to fix the build error
PATCH v2:
 - This patch fixes the comments provided by Andy Shevchenko
   Link: https://lkml.org/lkml/2016/3/10/267
PATCH v1:
 - first submit
   Link: https://lkml.org/lkml/2016/3/9/981
---
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c |  6 ++++--
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c  | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index d07db1f..7b06e9b 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -787,7 +787,8 @@ static int hns_ae_get_rss(struct hnae_handle *handle, u32 *indir, u8 *key,
 		memcpy(key, ppe_cb->rss_key, HNS_PPEV2_RSS_KEY_SIZE);
 
 	/* update the current hash->queue mappings from the shadow RSS table */
-	memcpy(indir, ppe_cb->rss_indir_table, HNS_PPEV2_RSS_IND_TBL_SIZE);
+	memcpy(indir, ppe_cb->rss_indir_table,
+	       HNS_PPEV2_RSS_IND_TBL_SIZE * sizeof(*indir));
 
 	return 0;
 }
@@ -802,7 +803,8 @@ static int hns_ae_set_rss(struct hnae_handle *handle, const u32 *indir,
 		hns_ppe_set_rss_key(ppe_cb, (u32 *)key);
 
 	/* update the shadow RSS table with user specified qids */
-	memcpy(ppe_cb->rss_indir_table, indir, HNS_PPEV2_RSS_IND_TBL_SIZE);
+	memcpy(ppe_cb->rss_indir_table, indir,
+	       HNS_PPEV2_RSS_IND_TBL_SIZE * sizeof(*indir));
 
 	/* now update the hardware */
 	hns_ppe_set_indir_table(ppe_cb, ppe_cb->rss_indir_table);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 01b65eb..46379ce 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -1251,6 +1251,23 @@ hns_set_rss(struct net_device *netdev, const u32 *indir, const u8 *key,
 	return ops->set_rss(priv->ae_handle, indir, key, hfunc);
 }
 
+static int hns_get_rxnfc(struct net_device *netdev,
+			 struct ethtool_rxnfc *cmd,
+			 u32 *rule_locs)
+{
+	struct hns_nic_priv *priv = netdev_priv(netdev);
+
+	switch (cmd->cmd) {
+	case ETHTOOL_GRXRINGS:
+		cmd->data = priv->ae_handle->q_num;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
 static struct ethtool_ops hns_ethtool_ops = {
 	.get_drvinfo = hns_nic_get_drvinfo,
 	.get_link = hns_nic_get_link,
@@ -1274,6 +1291,7 @@ static struct ethtool_ops hns_ethtool_ops = {
 	.get_rxfh_indir_size = hns_get_rss_indir_size,
 	.get_rxfh = hns_get_rss,
 	.set_rxfh = hns_set_rss,
+	.get_rxnfc = hns_get_rxnfc,
 };
 
 void hns_ethtool_set_ops(struct net_device *ndev)
--
1.9.1
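A minimal sketch of the memcpy() length bug fixed above, assuming a 16-entry table in place of HNS_PPEV2_RSS_IND_TBL_SIZE (its real value is not shown in the thread). Passing the element count as a byte count copies only a quarter of a u32 table, which is why ethtool could not see the whole indirection table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IND_TBL_SIZE 16	/* hypothetical stand-in for HNS_PPEV2_RSS_IND_TBL_SIZE */

/* Fill a table with 1, 2, 3, ... so copied entries are easy to spot. */
static void fill_table(uint32_t *t)
{
	for (size_t i = 0; i < IND_TBL_SIZE; i++)
		t[i] = (uint32_t)(i + 1);
}

/* Before the fix: an element count used as a byte count. */
static void copy_table_bytes(uint32_t *dst, const uint32_t *src)
{
	memcpy(dst, src, IND_TBL_SIZE);
}

/* After the fix: the element count is scaled by the element width. */
static void copy_table_entries(uint32_t *dst, const uint32_t *src)
{
	memcpy(dst, src, IND_TBL_SIZE * sizeof(*dst));
}
```

With 4-byte entries, copy_table_bytes() fills only the first 4 of 16 slots.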
Re: [V9fs-developer] [PATCH] net/9p: convert to new CQ API
On 03/08/2016 09:38 AM, Dominique Martinet wrote:
> Christoph Hellwig wrote on Thu, Mar 03, 2016:
>> New version with the nits fixed below.  Now that checkpath started
>> a stupid warning about not using tabs for indentation which I've
>> ignored here and will take up in my usual fights against Joes
>> idicotic opinions separately..
>
> Thanks for the nitpicks, I can confirm it works as expected as well so
> all good with me.
> I like the new CQ interface :)
>
> (if someone adds an Acked-by please use dominique.marti...@cea.fr for my
> mail; sorry for the split personality)
>

Since I haven't heard anyone else say they are picking this up, I've
grabbed it for 4.6.  Thanks.

--
Doug Ledford
    GPG KeyID: 0E572FDD
Re: [ovs-dev] [PATCH v2 net-next] ovs: allow nl 'flow set' to use ufid without flow key
On Thu, Mar 10, 2016 at 8:14 AM, Samuel Gauthier wrote:
> When we want to change a flow using netlink, we have to identify it to
> be able to perform a lookup. Both the flow key and unique flow ID
> (ufid) are valid identifiers, but we always have to specify the flow
> key in the netlink message. When both attributes are there, the ufid
> is used. The flow key is used to validate the actions provided by
> the userland.
>
> This commit allows to use the ufid without having to provide the flow
> key, as it is already done in the netlink 'flow get' and 'flow del'
> path. The flow key remains mandatory when an action is provided.
>
> Signed-off-by: Samuel Gauthier
> ---
> v2:
> - Restore mask init and parsing
> - Keep the flow key mandatory when an action is provided
>

Looks good.

Acked-by: Pravin B Shelar
[net-next:master 1158/1168] net/sched/cls_flower.c:222:28: warning: cast from pointer to integer of different size
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   e8ab563f4b2e51849a16d962c6235b81e429c0d7
commit: 5b33f48842fa1e13e9c0ea8cc59c1d0df19042db [1158/1168] net/flower: Introduce hardware offload support
config: i386-randconfig-r0-201610 (attached as .config)
reproduce:
        git checkout 5b33f48842fa1e13e9c0ea8cc59c1d0df19042db
        # save the attached .config to linux build tree
        make ARCH=i386

All warnings (new ones prefixed by >>):

   net/sched/cls_flower.c: In function 'fl_destroy':
>> net/sched/cls_flower.c:222:28: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      fl_hw_destroy_filter(tp, (u64)f);
                               ^
   net/sched/cls_flower.c: In function 'fl_change':
   net/sched/cls_flower.c:557:9: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
            (u64)fnew,
            ^
   net/sched/cls_flower.c:563:28: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      fl_hw_destroy_filter(tp, (u64)fold);
                               ^
   net/sched/cls_flower.c: In function 'fl_delete':
   net/sched/cls_flower.c:591:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     fl_hw_destroy_filter(tp, (u64)f);
                              ^

vim +222 net/sched/cls_flower.c

   206	
   207		tc.type = TC_SETUP_CLSFLOWER;
   208		tc.cls_flower = &offload;
   209	
   210		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol, &tc);
   211	}
   212	
   213	static bool fl_destroy(struct tcf_proto *tp, bool force)
   214	{
   215		struct cls_fl_head *head = rtnl_dereference(tp->root);
   216		struct cls_fl_filter *f, *next;
   217	
   218		if (!force && !list_empty(&head->filters))
   219			return false;
   220	
   221		list_for_each_entry_safe(f, next, &head->filters, list) {
 > 222			fl_hw_destroy_filter(tp, (u64)f);
   223			list_del_rcu(&f->list);
   224			call_rcu(&f->rcu, fl_destroy_filter);
   225		}
   226		RCU_INIT_POINTER(tp->root, NULL);
   227		if (head->mask_assigned)
   228			rhashtable_destroy(&head->ht);
   229		kfree_rcu(head, rcu);
   230		return true;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
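The warning appears because a pointer is 32 bits wide on i386 while u64 is 64 bits. One common way to silence it, sketched here in userspace, is to convert through a pointer-sized integer such as uintptr_t first. struct filter, cookie_from_ptr() and ptr_from_cookie() are hypothetical names, and this is not necessarily how the warning was eventually fixed upstream.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;	/* kernel-style alias, defined here for a userspace build */

/* Hypothetical filter object whose address is used as an opaque cookie. */
struct filter {
	int prio;
};

/* A direct (u64)f warns on 32-bit targets; widening via uintptr_t does not. */
static u64 cookie_from_ptr(const struct filter *f)
{
	return (u64)(uintptr_t)f;
}

/* The reverse conversion, for code that turns the cookie back into a pointer. */
static const struct filter *ptr_from_cookie(u64 cookie)
{
	return (const struct filter *)(uintptr_t)cookie;
}
```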
Re: [PATCH net 4/6] net: hns: adds uc match for debug port
On 2016/3/4 21:39, Sergei Shtylyov wrote:
> On 3/4/2016 4:09 AM, Daode Huang wrote:
>
>> This patch adds uc match for debug port by:
>> 1)Enables uc match of debug port when initializing gmac
>> 2)Enables uc match of mac address register2
>>
>> Signed-off-by: Daode Huang
>> Signed-off-by: lipeng
>
>    Lipeng is his full name.

i will change it to another style (Peng Li )

>    True/full name is required here.
>
>> ---
>>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c | 18 +++++++++++++++++-
>>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  |  2 ++
>>  2 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
>> index b8517b0..2591a51 100644
>> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
>> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
> [...]
>> @@ -407,8 +419,12 @@ static void hns_gmac_set_mac_addr(void *mac_drv, char *mac_addr)
>>  		u32 low_val = mac_addr[5] | (mac_addr[4] << 8) | (mac_addr[3] << 16) | (mac_addr[2] << 24);
>> +
>> +		u32 val = dsaf_read_dev(drv, GMAC_STATION_ADDR_HIGH_2_REG);
>> +		u32 sta_addr_en = dsaf_get_bit(val, GMAC_ADDR_EN_B);
>
>    Empty line needed after declarations.

agree, thanks
Daode.

>>  		dsaf_write_dev(drv, GMAC_STATION_ADDR_LOW_2_REG, low_val);
>> -		dsaf_write_dev(drv, GMAC_STATION_ADDR_HIGH_2_REG, high_val);
>> +		dsaf_write_dev(drv, GMAC_STATION_ADDR_HIGH_2_REG,
>> +			       high_val | (sta_addr_en << GMAC_ADDR_EN_B));
>>  	}
>>  }
> [...]
>
> MBR, Sergei
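The register update under review is a read-modify-write: the address bytes are replaced while the address-enable bit read from the old register value is kept. A userspace sketch, assuming a hypothetical bit position of 16 for GMAC_ADDR_EN_B (the real value is in hns_dsaf_reg.h and is not quoted here). The sketch takes the address as uint8_t so that bytes above 0x7f cannot be sign-extended when shifted.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_EN_B 16	/* hypothetical position of the address-enable bit */

/* Pack MAC bytes 2..5 the same way the low register value is built in the patch. */
static uint32_t mac_low_word(const uint8_t mac[6])
{
	return (uint32_t)mac[5] | ((uint32_t)mac[4] << 8) |
	       ((uint32_t)mac[3] << 16) | ((uint32_t)mac[2] << 24);
}

/* Build the new high register value, carrying over the enable bit from the old one. */
static uint32_t high_word_keep_en(uint32_t old_reg, uint32_t high_val)
{
	uint32_t en = (old_reg >> ADDR_EN_B) & 1u;

	return high_val | (en << ADDR_EN_B);
}
```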
Re: [PATCH net 3/6] net: hns: fixed portid bug in sending manage pkt
On 2016/3/4 21:37, Sergei Shtylyov wrote:
> Hello.
>
> On 3/4/2016 4:09 AM, Daode Huang wrote:
>
>> In V2 chip, when sending mamagement packets, the driver should
>> config the port id to BD descs.
>>
>> Signed-off-by: Daode Huang
>> Signed-off-by: Lisheng
>> ---
>>  drivers/net/ethernet/hisilicon/hns/hnae.h         | 3 +++
>>  drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c | 1 +
>>  drivers/net/ethernet/hisilicon/hns/hns_enet.c     | 4 ++++
>>  3 files changed, 8 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
>> index 1cbcb9f..11a3f97 100644
>> --- a/drivers/net/ethernet/hisilicon/hns/hnae.h
>> +++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
> [...]
>> @@ -516,6 +518,7 @@ struct hnae_handle {
>>  	int q_num;
>>  	int vf_id;
>>  	u32 eport_id;
>> +	u32 dport_id;	/*v2 tx bd should fill the dport_id*/
>
>    Please add spaces after /* and before */ (like it's done in other
> places in this driver).

Hi MBR, Sergei,
Thanks for your comments, will change it in next version.
Daode.

> [...]
>> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
>> index 6250a42..b45dcc2 100644
>> --- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
>> +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
>> @@ -69,6 +69,10 @@ static void fill_v2_desc(struct hnae_ring *ring, void *priv,
>>  	hnae_set_bit(rrcfv, HNSV2_TXD_VLD_B, 1);
>>  	hnae_set_field(bn_pid, HNSV2_TXD_BUFNUM_M, 0, buf_num - 1);
>>
>> +	/*fill port_id in the tx bd for sending management pkts*/
>
>    Likewise.
>
>> +	hnae_set_field(bn_pid, HNSV2_TXD_PORTID_M,
>> +		       HNSV2_TXD_PORTID_S, ring->q->handle->dport_id);
>> +
>>  	if (type == DESC_TYPE_SKB) {
>>  		skb = (struct sk_buff *)priv;
>
> MBR, Sergei
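The new hnae_set_field() call stores the destination port id in a bit field of the TX descriptor word. Below is a sketch of the clear-then-OR idiom such a helper is expected to perform. The field position and width (PORTID_S, PORTID_M) are hypothetical, since the real HNSV2_TXD_PORTID_M/_S values are not quoted in the thread, and set_field() is an illustrative reimplementation, not the macro from hnae.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout: a 3-bit port id starting at bit 4. */
#define PORTID_S 4
#define PORTID_M (0x7u << PORTID_S)

/* Clear the field, then OR in the shifted value, masked to the field width. */
static uint32_t set_field(uint32_t origin, uint32_t mask, unsigned int shift,
			  uint32_t val)
{
	origin &= ~mask;
	origin |= (val << shift) & mask;
	return origin;
}
```

Bits outside the mask, such as the buffer-number field stored at shift 0, are left untouched.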
Re: [PATCH v2 net-next] ovs: allow nl 'flow set' to use ufid without flow key
On Thu, Mar 10, 2016 at 05:14:59PM +0100, Samuel Gauthier wrote:
> When we want to change a flow using netlink, we have to identify it to
> be able to perform a lookup. Both the flow key and unique flow ID
> (ufid) are valid identifiers, but we always have to specify the flow
> key in the netlink message. When both attributes are there, the ufid
> is used. The flow key is used to validate the actions provided by
> the userland.
>
> This commit allows to use the ufid without having to provide the flow
> key, as it is already done in the netlink 'flow get' and 'flow del'
> path. The flow key remains mandatory when an action is provided.
>
> Signed-off-by: Samuel Gauthier

Reviewed-by: Simon Horman
Re: [PATCH net-next v5] tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
On Thu, Mar 10, 2016 at 10:46 AM, Martin KaFai Lau wrote:
> Per RFC4898, they count segments sent/received
> containing a positive length data segment (that includes
> retransmission segments carrying data).  Unlike
> tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
> carrying no data (e.g. pure ack).

Acked-by: Eric Dumazet

Thanks.
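The difference between the two counters can be sketched as follows. struct seg and both helpers are hypothetical and only model the counting rule stated in the commit message, not the TCP stack's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one TCP segment: only its payload length matters here. */
struct seg {
	unsigned int payload_len;
};

/* tcpi_segs_out-style counting: every segment counts, pure ACKs included. */
static unsigned int count_segs(const struct seg *s, size_t n)
{
	(void)s;
	return (unsigned int)n;
}

/* tcpi_data_segs_out-style counting: only segments carrying payload,
 * which includes retransmissions that carry data. */
static unsigned int count_data_segs(const struct seg *s, size_t n)
{
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++)
		if (s[i].payload_len > 0)
			count++;
	return count;
}
```

For a trace of two pure ACKs and three full-sized data segments, the first counter reports 5 and the second reports 3.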
Re: [PATCH nf-next v10 8/8] openvswitch: Interface with NAT.
On 11 March 2016 at 07:54, Jarno Rajahalme wrote:
> Extend OVS conntrack interface to cover NAT.  New nested
> OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
> A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
> If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
> attributes, new (non-committed/non-confirmed) connections are mangled
> according to the rest of the nested attributes.
>
> The corresponding OVS userspace patch series includes test cases (in
> tests/system-traffic.at) that also serve as example uses.
>
> This work extends on a branch by Thomas Graf at
> https://github.com/tgraf/ovs/tree/nat.
>
> Signed-off-by: Jarno Rajahalme
> Acked-by: Thomas Graf

Acked-by: Joe Stringer
Re: Micrel Phy - Is there a way to configure the Phy not to do 802.3x flow control?
On 03/10/2016 02:38 PM, Murali Karicheri wrote:
> On 03/10/2016 01:05 PM, Florian Fainelli wrote:
>> On 10/03/16 08:48, Murali Karicheri wrote:
>>> On 03/03/2016 07:16 PM, Florian Fainelli wrote:
>>>> On 03/03/16 14:18, Murali Karicheri wrote:
>>>>> Hi,
>>>>>
>>>>> We are using Micrel Phy in one of our board and wondering if we can force the
>>>>> Phy to disable flow control at start. I have a 1G ethernet switch connected
>>>>> to Phy and the phy always enable flow control. I would like to configure the
>>>>> phy not to flow control. Is that possible and if yes, what should I do in the
>>>>> my Ethernet driver to tell the Phy not to enable flow control?
>>>>
>>>> The PHY is not doing flow control per-se, your pseudo Ethernet MAC in
>>>> the switch is doing, along with the link partner advertising support
>>>> for it. You would want to make sure that your PHY device interface
>>>> (provided that you are using the PHY library) is not starting with
>>>> Pause advertised, but it could be supported.
>>>
>>> Understood that Phy is just advertise FC. The Micrel phy for 9031 advertise
>>> by default FC supported. After negotiation, I see that Phylib provide the
>>> link status with parameter pause = 1, asym_pause = 1. How do I tell the Phy not
>>> to advertise?
>>>
>>> I call following sequence in the Ethernet driver.
>>>
>>> of_phy_connect(x,y,hndlr,a,z);
>>
>> Here you should be able to change phydev->advertising and
>> phydev->supported to mask the ADVERTISED_Pause | ADVERTISED_AsymPause
>> bits and have phy_start() restart with that which should disable pause
>> and asym_pause as seen by your adjust_link handler.
>>
> Ok. Good point. I will try this. Thanks for your suggestion.
>
I made the following changes, but phylib still reports flow control as
enabled to the driver. Is there a bug in phylib/phydev?
+
+	printk("slave->phy->supported %x, slave->phy->advertising %x\n",
+	       slave->phy->supported, slave->phy->advertising);
+	slave->phy->supported &=
+		~(SUPPORTED_Pause | SUPPORTED_Asym_Pause);
+	slave->phy->advertising = slave->phy->supported;
+	printk("slave->phy->supported %x, slave->phy->advertising %x\n",
+	       slave->phy->supported, slave->phy->advertising);
 	phy_start(slave->phy);
+	printk("slave->phy->supported %x, slave->phy->advertising %x\n",
+	       slave->phy->supported, slave->phy->advertising);
 	phy_read_status(slave->phy);

[   10.757001] slave->phy->supported 22ff, slave->phy->advertising 22ff
[   10.763354] slave->phy->supported 2ff, slave->phy->advertising 2ff
[   10.769552] slave->phy->supported 2ff, slave->phy->advertising 2ff
[   10.776045] netcp-1.0 2620110.netcp eth0: Link is Down
udhcpc (v1.23.1) started
Sending discover...
Sending discover...
[   14.757280] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Sending discover...
Sending select for 158.218.103.170...
Lease of 158.218.103.170 obtained, lease time 28800
/etc/udhcpc.d/50default: Adding DNS 192.0.2.2
/etc/udhcpc.d/50default: Adding DNS 192.0.2.3

> Murali
>>> phy_start()
>>>
>>> Now in hndlr() I have pause = 1, asym_pause = 1, in phy_device ptr. How can
>>> I tell the phy not to advertise initially?
>
>

--
Murali Karicheri
Linux Kernel, Keystone
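The printed masks can be checked by hand. With the values from include/uapi/linux/ethtool.h (SUPPORTED_Pause is bit 13, SUPPORTED_Asym_Pause is bit 14), 0x22ff has Pause set but not Asym_Pause, and clearing both gives 0x2ff. That matches the second and third printk lines, so the mask itself took effect before and after phy_start(). strip_pause() is a hypothetical helper used only for this check.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/ethtool.h. */
#define SUPPORTED_Pause      (1u << 13)
#define SUPPORTED_Asym_Pause (1u << 14)

/* Clear both pause bits, as the test change above does for phy->supported. */
static uint32_t strip_pause(uint32_t supported)
{
	return supported & ~(SUPPORTED_Pause | SUPPORTED_Asym_Pause);
}
```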
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Thu, Mar 10, 2016 at 05:36:30PM -0500, David Miller wrote:
> >
> > Works like a charm! So David, what are the next steps then?
> > Mind to gather all your patches into one (maybe)?
>
> I'll re-review all of the changes tomorrow and also look into ipv6
> masq, to see if it needs the same treatment, as well.
>
> Thanks for all of your help and testing so far.

Thanks a lot, David!
Re: [PATCH 0/2] sh_eth: fix couple of bugs in sh_eth_ring_format()
From: Sergei Shtylyov
Date: Fri, 11 Mar 2016 01:01:22 +0300

> On 03/11/2016 12:07 AM, David Miller wrote:
>
>>> Here's a set of 2 patches against DaveM's 'net.git' repo fixing two bugs
>>> in sh_eth_.ring_format()...
>>>
>>> [1/2] sh_eth: fix NULL pointer dereference in sh_eth_ring_format()
>>> [2/2] sh_eth: advance 'rxdesc' later in sh_eth_ring_format()
>>
>> Since Linus is likely to release today or otherwise very soon I'm not
>> putting things into 'net'.
>>
>> So I've applied this series to 'net-next', let me know if I should
>> queue it up for stable.
>
> If your generally queue the error path fixes, then queue these two
> please.

Done.
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
From: Cyrill Gorcunov
Date: Fri, 11 Mar 2016 00:59:59 +0300

> On Fri, Mar 11, 2016 at 12:19:45AM +0300, Cyrill Gorcunov wrote:
>> >
>> > Oh yes they do, from masq's non-inet notifier. masq registers two
>> > notifiers, one for generic netdev and one for inetdev.
>>
>> Thanks a huge David! I'll test it just to be sure.
>
> Works like a charm! So David, what are the next steps then?
> Mind to gather all your patches into one (maybe)?

I'll re-review all of the changes tomorrow and also look into ipv6
masq, to see if it needs the same treatment, as well.

Thanks for all of your help and testing so far.
Re: [PATCH nf-next v10 7/8] openvswitch: Delay conntrack helper call for new connections.
Thanks for the reviews, Joe! Now we have acks for the patches 3-8, but
not for 1 and 2 that touch netfilter proper. Who could review those?

  Jarno

> On Mar 10, 2016, at 2:01 PM, Joe Stringer wrote:
>
> On 11 March 2016 at 07:54, Jarno Rajahalme wrote:
>> There is no need to help connections that are not confirmed, so we can
>> delay helping new connections to the time when they are confirmed.
>> This change is needed for NAT support, and having this as a separate
>> patch will make the following NAT patch a bit easier to review.
>>
>> Signed-off-by: Jarno Rajahalme
>
> Acked-by: Joe Stringer
[patch net-next] mlxsw: pci: Implement reset done check
From: Jiri Pirko

Firmware now tells us that the reset is done by passing a magic value
via register. Use it to shorten the wait in case this is supported.
With old firmware, we still wait until the timeout is reached.

Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 15 +++++++++++----
 drivers/net/ethernet/mellanox/mlxsw/pci.h |  3 +++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 7992c55..7f4173c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1681,11 +1681,18 @@ static const struct mlxsw_bus mlxsw_pci_bus = {
 
 static int mlxsw_pci_sw_reset(struct mlxsw_pci *mlxsw_pci)
 {
+	unsigned long end;
+
 	mlxsw_pci_write32(mlxsw_pci, SW_RESET, MLXSW_PCI_SW_RESET_RST_BIT);
-	/* Current firware does not let us know when the reset is done.
-	 * So we just wait here for constant time and hope for the best.
-	 */
-	msleep(MLXSW_PCI_SW_RESET_TIMEOUT_MSECS);
+	wmb(); /* reset needs to be written before we read control register */
+	end = jiffies + msecs_to_jiffies(MLXSW_PCI_SW_RESET_TIMEOUT_MSECS);
+	do {
+		u32 val = mlxsw_pci_read32(mlxsw_pci, FW_READY);
+
+		if ((val & MLXSW_PCI_FW_READY_MASK) == MLXSW_PCI_FW_READY_MAGIC)
+			break;
+		cond_resched();
+	} while (time_before(jiffies, end));
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.h b/drivers/net/ethernet/mellanox/mlxsw/pci.h
index 9121060..d942a3e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.h
@@ -61,6 +61,9 @@
 #define MLXSW_PCI_SW_RESET			0xF0010
 #define MLXSW_PCI_SW_RESET_RST_BIT		BIT(0)
 #define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS	5000
+#define MLXSW_PCI_FW_READY			0xA1844
+#define MLXSW_PCI_FW_READY_MASK			0xFF
+#define MLXSW_PCI_FW_READY_MAGIC		0x5E
 
 #define MLXSW_PCI_DOORBELL_SDQ_OFFSET	0x000
 #define MLXSW_PCI_DOORBELL_RDQ_OFFSET	0x200
-- 
2.5.0
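The patch turns a fixed `msleep()` into a bounded poll: read `FW_READY`, compare the masked value with the magic, and stop once the deadline passes. A userspace sketch of the same loop follows. `read_fw_ready()` is a hypothetical stand-in for `mlxsw_pci_read32(..., FW_READY)` that reports ready after a few reads, and `clock_gettime(CLOCK_MONOTONIC)` replaces `jiffies`/`time_before()`. Unlike the patch, which returns 0 either way, the sketch reports whether the magic was actually seen.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define FW_READY_MASK  0xFFu
#define FW_READY_MAGIC 0x5Eu

/* Simulated register: reports the magic after polls_until_ready reads. */
static int polls_until_ready = 3;
static int poll_count;

static uint32_t read_fw_ready(void)
{
	poll_count++;
	return poll_count >= polls_until_ready ? (0xAB00u | FW_READY_MAGIC) : 0;
}

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll until the low byte matches the magic or the timeout expires.
 * Old firmware never writes the magic, so it always runs to the deadline. */
static bool wait_fw_ready(long long timeout_ms)
{
	long long end = now_ms() + timeout_ms;

	do {
		uint32_t val = read_fw_ready();

		if ((val & FW_READY_MASK) == FW_READY_MAGIC)
			return true;
		/* the kernel version calls cond_resched() here */
	} while (now_ms() < end);
	return false;
}
```

Masking before the compare means unrelated upper bits in the register do not hide the ready indication.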
Re: [PATCH nf-next v10 3/8] openvswitch: Add commentary to conntrack.c
On 11 March 2016 at 07:54, Jarno Rajahalme wrote:
> This makes the code easier to understand and the following patches
> more focused.
>
> Signed-off-by: Jarno Rajahalme

Acked-by: Joe Stringer
Re: [PATCH 0/2] sh_eth: fix couple of bugs in sh_eth_ring_format()
On 03/11/2016 12:07 AM, David Miller wrote:

>> Here's a set of 2 patches against DaveM's 'net.git' repo fixing two bugs
>> in sh_eth_.ring_format()...
>>
>> [1/2] sh_eth: fix NULL pointer dereference in sh_eth_ring_format()
>> [2/2] sh_eth: advance 'rxdesc' later in sh_eth_ring_format()
>
> Since Linus is likely to release today or otherwise very soon I'm not
> putting things into 'net'.
>
> So I've applied this series to 'net-next', let me know if I should
> queue it up for stable.

   If your generally queue the error path fixes, then queue these two please.

> Thanks.

   My pleasure. :-)

MBR, Sergei
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Fri, Mar 11, 2016 at 12:19:45AM +0300, Cyrill Gorcunov wrote:
> >
> > Oh yes they do, from masq's non-inet notifier. masq registers two
> > notifiers, one for generic netdev and one for inetdev.
>
> Thanks a huge David! I'll test it just to be sure.

Works like a charm! So David, what are the next steps then?
Mind to gather all your patches into one (maybe)?
Re: [PATCH nf-next v10 7/8] openvswitch: Delay conntrack helper call for new connections.
On 11 March 2016 at 07:54, Jarno Rajahalme wrote:
> There is no need to help connections that are not confirmed, so we can
> delay helping new connections to the time when they are confirmed.
> This change is needed for NAT support, and having this as a separate
> patch will make the following NAT patch a bit easier to review.
>
> Signed-off-by: Jarno Rajahalme

Acked-by: Joe Stringer
Re: [PATCH 1/3] dm9601: enable EP3 interrupt
> "Joseph" == Joseph CHANG writes:

> Enable chip's EP3 interrupt to get the link-up notify soon
> immediately.

Sorry, what do you mean about 'soon immediately'?

> +
> +	/* Always return 8-bytes data to host per interrupt-interval */
> +	dm_write_reg(dev, DM_USB_CTRL, USB_CTRL_EP3ACK);

Why would we want to do that instead of the current setup that afaik
only returns data when the link status changes?

-- 
Bye, Peter Korsgaard
[PATCH] sctp: allow sctp_transmit_packet and others to use gfp
Currently sctp_sendmsg() triggers some calls that will allocate memory with GFP_ATOMIC even when not necessary. In the case of sctp_packet_transmit it will allocate a linear skb that will be used to construct the packet and this may cause sends to fail due to ENOMEM more often than anticipated specially with big MTUs. This patch thus allows it to inherit gfp flags from upper calls so that it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or similar. All others, like retransmits or flushes started from BH, are still allocated using GFP_ATOMIC. In netperf tests this didn't result in any performance drawbacks when memory is not too fragmented and made it trigger ENOMEM way less often. Signed-off-by: Marcelo Ricardo Leitner--- include/net/sctp/sm.h | 2 +- include/net/sctp/structs.h | 10 +++--- net/sctp/associola.c | 2 +- net/sctp/chunk.c | 6 ++-- net/sctp/input.c | 2 +- net/sctp/output.c | 6 ++-- net/sctp/outqueue.c| 30 - net/sctp/sm_make_chunk.c | 80 +++--- net/sctp/sm_sideeffect.c | 23 ++--- 9 files changed, 89 insertions(+), 72 deletions(-) diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h index 487ef34bbd63ff1cfe511c7ee8b1501593a14de3..efc01743b9d641bf6b16a37780ee0df34b4ec698 100644 --- a/include/net/sctp/sm.h +++ b/include/net/sctp/sm.h @@ -201,7 +201,7 @@ struct sctp_chunk *sctp_make_cwr(const struct sctp_association *, struct sctp_chunk * sctp_make_datafrag_empty(struct sctp_association *, const struct sctp_sndrcvinfo *sinfo, int len, const __u8 flags, - __u16 ssn); + __u16 ssn, gfp_t gfp); struct sctp_chunk *sctp_make_ecne(const struct sctp_association *, const __u32); struct sctp_chunk *sctp_make_sack(const struct sctp_association *); diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 205630bb5010b8ac76b84651b302e488fc1c76ff..0b65c16bbc2a837b2fd2aca4aa8cee5686feaf33 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -656,7 +656,7 @@ void sctp_chunk_free(struct sctp_chunk *); void 
*sctp_addto_chunk(struct sctp_chunk *, int len, const void *data); struct sctp_chunk *sctp_chunkify(struct sk_buff *, const struct sctp_association *, -struct sock *); +struct sock *, gfp_t gfp); void sctp_init_addrs(struct sctp_chunk *, union sctp_addr *, union sctp_addr *); const union sctp_addr *sctp_source(const struct sctp_chunk *chunk); @@ -718,10 +718,10 @@ struct sctp_packet *sctp_packet_init(struct sctp_packet *, __u16 sport, __u16 dport); struct sctp_packet *sctp_packet_config(struct sctp_packet *, __u32 vtag, int); sctp_xmit_t sctp_packet_transmit_chunk(struct sctp_packet *, - struct sctp_chunk *, int); + struct sctp_chunk *, int, gfp_t); sctp_xmit_t sctp_packet_append_chunk(struct sctp_packet *, struct sctp_chunk *); -int sctp_packet_transmit(struct sctp_packet *); +int sctp_packet_transmit(struct sctp_packet *, gfp_t); void sctp_packet_free(struct sctp_packet *); static inline int sctp_packet_empty(struct sctp_packet *packet) @@ -1054,7 +1054,7 @@ struct sctp_outq { void sctp_outq_init(struct sctp_association *, struct sctp_outq *); void sctp_outq_teardown(struct sctp_outq *); void sctp_outq_free(struct sctp_outq*); -int sctp_outq_tail(struct sctp_outq *, struct sctp_chunk *chunk); +int sctp_outq_tail(struct sctp_outq *, struct sctp_chunk *chunk, gfp_t); int sctp_outq_sack(struct sctp_outq *, struct sctp_chunk *); int sctp_outq_is_empty(const struct sctp_outq *); void sctp_outq_restart(struct sctp_outq *); @@ -1062,7 +1062,7 @@ void sctp_outq_restart(struct sctp_outq *); void sctp_retransmit(struct sctp_outq *, struct sctp_transport *, sctp_retransmit_reason_t); void sctp_retransmit_mark(struct sctp_outq *, struct sctp_transport *, __u8); -int sctp_outq_uncork(struct sctp_outq *); +int sctp_outq_uncork(struct sctp_outq *, gfp_t gfp); /* Uncork and flush an outqueue. 
*/ static inline void sctp_outq_cork(struct sctp_outq *q) { diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 2bf8ec92dde482ed6ab59275aad492d5abc5385e..24d2f6fffbc52bedbcd4efec82eaf834f0c75613 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -1493,7 +1493,7 @@ void sctp_assoc_rwnd_increase(struct sctp_association *asoc, unsigned int len) asoc->peer.sack_needed = 0; - sctp_outq_tail(&asoc->outqueue, sack); + sctp_outq_tail(&asoc->outqueue, sack,
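The core idea of the patch is to pass the allocation context down the call chain instead of hard-coding `GFP_ATOMIC` at the bottom. A userspace model of that plumbing follows. The enum and every function name are simplified, hypothetical stand-ins (not the kernel's `gfp_t` or the real SCTP functions): a `sendmsg()`-like path passes the sleeping-allowed flag, while a timer/BH-style retransmit path keeps the atomic one.

```c
/* Hypothetical stand-ins for GFP_ATOMIC / GFP_KERNEL. */
typedef enum { SKETCH_GFP_ATOMIC, SKETCH_GFP_KERNEL } sketch_gfp_t;

/* Records the flag the lowest layer was asked to allocate with. */
static sketch_gfp_t last_alloc_gfp;

static int packet_transmit(sketch_gfp_t gfp)
{
	last_alloc_gfp = gfp;	/* where the linear skb would be allocated */
	return 0;
}

static int outq_flush(sketch_gfp_t gfp)
{
	return packet_transmit(gfp);	/* just forwards the caller's flag */
}

/* Process context (e.g. sendmsg()): allowed to sleep. */
static int sendmsg_path(void)
{
	return outq_flush(SKETCH_GFP_KERNEL);
}

/* Softirq/timer context (e.g. a retransmit): must not sleep. */
static int retransmit_timer_path(void)
{
	return outq_flush(SKETCH_GFP_ATOMIC);
}
```

Only the entry points decide the flag; the intermediate layers stay context-agnostic, which is what adding a `gfp_t` parameter to `sctp_outq_tail()`, `sctp_outq_uncork()` and `sctp_packet_transmit()` achieves.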
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
On Thu, Mar 10, 2016 at 1:47 AM, Nicolas Dichtel wrote:
> On 09/03/2016 22:49, Mahesh Bandewar wrote:
>>
>> From: Mahesh Bandewar
>>
>> One of the major request (for enhancement) that I have received
>> from various users of IPvlan in L3 mode is its inability to handle
>> IPtables.
>>
>> While looking at the code and how we handle ingress, the problem
>> can be attributed to the asymmetry in the way packets get processed
>> for IPvlan devices configured in L3 mode. L3 mode is supposed to
>> be restrictive and all the L3 decisions need to be taken for the
>> traffic in master's ns. This does happen as expected for egress
>> traffic however on ingress traffic, the IPvlan packet-handler
>> changes the skb->dev and this forces packet to be processed with
>> the IPvlan slave and it's associated ns. This causes above mentioned
>> problem and few other which are not yet reported / attempted. e.g.
>> IPsec with L3 mode or even ingress routing.
>>
>> This could have been solved if we had a way to handover packet to
>> slave and associated ns after completing the L3 phase. This is a
>> non-trivial issue to fix especially looking at IPsec code.
>>
>> This patch series attempts to solve this problem by introducing the
>> device pointer l3_dev which resides in net_device structure in the
>> RX cache line. We initialize the l3_dev to self. This would mean
>> there is no complex logic to when-and-how-to initialize it. Now
>> the stack will use this dev pointer during the L3 phase. This should
>> not alter any existing properties / behavior and also there should
>> not be any additional penalties since it resides in the same RX
>> cache line.
>
> If I understand correctly (and as Cong already said), information are
> leaking between netns during the input phase. On the tx side,
> skb_scrub_packet() is called, but not on the rx side. I think it's wrong.
> There should be an explicit boundary.

That is not what I am complaining about.
I dislike the trick of switching skb->dev pointer with skb->dev->l3_dev. This is not how we switch netns, nor the way how netns works. Look at veth pair or dev_change_net_namespace(), each time when we switch netns, we need to do a full reregistration or a full reentrance, we never just switch some pointers to switch netns. This is why I said it breaks isolation. Also, it is ugly to hide such a ipvlan-specific pointer for half of the RX code path.
Re: [PATCH net-next V3 00/10] cls_flower hardware offload support
From: Amir Vadai
Date: Tue, 8 Mar 2016 12:42:28 +0200

> Please see changes from V2 at the bottom.
>
> This patchset introduces cls_flower hardware offload support over ConnectX-4
> driver, more hardware vendors are welcome to use it too.
 ...

Series applied, thanks for retaining detailed change history in this
series header posting.
Re: [PATCH V5 0/4] net-next: mediatek: add ethernet driver
From: John Crispin
Date: Tue, 8 Mar 2016 11:29:53 +0100

> This series adds support for the Mediatek ethernet core found on current ARM
> based SoCs. The driver works on MT2701 and MT7623 SoCs
>
> Instead of trying to upstream everything at once I decided to concentrate on
> the important parts required to make current generation silicon work. The V3
> series only includes the code required to make dual MAC setups work and only
> supports the newer QDMA engine.
 ...

Series applied, thanks.
Re: [PATCH] net: dsa: Fix cleanup resources upon module removal
From: Neil Armstrong
Date: Tue, 8 Mar 2016 10:36:20 +0100

> The initial commit badly merged into the dsa_resume method instead
> of the dsa_remove_dst method.
> As consequence, the dst->master_netdev->dsa_ptr is not set to NULL on
> removal and re-bind of the dsa device fails with error -17.
>
> Fixes: b0dc635d923c ("net: dsa: cleanup resources upon module removal ")
> Signed-off-by: Neil Armstrong
> ---
>  net/dsa/dsa.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> David, Florian, Andrew,
>
> This fix is quite urgent since it breaks all the removal cleanup.

Since 'net' is closed, I've applied this to 'net-next' and queue it up
for -stable.

Thanks.
Re: [PATCH net-next] net: dsa: mv88e6xxx: rework port state setter
From: Vivien Didelot
Date: Mon, 7 Mar 2016 18:24:17 -0500

> Apply a few non-functional changes on the port state setter:
>
>  * add a dynamic debug message with state names to track changes
>  * explicit states checking instead of assuming their numeric values
>  * lock mutex only once when changing several port states
>  * use bitmap macros to declare and access port_state_update_mask
>
> Signed-off-by: Vivien Didelot

Applied.
Re: [PATCH net-next] net: dsa: mv88e6xxx: avoid writing the same mode
From: Vivien Didelot
Date: Mon, 7 Mar 2016 18:24:52 -0500

> There is no need to change the 802.1Q port mode for the same value.
> Thus avoid such message:
>
>     [  401.954836] dsa dsa@0 lan0: 802.1Q Mode: Disabled (was Disabled)
>
> Signed-off-by: Vivien Didelot

Applied.
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Thu, Mar 10, 2016 at 04:05:21PM -0500, David Miller wrote:
> >
> > and nobody calls for nf_ct_iterate_cleanup, no?
>
> Oh yes they do, from masq's non-inet notifier. masq registers two
> notifiers, one for generic netdev and one for inetdev.

Thanks a huge David! I'll test it just to be sure.

	Cyrill
Re: [PATCH net-next 1/1] qede: Fix net-next "make ARCH=x86_64"
From: Manish Chopra
Date: Tue, 8 Mar 2016 04:09:44 -0500

> 'commit 55482edc25f0606851de42e73618f813f310d009
> ("qede: Add slowpath/fastpath support and enable hardware GRO")'
> introduces below error when compiling net-next with "make ARCH=x86_64"
>
> drivers/built-in.o: In function `qede_rx_int':
> qede_main.c:(.text+0x6101a0): undefined reference to `tcp_gro_complete'
>
> Signed-off-by: Manish Chopra

Applied, thank you.
Re: [PATCH net] r8169:Remove unnecessary phy reset for pcie nic when setting link spped.
From: Chunhao Lin
Date: Tue, 8 Mar 2016 16:51:05 +0800

> For pcie nic, after setting link speed and thers is no link driver does not need
> to do phy reset untill link up.

"there's", "until"

> For some pcie nics, to do this will also reset phy speed down counter and prevent
> phy from auto speed down.

Please fix these typos and resubmit, thanks.
Re: [net-next] arp: add macro to get drop_gratuitous_arp setting
From: Zhang Shengju
Date: Tue, 8 Mar 2016 07:53:50 +

> Add macro IN_DEV_DROP_GRATUITOUS_ARP to facilitate getting
> drop_gratuitous_arp value.
>
> Signed-off-by: Zhang Shengju

As it's used in one location, I see zero value in this, sorry.

I'm not applying this patch.
Re: [PATCH net v2 0/2] qlcnic fixes
From: Rajesh Borundia
Date: Tue, 8 Mar 2016 02:39:56 -0500

> This series adds following fixes.
>
> o While processing mailbox if driver gets a spurious mailbox
>   interrupt it leads into premature completion of a next
>   mailbox request. Added a guard against this by checking current
>   state of mailbox and ignored spurious interrupt.
>   Added a stats counter to record this condition.
>
> v2:
>
> o Added patch that removes usage of atomic_t as we are not implemeting
>   atomicity by using atomic_t value.
>
> Please apply these fixes to net.

As explained in other list postings, 'net' is basically closed for
this release cycle, so I applied this series to 'net-next'.

Let me know if you'd like me to therefore queue these changes up for
-stable.

Thanks.
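The first fix guards mailbox completion against spurious interrupts. A userspace sketch of that guard follows, assuming a simple two-state mailbox; the enum, struct and function names are hypothetical and not taken from qlcnic. Only an interrupt that arrives while a command is outstanding completes it; anything else just bumps a statistics counter, so it can no longer complete the next request early.

```c
#include <stdbool.h>

enum mbx_state { MBX_IDLE, MBX_WAITING };

struct mbx {
	enum mbx_state state;
	unsigned long spurious_intr;	/* stats counter for ignored interrupts */
	unsigned int completed;
};

/* Issue a mailbox command: a response is now expected. */
static void mbx_post(struct mbx *m)
{
	m->state = MBX_WAITING;
}

/* Interrupt handler: complete only a request that is actually pending. */
static bool mbx_irq(struct mbx *m)
{
	if (m->state != MBX_WAITING) {
		m->spurious_intr++;
		return false;
	}
	m->state = MBX_IDLE;
	m->completed++;
	return true;
}
```

A plain counter is used here, in line with the v2 note that `atomic_t` bought no real atomicity for this statistic.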
Re: [PATCH] include/net/inet_connection_sock.h: Use pr_devel() instead of pr_debug()
From: Nick Wang
Date: Tue, 8 Mar 2016 13:52:28 +0800

> File "inet_connection_sock.h" is a common share header that not can
> be use for one module, so use pr_devel instead of pr_debug is OK.

Not really, we only want these printks to do anything only when debug
printk's are enabled. We don't want the overhead otherwise.

You'll need to find another fix for this, sorry.
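The objection is about cost when debugging is off. A userspace sketch of the usual compile-time pattern, with hypothetical macro names, follows: without `DEBUG`, the call sits under `if (0)`, so the compiler still type-checks the format arguments but emits no call. This mirrors the idea behind the kernel's `no_printk()`. It does not model dynamic debug, which lets `pr_debug()` call sites be switched on at runtime.

```c
#include <stdio.h>

/* Counts messages that were actually printed. */
static int emitted;

#define sketch_emit(...) (emitted++, printf(__VA_ARGS__))

#ifdef DEBUG
#define sketch_debug(...) sketch_emit(__VA_ARGS__)
#else
/* Never executed, but the format and arguments are still type-checked. */
#define sketch_debug(...) \
	do { if (0) sketch_emit(__VA_ARGS__); } while (0)
#endif
```

Keeping the arguments visible to the compiler catches format mismatches in debug-only messages without paying for them in normal builds.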
Re: [PATCH net-next 0/4] cxgb4vf: Interrupt and queue configuration changes
From: Hariprasad Shenai
Date: Tue, 8 Mar 2016 10:50:16 +0530

> This series fixes some issues and some changes in the queue and interrupt
> configuration for cxgb4vf driver. We need to enable interrupts before we
> register our network device, so that we don't loose link up interrupts.
> Allocate rx queues based on interrupt type. Set number of tx/rx queues in
> probe function only. Also adds check for some invalid configurations.
>
> This patch series has been created against net-next tree and includes
> patches on cxgb4vf driver.
>
> We have included all the maintainers of respective drivers. Kindly review
> the change and let us know in case of any review comments.

Series applied, thanks.
Re: [PATCH net-next] net: dsa: mv88e6xxx: read then write PVID
From: Vivien Didelot
Date: Mon, 7 Mar 2016 18:24:39 -0500

> The port register 0x07 contains more options than just the default VID,
> even though they are not used yet. So prefer a read then write operation
> over a direct write.
>
> This also allows to keep track of the change through dynamic debug.
>
> Signed-off-by: Vivien Didelot

Applied.
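A read-then-write preserves the other option bits in port register 0x07. A userspace sketch of that read-modify-write follows. It assumes the default VID occupies bits 11:0 (an assumption for illustration; check the switch datasheet) and uses a hypothetical in-memory register in place of the driver's SMI accessors.

```c
#include <stdint.h>

#define PVID_MASK 0x0fffu	/* assumed: default VID in bits 11:0 */

/* Hypothetical stand-in for port register 0x07. */
static uint16_t port_reg_07;

static uint16_t reg_read(void) { return port_reg_07; }
static void reg_write(uint16_t val) { port_reg_07 = val; }

/* Replace only the VID field, keeping the other option bits intact. */
static void set_pvid(uint16_t pvid)
{
	uint16_t reg = reg_read();

	reg &= ~PVID_MASK;
	reg |= pvid & PVID_MASK;
	reg_write(reg);
}
```

A direct `reg_write(pvid)` would silently clear every non-VID bit, which is what the patch avoids.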
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Thu, Mar 10, 2016 at 11:55 AM, David Miller wrote:
> Indeed, good catch. Therefore:
>
> 1) Keep the masq netdev notifier. That will flush the conntrack table
>    for the inetdev_destroy event.
>
> 2) Make the inetdev notifier only do something if inetdev->dead is
>    false. (ie. we are flushing an individual address)
>
> And then we don't need the NETDEV_UNREGISTER thing at all:

This makes sense to me. I guess similar thing needs to do for IPv6 masq too.

Thanks.
Re: [PATCH 0/2] sh_eth: fix couple of bugs in sh_eth_ring_format()
From: Sergei Shtylyov
Date: Tue, 08 Mar 2016 01:33:38 +0300

> Here's a set of 2 patches against DaveM's 'net.git' repo fixing two bugs
> in sh_eth_.ring_format()...
>
> [1/2] sh_eth: fix NULL pointer dereference in sh_eth_ring_format()
> [2/2] sh_eth: advance 'rxdesc' later in sh_eth_ring_format()

Since Linus is likely to release today or otherwise very soon I'm not
putting things into 'net'.

So I've applied this series to 'net-next', let me know if I should
queue it up for stable.

Thanks.
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
From: Cyrill Gorcunov
Date: Thu, 10 Mar 2016 23:13:51 +0300

> On Thu, Mar 10, 2016 at 03:03:11PM -0500, David Miller wrote: >> From: Cyrill Gorcunov >> Date: Thu, 10 Mar 2016 23:01:34 +0300 >> >> > On Thu, Mar 10, 2016 at 02:55:43PM -0500, David Miller wrote: >> >> > >> >> > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER >> >> > is happening and masq already registers a netdev notifier... >> >> >> >> Indeed, good catch. Therefore: >> >> >> >> 1) Keep the masq netdev notifier. That will flush the conntrack table >> >>for the inetdev_destroy event. >> >> >> >> 2) Make the inetdev notifier only do something if inetdev->dead is >> >>false. (ie. we are flushing an individual address) >> >> >> >> And then we don't need the NETDEV_UNREGISTER thing at all: >> >> >> >> diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c >> >> b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c >> >> index c6eb421..f71841a 100644 >> >> --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c >> >> +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c >> >> @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block >> >> *this, >> >> unsigned long event, >> >> void *ptr) >> >> { >> >> - struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev; >> >> struct netdev_notifier_info info; >> >> + struct in_ifaddr *ifa = ptr; >> >> + struct in_device *idev; >> >> >> >> - netdev_notifier_info_init(&info, dev); >> >> + /* The masq_dev_notifier will catch the case of the device going >> >> + * down. So if the inetdev is dead and being destroyed we have >> >> + * no work to do. Otherwise this is an individual address removal >> >> + * and we have to perform the flush. >> >> + */ >> >> + idev = ifa->ifa_dev; >> >> + if (idev->dead) >> >> + return NOTIFY_DONE; >> >> + >> >> + netdev_notifier_info_init(&info, idev->dev); >> >> return masq_device_event(this, event, &info); >> >> } >> > >> > Guys, I'm lost.
Currently masq_device_event calls for conntrack >> > cleanup with device index, so that once device is going down, the >> > appropriate conntracks gonna be dropped off. Now if device is dead >> > nobody will cleanup the conntracks? >> >> Both notifiers are run in the inetdev_destroy() case. >> >> Maybe that's what you are missing. > > No :) Look, here is what I mean. Previously with your two patches > we've been calling nf-cleanup for every address, so we had to make > code call for cleanup for one time only. Now with the patch above > the code flow is the following > > inetdev_destroy > in_dev->dead = 1; > ... > inet_del_ifa > ... > blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, > ifa1); > ... > masq_inet_event >... > masq_device_event > if (idev->dead) > return NOTIFY_DONE; > > and nobody calls for nf_ct_iterate_cleanup, no? Oh yes they do, from masq's non-inet notifier. masq registers two notifiers, one for generic netdev and one for inetdev.
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Thu, Mar 10, 2016 at 11:13:51PM +0300, Cyrill Gorcunov wrote:
> >
> > Both notifiers are run in the inetdev_destroy() case.
> >
> > Maybe that's what you are missing.
>
> No :) Look, here is what I mean. Previously with your two patches
> we've been calling nf-cleanup for every address, so we had to make
> code call for cleanup for one time only. Now with the patch above
> the code flow is the following

Ah, I'm idiot, drop the question.
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Thu, Mar 10, 2016 at 03:03:11PM -0500, David Miller wrote: > From: Cyrill Gorcunov > Date: Thu, 10 Mar 2016 23:01:34 +0300 > > > On Thu, Mar 10, 2016 at 02:55:43PM -0500, David Miller wrote: > >> > > >> > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER > >> > is happening and masq already registers a netdev notifier... > >> > >> Indeed, good catch. Therefore: > >> > >> 1) Keep the masq netdev notifier. That will flush the conntrack table > >>for the inetdev_destroy event. > >> > >> 2) Make the inetdev notifier only do something if inetdev->dead is > >>false. (ie. we are flushing an individual address) > >> > >> And then we don't need the NETDEV_UNREGISTER thing at all: > >> > >> diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > >> b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > >> index c6eb421..f71841a 100644 > >> --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > >> +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > >> @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block > >> *this, > >> unsigned long event, > >> void *ptr) > >> { > >> - struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev; > >> struct netdev_notifier_info info; > >> + struct in_ifaddr *ifa = ptr; > >> + struct in_device *idev; > >> > >> - netdev_notifier_info_init(&info, dev); > >> + /* The masq_dev_notifier will catch the case of the device going > >> + * down. So if the inetdev is dead and being destroyed we have > >> + * no work to do. Otherwise this is an individual address removal > >> + * and we have to perform the flush. > >> + */ > >> + idev = ifa->ifa_dev; > >> + if (idev->dead) > >> + return NOTIFY_DONE; > >> + > >> + netdev_notifier_info_init(&info, idev->dev); > >> return masq_device_event(this, event, &info); > >> } > > > > Guys, I'm lost. Currently masq_device_event calls for conntrack > > cleanup with device index, so that once device is going down, the > > appropriate conntracks gonna be dropped off.
Now if device is dead > > nobody will cleanup the conntracks? > > Both notifiers are run in the inetdev_destroy() case. > > Maybe that's what you are missing. No :) Look, here is what I mean. Previously with your two patches we've been calling nf-cleanup for every address, so we had to make code call for cleanup for one time only. Now with the patch above the code flow is the following inetdev_destroy in_dev->dead = 1; ... inet_del_ifa ... blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1); ... masq_inet_event ... masq_device_event if (idev->dead) return NOTIFY_DONE; and nobody calls for nf_ct_iterate_cleanup, no?
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
From: Cyrill Gorcunov
Date: Thu, 10 Mar 2016 23:01:34 +0300

> On Thu, Mar 10, 2016 at 02:55:43PM -0500, David Miller wrote: >> > >> > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER >> > is happening and masq already registers a netdev notifier... >> >> Indeed, good catch. Therefore: >> >> 1) Keep the masq netdev notifier. That will flush the conntrack table >>for the inetdev_destroy event. >> >> 2) Make the inetdev notifier only do something if inetdev->dead is >>false. (ie. we are flushing an individual address) >> >> And then we don't need the NETDEV_UNREGISTER thing at all: >> >> diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c >> b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c >> index c6eb421..f71841a 100644 >> --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c >> +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c >> @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block *this, >> unsigned long event, >> void *ptr) >> { >> -struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev; >> struct netdev_notifier_info info; >> +struct in_ifaddr *ifa = ptr; >> +struct in_device *idev; >> >> -netdev_notifier_info_init(&info, dev); >> +/* The masq_dev_notifier will catch the case of the device going >> + * down. So if the inetdev is dead and being destroyed we have >> + * no work to do. Otherwise this is an individual address removal >> + * and we have to perform the flush. >> + */ >> +idev = ifa->ifa_dev; >> +if (idev->dead) >> +return NOTIFY_DONE; >> + >> +netdev_notifier_info_init(&info, idev->dev); >> return masq_device_event(this, event, &info); >> } > > Guys, I'm lost. Currently masq_device_event calls for conntrack > cleanup with device index, so that once device is going down, the > appropriate conntracks gonna be dropped off. Now if device is dead > nobody will cleanup the conntracks?

Both notifiers are run in the inetdev_destroy() case.

Maybe that's what you are missing.
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Thu, Mar 10, 2016 at 02:55:43PM -0500, David Miller wrote: > > > > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER > > is happening and masq already registers a netdev notifier... > > Indeed, good catch. Therefore: > > 1) Keep the masq netdev notifier. That will flush the conntrack table >for the inetdev_destroy event. > > 2) Make the inetdev notifier only do something if inetdev->dead is >false. (ie. we are flushing an individual address) > > And then we don't need the NETDEV_UNREGISTER thing at all: > > diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > index c6eb421..f71841a 100644 > --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block *this, > unsigned long event, > void *ptr) > { > - struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev; > struct netdev_notifier_info info; > + struct in_ifaddr *ifa = ptr; > + struct in_device *idev; > > - netdev_notifier_info_init(&info, dev); > + /* The masq_dev_notifier will catch the case of the device going > + * down. So if the inetdev is dead and being destroyed we have > + * no work to do. Otherwise this is an individual address removal > + * and we have to perform the flush. > + */ > + idev = ifa->ifa_dev; > + if (idev->dead) > + return NOTIFY_DONE; > + > + netdev_notifier_info_init(&info, idev->dev); > return masq_device_event(this, event, &info); > }

Guys, I'm lost. Currently masq_device_event calls for conntrack cleanup with device index, so that once device is going down, the appropriate conntracks gonna be dropped off. Now if device is dead nobody will cleanup the conntracks?

	Cyrill
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
From: Cong Wang Date: Thu, 10 Mar 2016 11:02:28 -0800 > On Thu, Mar 10, 2016 at 10:01 AM, David Miller wrote: >> I'm tempted to say that we should provide these notifier handlers with >> the information they need, explicitly, to handle this case. >> >> Most inetdev notifiers actually want to know the individual addresses >> that get removed, one by one. That's handled by the existing >> NETDEV_DOWN event and the ifa we pass to that. >> >> But some, like this netfilter masq case, would be satisfied with a >> single event that tells them the whole inetdev instance is being torn >> down. Which is the case we care about here. >> >> We currently don't use NETDEV_UNREGISTER for inetdev notifiers, so >> maybe we could use that. >> >> And that is consistent with the core netdev notifier that triggers >> this call chain in the first place. >> >> Roughly, something like this: >> >> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c >> index 8c3df2c..6eee5cb 100644 >> --- a/net/ipv4/devinet.c >> +++ b/net/ipv4/devinet.c >> @@ -292,6 +292,11 @@ static void inetdev_destroy(struct in_device *in_dev) >> >> in_dev->dead = 1; >> >> + if (in_dev->ifa_list) >> + blocking_notifier_call_chain(&inetaddr_chain, >> + NETDEV_UNREGISTER, >> + in_dev->ifa_list); >> + >> ip_mc_destroy_dev(in_dev); > > > Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER > is happening and masq already registers a netdev notifier... Indeed, good catch. Therefore: 1) Keep the masq netdev notifier. That will flush the conntrack table for the inetdev_destroy event. 2) Make the inetdev notifier only do something if inetdev->dead is false (i.e. we are flushing an individual address). And then we don't need the NETDEV_UNREGISTER thing at all: diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c index c6eb421..f71841a 100644 --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c @@ -108,10 +108,20 @@ static int masq_inet_event(struct notifier_block *this, unsigned long event, void *ptr) { - struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev; struct netdev_notifier_info info; + struct in_ifaddr *ifa = ptr; + struct in_device *idev; - netdev_notifier_info_init(&info, dev); + /* The masq_dev_notifier will catch the case of the device going + * down. So if the inetdev is dead and being destroyed we have + * no work to do. Otherwise this is an individual address removal + * and we have to perform the flush. + */ + idev = ifa->ifa_dev; + if (idev->dead) + return NOTIFY_DONE; + + netdev_notifier_info_init(&info, idev->dev); return masq_device_event(this, event, &info); }
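The early return this patch adds can be modeled as follows: when the in_device is already marked dead, the inetaddr handler does nothing and leaves the flush to the netdev notifier; otherwise it flushes for the single address being removed. The types and the flush counter are hypothetical stand-ins; only the NOTIFY_DONE value (0) mirrors include/linux/notifier.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors NOTIFY_DONE from include/linux/notifier.h, redefined locally. */
#define MOCK_NOTIFY_DONE 0x0000

struct mock_in_device {
	bool dead;	/* set by inetdev_destroy() before addresses go away */
	int ifindex;
};

struct mock_in_ifaddr {
	struct mock_in_device *ifa_dev;
};

static int flushes;	/* counts simulated per-address flushes */

static int mock_masq_inet_event(const struct mock_in_ifaddr *ifa)
{
	/* Whole device being destroyed: the netdev notifier handles it. */
	if (ifa->ifa_dev->dead)
		return MOCK_NOTIFY_DONE;

	/* Individual address removal: flush for this device. */
	flushes++;
	return MOCK_NOTIFY_DONE;
}
```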
Re: [PATCH] kcm: mark helper functions inline
From: Arnd Bergmann Date: Thu, 10 Mar 2016 19:31:12 +0100 > The stub helper functions for the newly added kcm_proc_init/exit interfaces > are defined as 'static' in a header file, which leads to build warnings for > each file that includes them without calling them: > > include/net/kcm.h:183:12: error: 'kcm_proc_init' defined but not used > [-Werror=unused-function] > include/net/kcm.h:184:13: error: 'kcm_proc_exit' defined but not used > [-Werror=unused-function] > > This marks the two functions as 'static inline' instead, which avoids the > warnings and is obviously what was meant here. > > Signed-off-by: Arnd Bergmann > Fixes: cd6e111bf5be ("kcm: Add statistics and proc interfaces") Applied, thanks Arnd.
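The pattern the fix applies, as a standalone sketch. It assumes the real declarations in include/net/kcm.h are guarded by CONFIG_PROC_FS; here the option is forced off so that the stub branch is the one compiled. Unused 'static inline' functions do not trigger -Wunused-function, which plain 'static' ones do.

```c
#include <assert.h>

/* Assumption: the guard is CONFIG_PROC_FS; forced off for this sketch. */
#undef CONFIG_PROC_FS

#ifdef CONFIG_PROC_FS
int kcm_proc_init(void);
void kcm_proc_exit(void);
#else
/* 'static inline' stubs: no warning in files that include the header
 * but never call them. */
static inline int kcm_proc_init(void) { return 0; }
static inline void kcm_proc_exit(void) { }
#endif
```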
Re: [PATCH 1/2] net: thunderx: Set receive buffer page usage count in bulk
From: Sunil Kovvuri Date: Thu, 10 Mar 2016 23:57:48 +0530 > The difference between the NIU driver and this patch is that there, it calculates the split count, increments the page count and then divides the page into buffers. Here, it divides the page into buffers, keeps a counter that increments at every split, and at the end does an atomic increment of page->_count. > > Any issue with this approach? I guess not.
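The two refcounting strategies Sunil describes, modeled with C11 atomics. struct mock_page and its count field are hypothetical stand-ins for struct page and page->_count; the only difference between the two functions is how many atomic read-modify-write operations they issue.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical page with an atomic reference count (not struct page). */
struct mock_page {
	atomic_int count;
};

/* Per-split approach: one atomic RMW for every buffer carved out. */
static void split_per_buffer(struct mock_page *pg, int nbufs)
{
	for (int i = 0; i < nbufs; i++)
		atomic_fetch_add(&pg->count, 1);
}

/* Bulk approach: count splits in a plain local, then one atomic add. */
static void split_bulk(struct mock_page *pg, int nbufs)
{
	int splits = 0;

	for (int i = 0; i < nbufs; i++)
		splits++;	/* buffer i would be carved out here */
	if (splits)
		atomic_fetch_add(&pg->count, splits);
}
```

Both variants end with the same count. One thing worth checking in the real driver: with the bulk variant, no buffer carved from the page should reach a consumer that may drop its reference before the final atomic add, or the count could hit zero early.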
Re: Micrel Phy - Is there a way to configure the Phy not to do 802.3x flow control?
On 03/10/2016 01:05 PM, Florian Fainelli wrote: > On 10/03/16 08:48, Murali Karicheri wrote: >> On 03/03/2016 07:16 PM, Florian Fainelli wrote: >>> On 03/03/16 14:18, Murali Karicheri wrote: Hi, We are using a Micrel PHY on one of our boards and wonder whether we can force the PHY to disable flow control at start. I have a 1G Ethernet switch connected to the PHY, and the PHY always enables flow control. I would like to configure the PHY not to use flow control. Is that possible, and if so, what should I do in my Ethernet driver to tell the PHY not to enable flow control? >>> >>> The PHY is not doing flow control per se; your pseudo Ethernet MAC in >>> the switch is doing it, along with the link partner advertising support for >>> it. You would want to make sure that your PHY device interface (provided >>> that you are using the PHY library) is not starting with Pause >>> advertised, but it could be supported. >> >> Understood that the PHY just advertises FC. The Micrel 9031 PHY advertises >> FC support by default. After negotiation, I see that phylib provides the >> link status with parameters pause = 1, asym_pause = 1. How do I tell the PHY >> not to advertise it? >> >> I call the following sequence in the Ethernet driver: >> >> of_phy_connect(x,y,hndlr,a,z); > > Here you should be able to change phydev->advertising and > phydev->supported to mask the ADVERTISED_Pause | ADVERTISED_Asym_Pause > bits and have phy_start() restart with that, which should disable pause > and asym_pause as seen by your adjust_link handler. > OK. Good point. I will try this. Thanks for your suggestion. Murali >> phy_start() >> >> Now in hndlr() I have pause = 1, asym_pause = 1 in the phy_device ptr. How can >> I tell the PHY not to advertise it initially? -- Murali Karicheri Linux Kernel, Keystone
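Florian's suggestion, sketched as a userspace model: clear the two pause bits in both masks. The struct is a hypothetical stand-in for the 2016-era struct phy_device, whose supported and advertising fields were plain u32 masks; the bit values match ADVERTISED_Pause and ADVERTISED_Asym_Pause from the ethtool uapi header and are only defined locally if that header is absent.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as in <linux/ethtool.h>. */
#ifndef ADVERTISED_Pause
#define ADVERTISED_Pause	(1u << 13)
#endif
#ifndef ADVERTISED_Asym_Pause
#define ADVERTISED_Asym_Pause	(1u << 14)
#endif

/* Hypothetical subset of struct phy_device with u32 link-mode masks. */
struct mock_phy_device {
	uint32_t supported;
	uint32_t advertising;
};

/* Stop offering 802.3x pause (symmetric and asymmetric) in autoneg. */
static void mock_disable_pause(struct mock_phy_device *phydev)
{
	const uint32_t pause = ADVERTISED_Pause | ADVERTISED_Asym_Pause;

	phydev->supported &= ~pause;
	phydev->advertising &= ~pause;
}
```

In a driver this would run between of_phy_connect() and phy_start(), as Florian suggests, so that the first autonegotiation already excludes pause.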
Re: net: use-after-free in recvmmsg
On Thu, Mar 10, 2016 at 07:35:57PM +0100, Dmitry Vyukov wrote: > On Tue, Jan 26, 2016 at 8:30 PM, Arnaldo Carvalho de Melo wrote: > > On Tue, Jan 26, 2016 at 08:27:48PM +0100, Dmitry Vyukov wrote: > >> On Fri, Jan 22, 2016 at 10:16 PM, Arnaldo Carvalho de Melo > >> wrote: > >> > On Fri, Jan 22, 2016 at 09:39:53PM +0100, Dmitry Vyukov wrote: > >> >> I am on commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2 (Jan 20). > >> >> It seems to have been added in commit a2e2725541fad72416326798c2d7fa4dafb7d337 > >> >> (Oct 2009). > >> > > >> > Maybe this helps? Compile testing now... > >> > >> > >> I don't have a reliable reproducer, so I can't test it per se. > >> I will integrate this patch tomorrow and restart the fuzzer with it. > > > > Thanks a lot! > > Hi Arnaldo, > > I have been running with that patch since then and have not seen the bug. > Please mail it as a proper patch. Thanks, and I'll add a: Reported-and-Tested-by: Dmitry Vyukov OK? - Arnaldo
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Thu, Mar 10, 2016 at 10:01 AM, David Miller wrote: > I'm tempted to say that we should provide these notifier handlers with > the information they need, explicitly, to handle this case. > > Most inetdev notifiers actually want to know the individual addresses > that get removed, one by one. That's handled by the existing > NETDEV_DOWN event and the ifa we pass to that. > > But some, like this netfilter masq case, would be satisfied with a > single event that tells them the whole inetdev instance is being torn > down. Which is the case we care about here. > > We currently don't use NETDEV_UNREGISTER for inetdev notifiers, so > maybe we could use that. > > And that is consistent with the core netdev notifier that triggers > this call chain in the first place. > > Roughly, something like this: > > diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c > index 8c3df2c..6eee5cb 100644 > --- a/net/ipv4/devinet.c > +++ b/net/ipv4/devinet.c > @@ -292,6 +292,11 @@ static void inetdev_destroy(struct in_device *in_dev) > > in_dev->dead = 1; > > + if (in_dev->ifa_list) > + blocking_notifier_call_chain(&inetaddr_chain, > + NETDEV_UNREGISTER, > + in_dev->ifa_list); > + > ip_mc_destroy_dev(in_dev); Hmm, but inetdev_destroy() is only called when NETDEV_UNREGISTER is happening and masq already registers a netdev notifier...
> > while ((ifa = in_dev->ifa_list) != NULL) { > diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > index c6eb421..1bb8026 100644 > --- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > +++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c > @@ -111,6 +111,10 @@ static int masq_inet_event(struct notifier_block *this, > struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev; > struct netdev_notifier_info info; > > + if (event != NETDEV_UNREGISTER) > + return NOTIFY_DONE; > + event = NETDEV_DOWN; > + > netdev_notifier_info_init(&info, dev); > return masq_device_event(this, event, &info); > } If masq really doesn't care about inetdev destroy or inetaddr removal, we should just remove its inetaddr notifier.
Re: [RFC/RFT] mac80211: implement fq_codel for software queuing
>> regular fq_codel uses 1024 and there has not been much reason to >> change it. In the case of an AP which has more limited memory, 256 or >> 1024 would be a good setting, per station. I'd stick to 1024 for now. > > Do note that the 4096 is shared _across_ station-tid queues. It is not > per-station. If you have 10 stations you still have 4096 flows > (actually 4096 + 16*10, because each tid - and there are 16 - has its > own fallback flow in case of hash collision on the global flowmap to > maintain per-sta-tid queuing). I have to admit I didn't parse this well - still haven't; I think I need to draw it. (Got a picture?) Where is this part happening in the code (or firmware?) " because each tid - and there are 16 - has its own fallback flow in case of hash collision on the global flowmap to maintain per-sta-tid queuing" "fallback flow - hash collision on global flowmap" - huh? > With that in mind do you still think 1024 is enough? I can't answer that question without understanding what you said above. I assembled a few of the patches to date (your fq_codel patch, Avery's and Tim's ath9k stuff) and tested them, to no measurable effect, against Linus's tree a day or two back. I also acquired an ath10k card - would one of these suit? http://www.amazon.com/gp/product/B011SIMFR8?psc=1=true_=oh_aui_detailpage_o08_s00
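A rough model of what Michal describes, as one possible reading: a single global flow table shared by all station-TID queues, plus one fallback flow per station-TID that is used when the hashed slot is already owned by a different queue, so packets never mix across station-TID queues. All names are hypothetical, and the model has no dequeue path (a real implementation would release a slot once its backlog drains).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_FQ_FLOWS 4096	/* global flow map shared by all stations */

struct mock_tin;

struct mock_flow {
	struct mock_tin *tin;	/* station-TID queue currently owning it */
	unsigned int backlog;	/* queued packets */
};

/* One per station-TID; each has its own fallback flow. */
struct mock_tin {
	struct mock_flow default_flow;
};

static struct mock_flow mock_flows[MOCK_FQ_FLOWS];

/* Pick the flow for a packet with flow hash 'hash' on queue 'tin'. */
static struct mock_flow *mock_classify(struct mock_tin *tin, uint32_t hash)
{
	struct mock_flow *flow = &mock_flows[hash % MOCK_FQ_FLOWS];

	/* Slot owned by another station-TID: use this queue's fallback. */
	if (flow->tin && flow->tin != tin)
		flow = &tin->default_flow;
	return flow;
}

static void mock_enqueue(struct mock_tin *tin, uint32_t hash)
{
	struct mock_flow *flow = mock_classify(tin, hash);

	flow->tin = tin;
	flow->backlog++;
}
```

With 10 stations and 16 TIDs each, this layout gives 4096 shared flows plus 160 fallback flows, which matches the 4096 + 16*10 count quoted above.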
Re: [RFC -next 2/2] virtio_net: Read and use the advised MTU
Hello. On 03/10/2016 05:28 PM, Aaron Conole wrote: This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it exists, read the advised MTU and use it. No proper error handling is provided for the case where a user changes the negotiated MTU. A future commit will add proper error handling. Instead, a warning is emitted if the guest changes the device MTU after previously being given advice. Signed-off-by: Aaron Conole --- drivers/net/virtio_net.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 767ab11..7175563 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c [...] @@ -1390,8 +1391,12 @@ static const struct ethtool_ops virtnet_ethtool_ops = { static int virtnet_change_mtu(struct net_device *dev, int new_mtu) { + struct virtnet_info *vi = netdev_priv(dev); if (new_mtu < MIN_MTU || new_mtu > MAX_MTU) return -EINVAL; + if (vi->negotiated_mtu == true) { + pr_warn("changing mtu from negotiated mtu."); + } {} not needed, see Documentation/CodingStyle. [...] MBR, Sergei
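Sergei's coding-style point applied to the hunk, as a userspace sketch. The bounds are illustrative values, not necessarily the driver's MIN_MTU/MAX_MTU, fprintf() stands in for pr_warn(), and the redundant '== true' comparison on a bool is dropped as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative bounds only. */
#define MOCK_MIN_MTU 68
#define MOCK_MAX_MTU 65535
#define MOCK_EINVAL 22

/* Hypothetical subset of struct virtnet_info. */
struct mock_virtnet_info {
	bool negotiated_mtu;
};

static int mock_change_mtu(const struct mock_virtnet_info *vi, int new_mtu)
{
	if (new_mtu < MOCK_MIN_MTU || new_mtu > MOCK_MAX_MTU)
		return -MOCK_EINVAL;
	/* Single-statement body: no braces needed. */
	if (vi->negotiated_mtu)
		fprintf(stderr, "changing mtu from negotiated mtu\n");
	return 0;
}
```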
Re: [PATCH nf-next v9 8/8] openvswitch: Interface with NAT.
Thanks for the reviews, Joe! Comments below. > On Mar 9, 2016, at 7:47 PM, Joe Stringer wrote: > > Hi Jarno, > > Thanks for working on this. Mostly just a few style things around #ifdefs > below. > > On 9 March 2016 at 15:10, Jarno Rajahalme wrote: >> Extend OVS conntrack interface to cover NAT. New nested >> OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action. >> A bare OVS_CT_ATTR_NAT only mangles existing and expected connections. >> If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested >> attributes, new (non-committed/non-confirmed) connections are mangled >> according to the rest of the nested attributes. >> >> The corresponding OVS userspace patch series includes test cases (in >> tests/system-traffic.at) that also serve as example uses. >> >> This work extends on a branch by Thomas Graf at >> https://github.com/tgraf/ovs/tree/nat. > > Thomas, I guess there was no signoff in these patches, so Jarno does > not have your signoff in this patch. > >> Signed-off-by: Jarno Rajahalme >> --- >> v9: Fixed module dependencies. >> >> include/uapi/linux/openvswitch.h | 49 >> net/openvswitch/Kconfig | 3 +- >> net/openvswitch/conntrack.c | 523 >> +-- >> net/openvswitch/conntrack.h | 3 +- >> 4 files changed, 551 insertions(+), 27 deletions(-) > > > >> diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig >> index cd5fd9d..23471a4 100644 >> --- a/net/openvswitch/Kconfig >> +++ b/net/openvswitch/Kconfig >> @@ -6,7 +6,8 @@ config OPENVSWITCH >>tristate "Open vSwitch" >>depends on INET >>depends on !NF_CONNTRACK || \ >> - (NF_CONNTRACK && (!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6)) >> + (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \ >> +(!NF_NAT || NF_NAT))) > > Whitespace. > Fixed.
>>select LIBCRC32C >>select MPLS >>select NET_MPLS_GSO >> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c >> index 5711f80..6455237 100644 >> --- a/net/openvswitch/conntrack.c >> +++ b/net/openvswitch/conntrack.c > > > >> struct ovs_ct_len_tbl { >> - size_t maxlen; >> - size_t minlen; >> + int maxlen; >> + int minlen; >> }; > > Are these changed for a specific reason, or just to use INT_MAX rather > than SIZE_MAX in ovs_ct_len_tbl? > ‘maxlen’ and ‘minlen’ are compared against the values returned by nla_len(), which returns an int: net/netlink.h:static inline int nla_len(const struct nlattr *nla) so I figured it is better to have these as ints, too. >> /* Metadata mark for masked write to conntrack mark */ >> @@ -42,15 +52,29 @@ struct md_labels { >>struct ovs_key_ct_labels mask; >> }; >> >> +#ifdef CONFIG_NF_NAT_NEEDED >> +enum ovs_ct_nat { >> + OVS_CT_NAT = 1 << 0, /* NAT for committed connections only. */ >> + OVS_CT_SRC_NAT = 1 << 1, /* Source NAT for NEW connections. */ >> + OVS_CT_DST_NAT = 1 << 2, /* Destination NAT for NEW connections. */ >> +}; >> +#endif > > Here... > >> /* Conntrack action context for execution. */ >> struct ovs_conntrack_info { >>struct nf_conntrack_helper *helper; >>struct nf_conntrack_zone zone; >>struct nf_conn *ct; >>u8 commit : 1; >> +#ifdef CONFIG_NF_NAT_NEEDED >> + u8 nat : 3; /* enum ovs_ct_nat */ >> +#endif > > and here.. I wonder if we can trim more of these #ifdefs, for > readability and more compiler coverage if the feature is disabled. > Trimmed this and other #ifdefs as you suggested, and it still compiles when NAT is disabled. Just posted the v10, which I hope will be the final version :-) Jarno
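Why matching nla_len()'s int return type matters: with size_t bounds, a negative length (which could only come from a malformed attribute shorter than its header) is converted to a huge unsigned value before the comparison, and an upper bound of SIZE_MAX then accepts it. The two checker functions are hypothetical and only illustrate the conversion.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* size_t bounds: 'len' is converted to size_t before comparing. */
static bool len_ok_size_t(int len, size_t minlen, size_t maxlen)
{
	return (size_t)len >= minlen && (size_t)len <= maxlen;
}

/* int bounds, matching nla_len(): the comparison stays signed. */
static bool len_ok_int(int len, int minlen, int maxlen)
{
	return len >= minlen && len <= maxlen;
}
```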
[PATCH nf-next v10 6/8] openvswitch: Handle NF_REPEAT in conntrack action.
Repeat the nf_conntrack_in() call when it returns NF_REPEAT. This avoids dropping a SYN packet that re-opens an existing TCP connection. Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- net/openvswitch/conntrack.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index ae36fe2..85256b3 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -485,6 +485,7 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, */ if (!skb_nfct_cached(net, key, info, skb)) { struct nf_conn *tmpl = info->ct; + int err; /* Associate skb with specified zone. */ if (tmpl) { @@ -495,8 +496,13 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, skb->nfctinfo = IP_CT_NEW; } - if (nf_conntrack_in(net, info->family, NF_INET_PRE_ROUTING, - skb) != NF_ACCEPT) + /* Repeat if requested, see nf_iterate(). */ + do { + err = nf_conntrack_in(net, info->family, + NF_INET_PRE_ROUTING, skb); + } while (err == NF_REPEAT); + + if (err != NF_ACCEPT) return -ENOENT; ovs_ct_update_key(skb, info, key, true); -- 2.1.4
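The retry loop in isolation. The verdict values match the uapi <linux/netfilter.h> definitions (NF_DROP 0, NF_ACCEPT 1, NF_REPEAT 4) but are redefined locally, and mock_conntrack_in() is a hypothetical stand-in that returns NF_REPEAT a configurable number of times before accepting, as in the SYN re-open case described above.

```c
#include <assert.h>
#include <errno.h>

#define MOCK_NF_DROP   0
#define MOCK_NF_ACCEPT 1
#define MOCK_NF_REPEAT 4

static int calls;		/* how many times the hook ran */
static int repeats_left;	/* NF_REPEATs to return before accepting */

/* Hypothetical stand-in for nf_conntrack_in(). */
static int mock_conntrack_in(void)
{
	calls++;
	if (repeats_left > 0) {
		repeats_left--;
		return MOCK_NF_REPEAT;
	}
	return MOCK_NF_ACCEPT;
}

static int mock_ct_lookup(void)
{
	int err;

	/* Repeat if requested, as in the patch's do/while loop. */
	do {
		err = mock_conntrack_in();
	} while (err == MOCK_NF_REPEAT);

	return err == MOCK_NF_ACCEPT ? 0 : -ENOENT;
}
```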
[PATCH nf-next v10 7/8] openvswitch: Delay conntrack helper call for new connections.
There is no need to help connections that are not confirmed, so we can delay helping new connections to the time when they are confirmed. This change is needed for NAT support, and having this as a separate patch will make the following NAT patch a bit easier to review. Signed-off-by: Jarno Rajahalme --- net/openvswitch/conntrack.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 85256b3..f718b72 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -483,7 +483,11 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, * actually run the packet through conntrack twice unless it's for a * different zone. */ - if (!skb_nfct_cached(net, key, info, skb)) { + bool cached = skb_nfct_cached(net, key, info, skb); + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + + if (!cached) { struct nf_conn *tmpl = info->ct; int err; @@ -506,11 +510,18 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, return -ENOENT; ovs_ct_update_key(skb, info, key, true); + } - if (ovs_ct_helper(skb, info->family) != NF_ACCEPT) { - WARN_ONCE(1, "helper rejected packet"); - return -EINVAL; - } + /* Call the helper only if: +* - nf_conntrack_in() was executed above ("!cached") for a confirmed +* connection, or +* - When committing an unconfirmed connection. +*/ + ct = nf_ct_get(skb, &ctinfo); + if (ct && (nf_ct_is_confirmed(ct) ? !cached : info->commit) && + ovs_ct_helper(skb, info->family) != NF_ACCEPT) { + WARN_ONCE(1, "helper rejected packet"); + return -EINVAL; } return 0; -- 2.1.4
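The helper-call condition from the hunk, restated as a small predicate so that each case can be checked; the function is a hypothetical restatement, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors: ct && (nf_ct_is_confirmed(ct) ? !cached : info->commit) */
static bool should_call_helper(bool have_ct, bool confirmed, bool cached,
			       bool commit)
{
	if (!have_ct)
		return false;
	/* Confirmed: help only after a fresh nf_conntrack_in() run.
	 * Unconfirmed: help only when this action commits it. */
	return confirmed ? !cached : commit;
}
```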
[PATCH nf-next v10 2/8] netfilter: Allow calling into nat helper without skb_dst.
NAT checksum recalculation code assumes the existence of skb_dst, which becomes a problem for a later patch in the series ("openvswitch: Interface with NAT."). Simplify this by removing the check on skb_dst, as the checksum will be dealt with later in the stack. Suggested-by: Pravin Shelar Signed-off-by: Jarno Rajahalme --- net/ipv4/netfilter/nf_nat_l3proto_ipv4.c | 30 -- net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 30 -- 2 files changed, 16 insertions(+), 44 deletions(-) diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c index 61c7cc2..f8aad03 100644 --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c @@ -127,29 +127,15 @@ static void nf_nat_ipv4_csum_recalc(struct sk_buff *skb, u8 proto, void *data, __sum16 *check, int datalen, int oldlen) { - const struct iphdr *iph = ip_hdr(skb); - struct rtable *rt = skb_rtable(skb); - if (skb->ip_summed != CHECKSUM_PARTIAL) { - if (!(rt->rt_flags & RTCF_LOCAL) && - (!skb->dev || skb->dev->features & -(NETIF_F_IP_CSUM | NETIF_F_HW_CSUM))) { - skb->ip_summed = CHECKSUM_PARTIAL; - skb->csum_start = skb_headroom(skb) + - skb_network_offset(skb) + - ip_hdrlen(skb); - skb->csum_offset = (void *)check - data; - *check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, - datalen, proto, 0); - } else { - *check = 0; - *check = csum_tcpudp_magic(iph->saddr, iph->daddr, - datalen, proto, - csum_partial(data, datalen, - 0)); - if (proto == IPPROTO_UDP && !*check) - *check = CSUM_MANGLED_0; - } + const struct iphdr *iph = ip_hdr(skb); + + skb->ip_summed = CHECKSUM_PARTIAL; + skb->csum_start = skb_headroom(skb) + skb_network_offset(skb) + + ip_hdrlen(skb); + skb->csum_offset = (void *)check - data; + *check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, datalen, + proto, 0); } else inet_proto_csum_replace2(check, skb, htons(oldlen), htons(datalen), true); diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c index
6ce3099..e0be97e 100644 --- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c +++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c @@ -131,29 +131,15 @@ static void nf_nat_ipv6_csum_recalc(struct sk_buff *skb, u8 proto, void *data, __sum16 *check, int datalen, int oldlen) { - const struct ipv6hdr *ipv6h = ipv6_hdr(skb); - struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); - if (skb->ip_summed != CHECKSUM_PARTIAL) { - if (!(rt->rt6i_flags & RTF_LOCAL) && - (!skb->dev || skb->dev->features & -(NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))) { - skb->ip_summed = CHECKSUM_PARTIAL; - skb->csum_start = skb_headroom(skb) + - skb_network_offset(skb) + - (data - (void *)skb->data); - skb->csum_offset = (void *)check - data; - *check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, - datalen, proto, 0); - } else { - *check = 0; - *check = csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, -datalen, proto, -csum_partial(data, datalen, - 0)); - if (proto == IPPROTO_UDP && !*check) - *check = CSUM_MANGLED_0; - } + const struct ipv6hdr *ipv6h = ipv6_hdr(skb); + + skb->ip_summed = CHECKSUM_PARTIAL; + skb->csum_start = skb_headroom(skb) + skb_network_offset(skb) + + (data - (void *)skb->data); + skb->csum_offset = (void *)check - data; + *check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, + datalen, proto, 0); } else
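Why the simplified path only needs to seed the check field: one's-complement addition is associative, so storing the folded pseudo-header sum (the value ~csum_tcpudp_magic(..., 0) yields) and letting whoever completes CHECKSUM_PARTIAL sum the segment from csum_start gives the same result as the removed software path. A userspace worked example on 16-bit words in host order; the helpers are hypothetical, not the kernel's csum_* API, and the UDP zero-checksum special case is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement sum of n 16-bit words, continuing from 'start'. */
static uint16_t ocsum(const uint16_t *w, size_t n, uint16_t start)
{
	uint32_t sum = start;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return fold16(sum);
}

/* Software path: complement of pseudo-header + segment sum, computed
 * with the check field zeroed. */
static uint16_t csum_full(const uint16_t *pseudo, size_t np,
			  uint16_t *seg, size_t ns, size_t check_idx)
{
	seg[check_idx] = 0;
	return (uint16_t)~ocsum(seg, ns, ocsum(pseudo, np, 0));
}

/* CHECKSUM_PARTIAL path: seed the check field with the folded
 * pseudo-header sum, then the finisher (NIC or software fallback) sums
 * from csum_start and stores the complement at csum_offset. */
static uint16_t csum_partial_then_finish(const uint16_t *pseudo, size_t np,
					 uint16_t *seg, size_t ns,
					 size_t check_idx)
{
	seg[check_idx] = ocsum(pseudo, np, 0);
	seg[check_idx] = (uint16_t)~ocsum(seg, ns, 0);
	return seg[check_idx];
}
```

A receiver verifying the finished segment sums the pseudo-header and segment together and expects 0xffff.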
[PATCH nf-next v10 3/8] openvswitch: Add commentary to conntrack.c
This makes the code easier to understand and the following patches more focused. Signed-off-by: Jarno Rajahalme --- net/openvswitch/conntrack.c | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 3045290..2c2bf07 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -152,8 +152,12 @@ static void ovs_ct_update_key(const struct sk_buff *skb, ct = nf_ct_get(skb, &ctinfo); if (ct) { state = ovs_ct_get_state(ctinfo); + /* All unconfirmed entries are NEW connections. */ if (!nf_ct_is_confirmed(ct)) state |= OVS_CS_F_NEW; + /* OVS persists the related flag for the duration of the +* connection. +*/ if (ct->master) state |= OVS_CS_F_RELATED; zone = nf_ct_zone(ct); @@ -165,6 +169,9 @@ static void ovs_ct_update_key(const struct sk_buff *skb, __ovs_ct_update_key(key, state, zone, ct); } +/* This is called to initialize CT key fields possibly coming in from the local + * stack. + */ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key) { ovs_ct_update_key(skb, NULL, key, false); @@ -199,7 +206,6 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key, struct nf_conn *ct; u32 new_mark; - /* The connection could be invalid, in which case set_mark is no-op. */ ct = nf_ct_get(skb, &ctinfo); if (!ct) @@ -375,6 +381,11 @@ static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb, return true; } +/* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if + * not done already. Update key with new CT state. + * Note that if the packet is deemed invalid by conntrack, skb->nfct will be + * set to NULL and 0 will be returned.
+ */ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, const struct ovs_conntrack_info *info, struct sk_buff *skb) @@ -418,6 +429,13 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, { struct nf_conntrack_expect *exp; + /* If we pass an expected packet through nf_conntrack_in() the +* expectation is typically removed, but the packet could still be +* lost in upcall processing. To prevent this from happening we +* perform an explicit expectation lookup. Expected connections are +* always new, and will be passed through conntrack only when they are +* committed, as it is OK to remove the expectation at that time. +*/ exp = ovs_ct_expect_find(net, &info->zone, info->family, skb); if (exp) { u8 state; @@ -455,6 +473,7 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key, err = __ovs_ct_lookup(net, key, info, skb); if (err) return err; + /* This is a no-op if the connection has already been confirmed. */ if (nf_conntrack_confirm(skb) != NF_ACCEPT) return -EINVAL; -- 2.1.4
[PATCH nf-next v10 4/8] openvswitch: Update the CT state key only after nf_conntrack_in().
Only a successful nf_conntrack_in() call can effect a connection state change, so it suffices to update the key only after nf_conntrack_in() returns. This change is needed for the later NAT patches. Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- net/openvswitch/conntrack.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 2c2bf07..a487bb3 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -382,7 +382,8 @@ static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb, } /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if - * not done already. Update key with new CT state. + * not done already. Update key with new CT state after passing the packet + * through conntrack. * Note that if the packet is deemed invalid by conntrack, skb->nfct will be * set to NULL and 0 will be returned. */ @@ -411,14 +412,14 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, skb) != NF_ACCEPT) return -ENOENT; + ovs_ct_update_key(skb, info, key, true); + if (ovs_ct_helper(skb, info->family) != NF_ACCEPT) { WARN_ONCE(1, "helper rejected packet"); return -EINVAL; } } - ovs_ct_update_key(skb, info, key, true); - return 0; } -- 2.1.4
[PATCH nf-next v10 8/8] openvswitch: Interface with NAT.
Extend OVS conntrack interface to cover NAT. New nested OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action. A bare OVS_CT_ATTR_NAT only mangles existing and expected connections. If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested attributes, new (non-committed/non-confirmed) connections are mangled according to the rest of the nested attributes. The corresponding OVS userspace patch series includes test cases (in tests/system-traffic.at) that also serve as example uses. This work extends on a branch by Thomas Graf at https://github.com/tgraf/ovs/tree/nat. Signed-off-by: Jarno Rajahalme Acked-by: Thomas Graf --- include/uapi/linux/openvswitch.h | 49 net/openvswitch/Kconfig | 3 +- net/openvswitch/conntrack.c | 524 +-- net/openvswitch/conntrack.h | 3 +- 4 files changed, 551 insertions(+), 28 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index a27222d..616d047 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -454,6 +454,14 @@ struct ovs_key_ct_labels { #define OVS_CS_F_REPLY_DIR 0x08 /* Flow is in the reply direction. */ #define OVS_CS_F_INVALID 0x10 /* Could not track connection. */ #define OVS_CS_F_TRACKED 0x20 /* Conntrack has occurred. */ +#define OVS_CS_F_SRC_NAT 0x40 /* Packet's source address/port was +* mangled by NAT. +*/ +#define OVS_CS_F_DST_NAT 0x80 /* Packet's destination address/port +* was mangled by NAT. +*/ + +#define OVS_CS_F_NAT_MASK (OVS_CS_F_SRC_NAT | OVS_CS_F_DST_NAT) /** * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands. @@ -632,6 +640,8 @@ struct ovs_action_hash { * mask. For each bit set in the mask, the corresponding bit in the value is * copied to the connection tracking label field in the connection. * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG. + * @OVS_CT_ATTR_NAT: Nested OVS_NAT_ATTR_* for performing L3 network address + * translation (NAT) on the packet.
*/ enum ovs_ct_attr { OVS_CT_ATTR_UNSPEC, @@ -641,12 +651,51 @@ enum ovs_ct_attr { OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */ OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of related connections. */ + OVS_CT_ATTR_NAT,/* Nested OVS_NAT_ATTR_* */ __OVS_CT_ATTR_MAX }; #define OVS_CT_ATTR_MAX (__OVS_CT_ATTR_MAX - 1) /** + * enum ovs_nat_attr - Attributes for %OVS_CT_ATTR_NAT. + * + * @OVS_NAT_ATTR_SRC: Flag for Source NAT (mangle source address/port). + * @OVS_NAT_ATTR_DST: Flag for Destination NAT (mangle destination + * address/port). Only one of (@OVS_NAT_ATTR_SRC, @OVS_NAT_ATTR_DST) may be + * specified. Effective only for packets for ct_state NEW connections. + * Packets of committed connections are mangled by the NAT action according to + * the committed NAT type regardless of the flags specified. As a corollary, a + * NAT action without a NAT type flag will only mangle packets of committed + * connections. The following NAT attributes only apply for NEW + * (non-committed) connections, and they may be included only when the CT + * action has the @OVS_CT_ATTR_COMMIT flag and either @OVS_NAT_ATTR_SRC or + * @OVS_NAT_ATTR_DST is also included. 
+ * @OVS_NAT_ATTR_IP_MIN: struct in_addr or struct in6_addr + * @OVS_NAT_ATTR_IP_MAX: struct in_addr or struct in6_addr + * @OVS_NAT_ATTR_PROTO_MIN: u16 L4 protocol specific lower boundary (port) + * @OVS_NAT_ATTR_PROTO_MAX: u16 L4 protocol specific upper boundary (port) + * @OVS_NAT_ATTR_PERSISTENT: Flag for persistent IP mapping across reboots + * @OVS_NAT_ATTR_PROTO_HASH: Flag for pseudo random L4 port mapping (MD5) + * @OVS_NAT_ATTR_PROTO_RANDOM: Flag for fully randomized L4 port mapping + */ +enum ovs_nat_attr { + OVS_NAT_ATTR_UNSPEC, + OVS_NAT_ATTR_SRC, + OVS_NAT_ATTR_DST, + OVS_NAT_ATTR_IP_MIN, + OVS_NAT_ATTR_IP_MAX, + OVS_NAT_ATTR_PROTO_MIN, + OVS_NAT_ATTR_PROTO_MAX, + OVS_NAT_ATTR_PERSISTENT, + OVS_NAT_ATTR_PROTO_HASH, + OVS_NAT_ATTR_PROTO_RANDOM, + __OVS_NAT_ATTR_MAX, +}; + +#define OVS_NAT_ATTR_MAX (__OVS_NAT_ATTR_MAX - 1) + +/** * enum ovs_action_attr - Action types. * * @OVS_ACTION_ATTR_OUTPUT: Output packet to port. diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig index cd5fd9d..234a733 100644 --- a/net/openvswitch/Kconfig +++ b/net/openvswitch/Kconfig @@ -6,7 +6,8 @@ config OPENVSWITCH tristate "Open vSwitch" depends on INET depends on !NF_CONNTRACK || \ - (NF_CONNTRACK && (!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6)) +
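One reading of the rules in the enum ovs_nat_attr comment, written as a validation predicate: at most one of SRC/DST, and range or mapping attributes only when the CT action commits and a NAT type is given. The struct is a hypothetical summary of the parsed attributes; the module's actual parsing code may check more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of which OVS_NAT_ATTR_* were present. */
struct mock_nat_req {
	bool src, dst;		/* OVS_NAT_ATTR_SRC / OVS_NAT_ATTR_DST */
	bool range_or_flags;	/* IP/PROTO MIN/MAX, PERSISTENT, HASH, RANDOM */
	bool commit;		/* CT action carries OVS_CT_ATTR_COMMIT */
};

static bool mock_nat_req_valid(const struct mock_nat_req *r)
{
	/* Only one NAT type may be specified. */
	if (r->src && r->dst)
		return false;
	/* Ranges/flags apply only to NEW connections being committed
	 * with an explicit NAT type. */
	if (r->range_or_flags && !(r->commit && (r->src || r->dst)))
		return false;
	/* A bare NAT request is allowed: it only mangles packets of
	 * already committed connections. */
	return true;
}
```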
[PATCH nf-next v10 0/8] openvswitch: NAT support
This series adds NAT support to the openvswitch kernel module. A few changes are needed to the netfilter code to facilitate this (patches 1-2/8). Patches 3-7 make the openvswitch kernel module ready for patch 8, which adds NAT support by calling into the netfilter NAT code from the openvswitch conntrack action. This version fixes spelling errors in comments and eliminates many of the #ifdefs in the final patch that were not strictly necessary. This makes the code more readable and improves compile-time coverage even when the NAT feature is not configured. The OVS master now has the corresponding OVS userspace support to use and test the NAT features. Below is a walkthrough of a simple use case. In this case ports 1 and 2 are in different namespaces. The OpenFlow table below only allows IPv4 connections initiated from port 1, and applies source NAT to those connections:

    in_port=1,ip,action=ct(commit,zone=1,nat(src=10.1.1.240-10.1.1.255)),2
    in_port=2,ct_state=-trk,ip,action=ct(table=0,zone=1,nat)
    in_port=2,ct_state=+est,ct_zone=1,ip,action=1

This flow table matches all IPv4 traffic from port 1, runs it through conntrack in zone 1 and NATs it. The NAT is initialized to do source IP mapping to the given range for the first packet of each connection, after which the new connection is committed (confirmed). For further packets of already tracked connections, NAT is done according to the connection state and the commit is a no-op. Each packet that is not flagged as a drop by the CT action is forwarded to port 2. The CT action does an implicit fragmentation reassembly, so that only complete packets are run through conntrack. Reassembled packets are re-fragmented on output. The IPv4 traffic coming from port 2 is first matched for the non-tracked state (-trk), which means that the packet has not been through a CT action yet. Such traffic is run through conntrack in zone 1, and all packets associated with a NATted connection are NATted also in the return direction.
After the packet has been through conntrack, it is recirculated back to OpenFlow table 0 (which is the default table, so all the rules above are in table 0). The CT action sets the 'trk' flag, so after recirculation the packets no longer match the second rule. The third rule then matches the recirculated packets that were marked as established by conntrack (+est) and outputs them on port 1. Matching on ct_zone is not strictly needed, but in this test case it verifies that the ct_zone key attribute is properly set by the conntrack action. A full test case requires rules for ARP handling not shown here. The flow table above is an OpenFlow table, and the rules therein are translated to kernel flow entries on demand by ovs-vswitchd.

Jarno Rajahalme (8):
  netfilter: Remove IP_CT_NEW_REPLY definition.
  netfilter: Allow calling into nat helper without skb_dst.
  openvswitch: Add commentary to conntrack.c
  openvswitch: Update the CT state key only after nf_conntrack_in().
  openvswitch: Find existing conntrack entry after upcall.
  openvswitch: Handle NF_REPEAT in conntrack action.
  openvswitch: Delay conntrack helper call for new connections.
  openvswitch: Interface with NAT.

 include/uapi/linux/netfilter/nf_conntrack_common.h | 12 +-
 include/uapi/linux/openvswitch.h | 49 ++
 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c | 30 +-
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 30 +-
 net/openvswitch/Kconfig | 3 +-
 net/openvswitch/conntrack.c | 660 +++--
 net/openvswitch/conntrack.h | 3 +-
 7 files changed, 700 insertions(+), 87 deletions(-)

-- 2.1.4
[PATCH nf-next v10 5/8] openvswitch: Find existing conntrack entry after upcall.
Add a new function ovs_ct_find_existing() to find an existing conntrack entry to which this packet was already applied. This is only to be called when there is evidence that the packet was already tracked and committed, but the ct reference was lost due to a userspace upcall.

ovs_ct_find_existing() is called from skb_nfct_cached(), which can now hide the fact that the ct reference may have been lost due to an upcall. This allows ovs_ct_commit() to be simplified.

This patch is needed by the later "openvswitch: Interface with NAT" patch, as we need to be able to pass the packet through NAT using the original ct reference even after the reference is lost due to an upcall.

Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
--- net/openvswitch/conntrack.c | 103 ++-- 1 file changed, 90 insertions(+), 13 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index a487bb3..ae36fe2 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -356,14 +356,101 @@ ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, return __nf_ct_expect_find(net, zone, &tuple); } +/* This replicates logic from nf_conntrack_core.c that is not exported. */ +static enum ip_conntrack_info +ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) +{ + const struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); + + if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) + return IP_CT_ESTABLISHED_REPLY; + /* Once we've had two way comms, always ESTABLISHED. */ + if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) + return IP_CT_ESTABLISHED; + if (test_bit(IPS_EXPECTED_BIT, &ct->status)) + return IP_CT_RELATED; + return IP_CT_NEW; +} + +/* Find an existing connection which this packet belongs to without + * re-attributing statistics or modifying the connection state. This allows an + * skb->nfct lost due to an upcall to be recovered during actions execution. + * + * Must be called with rcu_read_lock.
+ * + * On success, populates skb->nfct and skb->nfctinfo, and returns the + * connection. Returns NULL if there is no existing entry. + */ +static struct nf_conn * +ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, +u8 l3num, struct sk_buff *skb) +{ + struct nf_conntrack_l3proto *l3proto; + struct nf_conntrack_l4proto *l4proto; + struct nf_conntrack_tuple tuple; + struct nf_conntrack_tuple_hash *h; + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + unsigned int dataoff; + u8 protonum; + + l3proto = __nf_ct_l3proto_find(l3num); + if (!l3proto) { + pr_debug("ovs_ct_find_existing: Can't get l3proto\n"); + return NULL; + } + if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff, +&protonum) <= 0) { + pr_debug("ovs_ct_find_existing: Can't get protonum\n"); + return NULL; + } + l4proto = __nf_ct_l4proto_find(l3num, protonum); + if (!l4proto) { + pr_debug("ovs_ct_find_existing: Can't get l4proto\n"); + return NULL; + } + if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num, +protonum, net, &tuple, l3proto, l4proto)) { + pr_debug("ovs_ct_find_existing: Can't get tuple\n"); + return NULL; + } + + /* look for tuple match */ + h = nf_conntrack_find_get(net, zone, &tuple); + if (!h) + return NULL; /* Not found. */ + + ct = nf_ct_tuplehash_to_ctrack(h); + + ctinfo = ovs_ct_get_info(h); + if (ctinfo == IP_CT_NEW) { + /* This should not happen. */ + WARN_ONCE(1, "ovs_ct_find_existing: new packet for %p\n", ct); + } + skb->nfct = &ct->ct_general; + skb->nfctinfo = ctinfo; + return ct; +} + /* Determine whether skb->nfct is equal to the result of conntrack lookup.
*/ -static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb, - const struct ovs_conntrack_info *info) +static bool skb_nfct_cached(struct net *net, + const struct sw_flow_key *key, + const struct ovs_conntrack_info *info, + struct sk_buff *skb) { enum ip_conntrack_info ctinfo; struct nf_conn *ct; ct = nf_ct_get(skb, &ctinfo); + /* If no ct, check if we have evidence that an existing conntrack entry +* might be found for this skb. This happens when we lose a skb->nfct +* due to an upcall. If the connection was not confirmed, it is not +* cached and needs to be run through conntrack again. +*/ + if (!ct && key->ct.state & OVS_CS_F_TRACKED && + !(key->ct.state & OVS_CS_F_INVALID) && +
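The state derivation in ovs_ct_get_info() above can be exercised outside the kernel with a minimal model. This sketch assumes simplified stand-ins (struct ct_model, enum ct_dir and ct_get_info() are hypothetical) in place of struct nf_conn, the tuple-hash direction and the IPS_* status bits; the enum values follow the uapi ip_conntrack_info numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same numbering as enum ip_conntrack_info in the uapi header. */
enum ct_info {
	CT_ESTABLISHED,
	CT_RELATED,
	CT_NEW,
	CT_IS_REPLY,
	CT_ESTABLISHED_REPLY = CT_ESTABLISHED + CT_IS_REPLY,
};

enum ct_dir { DIR_ORIGINAL, DIR_REPLY };

/* Hypothetical stand-in for the nf_conn status bits the function reads. */
struct ct_model {
	bool seen_reply; /* IPS_SEEN_REPLY_BIT */
	bool expected;   /* IPS_EXPECTED_BIT */
};

/* Mirrors the decision order of ovs_ct_get_info() in the patch. */
static enum ct_info ct_get_info(const struct ct_model *ct, enum ct_dir dir)
{
	assert(ct != NULL);
	if (dir == DIR_REPLY)
		return CT_ESTABLISHED_REPLY;
	/* Once there has been two-way traffic, always ESTABLISHED. */
	if (ct->seen_reply)
		return CT_ESTABLISHED;
	if (ct->expected)
		return CT_RELATED;
	return CT_NEW;
}
```

A CT_NEW result corresponds to the case the patch flags with WARN_ONCE(), since the lookup is only meant to run for connections that were already committed.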
[PATCH nf-next v10 1/8] netfilter: Remove IP_CT_NEW_REPLY definition.
Remove the definition of IP_CT_NEW_REPLY from the kernel as it does not make sense. This allows the definition of IP_CT_NUMBER to be simplified as well.

Signed-off-by: Jarno Rajahalme
--- include/uapi/linux/netfilter/nf_conntrack_common.h | 12 +--- net/openvswitch/conntrack.c | 2 -- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h index 319f471..6d074d1 100644 --- a/include/uapi/linux/netfilter/nf_conntrack_common.h +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h @@ -20,9 +20,15 @@ enum ip_conntrack_info { IP_CT_ESTABLISHED_REPLY = IP_CT_ESTABLISHED + IP_CT_IS_REPLY, IP_CT_RELATED_REPLY = IP_CT_RELATED + IP_CT_IS_REPLY, - IP_CT_NEW_REPLY = IP_CT_NEW + IP_CT_IS_REPLY, - /* Number of distinct IP_CT types (no NEW in reply dirn). */ - IP_CT_NUMBER = IP_CT_IS_REPLY * 2 - 1 + /* No NEW in reply direction. */ + + /* Number of distinct IP_CT types. */ + IP_CT_NUMBER, + + /* only for userspace compatibility */ +#ifndef __KERNEL__ + IP_CT_NEW_REPLY = IP_CT_NUMBER, +#endif }; #define NF_CT_STATE_INVALID_BIT (1 << 0) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index ee6ff8f..3045290 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -75,7 +75,6 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo) switch (ctinfo) { case IP_CT_ESTABLISHED_REPLY: case IP_CT_RELATED_REPLY: - case IP_CT_NEW_REPLY: ct_state |= OVS_CS_F_REPLY_DIR; break; default: @@ -92,7 +91,6 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo) ct_state |= OVS_CS_F_RELATED; break; case IP_CT_NEW: - case IP_CT_NEW_REPLY: ct_state |= OVS_CS_F_NEW; break; default:

-- 
2.1.4
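One way to check that the new enum layout keeps the old numbering is to compile it in userspace. The sketch below copies the post-patch enum (without __KERNEL__, so the compatibility alias is visible) and compares it against the old formulas; OLD_IP_CT_NEW_REPLY and OLD_IP_CT_NUMBER are hypothetical macro names added only for the comparison. Both old values work out to 5, the same as the new IP_CT_NUMBER, so userspace code that still uses IP_CT_NEW_REPLY sees the value it saw before.

```c
#include <assert.h>

/* Post-patch enum ip_conntrack_info, built as userspace (no __KERNEL__). */
enum ip_conntrack_info {
	IP_CT_ESTABLISHED,
	IP_CT_RELATED,
	IP_CT_NEW,
	IP_CT_IS_REPLY,
	IP_CT_ESTABLISHED_REPLY = IP_CT_ESTABLISHED + IP_CT_IS_REPLY,
	IP_CT_RELATED_REPLY = IP_CT_RELATED + IP_CT_IS_REPLY,
	/* No NEW in reply direction. */

	/* Number of distinct IP_CT types. */
	IP_CT_NUMBER,

	/* only for userspace compatibility */
#ifndef __KERNEL__
	IP_CT_NEW_REPLY = IP_CT_NUMBER,
#endif
};

/* The pre-patch definitions, for comparison. */
#define OLD_IP_CT_NEW_REPLY (IP_CT_NEW + IP_CT_IS_REPLY)
#define OLD_IP_CT_NUMBER (IP_CT_IS_REPLY * 2 - 1)

/* Compile-time checks: the numbering did not change. */
static_assert(IP_CT_NUMBER == OLD_IP_CT_NUMBER, "IP_CT_NUMBER changed");
static_assert(IP_CT_NEW_REPLY == OLD_IP_CT_NEW_REPLY, "NEW_REPLY changed");
```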
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
On Thu, Mar 10, 2016 at 01:01:38PM -0500, David Miller wrote: > From: Cyrill Gorcunov > Date: Thu, 10 Mar 2016 18:09:20 +0300 > > > On Thu, Mar 10, 2016 at 02:03:24PM +0300, Cyrill Gorcunov wrote: > >> On Thu, Mar 10, 2016 at 01:20:18PM +0300, Cyrill Gorcunov wrote: > >> > On Thu, Mar 10, 2016 at 12:16:29AM +0300, Cyrill Gorcunov wrote: > >> > > > >> > > Thanks for explanation, Dave! I'll continue on this task tomorrow > >> > > tryin to implement optimization you proposed. > >> > > >> > OK, here are the results for the preliminary patch with conntrack running > >> ... > >> > net/ipv4/devinet.c | 13 - > >> > 1 file changed, 12 insertions(+), 1 deletion(-) > >> > > >> > Index: linux-ml.git/net/ipv4/devinet.c > >> > === > >> > --- linux-ml.git.orig/net/ipv4/devinet.c > >> > +++ linux-ml.git/net/ipv4/devinet.c > >> > @@ -403,7 +403,18 @@ no_promotions: > >> > So that, this order is correct. > >> > */ > >> > >> This patch is wrong, so drop it please. I'll do another. > > > > Here I think is a better variant. The resulst are good > > enough -- 1 sec for cleanup. Does the patch look sane? > > I'm tempted to say that we should provide these notifier handlers with > the information they need, explicitly, to handle this case. > > Most intdev notifiers actually want to know the individual addresses > that get removed, one by one. That's handled by the existing > NETDEV_DOWN event and the ifa we pass to that. > > But some, like this netfilter masq case, would be satisfied with a > single event that tells them the whole inetdev instance is being torn > down. Which is the case we care about here. > > We currently don't use NETDEV_UNREGISTER for inetdev notifiers, so > maybe we could use that. > > And that is consistent with the core netdev notifier that triggers > this call chain in the first place. > > Roughly, something like this:

I see. Dave, gimme some time to test, but I'm sure it'll work. I don't have a strong opinion here, so your patch looks fine to me. But maybe people from the netdev camp have some other ideas.
[PATCH net-next v5] tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
Per RFC4898, they count segments sent/received containing a positive length data segment (that includes retransmission segments carrying data). Unlike tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments carrying no data (e.g. pure ack). The patch also updates the segs_in in tcp_fastopen_add_skb() so that segs_in >= data_segs_in property is kept. Together with retransmission data, tcpi_data_segs_out gives a better signal on the rxmit rate. v5: Eric pointed out that checking skb->len is still needed in tcp_fastopen_add_skb() because skb can carry a FIN without data. Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in() helper is used. Comment is added to the fastopen case to explain why segs_in has to be reset and tcp_segs_in() has to be called before __skb_pull(). v4: Add comment to the changes in tcp_fastopen_add_skb() and also add remark on this case in the commit message. v3: Add const modifier to the skb parameter in tcp_segs_in() v2: Rework based on recent fix by Eric: commit a9d99ce28ed3 ("tcp: fix tcpi_segs_in after connection establishment") Signed-off-by: Martin KaFai LauCc: Chris Rapier Cc: Eric Dumazet Cc: Marcelo Ricardo Leitner Cc: Neal Cardwell Cc: Yuchung Cheng --- include/linux/tcp.h | 6 ++ include/net/tcp.h| 10 ++ include/uapi/linux/tcp.h | 2 ++ net/ipv4/tcp.c | 2 ++ net/ipv4/tcp_fastopen.c | 8 net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/tcp_minisocks.c | 2 +- net/ipv4/tcp_output.c| 4 +++- net/ipv6/tcp_ipv6.c | 2 +- 9 files changed, 34 insertions(+), 4 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index bcbf51d..7be9b12 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -158,6 +158,9 @@ struct tcp_sock { u32 segs_in;/* RFC4898 tcpEStatsPerfSegsIn * total number of segments in. */ + u32 data_segs_in; /* RFC4898 tcpEStatsPerfDataSegsIn +* total number of data segments in. 
+*/ u32 rcv_nxt;/* What we want to receive next */ u32 copied_seq; /* Head of yet unread data */ u32 rcv_wup;/* rcv_nxt on last window update sent */ @@ -165,6 +168,9 @@ struct tcp_sock { u32 segs_out; /* RFC4898 tcpEStatsPerfSegsOut * The total number of segments sent. */ + u32 data_segs_out; /* RFC4898 tcpEStatsPerfDataSegsOut +* total number of data segments sent. +*/ u64 bytes_acked;/* RFC4898 tcpEStatsAppHCThruOctetsAcked * sum(delta(snd_una)), or how many bytes * were acked. diff --git a/include/net/tcp.h b/include/net/tcp.h index e90db85..24557a8 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1816,4 +1816,14 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff *skb) skb->truesize = 2; } +static inline void tcp_segs_in(struct tcp_sock *tp, const struct sk_buff *skb) +{ + u16 segs_in; + + segs_in = max_t(u16, 1, skb_shinfo(skb)->gso_segs); + tp->segs_in += segs_in; + if (skb->len > tcp_hdrlen(skb)) + tp->data_segs_in += segs_in; +} + #endif /* _TCP_H */ diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index fe95446..53e8e3f 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -199,6 +199,8 @@ struct tcp_info { __u32 tcpi_notsent_bytes; __u32 tcpi_min_rtt; + __u32 tcpi_data_segs_in; /* RFC4898 tcpEStatsDataSegsIn */ + __u32 tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */ }; /* for TCP_MD5SIG socket option */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f9faadb..6b01b48 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2728,6 +2728,8 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info) info->tcpi_notsent_bytes = max(0, notsent_bytes); info->tcpi_min_rtt = tcp_min_rtt(tp); + info->tcpi_data_segs_in = tp->data_segs_in; + info->tcpi_data_segs_out = tp->data_segs_out; } EXPORT_SYMBOL_GPL(tcp_get_info); diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index fdb286d..4fc0061 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -140,6 +140,14 @@ void 
tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb) return; skb_dst_drop(skb); + /* segs_in has been initialized to 1 in tcp_create_openreq_child(). +* Hence, reset segs_in to 0 before calling tcp_segs_in() +* to avoid double counting. Also, tcp_segs_in() expects +
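The counting rule in the tcp_segs_in() helper can be modelled in userspace as shown here. This sketch uses hypothetical stand-in structs (struct model_skb, struct model_tp). It takes len to include the TCP header, as it does at the call sites before the header is pulled. A pure ACK or a bare FIN (len equal to the header length) therefore bumps segs_in but not data_segs_in, and a GSO packet counts as gso_segs segments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the skb and tcp_sock fields involved. */
struct model_skb {
	unsigned int len;        /* bytes from the TCP header onward */
	unsigned int tcp_hdrlen; /* TCP header length */
	uint16_t gso_segs;       /* 0 or 1 for a non-GSO packet */
};

struct model_tp {
	uint32_t segs_in;
	uint32_t data_segs_in;
};

/* Mirrors the logic of the tcp_segs_in() helper added by the patch. */
static void model_tcp_segs_in(struct model_tp *tp, const struct model_skb *skb)
{
	/* max_t(u16, 1, gso_segs): a non-GSO skb still counts once. */
	uint16_t segs = skb->gso_segs > 1 ? skb->gso_segs : 1;

	tp->segs_in += segs;
	if (skb->len > skb->tcp_hdrlen) /* carries payload */
		tp->data_segs_in += segs;
}
```

Because data_segs_in is only ever incremented together with segs_in, the property segs_in >= data_segs_in holds by construction.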
Re: [PATCH nf-next v9 8/8] openvswitch: Interface with NAT.
> On Mar 10, 2016, at 4:00 AM, Thomas Graf wrote: > > On 03/09/16 at 07:47pm, Joe Stringer wrote: >> On 9 March 2016 at 15:10, Jarno Rajahalme wrote: >>> Extend OVS conntrack interface to cover NAT. New nested >>> OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action. >>> A bare OVS_CT_ATTR_NAT only mangles existing and expected connections. >>> If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested >>> attributes, new (non-committed/non-confirmed) connections are mangled >>> according to the rest of the nested attributes. >>> >>> The corresponding OVS userspace patch series includes test cases (in >>> tests/system-traffic.at) that also serve as example uses. >>> >>> This work extends on a branch by Thomas Graf at >>> https://github.com/tgraf/ovs/tree/nat. >> >> Thomas, I guess there was not signoff in these patches so Jarno does >> not have your signoff in this patch. > > That's fine. The code has evolved a lot since. I don't see anything > further than what Joe spotted so feel free to add my > > Acked-by: Thomas Graf

Thanks!
Re: [PATCH nf-next v8 1/8] netfilter: Remove IP_CT_NEW_REPLY definition.
Thanks for pointing this out. v10, which I hope is the final version, will have the cover letter back.

Jarno

> On Mar 10, 2016, at 1:16 AM, Or Gerlitz wrote: > > On Wed, Mar 9, 2016 at 2:24 AM, Jarno Rajahalme wrote: >> Remove the definition of IP_CT_NEW_REPLY from the kernel as it does >> not make sense. This allows the definition of IP_CT_NUMBER to be >> simplified as well. > > I just realized that after V7 you stopped sending cover letter (patch 0/N) > with this series. > > Maybe you send it and this misses the list? we need to be able to see > differences from earlier versions. > > Or.
Re: [PATCH 2/3] dm9601: manage eeprom to assure the chip for correct operation
> "Joseph" == Joseph CHANG writes: > Add to maintain variant eeprom adapters which may have not right > dm962x's format. > Signed-off-by: Joseph CHANG > +static void dm_render_begin(struct usbnet *dev) > +{ > +/* Render eeprom if need, WORD3 render, set D[15:14] 01b */ > +dm_eeprom_render(dev, 3, 0x4000, 0xc000); > +/* Render eeprom if need, WORD7 render, clear D[10] */ > +dm_eeprom_render(dev, 7, 0x, 0x0400); > +/* Render eeprom if need, WORD11 render, need 0x005a */ > +dm_eeprom_render(dev, 11, 0x005a, 0x); > +/* Render eeprom if need, WORD12 render, need 0x0007 */ > +dm_eeprom_render(dev, 12, DM_EP3I_VAL, 0x);

With render I guess you mean something like fixup? I'm not sure we want to do this automatically without an explicit action from the user. How common are these adapters without valid eeprom? What happens if the eeprom content isn't fixed? Do we need to reset the device once the eeprom is updated?

-- 
Bye, Peter Korsgaard
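Judging only from the comments in the quoted hunk, each dm_eeprom_render(dev, word, value, mask) call appears to force the bits selected by mask to the bits of value; for WORD3, value 0x4000 with mask 0xc000 sets D[15:14] to 01b. A minimal sketch of that read-modify-write under this assumption follows; eeprom_fixup_word() and eeprom_word_needs_fixup() are hypothetical helpers, not functions from the driver, and the actual EEPROM access is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: force the bits selected by 'mask' to the
 * corresponding bits of 'value', leaving all other bits as they were. */
static uint16_t eeprom_fixup_word(uint16_t old, uint16_t value, uint16_t mask)
{
	assert((value & ~mask) == 0); /* value must lie inside mask */
	return (uint16_t)((old & ~mask) | (value & mask));
}

/* Hypothetical helper: true if the stored word differs from the fixed-up
 * one, i.e. an EEPROM write would actually be needed. */
static bool eeprom_word_needs_fixup(uint16_t old, uint16_t value,
				    uint16_t mask)
{
	return eeprom_fixup_word(old, value, mask) != old;
}
```

With this split, a driver could at least skip the write when the word already has the required bits, which bears on the question of how often such a fixup would really touch the EEPROM.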
Re: net: use-after-free in recvmmsg
On Tue, Jan 26, 2016 at 8:30 PM, Arnaldo Carvalho de Melo wrote: > On Tue, Jan 26, 2016 at 08:27:48PM +0100, Dmitry Vyukov wrote: >> On Fri, Jan 22, 2016 at 10:16 PM, Arnaldo Carvalho de Melo >> wrote: >> > On Fri, Jan 22, 2016 at 09:39:53PM +0100, Dmitry Vyukov wrote: >> >> I am on commit 30f05309bde49295e02e45c7e615f73aa4e0ccc2 (Jan 20). >> >> Seems to be added in commit a2e2725541fad72416326798c2d7fa4dafb7d337 >> >> (Oct 2009). >> > >> > Maybe this helps? Compile testing now... >> >> >> I don't have a reliable reproducer, so can't test it per se. >> I will integrate this patch tomorrow and restart fuzzer with it. > > Thanks a lot!

Hi Arnaldo,

I have been running with that patch since then and have not seen the bug. Please mail it as a proper patch.
[PATCH] kcm: mark helper functions inline
The stub helper functions for the newly added kcm_proc_init/exit interfaces are defined as 'static' in a header file, which leads to build warnings for each file that includes them without calling them:

include/net/kcm.h:183:12: error: 'kcm_proc_init' defined but not used [-Werror=unused-function]
include/net/kcm.h:184:13: error: 'kcm_proc_exit' defined but not used [-Werror=unused-function]

This marks the two functions as 'static inline' instead, which avoids the warnings and is obviously what was meant here.

Signed-off-by: Arnd Bergmann
Fixes: cd6e111bf5be ("kcm: Add statistics and proc interfaces")
---
 include/net/kcm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/kcm.h b/include/net/kcm.h
index 95c425ca97b6..2840b5825dcc 100644
--- a/include/net/kcm.h
+++ b/include/net/kcm.h
@@ -180,8 +180,8 @@ struct kcm_mux {
 int kcm_proc_init(void);
 void kcm_proc_exit(void);
 #else
-static int kcm_proc_init(void) { return 0; }
-static void kcm_proc_exit(void) { }
+static inline int kcm_proc_init(void) { return 0; }
+static inline void kcm_proc_exit(void) { }
 #endif
 
 static inline void aggregate_psock_stats(struct kcm_psock_stats *stats,
-- 
2.7.0
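The general header pattern this patch restores is sketched below with hypothetical names: CONFIG_EXAMPLE_PROC and example_proc_init/exit stand in for the kcm config symbol and functions. When the option is off, the header provides 'static inline' stubs. GCC's -Wunused-function does not warn about unused static inline functions in C, whereas a plain 'static' definition that a translation unit never calls does trigger it.

```c
#include <assert.h>

/* Hypothetical option standing in for the kernel config symbol that
 * guards the real kcm_proc_init()/kcm_proc_exit() declarations. */
/* #define CONFIG_EXAMPLE_PROC 1 */

#ifdef CONFIG_EXAMPLE_PROC
int example_proc_init(void);
void example_proc_exit(void);
#else
/* 'static inline' stubs: no -Wunused-function warning in files that
 * include this header but never call them. */
static inline int example_proc_init(void) { return 0; }
static inline void example_proc_exit(void) { }
#endif
```

Dropping 'inline' from the #else branch and compiling a file that includes the header but never calls the stubs, with -Werror=unused-function, gives errors of the same kind as those quoted in the commit message.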
Re: [PATCH 1/2] net: thunderx: Set receive buffer page usage count in bulk
> > So calculate the modulus on the page split count and optimize the > increment ahead of time when possible, and for the sub page split > pieces do it one at a time. >

The patch does almost the same thing, with the negligible overhead of a counter: page->_count is incremented at a later time, but still before the HW starts using the buffers. The difference between the NIU driver and this patch is that NIU calculates the split count, increments the page count and then divides the page into buffers. Here, the driver divides the page into buffers, increments a counter at every split, and at the end does a single atomic increment of page->_count. Is there any issue with this approach?

Thanks,
Sunil.
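A userspace sketch of the approach described here, assuming hypothetical types: struct model_page with a C11 atomic count stands in for page->_count, and struct model_buf for an RX buffer. The page is split first, the pieces are counted, and a single atomic add then supplies the extra references before the buffers would be handed to hardware. The sketch also assumes the page's initial reference from allocation covers the first buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page and its _count. */
struct model_page {
	atomic_int count; /* 1 right after the page is allocated */
};

struct model_buf {
	struct model_page *page;
	size_t offset;
};

/* Split 'page' into up to 'nbufs' buffers of 'buf_size' bytes. Each buffer
 * needs one page reference, but the refcount update is deferred: a single
 * atomic add covers all extra references once the split is done. */
static size_t split_page_batched(struct model_page *page, size_t page_size,
				 size_t buf_size, struct model_buf *out,
				 size_t nbufs)
{
	size_t used = 0;

	assert(buf_size > 0 && buf_size <= page_size);
	while (used < nbufs && (used + 1) * buf_size <= page_size) {
		out[used].page = page;
		out[used].offset = used * buf_size;
		used++;
	}
	/* The allocation's reference already covers the first buffer. */
	if (used > 1)
		atomic_fetch_add(&page->count, (int)(used - 1));
	return used;
}
```

Splitting a 4096-byte page into 1024-byte buffers yields four buffers and a count of 4 after exactly one atomic operation, instead of three separate increments.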
Re: Micrel Phy - Is there a way to configure the Phy not to do 802.3x flow control?
On 10/03/16 08:48, Murali Karicheri wrote: > On 03/03/2016 07:16 PM, Florian Fainelli wrote: >> On 03/03/16 14:18, Murali Karicheri wrote: >>> Hi, >>> >>> We are using Micrel Phy in one of our board and wondering if we can force >>> the >>> Phy to disable flow control at start. I have a 1G ethernet switch connected >>> to Phy and the phy always enable flow control. I would like to configure the >>> phy not to flow control. Is that possible and if yes, what should I do in >>> the >>> my Ethernet driver to tell the Phy not to enable flow control? >> >> The PHY is not doing flow control per-se, your pseudo Ethernet MAC in >> the switch is doing, along with the link partner advertising support for >> it. You would want to make sure that your PHY device interface (provided >> that you are using the PHY library) is not starting with Pause >> advertised, but it could be supported. > > Understood that Phy is just advertise FC. The Micrel phy for 9031 advertise > by default FC supported. After negotiation, I see that Phylib provide the > link status with parameter pause = 1, asym_pause = 1. How do I tell the Phy > not > to advertise? > > I call following sequence in the Ethernet driver. > > of_phy_connect(x,y,hndlr,a,z); Here you should be able to change phydev->advertising and phydev->supported to mask the ADVERTISED_Pause | ADVERTISED_AsymPause bits and have phy_start() restart with that which should disable pause and asym_pause as seen by your adjust_link handler. > phy_start() > > Now in hndlr() I have pause = 1, asym_pause = 1, in phy_device ptr. How can > I tell the phy not to advertise initially? -- Florian
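A sketch of the masking step described above, with two stated assumptions. First, struct model_phydev is a hypothetical stand-in for the u32 supported/advertising fields of struct phy_device. Second, the bit values are copied from include/uapi/linux/ethtool.h, where the second constant is spelled ADVERTISED_Asym_Pause; the SUPPORTED_* pause bits use the same positions. In a driver this would run between of_phy_connect() and phy_start(). The sketch shows only the bit manipulation, not what the adjust_link handler later reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as in include/uapi/linux/ethtool.h, redefined locally so this
 * userspace sketch compiles on its own. */
#define ADVERTISED_Pause      (1u << 13)
#define ADVERTISED_Asym_Pause (1u << 14)

/* Hypothetical stand-in for the two phy_device fields involved. */
struct model_phydev {
	uint32_t supported;
	uint32_t advertising;
};

/* Clear both pause bits so the PHY neither claims nor advertises pause. */
static void phy_disable_pause_adv(struct model_phydev *phydev)
{
	const uint32_t pause = ADVERTISED_Pause | ADVERTISED_Asym_Pause;

	assert(phydev != NULL);
	phydev->supported &= ~pause;
	phydev->advertising &= ~pause;
}
```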
Re: [RFC] net: ipv4 -- Introduce ifa limit per net
From: Cyrill Gorcunov
Date: Thu, 10 Mar 2016 18:09:20 +0300 > On Thu, Mar 10, 2016 at 02:03:24PM +0300, Cyrill Gorcunov wrote: >> On Thu, Mar 10, 2016 at 01:20:18PM +0300, Cyrill Gorcunov wrote: >> > On Thu, Mar 10, 2016 at 12:16:29AM +0300, Cyrill Gorcunov wrote: >> > > >> > > Thanks for explanation, Dave! I'll continue on this task tomorrow >> > > tryin to implement optimization you proposed. >> > >> > OK, here are the results for the preliminary patch with conntrack running >> ... >> > net/ipv4/devinet.c | 13 - >> > 1 file changed, 12 insertions(+), 1 deletion(-) >> > >> > Index: linux-ml.git/net/ipv4/devinet.c >> > === >> > --- linux-ml.git.orig/net/ipv4/devinet.c >> > +++ linux-ml.git/net/ipv4/devinet.c >> > @@ -403,7 +403,18 @@ no_promotions: >> > So that, this order is correct. >> > */ >> >> This patch is wrong, so drop it please. I'll do another. > > Here I think is a better variant. The resulst are good > enough -- 1 sec for cleanup. Does the patch look sane?

I'm tempted to say that we should provide these notifier handlers with the information they need, explicitly, to handle this case.

Most inetdev notifiers actually want to know the individual addresses that get removed, one by one. That's handled by the existing NETDEV_DOWN event and the ifa we pass to that.

But some, like this netfilter masq case, would be satisfied with a single event that tells them the whole inetdev instance is being torn down, which is the case we care about here.

We currently don't use NETDEV_UNREGISTER for inetdev notifiers, so maybe we could use that. And that is consistent with the core netdev notifier that triggers this call chain in the first place.
Roughly, something like this:

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 8c3df2c..6eee5cb 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -292,6 +292,11 @@ static void inetdev_destroy(struct in_device *in_dev)
 
 	in_dev->dead = 1;
 
+	if (in_dev->ifa_list)
+		blocking_notifier_call_chain(&inetaddr_chain,
+					     NETDEV_UNREGISTER,
+					     in_dev->ifa_list);
+
 	ip_mc_destroy_dev(in_dev);
 
 	while ((ifa = in_dev->ifa_list) != NULL) {
diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index c6eb421..1bb8026 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -111,6 +111,10 @@ static int masq_inet_event(struct notifier_block *this,
 	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
 	struct netdev_notifier_info info;
 
+	if (event != NETDEV_UNREGISTER)
+		return NOTIFY_DONE;
+	event = NETDEV_DOWN;
+
 	netdev_notifier_info_init(&info, dev);
 	return masq_device_event(this, event, &info);
 }
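The effect of the proposal can be seen in a small userspace model: per-address NETDEV_DOWN events no longer trigger the masquerade flush, and the single NETDEV_UNREGISTER sent from inetdev_destroy() triggers it once, however many addresses the device had. Everything here (the model_* names, the event enum, the flush counter) is a hypothetical stand-in for the notifier machinery, not kernel API.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two notifier events involved. */
enum model_event { MODEL_NETDEV_DOWN, MODEL_NETDEV_UNREGISTER };
#define MODEL_NOTIFY_DONE 0

static int device_flushes; /* how many times the conntrack flush ran */

/* Stand-in for masq_device_event(): flush NATed entries on device down. */
static int model_masq_device_event(enum model_event event)
{
	if (event == MODEL_NETDEV_DOWN)
		device_flushes++;
	return MODEL_NOTIFY_DONE;
}

/* Shape of the proposed masq_inet_event(): ignore the per-address
 * events and turn the single whole-inetdev event into NETDEV_DOWN. */
static int model_masq_inet_event(enum model_event event)
{
	if (event != MODEL_NETDEV_UNREGISTER)
		return MODEL_NOTIFY_DONE;
	return model_masq_device_event(MODEL_NETDEV_DOWN);
}

/* Proposed inetdev_destroy() event sequence: one UNREGISTER if any
 * address exists, then one NETDEV_DOWN per removed address. */
static void model_inetdev_destroy(int naddrs)
{
	assert(naddrs >= 0);
	if (naddrs > 0)
		model_masq_inet_event(MODEL_NETDEV_UNREGISTER);
	for (int i = 0; i < naddrs; i++)
		model_masq_inet_event(MODEL_NETDEV_DOWN);
}
```

Before the change, each of the per-address events would have reached the flush, which is where the cleanup time in this thread was going.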
Re: [PATCH net-next v4] tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
On Thu, Mar 10, 2016 at 09:43:18AM -0800, Eric Dumazet wrote: > On Thu, Mar 10, 2016 at 9:39 AM, Eric Dumazet wrote: > > On Thu, Mar 10, 2016 at 9:29 AM, Martin KaFai Lau wrote: > >> Per RFC4898, they count segments sent/received > >> containing a positive length data segment (that includes > >> retransmission segments carrying data). Unlike > >> tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments > >> carrying no data (e.g. pure ack). > >> > >> The patch also updates the segs_in in tcp_fastopen_add_skb() > >> so that segs_in >= data_segs_in property is kept. If > >> tcp_segs_in() helper is used in this fastopen case, tp->segs_in > >> has to be 0 reset first to avoid double counting. Also, it has > >> to be done before __skb_pull(skb, tcp_hdrlen(skb)) while > >> there is no need to check skb->len since skb has already > >> been confirmed carrying data. I found it more confusing > >> and chose to directly set segs_in and data_segs_in in > >> this special case. > > > > Note that on my TODO list after commit > > e11ecddf5128011c936cc5360780190cbc901fdc > > I had the project of pulling TCP headers much earlier in input path > > so that we do not have all these special cases. > > > > Acked-by: Eric Dumazet > > Actually, tcp_fastopen_add_skb() can queue a packet with a FIN only, > but no data.

Thanks for pointing it out. I didn't know that was allowed, and the end_seq check above could also be +1 because of the FIN.

> > I believe you need to test skb->len before setting tp->data_segs_in

In that case, I will reset segs_in to 0, with a comment explaining why, and call tcp_segs_in() before the skb_pull. I will spin another version.
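The ordering described here for tcp_fastopen_add_skb() can be illustrated with a userspace model; struct fo_skb, struct fo_tp and the fo_* helpers are hypothetical names. The model first resets the segs_in value of 1 set at child-socket creation. It then counts the segment while skb->len still includes the TCP header, and only after that pulls the header. Counting after the pull would miss a payload shorter than the header, and a FIN-only skb correctly leaves data_segs_in untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the skb and socket fields involved. */
struct fo_skb {
	unsigned int len;    /* shrinks when the TCP header is pulled */
	unsigned int hdrlen; /* TCP header length; unchanged by the pull */
};

struct fo_tp {
	uint32_t segs_in;      /* set to 1 when the child socket is created */
	uint32_t data_segs_in;
};

/* Same test as tcp_segs_in() for a non-GSO skb. */
static void fo_segs_in(struct fo_tp *tp, const struct fo_skb *skb)
{
	tp->segs_in += 1;
	if (skb->len > skb->hdrlen)
		tp->data_segs_in += 1;
}

static void fo_pull_header(struct fo_skb *skb)
{
	assert(skb->len >= skb->hdrlen);
	skb->len -= skb->hdrlen;
}

/* Reset, count while the header is still included, then pull. */
static void fo_add_skb(struct fo_tp *tp, struct fo_skb *skb)
{
	tp->segs_in = 0; /* avoid counting this segment twice */
	fo_segs_in(tp, skb);
	fo_pull_header(skb);
}
```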
Re: [PATCH] b43: fix memory leak
On Thursday 10 March 2016 11:13 PM, Michael Büsch wrote: On Fri, 19 Feb 2016 20:37:18 +0530 Sudip Mukherjee wrote: https://patchwork.kernel.org/patch/8049041/ I have an old laptop running on 800Mhz CPU. It has "Broadcom BCM4311 [14e4:4311] (rev 01)". I will try to test it on this weekend. Any news on this one?

No, sorry. I was trying to install Ubuntu 14.04 on it, but for some reason the USB stick does not get past the boot screen. Give me two more days and I will let you all know by this Saturday.

regards
sudip
Re: [PATCH v2] phy: remove documentation of removed members of phy_device structure
On 10/03/16 04:58, LABBE Corentin wrote: > Commit e5a03bfd873c ("phy: Add an mdio_device structure") removed addr, > bus and dev member of the phy_device structure. > This patch remove the documentation about those members. > > Signed-off-by: LABBE Corentin

Acked-by: Florian Fainelli
-- 
Florian
Re: [PATCH 1/2] net: thunderx: Set receive buffer page usage count in bulk
From: Sunil Kovvuri
Date: Thu, 10 Mar 2016 16:13:28 +0530 > Hi David, > > >>> So if you know ahead of time how the page will be split up, just >>> calculate that when you get the page and increment the page count >>> appropriately. >>> >>> That's what we do in the NIU driver. >> >> Thanks for the suggestion, will check and get back. >> > > I looked at the NIU driver and in fn() niu_rbr_refill() > static void niu_rbr_refill(struct niu *np, struct rx_ring_info *rp, gfp_t > mask) > { > int index = rp->rbr_index; > > rp->rbr_pending++; > if ((rp->rbr_pending % rp->rbr_blocks_per_page) == 0) { > > Here it's been checked whether rbr_pending is a exact multiple of page > split count. > And hence updating page count based on fixed calculation is right. > > On my platform driver receives a interrupt when free buffer count > falls below a threshold > and by the time SW reads count of buffers to be refilled it can be any > number i.e > may or may not be a exact multiple of page split count.

So calculate the modulus on the page split count and optimize the increment ahead of time when possible, and for the sub page split pieces do it one at a time.

I don't understand what the problem is.
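A sketch of the split David suggests, using hypothetical names and a simplified model in which each refill starts on fresh pages. The refill count n is divided by the number of buffers per page: full pages get their extra references in one bulk increment, and the remainder (n % per_page) is handled one buffer at a time. The function returns how many refcount updates it issued so the saving is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: refill 'n' buffers, 'per_page' buffers per page.
 * page_refs[p] receives the reference count page p ends up with (the
 * allocation provides 1). Returns the number of refcount updates. */
static size_t refill_refs(size_t n, size_t per_page, int *page_refs,
			  size_t max_pages)
{
	size_t full = n / per_page; /* pages consumed entirely */
	size_t rest = n % per_page; /* buffers on a final, partial page */
	size_t updates = 0;
	size_t p;

	assert(per_page > 0);
	assert(full + (rest ? 1 : 0) <= max_pages);
	for (p = 0; p < full; p++) {
		/* Split known ahead of time: one bulk increment. */
		page_refs[p] = 1;
		if (per_page > 1) {
			page_refs[p] += (int)(per_page - 1);
			updates++;
		}
	}
	if (rest) {
		/* Sub-page remainder: one reference per extra buffer. */
		page_refs[full] = 1;
		for (size_t i = 1; i < rest; i++) {
			page_refs[full]++;
			updates++;
		}
	}
	return updates;
}
```

For a refill of 10 buffers at 4 per page, the two full pages take one update each and the 2-buffer remainder takes one more, so 3 updates instead of 7.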
Re: [PATCH] b43: fix memory leak
On Fri, 19 Feb 2016 20:37:18 +0530 Sudip Mukherjee wrote: > > https://patchwork.kernel.org/patch/8049041/ > > I have an old laptop running on 800Mhz CPU. It has "Broadcom BCM4311 > [14e4:4311] (rev 01)". > I will try to test it on this weekend.

Any news on this one?

-- 
Michael
Re: [PATCH net-next v4] tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
On Thu, Mar 10, 2016 at 9:39 AM, Eric Dumazet wrote: > On Thu, Mar 10, 2016 at 9:29 AM, Martin KaFai Lau wrote: >> Per RFC4898, they count segments sent/received >> containing a positive length data segment (that includes >> retransmission segments carrying data). Unlike >> tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments >> carrying no data (e.g. pure ack). >> >> The patch also updates the segs_in in tcp_fastopen_add_skb() >> so that segs_in >= data_segs_in property is kept. If >> tcp_segs_in() helper is used in this fastopen case, tp->segs_in >> has to be 0 reset first to avoid double counting. Also, it has >> to be done before __skb_pull(skb, tcp_hdrlen(skb)) while >> there is no need to check skb->len since skb has already >> been confirmed carrying data. I found it more confusing >> and chose to directly set segs_in and data_segs_in in >> this special case. > > Note that on my TODO list after commit > e11ecddf5128011c936cc5360780190cbc901fdc > I had the project of pulling TCP headers much earlier in input path > so that we do not have all these special cases. > > Acked-by: Eric Dumazet

Actually, tcp_fastopen_add_skb() can queue a packet with only a FIN and no data. I believe you need to test skb->len before setting tp->data_segs_in.
Re: pull-request: can-next 2016-03-10
From: Marc Kleine-Budde
Date: Thu, 10 Mar 2016 10:33:28 +0100 > this is a pull request of 5 patch for net-next/master. > > Marek Vasut contributes 4 patches for the ifi CAN driver, which makes > it work on real hardware. There is one patch by Ramesh Shanmugasundaram > for the rcar_can driver that adds support for the 3rd generation IP > core.

Pulled, thanks Marc.
Re: [PATCH net-next v4] tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
On Thu, Mar 10, 2016 at 9:29 AM, Martin KaFai Lau wrote: > Per RFC4898, they count segments sent/received > containing a positive length data segment (that includes > retransmission segments carrying data). Unlike > tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments > carrying no data (e.g. pure ack). > > The patch also updates the segs_in in tcp_fastopen_add_skb() > so that segs_in >= data_segs_in property is kept. If > tcp_segs_in() helper is used in this fastopen case, tp->segs_in > has to be 0 reset first to avoid double counting. Also, it has > to be done before __skb_pull(skb, tcp_hdrlen(skb)) while > there is no need to check skb->len since skb has already > been confirmed carrying data. I found it more confusing > and chose to directly set segs_in and data_segs_in in > this special case.

Note that on my TODO list after commit e11ecddf5128011c936cc5360780190cbc901fdc I had the project of pulling TCP headers much earlier in the input path so that we do not have all these special cases.

Acked-by: Eric Dumazet
Re: [PATCH net-next v4] tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
On Thu, Mar 10, 2016 at 9:29 AM, Martin KaFai Lau wrote: > Per RFC4898, they count segments sent/received > containing a positive length data segment (that includes > retransmission segments carrying data). Unlike > tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments > carrying no data (e.g. pure ack). > > The patch also updates the segs_in in tcp_fastopen_add_skb() > so that segs_in >= data_segs_in property is kept. If > tcp_segs_in() helper is used in this fastopen case, tp->segs_in > has to be 0 reset first to avoid double counting. Also, it has > to be done before __skb_pull(skb, tcp_hdrlen(skb)) while > there is no need to check skb->len since skb has already > been confirmed carrying data. I found it more confusing > and chose to directly set segs_in and data_segs_in in > this special case. > > Together with retransmission data, tcpi_data_segs_out > gives a better signal on the rxmit rate. > > v4: Add comment to the changes in tcp_fastopen_add_skb() > and also add remark on this case in the commit message. > > v3: Add const modifier to the skb parameter in tcp_segs_in() > > v2: Rework based on recent fix by Eric: > commit a9d99ce28ed3 ("tcp: fix tcpi_segs_in after connection establishment") > > Signed-off-by: Martin KaFai Lau > Cc: Chris Rapier > Cc: Eric Dumazet > Cc: Marcelo Ricardo Leitner > Cc: Neal Cardwell > Cc: Yuchung Cheng > ---

Acked-by: Yuchung Cheng

Thanks for the clarification.
> include/linux/tcp.h | 6 ++ > include/net/tcp.h| 10 ++ > include/uapi/linux/tcp.h | 2 ++ > net/ipv4/tcp.c | 2 ++ > net/ipv4/tcp_fastopen.c | 10 ++ > net/ipv4/tcp_ipv4.c | 2 +- > net/ipv4/tcp_minisocks.c | 2 +- > net/ipv4/tcp_output.c| 4 +++- > net/ipv6/tcp_ipv6.c | 2 +- > 9 files changed, 36 insertions(+), 4 deletions(-) > > diff --git a/include/linux/tcp.h b/include/linux/tcp.h > index bcbf51d..7be9b12 100644 > --- a/include/linux/tcp.h > +++ b/include/linux/tcp.h > @@ -158,6 +158,9 @@ struct tcp_sock { > u32 segs_in;/* RFC4898 tcpEStatsPerfSegsIn > * total number of segments in. > */ > + u32 data_segs_in; /* RFC4898 tcpEStatsPerfDataSegsIn > +* total number of data segments in. > +*/ > u32 rcv_nxt;/* What we want to receive next */ > u32 copied_seq; /* Head of yet unread data */ > u32 rcv_wup;/* rcv_nxt on last window update sent */ > @@ -165,6 +168,9 @@ struct tcp_sock { > u32 segs_out; /* RFC4898 tcpEStatsPerfSegsOut > * The total number of segments sent. > */ > + u32 data_segs_out; /* RFC4898 tcpEStatsPerfDataSegsOut > +* total number of data segments sent. > +*/ > u64 bytes_acked;/* RFC4898 tcpEStatsAppHCThruOctetsAcked > * sum(delta(snd_una)), or how many bytes > * were acked. 
> diff --git a/include/net/tcp.h b/include/net/tcp.h > index e90db85..24557a8 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -1816,4 +1816,14 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff > *skb) > skb->truesize = 2; > } > > +static inline void tcp_segs_in(struct tcp_sock *tp, const struct sk_buff > *skb) > +{ > + u16 segs_in; > + > + segs_in = max_t(u16, 1, skb_shinfo(skb)->gso_segs); > + tp->segs_in += segs_in; > + if (skb->len > tcp_hdrlen(skb)) > + tp->data_segs_in += segs_in; > +} > + > #endif /* _TCP_H */ > diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h > index fe95446..53e8e3f 100644 > --- a/include/uapi/linux/tcp.h > +++ b/include/uapi/linux/tcp.h > @@ -199,6 +199,8 @@ struct tcp_info { > > __u32 tcpi_notsent_bytes; > __u32 tcpi_min_rtt; > + __u32 tcpi_data_segs_in; /* RFC4898 tcpEStatsDataSegsIn */ > + __u32 tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */ > }; > > /* for TCP_MD5SIG socket option */ > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index f9faadb..6b01b48 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -2728,6 +2728,8 @@ void tcp_get_info(struct sock *sk, struct tcp_info > *info) > info->tcpi_notsent_bytes = max(0, notsent_bytes); > > info->tcpi_min_rtt = tcp_min_rtt(tp); > + info->tcpi_data_segs_in = tp->data_segs_in; > + info->tcpi_data_segs_out = tp->data_segs_out; > } > EXPORT_SYMBOL_GPL(tcp_get_info); > > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c > index fdb286d..74068e6 100644 > ---
[PATCH net-next v4] tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In
Per RFC4898, they count segments sent/received
containing a positive length data segment (that includes
retransmission segments carrying data). Unlike
tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
carrying no data (e.g. pure ack).

The patch also updates the segs_in in tcp_fastopen_add_skb()
so that the segs_in >= data_segs_in property is kept. If the
tcp_segs_in() helper were used in this fastopen case, tp->segs_in
would have to be reset to 0 first to avoid double counting. It would
also have to be called before __skb_pull(skb, tcp_hdrlen(skb)), even
though there is no need to check skb->len since the skb has already
been confirmed to carry data. I found that more confusing
and chose to directly set segs_in and data_segs_in in
this special case.

Together with retransmission data, tcpi_data_segs_out
gives a better signal on the rxmit rate.

v4: Add comment to the changes in tcp_fastopen_add_skb()
and also add remark on this case in the commit message.

v3: Add const modifier to the skb parameter in tcp_segs_in()

v2: Rework based on recent fix by Eric:
commit a9d99ce28ed3 ("tcp: fix tcpi_segs_in after connection establishment")

Signed-off-by: Martin KaFai Lau
Cc: Chris Rapier
Cc: Eric Dumazet
Cc: Marcelo Ricardo Leitner
Cc: Neal Cardwell
Cc: Yuchung Cheng
---
 include/linux/tcp.h      |  6 ++++++
 include/net/tcp.h        | 10 ++++++++++
 include/uapi/linux/tcp.h |  2 ++
 net/ipv4/tcp.c           |  2 ++
 net/ipv4/tcp_fastopen.c  | 10 ++++++++++
 net/ipv4/tcp_ipv4.c      |  2 +-
 net/ipv4/tcp_minisocks.c |  2 +-
 net/ipv4/tcp_output.c    |  4 +++-
 net/ipv6/tcp_ipv6.c      |  2 +-
 9 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index bcbf51d..7be9b12 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -158,6 +158,9 @@ struct tcp_sock {
 	u32	segs_in;	/* RFC4898 tcpEStatsPerfSegsIn
 				 * total number of segments in.
 				 */
+	u32	data_segs_in;	/* RFC4898 tcpEStatsPerfDataSegsIn
+				 * total number of data segments in.
+				 */
 	u32	rcv_nxt;	/* What we want to receive next */
 	u32	copied_seq;	/* Head of yet unread data */
 	u32	rcv_wup;	/* rcv_nxt on last window update sent */
@@ -165,6 +168,9 @@ struct tcp_sock {
 	u32	segs_out;	/* RFC4898 tcpEStatsPerfSegsOut
 				 * The total number of segments sent.
 				 */
+	u32	data_segs_out;	/* RFC4898 tcpEStatsPerfDataSegsOut
+				 * total number of data segments sent.
+				 */
 	u64	bytes_acked;	/* RFC4898 tcpEStatsAppHCThruOctetsAcked
 				 * sum(delta(snd_una)), or how many bytes
 				 * were acked.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e90db85..24557a8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1816,4 +1816,14 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff *skb)
 	skb->truesize = 2;
 }
 
+static inline void tcp_segs_in(struct tcp_sock *tp, const struct sk_buff *skb)
+{
+	u16 segs_in;
+
+	segs_in = max_t(u16, 1, skb_shinfo(skb)->gso_segs);
+	tp->segs_in += segs_in;
+	if (skb->len > tcp_hdrlen(skb))
+		tp->data_segs_in += segs_in;
+}
+
 #endif	/* _TCP_H */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index fe95446..53e8e3f 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -199,6 +199,8 @@ struct tcp_info {
 
 	__u32	tcpi_notsent_bytes;
 	__u32	tcpi_min_rtt;
+	__u32	tcpi_data_segs_in;	/* RFC4898 tcpEStatsDataSegsIn */
+	__u32	tcpi_data_segs_out;	/* RFC4898 tcpEStatsDataSegsOut */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f9faadb..6b01b48 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2728,6 +2728,8 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_notsent_bytes = max(0, notsent_bytes);
 
 	info->tcpi_min_rtt = tcp_min_rtt(tp);
+	info->tcpi_data_segs_in = tp->data_segs_in;
+	info->tcpi_data_segs_out = tp->data_segs_out;
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index fdb286d..74068e6 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -131,6 +131,7 @@ static bool tcp_fastopen_cookie_gen(struct request_sock *req,
 void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	u16 segs_in;
 
 	if (TCP_SKB_CB(skb)->end_seq == tp->rcv_nxt)
 		return;
@@ -154,6 +155,15 @@ void
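The counting rule in the commit message can be checked with a small userspace model of the new tcp_segs_in() helper. The struct and field names below (seg_counters, fake_skb, hdrlen) are hypothetical stand-ins for struct tcp_sock and struct sk_buff, not kernel API. As in the helper, an aggregated skb counts as max(1, gso_segs) segments, and those segments count as data segments only when the skb carries bytes beyond the TCP header, which keeps segs_in >= data_segs_in. The rxmit_percent() helper is an assumption about how a monitoring tool might combine tcpi_total_retrans with the new tcpi_data_segs_out; the patch itself does not add it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters kept in struct tcp_sock. */
struct seg_counters {
	uint32_t segs_in;      /* all segments received, including pure ACKs */
	uint32_t data_segs_in; /* received segments that carried payload */
};

/* Hypothetical stand-in for the sk_buff fields the helper reads. */
struct fake_skb {
	uint16_t gso_segs; /* 0 or 1 for a non-aggregated skb */
	uint32_t len;      /* TCP header + payload, in bytes */
	uint32_t hdrlen;   /* TCP header length, in bytes */
};

/* Userspace model of tcp_segs_in(). */
static void model_tcp_segs_in(struct seg_counters *c, const struct fake_skb *skb)
{
	/* max(1, gso_segs), like max_t(u16, 1, ...) in the helper */
	uint16_t segs = skb->gso_segs > 1 ? skb->gso_segs : 1;

	c->segs_in += segs;
	/* only payload-carrying skbs count as data segments */
	if (skb->len > skb->hdrlen)
		c->data_segs_in += segs;
}

/* Retransmitted segments as a percentage of data segments sent.
 * Using data_segs_out as the denominator keeps pure ACKs out of it. */
static double rxmit_percent(uint32_t total_retrans, uint32_t data_segs_out)
{
	if (data_segs_out == 0)
		return 0.0;
	return 100.0 * (double)total_retrans / (double)data_segs_out;
}
```

For example, a pure ACK (len equal to the header length) bumps only segs_in, while a GRO aggregate of four full-sized segments adds four to both counters.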
Re: [PATCH net-next 1/3] xen-netback: re-import canonical netif header
On Thu, Mar 10, 2016 at 12:30:26PM +, Paul Durrant wrote:
> The canonical netif header (in the Xen source repo) and the Linux variant
> have diverged significantly. Recently much documentation has been added to
> the canonical header which is highly useful for developers making
> modifications to either xen-netfront or xen-netback. This patch therefore
> re-imports the canonical header in its entirity.
>
> To maintain compatibility and some style consistency with the old Linux
> variant, the header was stripped of its emacs boilerplate, and
> post-processed and copied into place with the following commands:
>
> ed -s netif.h << EOF
> H
> ,s/NETTXF_/XEN_NETTXF_/g
> ,s/NETRXF_/XEN_NETRXF_/g
> ,s/NETIF_/XEN_NETIF_/g
> ,s/XEN_XEN_/XEN_/g
> ,s/netif/xen_netif/g
> ,s/xen_xen_/xen_/g
> ,s/^typedef.*$//g
> ,s/^/${TAB}/g
> w
> $
> w
> EOF
>
> indent --line-length 80 --linux-style netif.h \
> -o include/xen/interface/io/netif.h
>
> Signed-off-by: Paul Durrant
> Cc: Konrad Rzeszutek Wilk
> Cc: Boris Ostrovsky
> Cc: David Vrabel
> Cc: Wei Liu

Acked-by: Wei Liu
Re: [PATCH net-next 2/3] xen-netback: support multiple extra info fragments passed from frontend
On Thu, Mar 10, 2016 at 12:30:27PM +, Paul Durrant wrote:
> The code does not currently support a frontend passing multiple extra info
> fragments to the backend in a tx request. The xenvif_get_extras() function
> handles multiple extra_info fragments but make_tx_response() assumes there
> is only ever a single extra info fragment.
>
> This patch modifies xenvif_get_extras() to pass back a count of extra
> info fragments, which is then passed to make_tx_response() (after
> possibly being stashed in pending_tx_info for deferred responses).
>
> Signed-off-by: Paul Durrant
> Cc: Wei Liu

Acked-by: Wei Liu
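The change in the quoted patch can be illustrated with a userspace model. It assumes, as in the Xen netif protocol, that each extra-info fragment carries a flag saying whether another one follows, and that the backend must write one response slot for every request slot the frontend consumed: a status response for the request plus one null response per extra-info fragment. That is why make_tx_response() needs the count from xenvif_get_extras() rather than assuming a single extra. The types, constants and function names are hypothetical stand-ins loosely modelled on XEN_NETIF_EXTRA_FLAG_MORE and XEN_NETIF_RSP_NULL; they are not the xen-netback code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_EXTRA_FLAG_MORE 0x1 /* hypothetical: another extra follows */
#define MODEL_RSP_OKAY 0          /* hypothetical status codes */
#define MODEL_RSP_NULL 1

struct model_extra {
	uint8_t type;
	uint8_t flags;
};

/* Count the extra-info fragments following a tx request by walking the
 * chain until a fragment without the MORE flag (or until avail runs out,
 * which a real backend would treat as a malformed request). */
static unsigned int model_get_extras(const struct model_extra *extras, size_t avail)
{
	unsigned int count = 0;

	while (count < avail) {
		uint8_t flags = extras[count].flags;

		count++;
		if (!(flags & MODEL_EXTRA_FLAG_MORE))
			break;
	}
	return count;
}

/* Write the request's status, then one NULL response per extra-info
 * slot, so the response side advances by as many slots as were consumed.
 * Returns the number of slots written, or 0 if rsp is too small. */
static size_t model_make_tx_response(int8_t *rsp, size_t rsp_len,
				     int8_t status, unsigned int extra_count)
{
	size_t i = 0;

	if (rsp_len < 1 + (size_t)extra_count)
		return 0;
	rsp[i++] = status;
	while (extra_count-- != 0)
		rsp[i++] = MODEL_RSP_NULL;
	return i;
}
```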
Re: [PATCH net-next 3/3] xen-netback: reduce log spam
On Thu, Mar 10, 2016 at 12:30:28PM +, Paul Durrant wrote:
> Remove the "prepare for reconnect" pr_info in xenbus.c. It's largely
> uninteresting and the states of the frontend and backend can easily be
> observed by watching the (o)xenstored log.
>
> Signed-off-by: Paul Durrant
> Cc: Wei Liu

Acked-by: Wei Liu

> ---
>  drivers/net/xen-netback/xenbus.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/net/xen-netback/xenbus.c
> b/drivers/net/xen-netback/xenbus.c
> index 39a303d..bd182cd 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -511,8 +511,6 @@ static void set_backend_state(struct backend_info *be,
>  		switch (state) {
>  		case XenbusStateInitWait:
>  		case XenbusStateConnected:
> -			pr_info("%s: prepare for reconnect\n",
> -				be->dev->nodename);
>  			backend_switch_state(be, XenbusStateInitWait);
>  			break;
>  		case XenbusStateClosing:
> --
> 2.1.4
>
Re: [net-next PATCH V3 1/3] net: adjust napi_consume_skb to handle none-NAPI callers
Hello.

On 03/10/2016 05:59 PM, Jesper Dangaard Brouer wrote:

> Some drivers reuse/share code paths that free SKBs between NAPI
> and none-NAPI calls. Adjust napi_consume_skb to handle this
> use-case.
>
> Before, calls from netpoll (w/ IRQs disabled) was handled and
> indicated with a budget zero indication. Use the same zero
> indication to handle calls not originating from NAPI/softirq.
> Simply handled by using dev_consume_skb_any().
>
> This adds an extra branch+call for the netpoll case (checking
> in_irq() + irqs_disabled()), but that is okay as this is a slowpath.
>
> Suggested-by: Alexander Duyck
> Signed-off-by: Jesper Dangaard Brouer
> ---
>  net/core/skbuff.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7af7ec635d90..bc62baa54ceb 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -801,9 +801,9 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
>  	if (unlikely(!skb))
>  		return;
>
> -	/* if budget is 0 assume netpoll w/ IRQs disabled */
> +	/* Zero budget indicate none-NAPI context called us, like netpoll */

Non-NAPI?

[...]

MBR, Sergei
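The dispatch the patch describes can be sketched as a userspace model: a zero budget means the caller may not be in NAPI/softirq context, so the skb goes through the generic dev_consume_skb_any() path, which in turn checks in_irq() and irqs_disabled() to decide whether freeing must be deferred. The enum, function names and boolean context flags below are hypothetical; the model only reports which path would be taken and glosses over the reference counting and bulk-free details of the real napi_consume_skb().

```c
#include <assert.h>
#include <stdbool.h>

/* Which free path a call would take (hypothetical labels). */
enum free_path {
	PATH_NONE,        /* NULL skb: nothing to do */
	PATH_DEFER_IRQ,   /* like __dev_kfree_skb_irq(): queue for later freeing */
	PATH_CONSUME_NOW, /* like consume_skb(): free immediately */
	PATH_NAPI_BULK,   /* like the NAPI fast path: defer into a bulk-free cache */
};

/* Model of dev_consume_skb_any(): take the IRQ-safe path when called
 * from hard-IRQ context or with IRQs disabled. */
static enum free_path model_consume_any(bool in_hardirq, bool irqs_off)
{
	return (in_hardirq || irqs_off) ? PATH_DEFER_IRQ : PATH_CONSUME_NOW;
}

/* Model of napi_consume_skb() after the patch: budget == 0 signals a
 * non-NAPI caller (e.g. netpoll), which falls back to the "any" path. */
static enum free_path model_napi_consume(bool skb_present, int budget,
					 bool in_hardirq, bool irqs_off)
{
	if (!skb_present)
		return PATH_NONE;
	if (budget == 0)
		return model_consume_any(in_hardirq, irqs_off);
	return PATH_NAPI_BULK;
}
```

The extra branch only costs anything on the zero-budget path, which matches the commit message's argument that netpoll and other non-NAPI callers are a slowpath.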
Re: Micrel Phy - Is there a way to configure the Phy not to do 802.3x flow control?
On 03/03/2016 07:16 PM, Florian Fainelli wrote:
> On 03/03/16 14:18, Murali Karicheri wrote:
>> Hi,
>>
>> We are using Micrel Phy in one of our board and wondering if we can force the
>> Phy to disable flow control at start. I have a 1G ethernet switch connected
>> to Phy and the phy always enable flow control. I would like to configure the
>> phy not to flow control. Is that possible and if yes, what should I do in the
>> my Ethernet driver to tell the Phy not to enable flow control?
>
> The PHY is not doing flow control per-se, your pseudo Ethernet MAC in
> the switch is doing, along with the link partner advertising support for
> it. You would want to make sure that your PHY device interface (provided
> that you are using the PHY library) is not starting with Pause
> advertised, but it could be supported.

Understood that the PHY just advertises FC. The Micrel 9031 PHY advertises
FC support by default. After negotiation, I see that phylib provides the
link status with pause = 1, asym_pause = 1. How do I tell the PHY not to
advertise it?

I call the following sequence in the Ethernet driver:

of_phy_connect(x,y,hndlr,a,z);
phy_start()

Now in hndlr() the phy_device has pause = 1 and asym_pause = 1. How can I
tell the PHY not to advertise pause initially?

Murali

>
> As Andrew indicated the proper way to do this is do to use ethtool if
> you need to this dynamically.
>

-- 
Murali Karicheri
Linux Kernel, Keystone
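Florian's point, that the PHY only advertises pause capability while the MACs act on the resolved result, can be illustrated with the full-duplex pause resolution rules of IEEE 802.3 Annex 28B, which is also what the kernel's mii_resolve_flowctrl_fdx() helper implements. The sketch below is a standalone userspace version: the bit values match the MII advertisement register (Pause is bit 10, Asym_Pause is bit 11), but the macro and function names are local to this example. It shows that once the local side advertises neither Pause nor Asym_Pause, no flow control is resolved, whatever the link partner advertises. In drivers of this era, one way to get there was to clear ADVERTISED_Pause and ADVERTISED_Asym_Pause from phydev->advertising after of_phy_connect() and before phy_start(); treat that as something to check against the phylib version in use, not a verified recipe.

```c
#include <assert.h>
#include <stdint.h>

/* MII advertisement register (register 4) pause bits. */
#define ADV_PAUSE_CAP  0x0400 /* bit 10: symmetric PAUSE */
#define ADV_PAUSE_ASYM 0x0800 /* bit 11: asymmetric PAUSE */

/* Resolved flow-control directions for the local MAC. */
#define FC_TX 0x01 /* local MAC may send PAUSE frames */
#define FC_RX 0x02 /* local MAC honours received PAUSE frames */

/* Full-duplex pause resolution from the local and link-partner
 * advertisements, following the IEEE 802.3 Annex 28B priority rules. */
static uint8_t resolve_pause_fdx(uint16_t local_adv, uint16_t remote_adv)
{
	if (local_adv & remote_adv & ADV_PAUSE_CAP)
		return FC_TX | FC_RX; /* both sides advertise symmetric pause */

	if (local_adv & remote_adv & ADV_PAUSE_ASYM) {
		if (local_adv & ADV_PAUSE_CAP)
			return FC_RX; /* only the partner sends PAUSE */
		if (remote_adv & ADV_PAUSE_CAP)
			return FC_TX; /* only the local side sends PAUSE */
	}
	return 0; /* no flow control */
}
```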
Re: [PATCH v3 0/8] arm64: rockchip: Initial GeekBox enablement
On Thu, Mar 10, 2016 at 3:13 AM, Giuseppe CAVALLARO wrote:
> On 3/9/2016 5:31 PM, Dinh Nguyen wrote:
>>
>> On Wed, Mar 9, 2016 at 8:53 AM, Giuseppe CAVALLARO
>> wrote:
>>>
>>> Hi Tomeu, Dinh, Andreas
>>>
>>> I need a sum and help from you to go ahead on the
>>> tx timeout.
>>>
>>> The "stmmac: MDIO fixes" seems to be the candidate to
>>> fix the phy connection and I will send the V2 asap (Andreas' comment).
>>>
>>> So, supposing the probe is ok and phy is connected,
>>> I need your input ...
>>>
>>> Tomeu: after revering the 0e80bdc9a72d (stmmac: first frame
>>> prep at the end of xmit routine) the network is
>>> not stable and there is a timeout after a while.
>>> The box has 3.50 with normal desc settings.
>>>
>>> Dinh: the network is ok, I wonder if you can share a boot
>>> log just to understand if the normal or enhanced
>>> descriptors are used.
>>>
>>
>> Here it is:
>
> ...
>>
>> [0.850523] stmmac - user ID: 0x10, Synopsys ID: 0x37
>> [0.855570] Ring mode enabled
>> [0.858611] DMA HW capability register supported
>> [0.863128] Enhanced/Alternate descriptors
>> [0.867482] Enabled extended descriptors
>> [0.871482] RX Checksum Offload Engine supported (type 2)
>> [0.876948] TX Checksum insertion supported
>> [0.881204] Enable RX Mitigation via HW Watchdog Timer
>> [0.886863] socfpga-dwmac ff702000.ethernet eth0: No MDIO subnode found
>> [0.899090] libphy: stmmac: probed
>> [0.902484] eth0: PHY ID 00221611 at 4 IRQ POLL (stmmac-0:04) active
>
>
> Thx Dinh, so you are using the Enhanced/Alternate descriptors
> I am debugging on my side on a setup with normal descriptors, I let you
> know
>

Doesn't the printout "Enhanced/Alternate descriptors" mean that I'm using
Enhanced/Alternate descriptors?

Dinh
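Dinh's log can be read mechanically. Assuming the "Synopsys ID" byte encodes the core version as major/minor nibbles (0x37 reads as 3.70, 0x35 as 3.50), and assuming, as the sequence of messages in the log suggests, that extended descriptors are only enabled on top of enhanced descriptors for cores at 3.50 or newer, the hypothetical helper below maps those two inputs to the descriptor layout in use. It is a reading aid for the boot log, not the stmmac driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: high nibble = major, low nibble = minor (0x35 = 3.50). */
#define SNPS_CORE_3_50 0x35

enum desc_mode {
	DESC_NORMAL,            /* normal descriptor layout */
	DESC_ENHANCED,          /* enhanced/alternate descriptors only */
	DESC_ENHANCED_EXTENDED, /* enhanced plus extended descriptors */
};

/* Split a Synopsys ID such as 0x37 into its major and minor digits. */
static void decode_snps_id(uint8_t id, unsigned int *major, unsigned int *minor)
{
	*major = id >> 4;
	*minor = id & 0x0f;
}

/* Hypothetical model of the descriptor layout implied by the enhanced
 * descriptor capability and the core version. */
static enum desc_mode descriptor_mode(bool enh_desc, uint8_t snps_id)
{
	if (!enh_desc)
		return DESC_NORMAL;
	return snps_id >= SNPS_CORE_3_50 ? DESC_ENHANCED_EXTENDED : DESC_ENHANCED;
}
```

Under these assumptions, Dinh's board (enhanced capability, ID 0x37) lands on enhanced plus extended descriptors, which is consistent with the two lines his log prints.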
Re: [PATCH 2/2] isdn: i4l: move active-isdn drivers to staging
On 10.03.2016 at 13:58, Paul Bolle wrote:
> Hi Karsten,
>
> On Thu, 2016-03-10 at 11:53 +0100, i...@linux-pingi.de wrote:
>> mISDN with CAPI support works just fine with pppd and pppdcapiplugin
>> and the CAPI works for all mISDN HW.
>
> In the mainline tree the mISDN and CAPI stacks are effectively separate.
> Do you perhaps refer to a mISDN + Asterisk + chan-capi setup? (That's
> the closest to mISDN with CAPI support that I could find. Did I miss
> something?)

http://listserv.isdn4linux.de/pipermail/isdn4linux/2012-January/005580.html

Since 2012, mISDN has had a CAPI20 interface, implemented purely in
userspace. Everything is in the capi20 subdirectory of mISDNuser. The
capi20 support needs to be enabled with ./configure.

It has nothing to do with Asterisk, but for fax it uses the same DSP
library, spandsp.

Best
Karsten Keil
Re: [PATCH] mrf24j40: fix security-enabled processing on inbound frames
Hello. On 29/02/16 20:49, Alan Ott wrote: On 02/18/2016 01:34 PM, zopieux wrote: Fix the MRF24J40 handling of security-enabled frames so it does not block upon receiving such frames. Signed-off-by: Alexander Aring Reported-by: Alexandre Macabies Tested-by: Alexandre Macabies --- When receiving a security-enabled IEEE 802.15.4 frame, the MRF24J40 triggers a SECIF interrupt that needs to be handled for RX processing to keep functioning properly. This patch enables the SECIF interrupt and makes the MRF ignores all hardware processing of security-enabled frames, that is handled by the ieee802154 stack instead. --- The "From" field of the email needs to have your real name in it. This will be where the "Author" field in git comes from. It looks like there are a few separate things happening in this patch. Maybe they should be broken out in to separate patches. I see: 1. The ieee802154.h part, 2. The TX part, 3. The RX part. The patch description only really describes the RX part. zopieux, could you split the patch as Alan suggested and re-submit the series? regards Stefan Schmidt
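The RX fix described in the quoted patch (acknowledge the security interrupt and tell the radio to skip its own security processing, leaving it to the ieee802154 stack) can be sketched as a small interrupt-handler model. All register names and bit positions below are hypothetical placeholders chosen for the example, not values taken from the MRF24J40 datasheet or the mrf24j40 driver; check both before reusing any of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt-status and security-control bits. */
#define INT_RXIF        0x08 /* a frame has been received */
#define INT_SECIF       0x10 /* received frame has the security bit set */
#define SEC_CTRL_IGNORE 0x80 /* pass the frame on without HW security processing */

struct radio_model {
	uint8_t sec_ctrl; /* value last written to the security control register */
	bool rx_pending;  /* a frame is ready for the driver to read */
};

/* Model IRQ handler: on a security interrupt, tell the hardware to ignore
 * security processing so reception does not stall; the frame then arrives
 * like any other and the software stack handles security. */
static void model_handle_irq(struct radio_model *r, uint8_t intstat)
{
	if (intstat & INT_SECIF)
		r->sec_ctrl |= SEC_CTRL_IGNORE;
	if (intstat & INT_RXIF)
		r->rx_pending = true;
}
```

Without the SECIF branch, the model would never set the ignore bit, which is the stuck-RX behaviour the patch description reports.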
Re: [GIT PULL v2 0/4] IPVS Fixes for v4.5
On Mon, Mar 07, 2016 at 12:03:30PM +0900, Simon Horman wrote: > Hi Pablo, > > please consider these IPVS fixes for v4.5 or > if it is too late please consider them for v4.6. Pulled into nf-next, thanks Simon!