Re: [PATCH net-next 0/3] net: systemport: misc improvements
From: Florian Fainelli
Date: Thu, 28 May 2015 15:24:41 -0700

> These patches are highly inspired by changes from Petri on bcmgenet, last
> patch is a misc fix that I had pending for a while, but is not a candidate
> for 'net' at this point.

Applied, thanks Florian.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] ipv6: drop unneeded goto
From: Julia Lawall
Date: Thu, 28 May 2015 23:02:17 +0200

> From: Julia Lawall
>
> Delete jump to a label on the next line, when that label is not
> used elsewhere.
>
> A simplified version of the semantic patch that makes this change is as
> follows: (http://coccinelle.lip6.fr/)
...
> Also remove the unnecessary ret variable.
>
> Signed-off-by: Julia Lawall

Applied, thanks Julia.
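The pattern the Coccinelle rule targets can be shown with a small, self-contained C sketch. This is not the actual ipv6 hunk (the quote above elides it); `do_work()`, `before()` and `after()` are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical callee standing in for whatever the real code calls. */
static int do_work(int x)
{
	return x < 0 ? -1 : 0;
}

/* Before: "goto out" jumps to the very next line, and "out" is not the
 * target of any other goto, so the jump and the label do nothing. */
static int before(int x)
{
	int ret;

	ret = do_work(x);
	goto out;
out:
	return ret;
}

/* After: the goto, the label and (as the patch also does) the now
 * unnecessary ret variable are gone; behaviour is unchanged. */
static int after(int x)
{
	return do_work(x);
}
```

Both versions return the same value for every input, which is why the transformation can be applied mechanically.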
Re: [PATCH net 0/3] bna: misc bugfixes
From: Ivan Vecera
Date: Thu, 28 May 2015 23:10:05 +0200

> These patches fix several bugs found during device initialization debugging.

Applied, thanks Ivan.
Re: [PATCH v4 net-next 00/11] net: Increase inputs to flow_keys hashing
From: Tom Herbert
Date: Thu, 28 May 2015 11:18:57 -0700

> This patch set adds new fields to the flow_keys structure and hashes
> over these fields to get a better flow hash. In particular, these
> patches now include hashing over the full IPv6 addresses in order
> to defend against address spoofing that always results in the
> same hash. The new input also includes the Ethertype, L4 protocol,
> VLAN, flow label, GRE keyid, and MPLS entropy label.

Looks like one more respin of this is needed, based upon Jiri's feedback.
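A sketch of why hashing only a reduced address is weak. Assume, for illustration only, that each 128-bit IPv6 address is folded to 32 bits by XOR-ing its four words before it reaches the flow hash. An attacker can then change words in pairs so that many spoofed addresses fold to the same value, and therefore to the same hash. Feeding all 128 bits into the hash, as this series does, removes that shortcut. `fold_ipv6()` and `fold_collides()` are hypothetical helpers, not code from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed reduction: XOR the four 32-bit words of an IPv6 address. */
static uint32_t fold_ipv6(const uint32_t addr[4])
{
	return addr[0] ^ addr[1] ^ addr[2] ^ addr[3];
}

/* 1 if two different addresses fold to the same 32-bit value, meaning
 * a hash over the folded value cannot tell them apart. */
static int fold_collides(const uint32_t a[4], const uint32_t b[4])
{
	return memcmp(a, b, 4 * sizeof(uint32_t)) != 0 &&
	       fold_ipv6(a) == fold_ipv6(b);
}
```

For example, flipping the same bit in word 1 and word 3 leaves the fold unchanged, so 2001:db8:0:1::1 (as words) and 2001:db8:: collide.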
Re: [PATCH] net: thunderx: add 64-bit dependency
From: Arnd Bergmann
Date: Thu, 28 May 2015 16:00:46 +0200

> The thunderx ethernet driver fails to build on architectures
> that do not have an atomic readq() and writeq() function for
> 64-bit PCI bus access:
>
> drivers/net/ethernet/cavium/thunder/thunder_bgx.c: In function 'bgx_reg_read':
> include/asm-generic/io.h:195:23: error: implicit declaration of function
> 'readq' [-Werror=implicit-function-declaration]
>
> It seems impossible to get this driver to work on most 32-bit
> hardware, so it's better to add an explicit dependency, in
> order to let us keep building 'allmodconfig' kernels on
> all architectures.
>
> As the driver is meant for the internal hardware on an arm64 SoC, this
> is not a problem for usability. Allowing the build on all 64-bit
> architectures rather than just CONFIG_ARM64 on the other hand means that
> we get the benefit of build testing on x86.
>
> Signed-off-by: Arnd Bergmann

Applied, thanks Arnd.
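Why an atomic readq() matters: without a native 64-bit access, a 64-bit register must be read as two 32-bit halves. If the device updates the register between the two accesses, the combined value can be one the register never held. The sketch below simulates this with a free-running counter. `struct fake_reg`, `fake_readl()` and `split_readq()` are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 64-bit device counter. */
struct fake_reg {
	uint64_t value;
};

/* One 32-bit read of the low (high == 0) or high half. The counter
 * advances after every access to mimic hardware that keeps running. */
static uint32_t fake_readl(struct fake_reg *r, int high)
{
	uint32_t half = high ? (uint32_t)(r->value >> 32) : (uint32_t)r->value;

	r->value++;
	return half;
}

/* Low word first, then high word: two separate accesses, not atomic. */
static uint64_t split_readq(struct fake_reg *r)
{
	uint64_t lo = fake_readl(r, 0);
	uint64_t hi = fake_readl(r, 1);

	return (hi << 32) | lo;
}
```

Starting from 0x00000000ffffffff, the low read returns 0xffffffff, the counter then carries into the high word, and the high read returns 1. `split_readq()` therefore yields 0x1ffffffff, a value the counter never held.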
Re: pull-request: mac80211 2015-05-28
From: Johannes Berg
Date: Thu, 28 May 2015 14:45:49 +0200

> Please excuse the quick succession with another pull request - Ben
> pointed out to me that a fix I'd applied on -next is actually needed on
> 4.1 - we'll have to live with it being in both I suppose. Sorry about
> that.

Pulled, thanks Johannes.
Re: [PATCH net-next V1 0/4] mlx4 driver update, May 28, 2015
From: Or Gerlitz
Date: Sun, 31 May 2015 09:30:14 +0300

> The 1st patch fixes an issue with a function running DPDK overriding
> broadcast steering rules set by other functions. Please add this one
> to your -stable queue.
>
> The rest of the series from Matan and Ido deals with scaling the number
> of IRQs that serve RoCE applications to be on par with the Ethernet driver.

Series applied, thanks Or.
Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware
On Sat, May 30, 2015 at 9:19 PM, John Fastabend wrote:
> On 05/30/2015 02:00 AM, Jiri Pirko wrote:
>>
>> Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:
>>>
>>> On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko wrote:
Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:
>
> On Tue, May 19, 2015 at 1:28 PM, David Miller
> wrote:
>>
>> From: Andy Gospodarek
>> Date: Tue, 19 May 2015 15:47:32 -0400
>>
>>> Are you actually saying that if users complain loudly enough about
>>> the current behavior (not the change Roopa has proposed) that you
>>> would be open to considering a change to the current behavior?
>>
>>
>> I am saying that we have a contract with users not to break existing
>> behavior. Full stop.
>
>
> After rehearing David's argument, we should probably explore option d)
> which is a refinement on the fib_offload_disable mechanism we have
> today. fib_offload_disable is global for all routes. Once we hit a
> HW install problem, the global flag is set and all routes fall back to
> SW. We did this because we can't allow the failed route to exist in
> SW and not in HW because it could mess up LPM searches (HW could hit
> on a lesser prefix even when SW has the true LPM, because HW gets
> first shot at match). The refinement on fib_offload_disable is this:
> make it per-related-prefix rather than global, and on a HW install
> problem, set the flag for the related-prefix and uninstall only those
> routes from HW. Related-prefix (is there a correct term for this?)
> are routes to the same dst addr but with different prefix lengths. I
> haven't parsed the fib_trie structure to see how routes are organized,
> but I suspect since it's optimized for lookup the related-prefix
> tracking is already there and we can build on that.
This looks interesting. However, I'm not sure that it is acceptable for
user to experience this hw eviction of "random entries". User knows what
entries are essential to have in hw.
With your solution, I can see no way user can actually say what should be
offloaded or not. Kernel just automagically decides.
>>>
>>>
>>> The default eviction policy could be based on RTA_PRIORITY: evict
>>> lower priority routes first. It would be up to the device driver to
>>> decide between two routes of same priority.
>>>
>>> To help device driver make the decision, we could have eviction policy
>>> options:
>>>
>>> Priority-based (default)
>>> Prefer IPv6 over IPv4
>>> Prefer IPv4 over IPv6
>>> Prefer single path over multipath
>>> Prefer longer prefix lengths over shorter
>>> Optimize for resource utilization
>>>
>>> These are portable across different switches. They're in terms a
>>> user understands. It's up to the device driver which truly
>>> understands the device constraints to translate the user's eviction
>>> policy choices into something that makes sense to that device.
>>
>>
>> This sounds tempting... You plan to throw in some patches, or should I
>> take care of that?
>>
>
> This is encoding specific policies into the kernel. I was hoping to
> avoid this and let user space develop whatever policy it wants. If you
> use Jiri's proposed NLM_F_SKIP_{KERNEL|OFFLOAD} flags you get this.
>
> Also I don't understand the "truly understands the device constraints"
> comment. We can export a model of the device and know how many rules
> of each type will fit exactly into the table. This doesn't seem like
> much of a problem to me. In fact the driver developer should know this
> anyway.
>
> Part of my motivation here is I really don't want to get stuck with a
> case where each driver writer gets to translate the eviction policy
> onto their device in some device specific and slightly different way.

But this is _exactly_ what I want. Here's why: my claim is it will be
impossible for us (device vendors) to define a universal set of resource
constraints that works for all devices from all vendors.
I was kind of hoping some vendor would throw out a set to get us started.
Ok, I'll start with rocker: rocker will enforce the constraints listed
below in the device. There will be a device command to query the raw
constraints. So here goes:

VLAN table
  max entries: 16K    // a VLAN on a port takes one entry

Term MAC table
  max entries: no limit

Bridging table:
  Unicast max entries: 12K
  Multicast max entries: 4K

Unicast Routing table (shared for v4 and v6 entries):
  Prefix max slots: 16K
    IPv4 route takes one slot
    IPv6 prefix len <= 64 route takes two slots
    IPv6 prefix len > 64 takes four slots
  Nexthop max slots: 4K
  Max ECMP width: 32
  Each nexthop MAC takes one slot, but there is a stride of 4 slots

Multicast Routing table (shared for v4 and v6 entries):
  (same as unicast routing, except max slots are 1/2 as big)

ACL
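The unicast prefix-slot rules in the rocker list can be checked mechanically. The hypothetical capacity check below applies only those rules. It ignores the nexthop, ECMP and multicast limits, treats "16K" as 16 * 1024, and uses illustrative names that are not part of rocker.

```c
#include <assert.h>
#include <stdbool.h>

/* "Prefix max slots: 16K", assumed to mean 16 * 1024, shared by v4 and v6. */
#define PREFIX_MAX_SLOTS (16 * 1024)

enum route_family { ROUTE_IPV4, ROUTE_IPV6 };

/* Slots consumed by one unicast route, per the listed rules. */
static int route_slots(enum route_family fam, int prefix_len)
{
	if (fam == ROUTE_IPV4)
		return 1;
	return prefix_len <= 64 ? 2 : 4;
}

/* Does a mix of IPv4, short-prefix IPv6 (<= /64) and long-prefix IPv6
 * (> /64) routes fit in the shared prefix table? */
static bool routes_fit(long n_v4, long n_v6_short, long n_v6_long)
{
	long used = n_v4 * route_slots(ROUTE_IPV4, 32) +
		    n_v6_short * route_slots(ROUTE_IPV6, 64) +
		    n_v6_long * route_slots(ROUTE_IPV6, 128);

	return used <= PREFIX_MAX_SLOTS;
}
```

For example, 4096 IPv6 /128 routes use all 16K slots, so after that not even one more IPv4 route fits.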
[PATCH] bridge: fix br_multicast_query_expired() bug
From: Eric Dumazet

br_multicast_query_expired() querier argument is a pointer to
a struct bridge_mcast_querier :

struct bridge_mcast_querier {
	struct br_ip addr;
	struct net_bridge_port __rcu *port;
};

Intent of the code was to clear port field, not the pointer to querier.

Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port")
Signed-off-by: Eric Dumazet
Acked-by: Thadeu Lima de Souza Cascardo
Acked-by: Linus Lüssing
Cc: Linus Lüssing
Cc: Steinar H. Gunderson
Signed-off-by: David S. Miller
---

Posting this to the list so it gets into patchwork and I can properly queue
it up for -stable.

 net/bridge/br_multicast.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index a3abe6e..22fd041 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1822,7 +1822,7 @@ static void br_multicast_query_expired(struct net_bridge *br,
 	if (query->startup_sent < br->multicast_startup_query_count)
 		query->startup_sent++;
 
-	RCU_INIT_POINTER(querier, NULL);
+	RCU_INIT_POINTER(querier->port, NULL);
 	br_multicast_send_query(br, NULL, query);
 	spin_unlock(&br->multicast_lock);
 }
-- 
2.1.0
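The bug is the classic mix-up between clearing a pointer parameter and clearing the field it points to. The simplified userspace sketch below uses plain assignment where the kernel code uses RCU_INIT_POINTER(), and trimmed-down hypothetical structs in place of the bridge types.

```c
#include <assert.h>
#include <stddef.h>

struct port {
	int id;
};

/* Simplified stand-in for struct bridge_mcast_querier (no RCU here). */
struct querier {
	int addr;
	struct port *port;
};

/* Buggy shape: only the function's local copy of the pointer argument
 * is cleared; the caller's querier->port is left untouched. */
static void expire_buggy(struct querier *querier)
{
	querier = NULL;
	(void)querier;
}

/* Fixed shape: clear the field that the pointer refers to. */
static void expire_fixed(struct querier *querier)
{
	querier->port = NULL;
}
```

After `expire_buggy(&q)`, `q.port` still points at the old port. Only `expire_fixed(&q)` actually forgets the selected querier port.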
Re: [PATCH net] bridge: fix br_multicast_query_expired() bug
From: Eric Dumazet
Date: Thu, 28 May 2015 04:42:54 -0700

> From: Eric Dumazet
>
> br_multicast_query_expired() querier argument is a pointer to
> a struct bridge_mcast_querier :
>
> struct bridge_mcast_querier {
> 	struct br_ip addr;
> 	struct net_bridge_port __rcu *port;
> };
>
> Intent of the code was to clear port field, not the pointer to querier.
>
> Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier
> port")
> Signed-off-by: Eric Dumazet

Applied, thanks Eric.
[PATCH net-next V1 3/4] net/mlx4_core: Move affinity hints to mlx4_core ownership
From: Ido Shamay

Now that EQ management is the sole responsibility of mlx4_core, the IRQ
affinity hint configuration should be in its hands as well. request_irq
is called only once, by the first consumer (maybe mlx4_ib), so mlx4_en
passes the affinity mask too late. We also need to request vectors
according to the cores we want to run on.

mlx4_core's distribution of IRQs to cores is straightforward: EQ(i)->IRQ
will set its affinity hint to core i. Consumers need to request EQ
vectors according to their own core considerations (NUMA).

Signed-off-by: Ido Shamay
Signed-off-by: Matan Barak
Signed-off-by: Or Gerlitz
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c | 10 +---
 drivers/net/ethernet/mellanox/mlx4/eq.c    | 21
 drivers/net/ethernet/mellanox/mlx4/main.c  | 36
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |  1 +
 4 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index d71c567..63769df 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -114,7 +114,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 	if (cq->is_tx == RX) {
 		if (!mlx4_is_eq_vector_valid(mdev->dev, priv->port,
 					     cq->vector)) {
-			cq->vector = cq_idx;
+			cq->vector = cpumask_first(priv->rx_ring[cq->ring]->affinity_mask);
 
 			err = mlx4_assign_eq(mdev->dev, priv->port,
 					     &cq->vector);
@@ -160,13 +160,6 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 		netif_napi_add(cq->dev, &cq->napi, mlx4_en_poll_tx_cq,
 			       NAPI_POLL_WEIGHT);
 	} else {
-		struct mlx4_en_rx_ring *ring = priv->rx_ring[cq->ring];
-
-		err = irq_set_affinity_hint(cq->mcq.irq,
-					    ring->affinity_mask);
-		if (err)
-			mlx4_warn(mdev, "Failed setting affinity hint\n");
-
 		netif_napi_add(cq->dev, &cq->napi, mlx4_en_poll_rx_cq, 64);
 		napi_hash_add(&cq->napi);
 	}
@@ -205,7 +198,6 @@ void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
 	if (!cq->is_tx) {
 		napi_hash_del(&cq->napi);
 		synchronize_rcu();
-		irq_set_affinity_hint(cq->mcq.irq, NULL);
 	}
 	netif_napi_del(&cq->napi);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 2e6fc6a..1116882 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -221,6 +221,20 @@ static void mlx4_slave_event(struct mlx4_dev *dev, int slave,
 	slave_event(dev, slave, eqe);
 }
 
+static void mlx4_set_eq_affinity_hint(struct mlx4_priv *priv, int vec)
+{
+	int hint_err;
+	struct mlx4_dev *dev = &priv->dev;
+	struct mlx4_eq *eq = &priv->eq_table.eq[vec];
+
+	if (!eq->affinity_mask || cpumask_empty(eq->affinity_mask))
+		return;
+
+	hint_err = irq_set_affinity_hint(eq->irq, eq->affinity_mask);
+	if (hint_err)
+		mlx4_warn(dev, "irq_set_affinity_hint failed, err %d\n", hint_err);
+}
+
 int mlx4_gen_pkey_eqe(struct mlx4_dev *dev, int slave, u8 port)
 {
 	struct mlx4_eqe eqe;
@@ -1092,6 +1106,10 @@ static void mlx4_free_irqs(struct mlx4_dev *dev)
 
 	for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i)
 		if (eq_table->eq[i].have_irq) {
+			free_cpumask_var(eq_table->eq[i].affinity_mask);
+#if defined(CONFIG_SMP)
+			irq_set_affinity_hint(eq_table->eq[i].irq, NULL);
+#endif
 			free_irq(eq_table->eq[i].irq, eq_table->eq + i);
 			eq_table->eq[i].have_irq = 0;
 		}
@@ -1483,6 +1501,9 @@ int mlx4_assign_eq(struct mlx4_dev *dev, u8 port, int *vector)
 			clear_bit(*prequested_vector, priv->msix_ctl.pool_bm);
 			*prequested_vector = -1;
 		} else {
+#if defined(CONFIG_SMP)
+			mlx4_set_eq_affinity_hint(priv, *prequested_vector);
+#endif
 			eq_set_ci(&priv->eq_table.eq[*prequested_vector], 1);
 			priv->eq_table.eq[*prequested_vector].have_irq = 1;
 		}
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 3ec5113..0dbd704 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2481,6 +2481,36 @@ err_uar_table_free:
 	return err;
 }
 
+static int mlx4_init_affinity_hint(struct mlx4_dev *dev, int port, int eqn)
+{
+	int requested_cp
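The division of labour described in the commit message can be modelled in a few lines. The core hints EQ(i) to core i, and the Ethernet side requests the vector that matches the first CPU in its RX ring's affinity mask (the `cpumask_first()` call in the en_cq.c hunk). This sketch uses a 64-bit integer as a hypothetical stand-in for `struct cpumask`, supports at most 64 CPUs, and ignores what `mlx4_assign_eq()` does when the requested vector is unavailable.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest CPU set in the mask (like cpumask_first()), or -1 if empty. */
static int mask_first(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (1ULL << cpu))
			return cpu;
	return -1;
}

/* Core side: EQ(i)'s IRQ gets an affinity hint for core i
 * (eq_index assumed to be in 0..63 for this model). */
static uint64_t eq_affinity_hint(int eq_index)
{
	return 1ULL << eq_index;
}

/* Consumer side: request the vector matching the first CPU of the
 * RX ring's affinity mask. */
static int pick_vector(uint64_t ring_affinity)
{
	return mask_first(ring_affinity);
}
```

With a ring mask covering CPUs 4 and 5 (0x30), the ring asks for vector 4, whose EQ is hinted to CPU 4 only (0x10). That is why the hint now has to be set once, by the core, rather than late by mlx4_en.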
[PATCH net-next V1 1/4] net/mlx4_core: Demote simple multicast and broadcast flow steering rules
From: Matan Barak

In SRIOV, when simple (i.e., Ethernet L2 only) flow steering rules
are created, always create them at MLX4_DOMAIN_NIC priority (instead
of the real priority the function created them at).

This is done in order to let multiple functions add broadcast/multicast
rules without affecting other functions, which is necessary for DPDK in
SRIOV.

Signed-off-by: Matan Barak
Signed-off-by: Or Gerlitz
---
 drivers/infiniband/hw/mlx4/main.c                 |  4 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c | 23
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index cc64400..8c96c71 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1090,7 +1090,7 @@ static int __mlx4_ib_create_flow(struct ib_qp *qp, struct ib_flow_attr *flow_att
 
 	ret = mlx4_cmd_imm(mdev->dev, mailbox->dma, reg_id, size >> 2, 0,
 			   MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
-			   MLX4_CMD_NATIVE);
+			   MLX4_CMD_WRAPPED);
 	if (ret == -ENOMEM)
 		pr_err("mcg table is full. Fail to register network rule.\n");
 	else if (ret == -ENXIO)
@@ -1107,7 +1107,7 @@ static int __mlx4_ib_destroy_flow(struct mlx4_dev *dev, u64 reg_id)
 	int err;
 	err = mlx4_cmd(dev, reg_id, 0, 0,
 		       MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
-		       MLX4_CMD_NATIVE);
+		       MLX4_CMD_WRAPPED);
 	if (err)
 		pr_err("Fail to detach network rule. registration id = 0x%llx\n",
 		       reg_id);
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 15ec081..ab48386 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -3973,6 +3973,22 @@ static int validate_eth_header_mac(int slave, struct _rule_hw *eth_header,
 	return 0;
 }
 
+static void handle_eth_header_mcast_prio(struct mlx4_net_trans_rule_hw_ctrl *ctrl,
+					 struct _rule_hw *eth_header)
+{
+	if (is_multicast_ether_addr(eth_header->eth.dst_mac) ||
+	    is_broadcast_ether_addr(eth_header->eth.dst_mac)) {
+		struct mlx4_net_trans_rule_hw_eth *eth =
+			(struct mlx4_net_trans_rule_hw_eth *)eth_header;
+		struct _rule_hw *next_rule = (struct _rule_hw *)(eth + 1);
+		bool last_rule = next_rule->size == 0 && next_rule->id == 0 &&
+			next_rule->rsvd == 0;
+
+		if (last_rule)
+			ctrl->prio = cpu_to_be16(MLX4_DOMAIN_NIC);
+	}
+}
+
 /*
  * In case of missing eth header, append eth header with a MAC address
  * assigned to the VF.
@@ -4125,6 +4141,12 @@ int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
 	rule_header = (struct _rule_hw *)(ctrl + 1);
 	header_id = map_hw_to_sw_id(be16_to_cpu(rule_header->id));
 
+	if (header_id == MLX4_NET_TRANS_RULE_ID_ETH)
+		handle_eth_header_mcast_prio(ctrl, rule_header);
+
+	if (slave == dev->caps.function)
+		goto execute;
+
 	switch (header_id) {
 	case MLX4_NET_TRANS_RULE_ID_ETH:
 		if (validate_eth_header_mac(slave, rule_header, rlist)) {
@@ -4151,6 +4173,7 @@ int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
 		goto err_put;
 	}
 
+execute:
 	err = mlx4_cmd_imm(dev, inbox->dma, &vhcr->out_param,
 			   vhcr->in_modifier, 0,
 			   MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
-- 
1.7.1
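The decision in `handle_eth_header_mcast_prio()` reduces to a predicate on the destination MAC plus a check that no further spec follows the Ethernet header. Below is a userspace sketch of that decision. The two MAC tests mirror what the kernel's is_multicast_ether_addr() and is_broadcast_ether_addr() check (group bit in the first octet, all-ones address). `effective_prio()` and the priority constant are hypothetical and do not reproduce the real MLX4_DOMAIN_NIC value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical demoted priority; not the real MLX4_DOMAIN_NIC value. */
#define HYPOTHETICAL_NIC_PRIO 0x0100u

/* Multicast: group bit (least significant bit of the first octet) set. */
static bool mac_is_multicast(const uint8_t mac[6])
{
	return (mac[0] & 0x01) != 0;
}

/* Broadcast: all six octets are 0xff. */
static bool mac_is_broadcast(const uint8_t mac[6])
{
	for (int i = 0; i < 6; i++)
		if (mac[i] != 0xff)
			return false;
	return true;
}

/* Priority the rule ends up with: demoted only for a multicast or
 * broadcast destination when the Ethernet spec is the last one in the
 * rule (a "simple", L2-only rule). */
static uint16_t effective_prio(const uint8_t dst[6], bool eth_is_last,
			       uint16_t requested)
{
	if ((mac_is_multicast(dst) || mac_is_broadcast(dst)) && eth_is_last)
		return HYPOTHETICAL_NIC_PRIO;
	return requested;
}
```

A unicast rule, or a multicast rule with further L3/L4 specs, keeps the priority the function asked for.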
[PATCH net-next V1 2/4] net/mlx4: Add EQ pool
From: Matan Barak

Previously, mlx4_en allocated EQs and used them exclusively.
This affected RoCE performance, as applications which are
event-sensitive were limited to using only the legacy EQs.

Change that by introducing an EQ pool. This pool is managed by
mlx4_core. EQs are assigned to ports (when there is a limited number
of EQs, multiple ports could be assigned to the same EQs).

An exception to this rule is the ASYNC EQ, which handles various events.

Legacy EQs are completely removed, as all EQs could be shared.

When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for an EQ
serving a specific port. The core driver calculates which EQ should be
assigned to that request.

Because IRQs are shared between IB and Ethernet modules, their names
only include the PCI device BDF address.

Signed-off-by: Matan Barak
Signed-off-by: Ido Shamay
Signed-off-by: Or Gerlitz
---
 drivers/infiniband/hw/mlx4/main.c              |  71 ++
 drivers/infiniband/hw/mlx4/mlx4_ib.h           |   1 -
 drivers/net/ethernet/mellanox/mlx4/cq.c        |  10 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c     |  48 ++--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   7 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |  13 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c        | 353 ++--
 drivers/net/ethernet/mellanox/mlx4/main.c      |  74 --
 drivers/net/ethernet/mellanox/mlx4/mlx4.h      |  11 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   2 +-
 include/linux/mlx4/device.h                    |  11 +-
 11 files changed, 342 insertions(+), 259 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8c96c71..024b0f7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2041,77 +2041,52 @@ static void init_pkeys(struct mlx4_ib_dev *ibdev)
 
 static void mlx4_ib_alloc_eqs(struct mlx4_dev *dev, struct mlx4_ib_dev *ibdev)
 {
-	char name[80];
-	int eq_per_port = 0;
-	int added_eqs = 0;
-	int total_eqs = 0;
-	int i, j, eq;
-
-	/* Legacy mode or comp_pool is not large enough */
-	if (dev->caps.comp_pool == 0 ||
-	    dev->caps.num_ports > dev->caps.comp_pool)
-		return;
-
-	eq_per_port = dev->caps.comp_pool / dev->caps.num_ports;
-
-	/* Init eq table */
-	added_eqs = 0;
-	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
-		added_eqs += eq_per_port;
-
-	total_eqs = dev->caps.num_comp_vectors + added_eqs;
+	int i, j, eq = 0, total_eqs = 0;
 
-	ibdev->eq_table = kzalloc(total_eqs * sizeof(int), GFP_KERNEL);
+	ibdev->eq_table = kcalloc(dev->caps.num_comp_vectors,
+				  sizeof(ibdev->eq_table[0]), GFP_KERNEL);
 	if (!ibdev->eq_table)
 		return;
 
-	ibdev->eq_added = added_eqs;
-
-	eq = 0;
-	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB) {
-		for (j = 0; j < eq_per_port; j++) {
-			snprintf(name, sizeof(name), "mlx4-ib-%d-%d@%s",
-				 i, j, dev->persist->pdev->bus->name);
-			/* Set IRQ for specific name (per ring) */
-			if (mlx4_assign_eq(dev, name, NULL,
-					   &ibdev->eq_table[eq])) {
-				/* Use legacy (same as mlx4_en driver) */
-				pr_warn("Can't allocate EQ %d; reverting to legacy\n", eq);
-				ibdev->eq_table[eq] =
-					(eq % dev->caps.num_comp_vectors);
-			}
-			eq++;
+	for (i = 1; i <= dev->caps.num_ports; i++) {
+		for (j = 0; j < mlx4_get_eqs_per_port(dev, i);
+		     j++, total_eqs++) {
+			if (i > 1 && mlx4_is_eq_shared(dev, total_eqs))
+				continue;
+			ibdev->eq_table[eq] = total_eqs;
+			if (!mlx4_assign_eq(dev, i,
+					    &ibdev->eq_table[eq]))
+				eq++;
+			else
+				ibdev->eq_table[eq] = -1;
 		}
 	}
 
-	/* Fill the reset of the vector with legacy EQ */
-	for (i = 0, eq = added_eqs; i < dev->caps.num_comp_vectors; i++)
-		ibdev->eq_table[eq++] = i;
+	for (i = eq; i < dev->caps.num_comp_vectors;
+	     ibdev->eq_table[i++] = -1)
+		;
 
 	/* Advertise the new number of EQs to clients */
-	ibdev->ib_dev.num_comp_vectors = total_eqs;
+	ibdev->ib_dev.num_comp_vectors = eq;
 }
 
 static void mlx4_ib_free_eqs(struct mlx4_dev *dev, struct mlx4_ib_dev *ibdev)
 {
 	int i;
+	int total_eqs = ibdev->ib_dev.num_comp_vectors;
 
-	/* no additional eqs were added */
+	/* no eqs were allocated */
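One reading of the new `mlx4_ib_alloc_eqs()` loop, as a userspace model. Each port walks its share of EQ indices. Ports after the first skip EQs reported as shared. Successfully assigned EQs are packed at the front of the table, the rest is marked -1, and the packed count is what gets advertised as `num_comp_vectors`. The `struct eq_model` inputs are hypothetical stand-ins for `mlx4_get_eqs_per_port()`, `mlx4_is_eq_shared()` and the result of `mlx4_assign_eq()`. The model also stores the index directly, whereas the real `mlx4_assign_eq()` may rewrite the requested vector.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_EQS   16
#define MODEL_MAX_PORTS 2

/* Hypothetical inputs for the model. */
struct eq_model {
	int eqs_per_port[MODEL_MAX_PORTS + 1];	/* ports are 1-based */
	bool shared[MODEL_MAX_EQS];		/* EQ also serves an earlier port */
	bool assign_fails[MODEL_MAX_EQS];	/* mlx4_assign_eq() would fail */
};

/* Fill table[0..num_comp_vectors) and return the number of usable
 * vectors; assumes that count never exceeds num_comp_vectors. */
static int fill_eq_table(const struct eq_model *m, int num_ports,
			 int num_comp_vectors, int *table)
{
	int eq = 0, total_eqs = 0;

	for (int port = 1; port <= num_ports; port++) {
		for (int j = 0; j < m->eqs_per_port[port]; j++, total_eqs++) {
			/* Shared EQs are only taken once, for port 1. */
			if (port > 1 && m->shared[total_eqs])
				continue;
			if (!m->assign_fails[total_eqs])
				table[eq++] = total_eqs;
		}
	}
	/* Remaining slots are marked invalid. */
	for (int i = eq; i < num_comp_vectors; i++)
		table[i] = -1;

	return eq;
}
```

With two EQs per port and port 2's first EQ (index 2) shared, the table becomes {0, 1, 3, -1} and three vectors are advertised.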
Re: [PATCH net-next 4/4] net/mlx4_core: Make sure there are no pending async events when freeing CQ
On 5/31/2015 9:23 AM, David Miller wrote:
> I agree with Sergei that one empty line is sufficient here, don't make it
> into two. Please respin with this fixed.

Sure, I prepared V1 to address that earlier today, and will send it now.

Or.
[PATCH net-next V1 4/4] net/mlx4_core: Make sure there are no pending async events when freeing CQ
From: Matan Barak

When freeing a CQ, we need to make sure there are no asynchronous
events (on the ASYNC EQ) that could relate to this CQ before freeing it.
This is done by introducing synchronize_irq.

Signed-off-by: Matan Barak
Signed-off-by: Ido Shamay
Signed-off-by: Or Gerlitz
---
 drivers/net/ethernet/mellanox/mlx4/cq.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 7431cd4..3348e64 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -369,6 +369,9 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
 		mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);
 
 	synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
+	if (priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq !=
+	    priv->eq_table.eq[MLX4_EQ_ASYNC].irq)
+		synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);
 
 	spin_lock_irq(&cq_table->lock);
 	radix_tree_delete(&cq_table->tree, cq->cqn);
-- 
1.7.1
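The new check in `mlx4_cq_free()` waits for the completion EQ's IRQ handler, then also for the ASYNC EQ's handler, and skips the second wait when both EQs share one IRQ line (it would only repeat the first). The userspace sketch below covers just that control flow, with a hypothetical recorder in place of `synchronize_irq()` and plain IRQ numbers in place of the EQ table lookups.

```c
#include <assert.h>

/* Hypothetical recorder standing in for synchronize_irq(). */
static int synced[8];
static int nsynced;

static void fake_synchronize_irq(int irq)
{
	synced[nsynced++] = irq;
}

/* Wait on the CQ's completion EQ IRQ, and on the ASYNC EQ IRQ as well,
 * but only once when both EQs use the same IRQ line. */
static void sync_cq_irqs(int comp_irq, int async_irq)
{
	fake_synchronize_irq(comp_irq);
	if (comp_irq != async_irq)
		fake_synchronize_irq(async_irq);
}
```

With distinct IRQs both are waited on, in order. With a shared IRQ a single wait already covers the ASYNC EQ's handler.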
[PATCH net-next V1 0/4] mlx4 driver update, May 28, 2015
Hi Dave,

The 1st patch fixes an issue with a function running DPDK overriding
broadcast steering rules set by other functions. Please add this one
to your -stable queue.

The rest of the series from Matan and Ido deals with scaling the number
of IRQs that serve RoCE applications to be on par with the Ethernet driver.

Or.

changes from V0:
 - addressed feedback from Sergei, removed extra blank line in patch #4

Ido Shamay (1):
  net/mlx4_core: Move affinity hints to mlx4_core ownership

Matan Barak (3):
  net/mlx4_core: Demote simple multicast and broadcast flow steering rules
  net/mlx4: Add EQ pool
  net/mlx4_core: Make sure there are no pending async events when freeing CQ

 drivers/infiniband/hw/mlx4/main.c                  |  75 ++---
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |   1 -
 drivers/net/ethernet/mellanox/mlx4/cq.c            |  13 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c         |  56 ++--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |   7 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |  13 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c            | 374
 drivers/net/ethernet/mellanox/mlx4/main.c          | 110 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |  12 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |   2 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  23 ++
 include/linux/mlx4/device.h                        |  11 +-
 12 files changed, 428 insertions(+), 269 deletions(-)
Re: pull request: bluetooth-next 2015-05-28
From: Johan Hedberg
Date: Thu, 28 May 2015 12:31:43 +0300

> Here's a set of patches intended for 4.2. The majority of the changes
> are on the 802.15.4 side of things rather than Bluetooth related:
>
>  - All sorts of cleanups & fixes to ieee802154 and related drivers
>  - Rework of tx power support in ieee802154 and its drivers
>  - Support for setting ieee802154 tx power through nl802154
>  - New IDs for the btusb driver
>  - Various cleanups & smaller fixes to btusb
>  - New btrtl driver for Realtek devices
>  - Fix suspend/resume for Realtek devices
>
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.
Re: [PATCH v3] isdn: Use ktime_t instead of 'struct timeval'
> > This doesn't compile:
>
Oops, I sent an older version of the patch with a typo. I've corrected this
in v4 (NS_PER_SEC -> NSEC_PER_SEC).

Thanks for taking a look at this.

Tina
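For context on the typo: ktime_t carries time as a 64-bit nanosecond count, so converting back to seconds divides by NSEC_PER_SEC (10^9). The constant the v3 patch referred to, NS_PER_SEC, was the misspelling. The userspace sketch below defines NSEC_PER_SEC locally with the kernel's value. `ns_to_sec_nsec()` is a hypothetical helper, and inside the kernel, 64-bit division on 32-bit architectures normally goes through helpers such as div_s64() rather than the `/` operator.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the kernel's NSEC_PER_SEC, defined locally so this
 * builds in userspace. */
#define NSEC_PER_SEC 1000000000LL

/* Split a non-negative 64-bit nanosecond count (what ktime_t holds)
 * into whole seconds and the remaining nanoseconds. */
static void ns_to_sec_nsec(int64_t ns, int64_t *sec, int64_t *nsec)
{
	*sec = ns / NSEC_PER_SEC;
	*nsec = ns % NSEC_PER_SEC;
}
```

For example, 3500000000 ns splits into 3 s and 500000000 ns.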
Re: [PATCH net-next 4/4] net/mlx4_core: Make sure there are no pending async events when freeing CQ
From: Or Gerlitz
Date: Thu, 28 May 2015 18:41:16 +0300

> @@ -369,6 +369,10 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
>  		mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);
>  
>  	synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
> +	if (priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq !=
> +	    priv->eq_table.eq[MLX4_EQ_ASYNC].irq)
> +		synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);
> +
>  
>  	spin_lock_irq(&cq_table->lock);
>  	radix_tree_delete(&cq_table->tree, cq->cqn);

I agree with Sergei that one empty line is sufficient here, don't make it
into two.

Please respin with this fixed.
Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware
On 05/30/2015 02:00 AM, Jiri Pirko wrote:
Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:
On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko wrote:
Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:
On Tue, May 19, 2015 at 1:28 PM, David Miller wrote:
From: Andy Gospodarek
Date: Tue, 19 May 2015 15:47:32 -0400

Are you actually saying that if users complain loudly enough about the current behavior (not the change Roopa has proposed) that you would be open to considering a change to the current behavior?

I am saying that we have a contract with users not to break existing behavior. Full stop.

After rehearing David's argument, we should probably explore option d) which is a refinement on the fib_offload_disable mechanism we have today. fib_offload_disable is global for all routes. Once we hit a HW install problem, the global flag is set and all routes fall back to SW. We did this because we can't allow the failed route to exist in SW and not in HW because it could mess up LPM searches (HW could hit on a lesser prefix even when SW has the true LPM, because HW gets first shot at match). The refinement on fib_offload_disable is this: make it per-related-prefix rather than global, and on a HW install problem, set the flag for the related-prefix and uninstall only those routes from HW. Related-prefix (is there a correct term for this?) are routes to the same dst addr but with different prefix lengths. I haven't parsed the fib_trie structure to see how routes are organized, but I suspect since it's optimized for lookup the related-prefix tracking is already there and we can build on that.

This looks interesting. However, I'm not sure that it is acceptable for user to experience this hw eviction of "random entries". User knows what entries are essential to have in hw. With your solution, I can see no way user can actually say what should be offloaded or not. Kernel just automagically decides.
The default eviction policy could be based on RTA_PRIORITY: evict lower priority routes first. It would be up to the device driver to decide between two routes of same priority. To help device driver make the decision, we could have eviction policy options:

Priority-based (default)
Prefer IPv6 over IPv4
Prefer IPv4 over IPv6
Prefer single path over multipath
Prefer longer prefix lengths over shorter
Optimize for resource utilization

These are portable across different switches. They're in terms a user understands. It's up to the device driver which truly understands the device constraints to translate the user's eviction policy choices into something that makes sense to that device.

This sounds tempting... You plan to throw in some patches, or should I take care of that?

This is encoding specific policies into the kernel. I was hoping to avoid this and let user space develop whatever policy it wants. If you use Jiri's proposed NLM_F_SKIP_{KERNEL|OFFLOAD} flags you get this.

Also I don't understand the "truly understands the device constraints" comment. We can export a model of the device and know how many rules of each type will fit exactly into the table. This doesn't seem like much of a problem to me. In fact the driver developer should know this anyway.

Part of my motivation here is I really don't want to get stuck with a case where each driver writer gets to translate the eviction policy onto their device in some device specific and slightly different way. It means every developer has to write a new mapping and get it correct. At the very least we should put a layer in switchdev that reads the table out of the driver and does the mapping so we have it in one spot. At least then the kernel is enforcing policy the same on all devices. Better still IMO would be to develop the policy in user space and have a library/tool that does this so we don't end up with a bunch of policy blobs in the kernel.
The six above are a good start, but over time more policy blobs will surely pop up. I would, for example, put 'optimize for throughput' on the list. .John -- John Fastabend Intel Corporation -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On Sun, May 31, 2015 at 11:53:47AM +0900, Greg KH wrote: > On Mon, May 25, 2015 at 11:02:27AM -0500, Larry Finger wrote: > > On 05/23/2015 04:16 PM, Larry Finger wrote: > > >The driver is reporting a warning at kernel/time/timer.c:1096 due to calling > > >del_timer_sync() while in interrupt mode. Such warnings are fixed by calling > > >del_timer() instead. > > > > > >Signed-off-by: Larry Finger > > >Cc: Stable > > >Cc: Haggi Eran > > >--- > > > > Greg, > > > > Please drop this patch. The same fixes were submitted as > > https://lkml.org/lkml/2015/5/15/226. > > That's not working for me at the moment, what was the subject: name? I > think I already applied it to the testing tree... Never mind, found it...
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On Mon, May 25, 2015 at 11:02:27AM -0500, Larry Finger wrote: > On 05/23/2015 04:16 PM, Larry Finger wrote: > >The driver is reporting a warning at kernel/time/timer.c:1096 due to calling > >del_timer_sync() while in interrupt mode. Such warnings are fixed by calling > >del_timer() instead. > > > >Signed-off-by: Larry Finger > >Cc: Stable > >Cc: Haggi Eran > >--- > > Greg, > > Please drop this patch. The same fixes were submitted as > https://lkml.org/lkml/2015/5/15/226. That's not working for me at the moment, what was the subject: name? I think I already applied it to the testing tree... thanks, greg k-h
Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-4 100G Ethernet driver
From: Amir Vadai Date: Thu, 28 May 2015 22:28:37 +0300 > This patchset extends the mlx5_core driver to support Ethernet > functionality. The Ethernet functionality in the mlx5 driver is > integrated into the core driver rather than as a separate driver. The > IB functionality remains in the mlx5_ib driver as before. Applied, thanks.
Re: [net-next 00/14][pull request] Intel Wired LAN Driver Updates 2015-05-28
From: Jeff Kirsher Date: Thu, 28 May 2015 04:25:25 -0700 > This series contains updates to ethtool, ixgbe, i40e and i40evf. > > John adds helper routines for ethtool to pass VF to rx_flow_spec. Since > the ring_cookie is 64 bits wide, which is much larger than what could be > used for actual queue index values, provide helper routines to pack a VF > index into the cookie. Then John provides an ixgbe patch to allow flow > director to use the entire queue space. > > Neerav provides an i40e patch to collect XOFF Rx stats, which were not > being collected before. > > Anjali provides ATR support for tunneled packets, as well as stats to > count tunnel ATR hits. Cleaned up PF struct members which are > unnecessary, since we can use the stat index macro directly. Cleaned > up flow director ATR/SB messages to a higher debug level since they > are not useful unless silicon validation is happening. > > Greg provides a patch to disable offline diagnostics if VFs are enabled > since ethtool offline diagnostic tests are not designed (out of scope) > to disable VF functions for testing and re-enable afterward. Also cleans > up a TODO comment that is no longer needed. > > Vasu provides a fix for an FCoE EOF case where i40e_fcoe_ctxt_eof() may be > called before i40e_fcoe_eof_is_supported() is called. > > Jesse adds skb->xmit_more support for i40evf. Then provides a performance > enhancement for i40evf by inlining some functions which provides a 15% > gain in small packet performance. Also cleans up the use of time_stamp > since it is no longer used to determine if there is a tx_hang and was > a part of a previous tx_hang design which is no longer used. Pulled, thanks Jeff.
Re: [PATCH net-next] tipc: unconditionally put sock refcnt when sock timer to be deleted is pending
From: Ying Xue Date: Thu, 28 May 2015 13:19:22 +0800 > As sock refcnt is taken when sock timer is started in > sk_reset_timer(), the sock refcnt should be put when sock timer > to be deleted is in pending state no matter what "probing_state" > value of tipc sock is. > > Reviewed-by: Erik Hugne > Reviewed-by: Jon Maloy > Signed-off-by: Ying Xue Applied, thanks.
Re: [PATCH] if_vlan: fix vlaue -> value typo
From: Vivien Didelot Date: Wed, 27 May 2015 21:07:26 -0400 > Fixes "vlaue" for "value" in include/linux/if_vlan.h. > > Signed-off-by: Vivien Didelot Applied, thanks.
Re: [PATCH net-next] bpf: allow BPF programs access skb->skb_iif and skb->dev->ifindex fields
From: Alexei Starovoitov Date: Wed, 27 May 2015 15:30:39 -0700 > classic BPF already exposes skb->dev->ifindex via SKF_AD_IFINDEX extension. > Allow eBPF program to access it as well. Note that classic aborts execution > of the program if 'skb->dev == NULL' (which is inconvenient for program > writers), whereas eBPF returns zero in such a case. > Also expose the 'skb_iif' field, since programs triggered by redirected > packets need to know the original interface index. > Summary: > __skb->ifindex -> skb->dev->ifindex > __skb->ingress_ifindex -> skb->skb_iif > > Signed-off-by: Alexei Starovoitov Applied, thank you.
Re: [PATCH V2 net-next 1/1] hv_netvsc: Properly size the vrss queues
From: "K. Y. Srinivasan" Date: Wed, 27 May 2015 13:16:57 -0700 > The current algorithm for deciding on the number of VRSS channels is > not optimal since we open up the min of the number of CPUs online and the > number of VRSS channels the host is offering. So on a 32 VCPU guest > we could potentially open 32 VRSS subchannels. Experimentation has > shown that it is best to limit the number of VRSS channels to the number > of CPUs within a NUMA node. > > Here is the new algorithm for deciding on the number of sub-channels we > would open up: > 1) Pick the minimum of what the host is offering and what the driver > in the guest is specifying as the default value. > 2) Pick the minimum of (1) and the number of CPUs in the NUMA > node the primary channel is bound to. > > > Signed-off-by: K. Y. Srinivasan Applied, thanks.
Re: [PATCH net-next] net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN
From: Sorin Dumitru Date: Wed, 27 May 2015 22:16:49 +0300 > This is similar to b1cb59cf2efe(net: sysctl_net_core: check SNDBUF > and RCVBUF for min length). I don't think too small values can cause > crashes in the case of udp and tcp, but I've seen this set to too > small values which triggered awful performance. It also makes the > setting consistent across all the wmem/rmem sysctls. > > Signed-off-by: Sorin Dumitru Applied, thank you.
Congratulations!!!
-- Congratulations!!! As we celebrate our 10 years of Internet journey and global communication, we are happy to announce to you that your Facebook account has been randomly selected as a beneficiary of $1,000,000.00 USD in the 2014/2015 Facebook Account of the Year {Grand Rewards winner}. E-mail us the information below: fb_deliveryserv...@mynet.com MESSAGE IDENTIFICATION: NW90W0W0-XANSIEW-1015 1) Amount won: $1,000,000.00 USD 2) Facebook username: 3) Country of residence: 4) Passport / identity number: E-mail: fb_deliveryserv...@mynet.com George Jones. Program Coordinator, Facebook Rewards Program, www.facebook.com All rights reserved 2015.
Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.
2015-05-30 15:09 GMT+03:00 Bjørn Mork :
> Andrew Lunn writes:
>
>> Some boards have two CPU interfaces connected to the switch, e.g. WiFi
>> access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and
>> two ports connected to the SoC.
>>
>> This patch extends DSA to allow both CPU ports to be used. The "cpu"
>> node in the DSA tree can now have a phandle to the host interface it
>> connects to. Each user port can have a phandle to a cpu port which
>> should be used for traffic between the port and the CPU. Thus simple
>> load sharing over the two CPU ports can be achieved.
>>
>> Signed-off-by: Andrew Lunn
>> ---
>>  Documentation/devicetree/bindings/net/dsa/dsa.txt |  66 -
>>  drivers/net/dsa/mv88e6xxx.c                       |   8 +-
>>  include/net/dsa.h                                 |  28 +-
>>  net/dsa/dsa.c                                     | 109 ++
>>  net/dsa/dsa_priv.h                                |   6 ++
>>  net/dsa/slave.c                                   |  10 +-
>>  net/dsa/tag_brcm.c                                |   2 +-
>>  net/dsa/tag_dsa.c                                 |   2 +-
>>  net/dsa/tag_edsa.c                                |   2 +-
>>  net/dsa/tag_trailer.c                             |   2 +-
>>  10 files changed, 206 insertions(+), 29 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> index f0b4cd72411d..34f7f18026e5 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> @@ -58,13 +58,24 @@ Optionnal property:
>>  Documentation/devicetree/bindings/net/ethernet.txt
>>  for details.
>>
>> +- ethernet : Optional for "cpu" ports. A phandle to an ethernet
>> + device which will be used by this CPU port for
>> + passing packets to/from the host. If not present,
>> + the port will use the "dsa,ethernet" property
>> + defined above.
>> +
>> +- cpu: Option for non "cpu"/"dsa" ports. A phandle to a
>> + "cpu" port, which will be used for passing packets
>> + from this port to the host. If not present, the first
>> + "cpu" port will be used.
>> +
>
Forgive my intrusion. Maybe I can answer some of your questions.

> I'm in deep water here, but this scheme sounds a little too static to me
> if I understand your proposal correctly. Why would you want to create a
> static mapping of CPU ports to external ports for any given device?

The vendor already assumes that this mapping is static, and DT just describes this assumption. On such devices, a single switch chip with two ports connected to the CPU is cheaper than a switch chip plus a dedicated PHY chip. In other words, one of the switch ports is just used as an independent PHY, and Andrew's patch makes it possible to describe this situation precisely.

> To me, that's part of the switch VLAN configuration.
>
AFAIK DSA is designed to allow L3 routing between ports as opposed to switching and VLANs at L2. DSA makes the hardware designer's work easier by providing more configurable chips. If so, interconnection tasks should be resolved by the kernel in a "plug-and-play" manner, just as the kernel assigns memory regions to PCI devices :)

> My experience with these devices is limited to running OpenWRT on an
> WRT1900AC, having a Marvell 88E6172 switch. And using the OpenWRT
> switch API of course. There I've found it very useful to be able to mix
> and match the two CPU ports as I like with the external ports. How you
> want the CPU ports used is not as much depending on device properties as
> on your network configuration, IMHO. How many and which links do you
> have? What bandwidth are they? Trunks or not? Etc. You cannot describe
> these answers as device properties, because they aren't.
>
Nobody forbids running a custom kernel with a custom DT for a custom setup :)

> You can currently configure this as you like in OpenWRT using their
> usual swconfig tool. The CPU ports are added or removed from VLANs like
> any other port on the switch, and that feels very natural for me as an
> end user. The only distinction necessary to know, is your 'ethernet'
> property above: Which host device is this switch port connected to.
>
> So I wonder: Do you plan to put all of the switch config into DT? Where
> does that stop? How about trunking between external ports and CPU ports?
> Will every VLAN in the trunk have to go into DT too?
>
IMHO VLANs shouldn't be described by DT. VLANs are part of the network configuration and should be configured by the end user, if needed. At the same time, DSA configuration is part of the hardware configuration, and that's why it is placed in DT. In any case, Andrew as an author could give a better e
Re: [PATCH V2 0/5] Add support for QCA IPQ806x Ethernet GMAC controller
From: Mathieu Olivari Date: Wed, 27 May 2015 11:02:45 -0700 > This patch set adds support for the integrated Ethernet GMAC controller > on QCA IPQ806x SoC. This controller is based on a Gigabit Synopsys > DesignWare IP, already supported in the stmmac driver located in > drivers/net/ethernet/stmicro/stmmac. > > This change is done as a follow-up to the following thread: > *http://www.spinics.net/lists/netdev/msg311265.html > While the previous attempt created a new driver to drive this controller, > this new post leverages the existing stmmac driver by implementing the > SoC specific glue to it. > > Aside from the pure stmmac glue layer, we have a couple of related > patches: > *IPQ806x NSS clock addition is cherry-picked and refreshed from the > following thread: https://lkml.org/lkml/2014/8/6/390 > *phy-handle and fixed-link support are also added in this change set so the > driver can be fully functional on platforms using device-trees as well as > ethernet switches. > > V2: > *Fix MODULE_LICENSE to "Dual BSD/GPL" as the dwmac-ipq806x.c is using > ISC license. Series applied to net-next, thanks.
Re: Recurring trace from tcp_fragment()
On Sat, May 30, 2015 at 2:52 PM, Grant Zhang wrote: > Thank you Neal. Most likely I will test the patch on Monday and report > back the result. > > As for the TcpExtTCPSACKReneging counter, attached is the captured > counter value on a 1-second interval for 10 minutes. OK, great. Those TcpExtTCPSACKReneging values look consistent with the theory underlying the patch, so that's a good sign. Thanks! neal
Re: [PATCH 3/7] net: dsa: ar8xxx: add regmap support
2015-05-29 20:59 GMT+03:00 Andrew Lunn : > On Fri, May 29, 2015 at 10:36:49AM -0700, Mathieu Olivari wrote: >> Alternatively, we could have something similar to what happens for the phy >> in the wireless subsystems. Wireless PHYs are not registered as net_device >> but they can still be listed, queried or configured through netlink. > > It is a reasonable idea, but you retrieve most of the useful > information using ethtool. That, as far as I know, operates on > net_devices, not phys. > Maybe it's time to rework Ethernet card handling to decouple "network interfaces" from "Ethernet ports"? -- Sergey
Re: Ingress tc filters with IPSec
> On May 30, 2015 at 4:12 PM "jsulli...@opensourcedevel.com"
> wrote:
>
> > On May 30, 2015 at 2:24 AM "John A. Sullivan III"
> > wrote:
> >
> > On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:
> > > Argh! yet another obstacle from my ignorance. We are attempting ingress
> > > traffic shaping using IFB interfaces on traffic coming via GRE / IPSec.
> > > Filters and hash tables are working fine with plain GRE including
> > > stripping the header. We even got the ematch filter working so that the
> > > ESP packets are the only packets not redirected to IFB.
> > >
> > > But, regardless of whether we redirect ESP packets to IFB, the filters
> > > never see the decrypted packets. I thought the packets passed through
> > > the interface twice - first encrypted and then decrypted. However,
> > > tcpdump only shows the ESP packets on the interface.
> > >
> > > How do we apply filters to the packets after decryption? Thanks - John
> >
> > I see what changed. In the past, this seemed to work but we were using
> > tunnel mode. We were trying to use transport mode in this application
> > but that seems to prevent the decrypted packet contents from appearing
> > again on the interface. Reverting to tunnel mode made the contents
> > visible again and our filters are working as expected - John
>
> Alas, this is still a problem since we are using VRRP and the tunnel end points
> are the virtual IP addresses. That makes StrongSWAN choke on selector matching
> in tunnel mode so back to trying to make transport mode work.
>
> I am guessing we do not see the second pass of the packet because it is only
> encrypted and not encapsulated. So my hunch is that we need to pass the ESP
> packet into the ifb qdisc but need to look elsewhere in the packet for the filter
> matching information. We know that matching on the normal offsets does not work
> so I am hoping the decrypted packet is decipherable by the filter matching logic
> but just still has all the ESP transport header attached.
>
> Normally, to extract the contents of my GRE tunnel, I would place them into a
> separate hash table with the GRE header stripped off and then filter them into
> TCP and UDP hash tables:
>
> tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff match u16 0x0800 0x at 22 link 11: offset at 0 mask 0f00 shift 6 plus 4 eat
>
> So we match the GRE protocol and determine that GRE is carrying an IP packet.
> With the ESP transport header and IV (AES = 16B) interposed between the IP
> header and the GRE header, I suppose the first part of this filter becomes:
>
> tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff match u16 0x0800 0x at 46
>
> but what do I do with the second half to find the start of the TCP/UDP header?
> Is it still offset at 0 because tc filter somehow knows where the interior IP
> header starts or should it be offset at 48 to account for the GRE + ESP headers?
> Or is there a better way to filter ingress traffic on GRE/IPSec tunnels?
> Thanks
> - John

Alas, this is not working. I set a continue action for the ESP traffic:

tc filter replace dev ifb0 parent 11:0 protocol ip prio 1 u32 match ip protocol 50 0xff action continue

and that seems to be matching:

filter parent 11: protocol ip pref 1 u32 fh 802::800 order 2048 key ht 802 bkt 0 terminal flowid ??? (rule hit 3130003 success 2931853)
  match 0032/00ff at 8 (success 2931853 )
        action order 1: gact action continue
         random type none pass val 0
         index 1 ref 1 bind 1 installed 294 sec

And I even reduced the GRE filter to just look for the GRE protocol in the IP header:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff link 11: offset at 48 mask 0f00 shift 6 plus 4 eat

but it does not appear to be matching at all:

filter parent 11: protocol ip pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 link 11: (rule hit 3130012 success 0)
  match 002f/00ff at 8 (success 0 )
    offset 0f00>>6 at 48 plus 4 eat

Any suggestions about how to traffic shape ingress traffic coming off an ESP Transport connection? Thanks - John
Re: Ingress tc filters with IPSec
> On May 30, 2015 at 2:24 AM "John A. Sullivan III"
> wrote:
>
> On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:
> > Argh! yet another obstacle from my ignorance. We are attempting ingress
> > traffic shaping using IFB interfaces on traffic coming via GRE / IPSec.
> > Filters and hash tables are working fine with plain GRE including
> > stripping the header. We even got the ematch filter working so that the
> > ESP packets are the only packets not redirected to IFB.
> >
> > But, regardless of whether we redirect ESP packets to IFB, the filters
> > never see the decrypted packets. I thought the packets passed through
> > the interface twice - first encrypted and then decrypted. However,
> > tcpdump only shows the ESP packets on the interface.
> >
> > How do we apply filters to the packets after decryption? Thanks - John
>
> I see what changed. In the past, this seemed to work but we were using
> tunnel mode. We were trying to use transport mode in this application
> but that seems to prevent the decrypted packet contents from appearing
> again on the interface. Reverting to tunnel mode made the contents
> visible again and our filters are working as expected - John

Alas, this is still a problem since we are using VRRP and the tunnel end points are the virtual IP addresses. That makes StrongSWAN choke on selector matching in tunnel mode so back to trying to make transport mode work.

I am guessing we do not see the second pass of the packet because it is only encrypted and not encapsulated. So my hunch is that we need to pass the ESP packet into the ifb qdisc but need to look elsewhere in the packet for the filter matching information. We know that matching on the normal offsets does not work so I am hoping the decrypted packet is decipherable by the filter matching logic but just still has all the ESP transport header attached.

Normally, to extract the contents of my GRE tunnel, I would place them into a separate hash table with the GRE header stripped off and then filter them into TCP and UDP hash tables:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff match u16 0x0800 0x at 22 link 11: offset at 0 mask 0f00 shift 6 plus 4 eat

So we match the GRE protocol and determine that GRE is carrying an IP packet. With the ESP transport header and IV (AES = 16B) interposed between the IP header and the GRE header, I suppose the first part of this filter becomes:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47 0xff match u16 0x0800 0x at 46

but what do I do with the second half to find the start of the TCP/UDP header? Is it still offset at 0 because tc filter somehow knows where the interior IP header starts or should it be offset at 48 to account for the GRE + ESP headers? Or is there a better way to filter ingress traffic on GRE/IPSec tunnels? Thanks - John
Re: Recurring trace from tcp_fragment()
Thank you Neal. Most likely I will test the patch on Monday and report back the result. As for the TcpExtTCPSACKReneging counter, attached is the captured counter value on a 1-second interval for 10 minutes. Thanks, Grant reneg.log > On May 30, 2015, at 10:29 AM, Neal Cardwell wrote: > > On Fri, May 29, 2015 at 3:53 PM, Grant Zhang wrote: >> Hi Neal, >> >> I will be more than happy to test the patch. Please send it my way. > > Great. Thank you so much for being willing to do this. Attached is a > patch for testing. I generated it and tested it relative to Linux > v3.14.39, since your stack trace seemed to suggest that you were > seeing this on some variant of v3.14.39. (Newer kernels would need a > slightly different patch, since the reneging code path has changed a > little since 3.14.) > > Can you please try it out and see if it makes that warning go away? > > Also, I would be interested in seeing the value of your > TcpExtTCPSACKReneging counter, and some sense of how fast that value > is increasing, on a machine that's seeing this issue: > nstat -z -a | grep Reneg > > Thanks! > > neal > <0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch>
[PATCH net-next 0/3] s390/bpf: implement bpf_tail_call JIT support
This set is for net-next tree. Patch 3 adds bpf_tail_call() support for
s390x JIT. It has a dependency on patches 1 and 2 that will also be
submitted to stable via Martin Schwidefsky.

Michael Holzheu (3):
  s390/bpf: fix stack allocation
  s390/bpf: fix bpf frame pointer setup
  s390/bpf: implement bpf_tail_call() helper

 arch/s390/net/bpf_jit.h      |  12 -
 arch/s390/net/bpf_jit_comp.c | 117 +++---
 2 files changed, 121 insertions(+), 8 deletions(-)

-- 
1.7.9.5
[PATCH net-next 1/3] s390/bpf: fix stack allocation
From: Michael Holzheu

On s390x we have to provide 160 bytes stack space before we can call
the next function. From the 160 bytes that we got from the previous
function we only use 11 * 8 bytes and have 160 - 11 * 8 bytes left.
Currently for BPF we allocate additional 160 - 11 * 8 bytes for the
next function. This is wrong because then the next function only gets:

  (160 - 11 * 8) + (160 - 11 * 8) = 2 * 72 = 144 bytes

Fix this and allocate enough memory for the next function.

Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu
Acked-by: Heiko Carstens
Signed-off-by: Alexei Starovoitov
---
 arch/s390/net/bpf_jit.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index ba8593a515ba..de156ba3bd71 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -48,7 +48,9 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  * We get 160 bytes stack space from calling function, but only use
  * 11 * 8 byte (old backchain + r15 - r6) for storing registers.
  */
-#define STK_OFF (MAX_BPF_STACK + 8 + 4 + 4 + (160 - 11 * 8))
+#define STK_SPACE	(MAX_BPF_STACK + 8 + 4 + 4 + 160)
+#define STK_160_UNUSED	(160 - 11 * 8)
+#define STK_OFF		(STK_SPACE - STK_160_UNUSED)
 #define STK_OFF_TMP	160	/* Offset of tmp buffer on stack */
 #define STK_OFF_HLEN	168	/* Offset of SKB header length on stack */
-- 
1.7.9.5
[PATCH net-next 2/3] s390/bpf: fix bpf frame pointer setup
From: Michael Holzheu

Currently the bpf frame pointer is set to the old r15. This is
wrong because of packed stack. Fix this and adjust the frame pointer
to respect packed stack. This now generates a prolog like the following:

 3ff8001c3fa: eb67f0480024   stmg    %r6,%r7,72(%r15)
 3ff8001c400: ebcff0780024   stmg    %r12,%r15,120(%r15)
 3ff8001c406: b904001f       lgr     %r1,%r15          <- load backchain
 3ff8001c40a: 41d0f048       la      %r13,72(%r15)     <- load adjusted bfp
 3ff8001c40e: a7fbfd98       aghi    %r15,-616
 3ff8001c412: e310f0980024   stg     %r1,152(%r15)     <- save backchain

Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu
Acked-by: Heiko Carstens
Signed-off-by: Alexei Starovoitov
---
 arch/s390/net/bpf_jit_comp.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 20c146d1251a..55423d8be580 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -384,13 +384,16 @@ static void bpf_jit_prologue(struct bpf_jit *jit)
 	}
 	/* Setup stack and backchain */
 	if (jit->seen & SEEN_STACK) {
-		/* lgr %bfp,%r15 (BPF frame pointer) */
-		EMIT4(0xb904, BPF_REG_FP, REG_15);
+		if (jit->seen & SEEN_FUNC)
+			/* lgr %w1,%r15 (backchain) */
+			EMIT4(0xb904, REG_W1, REG_15);
+		/* la %bfp,STK_160_UNUSED(%r15) (BPF frame pointer) */
+		EMIT4_DISP(0x4100, BPF_REG_FP, REG_15, STK_160_UNUSED);
 		/* aghi %r15,-STK_OFF */
 		EMIT4_IMM(0xa70b, REG_15, -STK_OFF);
 		if (jit->seen & SEEN_FUNC)
-			/* stg %bfp,152(%r15) (backchain) */
-			EMIT6_DISP_LH(0xe300, 0x0024, BPF_REG_FP, REG_0,
+			/* stg %w1,152(%r15) (backchain) */
+			EMIT6_DISP_LH(0xe300, 0x0024, REG_W1, REG_0,
 				      REG_15, 152);
 	}
 	/*
-- 
1.7.9.5
[PATCH net-next 3/3] s390/bpf: implement bpf_tail_call() helper
From: Michael Holzheu bpf_tail_call() arguments: - ctx..: Context pointer - jmp_table: One of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table - index: Index in the jump table In this implementation s390x JIT does stack unwinding and jumps into the callee program prologue. Caller and callee use the same stack. With this patch a tail call generates the following code on s390x: if (index >= array->map.max_entries) goto out 03ff8001c7e4: e31030100016 llgf %r1,16(%r3) 03ff8001c7ea: ec41001fa065 clgrj %r4,%r1,10,3ff8001c828 if (tail_call_cnt++ > MAX_TAIL_CALL_CNT) goto out; 03ff8001c7f0: a7080001 lhi %r0,1 03ff8001c7f4: eb10f25000fa laal %r1,%r0,592(%r15) 03ff8001c7fa: ec120017207f clij %r1,32,2,3ff8001c828 prog = array->prog[index]; if (prog == NULL) goto out; 03ff8001c800: eb140003000d sllg %r1,%r4,3 03ff8001c806: e3131084 lg %r1,128(%r3,%r1) 03ff8001c80c: ec18000e007d clgij %r1,0,8,3ff8001c828 Restore registers before calling function 03ff8001c812: eb68f2980004 lmg %r6,%r8,664(%r15) 03ff8001c818: ebbff2c4 lmg %r11,%r15,704(%r15) goto *(prog->bpf_func + tail_call_start); 03ff8001c81e: e3110024 lg %r1,32(%r1,%r0) 03ff8001c824: 47f01006 bc 15,6(%r1) Reviewed-by: Martin Schwidefsky Signed-off-by: Michael Holzheu Acked-by: Heiko Carstens Signed-off-by: Alexei Starovoitov --- arch/s390/net/bpf_jit.h | 10 +++- arch/s390/net/bpf_jit_comp.c | 106 +- 2 files changed, 112 insertions(+), 4 deletions(-) diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h index de156ba3bd71..f6498eec9ee1 100644 --- a/arch/s390/net/bpf_jit.h +++ b/arch/s390/net/bpf_jit.h @@ -28,6 +28,9 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[]; * | old backchain | | * +---+ | * | r15 - r6| | + * +---+ | + * | 4 byte align | | + * | tail_call_cnt | | * BFP-> +===+ | * | | | * | BPF stack | | @@ -46,14 +49,17 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[]; * R15-> +---+ + low * * We get 160 bytes stack space from calling function, but only use - * 11 * 8 byte (old backchain + r15 - r6) for storing registers. + * 12 * 8 byte for old backchain, r15..r6, and tail_call_cnt. 
*/ #define STK_SPACE (MAX_BPF_STACK + 8 + 4 + 4 + 160) -#define STK_160_UNUSED (160 - 11 * 8) +#define STK_160_UNUSED (160 - 12 * 8) #define STK_OFF (STK_SPACE - STK_160_UNUSED) #define STK_OFF_TMP 160 /* Offset of tmp buffer on stack */ #define STK_OFF_HLEN 168 /* Offset of SKB header length on stack */ +#define STK_OFF_R6 (160 - 11 * 8) /* Offset of r6 on stack */ +#define STK_OFF_TCCNT (160 - 12 * 8) /* Offset of tail_call_cnt on stack */ + /* Offset to skip condition code check */ #define OFF_OK 4 diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 55423d8be580..d3766dd67e23 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include "bpf_jit.h" @@ -40,6 +41,8 @@ struct bpf_jit { int base_ip; /* Base address for literal pool */ int ret0_ip; /* Address of return 0 */ int exit_ip; /* Address of exit */ + int tail_call_start; /* Tail call start offset */ + int labels[1]; /* Labels for local jumps */ }; #define BPF_SIZE_MAX 4096 /* Max size for program */ @@ -49,6 +52,7 @@ struct bpf_jit { #define SEEN_RET0 4 /* ret0_ip points to a valid return 0 */ #define SEEN_LITERAL 8 /* code uses literals */ #define SEEN_FUNC 16 /* calls C functions */ +#define SEEN_TAIL_CALL 32 /* code uses tail calls */ #define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB) /* @@ -60,6 +64,7 @@ struct bpf_jit { #define REG_L (__MAX_BPF_REG+3) /* Literal pool register */ #define REG_15 (__MAX_BPF_REG+4) /* Register 15 */ #define REG_0 REG_W0 /* Register 0 */ +#define REG_1 REG_W1 /* Register 1 */ #define REG_2 BPF_REG_1 /* Register 2 */ #define REG_14 BPF_REG_0 /* Register 14 */ @@ -223,6 +228,24 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1) REG_SET_SEEN(b3); \ }) +#define EMIT6_PCREL_LABEL(op1, op2, b1, b2, label, mask) \ +({ \ + int rel = (jit->labels[label] - jit->prg) >> 1; \ + _EMIT6(op1 | reg(b1,
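The generated sequence performs three checks before jumping into the callee. Purely as an illustration, they can be modeled in plain C as below. `struct prog`, `struct prog_array` and `try_tail_call()` are hypothetical names, not the kernel's types. The limit of 32 is read off the `clij %r1,32` compare in the listing, and the two stack offsets are copied from the bpf_jit.h hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Limit taken from the "clij %r1,32,2" compare in the listing above. */
#define MAX_TAIL_CALL_CNT 32

/* Offsets from the bpf_jit.h hunk: the 8-byte slot holding the 4-byte
 * alignment pad and tail_call_cnt sits directly below r6. */
#define STK_OFF_R6    (160 - 11 * 8)
#define STK_OFF_TCCNT (160 - 12 * 8)
static_assert(STK_OFF_R6 - STK_OFF_TCCNT == 8, "one 8-byte slot apart");

/* Hypothetical stand-ins for the kernel's program and prog-array types. */
struct prog {
	int id;
};

struct prog_array {
	unsigned int max_entries;
	struct prog *prog[8];
};

/*
 * Returns the program to jump to, or NULL when the JIT code would take
 * "goto out" and continue with the next instruction. *tail_call_cnt models
 * the counter in the STK_OFF_TCCNT slot that caller and callee share.
 */
static struct prog *try_tail_call(const struct prog_array *array,
				  unsigned long index,
				  unsigned int *tail_call_cnt)
{
	/* if (index >= array->map.max_entries) goto out; */
	if (index >= array->max_entries)
		return NULL;
	/* if (tail_call_cnt++ > MAX_TAIL_CALL_CNT) goto out; */
	if ((*tail_call_cnt)++ > MAX_TAIL_CALL_CNT)
		return NULL;
	/* prog = array->prog[index]; if (prog == NULL) goto out; */
	return array->prog[index];
}
```

Note that, as in the listing, the counter is incremented before the NULL check, so a lookup that hits an empty slot still counts against the limit.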
Re: Recurring trace from tcp_fragment()
On Fri, May 29, 2015 at 3:53 PM, Grant Zhang wrote: > Hi Neal, > > I will be more happy to test the patch. Please send it my way. Great. Thank you so much for being willing to do this. Attached is a patch for testing. I generated it and tested it relative to Linux v3.14.39, since your stack trace seemed to suggest that you were seeing this on some variant of v3.14.39. (Newer kernels would need a slightly different patch, since the reneging code path has changed a little since 3.14.) Can you please try it out and see if it makes that warning go away? Also, I would be interested in seeing the value of your TcpExtTCPSACKReneging counter, and some sense of how fast that value is increasing, on a machine that's seeing this issue: nstat -z -a | grep Reneg Thanks! neal 0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch Description: Binary data
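For sampling the counter without nstat, a minimal lookup is sketched below. It assumes the usual /proc/net/netstat layout, where a `TcpExt:` line of counter names is followed by a `TcpExt:` line of values in the same order (nstat prints the same counter as `TcpExtTCPSACKReneging`). `tcpext_counter()` is a hypothetical helper, and reading the file into a buffer is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one TcpExt counter (e.g. "TCPSACKReneging") in text laid out like
 * /proc/net/netstat: a "TcpExt:" line of names, then a "TcpExt:" line of
 * values in the same column order. Returns 0 and stores the value on
 * success, -1 if the counter is not found.
 */
static int tcpext_counter(const char *netstat, const char *name,
			  unsigned long long *value)
{
	const size_t tag_len = strlen("TcpExt:");
	const size_t want = strlen(name);
	const char *hdr, *eol, *p;
	int idx = 0, field = -1;

	assert(netstat && name && value);
	hdr = strstr(netstat, "TcpExt:");
	if (!hdr || !(eol = strchr(hdr, '\n')))
		return -1;
	if (strncmp(eol + 1, "TcpExt:", tag_len) != 0)
		return -1;

	/* Find the column index of the requested name in the header line. */
	for (p = hdr + tag_len; p < eol && field < 0; ) {
		const char *start;

		while (p < eol && *p == ' ')
			p++;
		start = p;
		while (p < eol && *p != ' ')
			p++;
		if (p == start)
			continue;
		if ((size_t)(p - start) == want && strncmp(start, name, want) == 0)
			field = idx;
		idx++;
	}
	if (field < 0)
		return -1;

	/* Walk the value line to the same column. */
	p = eol + 1 + tag_len;
	for (int i = 0; i <= field; i++) {
		char *end;
		unsigned long long v = strtoull(p, &end, 10);

		if (end == p)
			return -1;
		if (i == field) {
			*value = v;
			return 0;
		}
		p = end;
	}
	return -1;
}
```

Reading the value twice, some seconds apart, and subtracting gives the rate of increase Neal asks about.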
[PATCH 81/98] include/uapi/linux/openvswitch.h: use __u32 from linux/types.h
Fixes userspace compiler error: error: unknown type name ‘uint32_t’ Signed-off-by: Mikko Rapeli --- include/uapi/linux/openvswitch.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index bbd49a0..0ab8eca 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -586,8 +586,8 @@ enum ovs_hash_alg { * @hash_basis: basis used for computing hash. */ struct ovs_action_hash { - uint32_t hash_alg; /* One of ovs_hash_alg. */ - uint32_t hash_basis; + __u32 hash_alg; /* One of ovs_hash_alg. */ + __u32 hash_basis; }; /** -- 2.1.4
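A quick way to check this kind of fix is a userspace translation unit that relies only on <linux/types.h>. The sketch below mirrors the patched layout in a hypothetical `struct ovs_action_hash_example` instead of including the real header, and assumes the kernel UAPI headers are installed on the build machine.

```c
#include <assert.h>
#include <linux/types.h>

/* Hypothetical copy of the patched layout: only <linux/types.h> is needed
 * for __u32, so userspace no longer has to include <stdint.h> first. */
struct ovs_action_hash_example {
	__u32 hash_alg;   /* One of ovs_hash_alg. */
	__u32 hash_basis;
};

static_assert(sizeof(__u32) == 4, "__u32 is a 32-bit type");
static_assert(sizeof(struct ovs_action_hash_example) == 8,
	      "two 32-bit fields, no padding");
```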
[PATCH 84/98] include/uapi/linux/atm_zatm.h: include linux/time.h
Fixes userspace compile error: error: field ‘real’ has incomplete type struct timeval real; /* real (wall-clock) time */ Signed-off-by: Mikko Rapeli --- include/uapi/linux/atm_zatm.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/atm_zatm.h b/include/uapi/linux/atm_zatm.h index 10f0fa2..adbaa6c 100644 --- a/include/uapi/linux/atm_zatm.h +++ b/include/uapi/linux/atm_zatm.h @@ -14,6 +14,7 @@ #include #include +#include #define ZATM_GETPOOL _IOW('a',ATMIOC_SARPRV+1,struct atmif_sioc) /* get pool statistics */ -- 2.1.4
[PATCH net] udp: fix behavior of wrong checksums
From: Eric Dumazet We have two problems in the UDP stack related to bogus checksums: 1) We return -EAGAIN to the application even if the receive queue is not empty. This breaks applications using edge-triggered epoll(). 2) Under UDP flood, we can loop forever without yielding to other processes, potentially hanging the host, especially on non-SMP. This patch is an attempt to make things better. We might in the future add extra support for rt applications wanting to better control time spent doing a recv() in a hostile environment. For example we could validate checksums before queuing packets in the socket receive queue. Signed-off-by: Eric Dumazet Cc: Willem de Bruijn --- net/ipv4/udp.c |6 ++ net/ipv6/udp.c |6 ++ 2 files changed, 4 insertions(+), 8 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index d10b7e0112eb..1c92ea67baef 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1345,10 +1345,8 @@ csum_copy_err: } unlock_sock_fast(sk, slow); - if (noblock) - return -EAGAIN; - - /* starting over for a new packet */ + /* starting over for a new packet, but check if we need to yield */ + cond_resched(); msg->msg_flags &= ~MSG_TRUNC; goto try_again; } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index c2ec41617a35..e51fc3eee6db 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -525,10 +525,8 @@ csum_copy_err: } unlock_sock_fast(sk, slow); - if (noblock) - return -EAGAIN; - - /* starting over for a new packet */ + /* starting over for a new packet, but check if we need to yield */ + cond_resched(); msg->msg_flags &= ~MSG_TRUNC; goto try_again; }
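The first problem can be seen in a small userspace model of the non-blocking receive path. Everything here (`struct rxq`, `recv_old()`, `recv_new()`, and the `yields` counter standing in for cond_resched()) is hypothetical; blocking waits are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a UDP socket receive queue; not kernel code. */
struct pkt {
	int len;
	bool csum_ok;
};

struct rxq {
	const struct pkt *pkts;
	size_t head, tail;    /* queued packets are pkts[head..tail) */
	unsigned int drops;   /* packets discarded for bad checksums */
	unsigned int yields;  /* stand-in for cond_resched() calls */
};

/* Old non-blocking behaviour: a bad checksum returns -EAGAIN even when
 * good packets are still queued behind it. */
static int recv_old(struct rxq *q)
{
	assert(q->head <= q->tail);
	if (q->head == q->tail)
		return -EAGAIN;
	const struct pkt *p = &q->pkts[q->head++];
	if (p->csum_ok)
		return p->len;
	q->drops++;
	return -EAGAIN;  /* if (noblock) return -EAGAIN; */
}

/* Patched behaviour: drop the bad packet, offer to yield, and retry;
 * -EAGAIN only when the queue really is empty. */
static int recv_new(struct rxq *q)
{
	assert(q->head <= q->tail);
	for (;;) {
		if (q->head == q->tail)
			return -EAGAIN;
		const struct pkt *p = &q->pkts[q->head++];
		if (p->csum_ok)
			return p->len;
		q->drops++;
		q->yields++;  /* cond_resched() in the real patch */
	}
}
```

With the old code an edge-triggered epoll() user sees -EAGAIN while a good packet is still queued, and may get no further wakeup for it; the patched loop keeps going until it finds a good packet or the queue is empty, yielding between attempts.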
[PATCH 41/98] include/uapi/linux/if_pppox.h: include linux/if.h
Fixes userspace compilation error: error: ‘IFNAMSIZ’ undeclared here (not in a function) Signed-off-by: Mikko Rapeli --- include/uapi/linux/if_pppox.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h index e128769..473c3c4 100644 --- a/include/uapi/linux/if_pppox.h +++ b/include/uapi/linux/if_pppox.h @@ -21,6 +21,7 @@ #include #include +#include #include #include -- 2.1.4
[PATCH 48/98] include/uapi/linux/if_pppox.h: include linux/in.h and linux/in6.h
Fixes userspace compilation errors: error: field ‘addr’ has incomplete type struct sockaddr_in addr; /* IP address and port to send to */ error: field ‘addr’ has incomplete type struct sockaddr_in6 addr; /* IP address and port to send to */ Signed-off-by: Mikko Rapeli --- include/uapi/linux/if_pppox.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h index 473c3c4..d37bbb1 100644 --- a/include/uapi/linux/if_pppox.h +++ b/include/uapi/linux/if_pppox.h @@ -24,6 +24,8 @@ #include #include #include +#include +#include /* For user-space programs to pick up these definitions * which they wouldn't get otherwise without defining __KERNEL__ -- 2.1.4
[PATCH 47/98] include/uapi/linux/if_pppol2tp.h: include linux/in.h and linux/in6.h
Fixes userspace compilation errors like: error: field ‘addr’ has incomplete type struct sockaddr_in addr; /* IP address and port to send to */ ^ error: field ‘addr’ has incomplete type struct sockaddr_in6 addr; /* IP address and port to send to */ Signed-off-by: Mikko Rapeli --- include/uapi/linux/if_pppol2tp.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/if_pppol2tp.h b/include/uapi/linux/if_pppol2tp.h index 163e8ad..4bd1f55 100644 --- a/include/uapi/linux/if_pppol2tp.h +++ b/include/uapi/linux/if_pppol2tp.h @@ -16,7 +16,8 @@ #define _UAPI__LINUX_IF_PPPOL2TP_H #include - +#include +#include /* Structure used to connect() the socket to a particular tunnel UDP * socket over IPv4. -- 2.1.4
[PATCH 42/98] include/uapi/linux/if_tunnel.h: include linux/if.h, linux/ip.h and linux/in6.h
Fixes userspace compilation errors like: error: field ‘iph’ has incomplete type error: field ‘prefix’ has incomplete type Signed-off-by: Mikko Rapeli --- include/uapi/linux/if_tunnel.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h index bd3cc11..2a36080 100644 --- a/include/uapi/linux/if_tunnel.h +++ b/include/uapi/linux/if_tunnel.h @@ -2,6 +2,9 @@ #define _UAPI_IF_TUNNEL_H_ #include +#include +#include +#include #include -- 2.1.4
Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.
Andrew Lunn writes: > Some boards have two CPU interfaces connected to the switch, e.g. WiFi > access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and > two port connected to the SoC. > > This patch extends DSA to allows both CPU ports to be used. The "cpu" > node in the DSA tree can now have a phandle to the host interface it > connects to. Each user port can have a phandle to a cpu port which > should be used for traffic between the port and the CPU. Thus simple > load sharing over the two CPU ports can be achieved. > > Signed-off-by: Andrew Lunn > --- > Documentation/devicetree/bindings/net/dsa/dsa.txt | 66 - > drivers/net/dsa/mv88e6xxx.c | 8 +- > include/net/dsa.h | 28 +- > net/dsa/dsa.c | 109 > ++ > net/dsa/dsa_priv.h| 6 ++ > net/dsa/slave.c | 10 +- > net/dsa/tag_brcm.c| 2 +- > net/dsa/tag_dsa.c | 2 +- > net/dsa/tag_edsa.c| 2 +- > net/dsa/tag_trailer.c | 2 +- > 10 files changed, 206 insertions(+), 29 deletions(-) > > diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt > b/Documentation/devicetree/bindings/net/dsa/dsa.txt > index f0b4cd72411d..34f7f18026e5 100644 > --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt > +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt > @@ -58,13 +58,24 @@ Optionnal property: > Documentation/devicetree/bindings/net/ethernet.txt > for details. > > +- ethernet : Optional for "cpu" ports. A phandle to an ethernet > + device which will be used by this CPU port for > + passing packets to/from the host. If not present, > + the port will use the "dsa,ethernet" property > + defined above. > + > +- cpu: Option for non "cpu"/"dsa" ports. A phandle > to a > + "cpu" port, which will be used for passing packets > + from this port to the host. If not present, the first > + "cpu" port will be used. > + I'm in deep water here, but this scheme sounds a little too static to me if I understand your proposal correctly. 
Why would you want to create a static mapping of CPU ports to external ports for any given device? To me, that's part of the switch VLAN configuration. My experience with these devices is limited to running OpenWRT on a WRT1900AC, which has a Marvell 88E6172 switch, and using the OpenWRT switch API, of course. There I've found it very useful to be able to mix and match the two CPU ports with the external ports as I like. How you want the CPU ports used depends less on device properties than on your network configuration, IMHO. How many links do you have, and which ones? What bandwidth do they have? Trunks or not? Etc. You cannot describe these answers as device properties, because they aren't. You can currently configure this as you like in OpenWRT using their usual swconfig tool. The CPU ports are added to or removed from VLANs like any other port on the switch, and that feels very natural to me as an end user. The only distinction necessary to know is your 'ethernet' property above: which host device is this switch port connected to? So I wonder: Do you plan to put all of the switch config into DT? Where does that stop? How about trunking between external ports and CPU ports? Will every VLAN in the trunk have to go into DT too? Bjørn
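The fallback rule in the proposed binding (use the "cpu" port a user port points to, otherwise the first "cpu" port) can be sketched as follows. `struct port_cfg` is a hypothetical flattened view of the device tree with the phandle already resolved to a port index; treating a phandle to a non-CPU port as absent is an assumption of this sketch, not something the RFC specifies.

```c
#include <assert.h>

enum port_role { PORT_UNUSED, PORT_USER, PORT_CPU, PORT_DSA };

/* Hypothetical flattened view of the DT: each port's role and the port
 * index its optional "cpu" phandle resolves to (-1 if absent). */
struct port_cfg {
	enum port_role role;
	int cpu;
};

/* Returns the CPU port a user port should use, or -1 if there is none. */
static int pick_cpu_port(const struct port_cfg *ports, int nports, int port)
{
	int want;

	assert(port >= 0 && port < nports);
	want = ports[port].cpu;
	/* An explicit "cpu" phandle wins if it points at a CPU port. */
	if (want >= 0 && want < nports && ports[want].role == PORT_CPU)
		return want;
	/* Otherwise fall back to the first CPU port, as the binding says. */
	for (int i = 0; i < nports; i++)
		if (ports[i].role == PORT_CPU)
			return i;
	return -1;
}
```

The mapping is fixed once the tree is parsed, which is the rigidity Bjørn objects to above.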
Re: [PATCHv2 1/3] xen-netback: return correct ethtool stats
Control: fixed -1 4.0-1~exp1 On Wed, 2015-03-04 at 11:14 +, David Vrabel wrote: > Use correct pointer arithmetic to get the pointer to each stat. I think this incorrect arithmetic was also responsible for the crash reported in http://bugs.debian.org/786936 which was using the resulting stray pointer. I'll add the fix to our kernel but: David (Miller) could we also have it queued for stable please? Thanks. Reasoning: IP: [] xenvif_get_ethtool_stats+0x50/0x80 [xen_netback] (gdb) disas xenvif_get_ethtool_stats+0x50 Dump of assembler code for function xenvif_get_ethtool_stats: 0x5280 <+0>: callq 0x5285 0x5285 <+5>: mov 0x900(%rdi),%r9d 0x528c <+12>: mov $0x0,%r8 0x5293 <+19>: lea -0x1(%r9),%r10d 0x5297 <+23>: imul $0x36258,%r10,%r10 0x529e <+30>: xchg %ax,%ax 0x52a0 <+32>: test %r9d,%r9d 0x52a3 <+35>: je 0x52f8 0x52a5 <+37>: movzwl (%r8),%esi 0x52a9 <+41>: mov 0x8f8(%rdi),%rcx 0x52b0 <+48>: lea 0x0(,%rsi,8),%rax 0x52b8 <+56>: shl $0x6,%rsi 0x52bc <+60>: sub %rax,%rsi 0x52bf <+63>: lea (%rcx,%rsi,1),%rax 0x52c3 <+67>: lea 0x36258(%rcx,%r10,1),%rcx 0x52cb <+75>: add %rcx,%rsi 0x52ce <+78>: xor %ecx,%ecx 0x52d0 <+80>: add 0x36220(%rax),%rcx 0x52d7 <+87>: add $0x36258,%rax 0x52dd <+93>: cmp %rsi,%rax 0x52e0 <+96>: jne 0x52d0 0x52e2 <+98>: add $0x22,%r8 0x52e6 <+102>: mov %rcx,(%rdx) 0x52e9 <+105>: add $0x8,%rdx 0x52ed <+109>: cmp $0x0,%r8 0x52f4 <+116>: jne 0x52a0 0x52f6 <+118>: repz retq 0x52f8 <+120>: xor %ecx,%ecx 0x52fa <+122>: jmp 0x52e2 End of assembler dump. (gdb) list *xenvif_get_ethtool_stats+0x50 0x52d0 is in xenvif_get_ethtool_stats (/build/linux-RGM_Ed/linux-3.16.7-ckt9/drivers/net/xen-netback/interface.c:349). ... and in the Debian kernel interface.c:349 is the accum += line from the patch. Ian.
> > Signed-off-by: David Vrabel > --- > drivers/net/xen-netback/interface.c |3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/net/xen-netback/interface.c > b/drivers/net/xen-netback/interface.c > index f38227a..3aa8648 100644 > --- a/drivers/net/xen-netback/interface.c > +++ b/drivers/net/xen-netback/interface.c > @@ -340,12 +340,11 @@ static void xenvif_get_ethtool_stats(struct net_device > *dev, > unsigned int num_queues = vif->num_queues; > int i; > unsigned int queue_index; > - struct xenvif_stats *vif_stats; > > for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) { > unsigned long accum = 0; > for (queue_index = 0; queue_index < num_queues; ++queue_index) { > - vif_stats = &vif->queues[queue_index].stats; > + void *vif_stats = &vif->queues[queue_index].stats; > accum += *(unsigned long *)(vif_stats + > xenvif_stats[i].offset); > } > data[i] = accum;
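The bug is C pointer scaling: adding a byte offset to a `struct xenvif_stats *` advances by offset * sizeof(struct xenvif_stats) bytes. In Ian's listing, `shl $0x6,%rsi` followed by `sub %rax,%rsi` (with %rax = 8 * %rsi) multiplies the offset by 56, presumably the size of the stats structure. The sketch below uses a hypothetical three-counter `struct stats` to show both forms. It does the byte arithmetic through `char *`, since arithmetic on `void *` as in the patch is a GNU C extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xenvif_stats. */
struct stats {
	unsigned long rx_gso_checksum_fixup;
	unsigned long tx_zerocopy_sent;
	unsigned long tx_zerocopy_success;
};

/* Buggy form: `s + off` is scaled by sizeof(struct stats). */
static const unsigned long *stat_ptr_buggy(const struct stats *s, size_t off)
{
	return (const unsigned long *)(s + off);
}

/* Fixed form: byte arithmetic through a char pointer. */
static const unsigned long *stat_ptr_fixed(const struct stats *s, size_t off)
{
	return (const unsigned long *)((const char *)s + off);
}

/* Mirrors the driver's inner loop: sum one counter across all queues. */
static unsigned long sum_stat(const struct stats *queues, unsigned int nq,
			      size_t off)
{
	unsigned long accum = 0;

	assert(off % sizeof(unsigned long) == 0);
	for (unsigned int q = 0; q < nq; q++)
		accum += *stat_ptr_fixed(&queues[q], off);
	return accum;
}
```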
[PATCH v2] can: mcp251x: not correct register address
v2: fix of corrupted patch This patch corrects addresses of acceptance filters. These registers are not in use, but values should be correct. Tested with MCP2515 and am3352 and also checked datasheets for MCP2515 and MCP2510. Signed-off-by: Tomas Krcka --- drivers/net/can/spi/mcp251x.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c index bf63fee..c1a95a3 100644 --- a/drivers/net/can/spi/mcp251x.c +++ b/drivers/net/can/spi/mcp251x.c @@ -190,10 +190,11 @@ #define RXBEID0_OFF 4 #define RXBDLC_OFF 5 #define RXBDAT_OFF 6 -#define RXFSIDH(n) ((n) * 4) -#define RXFSIDL(n) ((n) * 4 + 1) -#define RXFEID8(n) ((n) * 4 + 2) -#define RXFEID0(n) ((n) * 4 + 3) +#define RXFSID(n) ((n < 3) ? 0 : 4) +#define RXFSIDH(n) ((n) * 4 + RXFSID(n)) +#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n)) +#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n)) +#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n)) #define RXMSIDH(n) ((n) * 4 + 0x20) #define RXMSIDL(n) ((n) * 4 + 0x21) #define RXMEID8(n) ((n) * 4 + 0x22) -- 1.7.5.4
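The effect of the new RXFSID() term is easiest to see by evaluating the macros. Filters 0-2 keep their addresses (0x00-0x0B), while filters 3-5 move up by four bytes and start at 0x10, skipping the four registers at 0x0C-0x0F. The sketch copies the patched macros, additionally parenthesising the argument of RXFSID() as ordinary macro hygiene; `rxf_base()` is a hypothetical helper.

```c
#include <assert.h>

/* Patched macros, with the argument of RXFSID() parenthesised. */
#define RXFSID(n)  (((n) < 3) ? 0 : 4)
#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))

/* Hypothetical helper: first register (SIDH) of acceptance filter n. */
static int rxf_base(int n)
{
	assert(n >= 0 && n <= 5);  /* RXF0..RXF5 */
	return RXFSIDH(n);
}
```

For example, filter 3 now starts at 3 * 4 + 4 = 0x10 instead of 0x0C.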
Re: [PATCH net-next] bpf: add missing rcu protection when releasing programs from prog_array
On 05/30/2015 01:22 AM, Alexei Starovoitov wrote: ... Like __sk_filter_release() and __bpf_prog_release() should be removed. The whole filter cleanup procedure needs to be simplified a bit, got a bit too complicated over time, agreed. Of course, it's a grey line when to introduce a helper and when not to, but just because two lines are close enough between two functions it doesn't mean that helper is warranted. In this bpf_prog_put() case I think helper is not needed _today_. If it grows, we'll reconsider. Yes, that's what I meant.
Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware
Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote: >On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko wrote: >> Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote: >>>On Tue, May 19, 2015 at 1:28 PM, David Miller wrote: From: Andy Gospodarek Date: Tue, 19 May 2015 15:47:32 -0400 > Are you actually saying that if users complain loudly enough about > the current behavior (not the change Roopa has proposed) that you > would be open to considering a change the current behavior? I am saying that we have a contract with users not to break existing behavior. Full stop. >>> >>>After rehearing David's argument, we should probably explore option d) >>>which is a refinement on the fib_offload_disable mechanism we have >>>today. fib_offload_disable is global for all routes. Once we hit a >>>HW install problem, the global flag is set and all routes fallback to >>>SW. We did this because we can't allow the failed route to exist in >>>SW and not in HW because it could mess up LPM searches (HW could hit >>>on a lesser prefix even when SW has the true LPM, because HW gets >>>first shot at match). The refinement on fib_offload_disable is this: >>>make it per-related-prefix rather than global, and on a HW install >>>problem, set the flag for the related-prefix and uninstall only those >>>routes from HW. Related-prefix (is there a correct term for this?) >>>are routes to the same dst addr but with different prefix lengths. I >>>haven't parsed the fib_trie structure to see how routes are organized, >>>but I suspect since it's optimized for lookup the related-prefix >>>tracking is already there and we can build on that. >> >> This looks interesting. However, I'm not sure that it is acceptable for >> user to experience this hw evict of "random entries". User knows what >> entries are essential to have in hw. With your solution, I can see no way >> user can actually say what should be offloaded or not. Kernel just >> automagically decides. 
> >The default eviction policy could be based on RTA_PRIORITY: evict >lower priority routes first. It would be up to the device driver to >decide between two routes of same priority. > >To help device driver make the decision, we could have eviction policy options: > >Priority-base (default) >Prefer IPv6 over IPv4 >Prefer IPv4 over IPv6 >Prefer single path over multipath >Prefer longer prefix lengths over shorter >Optimize for resource utilization > >These are portable across different switches. They're in terms a >user understands. It's up to the device driver which truly >understands the device constraints to translates the user's eviction >policy choices into something that makes sense to that device. This sounds tempting... Do you plan to throw in some patches, or should I take care of that?
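One way a driver might turn the default priority-based policy into an eviction order is a qsort() comparator like the one below. This is only a sketch: `struct hw_route` and `sort_for_eviction()` are hypothetical, the code assumes a numerically larger RTA_PRIORITY means a lower-priority route, and it breaks ties using the "prefer longer prefix lengths" option by evicting shorter prefixes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for a route currently offloaded to the switch. */
struct hw_route {
	unsigned int priority;    /* RTA_PRIORITY; assumed: larger = lower priority */
	unsigned char prefix_len;
};

/* qsort() comparator: routes to evict first sort first. */
static int evict_order(const void *a, const void *b)
{
	const struct hw_route *x = a, *y = b;

	if (x->priority != y->priority)
		return x->priority > y->priority ? -1 : 1;
	/* Same priority: "prefer longer prefix lengths", so drop shorter first. */
	if (x->prefix_len != y->prefix_len)
		return x->prefix_len < y->prefix_len ? -1 : 1;
	return 0;
}

static void sort_for_eviction(struct hw_route *routes, size_t n)
{
	assert(routes != NULL || n == 0);
	if (n > 1)
		qsort(routes, n, sizeof(*routes), evict_order);
}
```

The other listed options would become additional tie-breakers in the same comparator, chosen by the device driver.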
Re: [PATCH] can: mcp251x: not correct register address
You are right, sorry for that. I'll send v2. Thanks. 2015-05-30 9:41 GMT+02:00 Jakub Kicinski : > On Mon, 25 May 2015 08:57:48 +0200, Tomas Krcka wrote: >> This patch corrects addresses of acceptance filters. >> These registers are not in use, but values should be correct. >> Tested with MCP2515 and am3352 and also checked datasheets for MCP2515 >> and MCP2510. >> >> Signed-off-by: Tomas Krcka >> >> --- >> drivers/net/can/spi/mcp251x.c |9 + >> 1 files changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c >> index bf63fee..c1a95a3 100644 >> --- a/drivers/net/can/spi/mcp251x.c >> +++ b/drivers/net/can/spi/mcp251x.c >> @@ -190,10 +190,11 @@ >> #define RXBEID0_OFF 4 >> #define RXBDLC_OFF 5 >> #define RXBDAT_OFF 6 >> -#define RXFSIDH(n) ((n) * 4) >> -#define RXFSIDL(n) ((n) * 4 + 1) >> -#define RXFEID8(n) ((n) * 4 + 2) >> -#define RXFEID0(n) ((n) * 4 + 3) >> +#define RXFSID(n) ((n < 3) ? 0 : 4) >> +#define RXFSIDH(n) ((n) * 4 + RXFSID(n)) >> +#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n)) >> +#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n)) >> +#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n)) >> #define RXMSIDH(n) ((n) * 4 + 0x20) >> #define RXMSIDL(n) ((n) * 4 + 0x21) >> #define RXMEID8(n) ((n) * 4 + 0x22) > > I think your patch was corrupted. It doesn't apply because you have > extra space before each surviving #define.
Re: [PATCH] can: mcp251x: not correct register address
On Mon, 25 May 2015 08:57:48 +0200, Tomas Krcka wrote: > This patch corrects addresses of acceptance filters. > These registers are not in use, but values should be correct. > Tested with MCP2515 and am3352 and also checked datasheets for MCP2515 > and MCP2510. > > Signed-off-by: Tomas Krcka > > --- > drivers/net/can/spi/mcp251x.c |9 + > 1 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c > index bf63fee..c1a95a3 100644 > --- a/drivers/net/can/spi/mcp251x.c > +++ b/drivers/net/can/spi/mcp251x.c > @@ -190,10 +190,11 @@ > #define RXBEID0_OFF 4 > #define RXBDLC_OFF 5 > #define RXBDAT_OFF 6 > -#define RXFSIDH(n) ((n) * 4) > -#define RXFSIDL(n) ((n) * 4 + 1) > -#define RXFEID8(n) ((n) * 4 + 2) > -#define RXFEID0(n) ((n) * 4 + 3) > +#define RXFSID(n) ((n < 3) ? 0 : 4) > +#define RXFSIDH(n) ((n) * 4 + RXFSID(n)) > +#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n)) > +#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n)) > +#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n)) > #define RXMSIDH(n) ((n) * 4 + 0x20) > #define RXMSIDL(n) ((n) * 4 + 0x21) > #define RXMEID8(n) ((n) * 4 + 0x22) I think your patch was corrupted. It doesn't apply because you have extra space before each surviving #define.