Re: [PATCH net-next 0/3] net: systemport: misc improvements

2015-05-30 Thread David Miller
From: Florian Fainelli 
Date: Thu, 28 May 2015 15:24:41 -0700

> These patches are highly inspired by changes from Petri on bcmgenet, last
> patch is a misc fix that I had pending for a while, but is not a candidate
> for 'net' at this point.

Applied, thanks Florian.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] ipv6: drop unneeded goto

2015-05-30 Thread David Miller
From: Julia Lawall 
Date: Thu, 28 May 2015 23:02:17 +0200

> From: Julia Lawall 
> 
> Delete jump to a label on the next line, when that label is not
> used elsewhere.
> 
> A simplified version of the semantic patch that makes this change is as
> follows: (http://coccinelle.lip6.fr/)
 ...
> Also remove the unnecessary ret variable.
> 
> Signed-off-by: Julia Lawall 

Applied, thanks Julia.


Re: [PATCH net 0/3] bna: misc bugfixes

2015-05-30 Thread David Miller
From: Ivan Vecera 
Date: Thu, 28 May 2015 23:10:05 +0200

> These patches fix several bugs found during device initialization debugging.

Applied, thanks Ivan.


Re: [PATCH v4 net-next 00/11] net: Increase inputs to flow_keys hashing

2015-05-30 Thread David Miller
From: Tom Herbert 
Date: Thu, 28 May 2015 11:18:57 -0700

> This patch set adds new fields to the flow_keys structure and hashes
> over these fields to get a better flow hash. In particular, these
> patches now include hashing over the full IPv6 addresses in order
> to defend against address spoofing that always results in the
> same hash. The new input also includes the Ethertype, L4 protocol,
> VLAN, flow label, GRE keyid, and MPLS entropy label.

Looks like one more respin needed of this based upon Jiri's feedback.
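
For readers following the thread, here is a minimal userspace sketch of the idea described in the cover letter: hash over the full IPv6 addresses plus the extra fields, so that two flows differing only in the low address bytes do not collide by construction. The struct example_flow_keys layout and the FNV-1a mixing step are assumptions chosen for illustration; they are not the kernel's flow_keys definition or its hash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for flow_keys (illustration only). */
struct example_flow_keys {
	uint8_t  src[16];       /* full IPv6 source address */
	uint8_t  dst[16];       /* full IPv6 destination address */
	uint32_t flow_label;
	uint32_t gre_keyid;
	uint32_t mpls_entropy;
	uint16_t ethertype;
	uint16_t vlan_id;
	uint8_t  l4_proto;
};

/* FNV-1a mixing step, used here only as a simple stand-in hash. */
static uint32_t example_mix(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash every field individually so struct padding never enters the hash. */
uint32_t example_flow_hash(const struct example_flow_keys *k, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;

	h = example_mix(h, k->src, sizeof(k->src));
	h = example_mix(h, k->dst, sizeof(k->dst));
	h = example_mix(h, &k->flow_label, sizeof(k->flow_label));
	h = example_mix(h, &k->gre_keyid, sizeof(k->gre_keyid));
	h = example_mix(h, &k->mpls_entropy, sizeof(k->mpls_entropy));
	h = example_mix(h, &k->ethertype, sizeof(k->ethertype));
	h = example_mix(h, &k->vlan_id, sizeof(k->vlan_id));
	h = example_mix(h, &k->l4_proto, sizeof(k->l4_proto));
	return h;
}
```

If only a folded or truncated address were hashed, an attacker could vary the ignored bytes and always land on the same hash; feeding all 16 bytes in closes that particular hole.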


Re: [PATCH] net: thunderx: add 64-bit dependency

2015-05-30 Thread David Miller
From: Arnd Bergmann 
Date: Thu, 28 May 2015 16:00:46 +0200

> The thunderx ethernet driver fails to build on architectures
> that do not have an atomic readq() and writeq() function for
> 64-bit PCI bus access:
> 
> drivers/net/ethernet/cavium/thunder/thunder_bgx.c: In function 'bgx_reg_read':
> include/asm-generic/io.h:195:23: error: implicit declaration of function 'readq' [-Werror=implicit-function-declaration]
> 
> It seems impossible to get this driver to work on most 32-bit
> hardware, so it's better to add an explicit dependency, in
> order to let us keep building 'allmodconfig' kernels on
> all architectures.
> 
> As the driver is meant for the internal hardware on an arm64 SoC, this
> is not a problem for usability. Allowing the build on all 64-bit
> architectures rather than just CONFIG_ARM64 on the other hand means that
> we get the benefit of build testing on x86.
> 
> Signed-off-by: Arnd Bergmann 

Applied, thanks Arnd.
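
As background on why the lack of an atomic readq() matters, the following is a small userspace simulation, not driver code. It assumes a hypothetical 64-bit device counter (struct fake_reg) that keeps advancing between accesses, and shows how a 64-bit value assembled from two 32-bit reads can be a value the register never held.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 64-bit device counter standing in for an MMIO register. */
struct fake_reg {
	uint64_t value;
	unsigned int ticks_per_access;  /* counter advance after each 32-bit access */
};

/* Read one 32-bit half, then let the simulated hardware advance. */
static uint32_t fake_read32(struct fake_reg *r, int high)
{
	uint32_t v = high ? (uint32_t)(r->value >> 32) : (uint32_t)r->value;

	r->value += r->ticks_per_access;
	return v;
}

/* Non-atomic 64-bit read: low half first, then high half. */
uint64_t split_readq(struct fake_reg *r)
{
	uint64_t lo = fake_read32(r, 0);
	uint64_t hi = fake_read32(r, 1);

	return lo | (hi << 32);
}
```

Starting from 0x00000000ffffffff with one tick per access, the low half reads 0xffffffff, the counter rolls over to 0x0000000100000000, and the high half then reads 1, so the combined result is 0x00000001ffffffff, a torn value.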


Re: pull-request: mac80211 2015-05-28

2015-05-30 Thread David Miller
From: Johannes Berg 
Date: Thu, 28 May 2015 14:45:49 +0200

> Please excuse the quick succession with another pull request - Ben
> pointed out to me that a fix I'd applied on -next is actually needed on
> 4.1 - we'll have to live with it being in both I suppose. Sorry about
> that.

Pulled, thanks Johannes.


Re: [PATCH net-next V1 0/4] mlx4 driver update, May 28, 2015

2015-05-30 Thread David Miller
From: Or Gerlitz 
Date: Sun, 31 May 2015 09:30:14 +0300

> The 1st patch fixes an issue with a function running DPDK overriding 
> broadcast steering rules set by other functions. Please add this one 
> to your -stable queue.
> 
> The rest of the series from Matan and Ido deals with scaling the number 
> of IRQs that serve RoCE applications to be on par with the Ethernet driver.

Series applied, thanks Or.


Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware

2015-05-30 Thread Scott Feldman
On Sat, May 30, 2015 at 9:19 PM, John Fastabend wrote:
> On 05/30/2015 02:00 AM, Jiri Pirko wrote:
>>
>> Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:
>>>
>>> On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko  wrote:
>>>>
>>>> Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:
>>>>>
>>>>> On Tue, May 19, 2015 at 1:28 PM, David Miller wrote:
>>>>>>
>>>>>> From: Andy Gospodarek
>>>>>> Date: Tue, 19 May 2015 15:47:32 -0400
>>>>>>
>>>>>>> Are you actually saying that if users complain loudly enough about
>>>>>>> the current behavior (not the change Roopa has proposed) that you
>>>>>>> would be open to considering a change to the current behavior?
>>>>>>
>>>>>>
>>>>>> I am saying that we have a contract with users not to break existing
>>>>>> behavior.  Full stop.
>>>>>
>>>>>
>>>>> After rehearing David's argument, we should probably explore option d)
>>>>> which is a refinement on the fib_offload_disable mechanism we have
>>>>> today.  fib_offload_disable is global for all routes.  Once we hit a
>>>>> HW install problem, the global flag is set and all routes fall back to
>>>>> SW.  We did this because we can't allow the failed route to exist in
>>>>> SW and not in HW because it could mess up LPM searches (HW could hit
>>>>> on a lesser prefix even when SW has the true LPM, because HW gets
>>>>> first shot at match).  The refinement on fib_offload_disable is this:
>>>>> make it per-related-prefix rather than global, and on a HW install
>>>>> problem, set the flag for the related-prefix and uninstall only those
>>>>> routes from HW.  Related-prefix (is there a correct term for this?)
>>>>> are routes to the same dst addr but with different prefix lengths.  I
>>>>> haven't parsed the fib_trie structure to see how routes are organized,
>>>>> but I suspect since it's optimized for lookup the related-prefix
>>>>> tracking is already there and we can build on that.
>>>>
>>>>
>>>> This looks interesting. However, I'm not sure that it is acceptable for
>>>> user to experience this hw evict of "random entries". User knows what
>>>> entries are essential to have in hw. With your solution, I can see no
>>>> way user can actually say what should be offloaded or not. Kernel just
>>>> automagically decides.
>>>
>>>
>>> The default eviction policy could be based on RTA_PRIORITY: evict
>>> lower priority routes first.  It would be up to the device driver to
>>> decide between two routes of same priority.
>>>
>>> To help device driver make the decision, we could have eviction policy
>>> options:
>>>
>>> Priority-based (default)
>>> Prefer IPv6 over IPv4
>>> Prefer IPv4 over IPv6
>>> Prefer single path over multipath
>>> Prefer longer prefix lengths over shorter
>>> Optimize for resource utilization
>>>
>>> These are portable across different switches.  They're in terms a
>>> user understands.  It's up to the device driver, which truly
>>> understands the device constraints, to translate the user's eviction
>>> policy choices into something that makes sense to that device.
>>
>>
>> This sounds tempting... You plan to throw in some patches, or should I
>> take care of that?
>>
>
> This is encoding specific policies into the kernel. I was hoping to
> avoid this and let user space develop whatever policy it wants. If you
> use Jiri's proposed NLM_F_SKIP_{KERNEL|OFFLOAD} flags you get this.
>
> Also I don't understand the "truly  understands the device constraints"
> comment. We can export a model of the device and know how many rules
> of each type will fit exactly into the table. This doesn't seem like
> much of a problem to me. In fact the driver developer should know this
> anyway.
>
> Part of my motivation here is I really don't want to get stuck with a
> case where each driver writer gets to translate the eviction policy
> onto their device in some device specific and slightly different way.

But this is _exactly_ what I want.  Here's why: my claim is it will be
impossible for us (device vendors) to define a universal set of
resource constraints that works for all devices from all vendors.  I
was kind of hoping some vendor would throw out a set to get us
started.  Ok, I'll start with rocker: rocker will enforce in the
device these constraints listed below.  There will be a device command
to query the raw constraints.   So here goes:

VLAN table max entries: 16K // a VLAN on a port takes one entry
Term MAC table max entries: no limit
Bridging table:
 Unicast max entries: 12K
 Multicast max entries: 4K
Unicast Routing table (shared for v4 and v6 entries):
 Prefix max slots: 16K
 IPv4 route takes one slot
 IPv6 prefix len <= 64 route takes two slots
 IPv6 prefix len > 64 takes four slots
Nexthop max slots: 4K
 Max ECMP width: 32
 Each nexthop MAC takes one slot, but there is a stride of 4 slots
Multicast Routing table (shared for v4 and v6 entries):
(same as unicast routing, except max slots are 1/2 as big)
ACL
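
To make the unicast routing constraints listed above concrete, here is a rough accounting sketch. It assumes "16K" means 16384 slots and uses hypothetical names (example_route_slots, example_route_install); it is not rocker driver code, only the slot arithmetic from the list: one slot per IPv4 route, two for an IPv6 route with prefix length <= 64, four for longer IPv6 prefixes.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: "16K" prefix slots is taken to mean 16 * 1024. */
#define EXAMPLE_PREFIX_MAX_SLOTS (16 * 1024)

enum example_family { EXAMPLE_V4, EXAMPLE_V6 };

/* Hypothetical shared v4/v6 unicast table; only slot usage is tracked. */
struct example_route_table {
	unsigned int used_slots;
};

/* Slot cost of a route, per the constraints quoted above. */
unsigned int example_route_slots(enum example_family fam,
				 unsigned int prefix_len)
{
	if (fam == EXAMPLE_V4)
		return 1;
	return prefix_len <= 64 ? 2 : 4;
}

/* Returns true and charges the slots if the route still fits. */
bool example_route_install(struct example_route_table *t,
			   enum example_family fam, unsigned int prefix_len)
{
	unsigned int need = example_route_slots(fam, prefix_len);

	if (t->used_slots + need > EXAMPLE_PREFIX_MAX_SLOTS)
		return false;
	t->used_slots += need;
	return true;
}
```

With two slots left, a /128 IPv6 route is rejected while a /64 route still fits, which is the kind of device-specific outcome a generic eviction policy would have to account for.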

[PATCH] bridge: fix br_multicast_query_expired() bug

2015-05-30 Thread David Miller

From: Eric Dumazet 

br_multicast_query_expired() querier argument is a pointer to
a struct bridge_mcast_querier :

struct bridge_mcast_querier {
	struct br_ip addr;
	struct net_bridge_port __rcu	*port;
};

Intent of the code was to clear port field, not the pointer to querier.

Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port")
Signed-off-by: Eric Dumazet 
Acked-by: Thadeu Lima de Souza Cascardo 
Acked-by: Linus Lüssing 
Cc: Linus Lüssing 
Cc: Steinar H. Gunderson 
Signed-off-by: David S. Miller 
---

Posting this to the list so it gets into patchwork and I can properly
queue it up for -stable.

 net/bridge/br_multicast.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index a3abe6e..22fd041 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1822,7 +1822,7 @@ static void br_multicast_query_expired(struct net_bridge *br,
 	if (query->startup_sent < br->multicast_startup_query_count)
 		query->startup_sent++;
 
-	RCU_INIT_POINTER(querier, NULL);
+	RCU_INIT_POINTER(querier->port, NULL);
 	br_multicast_send_query(br, NULL, query);
 	spin_unlock(&br->multicast_lock);
 }
-- 
2.1.0



Re: [PATCH net] bridge: fix br_multicast_query_expired() bug

2015-05-30 Thread David Miller
From: Eric Dumazet 
Date: Thu, 28 May 2015 04:42:54 -0700

> From: Eric Dumazet 
> 
> br_multicast_query_expired() querier argument is a pointer to
> a struct bridge_mcast_querier :
> 
> struct bridge_mcast_querier {
> 	struct br_ip addr;
> 	struct net_bridge_port __rcu	*port;
> };
> 
> Intent of the code was to clear port field, not the pointer to querier.
> 
> Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port")
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.
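
The class of bug fixed here can be reproduced outside the kernel. The sketch below uses hypothetical example_* types and models RCU_INIT_POINTER() as a plain assignment, which is an assumption made purely for illustration. Assigning NULL to the function's own pointer argument changes nothing the caller can see; only clearing the field through the pointer does.

```c
#include <assert.h>
#include <stddef.h>

struct example_port {
	int id;
};

/* Hypothetical stand-in for struct bridge_mcast_querier. */
struct example_querier {
	int addr;
	struct example_port *port;
};

/* Buggy variant: overwrites only the local copy of the pointer. */
void example_clear_buggy(struct example_querier *querier)
{
	querier = NULL;
	(void)querier;
}

/* Fixed variant: clears the port field of the caller's object. */
void example_clear_fixed(struct example_querier *querier)
{
	querier->port = NULL;
}
```

After the buggy call the caller's port pointer is still set; after the fixed call it is NULL, which matches the stated intent of the patch.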


[PATCH net-next V1 3/4] net/mlx4_core: Move affinity hints to mlx4_core ownership

2015-05-30 Thread Or Gerlitz
From: Ido Shamay 

Now that EQ management is the sole responsibility of mlx4_core,
the IRQ affinity hints configuration should be in its hands as well.
request_irq is called only once by the first consumer (maybe mlx4_ib),
so mlx4_en passes the affinity mask too late. We also need to request
vectors according to the cores we want to run on.

mlx4_core distribution of IRQs to cores is straightforward:
EQ(i)->IRQ will set affinity hint to core i.
Consumers need to request EQ vectors according to their core
considerations (NUMA).

Signed-off-by: Ido Shamay 
Signed-off-by: Matan Barak 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c |   10 +---
 drivers/net/ethernet/mellanox/mlx4/eq.c|   21 
 drivers/net/ethernet/mellanox/mlx4/main.c  |   36 
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |1 +
 4 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index d71c567..63769df 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -114,7 +114,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
if (cq->is_tx == RX) {
if (!mlx4_is_eq_vector_valid(mdev->dev, priv->port,
 cq->vector)) {
-   cq->vector = cq_idx;
+   cq->vector = cpumask_first(priv->rx_ring[cq->ring]->affinity_mask);
 
err = mlx4_assign_eq(mdev->dev, priv->port,
 &cq->vector);
@@ -160,13 +160,6 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
netif_napi_add(cq->dev, &cq->napi, mlx4_en_poll_tx_cq,
   NAPI_POLL_WEIGHT);
} else {
-   struct mlx4_en_rx_ring *ring = priv->rx_ring[cq->ring];
-
-   err = irq_set_affinity_hint(cq->mcq.irq,
-   ring->affinity_mask);
-   if (err)
-   mlx4_warn(mdev, "Failed setting affinity hint\n");
-
netif_napi_add(cq->dev, &cq->napi, mlx4_en_poll_rx_cq, 64);
napi_hash_add(&cq->napi);
}
@@ -205,7 +198,6 @@ void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
if (!cq->is_tx) {
napi_hash_del(&cq->napi);
synchronize_rcu();
-   irq_set_affinity_hint(cq->mcq.irq, NULL);
}
netif_napi_del(&cq->napi);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 2e6fc6a..1116882 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -221,6 +221,20 @@ static void mlx4_slave_event(struct mlx4_dev *dev, int slave,
slave_event(dev, slave, eqe);
 }
 
+static void mlx4_set_eq_affinity_hint(struct mlx4_priv *priv, int vec)
+{
+   int hint_err;
+   struct mlx4_dev *dev = &priv->dev;
+   struct mlx4_eq *eq = &priv->eq_table.eq[vec];
+
+   if (!eq->affinity_mask || cpumask_empty(eq->affinity_mask))
+   return;
+
+   hint_err = irq_set_affinity_hint(eq->irq, eq->affinity_mask);
+   if (hint_err)
+   mlx4_warn(dev, "irq_set_affinity_hint failed, err %d\n", hint_err);
+}
+
 int mlx4_gen_pkey_eqe(struct mlx4_dev *dev, int slave, u8 port)
 {
struct mlx4_eqe eqe;
@@ -1092,6 +1106,10 @@ static void mlx4_free_irqs(struct mlx4_dev *dev)
 
for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i)
if (eq_table->eq[i].have_irq) {
+   free_cpumask_var(eq_table->eq[i].affinity_mask);
+#if defined(CONFIG_SMP)
+   irq_set_affinity_hint(eq_table->eq[i].irq, NULL);
+#endif
free_irq(eq_table->eq[i].irq, eq_table->eq + i);
eq_table->eq[i].have_irq = 0;
}
@@ -1483,6 +1501,9 @@ int mlx4_assign_eq(struct mlx4_dev *dev, u8 port, int *vector)
clear_bit(*prequested_vector, priv->msix_ctl.pool_bm);
*prequested_vector = -1;
} else {
+#if defined(CONFIG_SMP)
+   mlx4_set_eq_affinity_hint(priv, *prequested_vector);
+#endif
eq_set_ci(&priv->eq_table.eq[*prequested_vector], 1);
priv->eq_table.eq[*prequested_vector].have_irq = 1;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 3ec5113..0dbd704 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2481,6 +2481,36 @@ err_uar_table_free:
return err;
 }
 
+static int mlx4_init_affinity_hint(struct mlx4_dev *dev, int port, int eqn)
+{
+   int requested_cp

[PATCH net-next V1 1/4] net/mlx4_core: Demote simple multicast and broadcast flow steering rules

2015-05-30 Thread Or Gerlitz
From: Matan Barak 

In SRIOV, when simple (i.e., Ethernet L2 only) flow steering rules are
created, always create them at MLX4_DOMAIN_NIC priority (instead of
the real priority the function created them at). This is done in order
to let multiple functions add broadcast/multicast rules without
affecting other functions, which is necessary for DPDK in SRIOV.

Signed-off-by: Matan Barak 
Signed-off-by: Or Gerlitz 
---
 drivers/infiniband/hw/mlx4/main.c  |4 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   23 
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index cc64400..8c96c71 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1090,7 +1090,7 @@ static int __mlx4_ib_create_flow(struct ib_qp *qp, struct ib_flow_attr *flow_att
 
ret = mlx4_cmd_imm(mdev->dev, mailbox->dma, reg_id, size >> 2, 0,
   MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
-  MLX4_CMD_NATIVE);
+  MLX4_CMD_WRAPPED);
if (ret == -ENOMEM)
pr_err("mcg table is full. Fail to register network rule.\n");
else if (ret == -ENXIO)
@@ -1107,7 +1107,7 @@ static int __mlx4_ib_destroy_flow(struct mlx4_dev *dev, u64 reg_id)
int err;
err = mlx4_cmd(dev, reg_id, 0, 0,
   MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
-  MLX4_CMD_NATIVE);
+  MLX4_CMD_WRAPPED);
if (err)
pr_err("Fail to detach network rule. registration id = 0x%llx\n",
   reg_id);
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 15ec081..ab48386 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -3973,6 +3973,22 @@ static int validate_eth_header_mac(int slave, struct _rule_hw *eth_header,
return 0;
 }
 
+static void handle_eth_header_mcast_prio(struct mlx4_net_trans_rule_hw_ctrl *ctrl,
+struct _rule_hw *eth_header)
+{
+   if (is_multicast_ether_addr(eth_header->eth.dst_mac) ||
+   is_broadcast_ether_addr(eth_header->eth.dst_mac)) {
+   struct mlx4_net_trans_rule_hw_eth *eth =
+   (struct mlx4_net_trans_rule_hw_eth *)eth_header;
+   struct _rule_hw *next_rule = (struct _rule_hw *)(eth + 1);
+   bool last_rule = next_rule->size == 0 && next_rule->id == 0 &&
+   next_rule->rsvd == 0;
+
+   if (last_rule)
+   ctrl->prio = cpu_to_be16(MLX4_DOMAIN_NIC);
+   }
+}
+
 /*
  * In case of missing eth header, append eth header with a MAC address
  * assigned to the VF.
@@ -4125,6 +4141,12 @@ int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
rule_header = (struct _rule_hw *)(ctrl + 1);
header_id = map_hw_to_sw_id(be16_to_cpu(rule_header->id));
 
+   if (header_id == MLX4_NET_TRANS_RULE_ID_ETH)
+   handle_eth_header_mcast_prio(ctrl, rule_header);
+
+   if (slave == dev->caps.function)
+   goto execute;
+
switch (header_id) {
case MLX4_NET_TRANS_RULE_ID_ETH:
if (validate_eth_header_mac(slave, rule_header, rlist)) {
@@ -4151,6 +4173,7 @@ int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
goto err_put;
}
 
+execute:
err = mlx4_cmd_imm(dev, inbox->dma, &vhcr->out_param,
   vhcr->in_modifier, 0,
   MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
-- 
1.7.1
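
For reference, the decision made by handle_eth_header_mcast_prio() in the patch above can be sketched in plain C. The helpers below reimplement the usual Ethernet address tests (multicast is the low bit of the first octet, broadcast is all 0xff) under hypothetical example_* names, and EXAMPLE_PRIO_NIC is a placeholder value, not the real MLX4_DOMAIN_NIC constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder priority; the real MLX4_DOMAIN_NIC value is not assumed. */
#define EXAMPLE_PRIO_NIC 1u

/* Group (multicast) bit: least significant bit of the first octet. */
bool example_is_multicast(const uint8_t mac[6])
{
	return (mac[0] & 0x01) != 0;
}

/* Broadcast: all six octets are 0xff. */
bool example_is_broadcast(const uint8_t mac[6])
{
	for (int i = 0; i < 6; i++)
		if (mac[i] != 0xff)
			return false;
	return true;
}

/*
 * An L2-only rule (no further headers after the Ethernet one) aimed at a
 * multicast or broadcast destination is demoted to the NIC priority;
 * every other rule keeps the priority it was requested with.
 */
unsigned int example_effective_prio(const uint8_t dst_mac[6], bool l2_only,
				    unsigned int requested_prio)
{
	if (l2_only &&
	    (example_is_multicast(dst_mac) || example_is_broadcast(dst_mac)))
		return EXAMPLE_PRIO_NIC;
	return requested_prio;
}
```

Note that the broadcast address also has the multicast bit set, so the second test is redundant in this sketch; it is kept only to mirror the shape of the condition in the patch.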



[PATCH net-next V1 2/4] net/mlx4: Add EQ pool

2015-05-30 Thread Or Gerlitz
From: Matan Barak 

Previously, mlx4_en allocated EQs and used them exclusively.
This affected RoCE performance, as applications which are
event-sensitive were limited to using only the legacy EQs.

Change that by introducing an EQ pool. This pool is managed
by mlx4_core. EQs are assigned to ports (when there is a limited
number of EQs, multiple ports could be assigned to the same EQs).

An exception to this rule is the ASYNC EQ which handles various events.

Legacy EQs are completely removed as all EQs could be shared.

When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
EQ serving on a specific port. The core driver calculates which
EQ should be assigned to that request.

Because IRQs are shared between IB and Ethernet modules, their
names only include the PCI device BDF address.

Signed-off-by: Matan Barak 
Signed-off-by: Ido Shamay 
Signed-off-by: Or Gerlitz 
---
 drivers/infiniband/hw/mlx4/main.c  |   71 ++
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |1 -
 drivers/net/ethernet/mellanox/mlx4/cq.c|   10 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c |   48 ++--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |7 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   13 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c|  353 ++--
 drivers/net/ethernet/mellanox/mlx4/main.c  |   74 --
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |   11 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |2 +-
 include/linux/mlx4/device.h|   11 +-
 11 files changed, 342 insertions(+), 259 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8c96c71..024b0f7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2041,77 +2041,52 @@ static void init_pkeys(struct mlx4_ib_dev *ibdev)
 
 static void mlx4_ib_alloc_eqs(struct mlx4_dev *dev, struct mlx4_ib_dev *ibdev)
 {
-   char name[80];
-   int eq_per_port = 0;
-   int added_eqs = 0;
-   int total_eqs = 0;
-   int i, j, eq;
-
-   /* Legacy mode or comp_pool is not large enough */
-   if (dev->caps.comp_pool == 0 ||
-   dev->caps.num_ports > dev->caps.comp_pool)
-   return;
-
-   eq_per_port = dev->caps.comp_pool / dev->caps.num_ports;
-
-   /* Init eq table */
-   added_eqs = 0;
-   mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
-   added_eqs += eq_per_port;
-
-   total_eqs = dev->caps.num_comp_vectors + added_eqs;
+   int i, j, eq = 0, total_eqs = 0;
 
-   ibdev->eq_table = kzalloc(total_eqs * sizeof(int), GFP_KERNEL);
+   ibdev->eq_table = kcalloc(dev->caps.num_comp_vectors,
+ sizeof(ibdev->eq_table[0]), GFP_KERNEL);
if (!ibdev->eq_table)
return;
 
-   ibdev->eq_added = added_eqs;
-
-   eq = 0;
-   mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB) {
-   for (j = 0; j < eq_per_port; j++) {
-   snprintf(name, sizeof(name), "mlx4-ib-%d-%d@%s",
-i, j, dev->persist->pdev->bus->name);
-   /* Set IRQ for specific name (per ring) */
-   if (mlx4_assign_eq(dev, name, NULL,
-  &ibdev->eq_table[eq])) {
-   /* Use legacy (same as mlx4_en driver) */
-   pr_warn("Can't allocate EQ %d; reverting to legacy\n", eq);
-   ibdev->eq_table[eq] =
-   (eq % dev->caps.num_comp_vectors);
-   }
-   eq++;
+   for (i = 1; i <= dev->caps.num_ports; i++) {
+   for (j = 0; j < mlx4_get_eqs_per_port(dev, i);
+j++, total_eqs++) {
+   if (i > 1 &&  mlx4_is_eq_shared(dev, total_eqs))
+   continue;
+   ibdev->eq_table[eq] = total_eqs;
+   if (!mlx4_assign_eq(dev, i,
+   &ibdev->eq_table[eq]))
+   eq++;
+   else
+   ibdev->eq_table[eq] = -1;
}
}
 
-   /* Fill the reset of the vector with legacy EQ */
-   for (i = 0, eq = added_eqs; i < dev->caps.num_comp_vectors; i++)
-   ibdev->eq_table[eq++] = i;
+   for (i = eq; i < dev->caps.num_comp_vectors;
+ibdev->eq_table[i++] = -1)
+   ;
 
/* Advertise the new number of EQs to clients */
-   ibdev->ib_dev.num_comp_vectors = total_eqs;
+   ibdev->ib_dev.num_comp_vectors = eq;
 }
 
 static void mlx4_ib_free_eqs(struct mlx4_dev *dev, struct mlx4_ib_dev *ibdev)
 {
int i;
+   int total_eqs = ibdev->ib_dev.num_comp_vectors;
 
-   /* no additional eqs were added */
+   /* no eqs were allocated */

Re: [PATCH net-next 4/4] net/mlx4_core: Make sure there are no pending async events when freeing CQ

2015-05-30 Thread Or Gerlitz

On 5/31/2015 9:23 AM, David Miller wrote:

> I agree with Sergei that one empty line is sufficient here, don't make
> it into two.
>
> Please respin with this fixed.


Sure, I prepared V1 to address that earlier today, and will send it now.

Or.



[PATCH net-next V1 4/4] net/mlx4_core: Make sure there are no pending async events when freeing CQ

2015-05-30 Thread Or Gerlitz
From: Matan Barak 

When freeing a CQ, we need to make sure there are no
asynchronous events (on the ASYNC EQ) that could
relate to this CQ before freeing it.

This is done by introducing synchronize_irq.

Signed-off-by: Matan Barak 
Signed-off-by: Ido Shamay 
Signed-off-by: Or Gerlitz 
---
 drivers/net/ethernet/mellanox/mlx4/cq.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 7431cd4..3348e64 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -369,6 +369,9 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
 		mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);
 
 	synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
+	if (priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq !=
+	    priv->eq_table.eq[MLX4_EQ_ASYNC].irq)
+		synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);
 
 	spin_lock_irq(&cq_table->lock);
 	radix_tree_delete(&cq_table->tree, cq->cqn);
-- 
1.7.1
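
The control flow added by this patch is small enough to model in userspace. In the sketch below, example_synchronize_irq() is a counting stub that stands in for the kernel's synchronize_irq(), and the IRQ numbers are arbitrary; the point is only that the ASYNC EQ's IRQ is waited on a second time solely when it differs from the CQ's completion IRQ.

```c
#include <assert.h>

/* Counts simulated synchronize_irq() calls (stub, not the kernel API). */
static int example_sync_calls;

static void example_synchronize_irq(unsigned int irq)
{
	(void)irq;
	example_sync_calls++;
}

/*
 * Wait for handlers on the CQ's completion IRQ, and on the ASYNC EQ's
 * IRQ as well unless both EQs share the same IRQ line.
 * Returns the number of waits performed.
 */
int example_quiesce_cq(unsigned int cq_irq, unsigned int async_irq)
{
	example_sync_calls = 0;
	example_synchronize_irq(cq_irq);
	if (cq_irq != async_irq)
		example_synchronize_irq(async_irq);
	return example_sync_calls;
}
```

When the two IRQs are the same, a single wait covers both completion and asynchronous events, so the extra call would only add latency.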



[PATCH net-next V1 0/4] mlx4 driver update, May 28, 2015

2015-05-30 Thread Or Gerlitz
Hi Dave,

The 1st patch fixes an issue with a function running DPDK overriding 
broadcast steering rules set by other functions. Please add this one 
to your -stable queue.

The rest of the series from Matan and Ido deals with scaling the number 
of IRQs that serve RoCE applications to be on par with the Ethernet driver.

Or.

changes from V0:
 - addressed feedback from Sergei, removed extra blank line in patch #4

Ido Shamay (1):
  net/mlx4_core: Move affinity hints to mlx4_core ownership

Matan Barak (3):
  net/mlx4_core: Demote simple multicast and broadcast flow steering rules
  net/mlx4: Add EQ pool
  net/mlx4_core: Make sure there are no pending async events when freeing CQ

 drivers/infiniband/hw/mlx4/main.c  |   75 ++---
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |1 -
 drivers/net/ethernet/mellanox/mlx4/cq.c|   13 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c |   56 ++--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |7 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   13 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c|  374 
 drivers/net/ethernet/mellanox/mlx4/main.c  |  110 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |   12 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |2 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   23 ++
 include/linux/mlx4/device.h|   11 +-
 12 files changed, 428 insertions(+), 269 deletions(-)



Re: pull request: bluetooth-next 2015-05-28

2015-05-30 Thread David Miller
From: Johan Hedberg 
Date: Thu, 28 May 2015 12:31:43 +0300

> Here's a set of patches intended for 4.2. The majority of the changes
> are on the 802.15.4 side of things rather than Bluetooth related:
> 
>  - All sorts of cleanups & fixes to ieee802154 and related drivers
>  - Rework of tx power support in ieee802154 and its drivers
>  - Support for setting ieee802154 tx power through nl802154
>  - New IDs for the btusb driver
>  - Various cleanups & smaller fixes to btusb
>  - New btrtl driver for Realtek devices
>  - Fix suspend/resume for Realtek devices
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.


Re: [PATCH v3] isdn: Use ktime_t instead of 'struct timeval'

2015-05-30 Thread Tina Ruchandani
>
> This doesn't compile:
>

Oops, I sent an older version of the patch with a typo. I've corrected
this in v4 (NS_PER_SEC -> NSEC_PER_SEC). Thanks for taking a look
at this.

Tina


Re: [PATCH net-next 4/4] net/mlx4_core: Make sure there are no pending async events when freeing CQ

2015-05-30 Thread David Miller
From: Or Gerlitz 
Date: Thu, 28 May 2015 18:41:16 +0300

> @@ -369,6 +369,10 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
>  		mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);
>  
>  	synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
> +	if (priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq !=
> +	    priv->eq_table.eq[MLX4_EQ_ASYNC].irq)
> +		synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);
> +
>  
>  	spin_lock_irq(&cq_table->lock);
>  	radix_tree_delete(&cq_table->tree, cq->cqn);

I agree with Sergei that one empty line is sufficient here, don't make
it into two.

Please respin with this fixed.


Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware

2015-05-30 Thread John Fastabend

On 05/30/2015 02:00 AM, Jiri Pirko wrote:

Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:

On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko  wrote:

Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:

On Tue, May 19, 2015 at 1:28 PM, David Miller  wrote:

From: Andy Gospodarek 
Date: Tue, 19 May 2015 15:47:32 -0400


Are you actually saying that if users complain loudly enough about
the current behavior (not the change Roopa has proposed) that you
would be open to considering a change the current behavior?


I am saying that we have a contract with users not to break existing
behavior.  Full stop.


After rehearing David's argument, we should probably explore option d)
which is a refinement on the fib_offload_disable mechanism we have
today.  fib_offload_disable is global for all routes.  Once we hit a
HW install problem, the global flag is set and all routes fallback to
SW.  We did this because we can't allow the failed route to exist in
SW and not in HW because it could mess up LPM searches (HW could hit
on a lesser prefix even when SW has the true LPM, because HW gets
first shot at match).  The refinement on fib_offload_disable is this:
make it per-related-prefix rather than global, and on a HW install
problem, set the flag for the related-prefix and uninstall only those
routes from HW.  Related-prefix (is there a correct term for this?)
are routes to the same dst addr but with different prefix lengths.  I
haven't parsed the fib_trie structure to see how routes are organized,
but I suspect since it's optimized for lookup the related-prefix
tracking is already there and we can build on that.


This looks interesting. However, I'm not sure it is acceptable for the
user to experience this hw eviction of "random entries". The user knows
which entries are essential to have in hw. With your solution, I can see
no way for the user to say what should or should not be offloaded. The
kernel just automagically decides.


The default eviction policy could be based on RTA_PRIORITY: evict
lower priority routes first.  It would be up to the device driver to
decide between two routes of same priority.

To help the device driver make the decision, we could have eviction policy options:

Priority-based (default)
Prefer IPv6 over IPv4
Prefer IPv4 over IPv6
Prefer single path over multipath
Prefer longer prefix lengths over shorter
Optimize for resource utilization

These are portable across different switches.  They're in terms a
user understands.  It's up to the device driver, which truly
understands the device constraints, to translate the user's eviction
policy choices into something that makes sense to that device.


This sounds tempting... You plan to throw in some patches, or should I
take care of that?



This is encoding specific policies into the kernel. I was hoping to
avoid this and let user space develop whatever policy it wants. If you
use Jiri's proposed NLM_F_SKIP_{KERNEL|OFFLOAD} flags you get this.

Also I don't understand the "truly understands the device constraints"
comment. We can export a model of the device and know exactly how many
rules of each type will fit into the table. This doesn't seem like
much of a problem to me. In fact the driver developer should know this
anyway.

Part of my motivation here is I really don't want to get stuck with a
case where each driver writer gets to translate the eviction policy
onto their device in some device specific and slightly different way.
It means every developer has to write a new mapping and get it correct.
At the very least we should put a layer in switchdev that reads the table
out of the driver and does the mapping so we have it in one spot. At least
then the kernel is enforcing policy the same on all devices. Better
still IMO would be to develop the policy in user space and have a
library/tool that does this so we don't end up with a bunch of policy
blobs in the kernel. The 6 above are a good start, but over time more
policy blobs will surely pop up. I would for example put 'optimize for
throughput' on the list.

.John

--
John Fastabend Intel Corporation
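
For illustration only, the priority-based default discussed in this thread
could be sketched in user-space C roughly as follows. Nothing here comes from
a posted patch: struct hw_route, route_cmp() and select_offloaded() are
hypothetical names, and the sketch assumes that a larger metric means a less
preferred route (as with Linux route metrics) and breaks ties by preferring
longer prefixes. It deliberately ignores the LPM-correctness concern raised
earlier in the thread (a shorter prefix left in HW can shadow an evicted
longer one).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical route record for this sketch; not a kernel structure. */
struct hw_route {
	unsigned int prefix;	/* IPv4 prefix, host byte order */
	unsigned char plen;	/* prefix length in bits */
	unsigned int metric;	/* assumed: larger metric = lower priority */
};

/*
 * Order routes so the most preferred ones come first: smaller metric
 * wins, and for equal metrics the longer prefix wins.
 */
static int route_cmp(const void *a, const void *b)
{
	const struct hw_route *ra = a, *rb = b;

	if (ra->metric != rb->metric)
		return ra->metric < rb->metric ? -1 : 1;
	if (ra->plen != rb->plen)
		return ra->plen > rb->plen ? -1 : 1;
	return 0;
}

/*
 * Sort the routes in place and return how many of them fit into a
 * hardware table with 'capacity' entries. The entries from the returned
 * index up to n - 1 are the eviction candidates.
 */
static size_t select_offloaded(struct hw_route *routes, size_t n,
			       size_t capacity)
{
	qsort(routes, n, sizeof(*routes), route_cmp);
	return n < capacity ? n : capacity;
}
```

Whether such a policy lives in the driver, in a shared switchdev layer, or in
user space is exactly the open question in this thread; the sketch takes no
position on that.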


Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()

2015-05-30 Thread Greg KH
On Sun, May 31, 2015 at 11:53:47AM +0900, Greg KH wrote:
> On Mon, May 25, 2015 at 11:02:27AM -0500, Larry Finger wrote:
> > On 05/23/2015 04:16 PM, Larry Finger wrote:
> > >The driver is reporting a warning at kernel/time/timer.c:1096 due to calling
> > >del_timer_sync() while in interrupt mode. Such warnings are fixed by calling
> > >del_timer() instead.
> > >
> > >Signed-off-by: Larry Finger 
> > >Cc: Stable 
> > >Cc: Haggi Eran 
> > >---
> > 
> > Greg,
> > 
> > Please drop this patch. The same fixes were submitted as
> > https://lkml.org/lkml/2015/5/15/226.
> 
> That's not working for me at the moment, what was the subject: name?  I
> think I already applied it to the testing tree...

Nevermind, found it...


Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()

2015-05-30 Thread Greg KH
On Mon, May 25, 2015 at 11:02:27AM -0500, Larry Finger wrote:
> On 05/23/2015 04:16 PM, Larry Finger wrote:
> >The driver is reporting a warning at kernel/time/timer.c:1096 due to calling
> >del_timer_sync() while in interrupt mode. Such warnings are fixed by calling
> >del_timer() instead.
> >
> >Signed-off-by: Larry Finger 
> >Cc: Stable 
> >Cc: Haggi Eran 
> >---
> 
> Greg,
> 
> Please drop this patch. The same fixes were submitted as
> https://lkml.org/lkml/2015/5/15/226.

That's not working for me at the moment, what was the subject: name?  I
think I already applied it to the testing tree...

thanks,

greg k-h


Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-4 100G Ethernet driver

2015-05-30 Thread David Miller
From: Amir Vadai 
Date: Thu, 28 May 2015 22:28:37 +0300

> This patchset extends the mlx5_core driver to support Ethernet
> functionality. The Ethernet functionality in the mlx5 driver is
> integrated into the core driver and not as separated driver. The
> IB functionality remains in the mlx5_ib driver as before.

Applied, thanks.


Re: [net-next 00/14][pull request] Intel Wired LAN Driver Updates 2015-05-28

2015-05-30 Thread David Miller
From: Jeff Kirsher 
Date: Thu, 28 May 2015 04:25:25 -0700

> This series contains updates to ethtool, ixgbe, i40e and i40evf.
> 
> John adds helper routines for ethtool to pass VF to rx_flow_spec.  Since
> the ring_cookie is 64 bits wide which is much larger than what could be
> used for actual queue index values, provide helper routines to pack a VF
> index into the cookie.  Then John provides an ixgbe patch to allow flow
> director to use the entire queue space.
> 
> Neerav provides an i40e patch to collect XOFF Rx stats, where it was not
> being collected before.
> 
> Anjali provides ATR support for tunneled packets, as well as stats to
> count tunnel ATR hits.  Cleaned up PF struct members which are
> unnecessary, since we can use the stat index macro directly.  Cleaned
> up flow director ATR/SB messages to a higher debug level since they
> are not useful unless silicon validation is happening.
> 
> Greg provides a patch to disable offline diagnostics if VFs are enabled
> since ethtool offline diagnostic tests are not designed (out of scope)
> to disable VF functions for testing and re-enable afterward.  Also cleans
> up TODO comment that is no longer needed.
> 
> Vasu provides a fix for an FCoE EOF case where i40e_fcoe_ctxt_eof() may be
> called before i40e_fcoe_eof_is_supported() is called.
> 
> Jesse adds skb->xmit_more support for i40evf.  Then provides a performance
> enhancement for i40evf by inlining some functions which provides a 15%
> gain in small packet performance.  Also cleans up the use of time_stamp
> since it is no longer used to determine if there is a tx_hang and was
> a part of a previous tx_hang design which is no longer used.

Pulled, thanks Jeff.
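
As a rough illustration of the ring_cookie packing described in the cover
letter, a 64-bit cookie can carry both a queue index and a VF index. The bit
layout below (queue index in the low 32 bits, VF index in the next 8 bits) and
all names are assumptions made for this sketch, not the definitions from the
series itself.

```c
#include <stdint.h>

/* Assumed layout for this sketch; names are hypothetical. */
#define COOKIE_RING_MASK	0x00000000FFFFFFFFULL	/* queue index */
#define COOKIE_VF_MASK		0x000000FF00000000ULL	/* VF index */
#define COOKIE_VF_SHIFT		32

/* Pack a VF index and a queue index into one 64-bit cookie. */
static uint64_t cookie_pack(uint8_t vf, uint32_t ring)
{
	return ((uint64_t)vf << COOKIE_VF_SHIFT) | ring;
}

/* Extract the queue index from a cookie. */
static uint32_t cookie_ring(uint64_t cookie)
{
	return (uint32_t)(cookie & COOKIE_RING_MASK);
}

/* Extract the VF index from a cookie. */
static uint8_t cookie_vf(uint64_t cookie)
{
	return (uint8_t)((cookie & COOKIE_VF_MASK) >> COOKIE_VF_SHIFT);
}
```

Because actual queue indexes are far smaller than 32 bits, the upper half of
the cookie is free to carry this kind of extra information.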


Re: [PATCH net-next] tipc: unconditionally put sock refcnt when sock timer to be deleted is pending

2015-05-30 Thread David Miller
From: Ying Xue 
Date: Thu, 28 May 2015 13:19:22 +0800

> As sock refcnt is taken when sock timer is started in
> sk_reset_timer(), the sock refcnt should be put when sock timer
> to be deleted is in pending state no matter what "probing_state"
> value of tipc sock is.
> 
> Reviewed-by: Erik Hugne 
> Reviewed-by: Jon Maloy 
> Signed-off-by: Ying Xue 

Applied, thanks.


Re: [PATCH] if_vlan: fix vlaue -> value typo

2015-05-30 Thread David Miller
From: Vivien Didelot 
Date: Wed, 27 May 2015 21:07:26 -0400

> Fixes "vlaue" for "value" in include/linux/if_vlan.h.
> 
> Signed-off-by: Vivien Didelot 

Applied, thanks.


Re: [PATCH net-next] bpf: allow BPF programs access skb->skb_iif and skb->dev->ifindex fields

2015-05-30 Thread David Miller
From: Alexei Starovoitov 
Date: Wed, 27 May 2015 15:30:39 -0700

> classic BPF already exposes skb->dev->ifindex via SKF_AD_IFINDEX extension.
> Allow eBPF program to access it as well. Note that classic aborts execution
> of the program if 'skb->dev == NULL' (which is inconvenient for program
> writers), whereas eBPF returns zero in such case.
> Also expose the 'skb_iif' field, since programs triggered by redirected
> packet need to know the original interface index.
> Summary:
> __skb->ifindex -> skb->dev->ifindex
> __skb->ingress_ifindex -> skb->skb_iif
> 
> Signed-off-by: Alexei Starovoitov 

Applied, thank you.
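
The field mapping in the summary, including the "return zero when skb->dev is
NULL" behaviour described for eBPF, can be modelled in plain C as below. This
is a user-space sketch: struct fake_dev, struct fake_skb and both accessor
functions are hypothetical stand-ins and not the kernel's structures or the
JIT-generated code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two kernel objects involved. */
struct fake_dev {
	int ifindex;
};

struct fake_skb {
	struct fake_dev *dev;	/* may be NULL */
	int skb_iif;		/* index of the original ingress interface */
};

/* __skb->ifindex: skb->dev->ifindex, or 0 when there is no device. */
static int ebpf_ifindex(const struct fake_skb *skb)
{
	return skb->dev ? skb->dev->ifindex : 0;
}

/* __skb->ingress_ifindex: a plain read of skb->skb_iif. */
static int ebpf_ingress_ifindex(const struct fake_skb *skb)
{
	return skb->skb_iif;
}
```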


Re: [PATCH V2 net-next 1/1] hv_netvsc: Properly size the vrss queues

2015-05-30 Thread David Miller
From: "K. Y. Srinivasan" 
Date: Wed, 27 May 2015 13:16:57 -0700

> The current algorithm for deciding on the number of VRSS channels is
> not optimal since we open up the min of number of CPUs online and the
> number of VRSS channels the host is offering. So on a 32 VCPU guest
> we could potentially open 32 VRSS subchannels. Experimentation has
> shown that it is best to limit the number of VRSS channels to the number
> of CPUs within a NUMA node.
> 
> Here is the new algorithm for deciding on the number of sub-channels we
> would open up:
> 1) Pick the minimum of what the host is offering and what the driver
>    in the guest is specifying as the default value.
> 2) Pick the minimum of (1) and the numbers of CPUs in the NUMA
>    node the primary channel is bound to.
> 
> 
> Signed-off-by: K. Y. Srinivasan 

Applied, thanks.
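
The two-step selection described in the commit message amounts to taking the
minimum of three values. A minimal sketch, with hypothetical function and
parameter names (the driver's actual code is not shown in this thread):

```c
/* Smaller of two unsigned values. */
static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Number of VRSS sub-channels to open:
 *  1) the minimum of what the host offers and the driver default,
 *  2) capped by the number of CPUs in the NUMA node the primary
 *     channel is bound to.
 */
static unsigned int vrss_num_channels(unsigned int host_offered,
				      unsigned int driver_default,
				      unsigned int cpus_in_node)
{
	unsigned int n = min_u(host_offered, driver_default);

	return min_u(n, cpus_in_node);
}
```

For the 32-VCPU guest mentioned above, a host offering 32 channels with a
driver default of 8 and 16 CPUs per NUMA node would yield 8 sub-channels under
this rule (the default of 8 and the 16 CPUs per node are made-up example
values).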


Re: [PATCH net-next] net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN

2015-05-30 Thread David Miller
From: Sorin Dumitru 
Date: Wed, 27 May 2015 22:16:49 +0300

> This is similar to b1cb59cf2efe(net: sysctl_net_core: check SNDBUF
> and RCVBUF for min length). I don't think too small values can cause
> crashes in the case of udp and tcp, but I've seen this set to too
> small values which triggered awful performance. It also makes the
> setting consistent across all the wmem/rmem sysctls.
> 
> Signed-off-by: Sorin Dumitru 

Applied, thank you.


Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.

2015-05-30 Thread Sergey Ryazanov
2015-05-30 15:09 GMT+03:00 Bjørn Mork :
> Andrew Lunn  writes:
>
>> Some boards have two CPU interfaces connected to the switch, e.g. WiFi
>> access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and
>> two ports connected to the SoC.
>>
>> This patch extends DSA to allow both CPU ports to be used. The "cpu"
>> node in the DSA tree can now have a phandle to the host interface it
>> connects to. Each user port can have a phandle to a cpu port which
>> should be used for traffic between the port and the CPU. Thus simple
>> load sharing over the two CPU ports can be achieved.
>>
>> Signed-off-by: Andrew Lunn 
>> ---
>>  Documentation/devicetree/bindings/net/dsa/dsa.txt |  66 -
>>  drivers/net/dsa/mv88e6xxx.c                       |   8 +-
>>  include/net/dsa.h                                 |  28 +-
>>  net/dsa/dsa.c                                     | 109 ++
>>  net/dsa/dsa_priv.h                                |   6 ++
>>  net/dsa/slave.c                                   |  10 +-
>>  net/dsa/tag_brcm.c                                |   2 +-
>>  net/dsa/tag_dsa.c                                 |   2 +-
>>  net/dsa/tag_edsa.c                                |   2 +-
>>  net/dsa/tag_trailer.c                             |   2 +-
>>  10 files changed, 206 insertions(+), 29 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> index f0b4cd72411d..34f7f18026e5 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>> @@ -58,13 +58,24 @@ Optionnal property:
>> Documentation/devicetree/bindings/net/ethernet.txt
>> for details.
>>
>> +- ethernet           : Optional for "cpu" ports. A phandle to an ethernet
>> +                       device which will be used by this CPU port for
>> +                       passing packets to/from the host. If not present,
>> +                       the port will use the "dsa,ethernet" property
>> +                       defined above.
>> +
>> +- cpu                : Option for non "cpu"/"dsa" ports. A phandle to a
>> +                       "cpu" port, which will be used for passing packets
>> +                       from this port to the host. If not present, the first
>> +                       "cpu" port will be used.
>> +
>

Forgive my intrusion. Maybe I can answer some of your questions.

> I'm in deep water here, but this scheme sounds a little too static to me
> if I understand your proposal correctly.  Why would you want to create a
> static mapping of CPU ports to external ports for any given device?

The vendor already assumes that this mapping is static, and DT just
describes this assumption. On such devices a single switch chip with two
ports connected to the CPU is cheaper than a switch chip plus a dedicated
PHY chip. In other words, one of the switch ports is just used as an
independent PHY, and Andrew's patch makes it possible to describe such a
situation precisely.

> To me, that's part of the switch VLAN configuration.
>
AFAIK DSA is designed to allow L3 routing between ports as opposed to
switching and VLANs at L2.
DSA facilitates the work of the hardware designer by providing more
configurable chips. If so, then interconnection tasks should be
resolved by the kernel in a "plug-and-play" manner, just as the kernel
assigns memory regions to PCI devices :)

> My experience with these devices is limited to running OpenWRT on an
> WRT1900AC, having a Marvell 88E6172 switch.  And using the OpenWRT
> switch API of course. There I've found it very useful to be able to mix
> and match the two CPU ports as I like with the external ports. How you
> want the CPU ports used is not as much depending on device properties as
> on your network configuration, IMHO.  How many and which links do you
> have?  What bandwidth are they? Trunks or not?  Etc.  You cannot describe
> these answers as device properties, because they aren't.
>
Nobody forbids running a custom kernel with a custom DT for a custom setup :)

> You can currently configure this as you like in OpenWRT using their
> usual swconfig tool.  The CPU ports are added or removed from VLANs like
> any other port on the switch, and that feels very natural for me as an
> end user.  The only distinction necessary to know, is your 'ethernet'
> property above:  Which host device is this switch port connected to.
>
> So I wonder: Do you plan to put all of the switch config into DT?  Where
> does that stop? How about trunking between external ports and CPU ports?
> Will every VLAN in the trunk have to go into DT too?
>
IMHO VLANs shouldn't be described by DT. VLANs are part of the network
configuration and should be configured by the end user, if he needs them.
At the same time, DSA configuration is part of the hw configuration, and
that's why it is placed in DT.

In any case, Andrew as an author could give a better e

Re: [PATCH V2 0/5] Add support for QCA IPQ806x Ethernet GMAC controller

2015-05-30 Thread David Miller
From: Mathieu Olivari 
Date: Wed, 27 May 2015 11:02:45 -0700

> This patch set adds support for the integrated Ethernet GMAC controller
> on QCA IPQ806x SoC. This controller is based on a Gigabit Synopsys
> DesignWare IP, already supported in the stmmac driver located in
> drivers/net/ethernet/stmicro/stmmac.
> 
> This change is done as a follow-up to the following thread:
> *http://www.spinics.net/lists/netdev/msg311265.html
> While previous attempt was creating a new driver to drive this controller,
> this new post leverages the existing stmmac driver by implementing the
> SoC specific glue to it.
> 
> Aside from the pure stmmac glue layer, we have a couple of related
> patches:
> *IPQ806x NSS clock addition is cherry-picked and refreshed from the
>  following thread: https://lkml.org/lkml/2014/8/6/390
> *phy-handle and fixed-link support are also added in this change set so the
>  driver can be fully functional on platforms using device-trees as well as
>  ethernet switches.
> 
> V2:
>  *Fix MODULE_LICENSE to "Dual BSD/GPL" as the dwmac-ipq806x.c is using
>   ISC license.

Series applied to net-next, thanks.


Re: Recurring trace from tcp_fragment()

2015-05-30 Thread Neal Cardwell
On Sat, May 30, 2015 at 2:52 PM, Grant Zhang  wrote:
> Thank you Neal. Most likely I will test the patch on Monday and report
> back the result.
>
> As for the TcpExtTCPSACKReneging counter, attached is the captured
> counter value on a 1-second interval for 10 minutes.

OK, great. Those TcpExtTCPSACKReneging values look consistent with the
theory underlying the patch, so that's a good sign.

Thanks!

neal


Re: [PATCH 3/7] net: dsa: ar8xxx: add regmap support

2015-05-30 Thread Sergey Ryazanov
2015-05-29 20:59 GMT+03:00 Andrew Lunn :
> On Fri, May 29, 2015 at 10:36:49AM -0700, Mathieu Olivari wrote:
>> Alternatively, we could have something similar to what happens for the phy
>> in the wireless subsystems. Wireless PHYs are not registered as net_device
>> but they can still be listed, queried or configured through netlink.
>
> It is a reasonable idea, but you retrieve most of the useful
> information using ethtool. That, as far as i know, operates on
> net_devices, not phys.
>
Maybe it's time to rework Ethernet card handling to decouple
"Network interfaces" from "Ethernet ports"?

-- 
Sergey


Re: Ingress tc filters with IPSec

2015-05-30 Thread jsulli...@opensourcedevel.com

> On May 30, 2015 at 4:12 PM "jsulli...@opensourcedevel.com"
>  wrote:
>
>
>
> > On May 30, 2015 at 2:24 AM "John A. Sullivan III"
> >  wrote:
> >
> >
> > On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:
> > > Argh! yet another obstacle from my ignorance. We are attempting ingress
> > > traffic shaping using IFB interfaces on traffic coming via GRE / IPSec.
> > > Filters and hash tables are working fine with plain GRE including
> > > stripping the header. We even got the ematch filter working so that the
> > > ESP packets are the only packets not redirected to IFB.
> > >
> > > But, regardless of whether we redirect ESP packets to IFB, the filters
> > > never see the decrypted packets. I thought the packets passed through
> > > the interface twice - first encrypted and they decrypted. However,
> > > tcpdump only shows the ESP packets on the interface.
> > >
> > > How do we apply filters to the packets after decryption? Thanks - John
> >
> > I see what changed. In the past, this seemed to work but we were using
> > tunnel mode. We were trying to use transport mode in this application
> > but that seems to prevent the decrypted packet contents from appearing
> > again on the interface. Reverting to tunnel mode made the contents
> > visible again and our filters are working as expected - John
>
> Alas, this is still a problem since we are using VRRP and the tunnel end points
> are the virtual IP addresses. That makes StrongSWAN choke on selector matching
> in tunnel mode so back to trying to make transport mode work.
>
> I am guessing we do not see the second pass of the packet because it is only
> encrypted and not encapsulated. So my hunch is that we need to pass the ESP
> packet into the ifb qdisc but need to look elsewhere in the packet for the filter
> matching information. We know that matching on the normal offsets does not work
> so I am hoping the decrypted packet is decipherable by the filter matching logic
> but just still has all the ESP transport header attached.
>
> Normally, to extract the contents of my GRE tunnel, I would place them into a
> separate hash table with the GRE header stripped off and then filter them into
> TCP and UDP hash tables:
>
> tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
> 0xff match u16 0x0800 0x at 22 link 11: offset at 0 mask 0f00 shift 6 plus 4
> eat
>
> So we match the GRE protocol and determine that GRE is carrying an IP packet.
> With the ESP transport header and IV (AES = 16B) interposed between the IP
> header and the GRE header, I suppose the first part of this filter becomes:
>
> tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
> 0xff match u16 0x0800 0x at 46
>
> but what do I do with the second half to find the start of the TCP/UDP header?
> Is it still offset at 0 because tc filter somehow knows where the interior IP
> header starts or should it be offset at 48 to account for the GRE + ESP headers?
> Or is there a better way to filter ingress traffic on GRE/IPSec tunnels? Thanks
> - John

Alas, this is not working.  I set a continue action for the ESP traffic:

tc filter replace dev ifb0 parent 11:0 protocol ip prio 1 u32 match ip protocol
50 0xff action continue

and that seems to be matching:

filter parent 11: protocol ip pref 1 u32 fh 802::800 order 2048 key ht 802 bkt 0
terminal flowid ???  (rule hit 3130003 success 2931853)
  match 0032/00ff at 8 (success 2931853 ) 
action order 1: gact action continue
 random type none pass val 0
 index 1 ref 1 bind 1 installed 294 sec

And I even reduced the GRE filter to just look for the GRE protocol in the IP
header:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
0xff link 11: offset at 48 mask 0f00 shift 6 plus 4 eat

but it does not appear to be matching at all:

filter parent 11: protocol ip pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0
link 11:  (rule hit 3130012 success 0)
  match 002f/00ff at 8 (success 0 ) 
offset 0f00>>6 at 48 plus 4  eat 

Any suggestions about how to traffic shape ingress traffic coming off an ESP
Transport connection? Thanks - John


Re: Ingress tc filters with IPSec

2015-05-30 Thread jsulli...@opensourcedevel.com

> On May 30, 2015 at 2:24 AM "John A. Sullivan III"
>  wrote:
>
>
> On Sat, 2015-05-30 at 01:52 -0400, John A. Sullivan III wrote:
> > Argh! yet another obstacle from my ignorance. We are attempting ingress
> > traffic shaping using IFB interfaces on traffic coming via GRE / IPSec.
> > Filters and hash tables are working fine with plain GRE including
> > stripping the header. We even got the ematch filter working so that the
> > ESP packets are the only packets not redirected to IFB.
> >
> > But, regardless of whether we redirect ESP packets to IFB, the filters
> > never see the decrypted packets. I thought the packets passed through
> > the interface twice - first encrypted and then decrypted. However,
> > tcpdump only shows the ESP packets on the interface.
> >
> > How do we apply filters to the packets after decryption? Thanks - John
>
> I see what changed. In the past, this seemed to work but we were using
> tunnel mode. We were trying to use transport mode in this application
> but that seems to prevent the decrypted packet contents from appearing
> again on the interface. Reverting to tunnel mode made the contents
> visible again and our filters are working as expected - John

Alas, this is still a problem since we are using VRRP and the tunnel end points
are the virtual IP addresses.  That makes StrongSWAN choke on selector matching
in tunnel mode so back to trying to make transport mode work.

I am guessing we do not see the second pass of the packet because it is only
encrypted and not encapsulated.  So my hunch is that we need to pass the ESP
packet into the ifb qdisc but need to look elsewhere in the packet for the filter
matching information.  We know that matching on the normal offsets does not work
so I am hoping the decrypted packet is decipherable by the filter matching logic
but just still has all the ESP transport header attached.

Normally, to extract the contents of my GRE tunnel, I would place them into a
separate hash table with the GRE header stripped off and then filter them into
TCP and UDP hash tables:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
0xff match u16 0x0800 0x at 22 link 11: offset at 0 mask 0f00 shift 6 plus 4
eat

So we match the GRE protocol and determine that GRE is carrying an IP packet.
 With the ESP transport header and IV (AES = 16B) interposed between the IP
header and the GRE header, I suppose the first part of this filter becomes:

tc filter add dev ifb0 parent 11:0 protocol ip prio 2 u32 match ip protocol 47
0xff match u16 0x0800 0x at 46

but what do I do with the second half to find the start of the TCP/UDP header?
Is it still offset at 0 because tc filter somehow knows where the interior IP
header starts or should it be offset at 48 to account for the GRE + ESP headers?
Or is there a better way to filter ingress traffic on GRE/IPSec tunnels? Thanks
- John
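
The byte offsets used in the filters above (22, 46 and 48) can be checked with
a little arithmetic. The sketch below only verifies the offset calculation; it
says nothing about whether the u32 classifier ever sees the decrypted payload,
which is the open question in this message. The header sizes are assumptions:
a 20-byte outer IPv4 header without options, an 8-byte ESP header (SPI plus
sequence number), the 16-byte IV mentioned above, and a 4-byte base GRE header
without optional fields. All names are hypothetical.

```c
/* Assumed header sizes in bytes; see the caveats in the text above. */
enum {
	OUTER_IP_HLEN = 20,	/* IPv4 header without options */
	ESP_HLEN      = 8,	/* SPI (4) + sequence number (4) */
	ESP_IV_LEN    = 16,	/* IV size given in the message for AES */
	GRE_HLEN      = 4,	/* base GRE header, no optional fields */
	GRE_PROTO_OFF = 2	/* protocol type follows flags/version */
};

/* Bytes between the end of the outer IP header and the GRE header. */
static int esp_overhead(int with_esp)
{
	return with_esp ? ESP_HLEN + ESP_IV_LEN : 0;
}

/* Offset of the GRE protocol-type field from the outer IP header. */
static int gre_proto_offset(int with_esp)
{
	return OUTER_IP_HLEN + esp_overhead(with_esp) + GRE_PROTO_OFF;
}

/* Offset of the inner IP header from the start of the outer IP header. */
static int inner_ip_offset(int with_esp)
{
	return OUTER_IP_HLEN + esp_overhead(with_esp) + GRE_HLEN;
}
```

With these assumptions, gre_proto_offset() gives 22 for plain GRE and 46 with
the ESP header and IV in front, and inner_ip_offset() gives 48 in the ESP
case, which matches the numbers used in the commands above.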


Re: Recurring trace from tcp_fragment()

2015-05-30 Thread Grant Zhang
Thank you Neal. Most likely I will test the patch on Monday and report
back the result.

As for the TcpExtTCPSACKReneging counter, attached is the captured
counter value on a 1-second interval for 10 minutes.

Thanks,

Grant




reneg.log
Description: Binary data




> On May 30, 2015, at 10:29 AM, Neal Cardwell  wrote:
> 
> On Fri, May 29, 2015 at 3:53 PM, Grant Zhang  wrote:
>> Hi Neal,
>> 
>> I will be more than happy to test the patch. Please send it my way.
> 
> Great. Thank you so much for being willing to do this. Attached is a
> patch for testing. I generated it and tested it relative to Linux
> v3.14.39, since your stack trace seemed to suggest that you were
> seeing this on some variant of v3.14.39. (Newer kernels would need a
> slightly different patch, since the reneging code path has changed a
> little since 3.14.)
> 
> Can you please try it out and see if it makes that warning go away?
> 
> Also, I would be interested in seeing the value of your
> TcpExtTCPSACKReneging counter, and some sense of how fast that value
> is increasing, on a machine that's seeing this issue:
>  nstat -z -a | grep Reneg
> 
> Thanks!
> 
> neal
> <0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch>



[PATCH net-next 0/3] s390/bpf: implement bpf_tail_call JIT support

2015-05-30 Thread Alexei Starovoitov
This set is for net-next tree.

Patch 3 adds bpf_tail_call() support for s390x JIT. It has
a dependency on patches 1 and 2 that will also be submitted
to stable via Martin Schwidefsky.

Michael Holzheu (3):
  s390/bpf: fix stack allocation
  s390/bpf: fix bpf frame pointer setup
  s390/bpf: implement bpf_tail_call() helper

 arch/s390/net/bpf_jit.h  |   12 -
 arch/s390/net/bpf_jit_comp.c |  117 +++---
 2 files changed, 121 insertions(+), 8 deletions(-)

-- 
1.7.9.5



[PATCH net-next 1/3] s390/bpf: fix stack allocation

2015-05-30 Thread Alexei Starovoitov
From: Michael Holzheu 

On s390x we have to provide 160 bytes stack space before we can call
the next function. From the 160 bytes that we got from the previous
function we only use 11 * 8 bytes and have 160 - 11 * 8 bytes left.
Currently for BPF we allocate additional 160 - 11 * 8 bytes for the
next function. This is wrong because then the next function only gets:

 (160 - 11 * 8) + (160 - 11 * 8) = 2 * 72 = 144 bytes

Fix this and allocate enough memory for the next function.

Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu 
Acked-by: Heiko Carstens 
Signed-off-by: Alexei Starovoitov 
---
 arch/s390/net/bpf_jit.h |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index ba8593a515ba..de156ba3bd71 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -48,7 +48,9 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  * We get 160 bytes stack space from calling function, but only use
  * 11 * 8 byte (old backchain + r15 - r6) for storing registers.
  */
-#define STK_OFF (MAX_BPF_STACK + 8 + 4 + 4 + (160 - 11 * 8))
+#define STK_SPACE	(MAX_BPF_STACK + 8 + 4 + 4 + 160)
+#define STK_160_UNUSED	(160 - 11 * 8)
+#define STK_OFF		(STK_SPACE - STK_160_UNUSED)
 #define STK_OFF_TMP	160	/* Offset of tmp buffer on stack */
 #define STK_OFF_HLEN	168	/* Offset of SKB header length on stack */
 
-- 
1.7.9.5
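
The arithmetic in the commit message can be double-checked with a small
user-space sketch. It assumes MAX_BPF_STACK is 512 bytes; STK_OFF_OLD,
STK_OFF_NEW and next_func_area() are names invented here to compare the two
definitions and do not appear in the patch.

```c
/* Assumption for this sketch: the BPF stack size is 512 bytes. */
#define MAX_BPF_STACK	512

/* Part of the caller-provided 160 bytes not used for saving registers. */
#define STK_160_UNUSED	(160 - 11 * 8)

/* Frame size before the patch. */
#define STK_OFF_OLD	(MAX_BPF_STACK + 8 + 4 + 4 + (160 - 11 * 8))

/* Frame size after the patch. */
#define STK_SPACE	(MAX_BPF_STACK + 8 + 4 + 4 + 160)
#define STK_OFF_NEW	(STK_SPACE - STK_160_UNUSED)

/*
 * Bytes available to the next function: the unused remainder of the
 * caller's area plus the new frame, minus what the BPF program itself
 * needs (BPF stack plus 8 + 4 + 4 bytes of locals).
 */
static int next_func_area(int stk_off)
{
	return STK_160_UNUSED + stk_off - (MAX_BPF_STACK + 8 + 4 + 4);
}
```

Under these assumptions the old definition leaves 144 bytes for the next
function and the new one leaves the required 160. The new frame size works out
to 616, which agrees with the "aghi %r15,-616" in the prologue quoted in patch
2/3.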



[PATCH net-next 2/3] s390/bpf: fix bpf frame pointer setup

2015-05-30 Thread Alexei Starovoitov
From: Michael Holzheu 

Currently the bpf frame pointer is set to the old r15. This is
wrong because of packed stack. Fix this and adjust the frame pointer
to respect packed stack. This now generates a prolog like the following:

 3ff8001c3fa: eb67f0480024   stmg    %r6,%r7,72(%r15)
 3ff8001c400: ebcff0780024   stmg    %r12,%r15,120(%r15)
 3ff8001c406: b904001f       lgr     %r1,%r15      <- load backchain
 3ff8001c40a: 41d0f048       la      %r13,72(%r15) <- load adjusted bfp
 3ff8001c40e: a7fbfd98       aghi    %r15,-616
 3ff8001c412: e310f0980024   stg     %r1,152(%r15) <- save backchain

Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu 
Acked-by: Heiko Carstens 
Signed-off-by: Alexei Starovoitov 
---
 arch/s390/net/bpf_jit_comp.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 20c146d1251a..55423d8be580 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -384,13 +384,16 @@ static void bpf_jit_prologue(struct bpf_jit *jit)
        }
        /* Setup stack and backchain */
        if (jit->seen & SEEN_STACK) {
-               /* lgr %bfp,%r15 (BPF frame pointer) */
-               EMIT4(0xb904, BPF_REG_FP, REG_15);
+               if (jit->seen & SEEN_FUNC)
+                       /* lgr %w1,%r15 (backchain) */
+                       EMIT4(0xb904, REG_W1, REG_15);
+               /* la %bfp,STK_160_UNUSED(%r15) (BPF frame pointer) */
+               EMIT4_DISP(0x4100, BPF_REG_FP, REG_15, STK_160_UNUSED);
                /* aghi %r15,-STK_OFF */
                EMIT4_IMM(0xa70b, REG_15, -STK_OFF);
                if (jit->seen & SEEN_FUNC)
-                       /* stg %bfp,152(%r15) (backchain) */
-                       EMIT6_DISP_LH(0xe300, 0x0024, BPF_REG_FP, REG_0,
+                       /* stg %w1,152(%r15) (backchain) */
+                       EMIT6_DISP_LH(0xe300, 0x0024, REG_W1, REG_0,
                                      REG_15, 152);
        }
/*
-- 
1.7.9.5



[PATCH net-next 3/3] s390/bpf: implement bpf_tail_call() helper

2015-05-30 Thread Alexei Starovoitov
From: Michael Holzheu 

bpf_tail_call() arguments:

 - ctx..: Context pointer
 - jmp_table: One of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table
 - index: Index in the jump table

In this implementation s390x JIT does stack unwinding and jumps into the
callee program prologue. Caller and callee use the same stack.

With this patch a tail call generates the following code on s390x:

 if (index >= array->map.max_entries)
         goto out
 03ff8001c7e4: e31030100016   llgf    %r1,16(%r3)
 03ff8001c7ea: ec41001fa065   clgrj   %r4,%r1,10,3ff8001c828

 if (tail_call_cnt++ > MAX_TAIL_CALL_CNT)
         goto out;
 03ff8001c7f0: a7080001       lhi     %r0,1
 03ff8001c7f4: eb10f25000fa   laal    %r1,%r0,592(%r15)
 03ff8001c7fa: ec120017207f   clij    %r1,32,2,3ff8001c828

 prog = array->prog[index];
 if (prog == NULL)
         goto out;
 03ff8001c800: eb140003000d   sllg    %r1,%r4,3
 03ff8001c806: e3131084       lg      %r1,128(%r3,%r1)
 03ff8001c80c: ec18000e007d   clgij   %r1,0,8,3ff8001c828

 Restore registers before calling function
 03ff8001c812: eb68f2980004   lmg     %r6,%r8,664(%r15)
 03ff8001c818: ebbff2c4       lmg     %r11,%r15,704(%r15)

 goto *(prog->bpf_func + tail_call_start);
 03ff8001c81e: e3110024       lg      %r1,32(%r1,%r0)
 03ff8001c824: 47f01006       bc      15,6(%r1)

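The three checks in the listing above can be modelled in user space
roughly as follows. This is only a sketch: the model_ types and the
fixed-size array are hypothetical stand-ins for the kernel's prog array,
and the limit of 32 is taken from the clij immediate in the listing. As
in the listing, the counter is bumped before the NULL check, so a lookup
that hits an empty slot still consumes one tail call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_TAIL_CALL_CNT 32	/* from "clij %r1,32,2,..." above */
#define MODEL_MAX_SLOTS         8	/* hypothetical capacity of the model */

struct model_prog {
	int id;
};

struct model_prog_array {
	uint32_t max_entries;
	struct model_prog *prog[MODEL_MAX_SLOTS];
};

/*
 * Returns the program to jump to, or NULL when the caller should fall
 * through to the next instruction (the "out" label in the listing).
 */
static struct model_prog *
model_tail_call_target(const struct model_prog_array *array, uint64_t index,
		       uint32_t *tail_call_cnt)
{
	assert(array != NULL && tail_call_cnt != NULL);
	assert(array->max_entries <= MODEL_MAX_SLOTS);

	/* if (index >= array->map.max_entries) goto out; */
	if (index >= array->max_entries)
		return NULL;

	/* if (tail_call_cnt++ > MAX_TAIL_CALL_CNT) goto out; */
	if ((*tail_call_cnt)++ > MODEL_MAX_TAIL_CALL_CNT)
		return NULL;

	/* prog = array->prog[index]; if (prog == NULL) goto out; */
	return array->prog[index];
}
```

With the comparison written as "old value > 32", old counter values 0
through 32 pass the check, which is what the post-increment form in the
commit message implies.
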
Reviewed-by: Martin Schwidefsky 
Signed-off-by: Michael Holzheu 
Acked-by: Heiko Carstens 
Signed-off-by: Alexei Starovoitov 
---
 arch/s390/net/bpf_jit.h  |   10 +++-
 arch/s390/net/bpf_jit_comp.c |  106 +-
 2 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index de156ba3bd71..f6498eec9ee1 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -28,6 +28,9 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  *   | old backchain | |
  *   +---+ |
  *   |   r15 - r6| |
+ *   +---+ |
+ *   | 4 byte align  | |
+ *   | tail_call_cnt | |
  * BFP-> +===+ |
  *   |   | |
  *   |   BPF stack   | |
@@ -46,14 +49,17 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  * R15-> +---+ + low
  *
  * We get 160 bytes stack space from calling function, but only use
- * 11 * 8 byte (old backchain + r15 - r6) for storing registers.
+ * 12 * 8 byte for old backchain, r15..r6, and tail_call_cnt.
  */
 #define STK_SPACE  (MAX_BPF_STACK + 8 + 4 + 4 + 160)
-#define STK_160_UNUSED (160 - 11 * 8)
+#define STK_160_UNUSED (160 - 12 * 8)
 #define STK_OFF        (STK_SPACE - STK_160_UNUSED)
 #define STK_OFF_TMP    160 /* Offset of tmp buffer on stack */
 #define STK_OFF_HLEN   168 /* Offset of SKB header length on stack */
 
+#define STK_OFF_R6 (160 - 11 * 8)  /* Offset of r6 on stack */
+#define STK_OFF_TCCNT  (160 - 12 * 8)  /* Offset of tail_call_cnt on stack */
+
 /* Offset to skip condition code check */
 #define OFF_OK 4
 
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 55423d8be580..d3766dd67e23 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "bpf_jit.h"
@@ -40,6 +41,8 @@ struct bpf_jit {
        int base_ip;            /* Base address for literal pool */
        int ret0_ip;            /* Address of return 0 */
        int exit_ip;            /* Address of exit */
+       int tail_call_start;    /* Tail call start offset */
+       int labels[1];          /* Labels for local jumps */
 };
 
 #define BPF_SIZE_MAX   4096/* Max size for program */
@@ -49,6 +52,7 @@ struct bpf_jit {
 #define SEEN_RET0  4   /* ret0_ip points to a valid return 0 */
 #define SEEN_LITERAL   8   /* code uses literals */
 #define SEEN_FUNC  16  /* calls C functions */
+#define SEEN_TAIL_CALL 32  /* code uses tail calls */
 #define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB)
 
 /*
@@ -60,6 +64,7 @@ struct bpf_jit {
 #define REG_L  (__MAX_BPF_REG+3)   /* Literal pool register */
 #define REG_15 (__MAX_BPF_REG+4)   /* Register 15 */
 #define REG_0  REG_W0  /* Register 0 */
+#define REG_1  REG_W1  /* Register 1 */
 #define REG_2  BPF_REG_1   /* Register 2 */
 #define REG_14 BPF_REG_0   /* Register 14 */
 
@@ -223,6 +228,24 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 
b1)
REG_SET_SEEN(b3);   \
 })
 
+#define EMIT6_PCREL_LABEL(op1, op2, b1, b2, label, mask)   \
+({ \
+   int rel = (jit->labels[label] - jit->prg) >> 1; \
+   _EMIT6(op1 | reg(b1,

Re: Recurring trace from tcp_fragment()

2015-05-30 Thread Neal Cardwell
On Fri, May 29, 2015 at 3:53 PM, Grant Zhang  wrote:
> Hi Neal,
>
> I will be more than happy to test the patch. Please send it my way.

Great. Thank you so much for being willing to do this. Attached is a
patch for testing. I generated it and tested it relative to Linux
v3.14.39, since your stack trace seemed to suggest that you were
seeing this on some variant of v3.14.39. (Newer kernels would need a
slightly different patch, since the reneging code path has changed a
little since 3.14.)

Can you please try it out and see if it makes that warning go away?

Also, I would be interested in seeing the value of your
TcpExtTCPSACKReneging counter, and some sense of how fast that value
is increasing, on a machine that's seeing this issue:
  nstat -z -a | grep Reneg

Thanks!

neal


0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch
Description: Binary data


[PATCH 81/98] include/uapi/linux/openvswitch.h: use __u32 from linux/types.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compiler error:

error: unknown type name ‘uint32_t’

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/openvswitch.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bbd49a0..0ab8eca 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -586,8 +586,8 @@ enum ovs_hash_alg {
  * @hash_basis: basis used for computing hash.
  */
 struct ovs_action_hash {
-   uint32_t  hash_alg; /* One of ovs_hash_alg. */
-   uint32_t  hash_basis;
+   __u32  hash_alg; /* One of ovs_hash_alg. */
+   __u32  hash_basis;
 };
 
 /**
-- 
2.1.4



[PATCH 84/98] include/uapi/linux/atm_zatm.h: include linux/time.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compile error:

error: field ‘real’ has incomplete type
 struct timeval real;  /* real (wall-clock) time */

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/atm_zatm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/atm_zatm.h b/include/uapi/linux/atm_zatm.h
index 10f0fa2..adbaa6c 100644
--- a/include/uapi/linux/atm_zatm.h
+++ b/include/uapi/linux/atm_zatm.h
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include <linux/time.h>
 
 #define ZATM_GETPOOL   _IOW('a',ATMIOC_SARPRV+1,struct atmif_sioc)
/* get pool statistics */
-- 
2.1.4



[PATCH net] udp: fix behavior of wrong checksums

2015-05-30 Thread Eric Dumazet
From: Eric Dumazet 

We have two problems in the UDP stack related to bogus checksums:

1) We return -EAGAIN to the application even if the receive queue is
   not empty. This breaks applications using edge-triggered epoll().

2) Under UDP flood, we can loop forever without yielding to other
   processes, potentially hanging the host, especially on non-SMP.


This patch is an attempt to make things better.

We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.

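To make problem 1 concrete, here is a small user-space model of the
receive loop. It is a sketch under simplifying assumptions: the model_
types are invented for illustration, the queue is a plain array, and the
kernel's cond_resched() call is only marked by a comment. The old_behavior
flag selects the pre-patch non-blocking path, which gives up on the first
bad checksum even though a valid packet is still queued behind it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_pkt {
	int id;			/* positive packet identifier */
	bool csum_ok;		/* result of checksum validation */
};

struct model_queue {
	const struct model_pkt *pkts;
	size_t len;
	size_t head;		/* index of the next packet to dequeue */
};

/* Returns the id of the delivered packet, or -EAGAIN. */
static int model_recv(struct model_queue *q, bool old_behavior)
{
	assert(q != NULL);

	while (q->head < q->len) {
		const struct model_pkt *p = &q->pkts[q->head++];

		if (p->csum_ok)
			return p->id;

		/* Bad checksum: the packet has been dropped. */
		if (old_behavior)
			return -EAGAIN;	/* later packets may still be queued */

		/* Patched path: the kernel calls cond_resched(), then retries. */
	}
	return -EAGAIN;			/* queue really is empty */
}
```

An edge-triggered epoll() user that sees -EAGAIN normally waits for the
next readiness edge; if the remaining packets were already queued, that
edge may never come, which is the breakage described above.
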
Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
---
 net/ipv4/udp.c |6 ++
 net/ipv6/udp.c |6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d10b7e0112eb..1c92ea67baef 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1345,10 +1345,8 @@ csum_copy_err:
        }
        unlock_sock_fast(sk, slow);
 
-       if (noblock)
-               return -EAGAIN;
-
-       /* starting over for a new packet */
+       /* starting over for a new packet, but check if we need to yield */
+       cond_resched();
        msg->msg_flags &= ~MSG_TRUNC;
        goto try_again;
 }
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c2ec41617a35..e51fc3eee6db 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -525,10 +525,8 @@ csum_copy_err:
        }
        unlock_sock_fast(sk, slow);
 
-       if (noblock)
-               return -EAGAIN;
-
-       /* starting over for a new packet */
+       /* starting over for a new packet, but check if we need to yield */
+       cond_resched();
        msg->msg_flags &= ~MSG_TRUNC;
        goto try_again;
 }




[PATCH 41/98] include/uapi/linux/if_pppox.h: include linux/if.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compilation error:

error: ‘IFNAMSIZ’ undeclared here (not in a function)

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/if_pppox.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index e128769..473c3c4 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -21,6 +21,7 @@
 #include 
 
 #include 
+#include <linux/if.h>
 #include 
 #include 
 
-- 
2.1.4



[PATCH 48/98] include/uapi/linux/if_pppox.h: include linux/in.h and linux/in6.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compilation errors:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */

error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/if_pppox.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index 473c3c4..d37bbb1 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include <linux/in.h>
+#include <linux/in6.h>
 
 /* For user-space programs to pick up these definitions
  * which they wouldn't get otherwise without defining __KERNEL__
-- 
2.1.4



[PATCH 47/98] include/uapi/linux/if_pppol2tp.h: include linux/in.h and linux/in6.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compilation errors like:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */
^
error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/if_pppol2tp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_pppol2tp.h b/include/uapi/linux/if_pppol2tp.h
index 163e8ad..4bd1f55 100644
--- a/include/uapi/linux/if_pppol2tp.h
+++ b/include/uapi/linux/if_pppol2tp.h
@@ -16,7 +16,8 @@
 #define _UAPI__LINUX_IF_PPPOL2TP_H
 
 #include 
-
+#include <linux/in.h>
+#include <linux/in6.h>
 
 /* Structure used to connect() the socket to a particular tunnel UDP
  * socket over IPv4.
-- 
2.1.4



[PATCH 42/98] include/uapi/linux/if_tunnel.h: include linux/if.h, linux/ip.h and linux/in6.h

2015-05-30 Thread Mikko Rapeli
Fixes userspace compilation errors like:

error: field ‘iph’ has incomplete type
error: field ‘prefix’ has incomplete type

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/if_tunnel.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index bd3cc11..2a36080 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -2,6 +2,9 @@
 #define _UAPI_IF_TUNNEL_H_
 
 #include 
+#include <linux/if.h>
+#include <linux/ip.h>
+#include <linux/in6.h>
 #include 
 
 
-- 
2.1.4



Re: [PATCH RFC 2/3] dsa: Add support for multiple cpu ports.

2015-05-30 Thread Bjørn Mork
Andrew Lunn  writes:

> Some boards have two CPU interfaces connected to the switch, e.g. WiFi
> access points, with 1 port labeled WAN, 4 ports labeled lan1-lan4, and
> two ports connected to the SoC.
>
> This patch extends DSA to allow both CPU ports to be used. The "cpu"
> node in the DSA tree can now have a phandle to the host interface it
> connects to. Each user port can have a phandle to a cpu port which
> should be used for traffic between the port and the CPU. Thus simple
> load sharing over the two CPU ports can be achieved.
>
> Signed-off-by: Andrew Lunn 
> ---
>  Documentation/devicetree/bindings/net/dsa/dsa.txt |  66 -
>  drivers/net/dsa/mv88e6xxx.c   |   8 +-
>  include/net/dsa.h |  28 +-
>  net/dsa/dsa.c | 109 
> ++
>  net/dsa/dsa_priv.h|   6 ++
>  net/dsa/slave.c   |  10 +-
>  net/dsa/tag_brcm.c|   2 +-
>  net/dsa/tag_dsa.c |   2 +-
>  net/dsa/tag_edsa.c|   2 +-
>  net/dsa/tag_trailer.c |   2 +-
>  10 files changed, 206 insertions(+), 29 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt 
> b/Documentation/devicetree/bindings/net/dsa/dsa.txt
> index f0b4cd72411d..34f7f18026e5 100644
> --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
> +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
> @@ -58,13 +58,24 @@ Optionnal property:
> Documentation/devicetree/bindings/net/ethernet.txt
> for details.
>  
> +- ethernet   : Optional for "cpu" ports. A phandle to an ethernet
> +  device which will be used by this CPU port for
> +   passing packets to/from the host. If not present,
> +   the port will use the "dsa,ethernet" property
> +   defined above.
> +
> +- cpu: Option for non "cpu"/"dsa" ports. A phandle 
> to a
> +   "cpu" port, which will be used for passing packets
> +   from this port to the host. If not present, the first
> +   "cpu" port will be used.
> +

I'm in deep water here, but this scheme sounds a little too static to me
if I understand your proposal correctly.  Why would you want to create a
static mapping of CPU ports to external ports for any given device?  To
me, that's part of the switch VLAN configuration.

My experience with these devices is limited to running OpenWRT on a
WRT1900AC, which has a Marvell 88E6172 switch, and using the OpenWRT
switch API of course.  There I've found it very useful to be able to mix
and match the two CPU ports as I like with the external ports.  How you
want the CPU ports used does not depend as much on device properties as
on your network configuration, IMHO.  How many and which links do you
have?  What bandwidth are they?  Trunks or not?  Etc.  You cannot describe
these answers as device properties, because they aren't.

You can currently configure this as you like in OpenWRT using their
usual swconfig tool.  The CPU ports are added to or removed from VLANs like
any other port on the switch, and that feels very natural to me as an
end user.  The only distinction you need to know is your 'ethernet'
property above:  which host device this switch port is connected to.

So I wonder: Do you plan to put all of the switch config into DT?  Where
does that stop? How about trunking between external ports and CPU ports?
Will every VLAN in the trunk have to go into DT too?


Bjørn


Re: [PATCHv2 1/3] xen-netback: return correct ethtool stats

2015-05-30 Thread Ian Campbell
Control: fixed -1 4.0-1~exp1

On Wed, 2015-03-04 at 11:14 +, David Vrabel wrote:
> Use correct pointer arithmetic to get the pointer to each stat.

I think this incorrect arithmetic was also responsible for the crash
reported in http://bugs.debian.org/786936 which was using the resulting
stray pointer.

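For readers who have not looked at the patch, the sketch below shows the
kind of mistake being described, using a made-up struct (model_stats and
its fields are hypothetical, not the xen-netback definitions). Adding a
byte offset to a typed struct pointer scales the offset by the size of
the struct, so the resulting address lands far outside the object. The
kernel fix uses void * arithmetic, a GNU extension; the model uses
unsigned char * to stay within ISO C. The lea/shl/sub sequence in the
disassembly further down appears to multiply the loaded offset by 56,
which would fit this reading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-queue statistics block. */
struct model_stats {
	unsigned long rx_fixup;
	unsigned long tx_sent;
	unsigned long tx_fail;
};

/*
 * Byte distance covered by "typed_ptr + byte_offset" when typed_ptr is a
 * struct model_stats *: the offset is scaled by the element size.
 */
static size_t model_typed_distance(size_t byte_offset)
{
	return byte_offset * sizeof(struct model_stats);
}

/* Correct lookup: do the arithmetic in bytes, then cast back. */
static unsigned long model_read_stat(const struct model_stats *stats,
				     size_t byte_offset)
{
	const unsigned char *base = (const unsigned char *)stats;

	assert(byte_offset + sizeof(unsigned long) <= sizeof(*stats));
	return *(const unsigned long *)(base + byte_offset);
}
```

Calling model_read_stat(&s, offsetof(struct model_stats, tx_fail)) reads
the third counter, whereas the typed arithmetic would have pointed
several whole structs past the start of s.
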
I'll add the fix to our kernel but: David (Miller) could we also have it
queued for stable please?

Thanks.

Reasoning:

IP: [] xenvif_get_ethtool_stats+0x50/0x80 [xen_netback]

(gdb) disas xenvif_get_ethtool_stats+0x50
Dump of assembler code for function xenvif_get_ethtool_stats:
   0x5280 <+0>:     callq  0x5285
   0x5285 <+5>:     mov    0x900(%rdi),%r9d
   0x528c <+12>:    mov    $0x0,%r8
   0x5293 <+19>:    lea    -0x1(%r9),%r10d
   0x5297 <+23>:    imul   $0x36258,%r10,%r10
   0x529e <+30>:    xchg   %ax,%ax
   0x52a0 <+32>:    test   %r9d,%r9d
   0x52a3 <+35>:    je     0x52f8
   0x52a5 <+37>:    movzwl (%r8),%esi
   0x52a9 <+41>:    mov    0x8f8(%rdi),%rcx
   0x52b0 <+48>:    lea    0x0(,%rsi,8),%rax
   0x52b8 <+56>:    shl    $0x6,%rsi
   0x52bc <+60>:    sub    %rax,%rsi
   0x52bf <+63>:    lea    (%rcx,%rsi,1),%rax
   0x52c3 <+67>:    lea    0x36258(%rcx,%r10,1),%rcx
   0x52cb <+75>:    add    %rcx,%rsi
   0x52ce <+78>:    xor    %ecx,%ecx
   0x52d0 <+80>:    add    0x36220(%rax),%rcx
   0x52d7 <+87>:    add    $0x36258,%rax
   0x52dd <+93>:    cmp    %rsi,%rax
   0x52e0 <+96>:    jne    0x52d0
   0x52e2 <+98>:    add    $0x22,%r8
   0x52e6 <+102>:   mov    %rcx,(%rdx)
   0x52e9 <+105>:   add    $0x8,%rdx
   0x52ed <+109>:   cmp    $0x0,%r8
   0x52f4 <+116>:   jne    0x52a0
   0x52f6 <+118>:   repz retq
   0x52f8 <+120>:   xor    %ecx,%ecx
   0x52fa <+122>:   jmp    0x52e2
End of assembler dump.
(gdb) list *xenvif_get_ethtool_stats+0x50
0x52d0 is in xenvif_get_ethtool_stats 
(/build/linux-RGM_Ed/linux-3.16.7-ckt9/drivers/net/xen-netback/interface.c:349).

... and in the Debian kernel interface.c:349 is the accum += line from
the patch.

Ian.

> 
> Signed-off-by: David Vrabel 
> ---
>  drivers/net/xen-netback/interface.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/interface.c 
> b/drivers/net/xen-netback/interface.c
> index f38227a..3aa8648 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -340,12 +340,11 @@ static void xenvif_get_ethtool_stats(struct net_device 
> *dev,
>   unsigned int num_queues = vif->num_queues;
>   int i;
>   unsigned int queue_index;
> - struct xenvif_stats *vif_stats;
>  
>   for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
>   unsigned long accum = 0;
>   for (queue_index = 0; queue_index < num_queues; ++queue_index) {
> - vif_stats = &vif->queues[queue_index].stats;
> + void *vif_stats = &vif->queues[queue_index].stats;
>   accum += *(unsigned long *)(vif_stats + 
> xenvif_stats[i].offset);
>   }
>   data[i] = accum;




[PATCH v2] can: mcp251x: not correct register address

2015-05-30 Thread Tomas Krcka
v2: fix of corrupted patch

This patch corrects addresses of acceptance filters.
These registers are not in use, but values should be correct.
Tested with MCP2515 and am3352 and also checked datasheets for MCP2515
and MCP2510.

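A quick way to sanity-check the corrected macros is to evaluate them for
all six filters. The sketch below does that at compile time. The MODEL_
names are illustrative, and the expected addresses (0x00, 0x04, 0x08 for
filters 0-2 and 0x10, 0x14, 0x18 for filters 3-5) are what the patched
formula produces; they are believed to match the MCP2515 register map but
should be confirmed against the datasheet. Unlike the patch, the helper
here parenthesizes its argument, so an expression such as (1 & 7) is
handled correctly.

```c
#include <assert.h>

/* Filters 3..5 sit 4 bytes further up than a plain n * 4 would give. */
#define MODEL_RXF_GAP(n)   (((n) < 3) ? 0 : 4)
#define MODEL_RXFSIDH(n)   ((n) * 4 + MODEL_RXF_GAP(n))
#define MODEL_RXFSIDL(n)   ((n) * 4 + 1 + MODEL_RXF_GAP(n))
#define MODEL_RXFEID8(n)   ((n) * 4 + 2 + MODEL_RXF_GAP(n))
#define MODEL_RXFEID0(n)   ((n) * 4 + 3 + MODEL_RXF_GAP(n))

/* Function form, convenient for run-time callers. */
static inline int model_rxfsidh(int n)
{
	return MODEL_RXFSIDH(n);
}

static_assert(MODEL_RXFSIDH(0) == 0x00, "RXF0SIDH");
static_assert(MODEL_RXFSIDH(1) == 0x04, "RXF1SIDH");
static_assert(MODEL_RXFSIDH(2) == 0x08, "RXF2SIDH");
static_assert(MODEL_RXFSIDH(3) == 0x10, "RXF3SIDH");
static_assert(MODEL_RXFSIDH(4) == 0x14, "RXF4SIDH");
static_assert(MODEL_RXFSIDH(5) == 0x18, "RXF5SIDH");
static_assert(MODEL_RXFEID0(5) == 0x1b, "RXF5EID0");

/* An unparenthesized (n < 3) would misparse this argument. */
static_assert(MODEL_RXFSIDH(1 & 7) == 0x04, "argument is parenthesized");
```

With the old ((n) * 4) definition, filter 3 would have evaluated to 0x0c,
which is why the commit message calls the previous values incorrect.
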
Signed-off-by: Tomas Krcka  

---
 drivers/net/can/spi/mcp251x.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index bf63fee..c1a95a3 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -190,10 +190,11 @@
 #define RXBEID0_OFF 4
 #define RXBDLC_OFF  5
 #define RXBDAT_OFF  6
-#define RXFSIDH(n) ((n) * 4)
-#define RXFSIDL(n) ((n) * 4 + 1)
-#define RXFEID8(n) ((n) * 4 + 2)
-#define RXFEID0(n) ((n) * 4 + 3)
+#define RXFSID(n) ((n < 3) ? 0 : 4)
+#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
+#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
+#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
+#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
 #define RXMSIDH(n) ((n) * 4 + 0x20)
 #define RXMSIDL(n) ((n) * 4 + 0x21)
 #define RXMEID8(n) ((n) * 4 + 0x22)
-- 
1.7.5.4



Re: [PATCH net-next] bpf: add missing rcu protection when releasing programs from prog_array

2015-05-30 Thread Daniel Borkmann

On 05/30/2015 01:22 AM, Alexei Starovoitov wrote:
> ...
>
> Like __sk_filter_release() and __bpf_prog_release() should be removed.


The whole filter cleanup procedure needs to be simplified a bit, got a
bit too complicated over time, agreed.


> Of course, it's a grey line when to introduce a helper and when not to,
> but just because two lines are close enough between two functions it
> doesn't mean that helper is warranted. In this bpf_prog_put() case
> I think helper is not needed _today_. If it grows, we'll reconsider.


Yes, that's what I meant.


Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware

2015-05-30 Thread Jiri Pirko
Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:
>On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko  wrote:
>> Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:
>>>On Tue, May 19, 2015 at 1:28 PM, David Miller  wrote:
>>>> From: Andy Gospodarek 
>>>> Date: Tue, 19 May 2015 15:47:32 -0400
>>>>
>>>>> Are you actually saying that if users complain loudly enough about
>>>>> the current behavior (not the change Roopa has proposed) that you
>>>>> would be open to considering a change the current behavior?
>>>>
>>>> I am saying that we have a contract with users not to break existing
>>>> behavior.  Full stop.
>>>
>>>After rehearing David's argument, we should probably explore option d)
>>>which is a refinement on the fib_offload_disable mechanism we have
>>>today.  fib_offload_disable is global for all routes.  Once we hit a
>>>HW install problem, the global flag is set and all routes fallback to
>>>SW.  We did this because we can't allow the failed route to exist in
>>>SW and not in HW because it could mess up LPM searches (HW could hit
>>>on a lesser prefix even when SW has the true LPM, because HW gets
>>>first shot at match).  The refinement on fib_offload_disable is this:
>>>make it per-related-prefix rather than global, and on a HW install
>>>problem, set the flag for the related-prefix and uninstall only those
>>>routes from HW.  Related-prefix (is there a correct term for this?)
>>>are routes to the same dst addr but with different prefix lengths.  I
>>>haven't parsed the fib_trie structure to see how routes are organized,
>>>but I suspect since it's optimized for lookup the related-prefix
>>>tracking is already there and we can build on that.
>>
>> This looks interesting. However, I'm not sure that it is acceptable for
>> user to experience this hw evict of "random entries". User knows what
>> entries are essential to have in hw. With your solution, I can see no way
>> user can actually say what should be offloaded or not. Kernel just
>> automagically decides.
>
>The default eviction policy could be based on RTA_PRIORITY: evict
>lower priority routes first.  It would be up to the device driver to
>decide between two routes of same priority.
>
>To help device driver make the decision, we could have eviction policy options:
>
>Priority-based (default)
>Prefer IPv6 over IPv4
>Prefer IPv4 over IPv6
>Prefer single path over multipath
>Prefer longer prefix lengths over shorter
>Optimize for resource utilization
>
>These are portable across different switches.   They're in terms a
>user understands.  It's up to the device driver, which truly
>understands the device constraints, to translate the user's eviction
>policy choices into something that makes sense to that device.

This sounds tempting... You plan to throw in some patches, or should I
take care of that?
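
For context on the LPM concern quoted earlier in this thread (hardware
matching a shorter prefix while software holds the true longest match),
here is a minimal longest-prefix-match sketch. It is a linear scan over a
made-up route table, not the kernel's fib_trie, and every model_ name is
hypothetical. Running the same lookup against a "hardware" table that is
missing the /16 shows the packet following the /8 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_route {
	uint32_t prefix;	/* network address, host byte order */
	unsigned int len;	/* prefix length, 0..32 */
	int nexthop;		/* opaque next-hop identifier */
};

/* Netmask for a prefix length; len == 0 is handled to avoid a 32-bit shift. */
static uint32_t model_mask(unsigned int len)
{
	assert(len <= 32);
	return len == 0 ? 0 : ~(uint32_t)0 << (32 - len);
}

/* Returns the matching route with the longest prefix, or NULL if none match. */
static const struct model_route *
model_lpm(const struct model_route *tbl, size_t n, uint32_t dst)
{
	const struct model_route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if ((dst & model_mask(tbl[i].len)) != tbl[i].prefix)
			continue;
		if (best == NULL || tbl[i].len > best->len)
			best = &tbl[i];
	}
	return best;
}
```

If software holds 10.0.0.0/8 and 10.1.0.0/16 but hardware only holds the
/8, a packet to 10.1.2.3 is matched by hardware on the /8 and never
reaches the software lookup that would have chosen the /16. Removing the
related shorter prefix from hardware as well avoids that mismatch.
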


Re: [PATCH] can: mcp251x: not correct register address

2015-05-30 Thread Tomas Krcka
You are right, sorry for that. I'll send v2.

Thanks.

2015-05-30 9:41 GMT+02:00 Jakub Kicinski :
> On Mon, 25 May 2015 08:57:48 +0200, Tomas Krcka wrote:
>> This patch corrects addresses of acceptance filters.
>> These registers are not in use, but values should be correct.
>> Tested with MCP2515 and am3352 and also checked datasheets for MCP2515
>> and MCP2510.
>>
>> Signed-off-by: Tomas Krcka 
>>
>> ---
>>   drivers/net/can/spi/mcp251x.c |9 +
>>   1 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
>> index bf63fee..c1a95a3 100644
>> --- a/drivers/net/can/spi/mcp251x.c
>> +++ b/drivers/net/can/spi/mcp251x.c
>> @@ -190,10 +190,11 @@
>>   #define RXBEID0_OFF 4
>>   #define RXBDLC_OFF  5
>>   #define RXBDAT_OFF  6
>> -#define RXFSIDH(n) ((n) * 4)
>> -#define RXFSIDL(n) ((n) * 4 + 1)
>> -#define RXFEID8(n) ((n) * 4 + 2)
>> -#define RXFEID0(n) ((n) * 4 + 3)
>> +#define RXFSID(n) ((n < 3) ? 0 : 4)
>> +#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
>> +#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
>> +#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
>> +#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
>>   #define RXMSIDH(n) ((n) * 4 + 0x20)
>>   #define RXMSIDL(n) ((n) * 4 + 0x21)
>>   #define RXMEID8(n) ((n) * 4 + 0x22)
>
> I think your patch was corrupted.  It doesn't apply because you have
> extra space before each surviving #define.


Re: [PATCH] can: mcp251x: not correct register address

2015-05-30 Thread Jakub Kicinski
On Mon, 25 May 2015 08:57:48 +0200, Tomas Krcka wrote:
> This patch corrects addresses of acceptance filters.
> These registers are not in use, but values should be correct.
> Tested with MCP2515 and am3352 and also checked datasheets for MCP2515
> and MCP2510.
> 
> Signed-off-by: Tomas Krcka 
> 
> ---
>   drivers/net/can/spi/mcp251x.c |9 +
>   1 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
> index bf63fee..c1a95a3 100644
> --- a/drivers/net/can/spi/mcp251x.c
> +++ b/drivers/net/can/spi/mcp251x.c
> @@ -190,10 +190,11 @@
>   #define RXBEID0_OFF 4
>   #define RXBDLC_OFF  5
>   #define RXBDAT_OFF  6
> -#define RXFSIDH(n) ((n) * 4)
> -#define RXFSIDL(n) ((n) * 4 + 1)
> -#define RXFEID8(n) ((n) * 4 + 2)
> -#define RXFEID0(n) ((n) * 4 + 3)
> +#define RXFSID(n) ((n < 3) ? 0 : 4)
> +#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
> +#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
> +#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
> +#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
>   #define RXMSIDH(n) ((n) * 4 + 0x20)
>   #define RXMSIDL(n) ((n) * 4 + 0x21)
>   #define RXMEID8(n) ((n) * 4 + 0x22)

I think your patch was corrupted.  It doesn't apply because you have
extra space before each surviving #define.