date:20150705

Re: [PATCH v2 net-next 0/7] net_sched: act: lockless operation

2015-07-05 Thread Eric Dumazet

On Mon, Jul 6, 2015 at 8:41 AM, Eric Dumazet  wrote:
> As mentioned by Alexei last week in Budapest, it is a bit weird
> to take a spinlock in order to drop a packet in a tc filter...
>

Argh, please ignore v2; I messed up my git send-email command.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 net-next 4/7] net_sched: act_gact: use a separate packet counters for gact_determ()

2015-07-05 Thread Eric Dumazet

Second step for gact RCU operation :

We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.

gact_determ() would not work without a central packet counter,
so let's add it for this mode.

Signed-off-by: Eric Dumazet 
Cc: Alexei Starovoitov 
Acked-by: Jamal Hadi Salim 
Acked-by: John Fastabend 
---
 include/net/tc_act/tc_gact.h | 7 ++++---
 net/sched/act_gact.c | 4 +++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 9fc9b578908a..592a6bc02b0b 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -6,9 +6,10 @@
 struct tcf_gact {
struct tcf_common   common;
 #ifdef CONFIG_GACT_PROB
-   u16 tcfg_ptype;
-   u16 tcfg_pval;
-   int tcfg_paction;
+   u16 tcfg_ptype;
+   u16 tcfg_pval;
+   int tcfg_paction;
+   atomic_t packets;
 #endif
 };
 #define to_gact(a) \
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 22a3a61aa090..2f9bec584b3f 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -36,8 +36,10 @@ static int gact_net_rand(struct tcf_gact *gact)
 
 static int gact_determ(struct tcf_gact *gact)
 {
+   u32 pack = atomic_inc_return(&gact->packets);
+
smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
-   if (gact->tcf_bstats.packets % gact->tcfg_pval)
+   if (pack % gact->tcfg_pval)
return gact->tcf_action;
return gact->tcfg_paction;
 }
-- 
2.4.3.573.g4eafbef
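
For readers following along outside the kernel tree, here is a minimal userspace sketch of the idea in this patch, assuming C11 atomics in place of the kernel's atomic_t. The struct and function names (gact_sketch, gact_determ_sketch) are made up for illustration and are not the kernel's. It only shows why a shared counter is enough to pick every pval-th packet once the byte/packet stats are no longer central.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct tcf_gact; not the kernel type. */
struct gact_sketch {
	uint16_t pval;		/* every pval-th packet takes paction; must be >= 1 */
	int action;		/* default verdict */
	int paction;		/* verdict for every pval-th packet */
	atomic_uint packets;	/* shared counter, used only by this mode */
};

/* Same shape as gact_determ(): bump the shared counter, then test it. */
static int gact_determ_sketch(struct gact_sketch *g)
{
	/*
	 * atomic_fetch_add() returns the old value, so adding 1 yields the
	 * post-increment value, which is what atomic_inc_return() gives.
	 */
	unsigned int pack = atomic_fetch_add(&g->packets, 1) + 1;

	assert(g->pval != 0);	/* patch 3/7 guarantees this in the kernel */
	if (pack % g->pval)
		return g->action;
	return g->paction;
}
```

With pval set to 3, the third, sixth, ninth... call returns paction, regardless of which thread makes the call.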


[PATCH v2 net-next 3/7] net_sched: act: make tcfg_pval non zero

2015-07-05 Thread Eric Dumazet

First step for gact RCU operation :

Instead of testing if tcfg_pval is zero or not, just make it 1.

No change in behavior, but slightly faster code.

The smp_rmb()/smp_wmb() barriers, while not strictly needed at this
stage, are added for the upcoming spinlock removal.

Signed-off-by: Eric Dumazet 
Acked-by: Alexei Starovoitov 
Acked-by: Jamal Hadi Salim 
Acked-by: John Fastabend 
---
 net/sched/act_gact.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index a4f8af29ee30..22a3a61aa090 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -28,14 +28,16 @@
 #ifdef CONFIG_GACT_PROB
 static int gact_net_rand(struct tcf_gact *gact)
 {
-   if (!gact->tcfg_pval || prandom_u32() % gact->tcfg_pval)
+   smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+   if (prandom_u32() % gact->tcfg_pval)
return gact->tcf_action;
return gact->tcfg_paction;
 }
 
 static int gact_determ(struct tcf_gact *gact)
 {
-   if (!gact->tcfg_pval || gact->tcf_bstats.packets % gact->tcfg_pval)
+   smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+   if (gact->tcf_bstats.packets % gact->tcfg_pval)
return gact->tcf_action;
return gact->tcfg_paction;
 }
@@ -105,7 +107,11 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 #ifdef CONFIG_GACT_PROB
if (p_parm) {
gact->tcfg_paction = p_parm->paction;
-   gact->tcfg_pval= p_parm->pval;
+   gact->tcfg_pval= max_t(u16, 1, p_parm->pval);
+   /* Make sure tcfg_pval is written before tcfg_ptype
+* coupled with smp_rmb() in gact_net_rand() & gact_determ()
+*/
+   smp_wmb();
gact->tcfg_ptype   = p_parm->ptype;
}
 #endif
-- 
2.4.3.573.g4eafbef
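
The write/read pairing described above can be sketched in userspace C11, with the caveat that C11 release/acquire fences are only an approximation of smp_wmb()/smp_rmb() (they are somewhat stronger). All names here (prob_cfg, cfg_publish, cfg_divisor) are hypothetical. The point is the ordering: the divisor is clamped to at least 1 and written before the type, so a reader that sees a non-zero type never divides by zero.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two tcf_gact fields involved. */
struct prob_cfg {
	atomic_uint ptype;	/* 0: probability mode off */
	atomic_uint pval;	/* divisor, never 0 once ptype is visible */
};

/* Writer side, shaped like the tcf_gact_init() hunk. */
static void cfg_publish(struct prob_cfg *c, unsigned int ptype,
			unsigned int pval)
{
	/* same effect as max_t(u16, 1, pval): a zero divisor becomes 1 */
	atomic_store_explicit(&c->pval, pval ? pval : 1, memory_order_relaxed);
	/* stands in for smp_wmb(): pval is ordered before ptype */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->ptype, ptype, memory_order_relaxed);
}

/* Reader side: returns 0 when the mode is off, else a non-zero divisor. */
static unsigned int cfg_divisor(struct prob_cfg *c)
{
	unsigned int pval;

	if (!atomic_load_explicit(&c->ptype, memory_order_relaxed))
		return 0;
	/* stands in for smp_rmb(): ptype is read before pval */
	atomic_thread_fence(memory_order_acquire);
	pval = atomic_load_explicit(&c->pval, memory_order_relaxed);
	assert(pval != 0);
	return pval;
}
```

The fast path then only needs "x % divisor" with no zero test, which is where the slightly faster code comes from.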


[PATCH v2 net-next 2/7] net: sched: add percpu stats to actions

2015-07-05 Thread Eric Dumazet

Reuse existing percpu infrastructure John Fastabend added for qdisc.

This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.

Signed-off-by: Eric Dumazet 
Cc: Alexei Starovoitov 
Acked-by: Jamal Hadi Salim 
Acked-by: John Fastabend 
---
 include/net/act_api.h|  4 +++-
 net/sched/act_api.c  | 44 ++++++++++++++++++++++++++++++++++----------
 net/sched/act_bpf.c  |  2 +-
 net/sched/act_connmark.c |  3 ++-
 net/sched/act_csum.c |  3 ++-
 net/sched/act_gact.c |  3 ++-
 net/sched/act_ipt.c  |  2 +-
 net/sched/act_mirred.c   |  3 ++-
 net/sched/act_nat.c  |  3 ++-
 net/sched/act_pedit.c|  3 ++-
 net/sched/act_simple.c   |  3 ++-
 net/sched/act_skbedit.c  |  3 ++-
 net/sched/act_vlan.c |  3 ++-
 13 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 3ee4c92afd1b..db2063ffd181 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -21,6 +21,8 @@ struct tcf_common {
struct gnet_stats_rate_est64tcfc_rate_est;
spinlock_t  tcfc_lock;
struct rcu_head tcfc_rcu;
+   struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+   struct gnet_stats_queue __percpu *cpu_qstats;
 };
 #define tcf_head   common.tcfc_head
 #define tcf_index  common.tcfc_index
@@ -103,7 +105,7 @@ int tcf_hash_release(struct tc_action *a, int bind);
 u32 tcf_hash_new_index(struct tcf_hashinfo *hinfo);
 int tcf_hash_check(u32 index, struct tc_action *a, int bind);
 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
-   int size, int bind);
+   int size, int bind, bool cpustats);
 void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est);
 void tcf_hash_insert(struct tc_action *a);
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index af427a3dbcba..074a32f466f8 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -27,6 +27,15 @@
 #include 
 #include 
 
+static void free_tcf(struct rcu_head *head)
+{
+   struct tcf_common *p = container_of(head, struct tcf_common, tcfc_rcu);
+
+   free_percpu(p->cpu_bstats);
+   free_percpu(p->cpu_qstats);
+   kfree(p);
+}
+
 void tcf_hash_destroy(struct tc_action *a)
 {
struct tcf_common *p = a->priv;
@@ -41,7 +50,7 @@ void tcf_hash_destroy(struct tc_action *a)
 * gen_estimator est_timer() might access p->tcfc_lock
 * or bstats, wait a RCU grace period before freeing p
 */
-   kfree_rcu(p, tcfc_rcu);
+   call_rcu(&p->tcfc_rcu, free_tcf);
 }
 EXPORT_SYMBOL(tcf_hash_destroy);
 
@@ -230,15 +239,16 @@ void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
if (est)
gen_kill_estimator(&pc->tcfc_bstats,
   &pc->tcfc_rate_est);
-   kfree_rcu(pc, tcfc_rcu);
+   call_rcu(&pc->tcfc_rcu, free_tcf);
 }
 EXPORT_SYMBOL(tcf_hash_cleanup);
 
 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
-   int size, int bind)
+   int size, int bind, bool cpustats)
 {
struct tcf_hashinfo *hinfo = a->ops->hinfo;
struct tcf_common *p = kzalloc(size, GFP_KERNEL);
+   int err = -ENOMEM;
 
if (unlikely(!p))
return -ENOMEM;
@@ -246,18 +256,32 @@ int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
if (bind)
p->tcfc_bindcnt = 1;
 
+   if (cpustats) {
+   p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+   if (!p->cpu_bstats) {
+err1:
+   kfree(p);
+   return err;
+   }
+   p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+   if (!p->cpu_qstats) {
+err2:
+   free_percpu(p->cpu_bstats);
+   goto err1;
+   }
+   }
spin_lock_init(&p->tcfc_lock);
INIT_HLIST_NODE(&p->tcfc_head);
p->tcfc_index = index ? index : tcf_hash_new_index(hinfo);
p->tcfc_tm.install = jiffies;
p->tcfc_tm.lastuse = jiffies;
if (est) {
-   int err = gen_new_estimator(&p->tcfc_bstats, NULL,
-   &p->tcfc_rate_est,
-   &p->tcfc_lock, est);
+   err = gen_new_estimator(&p->tcfc_bstats, p->cpu_bstats,
+   &p->tcfc_rate_est,
+   &p->tcfc_lock, est);
if (err) {
-   kfree(p);
-   return err;
+   free_percpu(p->cpu_qstats);
+   goto err2;
}
}
 
@@ -615,10 +639,10 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *a,
if (err < 0)
goto err

[PATCH v2 net-next 3/7] net_sched: act_gact: make tcfg_pval non zero

2015-07-05 Thread Eric Dumazet

First step for gact RCU operation :

Instead of testing if tcfg_pval is zero or not, just make it 1.

No change in behavior, but slightly faster code.

The smp_rmb()/smp_wmb() barriers, while not strictly needed at this
stage, are added for the upcoming spinlock removal.

Signed-off-by: Eric Dumazet 
Acked-by: Alexei Starovoitov 
Acked-by: Jamal Hadi Salim 
Acked-by: John Fastabend 
---
 net/sched/act_gact.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index a4f8af29ee30..22a3a61aa090 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -28,14 +28,16 @@
 #ifdef CONFIG_GACT_PROB
 static int gact_net_rand(struct tcf_gact *gact)
 {
-   if (!gact->tcfg_pval || prandom_u32() % gact->tcfg_pval)
+   smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+   if (prandom_u32() % gact->tcfg_pval)
return gact->tcf_action;
return gact->tcfg_paction;
 }
 
 static int gact_determ(struct tcf_gact *gact)
 {
-   if (!gact->tcfg_pval || gact->tcf_bstats.packets % gact->tcfg_pval)
+   smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+   if (gact->tcf_bstats.packets % gact->tcfg_pval)
return gact->tcf_action;
return gact->tcfg_paction;
 }
@@ -105,7 +107,11 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 #ifdef CONFIG_GACT_PROB
if (p_parm) {
gact->tcfg_paction = p_parm->paction;
-   gact->tcfg_pval= p_parm->pval;
+   gact->tcfg_pval= max_t(u16, 1, p_parm->pval);
+   /* Make sure tcfg_pval is written before tcfg_ptype
+* coupled with smp_rmb() in gact_net_rand() & gact_determ()
+*/
+   smp_wmb();
gact->tcfg_ptype   = p_parm->ptype;
}
 #endif
-- 
2.4.3.573.g4eafbef


[PATCH v2 net-next 1/7] net: sched: extend percpu stats helpers

2015-07-05 Thread Eric Dumazet

qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.

We want to add percpu stats for tc actions, so this patch adds common
helpers.

qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()

Signed-off-by: Eric Dumazet 
Cc: Alexei Starovoitov 
Acked-by: Jamal Hadi Salim 
Acked-by: John Fastabend 
---
 include/net/sch_generic.h | 31 +++++++++++++++++++++----------
 net/core/dev.c|  4 ++--
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2738f6f87908..2eab08c38e32 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -513,17 +513,20 @@ static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
 }
 
-static inline void qdisc_bstats_update_cpu(struct Qdisc *sch,
-  const struct sk_buff *skb)
+static inline void bstats_cpu_update(struct gnet_stats_basic_cpu *bstats,
+const struct sk_buff *skb)
 {
-   struct gnet_stats_basic_cpu *bstats =
-   this_cpu_ptr(sch->cpu_bstats);
-
u64_stats_update_begin(&bstats->syncp);
bstats_update(&bstats->bstats, skb);
u64_stats_update_end(&bstats->syncp);
 }
 
+static inline void qdisc_bstats_cpu_update(struct Qdisc *sch,
+  const struct sk_buff *skb)
+{
+   bstats_cpu_update(this_cpu_ptr(sch->cpu_bstats), skb);
+}
+
 static inline void qdisc_bstats_update(struct Qdisc *sch,
   const struct sk_buff *skb)
 {
@@ -547,16 +550,24 @@ static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
sch->qstats.drops += count;
 }
 
-static inline void qdisc_qstats_drop(struct Qdisc *sch)
+static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
 {
-   sch->qstats.drops++;
+   qstats->drops++;
 }
 
-static inline void qdisc_qstats_drop_cpu(struct Qdisc *sch)
+static inline void qstats_overlimit_inc(struct gnet_stats_queue *qstats)
 {
-   struct gnet_stats_queue *qstats = this_cpu_ptr(sch->cpu_qstats);
+   qstats->overlimits++;
+}
 
-   qstats->drops++;
+static inline void qdisc_qstats_drop(struct Qdisc *sch)
+{
+   qstats_drop_inc(&sch->qstats);
+}
+
+static inline void qdisc_qstats_cpu_drop(struct Qdisc *sch)
+{
+   qstats_drop_inc(this_cpu_ptr(sch->cpu_qstats));
 }
 
 static inline void qdisc_qstats_overlimit(struct Qdisc *sch)
diff --git a/net/core/dev.c b/net/core/dev.c
index 6778ad52..e0d270143fc7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3646,7 +3646,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 
qdisc_skb_cb(skb)->pkt_len = skb->len;
skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
-   qdisc_bstats_update_cpu(cl->q, skb);
+   qdisc_bstats_cpu_update(cl->q, skb);
 
switch (tc_classify(skb, cl, &cl_res)) {
case TC_ACT_OK:
@@ -3654,7 +3654,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
skb->tc_index = TC_H_MIN(cl_res.classid);
break;
case TC_ACT_SHOT:
-   qdisc_qstats_drop_cpu(cl->q);
+   qdisc_qstats_cpu_drop(cl->q);
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
kfree_skb(skb);
-- 
2.4.3.573.g4eafbef
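
The refactoring in this patch splits each helper into a generic part that takes a stats pointer and a thin wrapper that picks the current CPU's copy. A rough userspace sketch of that split follows, assuming a fixed array of slots in place of real percpu memory and leaving out the u64_stats_sync sequence counter entirely. NR_SLOTS, percpu_stats, stats_update_on and stats_sum are invented names, and "segs ? segs : 1" is only an approximation of the skb_is_gso() test.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct basic_stats {
	uint64_t bytes;
	uint64_t packets;
};

/* One slot per CPU; each CPU only ever writes its own slot. */
struct percpu_stats {
	struct basic_stats slot[NR_SLOTS];
};

/* Generic helper: works on whichever stats block it is handed. */
static void stats_cpu_update(struct basic_stats *s, uint32_t len,
			     uint32_t segs)
{
	s->bytes += len;
	s->packets += segs ? segs : 1;	/* a GSO packet counts once per segment */
}

/* Thin wrapper: selects the caller's slot, as this_cpu_ptr() would. */
static void stats_update_on(struct percpu_stats *ps, unsigned int cpu,
			    uint32_t len, uint32_t segs)
{
	assert(cpu < NR_SLOTS);
	stats_cpu_update(&ps->slot[cpu], len, segs);
}

/* Reader side: totals are obtained by summing every slot. */
static struct basic_stats stats_sum(const struct percpu_stats *ps)
{
	struct basic_stats total = { 0, 0 };
	unsigned int i;

	for (i = 0; i < NR_SLOTS; i++) {
		total.bytes += ps->slot[i].bytes;
		total.packets += ps->slot[i].packets;
	}
	return total;
}
```

Because the generic helper no longer knows about struct Qdisc, the same function can be reused for tc actions, which is what the later patches in the series rely on.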


[PATCH v2 net-next 0/7] net_sched: act: lockless operation

2015-07-05 Thread Eric Dumazet

As mentioned by Alexei last week in Budapest, it is a bit weird
to take a spinlock in order to drop a packet in a tc filter...

Let's add percpu infra for tc actions and use it for gact & mirred.

Before changes, my host with 8 RX queues was handling 5 Mpps with gact,
and more than 11 Mpps after.

The mirred change is not yet visible if ifb+qdisc is used, as ifb is
not yet multiqueue enabled, but it is a step forward.

Signed-off-by: Eric Dumazet 
Cc: Alexei Starovoitov 
Cc: Jamal Hadi Salim 
Cc: John Fastabend 

Eric Dumazet (7):
  net: sched: extend percpu stats helpers
  net: sched: add percpu stats to actions
  net_sched: act_gact: make tcfg_pval non zero
  net_sched: act_gact: use a separate packet counters for gact_determ()
  net_sched: act_gact: read tcfg_ptype once
  net_sched: act_gact: remove spinlock in fast path
  net_sched: act_mirred: remove spinlock in fast path

 include/net/act_api.h  | 15 ++-
 include/net/sch_generic.h  | 31 ++
 include/net/tc_act/tc_gact.h   |  7 ++---
 include/net/tc_act/tc_mirred.h |  2 +-
 net/core/dev.c |  4 +--
 net/sched/act_api.c| 44 
 net/sched/act_bpf.c|  2 +-
 net/sched/act_connmark.c   |  3 ++-
 net/sched/act_csum.c   |  3 ++-
 net/sched/act_gact.c   | 44 ++--
 net/sched/act_ipt.c|  2 +-
 net/sched/act_mirred.c | 58 ++
 net/sched/act_nat.c|  3 ++-
 net/sched/act_pedit.c  |  3 ++-
 net/sched/act_simple.c |  3 ++-
 net/sched/act_skbedit.c|  3 ++-
 net/sched/act_vlan.c   |  3 ++-
 17 files changed, 148 insertions(+), 82 deletions(-)

-- 
2.4.3.573.g4eafbef


[PATCH] net: macb: Add SG support for Zynq SOC family

2015-07-05 Thread Punnaiah Choudary Kalluri

Enable SG support for Zynq SOC family devices.

Signed-off-by: Punnaiah Choudary Kalluri 
---
 drivers/net/ethernet/cadence/macb.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index caeb395..a4e3f86 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2741,8 +2741,7 @@ static const struct macb_config emac_config = {
 
 
 static const struct macb_config zynqmp_config = {
-   .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE |
-   MACB_CAPS_JUMBO,
+   .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
.dma_burst_length = 16,
.clk_init = macb_clk_init,
.init = macb_init,
@@ -2750,8 +2749,7 @@ static const struct macb_config zynqmp_config = {
 };
 
 static const struct macb_config zynq_config = {
-   .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE |
-   MACB_CAPS_NO_GIGABIT_HALF,
+   .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_NO_GIGABIT_HALF,
.dma_burst_length = 16,
.clk_init = macb_clk_init,
.init = macb_init,
-- 
1.7.4


Re: [PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding

2015-07-05 Thread Alex Gartrell

On Sun, Jul 5, 2015 at 8:50 PM, Simon Horman  wrote:
> Is it possible to get a 'Fixes:' tag?

I suppose it'd be appropriate to say

Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")

As that is what introduces tcp early_demux, but that's just a guess as
I haven't bisected it (not even sure my test would run on that code
base).

-- 
Alex Gartrell 

Re: [PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding

2015-07-05 Thread Simon Horman

On Sun, Jul 05, 2015 at 03:19:27PM -0700, Alex Gartrell wrote:
> On Sun, Jul 5, 2015 at 3:13 PM, Julian Anastasov  wrote:
> > May be the patch fixes crashes? If yes, Simon
> > should apply it for ipvs/net tree, otherwise after
> > the merge window...
> 
> Yeah this is definitely a crash-fix and it's existed since at least 3.10.

Is it possible to get a 'Fixes:' tag?

Re: [PATCH net-next] ipv6: sysctl to restrict candidate source addresses

2015-07-05 Thread Erik Kline

Reworked with "use_oif_addr".

Thanks,
-Erik

On 3 July 2015 at 16:03, YOSHIFUJI Hideaki
 wrote:
> Hi,
>
> Erik Kline wrote:
>> Per RFC 6724, section 4, "Candidate Source Addresses":
>>
>> It is RECOMMENDED that the candidate source addresses be the set
>> of unicast addresses assigned to the interface that will be used
>> to send to the destination (the "outgoing" interface).
>>
>> Add a sysctl to enable this behaviour.
>>
>> Signed-off-by: Erik Kline 
>> ---
>>  Documentation/networking/ip-sysctl.txt | 12 
>>  include/linux/ipv6.h   |  1 +
>>  include/uapi/linux/ipv6.h  |  1 +
>>  net/ipv6/addrconf.c| 30 +-
>>  4 files changed, 39 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/networking/ip-sysctl.txt 
>> b/Documentation/networking/ip-sysctl.txt
>> index 5fae770..d8f3e60 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -1435,6 +1435,18 @@ mtu - INTEGER
>>   Default Maximum Transfer Unit
>>   Default: 1280 (IPv6 required minimum)
>>
>> +restrict_srcaddr - INTEGER
>> + Restrict candidate source addresses (vis. RFC 6724, section 4).
>> +
>> + When set to 1, the candidate source addresses for destinations
>> + routed via this interface are restricted to the set of addresses
>> + configured on this interface.
>> +
>> + Possible values are:
>> + 0 : no source address restrictions
>> + 1 : require matching outgoing interface
>> + Default:  0
>> +
>
> I cannot get what "restrict" restricts.  How about "use_oif_addr" or
> something like that (like use_tempaddr)?
>
> --
> Hideaki Yoshifuji 
> Technical Division, MIRACLE LINUX CORPORATION

[PATCH net-next v2] ipv6: sysctl to restrict candidate source addresses

2015-07-05 Thread Erik Kline

Per RFC 6724, section 4, "Candidate Source Addresses":

It is RECOMMENDED that the candidate source addresses be the set
of unicast addresses assigned to the interface that will be used
to send to the destination (the "outgoing" interface).

Add a sysctl to enable this behaviour.
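
As a rough illustration of what the sysctl changes, the device filter in ipv6_dev_get_saddr() can be read as the predicate below. This is a standalone sketch, not the kernel function: saddr_query and skip_device are hypothetical names, and SCOPE_LINKLOCAL is assumed to match the kernel's IPV6_ADDR_SCOPE_LINKLOCAL value.

```c
#include <assert.h>
#include <stdbool.h>

#define SCOPE_LINKLOCAL 2	/* assumed to match IPV6_ADDR_SCOPE_LINKLOCAL */

/* Hypothetical bundle of the inputs the filter looks at. */
struct saddr_query {
	bool dst_is_multicast;	/* dst_type & IPV6_ADDR_MULTICAST */
	int dst_scope;		/* scope of the destination address */
	int dst_ifindex;	/* outgoing interface, 0 if unknown */
	bool use_oif_addr;	/* the new per-interface sysctl */
};

/* True when addresses on dev_ifindex must not be considered as candidates. */
static bool skip_device(const struct saddr_query *q, int dev_ifindex)
{
	bool oif_only;

	assert(dev_ifindex > 0);
	oif_only = q->dst_is_multicast ||
		   q->dst_scope <= SCOPE_LINKLOCAL ||
		   q->use_oif_addr;
	return oif_only && q->dst_ifindex && dev_ifindex != q->dst_ifindex;
}
```

With the sysctl off, a global destination still considers addresses from every interface; with it on, only the outgoing interface's addresses remain, provided the outgoing interface is known.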

Signed-off-by: Erik Kline 
---
 Documentation/networking/ip-sysctl.txt |  7 +++++++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ipv6.h              |  1 +
 net/ipv6/addrconf.c                    | 30 +++++++++++++++++++++++++-----
 4 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5fae770..c3bf04d 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1455,6 +1455,13 @@ router_solicitations - INTEGER
routers are present.
Default: 3
 
+use_oif_addr - BOOLEAN
+   When enabled, the candidate source addresses for destinations
+   routed via this interface are restricted to the set of addresses
+   configured on this interface (vis. RFC 6724, section 4).
+
+   Default: false
+
 use_tempaddr - INTEGER
Preference for Privacy Extensions (RFC3041).
  <= 0 : disable Privacy Extensions
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..4633c88 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -57,6 +57,7 @@ struct ipv6_devconf {
bool initialized;
struct in6_addr secret;
} stable_secret;
+   __s32   use_oif_addr;
void*sysctl;
 };
 
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 5efa54a..cf9d65a 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -171,6 +171,7 @@ enum {
DEVCONF_USE_OPTIMISTIC,
DEVCONF_ACCEPT_RA_MTU,
DEVCONF_STABLE_SECRET,
+   DEVCONF_USE_OIF_ADDR,
DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..a43687d 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -211,7 +211,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.accept_ra_mtu  = 1,
.stable_secret  = {
.initialized = false,
-   }
+   },
+   .use_oif_addr   = 0,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -253,6 +254,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
.stable_secret  = {
.initialized = false,
},
+   .use_oif_addr   = 0,
 };
 
 /* Check if a valid qdisc is available */
@@ -1366,7 +1368,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
*score = &scores[0], *hiscore = &scores[1];
struct ipv6_saddr_dst dst;
struct net_device *dev;
-   int dst_type;
+   struct inet6_dev *idev;
+   int dst_type, use_oif_addr = 0;
 
dst_type = __ipv6_addr_type(daddr);
dst.addr = daddr;
@@ -1380,9 +1383,12 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 
rcu_read_lock();
 
-   for_each_netdev_rcu(net, dev) {
-   struct inet6_dev *idev;
+   if (dst_dev) {
+   idev = __in6_dev_get(dst_dev);
+   use_oif_addr = (idev) ? idev->cnf.use_oif_addr : 0;
+   }
 
+   for_each_netdev_rcu(net, dev) {
/* Candidate Source Address (section 4)
 *  - multicast and link-local destination address,
 *the set of candidate source address MUST only
@@ -1394,9 +1400,14 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 *include addresses assigned to interfaces
 *belonging to the same site as the outgoing
 *interface.)
+*  - "It is RECOMMENDED that the candidate source addresses
+*be the set of unicast addresses assigned to the
+*interface that will be used to send to the destination
+*(the 'outgoing' interface)." (RFC 6724)
 */
if (((dst_type & IPV6_ADDR_MULTICAST) ||
-dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL) &&
+dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL ||
+use_oif_addr) &&
dst.ifindex && dev->ifindex != dst.ifindex)
continue;
 
@@ -4586,6 +4597,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = cnf->accept_ra_from_local;
array[DEVCONF_ACCEPT_RA_MTU] = cnf->accept_ra_mtu;
/* we omit DEVCONF_STABLE_SECRET for now */
+   array[DEVCONF_USE_OIF_ADDR] = cnf->use_oif_addr;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5585,6 +5597,14 @@ static struct

[PATCH v3 3/3] net: dsa: mv88e6xxx: add switchdev VLAN operations

2015-07-05 Thread Vivien Didelot

This commit implements the switchdev operations to add, delete and dump
VLANs for the Marvell 88E6352 and compatible switch chips.

This allows access to the switch VLAN Table Unit from standard userspace
commands such as "bridge vlan".

A configuration like "1t 2t 3t 4u" for VLAN 10 is achieved like this:

# bridge vlan add dev swp1 vid 10 master
# bridge vlan add dev swp2 vid 10 master
# bridge vlan add dev swp3 vid 10 master
# bridge vlan add dev swp4 vid 10 master untagged pvid

This calls port_vlan_add() for each command. Removing port 3 from
VLAN 10 is done with:

# bridge vlan del dev swp3 vid 10

This calls port_vlan_del() for port 3. Dumping VLANs is done with:

# bridge vlan show
port    vlan ids
swp0    None
swp0
swp1     10

swp1     10

swp2     10

swp2     10

swp3    None
swp3
swp4     10 PVID Egress Untagged

swp4     10 PVID Egress Untagged

br0     None

This calls port_vlan_dump() for each port.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6123_61_65.c |   3 +
 drivers/net/dsa/mv88e6131.c   |   3 +
 drivers/net/dsa/mv88e6171.c   |   3 +
 drivers/net/dsa/mv88e6352.c   |   3 +
 drivers/net/dsa/mv88e6xxx.c   | 152 ++
 drivers/net/dsa/mv88e6xxx.h   |   5 ++
 6 files changed, 169 insertions(+)

diff --git a/drivers/net/dsa/mv88e6123_61_65.c b/drivers/net/dsa/mv88e6123_61_65.c
index 71a29a7..8e679ff 100644
--- a/drivers/net/dsa/mv88e6123_61_65.c
+++ b/drivers/net/dsa/mv88e6123_61_65.c
@@ -134,6 +134,9 @@ struct dsa_switch_driver mv88e6123_61_65_switch_driver = {
 #endif
.get_regs_len   = mv88e6xxx_get_regs_len,
.get_regs   = mv88e6xxx_get_regs,
+   .port_vlan_add  = mv88e6xxx_port_vlan_add,
+   .port_vlan_del  = mv88e6xxx_port_vlan_del,
+   .port_vlan_dump = mv88e6xxx_port_vlan_dump,
 };
 
 MODULE_ALIAS("platform:mv88e6123");
diff --git a/drivers/net/dsa/mv88e6131.c b/drivers/net/dsa/mv88e6131.c
index 32f4a08..c4d914b 100644
--- a/drivers/net/dsa/mv88e6131.c
+++ b/drivers/net/dsa/mv88e6131.c
@@ -182,6 +182,9 @@ struct dsa_switch_driver mv88e6131_switch_driver = {
.get_strings= mv88e6xxx_get_strings,
.get_ethtool_stats  = mv88e6xxx_get_ethtool_stats,
.get_sset_count = mv88e6xxx_get_sset_count,
+   .port_vlan_add  = mv88e6xxx_port_vlan_add,
+   .port_vlan_del  = mv88e6xxx_port_vlan_del,
+   .port_vlan_dump = mv88e6xxx_port_vlan_dump,
 };
 
 MODULE_ALIAS("platform:mv88e6085");
diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 1c78084..7701ce6 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -119,6 +119,9 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
.fdb_add= mv88e6xxx_port_fdb_add,
.fdb_del= mv88e6xxx_port_fdb_del,
.fdb_getnext= mv88e6xxx_port_fdb_getnext,
+   .port_vlan_add  = mv88e6xxx_port_vlan_add,
+   .port_vlan_del  = mv88e6xxx_port_vlan_del,
+   .port_vlan_dump = mv88e6xxx_port_vlan_dump,
 };
 
 MODULE_ALIAS("platform:mv88e6171");
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index 632815c..b981be4a 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -392,6 +392,9 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.fdb_add= mv88e6xxx_port_fdb_add,
.fdb_del= mv88e6xxx_port_fdb_del,
.fdb_getnext= mv88e6xxx_port_fdb_getnext,
+   .port_vlan_add  = mv88e6xxx_port_vlan_add,
+   .port_vlan_del  = mv88e6xxx_port_vlan_del,
+   .port_vlan_dump = mv88e6xxx_port_vlan_dump,
 };
 
 MODULE_ALIAS("platform:mv88e6352");
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ffd9fc6..d5812ba 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1544,6 +1544,158 @@ static int _mv88e6xxx_vtu_loadpurge(struct dsa_switch *ds,
return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_LOAD_PURGE);
 }
 
+int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port, u16 vid,
+   u16 bridge_flags)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   struct mv88e6xxx_vtu_entry entry = { 0 };
+   int prev_vid = vid ? vid - 1 : 0xfff;
+   int i, ret;
+
+   mutex_lock(&ps->smi_mutex);
+   ret = _mv88e6xxx_vtu_getnext(ds, prev_vid, &entry);
+   if (ret < 0)
+   goto unlock;
+
+   /* If the VLAN does not exist, re-initialize the entry for addition */
+   if (entry.vid != vid || !entry.valid) {
+   memset(&entry, 0, sizeof(entry));
+   entry.valid = true;
+   entry.vid = vid;
+   entry.fid = vid; /* We use one FID per VLAN at the

[PATCH v3 2/3] net: dsa: add support for switchdev VLAN objects

2015-07-05 Thread Vivien Didelot

This patch adds the glue between DSA and switchdev operations to add,
delete and dump SWITCHDEV_OBJ_PORT_VLAN objects.

This is a first step to link the "bridge vlan" command with hardware
entries for DSA compatible switch chips.
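
To make the glue a little more concrete, here is a standalone sketch of the range-add path: walk the VID range, call the driver hook for each VID, stop at the first error, and record successful VIDs in a per-port bitmap. Everything here is illustrative. port_sketch, vlan_add_fn, port_vlans_add and port_has_vlan are made-up names, and fake_hw_vlan_add is a pretend driver hook that rejects VID 12 so the error path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define N_VID		4096	/* same value as VLAN_N_VID */
#define WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the per-port private data. */
struct port_sketch {
	int port;
	unsigned long vlan_bitmap[N_VID / WORD_BITS];
};

typedef int (*vlan_add_fn)(int port, int vid);

/* Pretend driver hook: the "hardware" rejects VID 12. */
static int fake_hw_vlan_add(int port, int vid)
{
	(void)port;
	return vid == 12 ? -EIO : 0;
}

static bool port_has_vlan(const struct port_sketch *p, int vid)
{
	return (p->vlan_bitmap[vid / WORD_BITS] >> (vid % WORD_BITS)) & 1UL;
}

/* Add every VID in [vid_begin, vid_end]; stop at the first failure. */
static int port_vlans_add(struct port_sketch *p, int vid_begin, int vid_end,
			  vlan_add_fn add)
{
	int vid, err = 0;

	assert(vid_begin >= 0 && vid_end < N_VID);
	if (!add)
		return -EOPNOTSUPP;	/* driver has no VLAN support */

	for (vid = vid_begin; vid <= vid_end; ++vid) {
		err = add(p->port, vid);
		if (err)
			break;
		p->vlan_bitmap[vid / WORD_BITS] |= 1UL << (vid % WORD_BITS);
	}
	return err;
}
```

Note that VIDs programmed before a failure stay recorded; the sketch, like the loop it imitates, does not roll them back.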

Signed-off-by: Vivien Didelot 
---
 include/net/dsa.h  |   9 
 net/dsa/dsa_priv.h |   6 +++
 net/dsa/slave.c| 137 +
 3 files changed, 152 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index fbca63b..cabf2a5 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -302,6 +302,15 @@ struct dsa_switch_driver {
   const unsigned char *addr, u16 vid);
int (*fdb_getnext)(struct dsa_switch *ds, int port,
   unsigned char *addr, bool *is_static);
+
+   /*
+* VLAN support
+*/
+   int (*port_vlan_add)(struct dsa_switch *ds, int port, u16 vid,
+u16 bridge_flags);
+   int (*port_vlan_del)(struct dsa_switch *ds, int port, u16 vid);
+   int (*port_vlan_dump)(struct dsa_switch *ds, int port, u16 vid,
+ u16 *bridge_flags);
 };
 
 void register_switch_driver(struct dsa_switch_driver *type);
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index d5f1f9b..9029717 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 
 struct dsa_device_ops {
netdev_tx_t (*xmit)(struct sk_buff *skb, struct net_device *dev);
@@ -47,6 +48,11 @@ struct dsa_slave_priv {
int old_duplex;
 
struct net_device   *bridge_dev;
+
+   /*
+* Which VLANs this port is a member of.
+*/
+   DECLARE_BITMAP(vlan_bitmap, VLAN_N_VID);
 };
 
 /* dsa.c */
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 04ffad3..47c459b 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dsa_priv.h"
 
 /* slave mii_bus handling ***/
@@ -363,6 +364,136 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
return ret;
 }
 
+static int dsa_slave_port_vlans_add(struct net_device *dev,
+struct switchdev_obj *obj)
+{
+   struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   struct dsa_switch *ds = p->parent;
+   int vid, err = 0;
+
+   if (!ds->drv->port_vlan_add)
+   return -EOPNOTSUPP;
+
+   for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
+   err = ds->drv->port_vlan_add(ds, p->port, vid, vlan->flags);
+   if (err)
+   break;
+   set_bit(vid, p->vlan_bitmap);
+   }
+
+   return err;
+}
+
+static int dsa_slave_port_obj_add(struct net_device *dev,
+ struct switchdev_obj *obj)
+{
+   int err;
+
+   /*
+* Skip the prepare phase, since currently the DSA drivers don't need to
+* allocate any memory for operations and they will not fail to HW
+* (unless something horrible goes wrong on the MDIO bus, in which case
+* the prepare phase wouldn't have been able to predict anyway).
+*/
+   if (obj->trans != SWITCHDEV_TRANS_COMMIT)
+   return 0;
+
+   switch (obj->id) {
+   case SWITCHDEV_OBJ_PORT_VLAN:
+   err = dsa_slave_port_vlans_add(dev, obj);
+   break;
+   default:
+   err = -EOPNOTSUPP;
+   break;
+   }
+
+   return err;
+}
+
+static int dsa_slave_port_vlans_del(struct net_device *dev,
+struct switchdev_obj *obj)
+{
+   struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   struct dsa_switch *ds = p->parent;
+   int vid, err = 0;
+
+   if (!ds->drv->port_vlan_del)
+   return -EOPNOTSUPP;
+
+   for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
+   err = ds->drv->port_vlan_del(ds, p->port, vid);
+   if (err)
+   break;
+   clear_bit(vid, p->vlan_bitmap);
+   }
+
+   return err;
+}
+
+static int dsa_slave_port_obj_del(struct net_device *dev,
+ struct switchdev_obj *obj)
+{
+   int err;
+
+   switch (obj->id) {
+   case SWITCHDEV_OBJ_PORT_VLAN:
+   err = dsa_slave_port_vlans_del(dev, obj);
+   break;
+   default:
+   err = -EOPNOTSUPP;
+   break;
+   }
+
+   return err;
+}
+
+static int dsa_slave_port_vlans_dump(struct net_device *dev,
+struct switchdev_obj *obj)
+{
+   struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+   struct dsa_slave_priv *p = netdev_priv(dev);
+
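
For readers following the bitmap bookkeeping in dsa_slave_port_vlans_add() above, here is a minimal user-space sketch of the same loop. It is not the kernel code: struct port_model, the model_* helpers, and the fail_on_12() hook are hypothetical stand-ins, and the bitmap is a plain array instead of DECLARE_BITMAP(). It shows one consequence of the loop as posted: when the driver hook fails partway through a VID range, the VIDs already programmed stay set in the bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define VLAN_N_VID	4096
#define BITS_PER_WORD	((int)(sizeof(unsigned long) * CHAR_BIT))
#define BITMAP_WORDS	((VLAN_N_VID + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the per-port private data. */
struct port_model {
	unsigned long vlan_bitmap[BITMAP_WORDS];
};

/* Hypothetical driver hook: 0 on success, negative value on failure. */
typedef int (*vlan_add_fn)(int port, int vid);

static void model_set_bit(int vid, unsigned long *map)
{
	assert(vid >= 0 && vid < VLAN_N_VID);
	map[vid / BITS_PER_WORD] |= 1UL << (vid % BITS_PER_WORD);
}

static bool model_test_bit(int vid, const unsigned long *map)
{
	assert(vid >= 0 && vid < VLAN_N_VID);
	return (map[vid / BITS_PER_WORD] >> (vid % BITS_PER_WORD)) & 1UL;
}

/* Same shape as the loop in the patch: stop at the first error and
 * record only the VIDs that the hook accepted.
 */
static int model_vlans_add(struct port_model *p, int port,
			   int vid_begin, int vid_end, vlan_add_fn add)
{
	int vid, err = 0;

	for (vid = vid_begin; vid <= vid_end; ++vid) {
		err = add(port, vid);
		if (err)
			break;
		model_set_bit(vid, p->vlan_bitmap);
	}

	return err;
}

/* Example hook that rejects VID 12, to show partial application. */
static int fail_on_12(int port, int vid)
{
	(void)port;
	return vid == 12 ? -5 : 0;
}
```

Calling model_vlans_add(&p, 0, 10, 14, fail_on_12) on a zeroed port_model returns the hook's error and leaves VIDs 10 and 11 marked, with 12 through 14 untouched.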

[PATCH v3 1/3] net: dsa: mv88e6xxx: add debugfs interface for VTU

2015-07-05 Thread Vivien Didelot

Implement the Get Next and Load Purge operations for the VLAN Table
Unit, and a "vtu" debugfs file to read and write the hardware VLANs.

A populated VTU looks like this:

# cat /sys/kernel/debug/dsa0/vtu
 VID  FID  SID  0  1  2  3  4  5  6
 550  562    0  x  x  x  u  x  t  x
1000 1012    0  x  x  t  x  x  t  x
1200 1212    0  x  x  t  x  t  t  x

Where "t", "u", "x" and "-" respectively mean that the port is tagged,
untagged, excluded or unmodified for a given VLAN entry.

VTU entries can be added by echoing the same format:

echo 1300 1312 0 x x t x t t x > vtu

and can be deleted by echoing only the VID:

echo 1000 > vtu

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 311 
 drivers/net/dsa/mv88e6xxx.h |  24 
 2 files changed, 335 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 8c130c0..ffd9fc6 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2,6 +2,9 @@
  * net/dsa/mv88e6xxx.c - Marvell 88e6xxx switch chip support
  * Copyright (c) 2008 Marvell Semiconductor
  *
+ * Copyright (c) 2015 CMC Electronics, Inc.
+ * Added support for 802.1Q VLAN Table Unit
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -1366,6 +1369,181 @@ static void mv88e6xxx_bridge_work(struct work_struct *work)
}
 }
 
+static int _mv88e6xxx_vtu_wait(struct dsa_switch *ds)
+{
+   return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_VTU_OP,
+  GLOBAL_VTU_OP_BUSY);
+}
+
+static int _mv88e6xxx_vtu_cmd(struct dsa_switch *ds, u16 op)
+{
+   int ret;
+
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_OP, op);
+   if (ret < 0)
+   return ret;
+
+   return _mv88e6xxx_vtu_wait(ds);
+}
+
+static int _mv88e6xxx_stu_loadpurge(struct dsa_switch *ds, u8 sid, bool valid)
+{
+   int ret, data;
+
+   ret = _mv88e6xxx_vtu_wait(ds);
+   if (ret < 0)
+   return ret;
+
+   data = sid & GLOBAL_VTU_SID_MASK;
+   if (valid)
+   data |= GLOBAL_VTU_VID_VALID;
+
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID, data);
+   if (ret < 0)
+   return ret;
+
+   /* Unused (yet) data registers */
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_DATA_0_3, 0);
+   if (ret < 0)
+   return ret;
+
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_DATA_4_7, 0);
+   if (ret < 0)
+   return ret;
+
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_DATA_8_11, 0);
+   if (ret < 0)
+   return ret;
+
+   return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_STU_LOAD_PURGE);
+}
+
+static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, u16 vid,
+ struct mv88e6xxx_vtu_entry *entry)
+{
+   int ret, i;
+
+   ret = _mv88e6xxx_vtu_wait(ds);
+   if (ret < 0)
+   return ret;
+
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID,
+  vid & GLOBAL_VTU_VID_MASK);
+   if (ret < 0)
+   return ret;
+
+   ret = _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_GET_NEXT);
+   if (ret < 0)
+   return ret;
+
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_VID);
+   if (ret < 0)
+   return ret;
+
+   entry->vid = ret & GLOBAL_VTU_VID_MASK;
+   entry->valid = !!(ret & GLOBAL_VTU_VID_VALID);
+
+   if (entry->valid) {
+   /* Ports 0-3, offsets 0, 4, 8, 12 */
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_DATA_0_3);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < 4; ++i)
+   entry->tags[i] = (ret >> (i * 4)) & 3;
+
+   /* Ports 4-6, offsets 0, 4, 8 */
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_DATA_4_7);
+   if (ret < 0)
+   return ret;
+
+   for (i = 4; i < 7; ++i)
+   entry->tags[i] = (ret >> ((i - 4) * 4)) & 3;
+
+   if (mv88e6xxx_6097_family(ds) || mv88e6xxx_6165_family(ds) ||
+   mv88e6xxx_6351_family(ds) || mv88e6xxx_6352_family(ds)) {
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+ GLOBAL_VTU_FID);
+   if (ret < 0)
+   return ret;
+
+   entry->fid = ret & GLOBAL_VTU_FID_MASK;
+
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+ GLOBAL_VTU_SID);
+   if (ret < 0)
+   return ret;
+
+
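
The per-port tag extraction in _mv88e6xxx_vtu_getnext() above can be tried outside the kernel. The sketch below copies only the shift-and-mask arithmetic shown in the patch (one 4-bit slot per port, low two bits used); the register reads are replaced by plain function arguments. The model_* names and the encode helper are my own assumptions for illustration, since the Load side is not visible in the quoted part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_PORTS 7

/* Decode the 2-bit member tag of each port from the two data words,
 * using the same offsets as the patch.
 */
static void model_decode_member_tags(uint16_t data_0_3, uint16_t data_4_7,
				     uint8_t tags[MODEL_NUM_PORTS])
{
	int i;

	/* Ports 0-3: bit offsets 0, 4, 8, 12 of the first word */
	for (i = 0; i < 4; ++i)
		tags[i] = (data_0_3 >> (i * 4)) & 3;

	/* Ports 4-6: bit offsets 0, 4, 8 of the second word */
	for (i = 4; i < MODEL_NUM_PORTS; ++i)
		tags[i] = (data_4_7 >> ((i - 4) * 4)) & 3;
}

/* Hypothetical inverse, assumed here only to allow a round-trip check. */
static void model_encode_member_tags(const uint8_t tags[MODEL_NUM_PORTS],
				     uint16_t *data_0_3, uint16_t *data_4_7)
{
	int i;

	*data_0_3 = 0;
	*data_4_7 = 0;

	for (i = 0; i < 4; ++i) {
		assert(tags[i] <= 3);
		*data_0_3 |= (uint16_t)(tags[i] << (i * 4));
	}

	for (i = 4; i < MODEL_NUM_PORTS; ++i) {
		assert(tags[i] <= 3);
		*data_4_7 |= (uint16_t)(tags[i] << ((i - 4) * 4));
	}
}
```

The numeric tag values are left uninterpreted on purpose: the mapping from values to the "t", "u", "x" and "-" letters is not shown in the quoted text.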

[PATCH v3 0/3] net: dsa: mv88e6xxx: add support for VLAN Table Unit

2015-07-05 Thread Vivien Didelot

Hi all,

This patchset brings full support for hardware VLANs in DSA, and the Marvell
88E6xxx compatible switch chips.

The first patch adds the VTU operations to the mv88e6xxx code, as well as a
"vtu" debugfs file to read and modify the hardware VLAN table.

The second patch adds the glue between DSA and the switchdev VLAN objects.

The third patch finally implements the necessary functions in the mv88e6xxx
code to interact with the hardware VLAN through switchdev, from userspace
commands such as "bridge vlan".

Below is an example of what can be done with this patchset.

"VID 550: 1t 3u"
"VID 1000: 2t"
"VID 1200: 2t 4t"

The VLAN setup above can be achieved with the following bridge commands:

bridge vlan add vid 550 dev swp1 master
bridge vlan add vid 550 dev swp3 master untagged pvid
bridge vlan add vid 1000 dev swp2 master
bridge vlan add vid 1200 dev swp2 master
bridge vlan add vid 1200 dev swp4 master

Removing port 1 from VLAN 550 is done with:

bridge vlan del vid 550 dev swp1

The bridge command would output the following setup:

# bridge vlan
port    vlan ids
swp0    None
swp0
swp1    None
swp1
swp2     1000
         1200

swp2     1000
         1200

swp3     550 PVID Egress Untagged

swp3     550 PVID Egress Untagged

swp4     1200

swp4     1200

br0     None

Assuming that swp5 is the CPU port, the "vtu" debugfs file would show:

# cat /sys/kernel/debug/dsa0/vtu
 VID  FID  SID  0  1  2  3  4  5  6
 550  550    0  x  x  x  u  x  t  x
1000 1000    0  x  x  t  x  x  t  x
1200 1200    0  x  x  t  x  t  t  x

Cheers,
  -v


Vivien Didelot (3):
  net: dsa: mv88e6xxx: add debugfs interface for VTU
  net: dsa: add support for switchdev VLAN objects
  net: dsa: mv88e6xxx: add switchdev VLAN operations

 drivers/net/dsa/mv88e6123_61_65.c |   3 +
 drivers/net/dsa/mv88e6131.c   |   3 +
 drivers/net/dsa/mv88e6171.c   |   3 +
 drivers/net/dsa/mv88e6352.c   |   3 +
 drivers/net/dsa/mv88e6xxx.c   | 463 ++
 drivers/net/dsa/mv88e6xxx.h   |  29 +++
 include/net/dsa.h |   9 +
 net/dsa/dsa_priv.h|   6 +
 net/dsa/slave.c   | 137 +++
 9 files changed, 656 insertions(+)

-- 
2.4.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RESEND] xen-netback: remove duplicated function definition

2015-07-05 Thread Liang Li

There are two duplicated xenvif_zerocopy_callback() definitions.
Remove one of them.

Signed-off-by: Liang Li 
---
 drivers/net/xen-netback/common.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..c6cb85a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -325,9 +325,6 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
queue->pending_prod + queue->pending_cons;
 }
 
-/* Callback from stack when TX packet can be released */
-void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
-
 irqreturn_t xenvif_interrupt(int irq, void *dev_id);
 
 extern bool separate_tx_rx_irq;
-- 
1.9.1


Re: [PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding

2015-07-05 Thread Alex Gartrell

On Sun, Jul 5, 2015 at 3:13 PM, Julian Anastasov  wrote:
> May be the patch fixes crashes? If yes, Simon
> should apply it for ipvs/net tree, otherwise after
> the merge window...

Yeah this is definitely a crash-fix and it's existed since at least 3.10.

-- 
Alex Gartrell 

Re: [PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding

2015-07-05 Thread Julian Anastasov


Hello,

On Sun, 5 Jul 2015, Alex Gartrell wrote:

> It is possible that we bind against a local socket in early_demux when we
> are actually going to want to forward it.  In this case, the socket serves
> no purpose and only serves to confuse things (particularly functions which
> implicitly expect sk_fullsock to be true, like ip_local_out).
> Additionally, skb_set_owner_w is totally broken for non full-socks.
> 
> Signed-off-by: Alex Gartrell 

Thanks for fixing this problem!

Acked-by: Julian Anastasov 

May be the patch fixes crashes? If yes, Simon
should apply it for ipvs/net tree, otherwise after
the merge window...

> ---
>  net/netfilter/ipvs/ip_vs_xmit.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index bf66a86..99d4a41 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -527,6 +527,21 @@ static inline int ip_vs_tunnel_xmit_prepare(struct sk_buff *skb,
>   return ret;
>  }
>  
> +/* In the event of a remote destination, it's possible that we would have
> + * matches against an old socket (particularly a TIME-WAIT socket). This
> + * causes havoc down the line (ip_local_out et. al. expect regular sockets
> + * and invalid memory accesses will happen) so simply drop the association
> + * in this case.
> +*/
> +static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb)
> +{
> + /* If dev is set, the packet came from the LOCAL_IN callback and
> +  * not from a local TCP socket.
> +  */
> + if (skb->dev)
> + skb_orphan(skb);
> +}
> +
>  /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */
>  static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
>struct ip_vs_conn *cp, int local)
> @@ -538,12 +553,21 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
>   ip_vs_notrack(skb);
>   else
>   ip_vs_update_conntrack(skb, cp, 1);
> +
> + /* Remove the early_demux association unless it's bound for the
> +  * exact same port and address on this host after translation.
> +  */
> + if (!local || cp->vport != cp->dport ||
> + !ip_vs_addr_equal(cp->af, &cp->vaddr, &cp->daddr))
> + ip_vs_drop_early_demux_sk(skb);
> +
>   if (!local) {
>   skb_forward_csum(skb);
>   NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
>   NULL, skb_dst(skb)->dev, dst_output_sk);
>   } else
>   ret = NF_ACCEPT;
> +
>   return ret;
>  }
>  
> @@ -557,6 +581,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff *skb,
>   if (likely(!(cp->flags & IP_VS_CONN_F_NFCT)))
>   ip_vs_notrack(skb);
>   if (!local) {
> + ip_vs_drop_early_demux_sk(skb);
>   skb_forward_csum(skb);
>   NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
>   NULL, skb_dst(skb)->dev, dst_output_sk);
> @@ -845,6 +870,8 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
>   struct ipv6hdr *old_ipv6h = NULL;
>  #endif
>  
> + ip_vs_drop_early_demux_sk(skb);
> +
>   if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
>   new_skb = skb_realloc_headroom(skb, max_headroom);
>   if (!new_skb)
> -- 
> Alex Gartrell 

Regards

--
Julian Anastasov 

Re: [PATCH net] mlx4: TCP/UDP packets have L4 hash

2015-07-05 Thread Ido Shamay


On 7/6/2015 12:33 AM, Eric Dumazet wrote:
> On Mon, 2015-07-06 at 00:16 +0300, Ido Shamay wrote:
>
>> We can have a relaxation of the condition  by looking only at TCP/UDP
>> CQE indication (without check-sum indications)
>> This can cover us also when device rx-checksuming feature is off.
>> Do we want it or a correlation between check-sum and l4_hash is needed?
>
> I thought about that, but this was adding a more complex test in fast
> path.
>
> Not sure we should care here, as nobody would disable hardware checksum
> if they care about performance.

I agree, thank you Eric
Acked-by: Ido Shamay 


Re: [PATCH net] mlx4: TCP/UDP packets have L4 hash

2015-07-05 Thread Eric Dumazet

On Mon, 2015-07-06 at 00:16 +0300, Ido Shamay wrote:

> We can have a relaxation of the condition  by looking only at TCP/UDP 
> CQE indication (without check-sum indications)
> This can cover us also when device rx-checksuming feature is off.
> Do we want it or a correlation between check-sum and l4_hash is needed?

I thought about that, but this was adding a more complex test in fast
path.

Not sure we should care here, as nobody would disable hardware checksum
if they care about performance.
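
To make the trade-off in this exchange concrete, here is a small user-space sketch of the two selection rules. It is only a model: the enums mirror the kernel names used in the patch but are local stand-ins, and struct model_cqe with its two boolean fields is a hypothetical simplification of the CQE status bits, which are not shown in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for PKT_HASH_TYPE_L3 / PKT_HASH_TYPE_L4. */
enum model_hash_type { MODEL_HASH_L3, MODEL_HASH_L4 };

/* Local stand-ins for CHECKSUM_NONE / CHECKSUM_UNNECESSARY. */
enum model_csum { MODEL_CSUM_NONE, MODEL_CSUM_UNNECESSARY };

/* Hypothetical summary of what the CQE says about the frame. */
struct model_cqe {
	bool is_tcp_or_udp;
	bool is_fragment;
};

/* Rule from the patch: a single comparison on a value the RX path has
 * already computed.
 */
static enum model_hash_type hash_type_by_csum(enum model_csum ip_summed)
{
	return ip_summed == MODEL_CSUM_UNNECESSARY ? MODEL_HASH_L4
						   : MODEL_HASH_L3;
}

/* Relaxed rule raised in the thread: ignore the checksum state and look
 * only at the protocol indication.
 */
static enum model_hash_type hash_type_by_cqe(const struct model_cqe *cqe)
{
	return (cqe->is_tcp_or_udp && !cqe->is_fragment) ? MODEL_HASH_L4
							 : MODEL_HASH_L3;
}
```

The two rules differ only for a non-fragmented TCP or UDP frame received while RX checksum offload is disabled: the first reports an L3 hash, the second an L4 hash.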



[PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding

2015-07-05 Thread Alex Gartrell

It is possible that we bind against a local socket in early_demux when we
are actually going to want to forward it.  In this case, the socket serves
no purpose and only serves to confuse things (particularly functions which
implicitly expect sk_fullsock to be true, like ip_local_out).
Additionally, skb_set_owner_w is totally broken for non full-socks.

Signed-off-by: Alex Gartrell 
---
 net/netfilter/ipvs/ip_vs_xmit.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index bf66a86..99d4a41 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -527,6 +527,21 @@ static inline int ip_vs_tunnel_xmit_prepare(struct sk_buff *skb,
return ret;
 }
 
+/* In the event of a remote destination, it's possible that we would have
+ * matches against an old socket (particularly a TIME-WAIT socket). This
+ * causes havoc down the line (ip_local_out et. al. expect regular sockets
+ * and invalid memory accesses will happen) so simply drop the association
+ * in this case.
+*/
+static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb)
+{
+   /* If dev is set, the packet came from the LOCAL_IN callback and
+* not from a local TCP socket.
+*/
+   if (skb->dev)
+   skb_orphan(skb);
+}
+
 /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */
 static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
 struct ip_vs_conn *cp, int local)
@@ -538,12 +553,21 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
ip_vs_notrack(skb);
else
ip_vs_update_conntrack(skb, cp, 1);
+
+   /* Remove the early_demux association unless it's bound for the
+* exact same port and address on this host after translation.
+*/
+   if (!local || cp->vport != cp->dport ||
+   !ip_vs_addr_equal(cp->af, &cp->vaddr, &cp->daddr))
+   ip_vs_drop_early_demux_sk(skb);
+
if (!local) {
skb_forward_csum(skb);
NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
NULL, skb_dst(skb)->dev, dst_output_sk);
} else
ret = NF_ACCEPT;
+
return ret;
 }
 
@@ -557,6 +581,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff *skb,
if (likely(!(cp->flags & IP_VS_CONN_F_NFCT)))
ip_vs_notrack(skb);
if (!local) {
+   ip_vs_drop_early_demux_sk(skb);
skb_forward_csum(skb);
NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
NULL, skb_dst(skb)->dev, dst_output_sk);
@@ -845,6 +870,8 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
struct ipv6hdr *old_ipv6h = NULL;
 #endif
 
+   ip_vs_drop_early_demux_sk(skb);
+
if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
new_skb = skb_realloc_headroom(skb, max_headroom);
if (!new_skb)
-- 
Alex Gartrell 


Re: [PATCH net] mlx4: TCP/UDP packets have L4 hash

2015-07-05 Thread Ido Shamay


On 7/2/2015 2:24 PM, Eric Dumazet wrote:
> From: Eric Dumazet 
>
> Mellanox driver has the knowledge if rxhash is a L4 hash,
> if it receives a non fragmented TCP or UDP frame and
> NETIF_F_RXCSUM is enabled on netdev.
>
> ip_summed value is CHECKSUM_UNNECESSARY in this case.
>
> Signed-off-by: Eric Dumazet 
> Cc: Amir Vadai 
> Cc: Ido Shamay 
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c |8 ++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 7a4f20bb7fcb..12c65e1ad6a9 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -917,7 +917,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>  	if (dev->features & NETIF_F_RXHASH)
>  		skb_set_hash(gro_skb,
>  			     be32_to_cpu(cqe->immed_rss_invalid),
> -			     PKT_HASH_TYPE_L3);
> +			     (ip_summed == CHECKSUM_UNNECESSARY) ?
> +				PKT_HASH_TYPE_L4 :
> +				PKT_HASH_TYPE_L3);

Thanks Eric,

We can have a relaxation of the condition  by looking only at TCP/UDP 
CQE indication (without check-sum indications)

This can cover us also when device rx-checksuming feature is off.
Do we want it or a correlation between check-sum and l4_hash is needed?

Ido

>  	skb_record_rx_queue(gro_skb, cq->ring);
>  	skb_mark_napi_id(gro_skb, &cq->napi);
> @@ -963,7 +965,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>  	if (dev->features & NETIF_F_RXHASH)
>  		skb_set_hash(skb,
>  			     be32_to_cpu(cqe->immed_rss_invalid),
> -			     PKT_HASH_TYPE_L3);
> +			     (ip_summed == CHECKSUM_UNNECESSARY) ?
> +				PKT_HASH_TYPE_L4 :
> +				PKT_HASH_TYPE_L3);
>  
>  	if ((be32_to_cpu(cqe->vlan_my_qpn) &
>  	    MLX4_CQE_VLAN_PRESENT_MASK) &&


Re: [PATCH net-next] ipvs: skb_orphan in case of forwarding

2015-07-05 Thread Julian Anastasov

Hello,

On Sun, 5 Jul 2015, Alex Gartrell wrote:

> + /* Remove the early_demux association unless it's bound for the
> +  * exact same port and address on this host after translation.
> +  */
> + if (!local || cp->vport != cp->dport ||
> + !ip_vs_addr_equal(cp->af, &cp->vaddr, &cp->caddr))

Sigh, it was my mistake, it should be
cp->daddr instead of cp->caddr. It seems, I copied it
from somewhere to give example... Sorry, can you resend
with cp->daddr as v2.

Regards

--
Julian Anastasov 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next] ipvs: skb_orphan in case of forwarding

2015-07-05 Thread Alex Gartrell

It is possible that we bind against a local socket in early_demux when we
are actually going to want to forward it.  In this case, the socket serves
no purpose and only serves to confuse things (particularly functions which
implicitly expect sk_fullsock to be true, like ip_local_out).
Additionally, skb_set_owner_w is totally broken for non full-socks.

Signed-off-by: Alex Gartrell 
---
 net/netfilter/ipvs/ip_vs_xmit.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index bf66a86..65526f4 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -527,6 +527,21 @@ static inline int ip_vs_tunnel_xmit_prepare(struct sk_buff *skb,
return ret;
 }
 
+/* In the event of a remote destination, it's possible that we would have
+ * matches against an old socket (particularly a TIME-WAIT socket). This
+ * causes havoc down the line (ip_local_out et. al. expect regular sockets
+ * and invalid memory accesses will happen) so simply drop the association
+ * in this case.
+*/
+static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb)
+{
+   /* If dev is set, the packet came from the LOCAL_IN callback and
+* not from a local TCP socket.
+*/
+   if (skb->dev)
+   skb_orphan(skb);
+}
+
 /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */
 static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
 struct ip_vs_conn *cp, int local)
@@ -538,12 +553,21 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
ip_vs_notrack(skb);
else
ip_vs_update_conntrack(skb, cp, 1);
+
+   /* Remove the early_demux association unless it's bound for the
+* exact same port and address on this host after translation.
+*/
+   if (!local || cp->vport != cp->dport ||
+   !ip_vs_addr_equal(cp->af, &cp->vaddr, &cp->caddr))
+   ip_vs_drop_early_demux_sk(skb);
+
if (!local) {
skb_forward_csum(skb);
NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
NULL, skb_dst(skb)->dev, dst_output_sk);
} else
ret = NF_ACCEPT;
+
return ret;
 }
 
@@ -557,6 +581,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff *skb,
if (likely(!(cp->flags & IP_VS_CONN_F_NFCT)))
ip_vs_notrack(skb);
if (!local) {
+   ip_vs_drop_early_demux_sk(skb);
skb_forward_csum(skb);
NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
NULL, skb_dst(skb)->dev, dst_output_sk);
@@ -845,6 +870,8 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
struct ipv6hdr *old_ipv6h = NULL;
 #endif
 
+   ip_vs_drop_early_demux_sk(skb);
+
if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
new_skb = skb_realloc_headroom(skb, max_headroom);
if (!new_skb)
-- 
Alex Gartrell 


Re: [PATCH] xen-netback: remove duplicated function definition

2015-07-05 Thread Wei Liu

On Sat, Jul 04, 2015 at 03:33:00AM +0800, Liang Li wrote:
> There are two duplicated xenvif_zerocopy_callback() definitions.
> Remove one of them.
> 
> Signed-off-by: Liang Li 

Acked-by: Wei Liu 

Please fix the time of your computer and resend.

Wei.

> ---
>  drivers/net/xen-netback/common.h | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 8a495b3..c6cb85a 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -325,9 +325,6 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
>   queue->pending_prod + queue->pending_cons;
>  }
>  
> -/* Callback from stack when TX packet can be released */
> -void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
> -
>  irqreturn_t xenvif_interrupt(int irq, void *dev_id);
>  
>  extern bool separate_tx_rx_irq;
> -- 
> 1.9.1

[PATCH] net: phy: add dependency on HAS_IOMEM to MDIO_BUS_MUX_MMIOREG

2015-07-05 Thread Rob Herring

On UML builds, mdio-mux-mmioreg.c fails to compile:

drivers/net/phy/mdio-mux-mmioreg.c:50:3: error: implicit declaration of function ‘ioremap’ [-Werror=implicit-function-declaration]
drivers/net/phy/mdio-mux-mmioreg.c:63:3: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]

This is due to CONFIG_OF now being user selectable. Add a dependency on
HAS_IOMEM to fix this.

Signed-off-by: Rob Herring 
Cc: Florian Fainelli 
Cc: David S. Miller 
---
 drivers/net/phy/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index cf18940..cb86d7a 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -191,7 +191,7 @@ config MDIO_BUS_MUX_GPIO
 
 config MDIO_BUS_MUX_MMIOREG
tristate "Support for MMIO device-controlled MDIO bus multiplexers"
-   depends on OF_MDIO
+   depends on OF_MDIO && HAS_IOMEM
select MDIO_BUS_MUX
help
  This module provides a driver for MDIO bus multiplexers that
-- 
2.1.0


Re: Linux 4.2 build error in net/netfilter/ipset/ip_set_hash_netnet.c

2015-07-05 Thread Guenter Roeck

On Sat, Jul 04, 2015 at 12:44:36AM -0700, Vinson Lee wrote:
> Hi.
> 
> With the latest Linux 4.2-rc1, I am hitting this build error with GCC
> 4.4.7 on CentOS 6.
> 
>   CC  net/netfilter/ipset/ip_set_hash_netnet.o
> net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet4_uadt’:
> net/netfilter/ipset/ip_set_hash_netnet.c:163: error: unknown field
> ‘cidr’ specified in initializer
> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: missing braces
> around initializer
> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: (near
> initialization for ‘e..ip’)
> net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet6_uadt’:
> net/netfilter/ipset/ip_set_hash_netnet.c:388: error: unknown field
> ‘cidr’ specified in initializer
> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: missing braces
> around initializer
> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: (near
> initialization for ‘e.ip[0]’)
> 
Previously fixed with commit 1a869205c75cb ("netfilter: ipset: The unnamed union
initialization may lead to compilation error"), reintroduced with commit
aff227581ed1a ("netfilter: ipset: Check CIDR value only when attribute is 
given").

Guenter

Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-05 Thread Julian Anastasov


Hello,

On Fri, 3 Jul 2015, Alex Gartrell wrote:

> > - if packets go to local server IPVS should not touch
> > skb->dst, skb->sk, etc (NF_ACCEPT case)
> 
> Yeah, the thing is that early demux could totally match for a socket
> that existed before we created the service, and in that instance it
> might make the most sense to retain the connection and simply
> NF_ACCEPT.  The problem with that approach though is that is that the
> behavior changes if early_demux is not enabled.  I believe that we
> should just do the consistent thing and always drop the early_demux
> result if bound for non-local, as you've said.

We must not forget that a local server listening
on 0.0.0.0:VPORT or VIP:VPORT can be reached if a real
server with some local IP is used as RIP. So, early demux
will really work for this case when local stack is one
of the real servers.

> The interesting thing though is that, for the purposes of routing,
> enabling early_demux does change the behavior.  I suspect that's a
> bug, but it's far enough away from actual use cases that it's probably
> fine (who is out there tearing down addresses and setting up routes in
> their place?)

Looks like routing by definition can not divert skbs with
early-demux socket because input routing is not called.
Netfilter's DNAT may change daddr/dport before early-demux
and in this case socket should not be found (eg. if we
DNAT to other host). So, there is problem mostly for IPVS,
I don't remember for other cases. May be CLUSTERIP too,
I'm not sure. There is the problem that at LOCAL_IN
SNAT is valid operation, not sure how it affects
early-demux.

> What do you think of the following:
> 
> commit f04c42f8041cc4ccc4cb2a30c1058136dd497a83
> Author: Alex Gartrell 
> Date:   Wed Jul 1 13:24:46 2015 -0700
> 
> ipvs: orphan_skb in case of forwarding

skb_orphan or orphan skb

> It is possible that we bind against a local socket in early_demux when we
> are actually going to want to forward it.  In this case, the socket serves
> no purpose and only serves to confuse things (particularly functions which
> implicitly expect sk_fullsock to be true, like ip_local_out).
> Additionally, skb_set_owner_w is totally broken for non full-socks.
> 
> Signed-off-by: Alex Gartrell 
> 
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index bf66a86..3efe719 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -527,6 +527,19 @@ static inline int
> ip_vs_tunnel_xmit_prepare(struct sk_buff *skb,
> return ret;
>  }
> 
> +/* In the event of a remote destination, it's possible that we would have
> + * matches against an old socket (particularly a TIME-WAIT socket). This
> + * causes havoc down the line (ip_local_out et. al. expect regular sockets
> + * and invalid memory accesses will happen) so simply drop the association
> + * in this case
> +*/
> +static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb) {

Move '{' on next line and below comment should be closed
on next line. But I guess you will run later
scripts/checkpatch.pl --strict /tmp/file.patch

> +   /* If dev is set, the packet came from the LOCAL_IN callback and
> +* not from a local TCP socket */
> +   if (skb->dev)
> +   skb_orphan(skb);
> +}
> +
>  /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */
>  static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
>  struct ip_vs_conn *cp, int local)
> @@ -539,6 +552,7 @@ static inline int ip_vs_nat_send_or_cont(int pf,
> struct sk_buff *skb,
> else
> ip_vs_update_conntrack(skb, cp, 1);
> if (!local) {
> +   ip_vs_drop_early_demux_sk(skb);
> skb_forward_csum(skb);
> NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
> NULL, skb_dst(skb)->dev, dst_output_sk);

For the local=true case in ip_vs_nat_send_or_cont may be
we should call skb_orphan when cp->dport != cp->vport or
cp->daddr != cp->vaddr. This is a case where we DNAT to
local real server but on different addr/port. If early
demux finds socket, it is some socket shadowed after adding
the virtual service. So, may be we have to add such checks
near the NF_ACCEPT code.

Can this work?

else {
/* Drop early-demux socket on DNAT */
if (cp->vport != cp->dport ||
!ip_vs_addr_equal(cp->af, cp->vaddr, &cp->caddr))
ip_vs_drop_early_demux_sk(skb);
ret = NF_ACCEPT;
}

Otherwise, the other changes look good to me.
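
As a way to reason about the cases discussed above, here is a user-space sketch of the decision "should the early-demux socket be dropped?". It is only a model: struct model_conn, union model_addr and the model_* helpers are hypothetical stand-ins for struct ip_vs_conn and ip_vs_addr_equal(), and the address family constants are local. The address comparison is vaddr against daddr, as described in the paragraph above (DNAT to a local real server on a different addr/port).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_AF_INET	2
#define MODEL_AF_INET6	10

/* Hypothetical stand-in for the kernel's address union. */
union model_addr {
	uint32_t ip;
	uint8_t ip6[16];
};

/* Hypothetical subset of the connection fields used by the check. */
struct model_conn {
	int af;
	uint16_t vport, dport;
	union model_addr vaddr, daddr;
};

static bool model_addr_equal(int af, const union model_addr *a,
			     const union model_addr *b)
{
	if (af == MODEL_AF_INET6)
		return memcmp(a->ip6, b->ip6, sizeof(a->ip6)) == 0;
	return a->ip == b->ip;
}

/* Keep the socket only when the packet stays local and translation
 * changes neither the port nor the address.
 */
static bool model_should_drop_sk(const struct model_conn *cp, bool local)
{
	return !local || cp->vport != cp->dport ||
	       !model_addr_equal(cp->af, &cp->vaddr, &cp->daddr);
}
```

With this model, a forwarded packet always loses the association, and a locally delivered one keeps it only when virtual and real address/port are identical.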

Regards

--
Julian Anastasov 

Re: [PATCH 6/6] net: mvneta: Statically assign queues to CPUs

2015-07-05 Thread Willy Tarreau

Hi Thomas,

On Fri, Jul 03, 2015 at 04:46:24PM +0200, Thomas Petazzoni wrote:
> Maxime,
> 
> On Fri,  3 Jul 2015 16:25:51 +0200, Maxime Ripard wrote:
> 
> > +static void mvneta_percpu_enable(void *arg)
> > +{
> > +   struct mvneta_port *pp = arg;
> > +
> > +   enable_percpu_irq(pp->dev->irq, IRQ_TYPE_NONE);
> > +}
> > +
> >  static int mvneta_open(struct net_device *dev)
> >  {
> > struct mvneta_port *pp = netdev_priv(dev);
> > @@ -2655,6 +2662,19 @@ static int mvneta_open(struct net_device *dev)
> > goto err_cleanup_txqs;
> > }
> >  
> > +   /*
> > +* Even though the documentation says that request_percpu_irq
> > +* doesn't enable the interrupts automatically, it actually
> > +* does so on the local CPU.
> > +*
> > +* Make sure it's disabled.
> > +*/
> > +   disable_percpu_irq(pp->dev->irq);
> > +
> > +   /* Enable per-CPU interrupt on the one CPU we care about */
> > +   smp_call_function_single(rxq_def % num_online_cpus(),
> > +mvneta_percpu_enable, pp, true);
> 
> What happens if that CPU goes offline through CPU hotplug?

I just tried : if I start mvneta with "rxq_def=1", then my irq runs on
CPU1. Then I offline CPU1 and the irqs are automatically handled by CPU0.
Then I online CPU1 and irqs stay on CPU0.

More or less related, I found that if I enable a queue number larger than
the CPU count it does work, but then the system complains during rmmod :

[  877.146203] ------------[ cut here ]------------
[  877.146227] WARNING: CPU: 1 PID: 1731 at fs/proc/generic.c:552 remove_proc_entry+0x144/0x15c()
[  877.146233] remove_proc_entry: removing non-empty directory 'irq/29', leaking at least 'mvneta'
[  877.146238] Modules linked in: mvneta(-) [last unloaded: mvneta]
[  877.146254] CPU: 1 PID: 1731 Comm: rmmod Tainted: GW   4.1.1-mvebu-6-g3d317ed-dirty #5
[  877.146260] Hardware name: Marvell Armada 370/XP (Device Tree)
[  877.146281] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[  877.146293] [] (show_stack) from [] (dump_stack+0x74/0x90)
[  877.146305] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0)
[  877.146315] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x30/0x40)
[  877.146325] [] (warn_slowpath_fmt) from [] (remove_proc_entry+0x144/0x15c)
[  877.146336] [] (remove_proc_entry) from [] (unregister_irq_proc+0x8c/0xb0)
[  877.146347] [] (unregister_irq_proc) from [] (free_desc+0x28/0x58)
[  877.146356] [] (free_desc) from [] (irq_free_descs+0x44/0x80)
[  877.146368] [] (irq_free_descs) from [] (mvneta_remove+0x3c/0x4c [mvneta])
[  877.146382] [] (mvneta_remove [mvneta]) from [] (platform_drv_remove+0x18/0x30)
[  877.146393] [] (platform_drv_remove) from [] (__device_release_driver+0x70/0xe4)
[  877.146402] [] (__device_release_driver) from [] (driver_detach+0xcc/0xd0)
[  877.146411] [] (driver_detach) from [] (bus_remove_driver+0x4c/0x90)
[  877.146425] [] (bus_remove_driver) from [] (SyS_delete_module+0x164/0x1b4)
[  877.146437] [] (SyS_delete_module) from [] (ret_fast_syscall+0x0/0x3c)
[  877.146443] ---[ end trace 48713a9ae31204b1 ]---

This was on the AX3 (dual-proc) with rxq_def=2.
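
For reference, the quoted patch picks the target CPU as rxq_def % num_online_cpus(), so a queue number at or above the CPU count wraps around. A minimal user-space sketch of that mapping follows; `demo_irq_target_cpu()` is a hypothetical helper, not a driver function, and the zero-CPU guard is added only to keep the sketch safe.

```c
/* Hypothetical helper mirroring "rxq_def % num_online_cpus()". */
unsigned int demo_irq_target_cpu(unsigned int rxq_def,
				 unsigned int online_cpus)
{
	/* Guard against a zero divisor; the kernel always has at
	 * least one online CPU, so this is for the sketch only.
	 */
	if (online_cpus == 0)
		return 0;
	return rxq_def % online_cpus;
}
```

On a dual-CPU board, rxq_def=1 maps to CPU1 while rxq_def=2 wraps to CPU0. This only illustrates the CPU selection; it says nothing about the cause of the rmmod warning.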

Hoping this helps,
Willy


Re: [PATCH 4/6] net: mvneta: Handle per-cpu interrupts

2015-07-05 Thread Willy Tarreau

Hi Maxime,

On Fri, Jul 03, 2015 at 04:25:49PM +0200, Maxime Ripard wrote:
> Now that our interrupt controller is allowing us to use per-CPU interrupts,
> actually use it in the mvneta driver.
> 
> This involves obviously reworking the driver to have a CPU-local NAPI
> structure, and report for incoming packet using that structure.
> 
> Signed-off-by: Maxime Ripard 

This patch breaks module build of mvneta unless you export request_percpu_irq :

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index ec31697..1440a92 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1799,6 +1799,7 @@ int request_percpu_irq(unsigned int irq, irq_handler_t handler,
 
return retval;
 }
+EXPORT_SYMBOL_GPL(request_percpu_irq);
 
Regards,
Willy


Linux 4.2 build error in net/netfilter/ipset/ip_set_hash_netnet.c

2015-07-05 Thread Vinson Lee

Hi.

With the latest Linux 4.2-rc1, I am hitting this build error with GCC
4.4.7 on CentOS 6.

  CC  net/netfilter/ipset/ip_set_hash_netnet.o
net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet4_uadt’:
net/netfilter/ipset/ip_set_hash_netnet.c:163: error: unknown field
‘cidr’ specified in initializer
net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: missing braces
around initializer
net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: (near
initialization for ‘e..ip’)
net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet6_uadt’:
net/netfilter/ipset/ip_set_hash_netnet.c:388: error: unknown field
‘cidr’ specified in initializer
net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: missing braces
around initializer
net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: (near
initialization for ‘e.ip[0]’)

Cheers,
Vinson

Re: [PATCH net-next 3/6] net_sched: act: make tcfg_pval non zero

2015-07-05 Thread Eric Dumazet

Thanks guys for the review.

For completeness, I'll add smp_wmb() here :

gact->tcfg_pval= max_t(u16, 1, p_parm->pval);
smp_wmb();
gact->tcfg_ptype   = p_parm->ptype;

And corresponding smp_rmb()
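
The write/read barrier pairing can be approximated in user space with C11 fences. This is only a sketch: `demo_gact`, `demo_gact_init()` and `demo_gact_determ()` are hypothetical names, and the release/acquire fences are assumed stand-ins for smp_wmb()/smp_rmb() (they are somewhat stronger than the kernel primitives).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two gact fields involved. */
struct demo_gact {
	_Atomic uint16_t pval;
	_Atomic uint16_t ptype;
};

/* Writer: publish a non-zero pval before ptype becomes visible. */
void demo_gact_init(struct demo_gact *g, uint16_t ptype, uint16_t pval)
{
	/* Same effect as max_t(u16, 1, pval): never store zero. */
	atomic_store_explicit(&g->pval, (uint16_t)(pval ? pval : 1),
			      memory_order_relaxed);
	/* Plays the role of smp_wmb(). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&g->ptype, ptype, memory_order_relaxed);
}

/* Reader: returns 1 when packet number 'pack' is a multiple of pval,
 * 0 otherwise or when no probability type is configured.
 */
int demo_gact_determ(struct demo_gact *g, uint32_t pack)
{
	uint16_t ptype = atomic_load_explicit(&g->ptype,
					      memory_order_relaxed);
	uint16_t pval;

	if (!ptype)
		return 0;
	/* Plays the role of smp_rmb(): pval is read after ptype. */
	atomic_thread_fence(memory_order_acquire);
	pval = atomic_load_explicit(&g->pval, memory_order_relaxed);
	return pack % pval == 0;
}
```

The point of the ordering is that a reader observing a non-zero ptype also observes the non-zero pval stored before it, so the modulo never divides by zero.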

On Fri, Jul 3, 2015 at 12:49 PM, Jamal Hadi Salim  wrote:
> On 07/02/15 09:07, Eric Dumazet wrote:
>>
>> First step for gact RCU operation :
>>
>> Instead of testing if tcfg_pval is zero or not, just make it 1.
>>
>> No change in behavior, but slightly faster code.
>>
>> Signed-off-by: Eric Dumazet 
>
>
> Acked-by: Jamal Hadi Salim 
>
> cheers,
> jamal
>

Re: [PATCH V2] cdc_ncm: Add support for moving NDP to end of NCM frame

2015-07-05 Thread Enrico Mioso

When sending lots of small packets, this patch will generate an "Unable to
handle kernel paging request" in the memset call:

ndp16 = memset(ctx->delayed_ndp16, 0, ctx->max_ndp_size);

And I don't know why.
Any comment or suggestion would be greatly appreciated.
This has been reproduced in a QEMU X86 VM, from kernel 4.0.4 to current git.
Thanks,
Enrico Mioso

Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-05 Thread Eric Dumazet

> Looks like routing by definition can not divert skbs with
> early-demux socket because input routing is not called.

Only if found socket has a valid sk->sk_rx_dst

Early demux :

1) if TCP lookup found a matching socket, we do the attachment
   skb->sk = sk;
   skb->destructor = sock_edemux

2) If sk->sk_rx_dst is set and still valid, IP routing will use this cached dst.

So it looks very possible that some packets could match a socket but
fail the 2) phase.
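
The two phases above can be modeled in a few lines of user-space C. This is only an illustrative sketch: `demo_dst`, `demo_sock`, `demo_skb` and `demo_early_demux()` are hypothetical stand-ins, not kernel structures or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for dst_entry, sock and sk_buff. */
struct demo_dst { bool valid; };
struct demo_sock { struct demo_dst *rx_dst; };
struct demo_skb { struct demo_sock *sk; struct demo_dst *dst; };

/* Returns true when the cached dst was used, i.e. when input
 * routing could be skipped for this packet.
 */
bool demo_early_demux(struct demo_skb *skb, struct demo_sock *found)
{
	if (!found)
		return false;

	/* Phase 1: a matching socket is attached to the skb. */
	skb->sk = found;

	/* Phase 2: only a set and still valid cached dst is reused. */
	if (found->rx_dst && found->rx_dst->valid) {
		skb->dst = found->rx_dst;
		return true;
	}

	/* Socket attached, but normal input routing still runs. */
	return false;
}
```

A socket without a usable cached dst leaves the skb with a socket attached and no dst, which is the case described above where a packet matches a socket but fails phase 2.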

Re: Summary lightweight tunnel discussion at NFWS

2015-07-05 Thread roopa


On 7/3/15, 3:00 AM, Thomas Graf wrote:
> On 06/18/15 at 09:49pm, Roopa Prabhu wrote:
>> +#ifdef CONFIG_LWTUNNEL
>> +   if (fi->fib_nh->nh_lwtstate) {
>> +   struct lwtunnel_state *lwtstate;
>> +
>> +   lwtstate = fi->fib_nh->nh_lwtstate;
>> +   if (nla_put_u16(skb, RTA_ENCAP_TYPE, lwtstate->type))
>> +   goto nla_put_failure;
>> +   lwtunnel_fill_encap(skb, lwtstate);
>> +   }
>>  }
>> +#endif
>
> Misplaced #endif ;-)

Thx. I have fixed this since,...did not realize it came in as part of
this RFC series.

> Other than that I managed to rebase my changes onto yours and it
> looks clean.

Glad to know!. thanks Thomas. I had a few more changes (mostly
cleanup/bug fixes, ipv6 support and mostly earlier feedback from you)
in my local clone, pushed it to my github tree just now.
This also tries to not use CONFIG_LWTUNNEL all over the place. I had it
that way initially also because of fib struct members
under #ifdef CONFIG_LWTUNNEL. (If we think at a later point that it is
better to #ifdef CONFIG_LWTUNNEL fib struct members,
I can bring some of that back in). And, Only control path (rtnetlink)
for ipv6 mpls iptunnels has been tested.

> Since we also discussed this a bit at NFWS, I'm enclosing a quick
> summary:
>
>   * Overall consensus that a lightweight flow based encapsulation
>     makes sense.
>   * Realization that what we actually want is stackable skb metadata
>     between layers without over engineering it.
>   * Consensus to avoid adding it to skb_shared_info and try to reuse
>     the skb dst field.
>   * New dst_metadata type similar to xfrm_dst which can carry metadata
>     such as encapsulation instructions/information.
>   * Can be made stackable to implement nested encapsulation if needed.
>     Left out in the beginning to keep it simple.
>   * Possible optimization option by putting the dst_metadata into a
>     per cpu scratch buffer or stack without taking a reference and
>     only force the reference & allocation when the skb is about to
>     be queued. The regular fast path should never queue a skb with
>     dst metadata attached.

Thanks for the summary. this helps.
I have been thinking of moving lwtstate from rtable to struct dst_entry.
I will also look at the dst_metadata.

RE: [PATCH] xen-netback: remove duplicated function definition

2015-07-05 Thread Li, Liang Z

> Cc: linux-ker...@vger.kernel.org; ian.campb...@citrix.com;
> wei.l...@citrix.com; xen-de...@lists.xenproject.org;
> netdev@vger.kernel.org
> Subject: Re: [PATCH] xen-netback: remove duplicated function definition
> 
> From: Liang Li 
> Date: Sat,  4 Jul 2015 03:33:00 +0800
> 
> > There are two duplicated xenvif_zerocopy_callback() definitions.
> > Remove one of them.
> >
> > Signed-off-by: Liang Li 
> 
> You really need to fix the date on your computer.
> 
> If your date is in the future, as yours is, then your patch appears
> out-of-order in the patchwork patch queue since it is ordered by the
> Date: field in the email.

OK. Thanks for the reminder.

[PATCH] NET: hamradio: Fix IP over bpq encapsulation.

2015-07-05 Thread Ralf Baechle

Since 1d5da757da860a6916adbf68b09e868062b4b3b8 (ax25: Stop using magic
neighbour cache operations.) any attempt to transmit IP packets over
a bpqether device will result in a message like "Dead loop on virtual
device bpq0, fix it urgently!"

Fix suggested by Eric W. Biederman .

Signed-off-by: Ralf Baechle 
Cc:  # 4.1
---
 drivers/net/hamradio/bpqether.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 63ff08a..5b54b18 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -483,6 +483,7 @@ static void bpq_setup(struct net_device *dev)
memcpy(dev->dev_addr,  &ax25_defaddr, AX25_ADDR_LEN);
 
dev->flags  = 0;
+   dev->features   = NETIF_F_LLTX; /* Allow recursion */
 
 #if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
dev->header_ops  = &ax25_header_ops;

RE: [PATCH] bnx2x: Update to FW version 7.12.30

2015-07-05 Thread Yuval Mintz

> The new FW will allow us to utilize some new features in our driver,
> mainly adding vlan filtering offload and vxlan offload support.
> 
> In addition, this fixes several issues:
> 1. Packets from a VF with pvid configured which were sent with a
>different vlan were transmitted instead of being discarded.
> 
> 2. FCoE traffic might not recover after a failure while there's traffic
>to another function.
> 
> Signed-off-by: Yuval Mintz 

Hi, any news about this one?
Thanks, Yuval
