Re: [PATCH v2 net-next 0/7] net_sched: act: lockless operation
On Mon, Jul 6, 2015 at 8:41 AM, Eric Dumazet wrote:
> As mentioned by Alexei last week in Budapest, it is a bit weird
> to take a spinlock in order to drop a packet in a tc filter...
>

Arg, please ignore v2, I messed my git send-email command

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 net-next 4/7] net_sched: act_gact: use a separate packet counters for gact_determ()
Second step for gact RCU operation:

We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.

gact_determ() would not work without a central packet counter,
so let's add it for this mode.

Signed-off-by: Eric Dumazet
Cc: Alexei Starovoitov
Acked-by: Jamal Hadi Salim
Acked-by: John Fastabend
---
 include/net/tc_act/tc_gact.h | 7 ++++---
 net/sched/act_gact.c         | 4 +++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 9fc9b578908a..592a6bc02b0b 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -6,9 +6,10 @@
 struct tcf_gact {
 	struct tcf_common	common;
 #ifdef CONFIG_GACT_PROB
-	u16			tcfg_ptype;
-	u16			tcfg_pval;
-	int			tcfg_paction;
+	u16			tcfg_ptype;
+	u16			tcfg_pval;
+	int			tcfg_paction;
+	atomic_t		packets;
 #endif
 };
 #define to_gact(a) \
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 22a3a61aa090..2f9bec584b3f 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -36,8 +36,10 @@ static int gact_net_rand(struct tcf_gact *gact)

 static int gact_determ(struct tcf_gact *gact)
 {
+	u32 pack = atomic_inc_return(&gact->packets);
+
 	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
-	if (gact->tcf_bstats.packets % gact->tcfg_pval)
+	if (pack % gact->tcfg_pval)
 		return gact->tcf_action;
 	return gact->tcfg_paction;
 }
-- 
2.4.3.573.g4eafbef
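For readers who want to try the idea outside the kernel, here is a minimal userspace sketch of the counter change, using C11 atomics in place of atomic_t. The names gact_sketch and gact_determ_sketch are made up for illustration and do not appear in the patch; only the "increment, then test the new value against the period" shape is taken from it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the gact fields touched by the patch. */
struct gact_sketch {
	atomic_uint packets;	/* dedicated counter, like the new atomic_t */
	uint16_t pval;		/* period, assumed to be >= 1 */
	int action;		/* default action */
	int paction;		/* action taken on every pval-th packet */
};

/* Same shape as gact_determ(): bump the counter, test the new value. */
static int gact_determ_sketch(struct gact_sketch *g)
{
	/* atomic_fetch_add() returns the old value; +1 mimics inc_return. */
	uint32_t pack = atomic_fetch_add(&g->packets, 1) + 1;

	assert(g->pval != 0);
	if (pack % g->pval)
		return g->action;
	return g->paction;
}
```

With pval set to 3, the third, sixth, ninth... call returns paction regardless of which thread makes it, which is the property that per-cpu packet stats alone could not provide.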
[PATCH v2 net-next 3/7] net_sched: act: make tcfg_pval non zero
First step for gact RCU operation:

Instead of testing if tcfg_pval is zero or not, just make it 1.

No change in behavior, but slightly faster code.

The smp_rmb()/smp_wmb() barriers, while not strictly needed at this
stage, are added for upcoming spinlock removal.

Signed-off-by: Eric Dumazet
Acked-by: Alexei Starovoitov
Acked-by: Jamal Hadi Salim
Acked-by: John Fastabend
---
 net/sched/act_gact.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index a4f8af29ee30..22a3a61aa090 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -28,14 +28,16 @@
 #ifdef CONFIG_GACT_PROB
 static int gact_net_rand(struct tcf_gact *gact)
 {
-	if (!gact->tcfg_pval || prandom_u32() % gact->tcfg_pval)
+	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+	if (prandom_u32() % gact->tcfg_pval)
 		return gact->tcf_action;
 	return gact->tcfg_paction;
 }

 static int gact_determ(struct tcf_gact *gact)
 {
-	if (!gact->tcfg_pval || gact->tcf_bstats.packets % gact->tcfg_pval)
+	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+	if (gact->tcf_bstats.packets % gact->tcfg_pval)
 		return gact->tcf_action;
 	return gact->tcfg_paction;
 }
@@ -105,7 +107,11 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 #ifdef CONFIG_GACT_PROB
 	if (p_parm) {
 		gact->tcfg_paction = p_parm->paction;
-		gact->tcfg_pval    = p_parm->pval;
+		gact->tcfg_pval    = max_t(u16, 1, p_parm->pval);
+		/* Make sure tcfg_pval is written before tcfg_ptype
+		 * coupled with smp_rmb() in gact_net_rand() & gact_determ()
+		 */
+		smp_wmb();
 		gact->tcfg_ptype   = p_parm->ptype;
 	}
 #endif
-- 
2.4.3.573.g4eafbef
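The two ideas in this patch (clamp the divisor to at least 1, and publish it before the type field) can be sketched in userspace as follows. This is only an approximation: C11 release/acquire fences are used where the kernel uses smp_wmb()/smp_rmb(), and the names prob_sketch, prob_init and prob_divisor are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields ordered by the patch. */
struct prob_sketch {
	_Atomic uint16_t ptype;	/* 0 means probability mode is off */
	_Atomic uint16_t pval;	/* divisor, never published as 0 */
};

/* Writer side, in the spirit of tcf_gact_init(). */
static void prob_init(struct prob_sketch *p, uint16_t ptype, uint16_t pval)
{
	/* Same effect as max_t(u16, 1, pval): readers never divide by zero. */
	atomic_store_explicit(&p->pval, pval ? pval : 1, memory_order_relaxed);
	/* Stands in for smp_wmb(): pval is ordered before ptype. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->ptype, ptype, memory_order_relaxed);
}

/* Reader side: returns 0 if the mode is off, otherwise a non-zero divisor. */
static uint16_t prob_divisor(struct prob_sketch *p)
{
	uint16_t pval;

	if (!atomic_load_explicit(&p->ptype, memory_order_relaxed))
		return 0;
	/* Stands in for smp_rmb(): the pval load is ordered after ptype. */
	atomic_thread_fence(memory_order_acquire);
	pval = atomic_load_explicit(&p->pval, memory_order_relaxed);
	assert(pval != 0);
	return pval;
}
```

A reader that observes a non-zero ptype is therefore guaranteed to see a pval of at least 1, which is what lets the fast path drop the "!gact->tcfg_pval ||" test.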
[PATCH v2 net-next 2/7] net: sched: add percpu stats to actions
Reuse existing percpu infrastructure John Fastabend added for qdisc.

This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.

Signed-off-by: Eric Dumazet
Cc: Alexei Starovoitov
Acked-by: Jamal Hadi Salim
Acked-by: John Fastabend
---
 include/net/act_api.h    |  4 +++-
 net/sched/act_api.c      | 44 ++++++++++++++++++++++++++++++++++----------
 net/sched/act_bpf.c      |  2 +-
 net/sched/act_connmark.c |  3 ++-
 net/sched/act_csum.c     |  3 ++-
 net/sched/act_gact.c     |  3 ++-
 net/sched/act_ipt.c      |  2 +-
 net/sched/act_mirred.c   |  3 ++-
 net/sched/act_nat.c      |  3 ++-
 net/sched/act_pedit.c    |  3 ++-
 net/sched/act_simple.c   |  3 ++-
 net/sched/act_skbedit.c  |  3 ++-
 net/sched/act_vlan.c     |  3 ++-
 13 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 3ee4c92afd1b..db2063ffd181 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -21,6 +21,8 @@ struct tcf_common {
 	struct gnet_stats_rate_est64	tcfc_rate_est;
 	spinlock_t			tcfc_lock;
 	struct rcu_head			tcfc_rcu;
+	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	struct gnet_stats_queue __percpu *cpu_qstats;
 };
 #define tcf_head	common.tcfc_head
 #define tcf_index	common.tcfc_index
@@ -103,7 +105,7 @@ int tcf_hash_release(struct tc_action *a, int bind);
 u32 tcf_hash_new_index(struct tcf_hashinfo *hinfo);
 int tcf_hash_check(u32 index, struct tc_action *a, int bind);
 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
-		    int size, int bind);
+		    int size, int bind, bool cpustats);
 void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est);
 void tcf_hash_insert(struct tc_action *a);

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index af427a3dbcba..074a32f466f8 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -27,6 +27,15 @@
 #include
 #include

+static void free_tcf(struct rcu_head *head)
+{
+	struct tcf_common *p = container_of(head, struct tcf_common, tcfc_rcu);
+
+	free_percpu(p->cpu_bstats);
+	free_percpu(p->cpu_qstats);
+	kfree(p);
+}
+
 void tcf_hash_destroy(struct tc_action *a)
 {
 	struct tcf_common *p = a->priv;
@@ -41,7 +50,7 @@ void tcf_hash_destroy(struct tc_action *a)
 	 * gen_estimator est_timer() might access p->tcfc_lock
 	 * or bstats, wait a RCU grace period before freeing p
 	 */
-	kfree_rcu(p, tcfc_rcu);
+	call_rcu(&p->tcfc_rcu, free_tcf);
 }
 EXPORT_SYMBOL(tcf_hash_destroy);

@@ -230,15 +239,16 @@ void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
 	if (est)
 		gen_kill_estimator(&pc->tcfc_bstats,
 				   &pc->tcfc_rate_est);
-	kfree_rcu(pc, tcfc_rcu);
+	call_rcu(&pc->tcfc_rcu, free_tcf);
 }
 EXPORT_SYMBOL(tcf_hash_cleanup);

 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
-		    int size, int bind)
+		    int size, int bind, bool cpustats)
 {
 	struct tcf_hashinfo *hinfo = a->ops->hinfo;
 	struct tcf_common *p = kzalloc(size, GFP_KERNEL);
+	int err = -ENOMEM;

 	if (unlikely(!p))
 		return -ENOMEM;
@@ -246,18 +256,32 @@ int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
 	if (bind)
 		p->tcfc_bindcnt = 1;

+	if (cpustats) {
+		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+		if (!p->cpu_bstats) {
+err1:
+			kfree(p);
+			return err;
+		}
+		p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+		if (!p->cpu_qstats) {
+err2:
+			free_percpu(p->cpu_bstats);
+			goto err1;
+		}
+	}
 	spin_lock_init(&p->tcfc_lock);
 	INIT_HLIST_NODE(&p->tcfc_head);
 	p->tcfc_index = index ? index : tcf_hash_new_index(hinfo);
 	p->tcfc_tm.install = jiffies;
 	p->tcfc_tm.lastuse = jiffies;
 	if (est) {
-		int err = gen_new_estimator(&p->tcfc_bstats, NULL,
-					    &p->tcfc_rate_est,
-					    &p->tcfc_lock, est);
+		err = gen_new_estimator(&p->tcfc_bstats, p->cpu_bstats,
+					&p->tcfc_rate_est,
+					&p->tcfc_lock, est);
 		if (err) {
-			kfree(p);
-			return err;
+			free_percpu(p->cpu_qstats);
+			goto err2;
 		}
 	}

@@ -615,10 +639,10 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *a,
 	if (err < 0)
 		goto err
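The error handling in tcf_hash_create() above jumps into labels placed inside if-blocks. As a reading aid, here is a userspace sketch of the same "optional allocations with unwinding" idea written with the more conventional labels at the end of the function. calloc()/free() stand in for the percpu allocators, and common_sketch, common_create and common_destroy are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for tcf_common with its two optional stat areas. */
struct common_sketch {
	long *cpu_bstats;	/* NULL when cpustats is false */
	long *cpu_qstats;	/* NULL when cpustats is false */
};

static struct common_sketch *common_create(bool cpustats, size_t ncpu)
{
	struct common_sketch *p;

	assert(ncpu > 0);
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;

	if (cpustats) {
		p->cpu_bstats = calloc(ncpu, sizeof(*p->cpu_bstats));
		if (!p->cpu_bstats)
			goto err_free_p;
		p->cpu_qstats = calloc(ncpu, sizeof(*p->cpu_qstats));
		if (!p->cpu_qstats)
			goto err_free_bstats;
	}
	return p;

err_free_bstats:
	free(p->cpu_bstats);
err_free_p:
	free(p);
	return NULL;
}

/* Mirrors free_tcf(): freeing a NULL stat area is a harmless no-op. */
static void common_destroy(struct common_sketch *p)
{
	free(p->cpu_bstats);
	free(p->cpu_qstats);
	free(p);
}
```

Because the destroy path frees both areas unconditionally, callers that pass false for cpustats need no special casing, which matches how every action in this patch can keep passing false for now.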
[PATCH v2 net-next 1/7] net: sched: extend percpu stats helpers
qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.

We want to add percpu stats for tc action, so this patch adds common
helpers.

qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()

Signed-off-by: Eric Dumazet
Cc: Alexei Starovoitov
Acked-by: Jamal Hadi Salim
Acked-by: John Fastabend
---
 include/net/sch_generic.h | 31 +++++++++++++++++++++----------
 net/core/dev.c            |  4 ++--
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2738f6f87908..2eab08c38e32 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -513,17 +513,20 @@ static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
 	bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
 }

-static inline void qdisc_bstats_update_cpu(struct Qdisc *sch,
-					   const struct sk_buff *skb)
+static inline void bstats_cpu_update(struct gnet_stats_basic_cpu *bstats,
+				     const struct sk_buff *skb)
 {
-	struct gnet_stats_basic_cpu *bstats =
-				this_cpu_ptr(sch->cpu_bstats);
-
 	u64_stats_update_begin(&bstats->syncp);
 	bstats_update(&bstats->bstats, skb);
 	u64_stats_update_end(&bstats->syncp);
 }

+static inline void qdisc_bstats_cpu_update(struct Qdisc *sch,
+					   const struct sk_buff *skb)
+{
+	bstats_cpu_update(this_cpu_ptr(sch->cpu_bstats), skb);
+}
+
 static inline void qdisc_bstats_update(struct Qdisc *sch,
 				       const struct sk_buff *skb)
 {
@@ -547,16 +550,24 @@ static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
 	sch->qstats.drops += count;
 }

-static inline void qdisc_qstats_drop(struct Qdisc *sch)
+static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
 {
-	sch->qstats.drops++;
+	qstats->drops++;
 }

-static inline void qdisc_qstats_drop_cpu(struct Qdisc *sch)
+static inline void qstats_overlimit_inc(struct gnet_stats_queue *qstats)
 {
-	struct gnet_stats_queue *qstats = this_cpu_ptr(sch->cpu_qstats);
+	qstats->overlimits++;
+}

-	qstats->drops++;
+static inline void qdisc_qstats_drop(struct Qdisc *sch)
+{
+	qstats_drop_inc(&sch->qstats);
+}
+
+static inline void qdisc_qstats_cpu_drop(struct Qdisc *sch)
+{
+	qstats_drop_inc(this_cpu_ptr(sch->cpu_qstats));
 }

 static inline void qdisc_qstats_overlimit(struct Qdisc *sch)
diff --git a/net/core/dev.c b/net/core/dev.c
index 6778ad52..e0d270143fc7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3646,7 +3646,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,

 	qdisc_skb_cb(skb)->pkt_len = skb->len;
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
-	qdisc_bstats_update_cpu(cl->q, skb);
+	qdisc_bstats_cpu_update(cl->q, skb);

 	switch (tc_classify(skb, cl, &cl_res)) {
 	case TC_ACT_OK:
@@ -3654,7 +3654,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
-		qdisc_qstats_drop_cpu(cl->q);
+		qdisc_qstats_cpu_drop(cl->q);
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 		kfree_skb(skb);
-- 
2.4.3.573.g4eafbef
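The refactoring above separates "update this one stats slot" from "find the slot for the current CPU". A rough userspace sketch of why that split is useful: the same slot-level helper can serve any owner of a per-CPU array, and a reader folds the slots together. The sketch assumes one writer per slot and leaves out the u64_stats_sync protection the kernel uses; basic_stats, stats_slot_update and stats_sum are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, loosely modelled on gnet_stats_basic_packed. */
struct basic_stats {
	uint64_t bytes;
	uint64_t packets;
};

/* Update one slot; only the owning CPU is assumed to write to it. */
static void stats_slot_update(struct basic_stats *slot, uint32_t len,
			      uint32_t gso_segs)
{
	slot->bytes += len;
	/* A GSO packet counts as gso_segs packets, anything else as one. */
	slot->packets += gso_segs ? gso_segs : 1;
}

/* Reader side: fold all slots into one total. */
static struct basic_stats stats_sum(const struct basic_stats *slots, size_t n)
{
	struct basic_stats total = { 0, 0 };
	size_t i;

	assert(slots != NULL);
	for (i = 0; i < n; i++) {
		total.bytes += slots[i].bytes;
		total.packets += slots[i].packets;
	}
	return total;
}
```

The writers never touch a shared cache line, so the cost of exact totals moves to the (rare) reader.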
[PATCH v2 net-next 0/7] net_sched: act: lockless operation
As mentioned by Alexei last week in Budapest, it is a bit weird
to take a spinlock in order to drop a packet in a tc filter...

Let's add percpu infra for tc actions and use it for gact & mirred.

Before changes, my host with 8 RX queues was handling 5 Mpps with gact,
and more than 11 Mpps after.

Mirred change is not yet visible if ifb+qdisc is used, as ifb is
not yet multi queue enabled, but is a step forward.

Signed-off-by: Eric Dumazet
Cc: Alexei Starovoitov
Cc: Jamal Hadi Salim
Cc: John Fastabend

Eric Dumazet (7):
  net: sched: extend percpu stats helpers
  net: sched: add percpu stats to actions
  net_sched: act_gact: make tcfg_pval non zero
  net_sched: act_gact: use a separate packet counters for gact_determ()
  net_sched: act_gact: read tcfg_ptype once
  net_sched: act_gact: remove spinlock in fast path
  net_sched: act_mirred: remove spinlock in fast path

 include/net/act_api.h          | 15 ++-
 include/net/sch_generic.h      | 31 ++
 include/net/tc_act/tc_gact.h   |  7 ++---
 include/net/tc_act/tc_mirred.h |  2 +-
 net/core/dev.c                 |  4 +--
 net/sched/act_api.c            | 44
 net/sched/act_bpf.c            |  2 +-
 net/sched/act_connmark.c       |  3 ++-
 net/sched/act_csum.c           |  3 ++-
 net/sched/act_gact.c           | 44 ++--
 net/sched/act_ipt.c            |  2 +-
 net/sched/act_mirred.c         | 58 ++
 net/sched/act_nat.c            |  3 ++-
 net/sched/act_pedit.c          |  3 ++-
 net/sched/act_simple.c         |  3 ++-
 net/sched/act_skbedit.c        |  3 ++-
 net/sched/act_vlan.c           |  3 ++-
 17 files changed, 148 insertions(+), 82 deletions(-)

-- 
2.4.3.573.g4eafbef
[PATCH] net: macb: Add SG support for Zynq SOC family
Enable SG support for Zynq SOC family devices.

Signed-off-by: Punnaiah Choudary Kalluri
---
 drivers/net/ethernet/cadence/macb.c | 6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index caeb395..a4e3f86 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2741,8 +2741,7 @@ static const struct macb_config emac_config = {

 static const struct macb_config zynqmp_config = {
-	.caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE |
-		MACB_CAPS_JUMBO,
+	.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
 	.dma_burst_length = 16,
 	.clk_init = macb_clk_init,
 	.init = macb_init,
@@ -2750,8 +2749,7 @@ static const struct macb_config zynqmp_config = {
 };

 static const struct macb_config zynq_config = {
-	.caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE |
-		MACB_CAPS_NO_GIGABIT_HALF,
+	.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_NO_GIGABIT_HALF,
 	.dma_burst_length = 16,
 	.clk_init = macb_clk_init,
 	.init = macb_init,
-- 
1.7.4
Re: [PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding
On Sun, Jul 5, 2015 at 8:50 PM, Simon Horman wrote:
> Is it possible to get a 'Fixes:' tag?

I suppose it'd be appropriate to say

Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")

As that is what introduces tcp early_demux, but that's just a guess as
I haven't bisected it (not even sure my test would run on that code
base).

-- 
Alex Gartrell
Re: [PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding
On Sun, Jul 05, 2015 at 03:19:27PM -0700, Alex Gartrell wrote:
> On Sun, Jul 5, 2015 at 3:13 PM, Julian Anastasov wrote:
> > May be the patch fixes crashes? If yes, Simon
> > should apply it for ipvs/net tree, otherwise after
> > the merge window...
>
> Yeah this is definitely a crash-fix and it's existed since at least 3.10.

Is it possible to get a 'Fixes:' tag?
Re: [PATCH net-next] ipv6: sysctl to restrict candidate source addresses
Reworked with "use_oif_addr".

Thanks,
-Erik

On 3 July 2015 at 16:03, YOSHIFUJI Hideaki wrote:
> Hi,
>
> Erik Kline wrote:
>> Per RFC 6724, section 4, "Candidate Source Addresses":
>>
>>     It is RECOMMENDED that the candidate source addresses be the set
>>     of unicast addresses assigned to the interface that will be used
>>     to send to the destination (the "outgoing" interface).
>>
>> Add a sysctl to enable this behaviour.
>>
>> Signed-off-by: Erik Kline
>> ---
>>  Documentation/networking/ip-sysctl.txt | 12 ++++++++++++
>>  include/linux/ipv6.h                   |  1 +
>>  include/uapi/linux/ipv6.h              |  1 +
>>  net/ipv6/addrconf.c                    | 30 +++++++++++++++++++++++++-----
>>  4 files changed, 39 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>> index 5fae770..d8f3e60 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -1435,6 +1435,18 @@ mtu - INTEGER
>>       Default Maximum Transfer Unit
>>       Default: 1280 (IPv6 required minimum)
>>
>> +restrict_srcaddr - INTEGER
>> +     Restrict candidate source addresses (vis. RFC 6724, section 4).
>> +
>> +     When set to 1, the candidate source addresses for destinations
>> +     routed via this interface are restricted to the set of addresses
>> +     configured on this interface.
>> +
>> +     Possible values are:
>> +             0 : no source address restrictions
>> +             1 : require matching outgoing interface
>> +     Default: 0
>> +
>
> I cannot get what "restrict" restricts. How about "use_oif_addr" or
> something like that (like use_tempaddr)?
>
> --
> Hideaki Yoshifuji
> Technical Division, MIRACLE LINUX CORPORATION
[PATCH net-next v2] ipv6: sysctl to restrict candidate source addresses
Per RFC 6724, section 4, "Candidate Source Addresses":

    It is RECOMMENDED that the candidate source addresses be the set
    of unicast addresses assigned to the interface that will be used
    to send to the destination (the "outgoing" interface).

Add a sysctl to enable this behaviour.

Signed-off-by: Erik Kline
---
 Documentation/networking/ip-sysctl.txt |  7 +++++++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ipv6.h              |  1 +
 net/ipv6/addrconf.c                    | 30 +++++++++++++++++++++++++-----
 4 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5fae770..c3bf04d 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1455,6 +1455,13 @@ router_solicitations - INTEGER
 	routers are present.
 	Default: 3

+use_oif_addr - BOOLEAN
+	When enabled, the candidate source addresses for destinations
+	routed via this interface are restricted to the set of addresses
+	configured on this interface (vis. RFC 6724, section 4).
+
+	Default: false
+
 use_tempaddr - INTEGER
 	Preference for Privacy Extensions (RFC3041).
 	  <= 0 : disable Privacy Extensions
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..4633c88 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -57,6 +57,7 @@ struct ipv6_devconf {
 		bool initialized;
 		struct in6_addr secret;
 	} stable_secret;
+	__s32		use_oif_addr;
 	void		*sysctl;
 };

diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 5efa54a..cf9d65a 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -171,6 +171,7 @@ enum {
 	DEVCONF_USE_OPTIMISTIC,
 	DEVCONF_ACCEPT_RA_MTU,
 	DEVCONF_STABLE_SECRET,
+	DEVCONF_USE_OIF_ADDR,
 	DEVCONF_MAX
 };

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..a43687d 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -211,7 +211,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.accept_ra_mtu		= 1,
 	.stable_secret		= {
 		.initialized = false,
-	}
+	},
+	.use_oif_addr		= 0,
 };

 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -253,6 +254,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.stable_secret		= {
 		.initialized = false,
 	},
+	.use_oif_addr		= 0,
 };

 /* Check if a valid qdisc is available */
@@ -1366,7 +1368,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 				*score = &scores[0], *hiscore = &scores[1];
 	struct ipv6_saddr_dst dst;
 	struct net_device *dev;
-	int dst_type;
+	struct inet6_dev *idev;
+	int dst_type, use_oif_addr = 0;

 	dst_type = __ipv6_addr_type(daddr);
 	dst.addr = daddr;
@@ -1380,9 +1383,12 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,

 	rcu_read_lock();

-	for_each_netdev_rcu(net, dev) {
-		struct inet6_dev *idev;
+	if (dst_dev) {
+		idev = __in6_dev_get(dst_dev);
+		use_oif_addr = (idev) ? idev->cnf.use_oif_addr : 0;
+	}

+	for_each_netdev_rcu(net, dev) {
 		/* Candidate Source Address (section 4)
 		 *  - multicast and link-local destination address,
 		 *    the set of candidate source address MUST only
@@ -1394,9 +1400,14 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 		 *    include addresses assigned to interfaces
 		 *    belonging to the same site as the outgoing
 		 *    interface.)
+		 *  - "It is RECOMMENDED that the candidate source addresses
+		 *    be the set of unicast addresses assigned to the
+		 *    interface that will be used to send to the destination
+		 *    (the 'outgoing' interface)." (RFC 6724)
 		 */
 		if (((dst_type & IPV6_ADDR_MULTICAST) ||
-		     dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL) &&
+		     dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL ||
+		     use_oif_addr) &&
 		    dst.ifindex && dev->ifindex != dst.ifindex)
 			continue;

@@ -4586,6 +4597,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = cnf->accept_ra_from_local;
 	array[DEVCONF_ACCEPT_RA_MTU] = cnf->accept_ra_mtu;
 	/* we omit DEVCONF_STABLE_SECRET for now */
+	array[DEVCONF_USE_OIF_ADDR] = cnf->use_oif_addr;
 }

 static inline size_t inet6_ifla6_size(void)
@@ -5585,6 +5597,14 @@ static struct
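The behavioural change in ipv6_dev_get_saddr() boils down to one extra term in the "skip this device" condition. The following standalone sketch isolates that predicate so the cases can be checked by hand. The struct saddr_query and the function skip_device() are hypothetical and flatten the kernel's dst_type/dst.scope tests into booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the inputs to the check in the patch. */
struct saddr_query {
	bool dst_multicast;	/* destination is multicast */
	bool dst_linklocal;	/* destination scope is link-local or smaller */
	bool use_oif_addr;	/* sysctl value of the outgoing interface */
	int  dst_ifindex;	/* outgoing interface index, 0 if unknown */
};

/* True if addresses on interface dev_ifindex must not be considered. */
static bool skip_device(const struct saddr_query *q, int dev_ifindex)
{
	assert(dev_ifindex > 0);
	return (q->dst_multicast || q->dst_linklocal || q->use_oif_addr) &&
	       q->dst_ifindex && dev_ifindex != q->dst_ifindex;
}
```

With the sysctl off and a global unicast destination, every interface stays a candidate; with it on, only the outgoing interface survives, provided the outgoing interface is known.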
[PATCH v3 3/3] net: dsa: mv88e6xxx: add switchdev VLAN operations
This commit implements the switchdev operations to add, delete and dump
VLANs for the Marvell 88E6352 and compatible switch chips.

This allows access to the switch VLAN Table Unit from standard userspace
commands such as "bridge vlan".

A configuration like "1t 2t 3t 4u" for VLAN 10 is achieved like this:

    # bridge vlan add dev swp1 vid 10 master
    # bridge vlan add dev swp2 vid 10 master
    # bridge vlan add dev swp3 vid 10 master
    # bridge vlan add dev swp4 vid 10 master untagged pvid

This calls port_vlan_add() for each command. Removing port 3 from
VLAN 10 is done with:

    # bridge vlan del dev swp3 vid 10

This calls port_vlan_del() for port 3. Dumping VLANs is done with:

    # bridge vlan show
    port    vlan ids
    swp0    None
    swp0
    swp1     10
    swp1     10
    swp2     10
    swp2     10
    swp3    None
    swp3
    swp4     10 PVID Egress Untagged
    swp4     10 PVID Egress Untagged
    br0     None

This calls port_vlan_dump() for each port.

Signed-off-by: Vivien Didelot
---
 drivers/net/dsa/mv88e6123_61_65.c |   3 +
 drivers/net/dsa/mv88e6131.c       |   3 +
 drivers/net/dsa/mv88e6171.c       |   3 +
 drivers/net/dsa/mv88e6352.c       |   3 +
 drivers/net/dsa/mv88e6xxx.c       | 152 ++
 drivers/net/dsa/mv88e6xxx.h       |   5 ++
 6 files changed, 169 insertions(+)

diff --git a/drivers/net/dsa/mv88e6123_61_65.c b/drivers/net/dsa/mv88e6123_61_65.c
index 71a29a7..8e679ff 100644
--- a/drivers/net/dsa/mv88e6123_61_65.c
+++ b/drivers/net/dsa/mv88e6123_61_65.c
@@ -134,6 +134,9 @@ struct dsa_switch_driver mv88e6123_61_65_switch_driver = {
 #endif
 	.get_regs_len		= mv88e6xxx_get_regs_len,
 	.get_regs		= mv88e6xxx_get_regs,
+	.port_vlan_add		= mv88e6xxx_port_vlan_add,
+	.port_vlan_del		= mv88e6xxx_port_vlan_del,
+	.port_vlan_dump		= mv88e6xxx_port_vlan_dump,
 };

 MODULE_ALIAS("platform:mv88e6123");
diff --git a/drivers/net/dsa/mv88e6131.c b/drivers/net/dsa/mv88e6131.c
index 32f4a08..c4d914b 100644
--- a/drivers/net/dsa/mv88e6131.c
+++ b/drivers/net/dsa/mv88e6131.c
@@ -182,6 +182,9 @@ struct dsa_switch_driver mv88e6131_switch_driver = {
 	.get_strings		= mv88e6xxx_get_strings,
 	.get_ethtool_stats	= mv88e6xxx_get_ethtool_stats,
 	.get_sset_count		= mv88e6xxx_get_sset_count,
+	.port_vlan_add		= mv88e6xxx_port_vlan_add,
+	.port_vlan_del		= mv88e6xxx_port_vlan_del,
+	.port_vlan_dump		= mv88e6xxx_port_vlan_dump,
 };

 MODULE_ALIAS("platform:mv88e6085");
diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 1c78084..7701ce6 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -119,6 +119,9 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
 	.fdb_add		= mv88e6xxx_port_fdb_add,
 	.fdb_del		= mv88e6xxx_port_fdb_del,
 	.fdb_getnext		= mv88e6xxx_port_fdb_getnext,
+	.port_vlan_add		= mv88e6xxx_port_vlan_add,
+	.port_vlan_del		= mv88e6xxx_port_vlan_del,
+	.port_vlan_dump		= mv88e6xxx_port_vlan_dump,
 };

 MODULE_ALIAS("platform:mv88e6171");
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index 632815c..b981be4a 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -392,6 +392,9 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
 	.fdb_add		= mv88e6xxx_port_fdb_add,
 	.fdb_del		= mv88e6xxx_port_fdb_del,
 	.fdb_getnext		= mv88e6xxx_port_fdb_getnext,
+	.port_vlan_add		= mv88e6xxx_port_vlan_add,
+	.port_vlan_del		= mv88e6xxx_port_vlan_del,
+	.port_vlan_dump		= mv88e6xxx_port_vlan_dump,
 };

 MODULE_ALIAS("platform:mv88e6352");
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ffd9fc6..d5812ba 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1544,6 +1544,158 @@ static int _mv88e6xxx_vtu_loadpurge(struct dsa_switch *ds,
 	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_LOAD_PURGE);
 }

+int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port, u16 vid,
+			    u16 bridge_flags)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	struct mv88e6xxx_vtu_entry entry = { 0 };
+	int prev_vid = vid ? vid - 1 : 0xfff;
+	int i, ret;
+
+	mutex_lock(&ps->smi_mutex);
+	ret = _mv88e6xxx_vtu_getnext(ds, prev_vid, &entry);
+	if (ret < 0)
+		goto unlock;
+
+	/* If the VLAN does not exist, re-initialize the entry for addition */
+	if (entry.vid != vid || !entry.valid) {
+		memset(&entry, 0, sizeof(entry));
+		entry.valid = true;
+		entry.vid = vid;
+		entry.fid = vid; /* We use one FID per VLAN at the
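The start of mv88e6xxx_port_vlan_add() uses a Get Next operation to do an exact-match lookup: it asks for the entry following vid - 1 and checks whether the answer is vid itself. The toy model below illustrates that trick over a sorted array. The Get Next semantics modelled here (smallest VID strictly greater than the start, with 0xfff meaning "start from the lowest VID") are an assumption made for the example, and vtu_getnext_sketch() and vlan_exists() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VID_MAX 0xfff

/*
 * Toy model of a Get Next operation over a table of VIDs sorted in
 * ascending order. Assumed semantics: yield the smallest VID strictly
 * greater than 'from'; VID_MAX as a start means "begin at the lowest VID".
 * Returns false when no such entry exists.
 */
static bool vtu_getnext_sketch(const uint16_t *vids, size_t n, uint16_t from,
			       uint16_t *next)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (from == VID_MAX || vids[i] > from) {
			*next = vids[i];
			return true;
		}
	}
	return false;
}

/* Exact-match lookup built on Get Next, as at the top of port_vlan_add(). */
static bool vlan_exists(const uint16_t *vids, size_t n, uint16_t vid)
{
	uint16_t prev_vid = vid ? vid - 1 : VID_MAX;
	uint16_t next;

	assert(vid < VID_MAX);
	return vtu_getnext_sketch(vids, n, prev_vid, &next) && next == vid;
}
```

The special case for vid 0 is what the "vid ? vid - 1 : 0xfff" expression in the patch handles: there is no VID below 0 to start from, so the search wraps around from the top.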
[PATCH v3 2/3] net: dsa: add support for switchdev VLAN objects
This patch adds the glue between DSA and switchdev operations to add,
delete and dump SWITCHDEV_OBJ_PORT_VLAN objects.

This is a first step to link the "bridge vlan" command with hardware
entries for DSA compatible switch chips.

Signed-off-by: Vivien Didelot
---
 include/net/dsa.h  |   9
 net/dsa/dsa_priv.h |   6 +++
 net/dsa/slave.c    | 137 +
 3 files changed, 152 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index fbca63b..cabf2a5 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -302,6 +302,15 @@ struct dsa_switch_driver {
 			       const unsigned char *addr, u16 vid);
 	int	(*fdb_getnext)(struct dsa_switch *ds, int port,
 			       unsigned char *addr, bool *is_static);
+
+	/*
+	 * VLAN support
+	 */
+	int	(*port_vlan_add)(struct dsa_switch *ds, int port, u16 vid,
+				 u16 bridge_flags);
+	int	(*port_vlan_del)(struct dsa_switch *ds, int port, u16 vid);
+	int	(*port_vlan_dump)(struct dsa_switch *ds, int port, u16 vid,
+				  u16 *bridge_flags);
 };

 void register_switch_driver(struct dsa_switch_driver *type);
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index d5f1f9b..9029717 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -13,6 +13,7 @@
 #include
 #include
+#include

 struct dsa_device_ops {
 	netdev_tx_t (*xmit)(struct sk_buff *skb, struct net_device *dev);
@@ -47,6 +48,11 @@ struct dsa_slave_priv {
 	int			old_duplex;

 	struct net_device	*bridge_dev;
+
+	/*
+	 * Which VLANs this port is a member of.
+	 */
+	DECLARE_BITMAP(vlan_bitmap, VLAN_N_VID);
 };

 /* dsa.c */
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 04ffad3..47c459b 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include "dsa_priv.h"

 /* slave mii_bus handling ***/
@@ -363,6 +364,136 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 	return ret;
 }

+static int dsa_slave_port_vlans_add(struct net_device *dev,
+				    struct switchdev_obj *obj)
+{
+	struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+	int vid, err = 0;
+
+	if (!ds->drv->port_vlan_add)
+		return -EOPNOTSUPP;
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
+		err = ds->drv->port_vlan_add(ds, p->port, vid, vlan->flags);
+		if (err)
+			break;
+		set_bit(vid, p->vlan_bitmap);
+	}
+
+	return err;
+}
+
+static int dsa_slave_port_obj_add(struct net_device *dev,
+				  struct switchdev_obj *obj)
+{
+	int err;
+
+	/*
+	 * Skip the prepare phase, since currently the DSA drivers don't need to
+	 * allocate any memory for operations and they will not fail to HW
+	 * (unless something horrible goes wrong on the MDIO bus, in which case
+	 * the prepare phase wouldn't have been able to predict anyway).
+	 */
+	if (obj->trans != SWITCHDEV_TRANS_COMMIT)
+		return 0;
+
+	switch (obj->id) {
+	case SWITCHDEV_OBJ_PORT_VLAN:
+		err = dsa_slave_port_vlans_add(dev, obj);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static int dsa_slave_port_vlans_del(struct net_device *dev,
+				    struct switchdev_obj *obj)
+{
+	struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+	int vid, err = 0;
+
+	if (!ds->drv->port_vlan_del)
+		return -EOPNOTSUPP;
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
+		err = ds->drv->port_vlan_del(ds, p->port, vid);
+		if (err)
+			break;
+		clear_bit(vid, p->vlan_bitmap);
+	}
+
+	return err;
+}
+
+static int dsa_slave_port_obj_del(struct net_device *dev,
+				  struct switchdev_obj *obj)
+{
+	int err;
+
+	switch (obj->id) {
+	case SWITCHDEV_OBJ_PORT_VLAN:
+		err = dsa_slave_port_vlans_del(dev, obj);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static int dsa_slave_port_vlans_dump(struct net_device *dev,
+				     struct switchdev_obj *obj)
+{
+	struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+	struct dsa_slave_priv *p = netdev_priv(dev);
+
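The add path above walks a VID range, programs the hardware for each VID, and records membership in a per-port bitmap only after the hardware call succeeded. A compact userspace model of that loop is sketched below. It assumes a plain unsigned long array in place of DECLARE_BITMAP() and a function pointer in place of the driver callback; all names (port_sketch, port_vlans_add, hw_add_fail_on_12, ...) are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define N_VID 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-port state: one bit per VLAN the port is a member of. */
struct port_sketch {
	unsigned long vlan_bitmap[N_VID / BITS_PER_WORD];
};

static void vlan_set(struct port_sketch *p, unsigned int vid)
{
	p->vlan_bitmap[vid / BITS_PER_WORD] |= 1UL << (vid % BITS_PER_WORD);
}

static int vlan_test(const struct port_sketch *p, unsigned int vid)
{
	return (p->vlan_bitmap[vid / BITS_PER_WORD] >> (vid % BITS_PER_WORD)) & 1;
}

/* Hypothetical hardware callback: returns 0 on success, negative on error. */
typedef int (*hw_vlan_add_t)(unsigned int vid);

/* Program each VID in [begin, end]; stop at the first failure. */
static int port_vlans_add(struct port_sketch *p, unsigned int begin,
			  unsigned int end, hw_vlan_add_t hw_add)
{
	unsigned int vid;
	int err = 0;

	assert(begin <= end && end < N_VID);
	for (vid = begin; vid <= end; ++vid) {
		err = hw_add(vid);
		if (err)
			break;
		/* Only record membership once the hardware accepted the VID. */
		vlan_set(p, vid);
	}
	return err;
}

/* Example callback for experiments: pretends VID 12 cannot be programmed. */
static int hw_add_fail_on_12(unsigned int vid)
{
	return vid == 12 ? -1 : 0;
}
```

Note that, as in the patch, a failure partway through the range leaves the earlier VIDs programmed and recorded; nothing is rolled back.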
[PATCH v3 1/3] net: dsa: mv88e6xxx: add debugfs interface for VTU
Implement the Get Next and Load Purge operations for the VLAN Table
Unit, and a "vtu" debugfs file to read and write the hardware VLANs.

A populated VTU looks like this:

    # cat /sys/kernel/debug/dsa0/vtu
     VID  FID  SID  0  1  2  3  4  5  6
     550  562    0  x  x  x  u  x  t  x
    1000 1012    0  x  x  t  x  x  t  x
    1200 1212    0  x  x  t  x  t  t  x

Where "t", "u", "x", "-", respectively mean that the port is tagged,
untagged, excluded or unmodified, for a given VLAN entry.

VTU entries can be added by echoing the same format:

    echo 1300 1312 0 x x t x t t x > vtu

and can be deleted by echoing only the VID:

    echo 1000 > vtu

Signed-off-by: Vivien Didelot
---
 drivers/net/dsa/mv88e6xxx.c | 311
 drivers/net/dsa/mv88e6xxx.h |  24
 2 files changed, 335 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 8c130c0..ffd9fc6 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2,6 +2,9 @@
  * net/dsa/mv88e6xxx.c - Marvell 88e6xxx switch chip support
  * Copyright (c) 2008 Marvell Semiconductor
  *
+ * Copyright (c) 2015 CMC Electronics, Inc.
+ *	Added support for 802.1Q VLAN Table Unit
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -1366,6 +1369,181 @@ static void mv88e6xxx_bridge_work(struct work_struct *work)
 	}
 }

+static int _mv88e6xxx_vtu_wait(struct dsa_switch *ds)
+{
+	return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_VTU_OP,
+			       GLOBAL_VTU_OP_BUSY);
+}
+
+static int _mv88e6xxx_vtu_cmd(struct dsa_switch *ds, u16 op)
+{
+	int ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_OP, op);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_wait(ds);
+}
+
+static int _mv88e6xxx_stu_loadpurge(struct dsa_switch *ds, u8 sid, bool valid)
+{
+	int ret, data;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	data = sid & GLOBAL_VTU_SID_MASK;
+	if (valid)
+		data |= GLOBAL_VTU_VID_VALID;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID, data);
+	if (ret < 0)
+		return ret;
+
+	/* Unused (yet) data registers */
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_DATA_0_3, 0);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_DATA_4_7, 0);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_DATA_8_11, 0);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_STU_LOAD_PURGE);
+}
+
+static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, u16 vid,
+				  struct mv88e6xxx_vtu_entry *entry)
+{
+	int ret, i;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID,
+				   vid & GLOBAL_VTU_VID_MASK);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_GET_NEXT);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_VID);
+	if (ret < 0)
+		return ret;
+
+	entry->vid = ret & GLOBAL_VTU_VID_MASK;
+	entry->valid = !!(ret & GLOBAL_VTU_VID_VALID);
+
+	if (entry->valid) {
+		/* Ports 0-3, offsets 0, 4, 8, 12 */
+		ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_DATA_0_3);
+		if (ret < 0)
+			return ret;
+
+		for (i = 0; i < 4; ++i)
+			entry->tags[i] = (ret >> (i * 4)) & 3;
+
+		/* Ports 4-6, offsets 0, 4, 8 */
+		ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_DATA_4_7);
+		if (ret < 0)
+			return ret;
+
+		for (i = 4; i < 7; ++i)
+			entry->tags[i] = (ret >> ((i - 4) * 4)) & 3;
+
+		if (mv88e6xxx_6097_family(ds) || mv88e6xxx_6165_family(ds) ||
+		    mv88e6xxx_6351_family(ds) || mv88e6xxx_6352_family(ds)) {
+			ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+						  GLOBAL_VTU_FID);
+			if (ret < 0)
+				return ret;
+
+			entry->fid = ret & GLOBAL_VTU_FID_MASK;
+
+			ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+						  GLOBAL_VTU_SID);
+			if (ret < 0)
+				return ret;
+
+
[PATCH v3 0/3] net: dsa: mv88e6xxx: add support for VLAN Table Unit
Hi all,

This patchset brings full support for hardware VLANs in DSA, and the Marvell 88E6xxx compatible switch chips.

The first patch adds the VTU operations to the mv88e6xxx code, as well as a "vtu" debugfs file to read and modify the hardware VLAN table.

The second patch adds the glue between DSA and the switchdev VLAN objects.

The third patch finally implements the necessary functions in the mv88e6xxx code to interact with the hardware VLAN through switchdev, from userspace commands such as "bridge vlan".

Below is an example of what can be done with this patchset.

    "VID 550: 1t 3u"
    "VID 1000: 2t"
    "VID 1200: 2t 4t"

The VLAN setup above can be achieved with the following bridge commands:

    bridge vlan add vid 550 dev swp1 master
    bridge vlan add vid 550 dev swp3 master untagged pvid
    bridge vlan add vid 1000 dev swp2 master
    bridge vlan add vid 1200 dev swp2 master
    bridge vlan add vid 1200 dev swp4 master

Removing the port 1 from VLAN 550 is done with:

    bridge vlan del vid 550 dev swp1

The bridge command would output the following setup:

    # bridge vlan
    port    vlan ids
    swp0    None
    swp0
    swp1    None
    swp1
    swp2    1000 1200
    swp2    1000 1200
    swp3    550 PVID Egress Untagged
    swp3    550 PVID Egress Untagged
    swp4    1200
    swp4    1200
    br0     None

Assuming that swp5 is the CPU port, the "vtu" debugfs file would show:

    # cat /sys/kernel/debug/dsa0/vtu
    VID   FID   SID  0  1  2  3  4  5  6
    550   550   0    x  x  x  u  x  t  x
    1000  1000  0    x  x  t  x  x  t  x
    1200  1200  0    x  x  t  x  t  t  x

Cheers,
-v

Vivien Didelot (3):
  net: dsa: mv88e6xxx: add debugfs interface for VTU
  net: dsa: add support for switchdev VLAN objects
  net: dsa: mv88e6xxx: add switchdev VLAN operations

 drivers/net/dsa/mv88e6123_61_65.c | 3 +
 drivers/net/dsa/mv88e6131.c | 3 +
 drivers/net/dsa/mv88e6171.c | 3 +
 drivers/net/dsa/mv88e6352.c | 3 +
 drivers/net/dsa/mv88e6xxx.c | 463 ++
 drivers/net/dsa/mv88e6xxx.h | 29 +++
 include/net/dsa.h | 9 +
 net/dsa/dsa_priv.h | 6 +
 net/dsa/slave.c | 137 +++
 9 files changed, 656 insertions(+)

--
2.4.5
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
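The row format printed by the "vtu" debugfs file (VID, FID, SID, then one tag letter per port) is simple enough to parse from userspace. The following is only a sketch under assumptions: the struct, the function name and the fixed count of seven ports are hypothetical and are not part of the patchset; the tag letters are the ones listed in the commit message of patch 1/3.

```c
#include <assert.h>
#include <stdio.h>

#define VTU_NUM_PORTS 7 /* assumption: ports 0-6, as in the example output */

/* Hypothetical userspace mirror of one row of the "vtu" debugfs file. */
struct vtu_row {
	unsigned int vid;
	unsigned int fid;
	unsigned int sid;
	char tags[VTU_NUM_PORTS]; /* 't', 'u', 'x' or '-' for each port */
};

/* Parse "VID FID SID tag0 ... tag6"; returns 0 on success, -1 on bad input. */
static int vtu_parse_row(const char *line, struct vtu_row *row)
{
	int consumed = 0;
	int i;

	assert(line && row);

	if (sscanf(line, "%u %u %u%n", &row->vid, &row->fid, &row->sid,
		   &consumed) != 3)
		return -1;

	for (i = 0; i < VTU_NUM_PORTS; i++) {
		int n = 0;
		char tag;

		/* The leading space in the format skips any whitespace. */
		if (sscanf(line + consumed, " %c%n", &tag, &n) != 1)
			return -1;
		if (tag != 't' && tag != 'u' && tag != 'x' && tag != '-')
			return -1;
		row->tags[i] = tag;
		consumed += n;
	}

	return 0;
}
```

Feeding it the VID 550 row from the example above should yield port 3 as untagged and port 5 (the CPU port in that setup) as tagged.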
[RESEND] xen-netback: remove duplicated function definition
There are two duplicated xenvif_zerocopy_callback() definitions. Remove one of them.

Signed-off-by: Liang Li
---
 drivers/net/xen-netback/common.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..c6cb85a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -325,9 +325,6 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 		queue->pending_prod + queue->pending_cons;
 }
 
-/* Callback from stack when TX packet can be released */
-void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
-
 irqreturn_t xenvif_interrupt(int irq, void *dev_id);
 
 extern bool separate_tx_rx_irq;
--
1.9.1
Re: [PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding
On Sun, Jul 5, 2015 at 3:13 PM, Julian Anastasov wrote:
> May be the patch fixes crashes? If yes, Simon
> should apply it for ipvs/net tree, otherwise after
> the merge window...

Yeah, this is definitely a crash fix, and the bug has existed since at least 3.10.

--
Alex Gartrell
Re: [PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding
Hello, On Sun, 5 Jul 2015, Alex Gartrell wrote: > It is possible that we bind against a local socket in early_demux when we > are actually going to want to forward it. In this case, the socket serves > no purpose and only serves to confuse things (particularly functions which > implicitly expect sk_fullsock to be true, like ip_local_out). > Additionally, skb_set_owner_w is totally broken for non full-socks. > > Signed-off-by: Alex Gartrell Thanks for fixing this problem! Acked-by: Julian Anastasov May be the patch fixes crashes? If yes, Simon should apply it for ipvs/net tree, otherwise after the merge window... > --- > net/netfilter/ipvs/ip_vs_xmit.c | 27 +++ > 1 file changed, 27 insertions(+) > > diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c > index bf66a86..99d4a41 100644 > --- a/net/netfilter/ipvs/ip_vs_xmit.c > +++ b/net/netfilter/ipvs/ip_vs_xmit.c > @@ -527,6 +527,21 @@ static inline int ip_vs_tunnel_xmit_prepare(struct > sk_buff *skb, > return ret; > } > > +/* In the event of a remote destination, it's possible that we would have > + * matches against an old socket (particularly a TIME-WAIT socket). This > + * causes havoc down the line (ip_local_out et. al. expect regular sockets > + * and invalid memory accesses will happen) so simply drop the association > + * in this case. > +*/ > +static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb) > +{ > + /* If dev is set, the packet came from the LOCAL_IN callback and > + * not from a local TCP socket. 
> + */ > + if (skb->dev) > + skb_orphan(skb); > +} > + > /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */ > static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb, >struct ip_vs_conn *cp, int local) > @@ -538,12 +553,21 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct > sk_buff *skb, > ip_vs_notrack(skb); > else > ip_vs_update_conntrack(skb, cp, 1); > + > + /* Remove the early_demux association unless it's bound for the > + * exact same port and address on this host after translation. > + */ > + if (!local || cp->vport != cp->dport || > + !ip_vs_addr_equal(cp->af, &cp->vaddr, &cp->daddr)) > + ip_vs_drop_early_demux_sk(skb); > + > if (!local) { > skb_forward_csum(skb); > NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb, > NULL, skb_dst(skb)->dev, dst_output_sk); > } else > ret = NF_ACCEPT; > + > return ret; > } > > @@ -557,6 +581,7 @@ static inline int ip_vs_send_or_cont(int pf, struct > sk_buff *skb, > if (likely(!(cp->flags & IP_VS_CONN_F_NFCT))) > ip_vs_notrack(skb); > if (!local) { > + ip_vs_drop_early_demux_sk(skb); > skb_forward_csum(skb); > NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb, > NULL, skb_dst(skb)->dev, dst_output_sk); > @@ -845,6 +870,8 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int > skb_af, > struct ipv6hdr *old_ipv6h = NULL; > #endif > > + ip_vs_drop_early_demux_sk(skb); > + > if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) { > new_skb = skb_realloc_headroom(skb, max_headroom); > if (!new_skb) > -- > Alex Gartrell Regards -- Julian Anastasov -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] mlx4: TCP/UDP packets have L4 hash
On 7/6/2015 12:33 AM, Eric Dumazet wrote:
> On Mon, 2015-07-06 at 00:16 +0300, Ido Shamay wrote:
> > We can have a relaxation of the condition by looking only at TCP/UDP
> > CQE indication (without check-sum indications)
> > This can cover us also when device rx-checksuming feature is off.
> > Do we want it or a correlation between check-sum and l4_hash is needed?
>
> I thought about that, but this was adding a more complex test in fast
> path.
>
> Not sure we should care here, as nobody would disable hardware checksum
> if they care about performance.

I agree, thank you Eric

Acked-by: Ido Shamay
Re: [PATCH net] mlx4: TCP/UDP packets have L4 hash
On Mon, 2015-07-06 at 00:16 +0300, Ido Shamay wrote:
> We can have a relaxation of the condition by looking only at TCP/UDP
> CQE indication (without check-sum indications)
> This can cover us also when device rx-checksuming feature is off.
> Do we want it or a correlation between check-sum and l4_hash is needed?

I thought about that, but this was adding a more complex test in fast path.

Not sure we should care here, as nobody would disable hardware checksum if they care about performance.
[PATCH,v2 net-next] ipvs: skb_orphan in case of forwarding
It is possible that we bind against a local socket in early_demux when we are actually going to want to forward it. In this case, the socket serves no purpose and only serves to confuse things (particularly functions which implicitly expect sk_fullsock to be true, like ip_local_out). Additionally, skb_set_owner_w is totally broken for non full-socks. Signed-off-by: Alex Gartrell --- net/netfilter/ipvs/ip_vs_xmit.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index bf66a86..99d4a41 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -527,6 +527,21 @@ static inline int ip_vs_tunnel_xmit_prepare(struct sk_buff *skb, return ret; } +/* In the event of a remote destination, it's possible that we would have + * matches against an old socket (particularly a TIME-WAIT socket). This + * causes havoc down the line (ip_local_out et. al. expect regular sockets + * and invalid memory accesses will happen) so simply drop the association + * in this case. +*/ +static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb) +{ + /* If dev is set, the packet came from the LOCAL_IN callback and +* not from a local TCP socket. +*/ + if (skb->dev) + skb_orphan(skb); +} + /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb, struct ip_vs_conn *cp, int local) @@ -538,12 +553,21 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb, ip_vs_notrack(skb); else ip_vs_update_conntrack(skb, cp, 1); + + /* Remove the early_demux association unless it's bound for the +* exact same port and address on this host after translation. 
+	 */
+	if (!local || cp->vport != cp->dport ||
+	    !ip_vs_addr_equal(cp->af, &cp->vaddr, &cp->daddr))
+		ip_vs_drop_early_demux_sk(skb);
+
 	if (!local) {
 		skb_forward_csum(skb);
 		NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
 			NULL, skb_dst(skb)->dev, dst_output_sk);
 	} else
 		ret = NF_ACCEPT;
+
 	return ret;
 }
@@ -557,6 +581,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff *skb,
 	if (likely(!(cp->flags & IP_VS_CONN_F_NFCT)))
 		ip_vs_notrack(skb);
 	if (!local) {
+		ip_vs_drop_early_demux_sk(skb);
 		skb_forward_csum(skb);
 		NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
 			NULL, skb_dst(skb)->dev, dst_output_sk);
@@ -845,6 +870,8 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
 	struct ipv6hdr *old_ipv6h = NULL;
 #endif
+	ip_vs_drop_early_demux_sk(skb);
+
 	if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
 		new_skb = skb_realloc_headroom(skb, max_headroom);
 		if (!new_skb)
--
Alex Gartrell
Re: [PATCH net] mlx4: TCP/UDP packets have L4 hash
On 7/2/2015 2:24 PM, Eric Dumazet wrote: From: Eric Dumazet Mellanox driver has the knowledge if rxhash is a L4 hash, if it receives a non fragmented TCP or UDP frame and NETIF_F_RXCSUM is enabled on netdev. ip_summed value is CHECKSUM_UNNECESSARY in this case. Signed-off-by: Eric Dumazet Cc: Amir Vadai Cc: Ido Shamay --- drivers/net/ethernet/mellanox/mlx4/en_rx.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 7a4f20bb7fcb..12c65e1ad6a9 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -917,7 +917,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud if (dev->features & NETIF_F_RXHASH) skb_set_hash(gro_skb, be32_to_cpu(cqe->immed_rss_invalid), -PKT_HASH_TYPE_L3); +(ip_summed == CHECKSUM_UNNECESSARY) ? + PKT_HASH_TYPE_L4 : + PKT_HASH_TYPE_L3); Thanks Eric, We can have a relaxation of the condition by looking only at TCP/UDP CQE indication (without check-sum indications) This can cover us also when device rx-checksuming feature is off. Do we want it or a correlation between check-sum and l4_hash is needed? Ido skb_record_rx_queue(gro_skb, cq->ring); skb_mark_napi_id(gro_skb, &cq->napi); @@ -963,7 +965,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud if (dev->features & NETIF_F_RXHASH) skb_set_hash(skb, be32_to_cpu(cqe->immed_rss_invalid), -PKT_HASH_TYPE_L3); +(ip_summed == CHECKSUM_UNNECESSARY) ? 
+ PKT_HASH_TYPE_L4 : + PKT_HASH_TYPE_L3); if ((be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_VLAN_PRESENT_MASK) &&
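The hash-type choice made in the quoted patch reduces to a single conditional on the checksum state. A minimal stand-alone sketch follows; the enums are local stand-ins for the kernel's PKT_HASH_TYPE_* and CHECKSUM_* definitions, their values are illustrative only, and the function name is hypothetical.

```c
/* Local stand-ins for the kernel definitions; values are illustrative only. */
enum hash_type { HASH_TYPE_L3, HASH_TYPE_L4 };
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY };

/*
 * Report an L4 hash only when the checksum state says the device validated
 * a non-fragmented TCP or UDP frame; otherwise fall back to L3.
 */
static enum hash_type rx_hash_type(enum csum_state ip_summed)
{
	return ip_summed == CSUM_UNNECESSARY ? HASH_TYPE_L4 : HASH_TYPE_L3;
}
```

As discussed in the thread, this ties the L4 indication to RX checksum offload being enabled; looking at the TCP/UDP CQE bits directly would relax that, at the cost of a more complex test in the fast path.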
Re: [PATCH net-next] ipvs: skb_orphan in case of forwarding
Hello,

On Sun, 5 Jul 2015, Alex Gartrell wrote:
> + /* Remove the early_demux association unless it's bound for the
> + * exact same port and address on this host after translation.
> + */
> + if (!local || cp->vport != cp->dport ||
> + !ip_vs_addr_equal(cp->af, &cp->vaddr, &cp->caddr))

Sigh, it was my mistake, it should be cp->daddr instead of cp->caddr. It seems I copied it from somewhere to give an example... Sorry, can you resend with cp->daddr as v2?

Regards

--
Julian Anastasov
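The corrected condition (comparing the virtual address with the destination address, not the client address) can be modeled outside the kernel. This is a simplified sketch: the struct is an IPv4-only stand-in for the few struct ip_vs_conn fields involved, and the function name is hypothetical. The real helper additionally checks skb->dev before orphaning the skb.

```c
#include <stdbool.h>
#include <stdint.h>

/* IPv4-only stand-in for the struct ip_vs_conn fields used by the check. */
struct conn_model {
	uint32_t vaddr; /* virtual service address */
	uint32_t daddr; /* destination (real server) address */
	uint16_t vport; /* virtual service port */
	uint16_t dport; /* destination port */
};

/*
 * Keep the early-demux socket only when the packet stays local and is bound
 * for exactly the same address and port after translation.
 */
static bool should_drop_early_demux_sk(const struct conn_model *cp, bool local)
{
	return !local || cp->vport != cp->dport || cp->vaddr != cp->daddr;
}
```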
[PATCH net-next] ipvs: skb_orphan in case of forwarding
It is possible that we bind against a local socket in early_demux when we are actually going to want to forward it. In this case, the socket serves no purpose and only serves to confuse things (particularly functions which implicitly expect sk_fullsock to be true, like ip_local_out). Additionally, skb_set_owner_w is totally broken for non full-socks. Signed-off-by: Alex Gartrell --- net/netfilter/ipvs/ip_vs_xmit.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index bf66a86..65526f4 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -527,6 +527,21 @@ static inline int ip_vs_tunnel_xmit_prepare(struct sk_buff *skb, return ret; } +/* In the event of a remote destination, it's possible that we would have + * matches against an old socket (particularly a TIME-WAIT socket). This + * causes havoc down the line (ip_local_out et. al. expect regular sockets + * and invalid memory accesses will happen) so simply drop the association + * in this case. +*/ +static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb) +{ + /* If dev is set, the packet came from the LOCAL_IN callback and +* not from a local TCP socket. +*/ + if (skb->dev) + skb_orphan(skb); +} + /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb, struct ip_vs_conn *cp, int local) @@ -538,12 +553,21 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb, ip_vs_notrack(skb); else ip_vs_update_conntrack(skb, cp, 1); + + /* Remove the early_demux association unless it's bound for the +* exact same port and address on this host after translation. 
+	 */
+	if (!local || cp->vport != cp->dport ||
+	    !ip_vs_addr_equal(cp->af, &cp->vaddr, &cp->caddr))
+		ip_vs_drop_early_demux_sk(skb);
+
 	if (!local) {
 		skb_forward_csum(skb);
 		NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
 			NULL, skb_dst(skb)->dev, dst_output_sk);
 	} else
 		ret = NF_ACCEPT;
+
 	return ret;
 }
@@ -557,6 +581,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff *skb,
 	if (likely(!(cp->flags & IP_VS_CONN_F_NFCT)))
 		ip_vs_notrack(skb);
 	if (!local) {
+		ip_vs_drop_early_demux_sk(skb);
 		skb_forward_csum(skb);
 		NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
 			NULL, skb_dst(skb)->dev, dst_output_sk);
@@ -845,6 +870,8 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
 	struct ipv6hdr *old_ipv6h = NULL;
 #endif
+	ip_vs_drop_early_demux_sk(skb);
+
 	if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
 		new_skb = skb_realloc_headroom(skb, max_headroom);
 		if (!new_skb)
--
Alex Gartrell
Re: [PATCH] xen-netback: remove duplicated function definition
On Sat, Jul 04, 2015 at 03:33:00AM +0800, Liang Li wrote:
> There are two duplicated xenvif_zerocopy_callback() definitions.
> Remove one of them.
>
> Signed-off-by: Liang Li

Acked-by: Wei Liu

Please fix the time of your computer and resend.

Wei.

> ---
> drivers/net/xen-netback/common.h | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h
> b/drivers/net/xen-netback/common.h
> index 8a495b3..c6cb85a 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -325,9 +325,6 @@ static inline pending_ring_idx_t nr_pending_reqs(struct
> xenvif_queue *queue)
> queue->pending_prod + queue->pending_cons;
> }
>
> -/* Callback from stack when TX packet can be released */
> -void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
> -
> irqreturn_t xenvif_interrupt(int irq, void *dev_id);
>
> extern bool separate_tx_rx_irq;
> --
> 1.9.1
[PATCH] net: phy: add dependency on HAS_IOMEM to MDIO_BUS_MUX_MMIOREG
On UML builds, mdio-mux-mmioreg.c fails to compile:

drivers/net/phy/mdio-mux-mmioreg.c:50:3: error: implicit declaration of function ‘ioremap’ [-Werror=implicit-function-declaration]
drivers/net/phy/mdio-mux-mmioreg.c:63:3: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]

This is due to CONFIG_OF now being user selectable. Add a dependency on HAS_IOMEM to fix this.

Signed-off-by: Rob Herring
Cc: Florian Fainelli
Cc: David S. Miller
---
 drivers/net/phy/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index cf18940..cb86d7a 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -191,7 +191,7 @@ config MDIO_BUS_MUX_GPIO
 config MDIO_BUS_MUX_MMIOREG
 	tristate "Support for MMIO device-controlled MDIO bus multiplexers"
-	depends on OF_MDIO
+	depends on OF_MDIO && HAS_IOMEM
 	select MDIO_BUS_MUX
 	help
 	  This module provides a driver for MDIO bus multiplexers that
--
2.1.0
Re: Linux 4.2 build error in net/netfilter/ipset/ip_set_hash_netnet.c
On Sat, Jul 04, 2015 at 12:44:36AM -0700, Vinson Lee wrote:
> Hi.
>
> With the latest Linux 4.2-rc1, I am hitting this build error with GCC
> 4.4.7 on CentOS 6.
>
> CC net/netfilter/ipset/ip_set_hash_netnet.o
> net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet4_uadt’:
> net/netfilter/ipset/ip_set_hash_netnet.c:163: error: unknown field
> ‘cidr’ specified in initializer
> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: missing braces
> around initializer
> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: (near
> initialization for ‘e..ip’)
> net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet6_uadt’:
> net/netfilter/ipset/ip_set_hash_netnet.c:388: error: unknown field
> ‘cidr’ specified in initializer
> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: missing braces
> around initializer
> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: (near
> initialization for ‘e.ip[0]’)
>

Previously fixed with commit 1a869205c75cb ("netfilter: ipset: The unnamed union initialization may lead to compilation error"), reintroduced with commit aff227581ed1a ("netfilter: ipset: Check CIDR value only when attribute is given").

Guenter
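For readers unfamiliar with this class of error: older compilers such as the GCC 4.4.7 reported here can reject a designated initializer that names a member of an unnamed union. One common workaround, sketched below, is to zero the object and assign the member afterwards. The struct layout and names are hypothetical, only loosely modeled on the element type in the error message, and this is not necessarily how either referenced commit is written.

```c
#include <stdint.h>
#include <string.h>

#define HOST_MASK 32 /* illustrative value for an IPv4 host prefix length */

/* Hypothetical element type with members living in unnamed unions. */
struct netnet_elem_model {
	union {
		uint32_t ip[2];
		uint64_t ipcmp;
	};
	union {
		uint8_t cidr[2];
		uint16_t ccmp;
	};
};

static struct netnet_elem_model make_host_elem(void)
{
	struct netnet_elem_model e;

	/*
	 * Instead of "= { .cidr = { HOST_MASK, HOST_MASK } }", which old
	 * compilers may reject, zero the object and assign afterwards.
	 */
	memset(&e, 0, sizeof(e));
	e.cidr[0] = HOST_MASK;
	e.cidr[1] = HOST_MASK;

	return e;
}
```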
Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk
Hello, On Fri, 3 Jul 2015, Alex Gartrell wrote: > > - if packets go to local server IPVS should not touch > > skb->dst, skb->sk, etc (NF_ACCEPT case) > > Yeah, the thing is that early demux could totally match for a socket > that existed before we created the service, and in that instance it > might make the most sense to retain the connection and simply > NF_ACCEPT. The problem with that approach though is that is that the > behavior changes if early_demux is not enabled. I believe that we > should just do the consistent thing and always drop the early_demux > result if bound for non-local, as you've said. We must not forget that a local server listening on 0.0.0.0:VPORT or VIP:VPORT can be reached if a real server with some local IP is used as RIP. So, early demux will really work for this case when local stack is one of the real servers. > The interesting thing though is that, for the purposes of routing, > enabling early_demux does change the behavior. I suspect that's a > bug, but it's far enough away from actual use cases that it's probably > fine (who is out there tearing down addresses and setting up routes in > their place?) Looks like routing by definition can not divert skbs with early-demux socket because input routing is not called. Netfilter's DNAT may change daddr/dport before early-demux and in this case socket should not be found (eg. if we DNAT to other host). So, there is problem mostly for IPVS, I don't remember for other cases. May be CLUSTERIP too, I'm not sure. There is the problem that at LOCAL_IN SNAT is valid operation, not sure how it affects early-demux. > What do you think of the following: > > commit f04c42f8041cc4ccc4cb2a30c1058136dd497a83 > Author: Alex Gartrell > Date: Wed Jul 1 13:24:46 2015 -0700 > > ipvs: orphan_skb in case of forwarding skb_orphan or orphan skb > It is possible that we bind against a local socket in early_demux when we > are actually going to want to forward it. 
In this case, the socket serves > no purpose and only serves to confuse things (particularly functions which > implicitly expect sk_fullsock to be true, like ip_local_out). > Additionally, skb_set_owner_w is totally broken for non full-socks. > > Signed-off-by: Alex Gartrell > > diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c > index bf66a86..3efe719 100644 > --- a/net/netfilter/ipvs/ip_vs_xmit.c > +++ b/net/netfilter/ipvs/ip_vs_xmit.c > @@ -527,6 +527,19 @@ static inline int > ip_vs_tunnel_xmit_prepare(struct sk_buff *skb, > return ret; > } > > +/* In the event of a remote destination, it's possible that we would have > + * matches against an old socket (particularly a TIME-WAIT socket). This > + * causes havoc down the line (ip_local_out et. al. expect regular sockets > + * and invalid memory accesses will happen) so simply drop the association > + * in this case > +*/ > +static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb) { Move '{' on next line and below comment should be closed on next line. But I guess you will run later scripts/checkpatch.pl --strict /tmp/file.patch > + /* If dev is set, the packet came from the LOCAL_IN callback and > +* not from a local TCP socket */ > + if (skb->dev) > + skb_orphan(skb); > +} > + > /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */ > static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb, > struct ip_vs_conn *cp, int local) > @@ -539,6 +552,7 @@ static inline int ip_vs_nat_send_or_cont(int pf, > struct sk_buff *skb, > else > ip_vs_update_conntrack(skb, cp, 1); > if (!local) { > + ip_vs_drop_early_demux_sk(skb); > skb_forward_csum(skb); > NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb, > NULL, skb_dst(skb)->dev, dst_output_sk); For the local=true case in ip_vs_nat_send_or_cont may be we should call skb_orphan when cp->dport != cp->vport or cp->daddr != cp->vaddr. This is a case where we DNAT to local real server but on different addr/port. 
If early demux finds socket, it is some socket shadowed after adding the virtual service. So, may be we have to add such checks near the NF_ACCEPT code. Can this work?

	else {
		/* Drop early-demux socket on DNAT */
		if (cp->vport != cp->dport ||
		    !ip_vs_addr_equal(cp->af, cp->vaddr, &cp->caddr))
			ip_vs_drop_early_demux_sk(skb);
		ret = NF_ACCEPT;
	}

Otherwise, the other changes look good to me.

Regards

--
Julian Anastasov
Re: [PATCH 6/6] net: mvneta: Statically assign queues to CPUs
Hi Thomas, On Fri, Jul 03, 2015 at 04:46:24PM +0200, Thomas Petazzoni wrote: > Maxime, > > On Fri, 3 Jul 2015 16:25:51 +0200, Maxime Ripard wrote: > > > +static void mvneta_percpu_enable(void *arg) > > +{ > > + struct mvneta_port *pp = arg; > > + > > + enable_percpu_irq(pp->dev->irq, IRQ_TYPE_NONE); > > +} > > + > > static int mvneta_open(struct net_device *dev) > > { > > struct mvneta_port *pp = netdev_priv(dev); > > @@ -2655,6 +2662,19 @@ static int mvneta_open(struct net_device *dev) > > goto err_cleanup_txqs; > > } > > > > + /* > > +* Even though the documentation says that request_percpu_irq > > +* doesn't enable the interrupts automatically, it actually > > +* does so on the local CPU. > > +* > > +* Make sure it's disabled. > > +*/ > > + disable_percpu_irq(pp->dev->irq); > > + > > + /* Enable per-CPU interrupt on the one CPU we care about */ > > + smp_call_function_single(rxq_def % num_online_cpus(), > > +mvneta_percpu_enable, pp, true); > > What happens if that CPU goes offline through CPU hotplug? I just tried : if I start mvneta with "rxq_def=1", then my irq runs on CPU1. Then I offline CPU1 and the irqs are automatically handled by CPU0. Then I online CPU1 and irqs stay on CPU0. 
More or less related, I found that if I enable a queue number larger than the CPU count it does work, but then the system complains during rmmod : [ 877.146203] [ cut here ] [ 877.146227] WARNING: CPU: 1 PID: 1731 at fs/proc/generic.c:552 remove_proc_entry+0x144/0x15c() [ 877.146233] remove_proc_entry: removing non-empty directory 'irq/29', leaking at least 'mvneta' [ 877.146238] Modules linked in: mvneta(-) [last unloaded: mvneta] [ 877.146254] CPU: 1 PID: 1731 Comm: rmmod Tainted: GW 4.1.1-mvebu-6-g3d317ed-dirty #5 [ 877.146260] Hardware name: Marvell Armada 370/XP (Device Tree) [ 877.146281] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [ 877.146293] [] (show_stack) from [] (dump_stack+0x74/0x90) [ 877.146305] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0) [ 877.146315] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x30/0x40) [ 877.146325] [] (warn_slowpath_fmt) from [] (remove_proc_entry+0x144/0x15c) [ 877.146336] [] (remove_proc_entry) from [] (unregister_irq_proc+0x8c/0xb0) [ 877.146347] [] (unregister_irq_proc) from [] (free_desc+0x28/0x58) [ 877.146356] [] (free_desc) from [] (irq_free_descs+0x44/0x80) [ 877.146368] [] (irq_free_descs) from [] (mvneta_remove+0x3c/0x4c [mvneta]) [ 877.146382] [] (mvneta_remove [mvneta]) from [] (platform_drv_remove+0x18/0x30) [ 877.146393] [] (platform_drv_remove) from [] (__device_release_driver+0x70/0xe4) [ 877.146402] [] (__device_release_driver) from [] (driver_detach+0xcc/0xd0) [ 877.146411] [] (driver_detach) from [] (bus_remove_driver+0x4c/0x90) [ 877.146425] [] (bus_remove_driver) from [] (SyS_delete_module+0x164/0x1b4) [ 877.146437] [] (SyS_delete_module) from [] (ret_fast_syscall+0x0/0x3c) [ 877.146443] ---[ end trace 48713a9ae31204b1 ]--- This was on the AX3 (dual-proc) with rxq_def=2. 
Hoping this helps,
Willy
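The CPU selection in the quoted patch is a plain modulo over the number of online CPUs, which is why a queue number at or above the CPU count still lands on a valid CPU. A tiny sketch of that arithmetic follows; the function name is hypothetical and it takes the CPU count as a parameter instead of calling num_online_cpus().

```c
#include <assert.h>

/* Map the default RX queue number onto one of the online CPUs. */
static unsigned int rxq_to_cpu(unsigned int rxq_def, unsigned int online_cpus)
{
	assert(online_cpus > 0); /* at least one CPU is always online */
	return rxq_def % online_cpus;
}
```

With rxq_def=2 on a dual-CPU board such as the AX3 mentioned above, the mapping wraps around to CPU 0.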
Re: [PATCH 4/6] net: mvneta: Handle per-cpu interrupts
Hi Maxime,

On Fri, Jul 03, 2015 at 04:25:49PM +0200, Maxime Ripard wrote:
> Now that our interrupt controller is allowing us to use per-CPU interrupts,
> actually use it in the mvneta driver.
>
> This involves obviously reworking the driver to have a CPU-local NAPI
> structure, and report for incoming packet using that structure.
>
> Signed-off-by: Maxime Ripard

This patch breaks module build of mvneta unless you export request_percpu_irq :

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index ec31697..1440a92 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1799,6 +1799,7 @@ int request_percpu_irq(unsigned int irq, irq_handler_t handler,
 	return retval;
 }
+EXPORT_SYMBOL_GPL(request_percpu_irq);

Regards,
Willy
Linux 4.2 build error in net/netfilter/ipset/ip_set_hash_netnet.c
Hi.

With the latest Linux 4.2-rc1, I am hitting this build error with GCC 4.4.7
on CentOS 6.

  CC      net/netfilter/ipset/ip_set_hash_netnet.o
net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet4_uadt’:
net/netfilter/ipset/ip_set_hash_netnet.c:163: error: unknown field ‘cidr’ specified in initializer
net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: missing braces around initializer
net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: (near initialization for ‘e.<anonymous>.ip’)
net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet6_uadt’:
net/netfilter/ipset/ip_set_hash_netnet.c:388: error: unknown field ‘cidr’ specified in initializer
net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: missing braces around initializer
net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: (near initialization for ‘e.ip[0]’)

Cheers,
Vinson
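For readers hitting the same error: as far as I know, GCC releases older than 4.6 do not accept a designated initializer that names a member of an anonymous union, which would match the "unknown field ‘cidr’" message above. The sketch below shows the pattern and one way around it. The struct is a hypothetical, simplified stand-in (the names netnet_elem and netnet_elem_default are made up for illustration), not the real ipset element, and this is not necessarily how the kernel source was or should be fixed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HOST_MASK 32	/* IPv4 host prefix length */

/*
 * Hypothetical, simplified stand-in for a hash:net,net element. Only the
 * anonymous unions matter here; the real kernel layout may differ.
 */
struct netnet_elem {
	union {
		uint32_t ip[2];
		uint64_t ipcmp;
	};
	uint8_t nomatch;
	union {
		uint8_t cidr[2];
		uint16_t ccmp;
	};
};

/*
 * The form an old compiler may reject names an anonymous-union member
 * directly in the initializer:
 *
 *	struct netnet_elem e = { .cidr = { HOST_MASK, HOST_MASK } };
 *
 * Zeroing the element and assigning the members afterwards avoids
 * designated initializers for anonymous members entirely.
 */
static struct netnet_elem netnet_elem_default(void)
{
	struct netnet_elem e;

	memset(&e, 0, sizeof(e));
	e.cidr[0] = HOST_MASK;
	e.cidr[1] = HOST_MASK;
	assert(e.ccmp != 0);	/* ccmp overlays cidr[], so it is now non-zero */
	return e;
}
```

Calling netnet_elem_default() yields an element with both prefix lengths set to HOST_MASK and everything else zeroed, which is what the designated-initializer form would have produced.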
Re: [PATCH net-next 3/6] net_sched: act: make tcfg_pval non zero
Thanks guys for the review.

For completeness, I'll add smp_wmb() here:

	gact->tcfg_pval = max_t(u16, 1, p_parm->pval);
	smp_wmb();
	gact->tcfg_ptype = p_parm->ptype;

And corresponding smp_rmb()

On Fri, Jul 3, 2015 at 12:49 PM, Jamal Hadi Salim wrote:
> On 07/02/15 09:07, Eric Dumazet wrote:
>>
>> First step for gact RCU operation :
>>
>> Instead of testing if tcfg_pval is zero or not, just make it 1.
>>
>> No change in behavior, but slightly faster code.
>>
>> Signed-off-by: Eric Dumazet
>
>
> Acked-by: Jamal Hadi Salim
>
> cheers,
> jamal
>
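The write/read barrier pairing described above can be modelled in userspace with C11 atomics. This is only a sketch under assumptions: struct gact_model and both functions are hypothetical names, C11 release/acquire fences stand in for smp_wmb()/smp_rmb() (they are not identical primitives), and treating ptype 0 as "no probabilistic mode" is a simplification of this model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of the two fields discussed above. */
struct gact_model {
	_Atomic uint16_t ptype;	/* 0 means "no probabilistic mode" in this model */
	_Atomic uint16_t pval;
};

/* Writer side: publish pval before ptype, clamping pval to at least 1. */
static void gact_model_init(struct gact_model *g, uint16_t ptype, uint16_t pval)
{
	uint16_t v = pval ? pval : 1;	/* same effect as max_t(u16, 1, pval) */

	atomic_store_explicit(&g->pval, v, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* stands in for smp_wmb() */
	atomic_store_explicit(&g->ptype, ptype, memory_order_relaxed);
}

/*
 * Reader side: returns 1 when packet number 'pack' should get the
 * alternate action (pack is a multiple of pval), 0 otherwise.
 */
static int gact_model_determ(struct gact_model *g, uint32_t pack)
{
	uint16_t ptype = atomic_load_explicit(&g->ptype, memory_order_relaxed);
	uint16_t pval;

	if (!ptype)
		return 0;
	atomic_thread_fence(memory_order_acquire);	/* stands in for smp_rmb() */
	pval = atomic_load_explicit(&g->pval, memory_order_relaxed);
	assert(pval != 0);	/* guaranteed by the clamp in gact_model_init() */
	return (pack % pval) == 0;
}
```

A reader that observes a non-zero ptype is then guaranteed to observe a pval written before it, and since pval is never stored as zero the modulo cannot divide by zero even when the action is reconfigured concurrently.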
Re: [PATCH V2] cdc_ncm: Add support for moving NDP to end of NCM frame
When sending lots of small packets, this patch will generate an "Unable to
handle kernel paging request" in the memset call:

	ndp16 = memset(ctx->delayed_ndp16, 0, ctx->max_ndp_size);

And I don't know why. Any comment or suggestion would be greatly
appreciated.

This has been reproduced in a QEMU x86 VM, from kernel 4.0.4 to current git.

Thanks,
Enrico Mioso
Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk
> Looks like routing by definition can not divert skbs with
> early-demux socket because input routing is not called.

Only if the found socket has a valid sk->sk_rx_dst.

Early demux:

1) If the TCP lookup found a matching socket, we do the attachment:

	skb->sk = sk;
	skb->destructor = sock_edemux;

2) If sk->sk_rx_dst is set and still valid, IP routing will use this
   cached dst.

So it looks very possible that some packets could match a socket but fail
the 2) phase.
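The two phases above can be illustrated with a small userspace model. Everything here is hypothetical (sock_model, skb_model and early_demux_model are made-up names, and the validity check is reduced to two booleans); it only shows that a packet can leave phase 1 with a socket attached and still need input routing because phase 2 failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the socket either has a cached input dst or not. */
struct sock_model {
	bool rx_dst_set;	/* models sk->sk_rx_dst != NULL */
	bool rx_dst_valid;	/* models "cached dst is still valid" */
};

struct skb_model {
	const struct sock_model *sk;	/* models skb->sk */
	bool uses_cached_dst;		/* true when input routing is skipped */
};

/*
 * Returns true when the cached dst is used (input routing skipped).
 * 'found' is the result of the socket lookup, or NULL for no match.
 */
static bool early_demux_model(struct skb_model *skb,
			      const struct sock_model *found)
{
	assert(skb != NULL);
	skb->sk = NULL;
	skb->uses_cached_dst = false;

	if (!found)
		return false;

	/* Phase 1: lookup matched, attach the socket to the packet. */
	skb->sk = found;

	/* Phase 2: only a set and still valid cached dst skips routing. */
	if (found->rx_dst_set && found->rx_dst_valid)
		skb->uses_cached_dst = true;

	return skb->uses_cached_dst;
}
```

With a socket whose cached dst is stale, the model returns false while skb->sk stays set, which is the case the message is pointing at.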
Re: Summary lightweight tunnel discussion at NFWS
On 7/3/15, 3:00 AM, Thomas Graf wrote:
> On 06/18/15 at 09:49pm, Roopa Prabhu wrote:
>> +#ifdef CONFIG_LWTUNNEL
>> +	if (fi->fib_nh->nh_lwtstate) {
>> +		struct lwtunnel_state *lwtstate;
>> +
>> +		lwtstate = fi->fib_nh->nh_lwtstate;
>> +		if (nla_put_u16(skb, RTA_ENCAP_TYPE, lwtstate->type))
>> +			goto nla_put_failure;
>> +		lwtunnel_fill_encap(skb, lwtstate);
>> +	}
>>  	}
>> +#endif
>
> Misplaced #endif ;-)

Thx. I have fixed this since; I did not realize it came in as part of this
RFC series.

> Other than that I managed to rebase my changes onto yours and it
> looks clean.

Glad to know! Thanks, Thomas.

I had a few more changes (mostly cleanup/bug fixes, ipv6 support and
earlier feedback from you) in my local clone, and pushed them to my github
tree just now. This also tries to not use CONFIG_LWTUNNEL all over the
place. I had it that way initially also because of fib struct members
under #ifdef CONFIG_LWTUNNEL. (If we think at a later point that it is
better to #ifdef CONFIG_LWTUNNEL fib struct members, I can bring some of
that back in.)

Also, only the control path (rtnetlink) for ipv6 mpls iptunnels has been
tested.

> Since we also discussed this a bit at NFWS, I'm enclosing a quick
> summary:
>
> * Overall consensus that a lightweight flow based encapsulation
>   makes sense.
>
> * Realization that what we actually want is stackable skb metadata
>   between layers without over engineering it.
>
> * Consensus to avoid adding it to skb_shared_info and try to reuse
>   the skb dst field.
>
> * New dst_metadata type similar to xfrm_dst which can carry metadata
>   such as encapsulation instructions/information.
>
> * Can be made stackable to implement nested encapsulation if needed.
>   Left out in the beginning to keep it simple.
>
> * Possible optimization option by putting the dst_metadata into a
>   per cpu scratch buffer or stack without taking a reference and
>   only force the reference & allocation when the skb is about to
>   be queued. The regular fast path should never queue a skb with
>   dst metadata attached.

Thanks for the summary. This helps.

I have been thinking of moving lwtstate from rtable to struct dst_entry.
I will also look at the dst_metadata.
RE: [PATCH] xen-netback: remove duplicated function definition
> Cc: linux-ker...@vger.kernel.org; ian.campb...@citrix.com;
> wei.l...@citrix.com; xen-de...@lists.xenproject.org;
> netdev@vger.kernel.org
> Subject: Re: [PATCH] xen-netback: remove duplicated function definition
>
> From: Liang Li
> Date: Sat, 4 Jul 2015 03:33:00 +0800
>
> > There are two duplicated xenvif_zerocopy_callback() definitions.
> > Remove one of them.
> >
> > Signed-off-by: Liang Li
>
> You really need to fix the date on your computer.
>
> If your date is in the future, as yours is, then your patch appears
> out-of-order in the patchwork patch queue since it is ordered by the
> Date: field in the email.

OK. Thanks for the reminder.
[PATCH] NET: hamradio: Fix IP over bpq encapsulation.
Since 1d5da757da860a6916adbf68b09e868062b4b3b8 (ax25: Stop using magic
neighbour cache operations.) any attempt to transmit IP packets over a
bpqether device will result in a message like "Dead loop on virtual device
bpq0, fix it urgently!"

Fix suggested by Eric W. Biederman.

Signed-off-by: Ralf Baechle
Cc: # 4.1
---
 drivers/net/hamradio/bpqether.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 63ff08a..5b54b18 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -483,6 +483,7 @@ static void bpq_setup(struct net_device *dev)
 	memcpy(dev->dev_addr, &ax25_defaddr, AX25_ADDR_LEN);
 	dev->flags = 0;
+	dev->features = NETIF_F_LLTX;	/* Allow recursion */
 #if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
 	dev->header_ops = &ax25_header_ops;
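For context, here is a rough userspace model of why a lockless-TX flag would silence the "Dead loop" message. It rests on my reading of the transmit path, which is an assumption: the core records which CPU holds a queue's xmit lock and treats re-entry on the same CPU as a dead loop, while a device flagged NETIF_F_LLTX never takes that lock, so no owner is recorded. All names below (dev_model, txq_model, dev_model_xmit) are hypothetical, and any recursion depth limit the real stack applies is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

struct txq_model {
	int xmit_lock_owner;	/* CPU holding the xmit lock, or NO_OWNER */
};

struct dev_model {
	bool lltx;		/* models NETIF_F_LLTX being set */
	struct txq_model txq;
	int dead_loops;		/* times the "Dead loop" branch was hit */
	int frames_sent;	/* frames that reached the wire */
};

/*
 * Models one transmit on 'cpu'. A depth of 1 means the driver's xmit
 * routine re-queues one encapsulated frame on the same device.
 */
static void dev_model_xmit(struct dev_model *dev, int cpu, int depth)
{
	assert(depth >= 0);

	if (dev->txq.xmit_lock_owner == cpu) {
		dev->dead_loops++;	/* recursion detected, frame dropped */
		return;
	}

	if (!dev->lltx)
		dev->txq.xmit_lock_owner = cpu;	/* take the xmit lock */

	if (depth > 0)
		dev_model_xmit(dev, cpu, depth - 1);	/* nested transmit */
	else
		dev->frames_sent++;

	if (!dev->lltx)
		dev->txq.xmit_lock_owner = NO_OWNER;	/* release the lock */
}
```

In this model a device without the flag drops the nested frame and counts a dead loop, while the same call sequence with lltx set sends the frame.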
RE: [PATCH] bnx2x: Update to FW version 7.12.30
> The new FW will allow us to utilize some new features in our driver,
> mainly adding vlan filtering offload and vxlan offload support.
>
> In addition, this fixes several issues:
> 1. Packets from a VF with pvid configured which were sent with a
>    different vlan were transmitted instead of being discarded.
>
> 2. FCoE traffic might not recover after a failure while there's traffic
>    to another function.
>
> Signed-off-by: Yuval Mintz

Hi, any news about this one?

Thanks,
Yuval