pull-request: can 2015-08-25
Hello David,

this is the updated pull request of one patch by me for the peak_usb driver.
It fixes the driver, so that non FD adapters don't provide CAN FD bittimings.

regards,
Marc

---

The following changes since commit b6df7d61c8776a882dd47ba4714d1445dd7ef2d9:

  net: bcmgenet: fix uncleaned dma flags (2015-08-23 23:00:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git tags/linux-can-fixes-for-4.2-20150825

for you to fetch changes up to 06b23f7fbbf26a025fd68395c7586949db586b47:

  can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters (2015-08-25 08:50:00 +0200)

linux-can-fixes-for-4.2-20150825

Marc Kleine-Budde (1):
      can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters

 drivers/net/can/usb/peak_usb/pcan_usb.c      | 24 +++
 drivers/net/can/usb/peak_usb/pcan_usb_core.c |  4 +-
 drivers/net/can/usb/peak_usb/pcan_usb_core.h |  4 +-
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c   | 96 +++-
 drivers/net/can/usb/peak_usb/pcan_usb_pro.c  | 24 +++
 5 files changed, 82 insertions(+), 70 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH] can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters
The CAN FD data bittiming constants are provided via netlink only when there
are valid CAN FD constants available in priv->data_bittiming_const.

Due to the indirection of pointer assignments in the peak_usb driver the
priv->data_bittiming_const never becomes NULL - not even for non-FD adapters.

The data_bittiming_const points to zero'ed data which leads to this result
when running 'ip -details link show can0':

35: can0: mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state STOPPED restart-ms 0
          pcan_usb: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
          : dtseg1 0..0 dtseg2 0..0 dsjw 1..0 dbrp 0..0 dbrp-inc 0     <== BROKEN!
          clock 800

This patch changes the struct peak_usb_adapter::bittiming_const and struct
peak_usb_adapter::data_bittiming_const to pointers to fix the assignment
problems.

Cc: linux-stable # >= 4.0
Reported-by: Oliver Hartkopp
Tested-by: Oliver Hartkopp
Signed-off-by: Marc Kleine-Budde
---
 drivers/net/can/usb/peak_usb/pcan_usb.c      | 24 +++
 drivers/net/can/usb/peak_usb/pcan_usb_core.c |  4 +-
 drivers/net/can/usb/peak_usb/pcan_usb_core.h |  4 +-
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c   | 96 +++-
 drivers/net/can/usb/peak_usb/pcan_usb_pro.c  | 24 +++
 5 files changed, 82 insertions(+), 70 deletions(-)

diff --git a/drivers/net/can/usb/peak_usb/pcan_usb.c b/drivers/net/can/usb/peak_usb/pcan_usb.c
index 6b94007ae052..838545ce468d 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb.c
@@ -854,6 +854,18 @@ static int pcan_usb_probe(struct usb_interface *intf)
 /*
  * describe the PCAN-USB adapter
  */
+static const struct can_bittiming_const pcan_usb_const = {
+	.name = "pcan_usb",
+	.tseg1_min = 1,
+	.tseg1_max = 16,
+	.tseg2_min = 1,
+	.tseg2_max = 8,
+	.sjw_max = 4,
+	.brp_min = 1,
+	.brp_max = 64,
+	.brp_inc = 1,
+};
+
 const struct peak_usb_adapter pcan_usb = {
 	.name = "PCAN-USB",
 	.device_id = PCAN_USB_PRODUCT_ID,
@@ -862,17 +874,7 @@ const struct peak_usb_adapter pcan_usb = {
 	.clock = {
 		.freq = PCAN_USB_CRYSTAL_HZ / 2 ,
 	},
-	.bittiming_const = {
-		.name = "pcan_usb",
-		.tseg1_min = 1,
-		.tseg1_max = 16,
-		.tseg2_min = 1,
-		.tseg2_max = 8,
-		.sjw_max = 4,
-		.brp_min = 1,
-		.brp_max = 64,
-		.brp_inc = 1,
-	},
+	.bittiming_const = &pcan_usb_const,
 
 	/* size of device private data */
 	.sizeof_dev_private = sizeof(struct pcan_usb),
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
index 7921cff93a63..5a2e341a6d1e 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
@@ -792,9 +792,9 @@ static int peak_usb_create_dev(const struct peak_usb_adapter *peak_usb_adapter,
 	dev->ep_msg_out = peak_usb_adapter->ep_msg_out[ctrl_idx];
 
 	dev->can.clock = peak_usb_adapter->clock;
-	dev->can.bittiming_const = &peak_usb_adapter->bittiming_const;
+	dev->can.bittiming_const = peak_usb_adapter->bittiming_const;
 	dev->can.do_set_bittiming = peak_usb_set_bittiming;
-	dev->can.data_bittiming_const = &peak_usb_adapter->data_bittiming_const;
+	dev->can.data_bittiming_const = peak_usb_adapter->data_bittiming_const;
 	dev->can.do_set_data_bittiming = peak_usb_set_data_bittiming;
 	dev->can.do_set_mode = peak_usb_set_mode;
 	dev->can.do_get_berr_counter = peak_usb_adapter->do_get_berr_counter;
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.h b/drivers/net/can/usb/peak_usb/pcan_usb_core.h
index 9e624f05ad4d..506fe506c9d3 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_core.h
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.h
@@ -48,8 +48,8 @@ struct peak_usb_adapter {
 	u32 device_id;
 	u32 ctrlmode_supported;
 	struct can_clock clock;
-	const struct can_bittiming_const bittiming_const;
-	const struct can_bittiming_const data_bittiming_const;
+	const struct can_bittiming_const * const bittiming_const;
+	const struct can_bittiming_const * const data_bittiming_const;
 	unsigned int ctrl_count;
 
 	int (*intf_probe)(struct usb_interface *intf);
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
index 09d14e70abd7..ce44a033f63b 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
@@ -990,6 +990,30 @@ static void pcan_usb_fd_free(struct peak_usb_device *dev)
 }
 
 /* describes the PCAN-USB FD adapter */
+static const struct can_bittiming_const pcan_usb_fd_const = {
+	.name = "pcan_usb_fd",
+	.tseg1_min = 1,
+	.tseg1_max = 64,
+	.t
Re: pull-request: can 2015-08-24
On 08/24/2015 11:20 AM, Marc Kleine-Budde wrote:
> Hello David,
>
> this is a pull request of one patch by me for the peak_usb driver. It fixes the
> driver, so that non FD adapters don't provide CAN FD bittimings.

As there are some typos in the commit message I'll send an updated pull
request. David, please don't pull this one.

thanks,
Marc

--
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-     |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
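The peak_usb fix comes down to a C detail: the address of a member embedded in a valid struct is never NULL, so a NULL check on &adapter->data_bittiming_const always passes, even when the member is all zeros. The sketch below uses hypothetical, trimmed-down structs (bt_const, adapter_embedded and adapter_ptr are illustrative names, not the driver's types) to contrast the old by-value layout with the pointer layout the patch introduces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct can_bittiming_const. */
struct bt_const {
	const char *name;
	unsigned int tseg1_min;
	unsigned int tseg1_max;
};

/* Old layout: the constants are embedded by value. */
struct adapter_embedded {
	const char *name;
	struct bt_const data_bittiming_const;
};

/* New layout: the adapter points at shared constants, or NULL. */
struct adapter_ptr {
	const char *name;
	const struct bt_const *data_bittiming_const;
};

/* Mirrors the "only report FD constants if present" rule. */
static int has_fd_timing(const struct bt_const *c)
{
	return c != NULL;
}

static int embedded_has_fd(const struct adapter_embedded *a)
{
	/* &a->member is never NULL for a valid 'a', even if the member is
	 * all zeros, so this always reports FD timing. */
	return has_fd_timing(&a->data_bittiming_const);
}

static int ptr_has_fd(const struct adapter_ptr *a)
{
	return has_fd_timing(a->data_bittiming_const);
}

static const struct bt_const fd_const = { "pcan_usb_fd", 1, 64 };
static const struct adapter_embedded classic_embedded = { "PCAN-USB", { NULL, 0, 0 } };
static const struct adapter_ptr classic_ptr = { "PCAN-USB", NULL };
static const struct adapter_ptr fd_ptr = { "PCAN-USB FD", &fd_const };
```

With the pointer layout a non-FD adapter simply leaves the field NULL, which is what lets the netlink side skip the data bittiming block.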
Re: [net-next PATCH 1/3] drivers: net: cpsw: add am335x errata workarround for interrutps
On Monday 24 August 2015 03:34 PM, Sekhar Nori wrote:
> Hi Mugunthan,
>
> On Wednesday 12 August 2015 03:22 PM, Mugunthan V N wrote:
>> > +static const struct of_device_id cpsw_of_mtable[] = {
>> > +	{ .compatible = "ti,cpsw", .data = &cpsw_devtype[CPSW], },
>> > +	{ .compatible = "ti,am335x-cpsw", .data = &cpsw_devtype[AM335X_CPSW], },
>> > +	{ .compatible = "ti,am4372-cpsw", .data = &cpsw_devtype[AM4372_CPSW], },
>> > +	{ .compatible = "ti,dra7-cpsw", .data = &cpsw_devtype[DRA7_CPSW], },
> I do not see documentation added for these compatibles. Since the series
> is already applied, can you send additional patches adding documentation?

Will submit a patch ASAP

Regards
Mugunthan V N
Re: iproute2: Behavioural Bug?
On Mon, Aug 24, 2015 at 10:14 PM, Akshat Kakkar wrote:
> Dear Florian,
>
> There are two filters 15:2:2 and 15:2:3 and I have deleted only
> 15:2:3, so 15:2:2 will still be there and hence this condition
> "destroy proto tp when all filters are gone" should not be applicable
> over here.
>

Florian is correct, it _does_ look like this is caused by my patch,
I guess some check in u32_destroy() isn't correct.

It's late here, I will look into this tomorrow.

Thanks for the report anyway!
use after free again...
Hi, Jiri,

In your commit 61adedf3e3f1d3f032c5a6a299978d91eff6d555
("route: move lwtunnel state to dst_entry"), how the hell could the
following piece be correct? :-/

@@ -264,6 +266,7 @@ again:
 		kfree(dst);
 	else
 		kmem_cache_free(dst->ops->kmem_cachep, dst);
+	lwtstate_put(dst->lwtstate);

There is clearly a kfree(dst) before dereferencing dst...

And I got a nice crash:

[ 33.160081] general protection fault: [#1] SMP DEBUG_PAGEALLOC
[ 33.164285] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166
[ 33.164285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 33.164285] task: 88010656d280 ti: 88010657 task.ti: 88010657
[ 33.164285] RIP: 0010:[] [] dst_destroy+0xa6/0xef
[ 33.164285] RSP: 0018:880107603e38 EFLAGS: 00010202
[ 33.164285] RAX: 0001 RBX: 8800d225a000 RCX: 82250fd0
[ 33.164285] RDX: 0001 RSI: 82250fd0 RDI: 6b6b6b6b6b6b6b6b
[ 33.164285] RBP: 880107603e58 R08: 0001 R09: 0001
[ 33.164285] R10: b530 R11: 880107609000 R12:
[ 33.164285] R13: 82343c40 R14: R15: 8182fb4f
[ 33.164285] FS: () GS:88010760() knlGS:
[ 33.164285] CS: 0010 DS: ES: CR0: 8005003b
[ 33.164285] CR2: 7fcabd9d3000 CR3: d7279000 CR4: 06e0
[ 33.164285] Stack:
[ 33.164285] 82250fd0 8801077d6f00 82253c40 8800d225a000
[ 33.164285] 880107603e68 8182fb5d 880107603f08 810d795e
[ 33.164285] 810d7648 880106574000 88010656d280 88010656d280
[ 33.164285] Call Trace:
[ 33.164285]
[ 33.164285] [] dst_destroy_rcu+0xe/0x1d
[ 33.164285] [] rcu_process_callbacks+0x618/0x7eb
[ 33.164285] [] ? rcu_process_callbacks+0x302/0x7eb
[ 33.164285] [] ? dst_gc_task+0x1eb/0x1eb
[ 33.164285] [] __do_softirq+0x178/0x39f
[ 33.164285] [] irq_exit+0x41/0x95
[ 33.164285] [] smp_apic_timer_interrupt+0x34/0x40
[ 33.164285] [] apic_timer_interrupt+0x6d/0x80
[ 33.164285]
[ 33.164285] [] ? default_idle+0x21/0x32
[ 33.164285] [] ? default_idle+0x1f/0x32
[ 33.164285] [] arch_cpu_idle+0xf/0x11
[ 33.164285] [] default_idle_call+0x1f/0x21
[ 33.164285] [] cpu_startup_entry+0x1ad/0x273
[ 33.164285] [] start_secondary+0x135/0x156

I cooked a _quick_ patch to fix it. I can send it formally if it looks
good to you, if not, feel free to send a better fix before me.

diff --git a/net/core/dst.c b/net/core/dst.c
index 50dcdbb..477035e 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -262,11 +262,12 @@ again:
 	if (dst->dev)
 		dev_put(dst->dev);
 
+	lwtstate_put(dst->lwtstate);
+
 	if (dst->flags & DST_METADATA)
 		kfree(dst);
 	else
 		kmem_cache_free(dst->ops->kmem_cachep, dst);
-	lwtstate_put(dst->lwtstate);
 
 	dst = child;
 	if (dst) {
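The ordering problem can be shown outside the kernel. In the sketch below (hypothetical types loosely modelled on dst_entry and its lwtstate, not the real definitions), everything reachable through the entry is released first and the entry itself is freed last, which is the order the quick patch restores. The RDI value 6b6b6b6b6b6b6b6b in the oops appears to match the slab free-poison byte (0x6b), which is consistent with reading dst->lwtstate from an already freed object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted state, standing in for the lwtunnel state. */
struct model_state {
	int refcnt;
};

/* Hypothetical owner, standing in for struct dst_entry. */
struct model_entry {
	struct model_state *lwtstate;
};

static void model_state_put(struct model_state *s)
{
	if (s)
		s->refcnt--;
}

/* Release what the entry references while it is still valid, then free
 * the entry. Swapping the two calls would read e->lwtstate from freed
 * memory, which is the bug in the original hunk. */
static void model_entry_destroy(struct model_entry *e)
{
	model_state_put(e->lwtstate);
	free(e);
}
```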
Compile error: 'nf_skb_duplicated' undeclared (first use in this function)
Hi,

I just got:

net/ipv4/netfilter/nf_dup_ipv4.c: In function ‘nf_dup_ipv4’:
net/ipv4/netfilter/nf_dup_ipv4.c:72:16: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
  if (this_cpu_read(nf_skb_duplicated))
                    ^
net/ipv4/netfilter/nf_dup_ipv4.c:72:16: note: each undeclared identifier is reported only once for each function it appears in

And the following patch could fix it, but I haven't looked into it yet,
maybe some Kconfig symbol dependency issue too.

diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
index b5bb375..2d79e6e 100644
--- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
diff --git a/net/ipv6/netfilter/nf_dup_ipv6.c b/net/ipv6/netfilter/nf_dup_ipv6.c
index d8ab654..89c2624 100644
--- a/net/ipv6/netfilter/nf_dup_ipv6.c
+++ b/net/ipv6/netfilter/nf_dup_ipv6.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
Re: iproute2: Behavioural Bug?
Dear Florian,

There are two filters 15:2:2 and 15:2:3 and I have deleted only
15:2:3, so 15:2:2 will still be there and hence this condition
"destroy proto tp when all filters are gone" should not be applicable
over here.

On Tue, Aug 25, 2015 at 4:52 AM, Florian Westphal wrote:
> Akshat Kakkar wrote:
>
> [ CC Cong ]
>
>> When I am trying to delete a single tc filter (i.e. specifying its
>> handle), it is deleting all the filters with the same
>> priority/preference. i.e. it is ignoring the handle specified.
>>
>> But, When I am doing similar activity in hashtable 800: it is deleting only
>> the specified filter, i.e. it is behaving as expected.
>>
>> I am unable to comprehend the reason for this difference in behaviour.
>>
>> Infact, in kernel 2.6.32 all is working as expected. However, in
>> kernel 3.1 and 4.1 it is having the behaviour as mentioned above.
>>
>> For example, following set of commands create a hashtable 15: and add
>> 2 filters to it.
>>
>> tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
>> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10
>> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10
>>
>> Now following command DELETES ALL THE FILTERS, though it should only
>> delete FILTER 15:2:3 !
>> tc filter del dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
>>
>> O/p of tc filter show eth0 is this case is blank. As all filters are deleted.
>
> Happens since
>
> 1e052be69d045c8d0f82ff1116fd3e5a79661745
> ("net_sched: destroy proto tp when all filters are gone").
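The expectation in this report can be stated as a simple rule: a u32 classifier instance should only be torn down once no filter remains in any of its hash tables. The model below is a hypothetical illustration of that rule (model_tp and model_delete are made-up names, and the real u32 code tracks far more state); it is not the kernel's check, only what the reporter expects it to decide for the 15:2:2 / 15:2:3 example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HT 4

/* Hypothetical model of one u32 classifier instance: just a filter count
 * per hash table. */
struct model_tp {
	unsigned int filters[MODEL_MAX_HT];
};

/* True if no hash table holds any filter. */
static bool model_tp_is_empty(const struct model_tp *tp)
{
	for (size_t i = 0; i < MODEL_MAX_HT; i++)
		if (tp->filters[i] != 0)
			return false;
	return true;
}

/* Remove one filter from table 'ht' and report whether the whole
 * instance may now be destroyed. */
static bool model_delete(struct model_tp *tp, size_t ht)
{
	if (ht < MODEL_MAX_HT && tp->filters[ht] > 0)
		tp->filters[ht]--;
	return model_tp_is_empty(tp);
}
```

Deleting one of two filters from the same table leaves one behind, so the rule says the instance must survive; only the second delete empties it.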
Re: [PATCH, net-next] r8169: On RTL 8101 series bit SYSErr is reserved.
Maybe the entire program must be rewritten due to multiple errors.

2015-08-24 21:38 GMT+03:00, David Miller :
> From: Corcodel Marian
> Date: Mon, 24 Aug 2015 21:12:53 +0300
>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c
>> b/drivers/net/ethernet/realtek/r8169.c
>> index 5693e65..32d2072 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -8256,6 +8256,14 @@ static int rtl_init_one(struct pci_dev *pdev, const
>> struct pci_device_id *ent)
>>  	RTL_W8(Config1, RTL_R8(Config1) | PMEnable);
>>  	RTL_W8(Config5, RTL_R8(Config5) & (BWF | MWF | UWF | LanWake |
>> PMEStatus));*/
>>  	switch (tp->mac_version) {
>> +case RTL_GIGA_MAC_VER_07:
>> +case RTL_GIGA_MAC_VER_08:
>> +case RTL_GIGA_MAC_VER_09:
>> +case RTL_GIGA_MAC_VER_10:
>> +case RTL_GIGA_MAC_VER_13:
>> +case RTL_GIGA_MAC_VER_16:
>> +pci_write_config_word(pdev, PCI_COMMAND, ~PCI_COMMAND_SERR);
>
> You're writing all sorts of bits you definitely don't want to set here.
>
> Furthermore, there is no need to clear a bit that shouldn't be set
> in the first place.
>
> Your patches are really full of major errors, and unsuitable for
> upstream.
>
> Yes, all of them.
>
> So please stop posting your r8169 changes here, because if you don't
> care if your patches get included or not, then you should not be
> posting them here. This isn't a place to just dump ramdom patches,
> sorry.
>
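The first review comment is about how the value is computed: ~PCI_COMMAND_SERR is the command register with every bit except SERR set, so writing it turns on bits such as I/O, memory and bus mastering regardless of their current state. The sketch below contrasts that with a read-modify-write that clears only SERR. PCI_COMMAND_SERR is 0x100 as in include/uapi/linux/pci_regs.h; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as PCI_COMMAND_SERR in include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND_SERR 0x100

/* What the quoted patch writes: every bit except SERR is set, and the
 * current register contents are ignored. */
static uint16_t value_written_by_patch(uint16_t current_cmd)
{
	(void)current_cmd;
	return (uint16_t)~PCI_COMMAND_SERR;
}

/* Read-modify-write: clear SERR, keep all other bits as they were. */
static uint16_t clear_serr(uint16_t current_cmd)
{
	return current_cmd & (uint16_t)~PCI_COMMAND_SERR;
}
```

In a driver the read-modify-write would go through pci_read_config_word() and pci_write_config_word(); and, as noted in the review, if the bit is never set there is nothing to clear in the first place.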
Re: [RFC] netlink: netlink_ack send a capped message in case of error
From: Pablo Neira Ayuso
Date: Mon, 24 Aug 2015 20:56:37 +0200

> On Mon, Aug 24, 2015 at 10:08:22AM +0200, Christophe Ricard wrote:
>> Hi Scott,
>>
>> I think i understand the potential limitation of my solution.
>> I saw something was proposed by Jiri Benc who pushed an additional flag to
>> tell if the payload can be ignored in case of an error.
>> http://patchwork.ozlabs.org/patch/290976/
>>
>> Do you think this one is acceptable ? I am not sure to understand David
>> last comment.
>
> I think David suggests something like the (completely untested)
> attached patch.

Yes, echo'ing the entire message back in an ACK is really pointless.

Especially since if the user really is interested in noticing ACKs it
can very easily keep the original request around and match on sequence
number, as Pablo's patch's commit message suggests.

We're stuck with the current behavior by default, but we can add the
new ACK feature to deal with the issue in the long term.
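The alternative David points to, keeping the original request and matching the ACK by sequence number, could look roughly like this on the userspace side. This is a minimal sketch, assuming each request is sent with a unique nlmsg_seq and that the ACK or error carries that sequence number back; pending_req, remember_request and match_ack are hypothetical names, not part of any netlink library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a request the client has sent. */
struct pending_req {
	uint32_t seq;       /* nlmsg_seq used for the request */
	const void *msg;    /* the original request, kept by the sender */
	size_t len;
	int in_use;
};

#define MAX_PENDING 8

static struct pending_req pending[MAX_PENDING];

/* Remember a request before sending it; -1 if the table is full. */
static int remember_request(uint32_t seq, const void *msg, size_t len)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!pending[i].in_use) {
			pending[i] = (struct pending_req){ seq, msg, len, 1 };
			return 0;
		}
	}
	return -1;
}

/* On an ACK/error with sequence 'seq', return the matching request and
 * release its slot; NULL if no such request is pending. */
static const struct pending_req *match_ack(uint32_t seq)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (pending[i].in_use && pending[i].seq == seq) {
			pending[i].in_use = 0;
			return &pending[i];
		}
	}
	return NULL;
}
```

Because the sender already holds the full request, a capped ACK that only identifies it loses nothing the sender needs.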
Re: [PATCH v3 net-next 5/8] geneve: Add support to collect tunnel metadata.
On Mon, Aug 24, 2015 at 6:42 PM, Jesse Gross wrote:
> On Mon, Aug 24, 2015 at 10:43 AM, Pravin B Shelar wrote:
>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>> index 0a6d974..c05bc13 100644
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> @@ -141,10 +190,15 @@ drop:
>>  /* Setup stats when device is created */
>>  static int geneve_init(struct net_device *dev)
>>  {
>> +	struct geneve_dev *geneve = netdev_priv(dev);
>> +
>>  	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
>>  	if (!dev->tstats)
>>  		return -ENOMEM;
>>
>> +	if (geneve->collect_md)
>> +		dev->features |= NETIF_F_NETNS_LOCAL;
>
> I was going back and forth on whether this is the right thing to do.
> Is it any weirder to allow this than to move a normal tunnel device
> across namespaces?

Moving this device means moving all tunnels backed by this device rather
than a specific tunnel device. That's why it does not look right to move
such a device.
Re: [v2 03/11] soc/fsl: Introduce the DPAA BMan portal driver
On Wed, Aug 12, 2015 at 04:14:49PM -0400, Roy Pledge wrote:
> diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
> index 9a500ce..d6e2204 100644
> --- a/drivers/soc/fsl/qbman/bman.c
> +++ b/drivers/soc/fsl/qbman/bman.c
> @@ -165,11 +165,11 @@ static struct bman *bm_create(void *regs)
>
>  static inline u32 __bm_in(struct bman *bm, u32 offset)
>  {
> -	return in_be32((void *)bm + offset);
> +	return ioread32be((void *)bm + offset);
>  }
>  static inline void __bm_out(struct bman *bm, u32 offset, u32 val)
>  {
> -	out_be32((void *)bm + offset, val);
> +	iowrite32be(val, (void*) bm + offset);
>  }

Don't introduce a problem in one patch and then fix it in another.

What does this change have to do with introducing the portal driver?

>  #define bm_in(reg)		__bm_in(bm, REG_##reg)
>  #define bm_out(reg, val)	__bm_out(bm, REG_##reg, val)
> @@ -341,6 +341,7 @@ u32 bm_pool_free_buffers(u32 bpid)
>  {
>  	return bm_in(POOL_CONTENT(bpid));
>  }
> +EXPORT_SYMBOL(bm_pool_free_buffers);

If you're exporting this (or even making it global), where's the
documentation?

> +/* BTW, the drivers (and h/w programming model) already obtain the required
> + * synchronisation for portal accesses via lwsync(), hwsync(), and
> + * data-dependencies. Use of barrier()s or other order-preserving primitives
> + * simply degrade performance. Hence the use of the __raw_*() interfaces, which
> + * simply ensure that the compiler treats the portal registers as volatile (ie.
> + * non-coherent). */

volatile does not mean "non-coherent".

Be careful with this regarding endian, e.g. on ARM we can run the CPU in
big or little endian on the same chip, and the raw accessors also
unfortunately bypass endian conversion.

> +
> +/* Cache-inhibited register access. */
> +#define __bm_in(bm, o)		__raw_readl((bm)->addr_ci + (o))
> +#define __bm_out(bm, o, val)	__raw_writel((val), (bm)->addr_ci + (o))
> +#define bm_in(reg)		__bm_in(&portal->addr, BM_REG_##reg)
> +#define bm_out(reg, val)	__bm_out(&portal->addr, BM_REG_##reg, val)

Don't have multiple implementations of bm_in/out, with the same name,
where bm in both refers to "bman", but which have different functions.

> +/* Cache-enabled (index) register access */
> +#define __bm_cl_touch_ro(bm, o) dcbt_ro((bm)->addr_ce + (o))
> +#define __bm_cl_touch_rw(bm, o) dcbt_rw((bm)->addr_ce + (o))
> +#define __bm_cl_in(bm, o)	__raw_readl((bm)->addr_ce + (o))
> +#define __bm_cl_out(bm, o, val) \
> +	do { \
> +		u32 *__tmpclout = (bm)->addr_ce + (o); \
> +		__raw_writel((val), __tmpclout); \
> +		dcbf(__tmpclout); \
> +	} while (0)
> +#define __bm_cl_invalidate(bm, o) dcbi((bm)->addr_ce + (o))
> +#define bm_cl_touch_ro(reg) __bm_cl_touch_ro(&portal->addr, BM_CL_##reg##_CENA)
> +#define bm_cl_touch_rw(reg) __bm_cl_touch_rw(&portal->addr, BM_CL_##reg##_CENA)
> +#define bm_cl_in(reg)	__bm_cl_in(&portal->addr, BM_CL_##reg##_CENA)
> +#define bm_cl_out(reg, val) __bm_cl_out(&portal->addr, BM_CL_##reg##_CENA, val)
> +#define bm_cl_invalidate(reg)\
> +	__bm_cl_invalidate(&portal->addr, BM_CL_##reg##_CENA)

Define these using functions to operate on pointers, and pass the pointer
in without all the token-pasting. Some extra explanation of the cache
manipulation would also be helpful.

> +/* --- RCR API --- */
> +
> +/* Bit-wise logic to wrap a ring pointer by clearing the "carry bit" */
> +#define RCR_CARRYCLEAR(p) \
> +	(void *)((unsigned long)(p) & (~(unsigned long)(BM_RCR_SIZE << 6)))

This could be a function.

Where does 6 come from? You use it again in the next function. Please
define it symbolically.
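The reviewer's points about RCR_CARRYCLEAR (make it a function, give the 6 a name) can be illustrated with plain integers standing in for ring addresses. In this sketch the 64-byte entry size (shift 6) and the 8-entry ring are assumed values, and rcr_carryclear, rcr_ptr2idx and rcr_inc are hypothetical lowercase equivalents of the helpers in the patch. The wrap only works if the ring is aligned so that the carry bit is clear for every in-ring address, which is assumed here as well.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for illustration: 64-byte entries, 8 entries. */
#define RCR_ENTRY_SHIFT 6	/* log2 of the ring entry size in bytes */
#define RCR_SIZE 8u		/* number of entries, a power of two */
#define RCR_CARRY_BIT ((uintptr_t)RCR_SIZE << RCR_ENTRY_SHIFT)

/* Wrap an address that stepped one entry past the end back to the base.
 * Assumes the ring is aligned to twice its size in bytes, so
 * RCR_CARRY_BIT is clear for every in-ring address. */
static inline uintptr_t rcr_carryclear(uintptr_t p)
{
	return p & ~RCR_CARRY_BIT;
}

/* Index of the entry at address p. */
static inline uint8_t rcr_ptr2idx(uintptr_t p)
{
	return (uint8_t)((p >> RCR_ENTRY_SHIFT) & (RCR_SIZE - 1));
}

/* Advance the cursor by one entry; toggle the valid bit on wrap-around. */
static inline uintptr_t rcr_inc(uintptr_t cursor, uint8_t *vbit,
				uint8_t vbit_mask)
{
	uintptr_t partial = cursor + ((uintptr_t)1 << RCR_ENTRY_SHIFT);
	uintptr_t next = rcr_carryclear(partial);

	if (next != partial)
		*vbit ^= vbit_mask;
	return next;
}
```

As functions, the helpers get type checking and show up by name in a debugger, while a compiler can still inline them.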
> +
> +/* Bit-wise logic to convert a ring pointer to a ring index */
> +static inline u8 RCR_PTR2IDX(struct bm_rcr_entry *e)
> +{
> +	return ((uintptr_t)e >> 6) & (BM_RCR_SIZE - 1);
> +}

This is a function, so don't use ALLCAPS.

> +/* Increment the 'cursor' ring pointer, taking 'vbit' into account */
> +static inline void RCR_INC(struct bm_rcr *rcr)
> +{
> +	/* NB: this is odd-looking, but experiments show that it generates
> +	 * fast code with essentially no branching overheads. We increment to
> +	 * the next RCR pointer and handle overflow and 'vbit'. */
> +	struct bm_rcr_entry *partial = rcr->cursor + 1;
> +
> +	rcr->cursor = RCR_CARRYCLEAR(partial);
> +	if (partial != rcr->cursor)
> +		rcr->vbit ^= BM_RCR_VERB_VBIT;
> +}
> +
> +static inline int bm_rcr_init(struct bm_portal *portal, enum bm_rcr_pmode pmode,
> +		__maybe_unused enum bm_rcr_cmode cmode)
> +{
> +	/* This use of 'register', as well as all other occurrences, is because
> +	 * it has been observed to generate much faster code with gcc than is
> +	 * otherwise the case. */
> +	register struct bm_rcr *rcr = &portal->rcr;

What version of GCC? Normal optimization settings? Has the seemingly
excessive use of inlin
Re: [PATCH v3 net-next 5/8] geneve: Add support to collect tunnel metadata.
On Mon, Aug 24, 2015 at 10:43 AM, Pravin B Shelar wrote:
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 0a6d974..c05bc13 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -141,10 +190,15 @@ drop:
>  /* Setup stats when device is created */
>  static int geneve_init(struct net_device *dev)
>  {
> +	struct geneve_dev *geneve = netdev_priv(dev);
> +
>  	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
>  	if (!dev->tstats)
>  		return -ENOMEM;
>
> +	if (geneve->collect_md)
> +		dev->features |= NETIF_F_NETNS_LOCAL;

I was going back and forth on whether this is the right thing to do.
Is it any weirder to allow this than to move a normal tunnel device
across namespaces?
[PATCH iproute2 v2] add support for brief output for link and addresses
This adds support for slightly less output than is normally provided by
'ip link show' and 'ip addr show'. This is a bit better when you have a
host with lots of interfaces. Sample output:

$ ip -br link show
lo UNKNOWN 00:00:00:00:00:00
p7p1 UP 08:00:27:9d:62:9f
p8p1 DOWN 08:00:27:dc:d8:ca
p9p1 UP 08:00:27:76:d9:75
p7p1.100@p7p1UP 08:00:27:9d:62:9f

$ ip -br -4 addr show
lo UNKNOWN 127.0.0.1/8
p7p1 UP 70.0.0.1/24
p8p1 DOWN 80.0.0.1/24
p7p1.100@p7p1UP 200.0.0.1/24

$ ip -br -6 addr show
lo UNKNOWN ::1/128
p7p1 UP 7000::1/8 fe80::a00:27ff:fe9d:629f/64
p8p1 DOWN 8000::1/8
p9p1 UP fe80::a00:27ff:fe76:d975/64
p7p1.100@p7p1UP fe80::a00:27ff:fe9d:629f/64

$ ip -br addr show p7p1
p7p1 UP 70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64

v2: Now with color support!

Signed-off-by: Andy Gospodarek
---
 include/utils.h       |   1 +
 ip/ip.c               |   5 +-
 ip/ip_common.h        |   3 +
 ip/ipaddress.c        | 155 +++---
 ip/iplink.c           |   5 +-
 man/man8/ip-link.8.in |   3 +-
 6 files changed, 147 insertions(+), 25 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 0c57ccd..f77edeb 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -19,6 +19,7 @@ extern int show_details;
 extern int show_raw;
 extern int resolve_hosts;
 extern int oneline;
+extern int brief;
 extern int timestamp;
 extern int timestamp_short;
 extern const char * _SL_;
diff --git a/ip/ip.c b/ip/ip.c
index e75447e..eea00b8 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -32,6 +32,7 @@ int show_stats;
 int show_details;
 int resolve_hosts;
 int oneline;
+int brief;
 int timestamp;
 const char *_SL_;
 int force;
@@ -55,7 +56,7 @@ static void usage(void)
 "-h[uman-readable] | -iec |\n"
 "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | link } |\n"
 "-4 | -6 | -I | -D | -B | -0 |\n"
-"-l[oops] { maximum-addr-flush-attempts } |\n"
+"-l[oops] { maximum-addr-flush-attempts } | -br[ief] |\n"
 "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] [filename] |\n"
 "-rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}\n");
 	exit(-1);
@@ -250,6 +251,8 @@ int main(int argc, char **argv)
 			if (argc <= 1)
 				usage();
 			batch_file = argv[1];
+		} else if (matches(opt, "-brief") == 0) {
+			++brief;
 		} else if (matches(opt, "-rcvbuf") == 0) {
 			unsigned int size;
 
diff --git a/ip/ip_common.h b/ip/ip_common.h
index f120f5b..f74face 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -2,6 +2,9 @@ extern int get_operstate(const char *name);
 extern int print_linkinfo(const struct sockaddr_nl *who,
 			  struct nlmsghdr *n,
 			  void *arg);
+extern int print_linkinfo_brief(const struct sockaddr_nl *who,
+				struct nlmsghdr *n,
+				void *arg);
 extern int print_addrinfo(const struct sockaddr_nl *who,
 			  struct nlmsghdr *n,
 			  void *arg);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 13d9c46..bb44a55 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -125,7 +125,10 @@ static void print_link_flags(FILE *fp, unsigned flags, unsigned mdown)
 		fprintf(fp, "%x", flags);
 	if (mdown)
 		fprintf(fp, ",M-DOWN");
-	fprintf(fp, "> ");
+	if (brief)
+		fprintf(fp, ">");
+	else
+		fprintf(fp, "> ");
 }
 
 static const char *oper_states[] = {
@@ -138,13 +141,22 @@ static void print_operstate(FILE *f, __u8 state)
 	if (state >= sizeof(oper_states)/sizeof(oper_states[0]))
 		fprintf(f, "state %#x ", state);
 	else {
-		fprintf(f, "state ");
-		if (strcmp(oper_states[state], "UP") == 0)
-			color_fprintf(f, COLOR_OPERSTATE_UP, "%s ", oper_states[state]);
-		else if (strcmp(oper_states[state], "DOWN") == 0)
-			color_fprintf(f, COLOR_OPERSTATE_DOWN, "%s ", oper_states[state]);
-		else
-			fprintf(f, "%s ", oper_states[state]);
+		if (brief) {
+			if (strcmp(oper_states[state], "UP") == 0)
+				color_fprintf(f, COLOR_OPERSTATE_UP, "%-7s ", oper_states[state]);
+			else if (strcmp(oper_states[state], "DOWN") == 0)
+				color_fprintf(f, COLOR_OPERSTATE_DOWN, "%-7s ", oper_states[state]);
+			else
+
Re: [PATCH iproute2] add support for brief output for link and addresses
On Mon, Aug 24, 2015 at 02:08:29PM -0700, Stephen Hemminger wrote:
> On Mon, 24 Aug 2015 20:41:16 +
> Andy Gospodarek wrote:
>
> > This adds support for slightly less output than is normally provided by
> > 'ip link show' and 'ip addr show'. This is a bit better when you have a
> > host with lots of interfaces. Sample output:
> >
> > $ ip -br link show
> > lo UNKNOWN 00:00:00:00:00:00
> > p7p1 UP 08:00:27:9d:62:9f
> >
> > p8p1 DOWN 08:00:27:dc:d8:ca
> >
> > p9p1 UP 08:00:27:76:d9:75
> >
> > p7p1.100@p7p1UP 08:00:27:9d:62:9f
> >
> >
> > $ ip -br -4 addr show
> > lo UNKNOWN 127.0.0.1/8
> > p7p1 UP 70.0.0.1/24
> > p8p1 DOWN 80.0.0.1/24
> > p7p1.100@p7p1UP 200.0.0.1/24
> >
> > $ ip -br -6 addr show
> > lo UNKNOWN ::1/128
> > p7p1 UP 7000::1/8 fe80::a00:27ff:fe9d:629f/64
> > p8p1 DOWN 8000::1/8
> > p9p1 UP fe80::a00:27ff:fe76:d975/64
> > p7p1.100@p7p1UP fe80::a00:27ff:fe9d:629f/64
> >
> > $ ip -br addr show p7p1
> > p7p1 UP 70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64
> >
> > Signed-off-by: Andy Gospodarek
>
> Cool, we could colorize this as well :-)

Will do, v2 coming up!
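The brief path in the v2 patch prints a known operational state left-justified in a 7-character column followed by a space, and keeps the "state %#x " fallback for out-of-range values. The sketch below reproduces just that formatting into a buffer, without colour. The oper_states table contents are an assumption (they are taken to follow the kernel's IF_OPER_* numbering, UNKNOWN = 0 through UP = 6), and format_operstate_brief is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed to follow the kernel's IF_OPER_* numbering (0 .. 6). */
static const char *oper_states[] = {
	"UNKNOWN", "NOTPRESENT", "DOWN", "LOWERLAYERDOWN",
	"TESTING", "DORMANT", "UP"
};

/* Brief-mode state column without colour: "%-7s " for known states,
 * "state %#x " otherwise. Returns what snprintf() returns. */
static int format_operstate_brief(char *buf, size_t len, unsigned char state)
{
	if (state >= sizeof(oper_states) / sizeof(oper_states[0]))
		return snprintf(buf, len, "state %#x ", (unsigned int)state);
	return snprintf(buf, len, "%-7s ", oper_states[state]);
}
```

Names longer than the field, such as LOWERLAYERDOWN, are printed in full, so the column only lines up for the shorter states.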
[PATCHv5 net-next 00/10] OVS conntrack support
The goal of this series is to allow OVS to send packets through the Linux
kernel connection tracker, and subsequently match on fields populated by
conntrack.

This version addresses the feedback from v4, mostly minor fixes, including
shifting the conntrack init into the per-namespace functions rather than
per-datapath and ensuring the ct_mark/ct_label attributes are re-serialized
when userspace dumps the actions. Users attempting to specify actions that
set ct_labels with a length longer than the supported length will now get
flow rejections. This series also rebases against the latest conntrack zone
changes.

This functionality is enabled through the CONFIG_OPENVSWITCH_CONNTRACK option.

The branch below has been updated with the corresponding userspace pieces:
https://github.com/joestringer/ovs dev/ct_20150818

Joe Stringer (10):
  openvswitch: Serialize acts with original netlink len
  openvswitch: Move MASKED* macros to datapath.h
  ipv6: Export nf_ct_frag6_gather()
  dst: Add __skb_dst_copy() variation
  openvswitch: Add conntrack action
  openvswitch: Allow matching on conntrack mark
  netfilter: Always export nf_connlabels_replace()
  netfilter: connlabels: Export setting connlabel length
  openvswitch: Allow matching on conntrack label
  openvswitch: Allow attaching helpers to ct action

 include/net/dst.h                           |   9 +-
 include/net/netfilter/nf_conntrack_labels.h |   4 +
 include/uapi/linux/openvswitch.h            |  58 +++
 net/ipv6/netfilter/nf_conntrack_reasm.c     |   1 +
 net/netfilter/nf_conntrack_labels.c         |  34 +-
 net/netfilter/xt_connlabel.c                |  16 +-
 net/openvswitch/Kconfig                     |  11 +
 net/openvswitch/Makefile                    |   2 +
 net/openvswitch/actions.c                   | 229 +++--
 net/openvswitch/conntrack.c                 | 723
 net/openvswitch/conntrack.h                 |  78 +++
 net/openvswitch/datapath.c                  |  86 +++-
 net/openvswitch/datapath.h                  |  13 +
 net/openvswitch/flow.c                      |   6 +-
 net/openvswitch/flow.h                      |  11 +-
 net/openvswitch/flow_netlink.c              | 129 -
 net/openvswitch/flow_netlink.h              |  13 +-
 net/openvswitch/vport.c                     |   1 +
 18 files changed, 1317 insertions(+), 107 deletions(-)
 create mode 100644 net/openvswitch/conntrack.c
 create mode 100644 net/openvswitch/conntrack.h

--
2.1.4
[PATCHv5 net-next 04/10] dst: Add __skb_dst_copy() variation
This variation on skb_dst_copy() doesn't require two skbs.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
---
v4: Add ack.
v5: No change.
---
 include/net/dst.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 0a9a723..6f282e7 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -286,13 +286,18 @@ static inline void skb_dst_drop(struct sk_buff *skb)
 	}
 }
 
-static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+static inline void __skb_dst_copy(struct sk_buff *nskb, unsigned long refdst)
 {
-	nskb->_skb_refdst = oskb->_skb_refdst;
+	nskb->_skb_refdst = refdst;
 	if (!(nskb->_skb_refdst & SKB_DST_NOREF))
 		dst_clone(skb_dst(nskb));
 }
 
+static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+{
+	__skb_dst_copy(nskb, oskb->_skb_refdst);
+}
+
 /**
  * skb_dst_force - makes sure skb dst is refcounted
  * @skb: buffer
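The split works because _skb_refdst is a pointer with a flag in its low bit: the new helper only needs that raw word, and takes a reference unless the NOREF flag is set. The sketch below models this with hypothetical types (model_skb, model_dst) and a uintptr_t field instead of the kernel's unsigned long; it is an illustration, not the dst.h code. Like dst_clone(), it tolerates an empty word.

```c
#include <assert.h>
#include <stdint.h>

/* Low-bit flag marking a borrowed ("noref") pointer, like SKB_DST_NOREF. */
#define MODEL_DST_NOREF ((uintptr_t)1)

struct model_dst {
	int refcnt;
};

struct model_skb {
	uintptr_t refdst;	/* dst pointer, possibly tagged with NOREF */
};

static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	return (struct model_dst *)(skb->refdst & ~MODEL_DST_NOREF);
}

/* Like __skb_dst_copy(): works from the raw word, no source skb needed. */
static void model_dst_copy_raw(struct model_skb *nskb, uintptr_t refdst)
{
	struct model_dst *dst;

	nskb->refdst = refdst;
	dst = model_skb_dst(nskb);
	if (dst && !(refdst & MODEL_DST_NOREF))
		dst->refcnt++;	/* refcounted entry: take a reference */
}

/* Like skb_dst_copy(): the two-skb form becomes a thin wrapper. */
static void model_dst_copy(struct model_skb *nskb, const struct model_skb *oskb)
{
	model_dst_copy_raw(nskb, oskb->refdst);
}
```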
[PATCHv5 net-next 01/10] openvswitch: Serialize acts with original netlink len
Previously, we used the kernel-internal netlink actions length to calculate the size of messages to serialize back to userspace. However, the sw_flow_actions may not be formatted exactly the same as the actions on the wire, so store the original actions length when de-serializing and re-use the original length when serializing.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
---
v2: No change.
v3: Preserve original length across buffer resize.
v4: Add ack.
v5: No change.
---
 net/openvswitch/datapath.c     | 2 +-
 net/openvswitch/flow.h         | 1 +
 net/openvswitch/flow_netlink.c | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ffe984f..d5b5473 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -713,7 +713,7 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts,
 
 	/* OVS_FLOW_ATTR_ACTIONS */
 	if (should_fill_actions(ufid_flags))
-		len += nla_total_size(acts->actions_len);
+		len += nla_total_size(acts->orig_len);
 
 	return len
 		+ nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index b62cdb3..082a87b 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -144,6 +144,7 @@ struct sw_flow_id {
 
 struct sw_flow_actions {
 	struct rcu_head rcu;
+	size_t orig_len;	/* From flow_cmd_new netlink actions size */
 	u32 actions_len;
 	struct nlattr actions[];
 };
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 4e7a3f7..c182b28 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1619,6 +1619,7 @@ static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
 
 	memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len);
 	acts->actions_len = (*sfa)->actions_len;
+	acts->orig_len = (*sfa)->orig_len;
 	kfree(*sfa);
 	*sfa = acts;
 
@@ -2223,6 +2224,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 	if (IS_ERR(*sfa))
 		return PTR_ERR(*sfa);
 
+	(*sfa)->orig_len = nla_len(attr);
 	err = __ovs_nla_copy_actions(attr, key, 0, sfa, key->eth.type,
 				     key->eth.tci, log);
 	if (err)
-- 
2.1.4
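The v3 note ("Preserve original length across buffer resize") matters because reserve_sfa_size() allocates a larger buffer and copies the header fields across one by one. The sketch below models that step with hypothetical names (`acts_sketch`, `grow_acts_sketch`) and plain calloc/free in place of the kernel allocators; without the `orig_len` copy, the length reported to userspace would silently reset after a resize.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sw_flow_actions. */
struct acts_sketch {
	size_t orig_len;    /* netlink actions length as sent by userspace */
	size_t actions_len; /* length of the kernel-internal encoding */
	unsigned char actions[];
};

static struct acts_sketch *alloc_acts_sketch(size_t size)
{
	return calloc(1, sizeof(struct acts_sketch) + size);
}

/* Replace *sfa with a buffer of new_size bytes (new_size must be at least
 * actions_len), carrying every header field across, including orig_len. */
static int grow_acts_sketch(struct acts_sketch **sfa, size_t new_size)
{
	struct acts_sketch *acts = alloc_acts_sketch(new_size);

	if (!acts)
		return -1;
	memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len);
	acts->actions_len = (*sfa)->actions_len;
	acts->orig_len = (*sfa)->orig_len; /* the copy the v3 change adds */
	free(*sfa);
	*sfa = acts;
	return 0;
}
```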
[PATCHv5 net-next 03/10] ipv6: Export nf_ct_frag6_gather()
Signed-off-by: Joe Stringer
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
---
v4: Add ack.
v5: No change.
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 6d02498..701cd2b 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -633,6 +633,7 @@ ret_orig:
 	kfree_skb(clone);
 	return skb;
 }
+EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
 
 void nf_ct_frag6_consume_orig(struct sk_buff *skb)
 {
-- 
2.1.4
[PATCHv5 net-next 02/10] openvswitch: Move MASKED* macros to datapath.h
This will allow the ovs-conntrack code to reuse these macros. Signed-off-by: Joe Stringer Acked-by: Thomas Graf Acked-by: Pravin B Shelar --- v4: Add ack. v5: No change. --- net/openvswitch/actions.c | 52 ++ net/openvswitch/datapath.h | 4 2 files changed, 29 insertions(+), 27 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 4f42007..520438b 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -185,10 +185,6 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key, return 0; } -/* 'KEY' must not have any bits set outside of the 'MASK' */ -#define MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK))) -#define SET_MASKED(OLD, KEY, MASK) ((OLD) = MASKED(OLD, KEY, MASK)) - static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key, const __be32 *mpls_lse, const __be32 *mask) { @@ -201,7 +197,7 @@ static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key, return err; stack = (__be32 *)skb_mpls_header(skb); - lse = MASKED(*stack, *mpls_lse, *mask); + lse = OVS_MASKED(*stack, *mpls_lse, *mask); if (skb->ip_summed == CHECKSUM_COMPLETE) { __be32 diff[] = { ~(*stack), lse }; @@ -244,9 +240,9 @@ static void ether_addr_copy_masked(u8 *dst_, const u8 *src_, const u8 *mask_) const u16 *src = (const u16 *)src_; const u16 *mask = (const u16 *)mask_; - SET_MASKED(dst[0], src[0], mask[0]); - SET_MASKED(dst[1], src[1], mask[1]); - SET_MASKED(dst[2], src[2], mask[2]); + OVS_SET_MASKED(dst[0], src[0], mask[0]); + OVS_SET_MASKED(dst[1], src[1], mask[1]); + OVS_SET_MASKED(dst[2], src[2], mask[2]); } static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key, @@ -338,10 +334,10 @@ static void update_ipv6_checksum(struct sk_buff *skb, u8 l4_proto, static void mask_ipv6_addr(const __be32 old[4], const __be32 addr[4], const __be32 mask[4], __be32 masked[4]) { - masked[0] = MASKED(old[0], addr[0], mask[0]); - masked[1] = MASKED(old[1], addr[1], mask[1]); - masked[2] = MASKED(old[2], addr[2], 
mask[2]); - masked[3] = MASKED(old[3], addr[3], mask[3]); + masked[0] = OVS_MASKED(old[0], addr[0], mask[0]); + masked[1] = OVS_MASKED(old[1], addr[1], mask[1]); + masked[2] = OVS_MASKED(old[2], addr[2], mask[2]); + masked[3] = OVS_MASKED(old[3], addr[3], mask[3]); } static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto, @@ -358,15 +354,15 @@ static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto, static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl, u32 mask) { /* Bits 21-24 are always unmasked, so this retains their values. */ - SET_MASKED(nh->flow_lbl[0], (u8)(fl >> 16), (u8)(mask >> 16)); - SET_MASKED(nh->flow_lbl[1], (u8)(fl >> 8), (u8)(mask >> 8)); - SET_MASKED(nh->flow_lbl[2], (u8)fl, (u8)mask); + OVS_SET_MASKED(nh->flow_lbl[0], (u8)(fl >> 16), (u8)(mask >> 16)); + OVS_SET_MASKED(nh->flow_lbl[1], (u8)(fl >> 8), (u8)(mask >> 8)); + OVS_SET_MASKED(nh->flow_lbl[2], (u8)fl, (u8)mask); } static void set_ip_ttl(struct sk_buff *skb, struct iphdr *nh, u8 new_ttl, u8 mask) { - new_ttl = MASKED(nh->ttl, new_ttl, mask); + new_ttl = OVS_MASKED(nh->ttl, new_ttl, mask); csum_replace2(&nh->check, htons(nh->ttl << 8), htons(new_ttl << 8)); nh->ttl = new_ttl; @@ -392,7 +388,7 @@ static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *flow_key, * makes sense to check if the value actually changed. 
*/ if (mask->ipv4_src) { - new_addr = MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src); + new_addr = OVS_MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src); if (unlikely(new_addr != nh->saddr)) { set_ip_addr(skb, nh, &nh->saddr, new_addr); @@ -400,7 +396,7 @@ static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *flow_key, } } if (mask->ipv4_dst) { - new_addr = MASKED(nh->daddr, key->ipv4_dst, mask->ipv4_dst); + new_addr = OVS_MASKED(nh->daddr, key->ipv4_dst, mask->ipv4_dst); if (unlikely(new_addr != nh->daddr)) { set_ip_addr(skb, nh, &nh->daddr, new_addr); @@ -488,7 +484,8 @@ static int set_ipv6(struct sk_buff *skb, struct sw_flow_key *flow_key, *(__be32 *)nh & htonl(IPV6_FLOWINFO_FLOWLABEL); } if (mask->ipv6_hlimit) { - SET_MASKED(nh->hop_limit, key->ipv6_hlimit, mask->ipv6_hlimit); + OVS_SET_MASKED(nh->hop_limit, key->ipv6_hlimit, + mask->ipv6_hlimit); flow_key->ip.ttl = nh->hop_limit; } return 0; @@ -517,8 +514,8 @@ static int
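As a worked example of the renamed macros, the sketch below reuses the definitions removed from actions.c under their new OVS_* names (assuming datapath.h keeps the same bodies, since that hunk is not shown here) and applies them the way set_ipv6_fl() does to the three flow-label bytes. The helper name and test values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the definitions moved to datapath.h.
 * 'KEY' must not have any bits set outside of the 'MASK'. */
#define OVS_MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK)))
#define OVS_SET_MASKED(OLD, KEY, MASK) ((OLD) = OVS_MASKED(OLD, KEY, MASK))

/* Hypothetical helper modeled on set_ipv6_fl(): the 20-bit label spans
 * flow_lbl[0..2]; the high nibble of flow_lbl[0] holds traffic-class bits,
 * which a mask limited to 20 bits never touches. */
static void set_flow_label_sketch(uint8_t flow_lbl[3], uint32_t fl, uint32_t mask)
{
	OVS_SET_MASKED(flow_lbl[0], (uint8_t)(fl >> 16), (uint8_t)(mask >> 16));
	OVS_SET_MASKED(flow_lbl[1], (uint8_t)(fl >> 8), (uint8_t)(mask >> 8));
	OVS_SET_MASKED(flow_lbl[2], (uint8_t)fl, (uint8_t)mask);
}
```

With `flow_lbl = {0xA7, 0x00, 0x00}`, a full-mask write of label 0x12345 yields `{0xA1, 0x23, 0x45}`: the 0xA0 traffic-class nibble survives because the mask's top byte is only 0x0F.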
Re: [PATCH net-next 09/13] vxlan: provide access function for vxlan socket address family
> On Aug 18, 2015, at 1:33 PM, Jiri Benc wrote: > > Signed-off-by: Jiri Benc > --- > drivers/net/vxlan.c | 8 > include/net/vxlan.h | 5 + > 2 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c > index e4b8ab63d0fa..d5ca1d7e0b81 100644 > --- a/drivers/net/vxlan.c > +++ b/drivers/net/vxlan.c > @@ -236,7 +236,7 @@ static struct vxlan_sock *vxlan_find_sock(struct net > *net, sa_family_t family, > > hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) { > if (inet_sk(vs->sock->sk)->inet_sport == port && > - inet_sk(vs->sock->sk)->sk.sk_family == family && > + vxlan_get_sk_family(vs) == family && > vs->flags == flags) > return vs; > } > @@ -625,7 +625,7 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock > *vs) > struct net_device *dev; > struct sock *sk = vs->sock->sk; > struct net *net = sock_net(sk); > - sa_family_t sa_family = sk->sk_family; > + sa_family_t sa_family = vxlan_get_sk_family(vs); > __be16 port = inet_sk(sk)->inet_sport; > int err; > > @@ -650,7 +650,7 @@ static void vxlan_notify_del_rx_port(struct vxlan_sock > *vs) > struct net_device *dev; > struct sock *sk = vs->sock->sk; > struct net *net = sock_net(sk); > - sa_family_t sa_family = sk->sk_family; > + sa_family_t sa_family = vxlan_get_sk_family(vs); > __be16 port = inet_sk(sk)->inet_sport; > > rcu_read_lock(); > @@ -2390,7 +2390,7 @@ void vxlan_get_rx_port(struct net_device *dev) > for (i = 0; i < PORT_HASH_SIZE; ++i) { > hlist_for_each_entry_rcu(vs, &vn->sock_list[i], hlist) { > port = inet_sk(vs->sock->sk)->inet_sport; > - sa_family = vs->sock->sk->sk_family; > + sa_family = vxlan_get_sk_family(vs); > dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family, > port); > } > diff --git a/include/net/vxlan.h b/include/net/vxlan.h > index e4534f1b2d8c..43677e6b9c43 100644 > --- a/include/net/vxlan.h > +++ b/include/net/vxlan.h > @@ -241,3 +241,8 @@ static inline void vxlan_get_rx_port(struct net_device > *netdev) > } > #endif > #endif > + > 
+static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs) > +{ > + return vs->sock->sk->sk_family; > +}

This causes build problems because vxlan_get_sk_family() is defined after the final #endif, outside the guard that protects the file against multiple inclusion. Please move vxlan_get_sk_family() inside the last #endif.

-- 
Mark Rustad, Networking Division, Intel Corporation
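The requested layout can be sketched as a self-contained header body. All names here (`VXLAN_SKETCH_H`, the `*_sketch` structs and the helper) are hypothetical stand-ins for include/net/vxlan.h. The point is only the placement: the inline helper sits before the closing `#endif`, so a second `#include` skips it instead of redefining it.

```c
#include <assert.h>

/* ---- contents of a hypothetical "vxlan_sketch.h" ---- */
#ifndef VXLAN_SKETCH_H
#define VXLAN_SKETCH_H

struct sock_sketch {
	unsigned short sk_family;
};

struct socket_sketch {
	struct sock_sketch *sk;
};

struct vxlan_sock_sketch {
	struct socket_sketch *sock;
};

/* Inside the guard: a repeated #include of this header skips this
 * definition rather than triggering a redefinition error. */
static inline unsigned short
vxlan_get_sk_family_sketch(struct vxlan_sock_sketch *vs)
{
	return vs->sock->sk->sk_family;
}

#endif /* VXLAN_SKETCH_H */
/* ---- end of header ---- */
```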
[PATCHv5 net-next 05/10] openvswitch: Add conntrack action
Expose the kernel connection tracker via OVS. Userspace components can make use of the CT action to populate the connection state (ct_state) field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet through the connection tracker and populate the ct_state field with one or more of the connection state flags above. The CT action will always set the TRACKED bit.

When the COMMIT flag is passed to the conntrack action, this specifies that information about the connection should be stored. This allows subsequent packets for the same (or related) connections to be correlated with this connection. Sending subsequent packets for the connection through conntrack allows the connection tracker to consider the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This allows connections with the same 5-tuple to be kept logically separate from connections in other zones. If the zone is specified, then the "ct_zone" match field will be subsequently populated with the zone id.

IP fragments are handled by transparently assembling them as part of the CT action. The maximum receive unit (MRU) size is tracked so that refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.
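Because ct_state is a bitmask, a single packet can carry several of the flags listed above at once, for example TRACKED together with ESTABLISHED. The hypothetical helper below renders a ct_state byte using the flag values from the list. It uses the ovs-ofctl-style "+flag" notation: "trk", "new", "est" and "rel" appear in this series' flow examples, while "rpl" and "inv" are assumed abbreviations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag values as listed in the commit message. */
#define OVS_CS_F_NEW         0x01
#define OVS_CS_F_ESTABLISHED 0x02
#define OVS_CS_F_RELATED     0x04
#define OVS_CS_F_INVALID     0x20
#define OVS_CS_F_REPLY_DIR   0x40
#define OVS_CS_F_TRACKED     0x80

/* Hypothetical helper: write e.g. "+trk+est" into out (len >= 32 is
 * always enough) and return out. */
static const char *ct_state_to_str(unsigned char state, char *out, size_t len)
{
	static const struct { unsigned char bit; const char *name; } names[] = {
		{ OVS_CS_F_TRACKED, "+trk" },     { OVS_CS_F_NEW, "+new" },
		{ OVS_CS_F_ESTABLISHED, "+est" }, { OVS_CS_F_RELATED, "+rel" },
		{ OVS_CS_F_REPLY_DIR, "+rpl" },   { OVS_CS_F_INVALID, "+inv" },
	};
	size_t i;

	out[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (state & names[i].bit)
			strncat(out, names[i].name, len - strlen(out) - 1);
	return out;
}
```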
Signed-off-by: Joe Stringer Signed-off-by: Justin Pettit Signed-off-by: Andy Zhou --- This can be tested with the corresponding userspace component here: https://www.github.com/justinpettit/openvswitch conntrack v2: Don't take references to devs or dsts in output path. Shift ovs_ct_init()/ovs_ct_exit() into this patch Handle output case where flow key is invalidated Store the entire L2 header to apply to fragments Various minor simplifications Improve comments/logs Style fixes Rebase v3: Clone dst in output, free final dst reference properly. Handle CHECKSUM_COMPLETE after fragmentation Restore L2 skb metadata after fragmentation Make MRU types more consistent Better cleanup in error paths Fix sparse warnings v4: Reject set_field actions for ct_state,ct_zone Combine key->ct update from skb->nfct into a single function. Minor documentation tweaks. Simplify some codepaths. v5: Fix ovs_ct_verify(). Don't take references on nf_conntrack_ipv[46] Replace some #ifdefs with IS_ENABLED. Remove unused functions. Rebase. --- include/uapi/linux/openvswitch.h | 40 net/openvswitch/Kconfig | 11 + net/openvswitch/Makefile | 2 + net/openvswitch/actions.c| 175 +++- net/openvswitch/conntrack.c | 442 +++ net/openvswitch/conntrack.h | 70 +++ net/openvswitch/datapath.c | 66 -- net/openvswitch/datapath.h | 6 + net/openvswitch/flow.c | 2 + net/openvswitch/flow.h | 6 + net/openvswitch/flow_netlink.c | 72 +-- net/openvswitch/flow_netlink.h | 4 +- net/openvswitch/vport.c | 1 + 13 files changed, 860 insertions(+), 37 deletions(-) create mode 100644 net/openvswitch/conntrack.c create mode 100644 net/openvswitch/conntrack.h diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index d6b8854..55f5997 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -164,6 +164,9 @@ enum ovs_packet_cmd { * %OVS_USERSPACE_ATTR_EGRESS_TUN_PORT attribute, which is sent only if the * output port is actually a tunnel port. 
Contains the output tunnel key * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes. + * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and + * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment + * size. * * These attributes follow the &struct ovs_header within the Generic Netlink * payload for %OVS_PACKET_* commands. @@ -180,6 +183,7 @@ enum ovs_packet_attr { OVS_PACKET_ATTR_UNUSED2, OVS_PACKET_ATTR_PROBE, /* Packet operation is a feature probe, error logging should be suppressed. */ + OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */ __OVS_PACKET_ATTR_MAX }; @@ -319,6 +323,8 @@ enum ovs_key_attr { OVS_KEY_ATTR_MPLS, /* array of struct ovs_key_mpls. * The implementation may restrict * the accepted length of the array. */ + OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_
[PATCHv5 net-next 08/10] netfilter: connlabels: Export setting connlabel length
Add functions to change connlabel length into nf_conntrack_labels.c so they may be reused by other modules like OVS and nftables without needing to jump through xt_match_check() hoops. Suggested-by: Florian Westphal Signed-off-by: Joe Stringer Acked-by: Florian Westphal Acked-by: Thomas Graf --- v2: Protect connlabel modification with spinlock. Fix reference leak in error case. Style fixups. v3: No change. v4-v5: Add acks. --- include/net/netfilter/nf_conntrack_labels.h | 4 net/netfilter/nf_conntrack_labels.c | 32 + net/netfilter/xt_connlabel.c| 16 --- 3 files changed, 40 insertions(+), 12 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_labels.h b/include/net/netfilter/nf_conntrack_labels.h index dec6336..7e2b1d0 100644 --- a/include/net/netfilter/nf_conntrack_labels.h +++ b/include/net/netfilter/nf_conntrack_labels.h @@ -54,7 +54,11 @@ int nf_connlabels_replace(struct nf_conn *ct, #ifdef CONFIG_NF_CONNTRACK_LABELS int nf_conntrack_labels_init(void); void nf_conntrack_labels_fini(void); +int nf_connlabels_get(struct net *net, unsigned int n_bits); +void nf_connlabels_put(struct net *net); #else static inline int nf_conntrack_labels_init(void) { return 0; } static inline void nf_conntrack_labels_fini(void) {} +static inline int nf_connlabels_get(struct net *net, unsigned int n_bits) { return 0; } +static inline void nf_connlabels_put(struct net *net) {} #endif diff --git a/net/netfilter/nf_conntrack_labels.c b/net/netfilter/nf_conntrack_labels.c index daa7c13..3ce5c31 100644 --- a/net/netfilter/nf_conntrack_labels.c +++ b/net/netfilter/nf_conntrack_labels.c @@ -14,6 +14,8 @@ #include #include +static spinlock_t nf_connlabels_lock; + static unsigned int label_bits(const struct nf_conn_labels *l) { unsigned int longs = l->words; @@ -89,6 +91,35 @@ int nf_connlabels_replace(struct nf_conn *ct, } EXPORT_SYMBOL_GPL(nf_connlabels_replace); +int nf_connlabels_get(struct net *net, unsigned int n_bits) +{ + size_t words; + + if (n_bits > (NF_CT_LABELS_MAX_SIZE * 
BITS_PER_BYTE)) + return -ERANGE; + + words = BITS_TO_LONGS(n_bits); + + spin_lock(&nf_connlabels_lock); + net->ct.labels_used++; + if (words > net->ct.label_words) + net->ct.label_words = words; + spin_unlock(&nf_connlabels_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(nf_connlabels_get); + +void nf_connlabels_put(struct net *net) +{ + spin_lock(&nf_connlabels_lock); + net->ct.labels_used--; + if (net->ct.labels_used == 0) + net->ct.label_words = 0; + spin_unlock(&nf_connlabels_lock); +} +EXPORT_SYMBOL_GPL(nf_connlabels_put); + static struct nf_ct_ext_type labels_extend __read_mostly = { .len= sizeof(struct nf_conn_labels), .align = __alignof__(struct nf_conn_labels), @@ -97,6 +128,7 @@ static struct nf_ct_ext_type labels_extend __read_mostly = { int nf_conntrack_labels_init(void) { + spin_lock_init(&nf_connlabels_lock); return nf_ct_extend_register(&labels_extend); } diff --git a/net/netfilter/xt_connlabel.c b/net/netfilter/xt_connlabel.c index 9f8719d..bb9cbeb 100644 --- a/net/netfilter/xt_connlabel.c +++ b/net/netfilter/xt_connlabel.c @@ -42,10 +42,6 @@ static int connlabel_mt_check(const struct xt_mtchk_param *par) XT_CONNLABEL_OP_SET; struct xt_connlabel_mtinfo *info = par->matchinfo; int ret; - size_t words; - - if (info->bit > XT_CONNLABEL_MAXBIT) - return -ERANGE; if (info->options & ~options) { pr_err("Unknown options in mask %x\n", info->options); @@ -59,19 +55,15 @@ static int connlabel_mt_check(const struct xt_mtchk_param *par) return ret; } - par->net->ct.labels_used++; - words = BITS_TO_LONGS(info->bit+1); - if (words > par->net->ct.label_words) - par->net->ct.label_words = words; - + ret = nf_connlabels_get(par->net, info->bit + 1); + if (ret < 0) + nf_ct_l3proto_module_put(par->family); return ret; } static void connlabel_mt_destroy(const struct xt_mtdtor_param *par) { - par->net->ct.labels_used--; - if (par->net->ct.labels_used == 0) - par->net->ct.label_words = 0; + nf_connlabels_put(par->net); nf_ct_l3proto_module_put(par->family); } -- 2.1.4 -- To 
unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
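The get/put pair amounts to per-namespace reference counting: every user can only grow the label word count, and the size resets when the last user drops its reference. A single-threaded sketch follows. The struct, macros and function names are hypothetical, the kernel's spinlock is omitted, and the 128-bit limit is an assumption that matches the fixed 128-bit label size used later in this series.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH (CHAR_BIT * sizeof(long))
#define BITS_TO_LONGS_SKETCH(n) \
	(((n) + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH)
/* Assumption: at most 128 label bits (16 bytes) per connection. */
#define LABELS_MAX_BITS_SKETCH 128u

/* Hypothetical stand-in for the net->ct label bookkeeping. */
struct ct_labels_ns_sketch {
	unsigned int labels_used;
	unsigned int label_words;
};

/* Like nf_connlabels_get(): reject oversize requests before touching any
 * counter, then take a reference and grow the word count if needed. */
static int connlabels_get_sketch(struct ct_labels_ns_sketch *ns, unsigned int n_bits)
{
	unsigned int words;

	if (n_bits > LABELS_MAX_BITS_SKETCH)
		return -ERANGE;
	words = BITS_TO_LONGS_SKETCH(n_bits);
	ns->labels_used++;
	if (words > ns->label_words)
		ns->label_words = words;
	return 0;
}

/* Like nf_connlabels_put(): the last user resets the word count. */
static void connlabels_put_sketch(struct ct_labels_ns_sketch *ns)
{
	ns->labels_used--;
	if (ns->labels_used == 0)
		ns->label_words = 0;
}
```

Because a failed get returns before incrementing anything, the caller only needs to undo its own state on error. In xt_connlabel, that means dropping the l3proto module reference, as the patch does.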
[PATCHv5 net-next 09/10] openvswitch: Allow matching on conntrack label
Allow matching and setting the ct_label field. As with ct_mark, this is populated by executing the CT action. The label field may be modified by specifying a label and mask nested under the CT action. It is stored as metadata attached to the connection. Label modification occurs after lookup, and will only persist when the conntrack entry is committed by providing the COMMIT flag to the CT action. Labels are currently fixed to 128 bits in size. Signed-off-by: Joe Stringer --- v2: Split out setting the connlabel size for the current namespace. v3: No change. v4: Only allow setting label via ct action. Update documentation. v5: Fix ovs_ct_verify(). Add label to ct action serialization. Free label bit length/reference properly. Configure OVS label length per-netns, not per-dp. Reject ct actions with label length longer than supported. Replace some #ifdefs with IS_ENABLED. Rebase. --- include/uapi/linux/openvswitch.h | 10 net/openvswitch/actions.c| 1 + net/openvswitch/conntrack.c | 123 ++- net/openvswitch/conntrack.h | 11 +++- net/openvswitch/datapath.c | 18 +++--- net/openvswitch/datapath.h | 3 + net/openvswitch/flow.c | 4 +- net/openvswitch/flow.h | 3 +- net/openvswitch/flow_netlink.c | 50 +++- net/openvswitch/flow_netlink.h | 9 +-- 10 files changed, 198 insertions(+), 34 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 7a185b5..9d52058 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -326,6 +326,7 @@ enum ovs_key_attr { OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */ OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. 
*/ OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */ + OVS_KEY_ATTR_CT_LABEL, /* 16-octet connection tracking label */ #ifdef __KERNEL__ OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */ @@ -438,6 +439,11 @@ struct ovs_key_nd { __u8nd_tll[ETH_ALEN]; }; +#define OVS_CT_LABEL_LEN 16 +struct ovs_key_ct_label { + __u8ct_label[OVS_CT_LABEL_LEN]; +}; + /* OVS_KEY_ATTR_CT_STATE flags */ #define OVS_CS_F_NEW 0x01 /* Beginning of a new connection. */ #define OVS_CS_F_ESTABLISHED 0x02 /* Part of an existing connection. */ @@ -617,12 +623,16 @@ struct ovs_action_hash { * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the * mask, the corresponding bit in the value is copied to the connection * tracking mark field in the connection. + * @OVS_CT_ATTR_LABEL: %OVS_CT_LABEL_LEN value followed by %OVS_CT_LABEL_LEN + * mask. For each bit set in the mask, the corresponding bit in the value is + * copied to the connection tracking label field in the connection. */ enum ovs_ct_attr { OVS_CT_ATTR_UNSPEC, OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */ OVS_CT_ATTR_ZONE, /* u16 zone id. */ OVS_CT_ATTR_MARK, /* mark to associate with this connection. */ + OVS_CT_ATTR_LABEL, /* label to associate with this connection. */ __OVS_CT_ATTR_MAX }; diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 9741d2c..736a113 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -969,6 +969,7 @@ static int execute_masked_set_action(struct sk_buff *skb, case OVS_KEY_ATTR_CT_STATE: case OVS_KEY_ATTR_CT_ZONE: case OVS_KEY_ATTR_CT_MARK: + case OVS_KEY_ATTR_CT_LABEL: err = -EINVAL; break; } diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index daea29e..8cb0987 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -34,6 +35,12 @@ struct md_mark { u32 mask; }; +/* Metadata label for masked write to conntrack label. 
*/ +struct md_label { + struct ovs_key_ct_label value; + struct ovs_key_ct_label mask; +}; + /* Conntrack action context for execution. */ struct ovs_conntrack_info { struct nf_conntrack_zone zone; @@ -41,6 +48,7 @@ struct ovs_conntrack_info { u32 flags; u16 family; struct md_mark mark; + struct md_label label; }; static u16 key_to_nfproto(const struct sw_flow_key *key) @@ -90,6 +98,24 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo) return ct_state; } +static void ovs_ct_get_label(const struct nf_conn *ct, +struct ovs_key_ct_label *label) +{ + struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL; + + if (cl) { + size_t len = cl->words * sizeof(long); + + if (len > OVS_CT_LABEL_LEN) + len = OVS_CT_LABEL_LEN; + else if (le
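The OVS_CT_ATTR_LABEL documentation above says that, for each bit set in the mask, the corresponding value bit is copied into the connection's label. A minimal byte-wise sketch of that rule over the 16-octet label follows. The helper is hypothetical, and `struct ovs_key_ct_label` is redeclared locally from the uapi hunk so that the block is self-contained.

```c
#include <assert.h>
#include <string.h>

#define OVS_CT_LABEL_LEN 16 /* as defined in the uapi hunk above */

struct ovs_key_ct_label {
	unsigned char ct_label[OVS_CT_LABEL_LEN];
};

/* Hypothetical helper: bits set in 'mask' come from 'value'; all other
 * bits keep what is already stored for the connection. */
static void ct_label_apply_sketch(struct ovs_key_ct_label *stored,
				  const struct ovs_key_ct_label *value,
				  const struct ovs_key_ct_label *mask)
{
	int i;

	for (i = 0; i < OVS_CT_LABEL_LEN; i++)
		stored->ct_label[i] = (unsigned char)
			((value->ct_label[i] & mask->ct_label[i]) |
			 (stored->ct_label[i] & ~mask->ct_label[i]));
}
```

As the commit message notes, such a change only persists once the conntrack entry is committed with the COMMIT flag.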
[PATCHv5 net-next 07/10] netfilter: Always export nf_connlabels_replace()
The following patches will reuse this code from OVS.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Acked-by: Thomas Graf
---
v2-v4: No change.
v5: Add acks.
---
 net/netfilter/nf_conntrack_labels.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_labels.c b/net/netfilter/nf_conntrack_labels.c
index bb53f12..daa7c13 100644
--- a/net/netfilter/nf_conntrack_labels.c
+++ b/net/netfilter/nf_conntrack_labels.c
@@ -48,7 +48,6 @@ int nf_connlabel_set(struct nf_conn *ct, u16 bit)
 }
 EXPORT_SYMBOL_GPL(nf_connlabel_set);
 
-#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
 static void replace_u32(u32 *address, u32 mask, u32 new)
 {
 	u32 old, tmp;
@@ -89,7 +88,6 @@ int nf_connlabels_replace(struct nf_conn *ct,
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nf_connlabels_replace);
-#endif
 
 static struct nf_ct_ext_type labels_extend __read_mostly = {
 	.len	= sizeof(struct nf_conn_labels),
-- 
2.1.4
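The hunk context shows that nf_connlabels_replace() relies on a replace_u32() helper, but its body falls outside the diff. As a hedged illustration only, the following sketch shows one common way to update selected bits of a shared 32-bit word without a lock: a C11 compare-and-swap retry loop. It is not the kernel's code, and the helper name and masking convention are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically set the bits selected by 'mask' to the
 * corresponding bits of 'value', leaving all other bits untouched. */
static void replace_bits_u32_sketch(_Atomic uint32_t *word, uint32_t mask,
				    uint32_t value)
{
	uint32_t old = atomic_load(word);
	uint32_t desired;

	do {
		desired = (old & ~mask) | (value & mask);
		/* On failure, 'old' is refreshed with the word's current
		 * contents and 'desired' is recomputed from it. */
	} while (!atomic_compare_exchange_weak(word, &old, desired));
}
```

The retry loop means that a concurrent writer changing other bits of the same word is never overwritten with stale data.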
[PATCHv5 net-next 10/10] openvswitch: Allow attaching helpers to ct action
Add support for using conntrack helpers to assist protocol detection. The new OVS_CT_ATTR_HELPER attribute of the CT action specifies a helper to be used for this connection. If no helper is specified, then helpers will be automatically applied as per the sysctl configuration of net.netfilter.nf_conntrack_helper.

The helper may be specified as part of the conntrack action, e.g. ct(helper=ftp). Initial packets for related connections should be committed to allow later packets for the flow to be considered established.

Example ovs-ofctl flows allowing FTP connections from ports 1->2:

  in_port=1,tcp,action=ct(helper=ftp,commit),2
  in_port=2,tcp,ct_state=-trk,action=ct(recirc)
  in_port=2,tcp,ct_state=+trk-new+est,action=1
  in_port=2,tcp,ct_state=+trk+rel,action=1

Signed-off-by: Joe Stringer --- v2-v3: No change. v4: Change error code for unknown helper ENOENT->EINVAL. v5: Fix rcu access of helpers. Rebase. --- include/uapi/linux/openvswitch.h | 3 ++ net/openvswitch/conntrack.c | 109 ++- 2 files changed, 110 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 9d52058..32e07d8 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -626,6 +626,7 @@ struct ovs_action_hash { * @OVS_CT_ATTR_LABEL: %OVS_CT_LABEL_LEN value followed by %OVS_CT_LABEL_LEN * mask. For each bit set in the mask, the corresponding bit in the value is * copied to the connection tracking label field in the connection. + * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG. */ enum ovs_ct_attr { OVS_CT_ATTR_UNSPEC, @@ -633,6 +634,8 @@ enum ovs_ct_attr { OVS_CT_ATTR_ZONE, /* u16 zone id. */ OVS_CT_ATTR_MARK, /* mark to associate with this connection. */ OVS_CT_ATTR_LABEL, /* label to associate with this connection. */ + OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of + related connections.
*/ __OVS_CT_ATTR_MAX }; diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 8cb0987..ac6d1d2 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -43,6 +44,7 @@ struct md_label { /* Conntrack action context for execution. */ struct ovs_conntrack_info { + struct nf_conntrack_helper *helper; struct nf_conntrack_zone zone; struct nf_conn *ct; u32 flags; @@ -213,6 +215,51 @@ static int ovs_ct_set_label(struct sk_buff *skb, struct sw_flow_key *key, return 0; } +/* 'skb' should already be pulled to nh_ofs. */ +static int ovs_ct_helper(struct sk_buff *skb, u16 proto) +{ + const struct nf_conntrack_helper *helper; + const struct nf_conn_help *help; + enum ip_conntrack_info ctinfo; + unsigned int protoff; + struct nf_conn *ct; + + ct = nf_ct_get(skb, &ctinfo); + if (!ct || ctinfo == IP_CT_RELATED_REPLY) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help) + return NF_ACCEPT; + + helper = rcu_dereference(help->helper); + if (!helper) + return NF_ACCEPT; + + switch (proto) { + case NFPROTO_IPV4: + protoff = ip_hdrlen(skb); + break; + case NFPROTO_IPV6: { + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + __be16 frag_off; + + protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), + &nexthdr, &frag_off); + if (protoff < 0 || (frag_off & htons(~0x7)) != 0) { + pr_debug("proto header not found\n"); + return NF_ACCEPT; + } + break; + } + default: + WARN_ONCE(1, "helper invoked on non-IP family!"); + return NF_DROP; + } + + return helper->help(skb, protoff, ct, ctinfo); +} + static int handle_fragments(struct net *net, struct sw_flow_key *key, u16 zone, struct sk_buff *skb) { @@ -285,6 +332,13 @@ static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb, return false; if (!nf_ct_zone_equal_any(info->ct, nf_ct_zone(ct))) return false; + if (info->helper) { + struct nf_conn_help *help; + + help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER); + if 
(help && rcu_access_pointer(help->helper) != info->helper) + return false; + } return true; } @@ -313,6 +367,11 @@ static int __ovs_ct_lookup(struct net *net, const struct sw_flow_key *key, if (nf_conntrack_in(net, info->family, NF_INET_PRE_ROUTING, skb) != NF_ACCEPT)
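ovs_ct_helper() has to pass the helper the offset of the L4 header. The sketch below models the two cases with hypothetical helpers operating on raw values. For IPv4, the offset is the IHL nibble times four, which is what ip_hdrlen() returns. For IPv6, an offset is accepted only if the extension-header walk succeeded and the fragment offset is zero. ipv6_skip_exthdr() signals failure with a negative return value, so the sketch keeps the offset in a signed int: a `< 0` test on an unsigned variable can never be true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4: the low nibble of the first header byte is the header length in
 * 32-bit words (IHL), so the L4 header starts IHL * 4 bytes in. */
static unsigned int ipv4_l4_offset_sketch(const uint8_t *iph)
{
	return (unsigned int)(iph[0] & 0x0f) * 4;
}

/* IPv6: 'ofs' is the result of an extension-header walk (negative on
 * failure); 'frag_off' is the host-order fragment field laid out as
 * 13-bit offset << 3 | 2 reserved bits | M flag. Only a successful walk
 * on an unfragmented packet or a first fragment is usable. */
static bool ipv6_l4_offset_ok_sketch(int ofs, uint16_t frag_off)
{
	return ofs >= 0 && (frag_off & ~0x7u) == 0;
}
```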
[PATCHv5 net-next 06/10] openvswitch: Allow matching on conntrack mark
Allow matching and setting the ct_mark field. As with ct_state and ct_zone, this field is populated when the CT action is executed. To write to this field, a value and mask can be specified as a nested attribute under the CT action. This data is stored with the conntrack entry, and the write is applied after the lookup for the CT action occurs. The conntrack entry itself must be committed using the COMMIT flag in the CT action flags for this change to persist.

Signed-off-by: Justin Pettit Signed-off-by: Joe Stringer --- v1-v3: No change. v4: Only allow setting conntrack mark via ct action. Documentation tweaks. v5: Rebase against conntrack zone changes. Add ct_mark to ct action serialization Replace some #ifdefs with IS_ENABLED. --- include/uapi/linux/openvswitch.h | 5 net/openvswitch/actions.c| 1 + net/openvswitch/conntrack.c | 63 ++-- net/openvswitch/conntrack.h | 1 + net/openvswitch/flow.h | 1 + net/openvswitch/flow_netlink.c | 15 +- 6 files changed, 82 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 55f5997..7a185b5 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -325,6 +325,7 @@ enum ovs_key_attr { * the accepted length of the array. */ OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */ OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */ + OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */ #ifdef __KERNEL__ OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */ @@ -613,11 +614,15 @@ struct ovs_action_hash { * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action. * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags. * @OVS_CT_ATTR_ZONE: u16 connection tracking zone. + * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the + * mask, the corresponding bit in the value is copied to the connection + * tracking mark field in the connection.
*/ enum ovs_ct_attr { OVS_CT_ATTR_UNSPEC, OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */ OVS_CT_ATTR_ZONE, /* u16 zone id. */ + OVS_CT_ATTR_MARK, /* mark to associate with this connection. */ __OVS_CT_ATTR_MAX }; diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 72ca2c4..9741d2c 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -968,6 +968,7 @@ static int execute_masked_set_action(struct sk_buff *skb, case OVS_KEY_ATTR_CT_STATE: case OVS_KEY_ATTR_CT_ZONE: + case OVS_KEY_ATTR_CT_MARK: err = -EINVAL; break; } diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4b7c4d7..daea29e 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -28,12 +28,19 @@ struct ovs_ct_len_tbl { size_t minlen; }; +/* Metadata mark for masked write to conntrack mark */ +struct md_mark { + u32 value; + u32 mask; +}; + /* Conntrack action context for execution. */ struct ovs_conntrack_info { struct nf_conntrack_zone zone; struct nf_conn *ct; u32 flags; u16 family; + struct md_mark mark; }; static u16 key_to_nfproto(const struct sw_flow_key *key) @@ -84,10 +91,12 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo) } static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, - const struct nf_conntrack_zone *zone) + const struct nf_conntrack_zone *zone, + const struct nf_conn *ct) { key->ct.state = state; key->ct.zone = zone->id; + key->ct.mark = ct ? ct->mark : 0; } /* Update 'key' based on skb->nfct. 
If 'post_ct' is true, then OVS has @@ -110,7 +119,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb, } else if (post_ct) { state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID; } - __ovs_ct_update_key(key, state, zone); + __ovs_ct_update_key(key, state, zone, ct); } void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key) @@ -118,6 +127,31 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key) ovs_ct_update_key(skb, key, false); } +static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key, + u32 ct_mark, u32 mask) +{ + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + u32 new_mark; + + if (!IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)) + return -ENOTSUPP; + + /* The connection could be invalid, in which case set_mark is no-op. */ + ct = nf_ct_get(skb, &ctinfo); + if (!ct) + return 0; + + new_mark = ct_mark | (ct->mark & ~(mask)); + if (ct->mark !=
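The masked write documented for OVS_CT_ATTR_MARK can be checked on its own: bits set in the mask come from the value, and all other bits of the existing mark are kept. The following is a minimal user-space sketch, not kernel code. The helper name ct_mark_masked_write() is hypothetical. The patch computes ct_mark | (ct->mark & ~(mask)), which gives the same result only if the supplied value has no bits set outside the mask. The sketch masks the value explicitly.

```c
#include <stdint.h>

/* Hypothetical helper illustrating the documented OVS_CT_ATTR_MARK rule:
 * bits set in 'mask' are copied from 'value'; every other bit of the
 * existing conntrack mark is left unchanged. */
uint32_t ct_mark_masked_write(uint32_t old_mark, uint32_t value,
			      uint32_t mask)
{
	return (value & mask) | (old_mark & ~mask);
}
```

For example, writing value 0x11 with mask 0xff over an existing mark of 0xaabbccdd gives 0xaabbcc11.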
Re: Correct way to access MDIO bus - phy.c seems buggy
On 24/08/15 17:09, Russell King - ARM Linux wrote: > Hi, > > While trying to track down instability in the FEC driver, I've come > across this question: what is the correct way to access the MDIO bus? > > Is it via: > > bus->write() > > where 'bus' is a struct mii_bus, or should it be via mdiobus_write()? > > What I'm seeing in the FEC driver is two threads trying to access the > MDIO bus simultaneously - one thread trying to do a read, and another > trying to do a write. The result is far from pretty with the current > mainline code, because we can end up re-initialising a spinlock while > it's held by the fec interrupt handler. > > I think the correct answer is that mdiobus_write() should be used, > which makes drivers/net/phy/phy.c horribly buggy, as it bypasses the > locking at the mdiobus level by doing this: Right, the correct way is to use mdiobus_write(), which takes the bus mutex. > > mmd_phy_indirect() > { > bus->write(bus, addr, MII_MMD_CTRL, devad); > bus->write(bus, addr, MII_MMD_DATA, prtad); > bus->write(bus, addr, MII_MMD_CTRL, (devad | MII_MMD_CTRL_NOINCR)); > } > > However, it's not as simple as that, because the whole set of writes > needs to be done atomically. The mdio bus lock needs to be taken around > the internals of phy_read_mmd_indirect() and phy_write_mmd_indirect(). Well, yes, the bus lock should be grabbed at the beginning and released at the end of this function at the very least; good catch. > > This bug can be provoked by running an ethtool command which accesses > the phy in a tight loop on an SMP platform. For example: > > while :; do ethtool --show-eee eth0; done > > Patch will follow tomorrow. The good thing is that it looks like you have isolated the only cases where we do not grab the MDIO bus mutex; the rest of the code, except phy_mmd_{read,write}_indirect(), looks correct.
-- Florian
Re: [v2 02/11] soc/fsl: Introduce DPAA BMan device management driver
On Wed, 2015-08-12 at 16:14 -0400, Roy Pledge wrote: > From: Geoff Thorpe > > This driver enables the Freescale DPAA 1.0 Buffer Manager block. BMan > is a hardware buffer pool manager that allows accelerators > connected to the SoC datapath to acquire and release buffers during > data processing. > > Signed-off-by: Geoff Thorpe > Signed-off-by: Emil Medve > Signed-off-by: Roy Pledge > --- > drivers/soc/Kconfig |1 + > drivers/soc/Makefile |1 + > drivers/soc/fsl/Kconfig |5 + > drivers/soc/fsl/Makefile |3 + > drivers/soc/fsl/qbman/Kconfig | 25 ++ > drivers/soc/fsl/qbman/Makefile|1 + > drivers/soc/fsl/qbman/bman.c | 553 > + > drivers/soc/fsl/qbman/bman_priv.h | 53 > drivers/soc/fsl/qbman/dpaa_sys.h | 55 > 9 files changed, 697 insertions(+) > create mode 100644 drivers/soc/fsl/Kconfig > create mode 100644 drivers/soc/fsl/Makefile > create mode 100644 drivers/soc/fsl/qbman/Kconfig > create mode 100644 drivers/soc/fsl/qbman/Makefile > create mode 100644 drivers/soc/fsl/qbman/bman.c > create mode 100644 drivers/soc/fsl/qbman/bman_priv.h > create mode 100644 drivers/soc/fsl/qbman/dpaa_sys.h > > diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig > index 96ddecb..4e3c8f4 100644 > --- a/drivers/soc/Kconfig > +++ b/drivers/soc/Kconfig > @@ -1,6 +1,7 @@ > menu "SOC (System On Chip) specific Drivers" > > source "drivers/soc/mediatek/Kconfig" > +source "drivers/soc/fsl/Kconfig" > source "drivers/soc/qcom/Kconfig" > source "drivers/soc/sunxi/Kconfig" > source "drivers/soc/ti/Kconfig" > diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile > index 7dc7c0d..7adcd97 100644 > --- a/drivers/soc/Makefile > +++ b/drivers/soc/Makefile > @@ -3,6 +3,7 @@ > # > > obj-$(CONFIG_ARCH_MEDIATEK) += mediatek/ > +obj-$(CONFIG_FSL_SOC)+= fsl/ > obj-$(CONFIG_ARCH_QCOM) += qcom/ > obj-$(CONFIG_ARCH_SUNXI) += sunxi/ > obj-$(CONFIG_ARCH_TEGRA) += tegra/ > diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig > new file mode 100644 > index 000..daa9c0d > --- /dev/null > +++ 
b/drivers/soc/fsl/Kconfig > @@ -0,0 +1,5 @@ > +menu "Freescale SOC (System On Chip) specific Drivers" > + > +source "drivers/soc/fsl/qbman/Kconfig" > + > +endmenu > diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile > new file mode 100644 > index 000..19e74bb > --- /dev/null > +++ b/drivers/soc/fsl/Makefile > @@ -0,0 +1,3 @@ > +# Common > +obj-$(CONFIG_FSL_DPA)+= qbman/ > + > diff --git a/drivers/soc/fsl/qbman/Kconfig b/drivers/soc/fsl/qbman/Kconfig > new file mode 100644 > index 000..be4ae01 > --- /dev/null > +++ b/drivers/soc/fsl/qbman/Kconfig > @@ -0,0 +1,25 @@ > +menuconfig FSL_DPA > + bool "Freescale DPAA support" > + depends on FSL_SOC || COMPILE_TEST > + default n Drop the COMPILE_TEST -- this driver still has PPCisms that will break the build elsewhere. > + help > + FSL Data-Path Acceleration Architecture drivers > + > + These are not the actual Ethernet driver(s) > + > +if FSL_DPA > + > +config FSL_DPA_CHECKING > + bool "additional driver checking" > + default n > + help > + Compiles in additional checks to sanity-check the drivers and > + any use of it by other code. Not recommended for performance > + > +config FSL_BMAN > + tristate "BMan device management" > + default n > + help > + FSL DPAA BMan driver Please describe here what BMan is and when it should be enabled. Why isn't it always enabled when DPA is enabled? > +endif # FSL_DPA > diff --git a/drivers/soc/fsl/qbman/Makefile b/drivers/soc/fsl/qbman/Makefile > new file mode 100644 > index 000..02014d9 > --- /dev/null > +++ b/drivers/soc/fsl/qbman/Makefile > @@ -0,0 +1 @@ > +obj-$(CONFIG_FSL_BMAN) += bman.o > diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c > new file mode 100644 > index 000..9a500ce > --- /dev/null > +++ b/drivers/soc/fsl/qbman/bman.c > @@ -0,0 +1,553 @@ > +/* Copyright (c) 2009 - 2015 Freescale Semiconductor, Inc. 
> + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions are > met: > + * * Redistributions of source code must retain the above copyright > + *notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + *notice, this list of conditions and the following disclaimer in the > + *documentation and/or other materials provided with the distribution. > + * * Neither the name of Freescale Semiconductor nor the > + *names of its contributors may be used to endorse or promote products > + *derived from this software without specific prior writte
Correct way to access MDIO bus - phy.c seems buggy
Hi, While trying to track down instability in the FEC driver, I've come across this question: what is the correct way to access the MDIO bus? Is it via: bus->write() where 'bus' is a struct mii_bus, or should it be via mdiobus_write()? What I'm seeing in the FEC driver is two threads trying to access the MDIO bus simultaneously - one thread trying to do a read, and another trying to do a write. The result is far from pretty with the current mainline code, because we can end up re-initialising a spinlock while it's held by the fec interrupt handler. I think the correct answer is that mdiobus_write() should be used, which makes drivers/net/phy/phy.c horribly buggy, as it bypasses the locking at the mdiobus level by doing this: mmd_phy_indirect() { bus->write(bus, addr, MII_MMD_CTRL, devad); bus->write(bus, addr, MII_MMD_DATA, prtad); bus->write(bus, addr, MII_MMD_CTRL, (devad | MII_MMD_CTRL_NOINCR)); } However, it's not as simple as that, because the whole set of writes needs to be done atomically. The mdio bus lock needs to be taken around the internals of phy_read_mmd_indirect() and phy_write_mmd_indirect(). This bug can be provoked by running an ethtool command which accesses the phy in a tight loop on an SMP platform. For example: while :; do ethtool --show-eee eth0; done Patch will follow tomorrow. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net.
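Russell's point is that the address-setup writes and the data access must form a single critical section. The following minimal user-space model illustrates the idea, with a pthread mutex standing in for the bus mutex that mdiobus_write() takes. Every name prefixed toy_ is hypothetical, and this is not the patch that followed. The register numbers are those of the standard Clause 22 MMD access registers.

```c
#include <pthread.h>
#include <stdint.h>

/* Values match MII_MMD_CTRL, MII_MMD_DATA and MII_MMD_CTRL_NOINCR
 * in <linux/mii.h>. */
enum {
	MMD_CTRL = 0x0d,
	MMD_DATA = 0x0e,
	MMD_CTRL_NOINCR = 0x4000,
};

/* Hypothetical toy bus: 'regs' models one PHY's registers and 'lock'
 * plays the role of the MDIO bus mutex. */
struct toy_mdio_bus {
	pthread_mutex_t lock;
	uint16_t regs[32];
};

/* Unlocked accessors, analogous to calling bus->write()/bus->read(). */
static void raw_write(struct toy_mdio_bus *bus, int reg, uint16_t val)
{
	bus->regs[reg] = val;
}

static uint16_t raw_read(struct toy_mdio_bus *bus, int reg)
{
	return bus->regs[reg];
}

/* Indirect MMD read: the three setup writes and the final read form one
 * critical section, so no other thread can slip an access in between.
 * In this toy model the data register simply echoes the last value
 * written, so the result is 'prtad'; a real PHY returns register data. */
uint16_t toy_mmd_read_indirect(struct toy_mdio_bus *bus, uint16_t devad,
			       uint16_t prtad)
{
	uint16_t val;

	pthread_mutex_lock(&bus->lock);
	raw_write(bus, MMD_CTRL, devad);
	raw_write(bus, MMD_DATA, prtad);
	raw_write(bus, MMD_CTRL, devad | MMD_CTRL_NOINCR);
	val = raw_read(bus, MMD_DATA);
	pthread_mutex_unlock(&bus->lock);
	return val;
}
```

Taking the lock once around the whole sequence, rather than inside each individual write, is what prevents a concurrent access from landing between the address and data phases.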
Re: [PATCH v2 net-next] r8169: Add values missing in @get_stats64 from HW counters
Corinna Vinschen : > On Aug 22 13:23, Francois Romieu wrote: [...] > > Sorry, my English was really bad: > > > > the code should propagate failure when rtl8169_reset_counters and > > rtl8169_update_counters *simultaneously* fail. > > Uhm... sorry, but that still doesn't answer the question. As you can > see in my patch, the initialization at open time is already encapsulated > in a function rtl8169_init_counter_offsets. I have read your patch, I have already answered the question and I have already said that it wasn't a showstopper. -- Ueimor
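Francois's requirement (report an error only when rtl8169_reset_counters() and rtl8169_update_counters() both fail) reduces to a simple rule: either call succeeding is enough to establish usable counter offsets. Below is a minimal sketch, assuming the outcome of each call is available as a boolean. The function name is hypothetical and this is not the r8169 code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: combine the outcomes of the counter reset and
 * counter update steps. Failure is propagated only when both failed. */
int counter_offsets_status(bool reset_ok, bool update_ok)
{
	if (!reset_ok && !update_ok)
		return -EIO; /* illustrative errno choice */
	return 0;
}
```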
Re: iproute2: Behavioural Bug?
Akshat Kakkar wrote: [ CC Cong ] > When I am trying to delete a single tc filter (i.e. specifying its > handle), it is deleting all the filters with the same priority/preference, i.e. it is ignoring the > handle specified. > > But when I am doing a similar activity in hashtable 800:, it is deleting only the > specified filter, i.e. it is behaving as expected. > > I am unable to comprehend the reason for this difference in behaviour. > > In fact, in kernel 2.6.32 all is working as expected. However, in > kernel 3.1 and 4.1 it is having the behaviour as mentioned above. > > For example, the following set of commands creates a hashtable 15: and adds > 2 filters to it. > > tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256 > tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10 > tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10 > > Now the following command DELETES ALL THE FILTERS, though it should only > delete FILTER 15:2:3! > tc filter del dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 > > The output of tc filter show eth0 in this case is blank, as all filters are deleted. Happens since 1e052be69d045c8d0f82ff1116fd3e5a79661745 ("net_sched: destroy proto tp when all filters are gone").
Re: [v2 00/11] Freescale DPAA QBMan Drivers
On Wed, 2015-08-12 at 16:14 -0400, Roy Pledge wrote: > The Freescale Data Path Acceleration Architecture (DPAA) is a set of > hardware components on specific QorIQ multicore processors. This > architecture provides the infrastructure to support simplified sharing of > networking interfaces and accelerators by multiple CPU cores and the > accelerators. > > The Queue Manager (QMan) is a hardware queue management block that allows > software and accelerators on the datapath to enqueue and dequeue frames in > order to communicate. > > The Buffer Manager (BMan) is a hardware buffer pool management block that > allows software and accelerators on the datapath to acquire and release > buffers in order to build frames. > > This patch set introduces the QBMan driver code that configures and initializes > the QBMan hardware and provides APIs for software to use the frame queues > and buffer pools the blocks provide. These drivers provide the base > functionality for software to communicate with the other DPAA accelerators > on Freescale QorIQ processors. > > Changes from v1: > - Cleanup Kconfig options > - Changed base QMan and BMan drivers to only be built in. > Will add loadable support in future patch CONFIG_FSL_BMAN is tristate -- is it not expected to work if you select 'm'? > - Replace panic() call with WARN_ON() panic() is still there. > > > - Replaced PowerPC specific IO accessors with platform independent > > versions PowerPC accessors, and other PPC-specific things like cache flushing and memory barriers, are still there. -Scott
Re: [PATCH v3 net-next 3/8] tunnel: introduce udp_tun_rx_dst()
On Mon, Aug 24, 2015 at 10:43 AM, Pravin B Shelar wrote: > Introduce function udp_tun_rx_dst() to initialize tunnel dst on > receive path. > > Signed-off-by: Pravin B Shelar Reviewed-by: Jesse Gross
Re: [PATCH v2] net: Fix RCU splat in af_key
From: David Ahern Date: Mon, 24 Aug 2015 15:17:17 -0600 > Hit the following splat testing VRF change for ipsec: ... > In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes > the RCU lock. > > Since pfkey_broadcast takes the RCU lock the allocation argument is > pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock. > The one call outside of rcu can be done with GFP_KERNEL. > > Fixes: 7f6b9dbd5afbd ("af_key: locking change") > Signed-off-by: David Ahern > --- > v2 > - removed allocation arg and hardcoded to GFP_ATOMIC during rcu locking Applied, thanks David.
Re: [PATCH v2] net: Fix RCU splat in af_key
On Mon, 2015-08-24 at 15:17 -0600, David Ahern wrote: > Hit the following splat testing VRF change for ipsec: > > In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes > the RCU lock. > > Since pfkey_broadcast takes the RCU lock the allocation argument is > pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock. > The one call outside of rcu can be done with GFP_KERNEL. > > Fixes: 7f6b9dbd5afbd ("af_key: locking change") > Signed-off-by: David Ahern > --- > v2 > - removed allocation arg and hardcoded to GFP_ATOMIC during rcu locking Acked-by: Eric Dumazet
[PATCH net-next] MAINTAINERS: update vmxnet3 driver maintainer
Shreyas Bhatewara will no longer maintain the vmxnet3 driver; I am taking over the role of vmxnet3 maintainer. Signed-off-by: Shrikrishna Khare Signed-off-by: Shreyas Bhatewara --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 4e6dcb6..2963a89 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11053,7 +11053,7 @@ F: drivers/input/mouse/vmmouse.c F: drivers/input/mouse/vmmouse.h VMWARE VMXNET3 ETHERNET DRIVER -M: Shreyas Bhatewara +M: Shrikrishna Khare M: "VMware, Inc." L: netdev@vger.kernel.org S: Maintained -- 1.8.5.6
[PATCH v2] net: Fix RCU splat in af_key
Hit the following splat testing VRF change for ipsec: [ 113.475692] === [ 113.476194] [ INFO: suspicious RCU usage. ] [ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted [ 113.477545] --- [ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section! [ 113.479288] [ 113.479288] other info that might help us debug this: [ 113.479288] [ 113.480207] [ 113.480207] rcu_scheduler_active = 1, debug_locks = 1 [ 113.480931] 2 locks held by setkey/6829: [ 113.481371] #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [] pfkey_sendmsg+0xfb/0x213 [ 113.482509] #1: (rcu_read_lock){..}, at: [] rcu_read_lock+0x0/0x6e [ 113.483509] [ 113.483509] stack backtrace: [ 113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED [ 113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 113.486845] 0001 88001d4c7a98 81518af2 81086962 [ 113.487732] 88001d538480 88001d4c7ac8 8107ae75 8180a154 [ 113.488628] 0b30 00d0 88001d4c7ad8 [ 113.489525] Call Trace: [ 113.489813] [] dump_stack+0x4c/0x65 [ 113.490389] [] ? console_unlock+0x3d6/0x405 [ 113.491039] [] lockdep_rcu_suspicious+0xfa/0x103 [ 113.491735] [] rcu_preempt_sleep_check+0x45/0x47 [ 113.492442] [] ___might_sleep+0x19/0x1c8 [ 113.493077] [] __might_sleep+0x6c/0x82 [ 113.493681] [] cache_alloc_debugcheck_before.isra.50+0x1d/0x24 [ 113.494508] [] kmem_cache_alloc+0x31/0x18f [ 113.495149] [] skb_clone+0x64/0x80 [ 113.495712] [] pfkey_broadcast_one+0x3d/0xff [ 113.496380] [] pfkey_broadcast+0xb5/0x11e [ 113.497024] [] pfkey_register+0x191/0x1b1 [ 113.497653] [] pfkey_process+0x162/0x17e [ 113.498274] [] pfkey_sendmsg+0x109/0x213 In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes the RCU lock. 
Since pfkey_broadcast takes the RCU lock the allocation argument is pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock. The one call outside of rcu can be done with GFP_KERNEL. Fixes: 7f6b9dbd5afbd ("af_key: locking change") Signed-off-by: David Ahern --- v2 - removed allocation arg and hardcoded to GFP_ATOMIC during rcu locking net/key/af_key.c | 46 +++--- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/net/key/af_key.c b/net/key/af_key.c index b397f0aa9005..83a70688784b 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -219,7 +219,7 @@ static int pfkey_broadcast_one(struct sk_buff *skb, struct sk_buff **skb2, #define BROADCAST_ONE 1 #define BROADCAST_REGISTERED 2 #define BROADCAST_PROMISC_ONLY 4 -static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation, +static int pfkey_broadcast(struct sk_buff *skb, int broadcast_flags, struct sock *one_sk, struct net *net) { @@ -244,7 +244,7 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation, * socket. 
*/ if (pfk->promisc) - pfkey_broadcast_one(skb, &skb2, allocation, sk); + pfkey_broadcast_one(skb, &skb2, GFP_ATOMIC, sk); /* the exact target will be processed later */ if (sk == one_sk) @@ -259,7 +259,7 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation, continue; } - err2 = pfkey_broadcast_one(skb, &skb2, allocation, sk); + err2 = pfkey_broadcast_one(skb, &skb2, GFP_ATOMIC, sk); /* Error is cleare after succecful sending to at least one * registered KM */ @@ -269,7 +269,7 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation, rcu_read_unlock(); if (one_sk != NULL) - err = pfkey_broadcast_one(skb, &skb2, allocation, one_sk); + err = pfkey_broadcast_one(skb, &skb2, GFP_KERNEL, one_sk); kfree_skb(skb2); kfree_skb(skb); @@ -292,7 +292,7 @@ static int pfkey_do_dump(struct pfkey_sock *pfk) hdr = (struct sadb_msg *) pfk->dump.skb->data; hdr->sadb_msg_seq = 0; hdr->sadb_msg_errno = rc; - pfkey_broadcast(pfk->dump.skb, GFP_ATOMIC, BROADCAST_ONE, + pfkey_broadcast(pfk->dump.skb, BROADCAST_ONE, &pfk->sk, sock_net(&pfk->sk)); pfk->dump.skb = NULL; } @@ -333,7 +333,7 @@ static int pfkey_error(const struct sadb_msg *orig, int err, struct sock *sk) hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof
Re: [PATCH v3 00/22] FUJITSU Extended Socket network device driver
From: Taku Izumi Date: Fri, 21 Aug 2015 17:28:00 +0900 > This patch set adds the FUJITSU Extended Socket network device driver. > The Extended Socket network device is a shared-memory-based high-speed > network interface between Extended Partitions of the PRIMEQUEST 2000 E2 > series. > > You can get some information about Extended Partition and Extended > Socket by referring to the following manual. > > http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf > 3.2.1 Extended Partitioning > 3.2.2 Extended Socket > > v2.2 -> v3: > - Fix up according to David's comment (No functional change) Series applied, thank you.
Re: [PATCH iproute2] add support for brief output for link and addresses
On Mon, 24 Aug 2015 20:41:16 + Andy Gospodarek wrote: > This adds support for slightly less output than is normally provided by > 'ip link show' and 'ip addr show'. This is a bit better when you have a > host with lots of interfaces. Sample output: > > $ ip -br link show > lo UNKNOWN 00:00:00:00:00:00 > p7p1 UP 08:00:27:9d:62:9f > p8p1 DOWN 08:00:27:dc:d8:ca > > p9p1 UP 08:00:27:76:d9:75 > p7p1.100@p7p1 UP 08:00:27:9d:62:9f > > $ ip -br -4 addr show > lo UNKNOWN 127.0.0.1/8 > p7p1 UP 70.0.0.1/24 > p8p1 DOWN 80.0.0.1/24 > p7p1.100@p7p1 UP 200.0.0.1/24 > > $ ip -br -6 addr show > lo UNKNOWN ::1/128 > p7p1 UP 7000::1/8 fe80::a00:27ff:fe9d:629f/64 > p8p1 DOWN 8000::1/8 > p9p1 UP fe80::a00:27ff:fe76:d975/64 > p7p1.100@p7p1 UP fe80::a00:27ff:fe9d:629f/64 > > $ ip -br addr show p7p1 > p7p1 UP 70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64 > > Signed-off-by: Andy Gospodarek Cool, we could colorize this as well :-)
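The brief mode's column layout comes down to fixed-width printf-style fields. The patch's print_operstate() change, for instance, prints the operational state with "%-7s ". The following is a hypothetical, self-contained formatter that writes one line of 'ip -br link show'-style output into a buffer. The 16-character name column is an assumption, not taken from the patch. The state names follow the kernel's IF_OPER_* ordering, as used by the oper_states[] table in ipaddress.c.

```c
#include <stdio.h>

/* Operational state names indexed by IF_OPER_* value (0..6). */
static const char *const oper_state_names[] = {
	"UNKNOWN", "NOTPRESENT", "DOWN", "LOWERLAYERDOWN",
	"TESTING", "DORMANT", "UP",
};

/* Hypothetical helper: format "<name> <state> <lladdr>" with padded
 * columns. The name width of 16 is an assumption; the state column
 * uses the "%-7s " width from the patch. Returns snprintf()'s result. */
int format_brief_link(char *buf, size_t len, const char *ifname,
		      unsigned int state, const char *lladdr)
{
	char st[16];

	if (state < sizeof(oper_state_names) / sizeof(oper_state_names[0]))
		snprintf(st, sizeof(st), "%s", oper_state_names[state]);
	else
		snprintf(st, sizeof(st), "%#x", state); /* unknown state */

	return snprintf(buf, len, "%-16s %-7s %s", ifname, st, lladdr);
}
```

Fixed-width fields keep the columns aligned without a second pass over all interfaces. The trade-off is that names longer than the width push the remaining columns to the right.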
Re: [PATCH 2/2] usbnet: Fix a race between usbnet_stop() and the BH
Eugene Shatokhin writes: > The race may happen when a device (e.g. YOTA 4G LTE Modem) is > unplugged while the system is downloading a large file from the Net. > > Hardware breakpoints and Kprobes with delays were used to confirm that > the race does actually happen. > > The race is on skb_queue ('next' pointer) between usbnet_stop() > and rx_complete(), which, in turn, calls usbnet_bh(). > > Here is a part of the call stack with the code where the changes to the > queue happen. The line numbers are for the kernel 4.1.0: > > *0 __skb_unlink (skbuff.h:1517) > prev->next = next; > *1 defer_bh (usbnet.c:430) > spin_lock_irqsave(&list->lock, flags); > old_state = entry->state; > entry->state = state; > __skb_unlink(skb, list); > spin_unlock(&list->lock); > spin_lock(&dev->done.lock); > __skb_queue_tail(&dev->done, skb); > if (dev->done.qlen == 1) > tasklet_schedule(&dev->bh); > spin_unlock_irqrestore(&dev->done.lock, flags); > *2 rx_complete (usbnet.c:640) > state = defer_bh(dev, skb, &dev->rxq, state); > > At the same time, the following code repeatedly checks if the queue is > empty and reads these values concurrently with the above changes: > > *0 usbnet_terminate_urbs (usbnet.c:765) > /* maybe wait for deletions to finish. */ > while (!skb_queue_empty(&dev->rxq) > && !skb_queue_empty(&dev->txq) > && !skb_queue_empty(&dev->done)) { > schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS)); > set_current_state(TASK_UNINTERRUPTIBLE); > netif_dbg(dev, ifdown, dev->net, > "waited for %d urb completions\n", temp); > } > *1 usbnet_stop (usbnet.c:806) > if (!(info->flags & FLAG_AVOID_UNLINK_URBS)) > usbnet_terminate_urbs(dev); > > As a result, it is possible, for example, that the skb is removed from > dev->rxq by __skb_unlink() before the check > "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is > also possible in this case that the skb is added to dev->done queue > after "!skb_queue_empty(&dev->done)" is checked. 
So > usbnet_terminate_urbs() may stop waiting and return while dev->done > queue still has an item. Exactly what problem will that result in? The tasklet_kill() will wait for the processing of the single element done queue, and everything will be fine. Or? Bjørn
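The window Eugene describes exists because the skb is briefly in neither queue while the emptiness checks take no locks. One general technique for closing such a window is to move an entry between two queues while holding both queue locks, and to check emptiness under the same locks in the same order. The user-space sketch below illustrates this. It is a hypothetical model with counters in place of sk_buff lists, not the usbnet code or the eventual fix.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical toy queue: reduced to a length protected by a lock. */
struct toy_queue {
	pthread_mutex_t lock;
	int len;
};

/* Move one entry from 'from' to 'to' while holding both locks, so an
 * observer that also takes both locks never sees it in neither queue. */
void toy_move(struct toy_queue *from, struct toy_queue *to)
{
	pthread_mutex_lock(&from->lock);
	pthread_mutex_lock(&to->lock);
	from->len--;
	to->len++;
	pthread_mutex_unlock(&to->lock);
	pthread_mutex_unlock(&from->lock);
}

/* Check both queues under both locks, taken in the same order as
 * toy_move(a, b) to avoid a lock-order inversion. */
bool toy_all_empty(struct toy_queue *a, struct toy_queue *b)
{
	bool empty;

	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);
	empty = (a->len == 0 && b->len == 0);
	pthread_mutex_unlock(&b->lock);
	pthread_mutex_unlock(&a->lock);
	return empty;
}
```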
[PATCH iproute2] add support for brief output for link and addresses
This adds support for slightly less output than is normally provided by 'ip link show' and 'ip addr show'. This is a bit better when you have a host with lots of interfaces. Sample output: $ ip -br link show lo UNKNOWN 00:00:00:00:00:00 p7p1 UP 08:00:27:9d:62:9f p8p1 DOWN 08:00:27:dc:d8:ca p9p1 UP 08:00:27:76:d9:75 p7p1.100@p7p1UP 08:00:27:9d:62:9f $ ip -br -4 addr show lo UNKNOWN 127.0.0.1/8 p7p1 UP 70.0.0.1/24 p8p1 DOWN 80.0.0.1/24 p7p1.100@p7p1UP 200.0.0.1/24 $ ip -br -6 addr show lo UNKNOWN ::1/128 p7p1 UP 7000::1/8 fe80::a00:27ff:fe9d:629f/64 p8p1 DOWN 8000::1/8 p9p1 UP fe80::a00:27ff:fe76:d975/64 p7p1.100@p7p1UP fe80::a00:27ff:fe9d:629f/64 $ ip -br addr show p7p1 p7p1 UP 70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64 Signed-off-by: Andy Gospodarek --- include/utils.h | 1 + ip/ip.c | 5 +- ip/ip_common.h| 3 + ip/ipaddress.c| 149 ++ ip/iplink.c | 5 +- man/man8/ip-link.8.in | 3 +- 6 files changed, 141 insertions(+), 25 deletions(-) diff --git a/include/utils.h b/include/utils.h index 0c57ccd..f77edeb 100644 --- a/include/utils.h +++ b/include/utils.h @@ -19,6 +19,7 @@ extern int show_details; extern int show_raw; extern int resolve_hosts; extern int oneline; +extern int brief; extern int timestamp; extern int timestamp_short; extern const char * _SL_; diff --git a/ip/ip.c b/ip/ip.c index e75447e..eea00b8 100644 --- a/ip/ip.c +++ b/ip/ip.c @@ -32,6 +32,7 @@ int show_stats; int show_details; int resolve_hosts; int oneline; +int brief; int timestamp; const char *_SL_; int force; @@ -55,7 +56,7 @@ static void usage(void) "-h[uman-readable] | -iec |\n" "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | link } |\n" "-4 | -6 | -I | -D | -B | -0 |\n" -"-l[oops] { maximum-addr-flush-attempts } |\n" +"-l[oops] { maximum-addr-flush-attempts } | -br[ief] |\n" "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] [filename] |\n" "-rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}\n"); exit(-1); @@ -250,6 +251,8 @@ int main(int argc, char **argv) if (argc <= 1) 
usage(); batch_file = argv[1]; + } else if (matches(opt, "-brief") == 0) { + ++brief; } else if (matches(opt, "-rcvbuf") == 0) { unsigned int size; diff --git a/ip/ip_common.h b/ip/ip_common.h index f120f5b..f74face 100644 --- a/ip/ip_common.h +++ b/ip/ip_common.h @@ -2,6 +2,9 @@ extern int get_operstate(const char *name); extern int print_linkinfo(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg); +extern int print_linkinfo_brief(const struct sockaddr_nl *who, + struct nlmsghdr *n, + void *arg); extern int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg); diff --git a/ip/ipaddress.c b/ip/ipaddress.c index 13d9c46..84b453f 100644 --- a/ip/ipaddress.c +++ b/ip/ipaddress.c @@ -125,7 +125,10 @@ static void print_link_flags(FILE *fp, unsigned flags, unsigned mdown) fprintf(fp, "%x", flags); if (mdown) fprintf(fp, ",M-DOWN"); - fprintf(fp, "> "); + if (brief) + fprintf(fp, ">"); + else + fprintf(fp, "> "); } static const char *oper_states[] = { @@ -138,13 +141,17 @@ static void print_operstate(FILE *f, __u8 state) if (state >= sizeof(oper_states)/sizeof(oper_states[0])) fprintf(f, "state %#x ", state); else { - fprintf(f, "state "); - if (strcmp(oper_states[state], "UP") == 0) - color_fprintf(f, COLOR_OPERSTATE_UP, "%s ", oper_states[state]); - else if (strcmp(oper_states[state], "DOWN") == 0) - color_fprintf(f, COLOR_OPERSTATE_DOWN, "%s ", oper_states[state]); - else - fprintf(f, "%s ", oper_states[state]); + if (brief) { + fprintf(f, "%-7s ", oper_states[state]); + } else { + fprintf(f, "state "); + if (strcmp(oper_states[state], "UP") == 0) + color_fprintf(f, COLOR_OPERSTATE_UP, "%s ", oper_states[state]); + else if (strcmp(oper_states[state], "DOWN") == 0) + col
RE: [PATCH net-next 2/2] lan78xx: update eee code
Hi Florian, Thanks for the comments. Will update to utilize phylib. - Woojung > -Original Message- > From: Florian Fainelli [mailto:f.faine...@gmail.com] > Sent: Friday, August 21, 2015 5:57 PM > To: Woojung Huh - C21699; da...@davemloft.net > Cc: netdev@vger.kernel.org > Subject: Re: [PATCH net-next 2/2] lan78xx: update eee code > > On 21/08/15 14:41, woojung@microchip.com wrote: > > Patch to update EEE code. > > This really deserves a better explanation of what is it that you are > fixing here. > > > > > Signed-off-by: Woojung Huh > > --- > > drivers/net/usb/lan78xx.c | 44 --- > - > > drivers/net/usb/lan78xx.h | 22 +++--- > > 2 files changed, 35 insertions(+), 31 deletions(-) > > > > diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c > > index 4bcbf28..af102b0 100644 > > --- a/drivers/net/usb/lan78xx.c > > +++ b/drivers/net/usb/lan78xx.c > > @@ -1296,38 +1296,37 @@ static int lan78xx_get_eee(struct net_device > *net, struct ethtool_eee *edata) > > if (ret < 0) > > return ret; > > > > + buf = lan78xx_mmd_read(dev->net, dev->mii.phy_id, > > + PHY_MMD_DEV_7, PHY_EEE_ADVERTISEMENT); > > + adv = mmd_eee_adv_to_ethtool_adv_t(buf); > > + buf = lan78xx_mmd_read(dev->net, dev->mii.phy_id, > > + PHY_MMD_DEV_7, > PHY_EEE_LP_ADVERTISEMENT); > > + lpadv = mmd_eee_adv_to_ethtool_adv_t(buf); > > Considering your function signatures, it sounds like you should > implement a libphy driver and you could get things like phy_init_eee() > for free.
> > [snip] > > > /* enable PHY interrupts */ > > ret = lan78xx_read_reg(dev, INT_EP_CTL, &buf); > > buf |= INT_ENP_PHY_INT; > > diff --git a/drivers/net/usb/lan78xx.h b/drivers/net/usb/lan78xx.h > > index ae7562e..95e721b 100644 > > --- a/drivers/net/usb/lan78xx.h > > +++ b/drivers/net/usb/lan78xx.h > > @@ -1047,23 +1047,23 @@ > > #define PHY_MMD_DEV_3 3 > > > > #define PHY_EEE_PCS_STATUS (0x1) > > -#define PHY_EEE_PCS_STATUS_TX_LPI_RCVD_ > ((WORD)0x0800) > > -#define PHY_EEE_PCS_STATUS_RX_LPI_RCVD_ > ((WORD)0x0400) > > -#define PHY_EEE_PCS_STATUS_TX_LPI_IND_ > ((WORD)0x0200) > > -#define PHY_EEE_PCS_STATUS_RX_LPI_IND_ > ((WORD)0x0100) > > -#define PHY_EEE_PCS_STATUS_PCS_RCV_LNK_STS_ > ((WORD)0x0004) > > +#define PHY_EEE_PCS_STATUS_TX_LPI_RCVD_(0x0800) > > +#define PHY_EEE_PCS_STATUS_RX_LPI_RCVD_(0x0400) > > +#define PHY_EEE_PCS_STATUS_TX_LPI_IND_ (0x0200) > > +#define PHY_EEE_PCS_STATUS_RX_LPI_IND_ (0x0100) > > +#define PHY_EEE_PCS_STATUS_PCS_RCV_LNK_STS_(0x0004) > > Can you look at updating include/uapi/linux/mdio.h with the missing > registers for your use case instead of replicating this in a driver? > -- > Florian
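The quoted hunk passes the raw EEE advertisement words read from MMD device 7 to mmd_eee_adv_to_ethtool_adv_t(). That kind of conversion is a bit-by-bit mapping from the IEEE 802.3 EEE advertisement register to ethtool ADVERTISED_* bits, as sketched below. This reduced, hypothetical version handles only 100BASE-TX and 1000BASE-T. It defines its own constants, whose values are believed to match MDIO_EEE_100TX and MDIO_EEE_1000T in <linux/mdio.h> and ADVERTISED_100baseT_Full and ADVERTISED_1000baseT_Full in <linux/ethtool.h>. With a phylib driver, as Florian suggests, the driver would not need a private copy of any of this.

```c
#include <stdint.h>

/* Local constants; values believed to match <linux/mdio.h> and
 * <linux/ethtool.h>. */
#define TOY_MDIO_EEE_100TX      0x0002u
#define TOY_MDIO_EEE_1000T      0x0004u
#define TOY_ADV_100baseT_Full   (1u << 3)
#define TOY_ADV_1000baseT_Full  (1u << 5)

/* Hypothetical, reduced conversion from the EEE advertisement register
 * (MMD 7, register 60) to ethtool advertisement bits. Bits for other
 * link modes are ignored in this sketch. */
uint32_t toy_eee_adv_to_ethtool(uint16_t eee_adv)
{
	uint32_t adv = 0;

	if (eee_adv & TOY_MDIO_EEE_100TX)
		adv |= TOY_ADV_100baseT_Full;
	if (eee_adv & TOY_MDIO_EEE_1000T)
		adv |= TOY_ADV_1000baseT_Full;
	return adv;
}
```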
[PATCH 0/2] usbnet: Fix 2 problems in usbnet_stop()
The following problems, found while investigating races in the usbnet module, are fixed here: 1. The EVENT_NO_RUNTIME_PM bit of dev->flags should be read before it is cleared by "dev->flags = 0". Thanks to Oliver Neukum for spotting this problem and providing a fix. 2. A race on skb_queue between usbnet_stop() and usbnet_bh(). Compared to the combined patch I sent earlier ("[PATCH] usbnet: Fix two races between usbnet_stop() and the BH"), this patch set has the following changes: * The fix for handling of EVENT_NO_RUNTIME_PM is now in a separate patch. * The fix for the race on dev->flags has been removed because the race is not considered harmful. Regards, Eugene
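Item 1 is an ordering problem: a flag must be sampled before the statement that resets all flags. The following minimal sketch illustrates that ordering. The struct, function and bit index are hypothetical stand-ins, not the usbnet code.

```c
#include <stdbool.h>

/* Hypothetical bit index standing in for EVENT_NO_RUNTIME_PM. */
#define TOY_EVENT_NO_RUNTIME_PM 9

struct toy_usbnet {
	unsigned long flags;
};

/* Returns true when the "no runtime PM" bit was NOT set at stop time.
 * The bit is sampled before the wholesale reset; testing it after
 * 'dev->flags = 0' would always report it as clear. */
bool toy_stop(struct toy_usbnet *dev)
{
	bool no_pm_bit = dev->flags & (1UL << TOY_EVENT_NO_RUNTIME_PM);

	dev->flags = 0;
	return !no_pm_bit;
}
```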
RE: [PATCH v3 3/4] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl
> -Original Message- > From: Richard Cochran [mailto:richardcoch...@gmail.com] > Sent: Sunday, August 23, 2015 4:26 AM > To: Thomas Gleixner > Cc: Hall, Christopher S; Kirsher, Jeffrey T; h...@zytor.com; > mi...@redhat.com; john.stu...@linaro.org; x...@kernel.org; linux- > ker...@vger.kernel.org; netdev@vger.kernel.org; intel-wired- > l...@lists.osuosl.org; pet...@infradead.org > Subject: Re: [PATCH v3 3/4] Add support for driver cross-timestamp to > PTP_SYS_OFFSET ioctl > > On Sun, Aug 23, 2015 at 10:15:00AM +0200, Thomas Gleixner wrote: > > So why can't you take N samples from the synced hardware? It does not > > make any sense to me to switch to the imprecise mode if nsamples > 1. > > Ok, then I prefer to leave this "imprecise" method in place and ... > > > You can also provide a new IOCTL PTP_SYS_OFFSET_PRECISE which returns > > -ENOSYS if hardware timestamping is not available and avoid the whole > > nsamples dance for the case where we can get precise timestamps. > > have this for the new way. > > By keeping the imprecise method, we will be able to run both methods > on the new hardware. That will help to quantify how imprecise the old > method is. This means: remove code changes from the PTP_SYS_OFFSET ioctl and call getsynctime64() from a new ioctl PTP_SYS_OFFSET_PRECISE. Right? And use the same type (struct ptp_sys_offset) for the new ioctl? Or should a new simplified struct be used? Such as: struct precise_ptp_sys_offset { struct ptp_clock_time device; struct ptp_clock_time system; }; Does it make sense to keep the "cross-timestamp" capabilities flag as-is? > > Thanks, > Richard
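With a single precise cross-timestamp, the result reduces to one (device, system) pair and their difference, with no N-sample bracketing. A hedged userspace sketch: the field layout mirrors struct ptp_clock_time from <linux/ptp_clock.h>, while the demo_* names and the two-member struct follow the proposal in this mail and are not an existing ABI.

```c
#include <stdint.h>

/* Local mirror of struct ptp_clock_time (sec, nsec, reserved). */
struct demo_ptp_clock_time {
	int64_t  sec;
	uint32_t nsec;
	uint32_t reserved;
};

/* Shape of the simplified struct floated above; hypothetical, not merged ABI. */
struct demo_precise_ptp_sys_offset {
	struct demo_ptp_clock_time device;
	struct demo_ptp_clock_time system;
};

/* Convert a (sec, nsec) timestamp to signed nanoseconds. */
static int64_t demo_to_ns(const struct demo_ptp_clock_time *t)
{
	return t->sec * 1000000000LL + (int64_t)t->nsec;
}

/* Offset of the device clock relative to system time, in nanoseconds. */
static int64_t demo_device_minus_system_ns(const struct demo_precise_ptp_sys_offset *o)
{
	return demo_to_ns(&o->device) - demo_to_ns(&o->system);
}
```

Because both timestamps are captured at the same instant by the hardware, no delay estimate between two system reads is needed, which is the point of keeping the imprecise method separate for comparison.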
Re: [PATCH net-next] udp_offload: Allow device GRO without checksum-complete
On Mon, 2015-08-24 at 12:34 -0700, Tom Herbert wrote: > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index c0a15e7..1d91227 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -130,6 +130,9 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min); > atomic_long_t udp_memory_allocated; > EXPORT_SYMBOL(udp_memory_allocated); > > +int sysctl_udp_gro_nocsum_ok; > +EXPORT_SYMBOL(sysctl_udp_gro_nocsum_ok); > + 1) Why is this exported ? 2) I do not believe it is specific to UDP path. We could have the same sysctl for GRE or IPIP or XXX encaps ?
[PATCH 1/2] usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared
It is needed to check EVENT_NO_RUNTIME_PM bit of dev->flags in usbnet_stop(), but its value should be read before it is cleared when dev->flags is set to 0. The problem was spotted and the fix was provided by Oliver Neukum . Signed-off-by: Eugene Shatokhin --- drivers/net/usb/usbnet.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 3c86b10..e049857 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -778,7 +778,7 @@ int usbnet_stop (struct net_device *net) { struct usbnet *dev = netdev_priv(net); struct driver_info *info = dev->driver_info; - int retval, pm; + int retval, pm, mpn; clear_bit(EVENT_DEV_OPEN, &dev->flags); netif_stop_queue (net); @@ -809,6 +809,8 @@ int usbnet_stop (struct net_device *net) usbnet_purge_paused_rxq(dev); + mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags); + /* deferred work (task, timer, softirq) must also stop. * can't flush_scheduled_work() until we drop rtnl (later), * else workers could deadlock; so make workers a NOP. @@ -819,8 +821,7 @@ int usbnet_stop (struct net_device *net) if (!pm) usb_autopm_put_interface(dev->intf); - if (info->manage_power && - !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags)) + if (info->manage_power && mpn) info->manage_power(dev, 0); else usb_autopm_put_interface(dev->intf); -- 2.3.2
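To make the ordering problem concrete, here is a small userspace C sketch (not usbnet code). It assumes a hypothetical bit number and a non-atomic stand-in for test_and_clear_bit(), and it models only the "dev->flags = 0" wipe and the EVENT_NO_RUNTIME_PM test, ignoring info->manage_power and any locking.

```c
#include <stdbool.h>

/* Hypothetical bit number standing in for EVENT_NO_RUNTIME_PM. */
#define DEMO_EVENT_NO_RUNTIME_PM 10

/* Non-atomic userspace stand-in for the kernel's test_and_clear_bit(). */
static bool demo_test_and_clear_bit(int nr, unsigned long *flags)
{
	unsigned long mask = 1UL << nr;
	bool was_set = (*flags & mask) != 0;

	*flags &= ~mask;
	return was_set;
}

/* Old ordering: the flags are wiped before the bit is tested, so the
 * test always sees 0 and "mpn" is true regardless of the input. */
static bool old_order_mpn(unsigned long flags)
{
	flags = 0;	/* models "dev->flags = 0" */
	return !demo_test_and_clear_bit(DEMO_EVENT_NO_RUNTIME_PM, &flags);
}

/* Patched ordering: sample the bit first, then wipe the flags. */
static bool new_order_mpn(unsigned long flags)
{
	bool mpn = !demo_test_and_clear_bit(DEMO_EVENT_NO_RUNTIME_PM, &flags);

	flags = 0;	/* models "dev->flags = 0" */
	(void)flags;
	return mpn;
}
```

With the old ordering the result does not depend on the incoming flags at all, so a driver that set EVENT_NO_RUNTIME_PM would still get manage_power(dev, 0) called.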
[PATCH 2/2] usbnet: Fix a race between usbnet_stop() and the BH
The race may happen when a device (e.g. YOTA 4G LTE Modem) is unplugged while the system is downloading a large file from the Net. Hardware breakpoints and Kprobes with delays were used to confirm that the race does actually happen. The race is on skb_queue ('next' pointer) between usbnet_stop() and rx_complete(), which, in turn, calls usbnet_bh(). Here is a part of the call stack with the code where the changes to the queue happen. The line numbers are for the kernel 4.1.0: *0 __skb_unlink (skbuff.h:1517) prev->next = next; *1 defer_bh (usbnet.c:430) spin_lock_irqsave(&list->lock, flags); old_state = entry->state; entry->state = state; __skb_unlink(skb, list); spin_unlock(&list->lock); spin_lock(&dev->done.lock); __skb_queue_tail(&dev->done, skb); if (dev->done.qlen == 1) tasklet_schedule(&dev->bh); spin_unlock_irqrestore(&dev->done.lock, flags); *2 rx_complete (usbnet.c:640) state = defer_bh(dev, skb, &dev->rxq, state); At the same time, the following code repeatedly checks if the queue is empty and reads these values concurrently with the above changes: *0 usbnet_terminate_urbs (usbnet.c:765) /* maybe wait for deletions to finish. */ while (!skb_queue_empty(&dev->rxq) && !skb_queue_empty(&dev->txq) && !skb_queue_empty(&dev->done)) { schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS)); set_current_state(TASK_UNINTERRUPTIBLE); netif_dbg(dev, ifdown, dev->net, "waited for %d urb completions\n", temp); } *1 usbnet_stop (usbnet.c:806) if (!(info->flags & FLAG_AVOID_UNLINK_URBS)) usbnet_terminate_urbs(dev); As a result, it is possible, for example, that the skb is removed from dev->rxq by __skb_unlink() before the check "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is also possible in this case that the skb is added to dev->done queue after "!skb_queue_empty(&dev->done)" is checked. So usbnet_terminate_urbs() may stop waiting and return while dev->done queue still has an item. 
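The analysis above also depends on the shape of the old loop condition. The sketch below isolates just that boolean logic in plain C, with booleans standing in for "this queue is non-empty" (hypothetical helpers, not the sk_buff_head API). It does not model the unlocked concurrent reads, which are what the locking changes in the patch address.

```c
#include <stdbool.h>

/* Each argument says whether the corresponding queue (rxq, txq, done)
 * is still non-empty. */

/* Condition of the old loop: keep waiting only while ALL three queues
 * are non-empty, i.e. stop as soon as ANY single queue is empty. */
static bool old_loop_keeps_waiting(bool rxq, bool txq, bool done)
{
	return rxq && txq && done;
}

/* "Wait until everything has drained": keep waiting while ANY queue
 * is still non-empty. The patch instead waits on each queue in turn
 * with wait_skb_queue_empty(), checking emptiness under q->lock. */
static bool drained_wait_keeps_waiting(bool rxq, bool txq, bool done)
{
	return rxq || txq || done;
}
```

For the scenario described above (rxq already empty, an skb just moved to done), the old condition stops waiting while the "all drained" condition does not.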
Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid this race. Signed-off-by: Eugene Shatokhin --- drivers/net/usb/usbnet.c | 39 --- 1 file changed, 28 insertions(+), 11 deletions(-) diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index e049857..b4cf107 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -428,12 +428,18 @@ static enum skb_state defer_bh(struct usbnet *dev, struct sk_buff *skb, old_state = entry->state; entry->state = state; __skb_unlink(skb, list); - spin_unlock(&list->lock); - spin_lock(&dev->done.lock); + + /* defer_bh() is never called with list == &dev->done. +* spin_lock_nested() tells lockdep that it is OK to take +* dev->done.lock here with list->lock held. +*/ + spin_lock_nested(&dev->done.lock, SINGLE_DEPTH_NESTING); + __skb_queue_tail(&dev->done, skb); if (dev->done.qlen == 1) tasklet_schedule(&dev->bh); - spin_unlock_irqrestore(&dev->done.lock, flags); + spin_unlock(&dev->done.lock); + spin_unlock_irqrestore(&list->lock, flags); return old_state; } @@ -749,6 +755,20 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs); /*-*/ +static void wait_skb_queue_empty(struct sk_buff_head *q) +{ + unsigned long flags; + + spin_lock_irqsave(&q->lock, flags); + while (!skb_queue_empty(q)) { + spin_unlock_irqrestore(&q->lock, flags); + schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS)); + set_current_state(TASK_UNINTERRUPTIBLE); + spin_lock_irqsave(&q->lock, flags); + } + spin_unlock_irqrestore(&q->lock, flags); +} + // precondition: never called in_interrupt static void usbnet_terminate_urbs(struct usbnet *dev) { @@ -762,14 +782,11 @@ static void usbnet_terminate_urbs(struct usbnet *dev) unlink_urbs(dev, &dev->rxq); /* maybe wait for deletions to finish. 
*/ - while (!skb_queue_empty(&dev->rxq) - && !skb_queue_empty(&dev->txq) - && !skb_queue_empty(&dev->done)) { - schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS)); - set_current_state(TASK_UNINTERRUPTIBLE); - netif_dbg(dev, ifdown, dev->net, - "waited for %d urb completions\n", temp); - } + wait_skb_queue_empty(&dev->rxq); + wait_skb_queue_empty(&dev->txq); + wait_skb_queue_empty(&dev->done); + netif_dbg(dev, ifdown, dev->net, + "waited for %d urb completions\n", temp); set_current_state(TASK_RUNNING); remove_wait_queue(&dev->wait, &wait); } -- 2.3.2 -- To unsubscribe from this list: sen
[PATCH net-next] udp_offload: Allow device GRO without checksum-complete
This patch adds a sysctl which allows GRO for a UDP offload protocol to be performed in the device NAPI. This is potentially a performance improvement if the savings of doing GRO in device NAPI outweigh the cost of performing the checksum. Note that performing the checksum in device NAPI may negatively impact the latency or throughput of unrelated flows. Performance results for VXLAN are below. Allowing GRO in device NAPI does show a performance improvement over doing GRO at the VXLAN interface; however, this performance is still less than what we see with UDP checksums enabled (or when getting checksum-complete from the device). Test results (one netperf TCP_STREAM over VXLAN): No UDP checksum, sysctl enabled to allow GRO at the device (this patch): TX CPU: 1.71 RX CPU: 1.14 6174 Mbps; UDP checksums and remote checksum offload enabled: TX CPU: 1.97% RX CPU: 1.55% 7527 Mbps; UDP checksums enabled: TX CPU: 1.22% RX CPU: 1.86% 6539 Mbps; No UDP checksums, GRO enabled on VXLAN interface: TX CPU: 0.95% RX CPU: 1.78% 4393 Mbps; No UDP checksum, GRO disabled on VXLAN interface: TX CPU: 1.31% RX CPU: 2.38% 3613 Mbps. Signed-off-by: Tom Herbert --- Documentation/networking/ip-sysctl.txt | 7 +++ include/net/udp.h | 1 + net/ipv4/sysctl_net_ipv4.c | 7 +++ net/ipv4/udp.c | 3 +++ net/ipv4/udp_offload.c | 7 --- 5 files changed, 22 insertions(+), 3 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 46e88ed..d8563c08 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -711,6 +711,13 @@ udp_wmem_min - INTEGER total pages of UDP sockets exceed udp_mem pressure. The unit is byte. Default: 1 page +udp_gro_nocsum_ok - BOOLEAN + If set, allow Generic Receive Offload (GRO) to be performed for UDP + offload protocols in the case that packets are being received + without an offloaded checksum.
This implies that packets checksums + may be performed in the device NAPI routines which could negatively + impact unrelated flows. + CIPSOv4 Variables: cipso_cache_enable - BOOLEAN diff --git a/include/net/udp.h b/include/net/udp.h index 6d4ed18..48eb6ae 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -103,6 +103,7 @@ extern atomic_long_t udp_memory_allocated; extern long sysctl_udp_mem[3]; extern int sysctl_udp_rmem_min; extern int sysctl_udp_wmem_min; +extern int sysctl_udp_gro_nocsum_ok; struct sk_buff; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 0330ab2..65fea78 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -766,6 +766,13 @@ static struct ctl_table ipv4_table[] = { .proc_handler = proc_dointvec_minmax, .extra1 = &one }, + { + .procname = "udp_gro_nocsum_ok", + .data = &sysctl_udp_gro_nocsum_ok, + .maxlen = sizeof(sysctl_udp_gro_nocsum_ok), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + }, { } }; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index c0a15e7..1d91227 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -130,6 +130,9 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min); atomic_long_t udp_memory_allocated; EXPORT_SYMBOL(udp_memory_allocated); +int sysctl_udp_gro_nocsum_ok; +EXPORT_SYMBOL(sysctl_udp_gro_nocsum_ok); + #define MAX_UDP_PORTS 65536 #define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN) diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index f938616..1666f44 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -300,9 +300,10 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb, int flush = 1; if (NAPI_GRO_CB(skb)->udp_mark || - (skb->ip_summed != CHECKSUM_PARTIAL && -NAPI_GRO_CB(skb)->csum_cnt == 0 && -!NAPI_GRO_CB(skb)->csum_valid)) + ((skb->ip_summed != CHECKSUM_PARTIAL && + NAPI_GRO_CB(skb)->csum_cnt == 0 && + !NAPI_GRO_CB(skb)->csum_valid) && + !sysctl_udp_gro_nocsum_ok)) goto out; /* mark that this skb passed 
once through the udp gro layer */ -- 1.8.1
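The patched gate in udp_gro_receive() can be read as a small truth table. This is a hedged model with hypothetical demo_* names: each field stands in for the corresponding skb or NAPI_GRO_CB() value, and the function returns true when the code would "goto out" and skip UDP GRO.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the fields the patched check reads. */
struct demo_gro_skb {
	bool udp_mark;      /* NAPI_GRO_CB(skb)->udp_mark */
	bool csum_partial;  /* skb->ip_summed == CHECKSUM_PARTIAL */
	int  csum_cnt;      /* NAPI_GRO_CB(skb)->csum_cnt */
	bool csum_valid;    /* NAPI_GRO_CB(skb)->csum_valid */
};

/* True when the patched condition skips GRO for this skb. */
static bool demo_udp_gro_skips(const struct demo_gro_skb *s, bool nocsum_ok)
{
	bool no_csum_info = !s->csum_partial &&
			    s->csum_cnt == 0 &&
			    !s->csum_valid;

	return s->udp_mark || (no_csum_info && !nocsum_ok);
}
```

The sysctl only changes the outcome for packets without any checksum information; an skb already marked by the UDP GRO layer is still skipped.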
Re: [PATCH net-next] 3c59x: Add BQL support for 3c59x ethernet driver.
From: Loganaden Velvindron Date: Thu, 20 Aug 2015 19:22:18 -0700 > This BQL patch is based on work done by Tino Reichardt. > > Tested on :05:00.0: 3Com PCI 3c905C Tornado at c9e6e000 by running > Flent several times. > > > Signed-off-by: Loganaden Velvindron Applied, thanks.
Re: iproute2: Behavioural Bug?
On Mon, Aug 24, 2015 at 02:00:29PM +0530, Akshat Kakkar wrote: > When I am trying to delete a single tc filter (i.e. specifying its > handle), it is deleting all the > filters with the same priority/preference. i.e. it is ignoring the > handle specified. > > But, When I am doing similar activity in hashtable 800: it is deleting only > the > specified filter, i.e. it is behaving as expected. > > I am unable to comprehend the reason for this difference in behaviour. > > Infact, in kernel 2.6.32 all is working as expected. However, in > kernel 3.1 and 4.1 it is having the behaviour as mentioned above. > > For example, following set of commands create a hashtable 15: and add > 2 filters to it. > > tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor > 256 > tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32 > ht 15:2: match ip src 10.0.0.2 flowid 1:10 > tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 > ht 15:2: match ip src 10.0.0.3 flowid 1:10 > > Now following command DELETES ALL THE FILTERS, though it should only > delete FILTER 15:2:3 ! > tc filter del dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 > > O/p of tc filter show eth0 is this case is blank. As all filters are deleted. > > > However, similar commands when executed for hashtable 800: is deleting > only the specified filter > tc filter add dev eth0 protocol ip parent 1: prio 5 handle 800:0:2 u32 > ht 800:0: match ip src 10.0.0.2 flowid 1:10 > tc filter add dev eth0 protocol ip parent 1: prio 5 handle 800:0:3 u32 > ht 800:0: match ip src 10.0.0.3 flowid 1:10 > > tc filter del dev eth0 protocol ip parent 1: prio 5 handle 800:0:2 u32 > > Above mentioned command only deletes single filter. 
> O/p of tc filter show eth0 is 2nd case is > > filter parent 1: protocol ip pref 5 u32 > filter parent 1: protocol ip pref 5 u32 fh 800: ht divisor 1 > filter parent 1: protocol ip pref 5 u32 fh 800::3 order 3 key ht 800 > bkt 0 flowid 1:10 > match 0a03/ at 12 Hi, That's what I got using this script where I copied your commands: -- #!/bin/bash DEV=dummy0 ip link del $DEV 2> /dev/null ip link add dev $DEV type dummy tc qdisc add dev $DEV root handle 1: htb tc filter add dev $DEV parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256 tc filter add dev $DEV protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10 tc filter add dev $DEV protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10 tc filter del dev $DEV protocol ip parent 1: prio 5 handle 15:2:3 u32 tc filter show dev $DEV # - Result is: filter parent 1: protocol ip pref 5 u32 filter parent 1: protocol ip pref 5 u32 fh 15: ht divisor 256 filter parent 1: protocol ip pref 5 u32 fh 15:2:2 order 2 key ht 15 bkt 2 flowid 1:10 match 0a02/ at 12 filter parent 1: protocol ip pref 5 u32 fh 800: ht divisor 1 Some additional info: # tc -V tc utility, iproute2-ss150413 # uname -a Linux angus-think 4.0.4-2-ARCH #1 SMP PREEMPT Fri May 22 03:05:23 UTC 2015 x86_64 GNU/Linux Regards, Vadim Kochan
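In the commands above each u32 handle is written htid:hash:node and, as far as we understand iproute2's u32 parser, each field is read as hexadecimal, so "15:" names hash table 0x15 and "800:" is the default root table. The sketch below packs and unpacks a 32-bit handle using the same masks as the TC_U32_HTID/TC_U32_HASH/TC_U32_NODE macros in include/uapi/linux/pkt_cls.h; the demo_* names are local copies so the example compiles on its own.

```c
#include <stdint.h>

/* Bit layout: 12-bit hash-table id | 8-bit bucket | 12-bit node. */
#define DEMO_U32_HTID(h)     ((h) & 0xFFF00000u)
#define DEMO_U32_USERHTID(h) (DEMO_U32_HTID(h) >> 20)
#define DEMO_U32_HASH(h)     (((h) >> 12) & 0xFFu)
#define DEMO_U32_NODE(h)     ((h) & 0xFFFu)

/* Build a handle from the three fields of "htid:hash:node". */
static uint32_t demo_u32_handle(uint32_t htid, uint32_t hash, uint32_t node)
{
	return ((htid & 0xFFFu) << 20) | ((hash & 0xFFu) << 12) | (node & 0xFFFu);
}
```

So handle 15:2:3 is 0x01502003 and 800:0:2 is 0x80000002; a delete request that loses the node bits would address a whole table or bucket rather than one filter.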
[PATCH v4 11/11] mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()
From: "Luis R. Rodriguez" The crusade to replace mtrr_add() with the architecture-agnostic arch_phys_wc_add() is complete; this ensures that write-combining implementations (PAT on x86) are taken advantage of instead of MTRR. With the crusade done, hide direct MTRR access from drivers. Update the x86 documentation on MTRR to reflect the completion of the phase-out of direct MTRR access, and add a note on platform firmware use of MTRRs based on the obituary discussion of MTRRs on Linux [0]. [0] http://lkml.kernel.org/r/1438991330.3109.196.ca...@hp.com Cc: Toshi Kani Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Borislav Petkov Cc: Dave Hansen Cc: Suresh Siddha Cc: Ingo Molnar Cc: Juergen Gross Cc: Daniel Vetter Cc: Andy Lutomirski Cc: Dave Airlie Cc: Antonino Daplas Cc: Jean-Christophe Plagniol-Villard Cc: Tomi Valkeinen Cc: Ville Syrjälä Cc: Mel Gorman Cc: Vlastimil Babka Cc: Davidlohr Bueso Cc: Doug Ledford Cc: Andy Walls Cc: x...@kernel.org Cc: netdev@vger.kernel.org Cc: linux-me...@vger.kernel.org Cc: linux-fb...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Luis R. Rodriguez --- Documentation/x86/mtrr.txt | 20 arch/x86/kernel/cpu/mtrr/main.c | 2 -- 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt index 860bc3adc223..8a0bdb6e7370 100644 --- a/Documentation/x86/mtrr.txt +++ b/Documentation/x86/mtrr.txt @@ -6,10 +6,22 @@ Luis R. Rodriguez - April 9, 2015 === Phasing out MTRR use -MTRR use is replaced on modern x86 hardware with PAT. Over time the only type -of effective MTRR that is expected to be supported will be for write-combining. -As MTRR use is phased out device drivers should use arch_phys_wc_add() to make -MTRR effective on non-PAT systems while a no-op on PAT enabled systems. +MTRR use is replaced on modern x86 hardware with PAT.
Direct MTRR use by +drivers on Linux is now completely phased out, device drivers should use +arch_phys_wc_add() in combination with ioremap_wc() to make MTRR effective on +non-PAT systems while a no-op but equally effective on PAT enabled systems. + +Even if Linux does not use MTRR directly some x86 platform firmware may still +set up MTRRs early before booting the OS, they do this as some platform +firmware may still have implemented access to MTRRs which would be controlled +and handled by the platform firmware directly. An example of platform use of +MTRR is through the use of SMI handlers, one case could be for fan control, +the platform code would need uncachable access to some of its fan control +registers. Such platform access does not need any Operating System MTRR code in +place other than mtrr_type_lookup() to ensure any OS specific mapping requests +are aligned with platform MTRR setup. If MTRRs are only set up by the platform +firmware code though and the OS does not make any specific MTRR mapping +requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID. For details refer to Documentation/x86/pat.txt. 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c index e7ed0d8ebacb..f891b4750f04 100644 --- a/arch/x86/kernel/cpu/mtrr/main.c +++ b/arch/x86/kernel/cpu/mtrr/main.c @@ -448,7 +448,6 @@ int mtrr_add(unsigned long base, unsigned long size, unsigned int type, return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type, increment); } -EXPORT_SYMBOL(mtrr_add); /** * mtrr_del_page - delete a memory type region @@ -537,7 +536,6 @@ int mtrr_del(int reg, unsigned long base, unsigned long size) return -EINVAL; return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT); } -EXPORT_SYMBOL(mtrr_del); /** * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable -- 2.4.3
Re: [PATCH net v2] sctp: start t5 timer only when peer.rwnd is 0 and local.state is SHUTDOWN_PENDING
On Mon, Aug 24, 2015 at 02:36:59PM -0400, Vlad Yasevich wrote: > On 08/24/2015 02:31 PM, Marcelo Ricardo Leitner wrote: > > On Mon, Aug 24, 2015 at 02:13:38PM -0400, Vlad Yasevich wrote: > >> On 08/23/2015 07:30 AM, Xin Long wrote: > >>> when A sends a data to B, then A close() and enter into SHUTDOWN_PENDING > >>> state, > >>> if B neither claim his rwnd is 0 nor send SACK for this data, A will keep > >>> retransmitting this data util t5 timeout, Max.Retrans times can't work > >>> anymore, > >>> which is bad. > >>> > >>> if B's rwnd is not 0, it should send abord after Max.Retrans times, only > >>> when > >>> B's rwnd == 0 and A's retransmitting beyonds Max.Retrans times, A will > >>> start > >>> t5 timer, which is also commit f8d960524 means, but it lacks the condition > >>> peer.rwnd == 0. > >>> > >>> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown") > >>> Signed-off-by: Xin Long > >>> --- > >>> net/sctp/sm_statefuns.c | 3 ++- > >>> 1 file changed, 2 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c > >>> index 3ee27b7..deb9eab 100644 > >>> --- a/net/sctp/sm_statefuns.c > >>> +++ b/net/sctp/sm_statefuns.c > >>> @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net > >>> *net, > >>> SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS); > >>> > >>> if (asoc->overall_error_count >= asoc->max_retrans) { > >>> - if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) { > >>> + if (!q->asoc->peer.rwnd && > >>> + asoc->state == SCTP_STATE_SHUTDOWN_PENDING) { > >>> /* > >>>* We are here likely because the receiver had its rwnd > >>>* closed for a while and we have not been able to > >>> > >> > >> This may not work as expected. peer.rwnd is the calculated peer window, > >> but it > >> also gets updated when we receive sacks. So there is no way to tell that > >> the current windows is 0 because peer told us, or because we sent data to > >> make 0 > >> and the peer hasn't responded. 
> > > > I'm not sure I follow you, Vlad. I don't think we care on why we have > > zero-window in there, just that if we are at it on that stage. Either > > one, if it's zero window, we will go through T5 and give it more time to > > recover, but if it's not zero window, I don't see a reason to enable T5.. > > No, these are 2 distinct instances. In one instance, the peer is reachable > and > is able to communication 0 rwnd state to us. Thus we are being nice and > granting > the peer more time to exit the 0 window state. > > In the other state, the peer is unreachable and we just happen to hit the > 0-window > condition based on some estimations of the peer window. In this case, we > should > be subject to the Max.RTX and terminate the association sooner. Makes sense, we can do better in there. Thanks Vlad. Marcelo
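The v2 condition under discussion can be summarized as a small decision function. This is a hedged model of the branch in sctp_sf_do_6_3_3_rtx() as patched, using hypothetical demo_* names. As Vlad points out, peer_rwnd here is the sender's own estimate of the peer window, so a zero value does not prove that the peer actually advertised a closed window.

```c
#include <stdbool.h>

enum demo_rtx_action {
	DEMO_KEEP_RETRANSMITTING, /* below Max.Retrans: normal T3-rtx handling */
	DEMO_START_T5,            /* start the T5 shutdown-guard timer */
	DEMO_ABORT,               /* Max.Retrans exceeded: fail the association */
};

/* Decision on T3-rtx expiry with the v2 patch applied. */
static enum demo_rtx_action demo_t3_expired(unsigned int error_count,
					     unsigned int max_retrans,
					     unsigned int peer_rwnd,
					     bool shutdown_pending)
{
	if (error_count < max_retrans)
		return DEMO_KEEP_RETRANSMITTING;
	if (peer_rwnd == 0 && shutdown_pending)
		return DEMO_START_T5;
	return DEMO_ABORT;
}
```

Vlad's objection is that the two cases he describes (reachable peer with an advertised zero window versus unreachable peer with an estimated zero window) both reach DEMO_START_T5 here.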
Re: [RFC] netlink: netlink_ack send a capped message in case of error
On Mon, Aug 24, 2015 at 10:08:22AM +0200, Christophe Ricard wrote: > Hi Scott, > > I think i understand the potential limitation of my solution. > I saw something was proposed by Jiri Benc who pushed an additional flag to > tell if the payload can be ignored in case of an error. > http://patchwork.ozlabs.org/patch/290976/ > > Do you think this one is acceptable ? I am not sure to understand David > last comment. I think David suggests something like the (completely untested) attached patch. >From 3aa0deafb5648427d154e26920d9d85f89dab190 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Mon, 24 Aug 2015 20:23:45 +0200 Subject: [PATCH RFC] netlink: add NETLINK_CAP_ACK socket option Since commit c05cdb1b864f ("netlink: allow large data transfers from user-space"), the kernel may fail to allocate the necessary room for the acknowledgement message back to userspace. This patch introduces a new socket option that trims off the payload of the original netlink message. The netlink message header is still included, so the user can guess from the sequence number what is the message that has triggered the acknowledgment. 
Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netlink.h |1 + net/netlink/af_netlink.c | 25 +++-- 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index cf6a65c..6f3fe16 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -110,6 +110,7 @@ struct nlmsgerr { #define NETLINK_TX_RING 7 #define NETLINK_LISTEN_ALL_NSID 8 #define NETLINK_LIST_MEMBERSHIPS 9 +#define NETLINK_CAP_ACK 10 struct nl_pktinfo { __u32 group; diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 67d2104..baa5973 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -84,6 +84,7 @@ struct listeners { #define NETLINK_F_BROADCAST_SEND_ERROR 0x4 #define NETLINK_F_RECV_NO_ENOBUFS 0x8 #define NETLINK_F_LISTEN_ALL_NSID 0x10 +#define NETLINK_F_CAP_ACK 0x20 static inline int netlink_is_kernel(struct sock *sk) { @@ -2258,6 +2259,13 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname, nlk->flags &= ~NETLINK_F_LISTEN_ALL_NSID; err = 0; break; + case NETLINK_CAP_ACK: + if (val) + nlk->flags |= NETLINK_F_CAP_ACK; + else + nlk->flags &= ~NETLINK_F_CAP_ACK; + err = 0; + break; default: err = -ENOPROTOOPT; } @@ -2332,6 +2340,16 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname, netlink_table_ungrab(); break; } + case NETLINK_CAP_ACK: + if (len < sizeof(int)) + return -EINVAL; + len = sizeof(int); + val = nlk->flags & NETLINK_F_CAP_ACK ? 
1 : 0; + if (put_user(len, optlen) || + put_user(val, optval)) + return -EFAULT; + err = 0; + break; default: err = -ENOPROTOOPT; } @@ -2869,13 +2887,16 @@ EXPORT_SYMBOL(__netlink_dump_start); void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err) { + struct netlink_sock *nlk = nlk_sk(in_skb->sk); struct sk_buff *skb; struct nlmsghdr *rep; struct nlmsgerr *errmsg; size_t payload = sizeof(*errmsg); - /* error messages get the original request appened */ - if (err) + /* Error messages get the original request appended, unless the user + * requests to cap the error message. + */ + if (!(nlk->flags & NETLINK_F_CAP_ACK) && err) payload += nlmsg_len(nlh); skb = netlink_alloc_skb(in_skb->sk, nlmsg_total_size(payload), -- 1.7.10.4
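The effect of NETLINK_F_CAP_ACK on the size of the acknowledgement can be computed in userspace. This sketch mirrors the uapi layouts of struct nlmsghdr and struct nlmsgerr under hypothetical demo_* names and reproduces only the size arithmetic of the patched netlink_ack() (nlmsg_len() and nlmsg_total_size()). It assumes req_len is at least the 16-byte header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the uapi struct nlmsghdr / struct nlmsgerr layouts. */
struct demo_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

struct demo_nlmsgerr {
	int error;
	struct demo_nlmsghdr msg;
};

#define DEMO_NLMSG_ALIGN(len) (((len) + 3u) & ~(size_t)3u)
#define DEMO_NLMSG_HDRLEN     DEMO_NLMSG_ALIGN(sizeof(struct demo_nlmsghdr))

/* Size of the ACK message the patched netlink_ack() would allocate. */
static size_t demo_ack_size(uint32_t req_len, int err, bool cap_ack)
{
	size_t payload = sizeof(struct demo_nlmsgerr);

	/* Echo the original request only on error and without CAP_ACK. */
	if (!cap_ack && err)
		payload += req_len - DEMO_NLMSG_HDRLEN;  /* nlmsg_len(nlh) */
	return DEMO_NLMSG_ALIGN(DEMO_NLMSG_HDRLEN + payload);  /* nlmsg_total_size() */
}
```

For a failing 1 MiB request the uncapped ACK needs just over 1 MiB, while the capped one stays at 36 bytes and still carries the original header with its sequence number.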
Re: [PATCH v3 01/10] ss: rooted out ss type declarations for output formatters
-BEGIN PGP MESSAGE- Charset: windows-1252 Version: GnuPG v1 hQQOAweL74a5LMkVEA//azcgajmoTO+UKZPf5wl+V8QAi/r9gCmyyJR0wV6RsH0N sUpnR2c9uSVNU+J41L206vDsnNk0Huoa6m6miibLFg3mxQ9KTDdzaePmkfk9FwCC Au7RsDzxo8nq/rpZsPeD2r/EAod6C3XVGRNc6nAMMi84tMCtObjDFDQs+mPcWf5n nCZwmdovGtzCHpw6moq51K8pql0CmRpFSnMdSVySykxc/pFetRBpBJ1hJBT3pCEc ZYogu5LbKqCbn2xpwXPQDC0i3iEU1sa2xuucj88y8yG9Bdy08mgEwFJvbg+wP83e oERVIFKIQK02qeS04RgEt5w8t/3b5F3GOn8lqCjLTiXssiKCjgqh0KsdSeE4SwMN Ny8ND6SOSScCqx1lBvViBGpYw40CdrMwmI5opV0Ljm4lvzmmk8sNcxJhcKQKh/Gq UYHm0oVMFsoiHlKREKtNn8k8A6fWKes1Psoa2ZsyQfiLH0lSii/eA5OudKYHRw71 oT688HPjABQ7PWjBN7cPr2CqNXutbA7NzvcjmaGZ0aXyd5OMIMyMbPZ4uhWtpN4N sKtmaxh7kCBnNE50tj65X/hJurEl/tgJeWK8HjOVQOXlqKOTszxZbmvj3Sy0OHRS RYUSDDaKbgPNeFF67+/ebHlUObo/gt2z9rLZhok9lP8OuXV35Am8qT2INcxsKtEQ AITEgflrFsR+XG9XQNChtwqsFBx2dULsJqI4QHngtf6OcYNYJ544q6BSe8Meij++ IMAgFdFqyYpgsALsSOqcYbqaL4rxANe+1/gp/71ge9jm/T8vXRIImZuonYvEpJPS cIvett1uOjmqckI2L4upz3Kx2La5+qhmMmvXxieib8Cmu51WIl9uSHwLsTkZQvyM oqShZqe/w3zh+3RcuqgaRsTQIpAW5ArS8TvHs5GqYGAo290PxTSYfa70/YmDqkrx rUGB1Dj7525fMUPACwNU0EemM0ia8ZmpBUWNcEtoPjppROjZ9MsuxDyJHSv7ghre nTSL1AjcgUmphUsyq/YMD4sxW5k052GgmrQQOEp+LQX6U+r89uDkno3rsgwHCjKb H7dKL7xqlG3vX8OKsUUvTCQxQIyqCoHxa3Iu6KPyBDq7061A4cNZdE//Jk6kHT0B lSM50Prkaok+DR5lLBGZ87LVqmqasN/R73m0l+Jz6grMBfaqpUmkyrr3tyQEAZjD C7IdnY5mrAM/q2GEQDjxk0gHdWBgg1NCNmRdMJ5VEyDtXeDgujnvSJPlTshqhbNE MTSX9HCdGTwe5L8gEkpcSR7RmZEzPE6Qq4p+lB6cQa8rZlnjuFi7WMXxcivVE4ln KaAEx+2FSWJ1dAUyLre4aIY//OWz9g2PKsau3U2JcCCxhQQOA+gF559QFX/oEA// SQPehPCO9VD0sakvFZOCsWN0+65dOlK5hr3rzC2crZeMDaqr9fX7/5IGNHhAaqyc 7SjByXH0f2QC4M2LhR1oA/CihIsWo+0bnmfhJvfnOKrT1KZBqf8irIbdb6vw80e8 0cVE2FhD3WhawMtRv0a1L0eD3iLhz8C9utNe1iRl6hxdu4JKTvqRt7JAjy0dMy4W 4OcviIgMvE6EXmk8hr9LFx17ULGysYswaDBgBW+R1jjp9+EwzyfAmrGbghjtRxJX 4WMn2xYidBWsq3i/86hz0SkENsfrzhc5evOUun8W0yoiJZm6PQcwqsD4nJ8iGbaT 1GrNNdpoyHdQ8F1IXaabkJwVBZXvDxMLTTOKRlapzCEZVOZgirPm562sWY0yQ/9w Igvv/ufTqLrfU7Wy5tKcuJsDGHdVYaPyusRlOLD65tRQWpDnosAboBB+lpTF13zh 
nx4TWO6FqUA74Nc9OslspPz+FOX1vGCrQV6xCnKXP0xOZP3gSJ7yoJlcxhjRnwrf XfXj2Rh6+V2KrvPsw4PqidBGkNwp1hbd/qB7loFuklt/vRZ02BVEuc3Wo5jDD8Qa Jct6hwryw3PDu7lI7Zb74CxlzycqhUsHlRtpCXxr6hJepuy8zA1sFKLR8JR+X/u4 QmB1jA+h4WKAXEMuunrEqYitJ4wDrlV6/Kn06H/+K6sP/jjedosIAj46gVXpllu2 MrPVcyo4KQ+uyXhUuq1OeeSGKOmOC70wGrUOZKWTwrXhWMlRgPiiMj1V1/CAHQ2n 7D0ktruvzXf8+rkz2khftz5weke69iHEcjJ9uGiDPYExBGaZQ4SOKdOS2FLz/BWX o+WhuHJ3R/jgKpkR/KQUKe4ueG0wQOqR8DqIB5D8+PV5NMn7AxXuVLUTsBf0vFoy BS63QxhPxeD0x9T6SyFwIbpK7h6kr4R67HWvz0Ryu8Dly214IzWTrqzj4j/1ew0e omKjqyI/+BhHKED7fjfQpmqXhIB492iluzfEeHqXliTxMM+wKdzRdiMeFPwhvI8M YpEITOwTxvaMMcpA60fvxw39BM4TdnUAyc63O3ebciPENGQq32F9Vj4tAg5IRGMS BN9HX3dU0PMhCT0UrbNRIWROSGJu43UFqymwffflhJb0xMCqslBsk9qhUqjtEq3A DLdFt8YZz82uUjqSYyayVr14QdeTj7PhJoX+fOFjWPSLru6jVV7PjekuVfhcILWk PFyCKYenPdWPyswI7NPmmd4YtovoJTBd0Mzijc2a6S+HJP9FNn3lG1p1L4We/Xcq ivOvAIRN4NZuHHzTk75WoDsBaL9j+/ddVsyedKfkjo/bQCTnmpR9R3MAibynIT5K YWUh1wXM2nLqQ/QQL/WHVLUNhQIOA2Nel9d25tJfEAf+JZJbaPnTzw2X04NnnQra MLdNVys9c6HQUuBUmy6IHN48w5RE8ZXpl8jeiXh8/fyVGM01Ae4oG9Nf3nirAXE3 BW3gncwwEXootoJKeuQfRlVfTi+iHuapw8Bt16CpHe434cBlaeaaN6ZGQmKS8Bgd vDn3TQn4Bp1gsahDnc9IjSIQMq+sUAeWE97UFrbK0sfs9/81NBTPS5xX8n+r4NEE mxTHuZIS81aSQHpMc0j2vkZbl6En+JvYa1LcWkw7miIPXN6fGQPjGmw2W4dVoYho 2YwVh0I5mOAYH7kpmJ3bDtIh/DCGRygKqX5xpBBYFUgi0degNM2BXf9Amfcxc+wu YggAsTXFNzxJZD2mMqyR+VI/2Ep9mJEGYBXUI4cVk6tF480VNYzvSnTeIsUFG2dw 7WXDs4Quon1Myah8ybUfXujvP21y+WG9Kdin9Y2pAtyrJkUdfL8kgQd8leQMHAx0 W+ECgRRcjgXUR5gIGGitYXw2vRfIwMqY7D8N2b5zfhtr94h/7cJORL973C5OAx10 Su4ROx/AU0Q2SWjW/3FwfD2praurLrqTcsD1UvTkW3YX1g5zdZomwYho1v8Qh/rM R1nYKNcIF9fycGnRsi8F+vqP84uZBtS2p6CRDOSh+9l0iDFAJNwa+lg0abIA4o60 YgFqY6AeJjVwxZ+37ru362r2lIUCDAPlr88g4tbi7QEP/0JYP/q6tqdy6JyVzBu0 +vnc7tNGzEAW42QgYwDRxOebil/ojkhnHxL9tPqToRS76S0i8aI70Z4E7Aq8K/pW N2QgSNIS2tPdTbVYl9VDidERncLa2eHiCok74H3aR3vLZUC1TGtuvrn2m+PSAzYB o85whG4nD0sz4HCrqIiR/ASWdkgNc1VGS/0lIeZnfij+WQPN8FvwMFOwKeFlSihs zIYnT1bzNPe3d+yAhE9C1mOSgFbLs1s287f41moAxvF6OsYtWwyE6y3bE0mDsf2e 
0bTSsEFrYmipp9lAgb0C3vCqp8DMrySduk4PEAf/LXkI6wo/TgAlBHHK8YvDlm4E MEJhUHlt4LkmeEkMTDPtEnuf8DuG0IB8ap4ayQL87hC9N0LrX5hOQVmdWJw5B7Y6 pUIZdOADDT9f/pWNlDu/4bHFAPJNK1S9A2hkhk9FeqTwp4zJCTi7jvHrRad9ckri rad/V60rOij/OoEYZjbW93mTKyhq48GIU4+az9+S/Cb4RrcSXWf7kyoXmXCB+E7Q kKS8L+YaLP8q4QuRyUnY/GXDtwu95S0SJ/AJ7y7XLjshcyJ8xdlAJWXFJhAECdJ9 XV/HvxvFAod7L6ag1gHGuzrZRG8SbMTtyiWzDa3FhqtwmvGiSs7/RyDL2wWtV9VD TRrBGwWaSWsceqRl34UEDVIZhQEMA9VWY3oy9Ed1AQf+NzCoZT1ryrfEI/YzS7sr J8dUKR2dSdjyX62plHjILQCUET6PEgJeTKRotv0THPLdJDjieTsQqbl5pXdm/T6n P7hOhC97xZY/Giweabd30ZesAnnnNOW5zBGH8PSpmU3hMnWCsp1Rnp3C2SaeZvxb Y7QxZRbg+2mb7fp41j3lqiNRczPVldc8iK9ngZSzzMw4MX2mw3BmPwxmD4uUD89y BuB4J53FMF6TUZe9k38XHgmRLvDq+LYQhf53YNcb/DQ4AB1JHMkitKR1oyI3ptOx urYuwMNtn2E/EgOqGIbjQrW8JKIUJ8bTgDQiSE7rkZp172+edU1hQYpbfTlAfW0R zNLpAVd3bBldWfkH29RT2VpxSPbgclAXZGq+ZtBvuTa1p6T3GD0Y95vB70set+w7 oPxjpbjx3zWR/3M6Bz8wla9Bi2qXJ+J87dYRAIK3rYwpnrKfysY+qocJTQiO/Gku 62P2yWKYwcLvRa7v5TwzHxGq0CEul/bzIj8r9ixhReXKGq2PkKEiByqZ900kobMZ 03O4ZXY1FDo9GQ5w6N6T8dPxMuF/hTYADt/8xs1p6RUM+AfljtDdJ1LvBx9pDWjB KF9ieqOkXw7QPCI7lp9XDv4s0Gqup9bTNOd7doTwc4ah5glnu6IdixG
Re: [PATCH, net-next] r8169: On RTL 8101 series bit SYSErr is reserved.
From: Corcodel Marian Date: Mon, 24 Aug 2015 21:12:53 +0300 > diff --git a/drivers/net/ethernet/realtek/r8169.c > b/drivers/net/ethernet/realtek/r8169.c > index 5693e65..32d2072 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c > @@ -8256,6 +8256,14 @@ static int rtl_init_one(struct pci_dev *pdev, const > struct pci_device_id *ent) > RTL_W8(Config1, RTL_R8(Config1) | PMEnable); > RTL_W8(Config5, RTL_R8(Config5) & (BWF | MWF | UWF | LanWake | > PMEStatus));*/ > switch (tp->mac_version) { > + case RTL_GIGA_MAC_VER_07: > + case RTL_GIGA_MAC_VER_08: > + case RTL_GIGA_MAC_VER_09: > + case RTL_GIGA_MAC_VER_10: > + case RTL_GIGA_MAC_VER_13: > + case RTL_GIGA_MAC_VER_16: > + pci_write_config_word(pdev, PCI_COMMAND, ~PCI_COMMAND_SERR); You're writing all sorts of bits you definitely don't want to set here. Furthermore, there is no need to clear a bit that shouldn't be set in the first place. Your patches are really full of major errors, and unsuitable for upstream. Yes, all of them. So please stop posting your r8169 changes here, because if you don't care if your patches get included or not, then you should not be posting them here. This isn't a place to just dump random patches, sorry. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net v2] sctp: start t5 timer only when peer.rwnd is 0 and local.state is SHUTDOWN_PENDING
On 08/24/2015 02:31 PM, Marcelo Ricardo Leitner wrote: > On Mon, Aug 24, 2015 at 02:13:38PM -0400, Vlad Yasevich wrote: >> On 08/23/2015 07:30 AM, Xin Long wrote: >>> when A sends a data to B, then A close() and enter into SHUTDOWN_PENDING >>> state, >>> if B neither claim his rwnd is 0 nor send SACK for this data, A will keep >>> retransmitting this data until t5 timeout, Max.Retrans times can't work >>> anymore, >>> which is bad. >>> >>> if B's rwnd is not 0, it should send abort after Max.Retrans times, only >>> when >>> B's rwnd == 0 and A's retransmitting beyonds Max.Retrans times, A will start >>> t5 timer, which is also commit f8d960524 means, but it lacks the condition >>> peer.rwnd == 0. >>> >>> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown") >>> Signed-off-by: Xin Long >>> --- >>> net/sctp/sm_statefuns.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c >>> index 3ee27b7..deb9eab 100644 >>> --- a/net/sctp/sm_statefuns.c >>> +++ b/net/sctp/sm_statefuns.c >>> @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net >>> *net, >>> SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS); >>> >>> if (asoc->overall_error_count >= asoc->max_retrans) { >>> - if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) { >>> + if (!q->asoc->peer.rwnd && >>> + asoc->state == SCTP_STATE_SHUTDOWN_PENDING) { >>> /* >>> * We are here likely because the receiver had its rwnd >>> * closed for a while and we have not been able to >>> >> >> This may not work as expected. peer.rwnd is the calculated peer window, but >> it >> also gets updated when we receive sacks. So there is no way to tell that >> the current window is 0 because peer told us, or because we sent data to >> make 0 >> and the peer hasn't responded. > > I'm not sure I follow you, Vlad. I don't think we care on why we have > zero-window in there, just that if we are at it on that stage. Either > one, if it's zero window, we will go through T5 and give it more time to > recover, but if it's not zero window, I don't see a reason to enable T5.. No, these are 2 distinct instances. In one instance, the peer is reachable and is able to communicate its 0 rwnd state to us. Thus we are being nice and granting the peer more time to exit the 0 window state. In the other instance, the peer is unreachable and we just happen to hit the 0-window condition based on some estimations of the peer window. In this case, we should be subject to the Max.RTX and terminate the association sooner. -vlad > > Marcelo >
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
From: Alan Stern Date: Mon, 24 Aug 2015 14:06:15 -0400 (EDT) > On Mon, 24 Aug 2015, David Miller wrote: >> Atomic operations like clear_bit also will behave that way. > > Are you certain about that? I couldn't find any mention of it in > Documentation/atomic_ops.txt. > > In theory, an architecture could implement atomic bit operations using > a spinlock to insure atomicity. I don't know if any architectures do > this, but if they do then the scenario above could arise. Indeed, we do have platforms like 32-bit sparc and parisc that do this. So, taking that into consideration, this is a bit unfortunate and on such platforms we do have this problem.
Re: [PATCH net v2] sctp: start t5 timer only when peer.rwnd is 0 and local.state is SHUTDOWN_PENDING
On Mon, Aug 24, 2015 at 02:13:38PM -0400, Vlad Yasevich wrote: > On 08/23/2015 07:30 AM, Xin Long wrote: > > when A sends a data to B, then A close() and enter into SHUTDOWN_PENDING > > state, > > if B neither claim his rwnd is 0 nor send SACK for this data, A will keep > > retransmitting this data until t5 timeout, Max.Retrans times can't work > > anymore, > > which is bad. > > > > if B's rwnd is not 0, it should send abort after Max.Retrans times, only > > when > > B's rwnd == 0 and A's retransmitting beyonds Max.Retrans times, A will start > > t5 timer, which is also commit f8d960524 means, but it lacks the condition > > peer.rwnd == 0. > > > > Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown") > > Signed-off-by: Xin Long > > --- > > net/sctp/sm_statefuns.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c > > index 3ee27b7..deb9eab 100644 > > --- a/net/sctp/sm_statefuns.c > > +++ b/net/sctp/sm_statefuns.c > > @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net > > *net, > > SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS); > > > > if (asoc->overall_error_count >= asoc->max_retrans) { > > - if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) { > > + if (!q->asoc->peer.rwnd && > > + asoc->state == SCTP_STATE_SHUTDOWN_PENDING) { > > /* > > * We are here likely because the receiver had its rwnd > > * closed for a while and we have not been able to > > > > This may not work as expected. peer.rwnd is the calculated peer window, but > it > also gets updated when we receive sacks. So there is no way to tell that > the current window is 0 because peer told us, or because we sent data to > make 0 > and the peer hasn't responded. I'm not sure I follow you, Vlad. I don't think we care on why we have zero-window in there, just that if we are at it on that stage. Either one, if it's zero window, we will go through T5 and give it more time to recover, but if it's not zero window, I don't see a reason to enable T5.. Marcelo
Re: [PATCH net] sctp: partial chunk should be drop without sending abort packet
On 08/24/2015 02:47 PM, Marcelo Ricardo Leitner wrote: > On Mon, Aug 24, 2015 at 06:08:30PM +0800, Xin Long wrote: >> as RFC 4960, 6.10 said, *if the receiver detects a partial chunk, it MUST drop the chunk*, we should not send the abort. but if we put this discard to inside state machine, it will send abort. >> >> so we just drop the partial chunk there, never let this chunk go into the state machine. >> >> Signed-off-by: Xin Long >> --- > > This is basically reverting a chunk of Daniel's and Vlad's 26b87c788100 ("net: sctp: fix remote memory pressure from excessive queueing"). Isn't it going to re-introduce the initial issue then? Yes, seems so.
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
On Mon, 24 Aug 2015, Alan Stern wrote: > On Mon, 24 Aug 2015, David Miller wrote: > > > From: Eugene Shatokhin > > Date: Wed, 19 Aug 2015 14:59:01 +0300 > > > > > So the following might be possible, although unlikely: > > > > > > CPU0 CPU1 > > > clear_bit: read dev->flags > > > clear_bit: clear EVENT_RX_KILL in the read value > > > > > > dev->flags=0; > > > > > > clear_bit: write updated dev->flags > > > > > > As a result, dev->flags may become non-zero again. > > > > Is this really possible? > > > > Stores really are "atomic" in the sense that they do their update > > in one indivisible operation. > > Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to > call it now. > > > Atomic operations like clear_bit also will behave that way. > > Are you certain about that? I couldn't find any mention of it in > Documentation/atomic_ops.txt. > > In theory, an architecture could implement atomic bit operations using > a spinlock to insure atomicity. I don't know if any architectures do > this, but if they do then the scenario above could arise. Now that I see this in writing, I realize it's not possible after all. clear_bit() et al. will work with a single unsigned long, which doesn't leave any place for spinlocks or other mechanisms. I was thinking of atomic_t. So never mind... Alan Stern
[PATCH, net-next] r8169: On RTL 8101 series bit SYSErr is reserved.
On RTL 8101 series bit SYSErr is reserved. Signed-off-by: Corcodel Marian diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 5693e65..32d2072 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -8256,6 +8256,14 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) RTL_W8(Config1, RTL_R8(Config1) | PMEnable); RTL_W8(Config5, RTL_R8(Config5) & (BWF | MWF | UWF | LanWake | PMEStatus));*/ switch (tp->mac_version) { + case RTL_GIGA_MAC_VER_07: + case RTL_GIGA_MAC_VER_08: + case RTL_GIGA_MAC_VER_09: + case RTL_GIGA_MAC_VER_10: + case RTL_GIGA_MAC_VER_13: + case RTL_GIGA_MAC_VER_16: + pci_write_config_word(pdev, PCI_COMMAND, ~PCI_COMMAND_SERR); + break; case RTL_GIGA_MAC_VER_34: case RTL_GIGA_MAC_VER_35: case RTL_GIGA_MAC_VER_36: -- 2.1.4
Re: Low throughput in VMs using VxLAN
On 08/24/2015 12:19 PM, Santosh R wrote: > Hi, > >Earlier I was seeing lower throughput in VMs using VxLan as GRO was > not happening in VM. > Tom Herbert suggested to use "vxlan: GRO support at tunnel layer" patch > series. > With today's net-next (4.2.0-rc7) in host and VM, I could see GRO > happening for vxlan, macvtap and virtual interface in VM. > The throughput is still low between VMs (around 4Gbps compared to > 9Gbps without VxLAN). > Looks like the packet is getting segmented in Host and then GROed in VM. > Is this an expected behaviour? Currently yes. I am working on adding GSO_TUNNEL and related checksum support to virtio to eliminate this segmentation. -vlad > Is my below configuration correct? > > Here is the configuration. > eth (VM) - macvtap - vxlan - phy iface <-> phy iface - vxlan - > macvtap - (VM) eth > > VM is started with > # qemu-system-x86_64 -m 4096 -smp 4 -boot c -device > virtio-net-pci,netdev=hostnet0,id=net0,mac=C2:B2:CA:6F:BC:A4 -device > e1000,netdev=tap0,mac=DE:AD:BE:EF:96:32 -netdev tap,id=hostnet0,fd=3 > 3<>/dev/tap18 -netdev tap,id=tap0,script=no -drive > file=/root/vdisk_rhel65.img > > Here is the skb_segment count for 10 sec iperf receive test. > host # ./funccount skb_segment > Tracing "skb_segment"... Ctrl-C to end. > ^C > FUNC COUNT > skb_segment 58604 > > # ./functrace skb_segment > ... > -0 [006] ..s. 17632.030126: skb_segment <-tcp_gso_segment > ksoftirqd/6-38[006] ..s. 17632.030177: skb_segment <-tcp_gso_segment > ksoftirqd/6-38[006] ..s. 17632.030223: skb_segment <-tcp_gso_segment > ksoftirqd/6-38[006] ..s. 17632.030269: skb_segment <-tcp_gso_segment > ksoftirqd/6-38[006] ..s. 17632.030298: skb_segment <-tcp_gso_segment > qemu-system-x86-5932 [006] ..s. 17632.030489: skb_segment <-tcp_gso_segment > qemu-system-x86-5932 [006] ..s. 17632.030507: skb_segment <-tcp_gso_segment > qemu-system-x86-5932 [006] ..s. 17632.030528: skb_segment <-tcp_gso_segment > qemu-system-x86-5932 [006] ..s. 
17632.030550: skb_segment <-tcp_gso_segment > qemu-system-x86-5932 [006] ..s. 17632.030576: skb_segment <-tcp_gso_segment > qemu-system-x86-5932 [006] ..s1 17632.030759: skb_segment <-tcp_gso_segment > qemu-system-x86-5932 [006] ..s1 17632.030814: skb_segment <-tcp_gso_segment > .. > > # Physical interface > 21:32:49.749263 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 2870 > 21:32:49.749278 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 9860 > 21:32:49.749326 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74 > 21:32:49.749333 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74 > 21:32:49.749340 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74 > 21:32:49.749405 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 2870 > 21:32:49.749425 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 11258 > > # VxLAN > 21:32:49.749268 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: > Flags [.], seq 25:2821, ack 1, win 111, options [nop,nop,TS val > 15632994 ecr 13334931], length 2796 > 21:32:49.749281 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: > Flags [.], seq 2821:12607, ack 1, win 111, options [nop,nop,TS val > 15632994 ecr 13334931], length 9786 > 21:32:49.749322 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: > Flags [.], ack 2821, win 270, options [nop,nop,TS val 13334931 ecr > 15632994], length 0 > 21:32:49.749331 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: > Flags [.], ack 7015, win 336, options [nop,nop,TS val 13334931 ecr > 15632994], length 0 > 21:32:49.749336 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: > Flags [.], ack 12607, win 423, options [nop,nop,TS val 13334931 ecr > 15632994], length 0 > 21:32:49.749411 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: > Flags [.], seq 12607:15403, ack 1, win 111, options [nop,nop,TS val > 15632994 ecr 13334931], length 2796 > 21:32:49.749429 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: > Flags [P.], seq 15403:26587, ack 1, win 111, options 
[nop,nop,TS val > 15632994 ecr 13334931], length 11184 > > # macvtap > 2.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 25:2821, > ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length > 2796 > 21:32:49.749281 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: > Flags [.], seq 2821:12607, ack 1, win 111, options [nop,nop,TS val > 15632994 ecr 13334931], length 9786 > 21:32:49.749321 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: > Flags [.], ack 2821, win 270, options [nop,nop,TS val 13334931 ecr > 15632994], length 0 > 21:32:49.749330 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: > Flags [.], ack 7015, win 336, options [nop,nop,TS val 13334931 ecr > 15632994], length 0 > 21:32:49.749335 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: > Flags [.], ack 12607, win 423, options [nop,nop,TS val 13334931 ecr > 15632994], length 0 > 21:32:49.749411 IP 102.44
Re: [PATCH net v2] sctp: start t5 timer only when peer.rwnd is 0 and local.state is SHUTDOWN_PENDING
On 08/23/2015 07:30 AM, Xin Long wrote: > when A sends a data to B, then A close() and enter into SHUTDOWN_PENDING > state, > if B neither claim his rwnd is 0 nor send SACK for this data, A will keep > retransmitting this data until t5 timeout, Max.Retrans times can't work > anymore, > which is bad. > > if B's rwnd is not 0, it should send abort after Max.Retrans times, only when > B's rwnd == 0 and A's retransmitting beyonds Max.Retrans times, A will start > t5 timer, which is also commit f8d960524 means, but it lacks the condition > peer.rwnd == 0. > > Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown") > Signed-off-by: Xin Long > --- > net/sctp/sm_statefuns.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c > index 3ee27b7..deb9eab 100644 > --- a/net/sctp/sm_statefuns.c > +++ b/net/sctp/sm_statefuns.c > @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net *net, > SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS); > > if (asoc->overall_error_count >= asoc->max_retrans) { > - if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) { > + if (!q->asoc->peer.rwnd && > + asoc->state == SCTP_STATE_SHUTDOWN_PENDING) { > /* >* We are here likely because the receiver had its rwnd >* closed for a while and we have not been able to > This may not work as expected. peer.rwnd is the calculated peer window, but it also gets updated when we receive sacks. So there is no way to tell that the current window is 0 because peer told us, or because we sent data to make 0 and the peer hasn't responded. -vlad
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
On 24.08.2015 20:43, David Miller wrote: > From: Eugene Shatokhin > Date: Wed, 19 Aug 2015 14:59:01 +0300 >> So the following might be possible, although unlikely: >> CPU0 CPU1 >> clear_bit: read dev->flags >> clear_bit: clear EVENT_RX_KILL in the read value >> dev->flags=0; >> clear_bit: write updated dev->flags >> As a result, dev->flags may become non-zero again. > Is this really possible? On x86, it is not possible, so this is not a problem. Perhaps, for ARM too. As for the other architectures supported by the kernel - not sure, no common guarantees, it seems. Anyway, this is not a critical issue, I agree. OK, let us leave things as they are for this one and fix the rest. > Stores really are "atomic" in the sense that they do their update in one indivisible operation. > Atomic operations like clear_bit also will behave that way. > If a clear_bit is in progress, the "dev->flags=0" store will not be able to grab the cache line exclusively until the clear_bit is done. > So I think the above sequence of events is completely impossible. Once a clear_bit starts, a write by another foreign agent on the bus is absolutely impossible to legally occur until the clear_bit completes. > I think this is a non-issue. Regards, Eugene
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
On Mon, 24 Aug 2015, David Miller wrote: > From: Eugene Shatokhin > Date: Wed, 19 Aug 2015 14:59:01 +0300 > > > So the following might be possible, although unlikely: > > > > CPU0 CPU1 > > clear_bit: read dev->flags > > clear_bit: clear EVENT_RX_KILL in the read value > > > > dev->flags=0; > > > > clear_bit: write updated dev->flags > > > > As a result, dev->flags may become non-zero again. > > Is this really possible? > > Stores really are "atomic" in the sense that they do their update > in one indivisible operation. Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to call it now. > Atomic operations like clear_bit also will behave that way. Are you certain about that? I couldn't find any mention of it in Documentation/atomic_ops.txt. In theory, an architecture could implement atomic bit operations using a spinlock to insure atomicity. I don't know if any architectures do this, but if they do then the scenario above could arise. Alan Stern
Re: [PATCH net] sctp: partial chunk should be drop without sending abort packet
On 08/24/2015 06:08 AM, Xin Long wrote: > as RFC 4960, 6.10 said, *if the receiver detects a partial chunk, it MUST drop > the chunk*, we should not send the abort. but if we put this discard to inside > state machine, it will send abort. > Actually, silently dropping this is _very_ bad. The reason is that you've already processed the leading chunks and may have potentially queued a response... Now, you reach the end of the packet and find that the last chunk is partial. You end up dropping the packet, but still handing the responses. This actually led to some very interesting issues we were seeing. It is better to terminate the association in this case. -vlad > so we just drop the partial chunk there, never let this chunk go into the > state > machine. > > Signed-off-by: Xin Long > --- > net/sctp/inqueue.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c > index 7e8a16c..a22ca57 100644 > --- a/net/sctp/inqueue.c > +++ b/net/sctp/inqueue.c > @@ -183,9 +183,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue) > /* This is not a singleton */ > chunk->singleton = 0; > } else if (chunk->chunk_end > skb_tail_pointer(chunk->skb)) { > - /* Discard inside state machine. */ > - chunk->pdiscard = 1; > - chunk->chunk_end = skb_tail_pointer(chunk->skb); > + sctp_chunk_free(chunk); > + chunk = queue->in_progress = NULL; > + return NULL; > } else { > /* We are at the end of the packet, so mark the chunk >* in case we need to send a SACK. >
Re: [PATCH net] sctp: asconf process should treat multiple address parameter as unrecognized parameter
On 08/24/2015 06:07 AM, Xin Long wrote: > currently, we sctp_walk_params(), if we encounter the address parameter, we > will > skip them, we do not care about how many addr params are there. > > but the params of ASCONF chunk should consist of one *Address Parameter* and > one > or more *ASCONF Parameters*. > > so we will process multiple address parameters as unrecognized parameter and > send error cause to peer. > > Signed-off-by: Xin Long > --- > net/sctp/sm_make_chunk.c | 12 ++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c > index 06320c8..0ee5ca7 100644 > --- a/net/sctp/sm_make_chunk.c > +++ b/net/sctp/sm_make_chunk.c > @@ -3217,10 +3217,18 @@ struct sctp_chunk *sctp_process_asconf(struct > sctp_association *asoc, > > /* Process the TLVs contained within the ASCONF chunk. */ > sctp_walk_params(param, addip, addip_hdr.params) { > - /* Skip preceeding address parameters. */ > + /* Skip preceeding address parameters. > + * process multi-addrparam as unrecognized parameters > + */ > if (param.p->type == SCTP_PARAM_IPV4_ADDRESS || > - param.p->type == SCTP_PARAM_IPV6_ADDRESS) > + param.p->type == SCTP_PARAM_IPV6_ADDRESS) { > + if(param.addr != addr_param) { > + all_param_pass = false; > + sctp_add_asconf_response(asconf_ack, 0, > + SCTP_ERROR_UNKNOWN_PARAM, param.v); > + } > continue; > + } > I think it would be much better to catch this in the validation stage. If an implementation inserts multiple address parameters, we don't really know which one we should be using. -vlad > err_code = sctp_process_asconf_param(asoc, asconf, >param.addip); >
Re: [PATCH net-next 4/9] net: dsa: Allow configuration of CPU & DSA port speeds/duplex
On 23/08/15 14:24, Andrew Lunn wrote: >>> + port_dn = cd->port_dn[port]; >>> + if (of_phy_is_fixed_link(port_dn)) { >>> + ret = of_phy_register_fixed_link(port_dn); >>> + if (ret) { >>> + netdev_err(master, >>> + "failed to register fixed PHY\n"); >>> + return ret; >>> + } >>> + phydev = of_phy_find_device(port_dn); >>> + genphy_config_init(phydev); >>> + genphy_read_status(phydev); >>> + if (ds->drv->adjust_link) >>> + ds->drv->adjust_link(ds, port, phydev); >> >> This kind of hack here because what you really need is just the link >> parameters, but you cannot obtain such information without first >> configuring the PHY up to a certain point in genphy_config_init(), and >> then have genphy_read_status() copy these values in your phydev structure. >> >> Maybe we should really consider something like this after all: >> >> https://lkml.org/lkml/2015/8/5/490 > > Hi Florian > > This half solves the problem. The nice thing about using the > fixed_link, is that i can just call the adjust_link function with it. > The fixed_phy_status cannot be passed directly to adjust_link. Some > code refactoring or duplication would be needed. Right, and using an adjust_link callback seems a little cleaner anyway since you get an abstracted PHY device to work with. > >> Or maybe, we should really introduce this "cpu" network device after all >> with a dropping xmit function, such that we get ethtool counters to work >> on it, and we can also attach it to a PHY device to configure link >> parameters? > > I keep humming and harring about this. I don't really like the idea of > having an interface which you cannot send/receive packets. Yet it > solves a number of problems like this, and gives you access to > statistics and registers in the usual way. Right that would be my primary motivation and use case as well. > If we do it for the CPU > port, we should also do it for the DSA ports. And we probably want the > call for up to return -ENOSUP, just to make it clear it cannot be used > for anything. We should definitely start a separate thread for this, as there might be real use cases that are not yet covered that would need a network device. Let's go ahead with your patch for now: Reviewed-by: Florian Fainelli -- Florian
[PATCH v3 net-next 3/8] tunnel: introduce udp_tun_rx_dst()
Introduce function udp_tun_rx_dst() to initialize tunnel dst on receive path. Signed-off-by: Pravin B Shelar --- Rebased to support ipv6 tun-dst. --- drivers/net/vxlan.c| 29 ++-- include/net/dst_metadata.h | 61 include/net/udp_tunnel.h |4 +++ net/ipv4/ip_gre.c | 21 +++--- net/ipv4/udp_tunnel.c | 25 +- 5 files changed, 97 insertions(+), 43 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 61b457b..5b4cf66 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1264,36 +1264,13 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb) } if (vxlan_collect_metadata(vs)) { - tun_dst = metadata_dst_alloc(sizeof(*md), GFP_ATOMIC); + tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), TUNNEL_KEY, +cpu_to_be64(vni >> 8), sizeof(*md)); + if (!tun_dst) goto drop; info = &tun_dst->u.tun_info; - if (vxlan_get_sk_family(vs) == AF_INET) { - const struct iphdr *iph = ip_hdr(skb); - - info->key.u.ipv4.src = iph->saddr; - info->key.u.ipv4.dst = iph->daddr; - info->key.tos = iph->tos; - info->key.ttl = iph->ttl; - } else { - const struct ipv6hdr *ip6h = ipv6_hdr(skb); - - info->key.u.ipv6.src = ip6h->saddr; - info->key.u.ipv6.dst = ip6h->daddr; - info->key.tos = ipv6_get_dsfield(ip6h); - info->key.ttl = ip6h->hop_limit; - } - - info->key.tp_src = udp_hdr(skb)->source; - info->key.tp_dst = udp_hdr(skb)->dest; - - info->mode = IP_TUNNEL_INFO_RX; - info->key.tun_flags = TUNNEL_KEY; - info->key.tun_id = cpu_to_be64(vni >> 8); - if (udp_hdr(skb)->check != 0) - info->key.tun_flags |= TUNNEL_CSUM; - md = ip_tunnel_info_opts(info, sizeof(*md)); } else { memset(md, 0, sizeof(*md)); diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h index 2cb52d5..60c0332 100644 --- a/include/net/dst_metadata.h +++ b/include/net/dst_metadata.h @@ -48,4 +48,65 @@ static inline bool skb_valid_dst(const struct sk_buff *skb) struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags); struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 
optslen, gfp_t flags); +static inline struct metadata_dst *tun_rx_dst(__be16 flags, + __be64 tunnel_id, int md_size) +{ + struct metadata_dst *tun_dst; + struct ip_tunnel_info *info; + + tun_dst = metadata_dst_alloc(md_size, GFP_ATOMIC); + if (!tun_dst) + return NULL; + + info = &tun_dst->u.tun_info; + info->mode = IP_TUNNEL_INFO_RX; + info->key.tun_flags = flags; + info->key.tun_id = tunnel_id; + info->key.tp_src = 0; + info->key.tp_dst = 0; + return tun_dst; +} + +static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb, +__be16 flags, +__be64 tunnel_id, +int md_size) +{ + const struct iphdr *iph = ip_hdr(skb); + struct metadata_dst *tun_dst; + struct ip_tunnel_info *info; + + tun_dst = tun_rx_dst(flags, tunnel_id, md_size); + if (!tun_dst) + return NULL; + + info = &tun_dst->u.tun_info; + info->key.u.ipv4.src = iph->saddr; + info->key.u.ipv4.dst = iph->daddr; + info->key.tos = iph->tos; + info->key.ttl = iph->ttl; + return tun_dst; +} + +static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb, +__be16 flags, +__be64 tunnel_id, +int md_size) +{ + const struct ipv6hdr *ip6h = ipv6_hdr(skb); + struct metadata_dst *tun_dst; + struct ip_tunnel_info *info; + + tun_dst = tun_rx_dst(flags, tunnel_id, md_size); + if (!tun_dst) + return NULL; + + info = &tun_dst->u.tun_info; + info->key.u.ipv6.src = ip6h->saddr; + info->key.u.ipv6.dst = ip6h->daddr; + info->key.tos = ipv6_get_dsfield(ip6h); + info->key.ttl = ip6h->hop_limit; + return tun_dst; +} + #endif /* __NET_DST_METADATA_H */ diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h index c491c12..35041d0 100644 --- a/include/net/udp_tunnel.h +++ b/include/net/udp_tunnel.h @@ -93,6 +93,10 @@ int udp_tunnel6_xmit_skb(struct dst_
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
From: Eugene Shatokhin Date: Wed, 19 Aug 2015 14:59:01 +0300 > So the following might be possible, although unlikely: > > CPU0 CPU1 > clear_bit: read dev->flags > clear_bit: clear EVENT_RX_KILL in the read value > > dev->flags=0; > > clear_bit: write updated dev->flags > > As a result, dev->flags may become non-zero again. Is this really possible? Stores really are "atomic" in the sense that they do their update in one indivisible operation. Atomic operations like clear_bit also will behave that way. If a clear_bit is in progress, the "dev->flags=0" store will not be able to grab the cache line exclusively until the clear_bit is done. So I think the above sequence of events is completely impossible. Once a clear_bit starts, a write by another foreign agent on the bus is absolutely impossible to legally occur until the clear_bit completes. I think this is a non-issue.
[PATCH v3 net-next 6/8] openvswitch: Use Geneve device.
With the help of tunnel metadata mode OVS can directly use Geneve devices to implement Geneve tunnels. This patch removes all of the OVS-specific Geneve code and makes OVS use a Geneve net_device. Basic geneve vport is still there to handle compatibility with current userspace application. Signed-off-by: Pravin B Shelar Reviewed-by: Jesse Gross --- net/openvswitch/Kconfig|2 +- net/openvswitch/vport-geneve.c | 179 +++ 2 files changed, 33 insertions(+), 148 deletions(-) diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig index 422dc05..87b98c0 100644 --- a/net/openvswitch/Kconfig +++ b/net/openvswitch/Kconfig @@ -59,7 +59,7 @@ config OPENVSWITCH_VXLAN config OPENVSWITCH_GENEVE tristate "Open vSwitch Geneve tunneling support" depends on OPENVSWITCH - depends on GENEVE_CORE + depends on GENEVE default OPENVSWITCH ---help--- If you say Y here, then the Open vSwitch will be able create geneve vport. diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c index d01bd63..fa37c95 100644 --- a/net/openvswitch/vport-geneve.c +++ b/net/openvswitch/vport-geneve.c @@ -26,95 +26,44 @@ #include "datapath.h" #include "vport.h" +#include "vport-netdev.h" static struct vport_ops ovs_geneve_vport_ops; - /** * struct geneve_port - Keeps track of open UDP ports - * @gs: The socket created for this port number. - * @name: vport name. + * @dst_port: destination port. */ struct geneve_port { - struct geneve_sock *gs; - char name[IFNAMSIZ]; + u16 port_no; }; -static LIST_HEAD(geneve_ports); - static inline struct geneve_port *geneve_vport(const struct vport *vport) { return vport_priv(vport); } -/* Convert 64 bit tunnel ID to 24 bit VNI. 
*/ -static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni) -{ -#ifdef __BIG_ENDIAN - vni[0] = (__force __u8)(tun_id >> 16); - vni[1] = (__force __u8)(tun_id >> 8); - vni[2] = (__force __u8)tun_id; -#else - vni[0] = (__force __u8)((__force u64)tun_id >> 40); - vni[1] = (__force __u8)((__force u64)tun_id >> 48); - vni[2] = (__force __u8)((__force u64)tun_id >> 56); -#endif -} - -/* Convert 24 bit VNI to 64 bit tunnel ID. */ -static __be64 vni_to_tunnel_id(const __u8 *vni) -{ -#ifdef __BIG_ENDIAN - return (vni[0] << 16) | (vni[1] << 8) | vni[2]; -#else - return (__force __be64)(((__force u64)vni[0] << 40) | - ((__force u64)vni[1] << 48) | - ((__force u64)vni[2] << 56)); -#endif -} - -static void geneve_rcv(struct geneve_sock *gs, struct sk_buff *skb) -{ - struct vport *vport = gs->rcv_data; - struct genevehdr *geneveh = geneve_hdr(skb); - int opts_len; - struct ip_tunnel_info tun_info; - __be64 key; - __be16 flags; - - opts_len = geneveh->opt_len * 4; - - flags = TUNNEL_KEY | TUNNEL_GENEVE_OPT | - (udp_hdr(skb)->check != 0 ? TUNNEL_CSUM : 0) | - (geneveh->oam ? TUNNEL_OAM : 0) | - (geneveh->critical ? 
TUNNEL_CRIT_OPT : 0); - - key = vni_to_tunnel_id(geneveh->vni); - - ip_tunnel_info_init(&tun_info, ip_hdr(skb), - udp_hdr(skb)->source, udp_hdr(skb)->dest, - key, flags, geneveh->options, opts_len); - - ovs_vport_receive(vport, skb, &tun_info); -} - static int geneve_get_options(const struct vport *vport, struct sk_buff *skb) { struct geneve_port *geneve_port = geneve_vport(vport); - struct inet_sock *sk = inet_sk(geneve_port->gs->sock->sk); - if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, ntohs(sk->inet_sport))) + if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, geneve_port->port_no)) return -EMSGSIZE; return 0; } -static void geneve_tnl_destroy(struct vport *vport) +static int geneve_get_egress_tun_info(struct vport *vport, struct sk_buff *skb, + struct ip_tunnel_info *egress_tun_info) { struct geneve_port *geneve_port = geneve_vport(vport); + struct net *net = ovs_dp_get_net(vport->dp); + __be16 dport = htons(geneve_port->port_no); + __be16 sport = udp_flow_src_port(net, skb, 1, USHRT_MAX, true); - geneve_sock_release(geneve_port->gs); - - ovs_vport_deferred_free(vport); + return ovs_tunnel_get_egress_info(egress_tun_info, + ovs_dp_get_net(vport->dp), + OVS_CB(skb)->egress_tun_info, + IPPROTO_UDP, skb->mark, sport, dport); } static struct vport *geneve_tnl_create(const struct vport_parms *parms) @@ -122,11 +71,11 @@ static struct vport *geneve_tnl_create(const struct vport_parms *parms) struct net *net = ovs_dp_get_net(parms->dp); struct nlattr *options
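The removed `vni_to_tunnel_id()`/`tunnel_id_to_vni()` helpers place the 24-bit VNI in the low three bytes of a big-endian 64-bit tunnel ID. That is why the little-endian branch shifts by 40/48/56: those are the register positions of memory bytes 5, 6 and 7. Below is a portable sketch of that layout. It works in host order and then checks the byte view. The helper names are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* VNI bytes (network order) -> low 24 bits of a host-order tunnel ID. */
uint64_t vni_to_id(const uint8_t vni[3])
{
	return ((uint64_t)vni[0] << 16) | ((uint64_t)vni[1] << 8) | vni[2];
}

/* Inverse: only the low 24 bits of the ID are kept, as in tunnel_id_to_vni(). */
void id_to_vni(uint64_t id, uint8_t vni[3])
{
	vni[0] = (uint8_t)(id >> 16);
	vni[1] = (uint8_t)(id >> 8);
	vni[2] = (uint8_t)id;
}

/* The 8 bytes a __be64 holding this ID would occupy in memory. */
void id_to_be_bytes(uint64_t id, uint8_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(id >> (56 - 8 * i));
}

/* Reading those bytes back on a little-endian CPU: this is the raw
 * register value the #else branch of vni_to_tunnel_id() builds. */
uint64_t load_le64(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)b[i] << (8 * i);
	return v;
}
```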
[PATCH v3 net-next 1/8] geneve: Initialize ethernet address in device setup.
Signed-off-by: Pravin B Shelar Reviewed-by: Jesse Gross Acked-by: Thomas Graf --- drivers/net/geneve.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 897e1a3..95e9da0 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -297,6 +297,7 @@ static void geneve_setup(struct net_device *dev) netif_keep_dst(dev); dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_QUEUE; + eth_hw_addr_random(dev); } static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = { @@ -364,9 +365,6 @@ static int geneve_newlink(struct net *net, struct net_device *dev, return -EBUSY; } - if (tb[IFLA_ADDRESS] == NULL) - eth_hw_addr_random(dev); - err = register_netdevice(dev); if (err) return err; -- 1.7.1
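Moving `eth_hw_addr_random()` into `geneve_setup()` presumably relies on rtnetlink applying any user-supplied `IFLA_ADDRESS` after `->setup()` has run, so the random address is only a default. Below is a minimal userspace sketch of the bit manipulation behind such an address. It fills the bytes randomly, clears the multicast bit and sets the locally-administered bit, which is the same manipulation the kernel's `eth_random_addr()` performs. The RNG callback is an assumption that keeps the sketch testable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill a 6-byte MAC with random data, then force it to be a
 * locally administered unicast address. */
void random_mac(uint8_t addr[6], int (*rng)(void))
{
	for (int i = 0; i < 6; i++)
		addr[i] = (uint8_t)rng();
	addr[0] &= 0xfe; /* clear the multicast (I/G) bit      */
	addr[0] |= 0x02; /* set the locally administered (U/L) bit */
}
```

Usage: `uint8_t mac[6]; random_mac(mac, rand);` yields an address with the low two bits of the first byte equal to `10`.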
[PATCH v3 net-next 5/8] geneve: Add support to collect tunnel metadata.
Following patch create new tunnel flag which enable tunnel metadata collection on given device. These devices can be used by tunnel metadata based routing or by OVS. Geneve Consolidation patch get rid of collect_md_tun to simplify tunnel lookup further. Signed-off-by: Pravin B Shelar --- v2-v3: Do not allow regular and metadata tunnel devices on same port. --- drivers/net/geneve.c | 360 -- include/net/geneve.h |3 + include/uapi/linux/if_link.h |1 + 3 files changed, 280 insertions(+), 84 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 0a6d974..c05bc13 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -36,6 +37,7 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); struct geneve_net { struct list_head geneve_list; struct hlist_head vni_list[VNI_HASH_SIZE]; + struct geneve_dev __rcu *collect_md_tun; }; /* Pseudo network device */ @@ -50,6 +52,7 @@ struct geneve_dev { struct sockaddr_in remote; /* IPv4 address for link partner */ struct list_head next;/* geneve's per namespace list */ __be16 dst_port; + bool collect_md; }; static int geneve_net_id; @@ -62,48 +65,95 @@ static inline __u32 geneve_net_vni_hash(u8 vni[3]) return hash_32(vnid, VNI_HASH_BITS); } -/* geneve receive/decap routine */ -static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb) +static __be64 vni_to_tunnel_id(const __u8 *vni) +{ +#ifdef __BIG_ENDIAN + return (vni[0] << 16) | (vni[1] << 8) | vni[2]; +#else + return (__force __be64)(((__force u64)vni[0] << 40) | + ((__force u64)vni[1] << 48) | + ((__force u64)vni[2] << 56)); +#endif +} + +static struct geneve_dev *geneve_lookup(struct geneve_net *gn, + struct geneve_sock *gs, + struct iphdr *iph, + struct genevehdr *gnvh) { struct inet_sock *sk = inet_sk(gs->sock->sk); - struct genevehdr *gnvh = geneve_hdr(skb); - struct geneve_dev *dummy, *geneve = NULL; - struct geneve_net *gn; - struct iphdr *iph = 
NULL; - struct pcpu_sw_netstats *stats; struct hlist_head *vni_list_head; - int err = 0; + struct geneve_dev *geneve; __u32 hash; - iph = ip_hdr(skb); /* Still outer IP header... */ - - gn = gs->rcv_data; + geneve = rcu_dereference(gn->collect_md_tun); + if (geneve) + return geneve; /* Find the device for this VNI */ hash = geneve_net_vni_hash(gnvh->vni); vni_list_head = &gn->vni_list[hash]; - hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) { - if (!memcmp(gnvh->vni, dummy->vni, sizeof(dummy->vni)) && - iph->saddr == dummy->remote.sin_addr.s_addr && - sk->inet_sport == dummy->dst_port) { - geneve = dummy; - break; + hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) { + if (!memcmp(gnvh->vni, geneve->vni, sizeof(geneve->vni)) && + iph->saddr == geneve->remote.sin_addr.s_addr && + sk->inet_sport == geneve->dst_port) { + return geneve; } } + return NULL; +} + +/* geneve receive/decap routine */ +static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb) +{ + struct genevehdr *gnvh = geneve_hdr(skb); + struct metadata_dst *tun_dst = NULL; + struct geneve_dev *geneve = NULL; + struct pcpu_sw_netstats *stats; + struct geneve_net *gn; + struct iphdr *iph; + int err; + + iph = ip_hdr(skb); /* Still outer IP header... */ + gn = gs->rcv_data; + geneve = geneve_lookup(gn, gs, iph, gnvh); if (!geneve) goto drop; - /* Drop packets w/ critical options, -* since we don't support any... -*/ - if (gnvh->critical) - goto drop; + if (ip_tunnel_collect_metadata() && geneve->collect_md) { + __be16 flags; + void *opts; + + flags = TUNNEL_KEY | TUNNEL_GENEVE_OPT | + (gnvh->oam ? TUNNEL_OAM : 0) | + (gnvh->critical ? TUNNEL_CRIT_OPT : 0); + + tun_dst = udp_tun_rx_dst(skb, AF_INET, flags, +vni_to_tunnel_id(gnvh->vni), +gnvh->opt_len * 4); + if (!tun_dst) + goto drop; + + /* Update tunnel dst according to Geneve options. */ + opts = ip_tunnel_info_opts(&tun_dst->u.tun_i
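In the metadata path, `geneve_rx()` derives the tunnel flags from the O and C bits and passes `opt_len * 4` as the option length in bytes. The sketch below decodes the fixed 8-byte Geneve header on its own. The field layout follows the Geneve specification (RFC 8926): a 2-bit version, a 6-bit option length in 4-byte words, the O and C bits, a 16-bit protocol type and a 24-bit VNI. The `F_*` constants are illustrative and are not the kernel's `TUNNEL_*` values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; NOT the kernel's TUNNEL_* values. */
enum { F_KEY = 1 << 0, F_OPTS = 1 << 1, F_OAM = 1 << 2, F_CRIT_OPT = 1 << 3 };

struct geneve_fields {
	unsigned ver;
	size_t opts_len;   /* bytes; the wire field counts 4-byte words */
	int oam, critical;
	uint16_t proto;
	uint32_t vni;
};

/* Decode the fixed 8-byte header; returns -1 if the buffer is too short. */
int geneve_parse(const uint8_t *h, size_t len, struct geneve_fields *f)
{
	if (len < 8)
		return -1;
	f->ver = h[0] >> 6;
	f->opts_len = (size_t)(h[0] & 0x3f) * 4;
	f->oam = (h[1] & 0x80) != 0;
	f->critical = (h[1] & 0x40) != 0;
	f->proto = (uint16_t)((h[2] << 8) | h[3]);
	f->vni = ((uint32_t)h[4] << 16) | ((uint32_t)h[5] << 8) | h[6];
	return 0;
}

/* Mirrors the flag computation in the collect_md branch of geneve_rx(). */
unsigned rx_flags(const struct geneve_fields *f)
{
	return F_KEY | F_OPTS | (f->oam ? F_OAM : 0) |
	       (f->critical ? F_CRIT_OPT : 0);
}
```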
[PATCH v3 net-next 2/8] geneve: Use skb mark and protocol to lookup route.
On the packet transmit path, geneve needs to look up a route. The following patch improves the route lookup by also using the skb mark and the IP protocol. Signed-off-by: Pravin B Shelar Reviewed-by: Jesse Gross Acked-by: Thomas Graf --- drivers/net/geneve.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 95e9da0..3c5b2b1 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -202,6 +202,9 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev) memset(&fl4, 0, sizeof(fl4)); fl4.flowi4_tos = RT_TOS(tos); fl4.daddr = geneve->remote.sin_addr.s_addr; + fl4.flowi4_mark = skb->mark; + fl4.flowi4_proto = IPPROTO_UDP; + rt = ip_route_output_key(geneve->net, &fl4); if (IS_ERR(rt)) { netdev_dbg(dev, "no route to %pI4\n", &fl4.daddr); -- 1.7.1
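Why the mark and protocol matter: a flow key left zeroed by `memset()` can never match a policy rule keyed on fwmark or IP protocol. The toy model below shows a rule that matches only once the key carries the mark and UDP. Its structs and matcher are hypothetical; they are not `struct flowi4` or the FIB rules code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key; daddr is carried but unused by this toy matcher. */
struct flow_key {
	uint32_t daddr;
	uint32_t mark;
	uint8_t proto;
};

/* Hypothetical policy rule: matches when (mark & mask) == mark value and,
 * if proto is non-zero, when the protocol matches too. */
struct rule {
	uint32_t mark, mark_mask;
	uint8_t proto;
	int table;
};

/* Return the table of the first matching rule, else main_table. */
int lookup_table(const struct rule *rules, size_t n,
		 const struct flow_key *k, int main_table)
{
	for (size_t i = 0; i < n; i++) {
		const struct rule *r = &rules[i];

		if ((k->mark & r->mark_mask) != r->mark)
			continue;
		if (r->proto && r->proto != k->proto)
			continue;
		return r->table;
	}
	return main_table;
}
```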
[PATCH v3 net-next 7/8] geneve: Consolidate Geneve functionality in single module.
geneve_core module handles send and receive functionality. This way OVS could use the Geneve API. Now with use of tunnel meatadata mode OVS can directly use Geneve netdevice. So there is no need for separate module for Geneve. Following patch consolidates Geneve protocol processing in single module. Signed-off-by: Pravin B Shelar --- v2-v3: - Fixed Kconfig dependency. - unified geneve_build_skb() - Fixed geneve_build_skb() error path. --- drivers/net/Kconfig|4 +- drivers/net/geneve.c | 494 +++- include/net/geneve.h | 34 net/ipv4/Kconfig | 14 -- net/ipv4/Makefile |1 - net/ipv4/geneve_core.c | 447 --- 6 files changed, 407 insertions(+), 587 deletions(-) delete mode 100644 net/ipv4/geneve_core.c diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index f503736..7727b8b 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -180,8 +180,8 @@ config VXLAN will be called vxlan. config GENEVE - tristate "Generic Network Virtualization Encapsulation netdev" - depends on INET && GENEVE_CORE + tristate "Generic Network Virtualization Encapsulation" + depends on INET && NET_UDP_TUNNEL select NET_IP_TUNNEL ---help--- This allows one to create geneve virtual interfaces that provide diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index c05bc13..8eb875d 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -18,6 +18,7 @@ #include #include #include +#include #define GENEVE_NETDEV_VER "0.6" @@ -33,13 +34,18 @@ static bool log_ecn_error = true; module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); +#define GENEVE_VER 0 +#define GENEVE_BASE_HLEN (sizeof(struct udphdr) + sizeof(struct genevehdr)) + /* per-network namespace private data for this module */ struct geneve_net { - struct list_head geneve_list; - struct hlist_head vni_list[VNI_HASH_SIZE]; - struct geneve_dev __rcu *collect_md_tun; + struct list_headgeneve_list; + struct hlist_head vni_list[VNI_HASH_SIZE]; + struct list_headsock_list; 
}; +static int geneve_net_id; + /* Pseudo network device */ struct geneve_dev { struct hlist_node hlist; /* vni hash table */ @@ -55,7 +61,15 @@ struct geneve_dev { bool collect_md; }; -static int geneve_net_id; +struct geneve_sock { + boolcollect_md; + struct geneve_net *gn; + struct list_headlist; + struct socket *sock; + struct rcu_head rcu; + int refcnt; + struct udp_offload udp_offloads; +}; static inline __u32 geneve_net_vni_hash(u8 vni[3]) { @@ -76,51 +90,63 @@ static __be64 vni_to_tunnel_id(const __u8 *vni) #endif } -static struct geneve_dev *geneve_lookup(struct geneve_net *gn, - struct geneve_sock *gs, - struct iphdr *iph, - struct genevehdr *gnvh) +static struct geneve_dev *geneve_lookup(struct geneve_net *gn, __be16 port, + __be32 addr, u8 vni[]) { - struct inet_sock *sk = inet_sk(gs->sock->sk); struct hlist_head *vni_list_head; struct geneve_dev *geneve; __u32 hash; - geneve = rcu_dereference(gn->collect_md_tun); - if (geneve) - return geneve; - /* Find the device for this VNI */ - hash = geneve_net_vni_hash(gnvh->vni); + hash = geneve_net_vni_hash(vni); vni_list_head = &gn->vni_list[hash]; hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) { - if (!memcmp(gnvh->vni, geneve->vni, sizeof(geneve->vni)) && - iph->saddr == geneve->remote.sin_addr.s_addr && - sk->inet_sport == geneve->dst_port) { + if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) && + addr == geneve->remote.sin_addr.s_addr && + port == geneve->dst_port) { return geneve; } } return NULL; } +static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb) +{ + return (struct genevehdr *)(udp_hdr(skb) + 1); +} + /* geneve receive/decap routine */ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb) { + struct inet_sock *sk = inet_sk(gs->sock->sk); struct genevehdr *gnvh = geneve_hdr(skb); + struct geneve_net *gn = gs->gn; struct metadata_dst *tun_dst = NULL; struct geneve_dev *geneve = NULL; struct pcpu_sw_netstats *stats; - struct geneve_net *gn; struct iphdr *iph; + 
u8 *vni; + __be32 addr; + bool xnet; int err;
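Patch 7/8 defines `GENEVE_BASE_HLEN` as the UDP header plus the fixed Geneve header, and `geneve_hdr()` simply points just past the UDP header. The sketch below shows the offset arithmetic those definitions imply, assuming a buffer that starts at the UDP header. The function name and the bounds checks are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HLEN    8                     /* sizeof(struct udphdr)            */
#define GENEVE_HLEN 8                     /* fixed part of struct genevehdr   */
#define BASE_HLEN   (UDP_HLEN + GENEVE_HLEN) /* cf. GENEVE_BASE_HLEN          */

/* Given a buffer starting at the UDP header, return the offset of the
 * inner frame, or -1 if the buffer cannot hold the headers plus options. */
long inner_offset(const uint8_t *udp, size_t len)
{
	if (len < BASE_HLEN)
		return -1;

	/* Like geneve_hdr(): the Geneve header sits right after UDP. */
	const uint8_t *gnv = udp + UDP_HLEN;
	size_t opts = (size_t)(gnv[0] & 0x3f) * 4; /* Opt Len, in 4-byte words */

	if (len < BASE_HLEN + opts)
		return -1;
	return (long)(BASE_HLEN + opts);
}
```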
[PATCH v3 net-next 8/8] geneve: Move device hash table to geneve socket.
This change simplifies Geneve Tunnel hash table management. Signed-off-by: Pravin B Shelar Reviewed-by: Jesse Gross --- drivers/net/geneve.c | 58 ++--- 1 files changed, 26 insertions(+), 32 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 9967f4c..8358d41 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -40,7 +40,6 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); /* per-network namespace private data for this module */ struct geneve_net { struct list_headgeneve_list; - struct hlist_head vni_list[VNI_HASH_SIZE]; struct list_headsock_list; }; @@ -63,12 +62,12 @@ struct geneve_dev { struct geneve_sock { boolcollect_md; - struct geneve_net *gn; struct list_headlist; struct socket *sock; struct rcu_head rcu; int refcnt; struct udp_offload udp_offloads; + struct hlist_head vni_list[VNI_HASH_SIZE]; }; static inline __u32 geneve_net_vni_hash(u8 vni[3]) @@ -90,7 +89,7 @@ static __be64 vni_to_tunnel_id(const __u8 *vni) #endif } -static struct geneve_dev *geneve_lookup(struct geneve_net *gn, __be16 port, +static struct geneve_dev *geneve_lookup(struct geneve_sock *gs, __be32 addr, u8 vni[]) { struct hlist_head *vni_list_head; @@ -99,13 +98,11 @@ static struct geneve_dev *geneve_lookup(struct geneve_net *gn, __be16 port, /* Find the device for this VNI */ hash = geneve_net_vni_hash(vni); - vni_list_head = &gn->vni_list[hash]; + vni_list_head = &gs->vni_list[hash]; hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) { if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) && - addr == geneve->remote.sin_addr.s_addr && - port == geneve->dst_port) { + addr == geneve->remote.sin_addr.s_addr) return geneve; - } } return NULL; } @@ -118,9 +115,7 @@ static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb) /* geneve receive/decap routine */ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb) { - struct inet_sock *sk = inet_sk(gs->sock->sk); struct genevehdr *gnvh = geneve_hdr(skb); - 
struct geneve_net *gn = gs->gn; struct metadata_dst *tun_dst = NULL; struct geneve_dev *geneve = NULL; struct pcpu_sw_netstats *stats; @@ -130,8 +125,6 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb) bool xnet; int err; - iph = ip_hdr(skb); /* Still outer IP header... */ - if (gs->collect_md) { static u8 zero_vni[3]; @@ -139,10 +132,11 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb) addr = 0; } else { vni = gnvh->vni; + iph = ip_hdr(skb); /* Still outer IP header... */ addr = iph->saddr; } - geneve = geneve_lookup(gn, sk->inet_sport, addr, vni); + geneve = geneve_lookup(gs, addr, vni); if (!geneve) goto drop; @@ -419,6 +413,7 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port, struct geneve_sock *gs; struct socket *sock; struct udp_tunnel_sock_cfg tunnel_cfg; + int h; gs = kzalloc(sizeof(*gs), GFP_KERNEL); if (!gs) @@ -432,7 +427,8 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port, gs->sock = sock; gs->refcnt = 1; - gs->gn = gn; + for (h = 0; h < VNI_HASH_SIZE; ++h) + INIT_HLIST_HEAD(&gs->vni_list[h]); /* Initialize the geneve udp offloads structure */ gs->udp_offloads.port = port; @@ -446,7 +442,6 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port, tunnel_cfg.encap_rcv = geneve_udp_encap_recv; tunnel_cfg.encap_destroy = NULL; setup_udp_tunnel_sock(net, sock, &tunnel_cfg); - list_add(&gs->list, &gn->sock_list); return gs; } @@ -491,6 +486,7 @@ static int geneve_open(struct net_device *dev) struct net *net = geneve->net; struct geneve_net *gn = net_generic(net, geneve_net_id); struct geneve_sock *gs; + __u32 hash; gs = geneve_find_sock(gn, geneve->dst_port); if (gs) { @@ -505,14 +501,20 @@ static int geneve_open(struct net_device *dev) out: gs->collect_md = geneve->collect_md; geneve->sock = gs; + + hash = geneve_net_vni_hash(geneve->vni); + hlist_add_head_rcu(&geneve->hlist, &gs->vni_list[hash]); return 0; } static int 
geneve_stop(struct net_device *dev) { struct geneve_dev *geneve = netdev_priv(dev
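With the table moved into `struct geneve_sock`, the UDP port is implied by the socket, so the lookup key shrinks to (VNI, remote address). The userspace sketch below models that per-socket hash table. Several details are assumptions: `VNI_HASH_BITS = 4`, the multiplicative constant (chosen in the spirit of the kernel's `hash_32()`), and the singly linked list, which stands in for the RCU hlist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VNI_HASH_BITS 4                   /* hypothetical table size */
#define VNI_HASH_SIZE (1u << VNI_HASH_BITS)

struct gdev {                             /* stand-in for struct geneve_dev */
	uint8_t vni[3];
	uint32_t remote;
	struct gdev *next;
};

struct gsock {                            /* stand-in for struct geneve_sock */
	struct gdev *vni_list[VNI_HASH_SIZE];
};

/* Multiplicative hash of the 24-bit VNI; the constant is illustrative. */
static unsigned vni_hash(const uint8_t vni[3])
{
	uint32_t v = ((uint32_t)vni[0] << 16) | ((uint32_t)vni[1] << 8) | vni[2];

	return (uint32_t)(v * 0x9e3779b9u) >> (32 - VNI_HASH_BITS);
}

void gsock_add(struct gsock *s, struct gdev *d)
{
	unsigned h = vni_hash(d->vni);

	d->next = s->vni_list[h];
	s->vni_list[h] = d;
}

/* The port is no longer part of the key: the socket already implies it. */
struct gdev *gsock_lookup(const struct gsock *s, uint32_t addr,
			  const uint8_t vni[3])
{
	for (struct gdev *d = s->vni_list[vni_hash(vni)]; d; d = d->next)
		if (!memcmp(vni, d->vni, 3) && d->remote == addr)
			return d;
	return NULL;
}
```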
[PATCH v3 net-next 0/8] Geneve: Add support for tunnel metadata mode
The following patches add support for Geneve tunnel metadata mode. OVS can make use of the Geneve net-device with the tunnel metadata API from the kernel. This also allows us to consolidate the Geneve implementation from two kernel modules, geneve_core and geneve, into a single geneve module. The geneve_core module was intended to share Geneve encap and decap code between the Geneve netdevice and the OVS Geneve tunnel implementation. Since OVS no longer needs this API, the Geneve code can be consolidated into a single geneve module.

v2-v3:
- make tunnel metadata device and regular device mutually exclusive.
- Fix Kconfig dependency for Geneve.
- Fix dst-port netlink encoding.
- drop changelink patch.

v1-v2:
- Replaced per hash table tunnel pointer (metadata enabled) with flag.
- Added support for changelink.
- Improve geneve device route lookup with more parameters.

Pravin B Shelar (8):
  geneve: Initialize ethernet address in device setup.
  geneve: Use skb mark and protocol to lookup route.
  tunnel: introduce udp_tun_rx_dst()
  geneve: Make dst-port configurable.
  geneve: Add support to collect tunnel metadata.
  openvswitch: Use Geneve device.
  geneve: Consolidate Geneve functionality in single module.
  geneve: Move device hash table to geneve socket.

 drivers/net/Kconfig            |   2 +-
 drivers/net/geneve.c           | 730 ++--
 drivers/net/vxlan.c            |  29 +--
 include/net/dst_metadata.h     |  61
 include/net/geneve.h           |  35 +--
 include/net/udp_tunnel.h       |   4 +
 include/uapi/linux/if_link.h   |   2 +
 net/ipv4/Kconfig               |  14 -
 net/ipv4/Makefile              |   1 -
 net/ipv4/geneve_core.c         | 447
 net/ipv4/ip_gre.c              |  20 +-
 net/ipv4/udp_tunnel.c          |  25 ++-
 net/openvswitch/Kconfig        |   2 +-
 net/openvswitch/vport-geneve.c | 179 ++
 14 files changed, 760 insertions(+), 791 deletions(-)
 delete mode 100644 net/ipv4/geneve_core.c
[PATCH v3 net-next 4/8] geneve: Make dst-port configurable.
Add netlink interface to configure Geneve UDP port number. So that user can configure it for a Gevene device. Signed-off-by: Pravin B Shelar Reviewed-by: Jesse Gross --- Fixed dst-port netlink encoding --- drivers/net/geneve.c | 25 + include/uapi/linux/if_link.h |1 + 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 3c5b2b1..0a6d974 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -49,6 +49,7 @@ struct geneve_dev { u8 tos; /* TOS override */ struct sockaddr_in remote; /* IPv4 address for link partner */ struct list_head next;/* geneve's per namespace list */ + __be16 dst_port; }; static int geneve_net_id; @@ -64,6 +65,7 @@ static inline __u32 geneve_net_vni_hash(u8 vni[3]) /* geneve receive/decap routine */ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb) { + struct inet_sock *sk = inet_sk(gs->sock->sk); struct genevehdr *gnvh = geneve_hdr(skb); struct geneve_dev *dummy, *geneve = NULL; struct geneve_net *gn; @@ -82,7 +84,8 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb) vni_list_head = &gn->vni_list[hash]; hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) { if (!memcmp(gnvh->vni, dummy->vni, sizeof(dummy->vni)) && - iph->saddr == dummy->remote.sin_addr.s_addr) { + iph->saddr == dummy->remote.sin_addr.s_addr && + sk->inet_sport == dummy->dst_port) { geneve = dummy; break; } @@ -157,7 +160,7 @@ static int geneve_open(struct net_device *dev) struct geneve_net *gn = net_generic(geneve->net, geneve_net_id); struct geneve_sock *gs; - gs = geneve_sock_add(net, htons(GENEVE_UDP_PORT), geneve_rx, gn, + gs = geneve_sock_add(net, geneve->dst_port, geneve_rx, gn, false, false); if (IS_ERR(gs)) return PTR_ERR(gs); @@ -228,7 +231,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev) /* no need to handle local destination and encap bypass...yet... 
*/ err = geneve_xmit_skb(gs, rt, skb, fl4.saddr, fl4.daddr, - tos, ttl, 0, sport, htons(GENEVE_UDP_PORT), 0, + tos, ttl, 0, sport, geneve->dst_port, 0, geneve->vni, 0, NULL, false, !net_eq(geneve->net, dev_net(geneve->dev))); if (err < 0) @@ -308,6 +311,7 @@ static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = { [IFLA_GENEVE_REMOTE]= { .len = FIELD_SIZEOF(struct iphdr, daddr) }, [IFLA_GENEVE_TTL] = { .type = NLA_U8 }, [IFLA_GENEVE_TOS] = { .type = NLA_U8 }, + [IFLA_GENEVE_PORT] = { .type = NLA_U16 }, }; static int geneve_validate(struct nlattr *tb[], struct nlattr *data[]) @@ -341,6 +345,7 @@ static int geneve_newlink(struct net *net, struct net_device *dev, struct hlist_head *vni_list_head; struct sockaddr_in remote; /* IPv4 address for link partner */ __u32 vni, hash; + __be16 dst_port; int err; if (!data[IFLA_GENEVE_ID] || !data[IFLA_GENEVE_REMOTE]) @@ -359,13 +364,20 @@ static int geneve_newlink(struct net *net, struct net_device *dev, if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr))) return -EINVAL; + if (data[IFLA_GENEVE_PORT]) + dst_port = htons(nla_get_u16(data[IFLA_GENEVE_PORT])); + else + dst_port = htons(GENEVE_UDP_PORT); + remote = geneve->remote; hash = geneve_net_vni_hash(geneve->vni); vni_list_head = &gn->vni_list[hash]; hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) { if (!memcmp(geneve->vni, dummy->vni, sizeof(dummy->vni)) && - !memcmp(&remote, &dummy->remote, sizeof(dummy->remote))) + !memcmp(&remote, &dummy->remote, sizeof(dummy->remote)) && + dst_port == dummy->dst_port) { return -EBUSY; + } } err = register_netdevice(dev); @@ -378,6 +390,7 @@ static int geneve_newlink(struct net *net, struct net_device *dev, if (data[IFLA_GENEVE_TOS]) geneve->tos = nla_get_u8(data[IFLA_GENEVE_TOS]); + geneve->dst_port = dst_port; list_add(&geneve->next, &gn->geneve_list); hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]); @@ -402,6 +415,7 @@ static size_t geneve_get_size(const struct net_device *dev) 
nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE */ nla_total_size(sizeof(__u8)) + /* IFLA_GENEVE_TTL */
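The port handling in this version of the patch can be sketched in userspace as follows. The netlink attribute is read as a host-order u16 and stored in network order, with a default of the IANA-assigned Geneve port 6081 (`GENEVE_UDP_PORT`). Two devices now conflict only if VNI, remote address and port all match. The function and struct names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GENEVE_UDP_PORT 6081   /* IANA-assigned Geneve port */

/* attr is NULL when IFLA_GENEVE_PORT was not supplied. The attribute is
 * host order; the device keeps the port in network order (__be16). */
uint16_t pick_dst_port(const uint16_t *attr)
{
	return htons(attr ? *attr : GENEVE_UDP_PORT);
}

struct tun_key {
	uint32_t vni;
	uint32_t remote;
	uint16_t dst_port;     /* network order */
};

/* The newlink -EBUSY case: only an exact (VNI, remote, port) match clashes. */
int keys_clash(const struct tun_key *a, const struct tun_key *b)
{
	return a->vni == b->vni && a->remote == b->remote &&
	       a->dst_port == b->dst_port;
}
```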
Re: [PATCH net-next v3 0/2] ila: Precompute checksums
From: Tom Herbert
Date: Mon, 24 Aug 2015 09:45:40 -0700

> This patch set:
> - Adds an argument to LWT build_state that holds a pointer to the fib
>   configuration being applied to the new route
> - Adds support in ILA to precompute checksum difference for
>   performance optimization
>
> v2:
> - Move return argument in build_state to end of arguments
>
> v3:
> - Update the signature for ip6_tun_build_state()

Applied, thanks.
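The checksum precomputation in this series rests on incremental ones'-complement updates (RFC 1624). The difference between the old and new address words can be summed once, when the route is configured. Each packet then needs only one add-and-fold. The standalone sketch below shows that arithmetic over 16-bit words. It does not use the kernel's `csum_*` helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit ones'-complement accumulator into 16 bits. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Internet checksum (RFC 1071) over n 16-bit words. */
uint16_t csum(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return (uint16_t)~fold(sum);
}

/* Precompute the difference for rewriting old[] -> repl[]:
 * sum of ~old words plus new words. Done once, not per packet. */
uint16_t csum_diff(const uint16_t *old, const uint16_t *repl, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += (uint16_t)~old[i] + repl[i];
	return fold(sum);
}

/* Per-packet update, RFC 1624 eqn. 3: HC' = ~(~HC + diff). */
uint16_t csum_apply(uint16_t check, uint16_t diff)
{
	return (uint16_t)~fold((uint32_t)(uint16_t)~check + diff);
}
```

Applying the precomputed difference to the old checksum gives the same value as recomputing the checksum over the rewritten words.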
Re: [PATCH net] sctp: asconf's process should verify address parameter is in the beginning
On 08/24/2015 06:07 AM, Xin Long wrote:
> In sctp_process_asconf(), we get the address parameter from the beginning of
> the addip params, but we never check whether it is really there. If the addr
> param is not there, the chunk can still pass sctp_verify_asconf() and then be
> handled by sctp_process_asconf(), which is not safe.
>
> So add code in sctp_verify_asconf() to check that the address parameter is at
> the beginning, or return false to send an abort.
>
> Signed-off-by: Xin Long
> ---
>  net/sctp/sm_make_chunk.c | 8
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 0ee5ca7..a2a72d5 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3122,6 +3122,14 @@ bool sctp_verify_asconf(const struct sctp_association
> *asoc,
>  	union sctp_params param;
>  	bool addr_param_seen = false;
>
> +	if(addr_param_needed){
> +		/* Ensure the address parameter is in the beginning */
> +		param.v = chunk->skb->data + sizeof(sctp_addiphdr_t);
> +		if (param.p->type != SCTP_PARAM_IPV4_ADDRESS &&
> +		    param.p->type != SCTP_PARAM_IPV6_ADDRESS)
> +			return false;
> +	}
> +

Sorry, you can't do that directly without a lot more checks. The parameter may be only partial, or may not be there at all. You'd end up looking at the wrong memory.

A better way would be to set addr_param_seen only when looking at the first parameter (addip_hdr.params).

-vlad

>  	sctp_walk_params(param, addip, addip_hdr.params) {
>  		size_t length = ntohs(param.p->length);
>
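Vlad's concern is that the proposed check dereferences `param.p` before confirming that a complete parameter header is present. One way to fold the check into a bounds-checked walk is sketched below over a raw buffer of SCTP TLV parameters. Per RFC 4960, type and length are big-endian 16-bit fields and each parameter is padded to 4 bytes. The function is hypothetical; it is not `sctp_walk_params()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PARAM_IPV4_ADDRESS 5   /* RFC 4960 parameter types */
#define PARAM_IPV6_ADDRESS 6

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Walk TLV parameters in buf[0..len). Each header and length is checked
 * before use; if addr_param_needed, the FIRST parameter must be an
 * address parameter, and an empty list is rejected. */
bool verify_params(const uint8_t *buf, size_t len, bool addr_param_needed)
{
	size_t off = 0;
	bool first = true;

	while (off < len) {
		if (len - off < 4)
			return false;                 /* truncated header */
		uint16_t type = get_be16(buf + off);
		uint16_t plen = get_be16(buf + off + 2);
		if (plen < 4 || plen > len - off)
			return false;                 /* runs past the chunk */
		if (first && addr_param_needed &&
		    type != PARAM_IPV4_ADDRESS && type != PARAM_IPV6_ADDRESS)
			return false;
		first = false;
		off += (plen + 3u) & ~3u;             /* 4-byte padding */
	}
	return !(addr_param_needed && first);
}
```

In the test data, `0xC001` is used as an example non-address parameter (Add IP Address in RFC 5061).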
Re: Low throughput in VMs using VxLAN
On 08/24/2015 09:19 AM, Santosh R wrote:
> Hi, Earlier I was seeing lower throughput in VMs using VxLan as GRO was not happening in VM. Tom Herbert suggested to use "vxlan: GRO support at tunnel layer" patch series. With today's net-next (4.2.0-rc7) in host and VM, I could see GRO happening for vxlan, macvtap and virtual interface in VM. The throughput is still low between VMs (around 4Gbps compared to 9Gbps without VxLAN).

Out of curiosity, have you tried tweaking gro_flush_timeout (gro_flush_interval?) for the VM's eth interface? Say perhaps a value of 1000? (I'm assuming the VM is using virtio_net.)

Does the behaviour change if vhost-net is loaded into the host and used by the VM?

rick jones

For completeness, it would also be good to compare the likes of netperf TCP_RR between VxLAN and without.
Re: [PATCH net-next 6/9] dsa: mv88e6xxx: Set the RGMII delay based on phy interface
On 23/08/15 14:10, Andrew Lunn wrote:
> On Sun, Aug 23, 2015 at 11:44:01AM -0700, Florian Fainelli wrote:
>> On 08/23/15 02:46, Andrew Lunn wrote:
>>> Some Marvell switches allow the RGMII Rx and Tx clock to be delayed
>>> when the port is using RGMII. Have the adjust_link function look at
>>> the phy interface type and enable this delay as requested.
>>>
>>> Signed-off-by: Andrew Lunn
>>> ---
>>> drivers/net/dsa/mv88e6xxx.c | 10 ++
>>> drivers/net/dsa/mv88e6xxx.h | 2 ++
>>> 2 files changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
>>> index 7901db6503b4..f5af368751b2 100644
>>> --- a/drivers/net/dsa/mv88e6xxx.c
>>> +++ b/drivers/net/dsa/mv88e6xxx.c
>>> @@ -612,6 +612,16 @@ void mv88e6xxx_adjust_link(struct dsa_switch *ds, int
>>> port,
>>> if (phydev->duplex == DUPLEX_FULL)
>>> reg |= PORT_PCS_CTRL_DUPLEX_FULL;
>>>
>>> + if ((mv88e6xxx_6352_family(ds) || mv88e6xxx_6351_family(ds)) &&
>>> + (port >= ps->num_ports - 2)) {
>>
>> Are we positive that the last two ports of a switch are going to be
>> RGMII capable or is this something that should be moved to Device Tree /
>> platform data to account for different switch families? Maybe having a
>> bitmask of RGMII capable ports stored in "ps" would be good enough?
>
> Hi Florian
>
> For these two families, this is correct. And it is a property of the
> switch, not the board, so should not be in DT. Other families are
> different. Older ones are Fast Ethernet only. Some don't have any
> RGMII ports, etc. It could be with time, this condition gets messy, at
> which point, a bitmask in ps would make sense. But is it justified
> now?

Sure, I think for now this patch is good as-is; I was mostly curious whether the assumption about the last 2 ports of the switch being RGMII would hold for a while, and it looks like it will.

With that:

Reviewed-by: Florian Fainelli
--
Florian
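The two approaches discussed can be compared side by side. The first is the patch's position-based rule; the second is Florian's suggested bitmask of RGMII-capable ports. The `chip_info` struct and the 7-port example are assumptions for illustration, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The patch's rule: on the 6351/6352 families, the last two ports
 * are the RGMII-capable ones. */
bool rgmii_by_position(bool family_6351_or_6352, int port, int num_ports)
{
	return family_6351_or_6352 && port >= num_ports - 2;
}

/* Florian's alternative: a per-chip mask of RGMII-capable ports
 * (hypothetical struct; "ps" would hold something like this). */
struct chip_info {
	int num_ports;
	uint32_t rgmii_port_mask;
};

bool rgmii_by_mask(const struct chip_info *c, int port)
{
	return port >= 0 && port < c->num_ports &&
	       ((c->rgmii_port_mask >> port) & 1u);
}
```

The mask costs one field per chip description but keeps working if a future family puts RGMII ports elsewhere. As Andrew notes, that is arguably not justified yet.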
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
24.08.2015 16:29, Bjørn Mork writes:

Eugene Shatokhin writes:

19.08.2015 15:31, Bjørn Mork writes:

Eugene Shatokhin writes:

The problem is not in the reordering but rather in the fact that "dev->flags = 0" is not necessarily atomic w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.

So the following might be possible, although unlikely:

CPU0                                              CPU1
clear_bit: read dev->flags
clear_bit: clear EVENT_RX_KILL in the read value
                                                  dev->flags=0;
clear_bit: write updated dev->flags

As a result, dev->flags may become non-zero again.

Ah, right. Thanks for explaining.

I cannot prove yet that this is an impossible situation. If anyone can, please explain. If so, this part of the patch will not be needed.

I wonder if we could simply move the dev->flags = 0 down a few lines to fix both issues? It doesn't seem to do anything useful except for resetting the flags to a sane initial state after the device is down.

Stopping the tasklet rescheduling etc depends only on netif_running(), which will be false when usbnet_stop is called. There is no need to touch dev->flags for this to happen.

That was one of the first ideas we discussed here. Unfortunately, it is probably not so simple. Setting dev->flags to 0 makes some delayed operations do nothing and, among other things, not to reschedule usbnet_bh().

Yes, but I believe that is merely a side effect. You should never need to clear multiple flags to get the desired behaviour.

As you can see in drivers/net/usb/usbnet.c, usbnet_bh() can be called as a tasklet function and as a timer function in a number of situations (look for the usage of dev->bh and dev->delay there).

netif_running() is indeed false when usbnet_stop() runs; usbnet_stop() also disables Tx. This seems to be enough for many cases where usbnet_bh() is scheduled, but I am not so sure about the remaining ones, namely:

1. A work function, usbnet_deferred_kevent(), may reschedule usbnet_bh().
Looks like the workqueue is only stopped in usbnet_disconnect(), so a work item might be processed while usbnet_stop() works. Setting dev->flags to 0 makes the work function do nothing, by the way. See also the comment in usbnet_stop() about this. A work item may be placed to this workqueue in a number of ways, by both usbnet module and the mini-drivers. It is not too easy to track all these situations. That's an understatement :) 2. rx_complete() and tx_complete() may schedule execution of usbnet_bh() as a tasklet or a timer function. These two are URB completion callbacks. It seems, new Rx and Tx URBs cannot be submitted when usbnet_stop() clears dev->flags, indeed. But it does not prevent the completion handlers for the previously submitted URBs from running concurrently with usbnet_stop(). The latter waits for them to complete (via usbnet_terminate_urbs(dev)) but only if FLAG_AVOID_UNLINK_URBS is not set in info->flags. rndis_wlan, however, sets this flag for a few hardware models. So - no guarantees here as well. FLAG_AVOID_UNLINK_URBS looks like it should be replaced by the newer ability to keep the status urb active. I believe that must have been the real reason for adding it, based on the commit message and the effect the flag will have: commit 1487cd5e76337555737cbc55d7d83f41460d198f Author: Jussi Kivilinna Date: Thu Jul 30 19:41:20 2009 +0300 usbnet: allow "minidriver" to prevent urb unlinking on usbnet_stop rndis_wlan devices freeze after running usbnet_stop several times. It appears that firmware freezes in state where it does not respond to any RNDIS commands and device have to be physically unplugged/replugged. This patch lets minidrivers to disable unlink_urbs on usbnet_stop through new info flag. Signed-off-by: Jussi Kivilinna Cc: David Brownell Signed-off-by: John W. Linville The rx urbs will not be resubmitted in any case, and there are of course no tx urbs being submitted. 
So the only effect of this flag is on the status/interrupt urb, which I can imagine some RNDIS devices want active all the time.

So FLAG_AVOID_UNLINK_URBS should probably be removed and replaced by calls to usbnet_status_start() and usbnet_status_stop(). This will require testing on some of the devices with the original firmware problem, however.

In any case: I do not think this flag should be considered when trying to make usbnet_stop behaviour saner. Its only purpose is to deliberately break usbnet_stop by not actually stopping.

If someone could list the particular bits of dev->flags that should be cleared to make sure no deferred call could reschedule usbnet_bh(), etc... Well, it would be enough to clear these first and use dev->flags = 0 later, after tasklet_kill() and del_timer_sync(). I cannot point out these particular bits now.

I don't think any of the flags must be cleared. The sequence dev_close(dev->net); usbnet_terminate_urbs(d
[PATCH net-next v3 1/2] lwt: Add cfg argument to build_state
Add cfg and family arguments to lwt build state functions. cfg is a void pointer and will either be a pointer to a fib_config or fib6_config structure. The family parameter indicates which one (either AF_INET or AF_INET6). LWT encapsulation implementations may use the fib configuration to build the LWT state.

Signed-off-by: Tom Herbert
---
 include/net/lwtunnel.h    |  3 +++
 net/core/lwtunnel.c       |  5 +++--
 net/ipv4/fib_semantics.c  | 17 ++---
 net/ipv4/ip_tunnel_core.c |  2 ++
 net/ipv6/ila.c            |  1 +
 net/ipv6/route.c          |  3 ++-
 net/mpls/mpls_iptunnel.c  |  1 +
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 8434898..fce0e35 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -26,6 +26,7 @@ struct lwtunnel_state { struct lwtunnel_encap_ops { int (*build_state)(struct net_device *dev, struct nlattr *encap, + unsigned int family, const void *cfg, struct lwtunnel_state **ts); int (*output)(struct sock *sk, struct sk_buff *skb); int (*input)(struct sk_buff *skb);
@@ -80,6 +81,7 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op, unsigned int num); int lwtunnel_build_state(struct net_device *dev, u16 encap_type, struct nlattr *encap, +unsigned int family, const void *cfg, struct lwtunnel_state **lws); int lwtunnel_fill_encap(struct sk_buff *skb, struct lwtunnel_state *lwtstate);
@@ -130,6 +132,7 @@ static inline int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op, static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type, struct nlattr *encap, + unsigned int family, const void *cfg, struct lwtunnel_state **lws) { return -EOPNOTSUPP;
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e924c2e..dfb1a9c 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -72,7 +72,8 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *ops, EXPORT_SYMBOL(lwtunnel_encap_del_ops); int lwtunnel_build_state(struct net_device *dev, u16 encap_type, -struct
nlattr *encap, struct lwtunnel_state **lws) +struct nlattr *encap, unsigned int family, +const void *cfg, struct lwtunnel_state **lws) { const struct lwtunnel_encap_ops *ops; int ret = -EINVAL;
@@ -85,7 +86,7 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type, rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[encap_type]); if (likely(ops && ops->build_state)) - ret = ops->build_state(dev, encap, lws); + ret = ops->build_state(dev, encap, family, cfg, lws); rcu_read_unlock(); return ret;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 01f1c7d..1b2d011 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -511,7 +511,8 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh, dev = __dev_get_by_index(net, cfg->fc_oif); ret = lwtunnel_build_state(dev, nla_get_u16( nla_entype), - nla, &lwtstate); + nla, AF_INET, cfg, + &lwtstate); if (ret) goto errout; nexthop_nh->nh_lwtstate =
@@ -535,7 +536,8 @@ errout: static int fib_encap_match(struct net *net, u16 encap_type, struct nlattr *encap, - int oif, const struct fib_nh *nh) + int oif, const struct fib_nh *nh, + const struct fib_config *cfg) { struct lwtunnel_state *lwtstate; struct net_device *dev = NULL;
@@ -546,8 +548,8 @@ static int fib_encap_match(struct net *net, u16 encap_type, if (oif) dev = __dev_get_by_index(net, oif); - ret = lwtunnel_build_state(dev, encap_type, - encap, &lwtstate); + ret = lwtunnel_build_state(dev, encap_type, encap, + AF_INET, cfg, &lwtstate); if (!ret) { result = lwtunnel_cmp_encap(lwtstate, nh->nh_lwtstate); lwtstate_free(lwtstate);
@@ -571,7 +573,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info *fi) if (cfg->fc_encap) { if (fib_encap_ma
[PATCH net-next v3 2/2] ila: Precompute checksum difference for translations
In the ILA build state for LWT, compute the checksum difference to apply to transport checksums that include the IPv6 pseudo header. The difference is between the route destination (from fib6_config) and the locator to write.

Signed-off-by: Tom Herbert
---
 net/ipv6/ila.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/net/ipv6/ila.c b/net/ipv6/ila.c
index ffe4dca..678d2df 100644
--- a/net/ipv6/ila.c
+++ b/net/ipv6/ila.c
@@ -14,6 +14,8 @@ struct ila_params { __be64 locator; + __be64 locator_match; + __wsum csum_diff; }; static inline struct ila_params *ila_params_lwtunnel(
@@ -33,6 +35,9 @@ static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to) static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p) { + if (*(__be64 *)&ip6h->daddr == p->locator_match) + return p->csum_diff; + else return compute_csum_diff8((__be32 *)&ip6h->daddr, (__be32 *)&p->locator); }
@@ -130,8 +135,12 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla, struct nlattr *tb[ILA_ATTR_MAX + 1]; size_t encap_len = sizeof(*p); struct lwtunnel_state *newts; + const struct fib6_config *cfg6 = cfg; int ret; + if (family != AF_INET6) + return -EINVAL; + ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla, ila_nl_policy); if (ret < 0)
@@ -149,6 +158,15 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla, p->locator = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]); + if (cfg6->fc_dst_len > sizeof(__be64)) { + /* Precompute checksum difference for translation since we +* know both the old locator and the new one.
+*/ + p->locator_match = *(__be64 *)&cfg6->fc_dst; + p->csum_diff = compute_csum_diff8( + (__be32 *)&p->locator_match, (__be32 *)&p->locator); + } + newts->type = LWTUNNEL_ENCAP_ILA; newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT | LWTUNNEL_STATE_INPUT_REDIRECT; -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v3 0/2] ila: Precompute checksums
This patch set:

- Adds an argument to LWT build_state that holds a pointer to the fib configuration being applied to the new route
- Adds support in ILA to precompute the checksum difference as a performance optimization

v2:
- Move return argument in build_state to end of arguments

v3:
- Update the signature for ip6_tun_build_state()

Tom Herbert (2):
  lwt: Add cfg argument to build_state
  ila: Precompute checksum difference for translations

 include/net/lwtunnel.h    |  3 +++
 net/core/lwtunnel.c       |  5 +++--
 net/ipv4/fib_semantics.c  | 17 ++---
 net/ipv4/ip_tunnel_core.c |  2 ++
 net/ipv6/ila.c            | 19 +++
 net/ipv6/route.c          |  3 ++-
 net/mpls/mpls_iptunnel.c  |  1 +
 7 files changed, 40 insertions(+), 10 deletions(-)

--
1.8.1
Re: [PATCH net-next] vrf: rename the framework to mrf
On 22/08/2015 19:47, David Miller wrote:
> From: Nicolas Dichtel
> Date: Sat, 22 Aug 2015 18:10:20 +0200
>
>> This patch renames the recently added vrf driver. The term 'VRF' is very
>> generic and there is no clear definition of it. For example, someone may
>> expect more isolation and use network namespaces to implement VRF,
>
> This is a ridiculous argument. Does someone using VRF on a Cisco box
> expect Linux namespaces to be used?
>
> Sorry, this is not going to get applied.

I spent some time today checking threads on this topic on Quagga and on netdev, and I dug into the commercial VRF-lite documentation. I finally agree with you; let's drop this patch.

Nicolas
Re: [PATCH net-next 9/9] phy: fixed_phy: Set phy capabilities even when link is down
On Sun, Aug 23, 2015 at 11:40:07AM -0700, Florian Fainelli wrote:
> On 08/23/15 02:47, Andrew Lunn wrote:
> > What features a phy supports is masked in genphy_config_init() by
> > looking at the PHY's BMSR register.
> >
> > If the link is down, fixed_phy_update_regs() will only set the auto-
> > negotiation capable bit in BMSR. Thus genphy_config_init() comes to
> > the conclusion the PHY can only perform 10/Half, and masks out the
> > higher speed features. If however the link is up, BMSR is set to
> > indicate the speed the PHY is capable of auto-negotiating, and
> > genphy_config_init() does not mask out the high speed features.
> >
> > To fix this, when the link is down, have fixed_phy_update_regs() leave
> > the link status and auto-negotiation complete bit unset, but set all
> > the other bits depending on the fixed phy speed.
>
> This kind of reverts what Staas did in commit
> 868a4215be9a6d80548ccb74763b883dc99d32a2 ("net: phy: fixed_phy: handle
> link-down case"). When the link is down, it does not seem to me like we
> can rely on the previous speed and duplex parameters to be considered valid.
>
> Your change does fix a valid use case though... humm.

Hi Florian

I took a look at Staas' fix, and read a bit about what the different bits mean. I've reworked the patch. I now always set the local phy capabilities in BMSR, but only set the negotiated speed and link partner capabilities if the link is up. I also don't error out on speed=0, unless the link is up.

This works for my use case, and hopefully also for Staas'.

I will post the new version when we have come to a conclusion about other open issues.

Andrew
Low throughput in VMs using VxLAN
Hi,

Earlier I was seeing lower throughput in VMs using VxLAN because GRO was not happening in the VM. Tom Herbert suggested using the "vxlan: GRO support at tunnel layer" patch series.

With today's net-next (4.2.0-rc7) in the host and the VM, I can see GRO happening for vxlan, macvtap and the virtual interface in the VM. The throughput between VMs is still low (around 4Gbps, compared to 9Gbps without VxLAN). It looks like the packets are getting segmented in the host and then GROed in the VM. Is this expected behaviour? Is my configuration below correct?

Here is the configuration:

eth (VM) - macvtap - vxlan - phy iface <-> phy iface - vxlan - macvtap - (VM) eth

The VM is started with:

# qemu-system-x86_64 -m 4096 -smp 4 -boot c -device virtio-net-pci,netdev=hostnet0,id=net0,mac=C2:B2:CA:6F:BC:A4 -device e1000,netdev=tap0,mac=DE:AD:BE:EF:96:32 -netdev tap,id=hostnet0,fd=3 3<>/dev/tap18 -netdev tap,id=tap0,script=no -drive file=/root/vdisk_rhel65.img

Here is the skb_segment count for a 10 sec iperf receive test:

host # ./funccount skb_segment
Tracing "skb_segment"... Ctrl-C to end.
^C
FUNC                              COUNT
skb_segment                       58604

# ./functrace skb_segment
...
-0 [006] ..s. 17632.030126: skb_segment <-tcp_gso_segment
ksoftirqd/6-38[006] ..s. 17632.030177: skb_segment <-tcp_gso_segment
ksoftirqd/6-38[006] ..s. 17632.030223: skb_segment <-tcp_gso_segment
ksoftirqd/6-38[006] ..s. 17632.030269: skb_segment <-tcp_gso_segment
ksoftirqd/6-38[006] ..s. 17632.030298: skb_segment <-tcp_gso_segment
qemu-system-x86-5932 [006] ..s. 17632.030489: skb_segment <-tcp_gso_segment
qemu-system-x86-5932 [006] ..s. 17632.030507: skb_segment <-tcp_gso_segment
qemu-system-x86-5932 [006] ..s. 17632.030528: skb_segment <-tcp_gso_segment
qemu-system-x86-5932 [006] ..s. 17632.030550: skb_segment <-tcp_gso_segment
qemu-system-x86-5932 [006] ..s. 17632.030576: skb_segment <-tcp_gso_segment
qemu-system-x86-5932 [006] ..s1 17632.030759: skb_segment <-tcp_gso_segment
qemu-system-x86-5932 [006] ..s1 17632.030814: skb_segment <-tcp_gso_segment
..
# Physical interface
21:32:49.749263 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 2870
21:32:49.749278 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 9860
21:32:49.749326 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
21:32:49.749333 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
21:32:49.749340 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
21:32:49.749405 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 2870
21:32:49.749425 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 11258

# VxLAN
21:32:49.749268 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 25:2821, ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length 2796
21:32:49.749281 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 2821:12607, ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length 9786
21:32:49.749322 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: Flags [.], ack 2821, win 270, options [nop,nop,TS val 13334931 ecr 15632994], length 0
21:32:49.749331 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: Flags [.], ack 7015, win 336, options [nop,nop,TS val 13334931 ecr 15632994], length 0
21:32:49.749336 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: Flags [.], ack 12607, win 423, options [nop,nop,TS val 13334931 ecr 15632994], length 0
21:32:49.749411 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 12607:15403, ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length 2796
21:32:49.749429 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [P.], seq 15403:26587, ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length 11184

# macvtap
2.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 25:2821, ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length 2796
21:32:49.749281 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 2821:12607, ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length 9786
21:32:49.749321 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: Flags [.], ack 2821, win 270, options [nop,nop,TS val 13334931 ecr 15632994], length 0
21:32:49.749330 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: Flags [.], ack 7015, win 336, options [nop,nop,TS val 13334931 ecr 15632994], length 0
21:32:49.749335 IP 102.44.44.12.commplex-link > 102.44.44.14.60616: Flags [.], ack 12607, win 423, options [nop,nop,TS val 13334931 ecr 15632994], length 0
21:32:49.749411 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 12607:15403, ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length 2796
21:32:49.749429 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [P.], seq 15403:26587, ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length 11184

# VM interface
2:02:48.126327 IP 102.44.44.14
[PATCH] man ip-link: Add little explanations about VLAN qos map
From: Vadim Kochan

Add a little more info about how to set the packet priority manually with iptables, and a few clarifications about the ingress/egress QoS mapping.

Signed-off-by: Vadim Kochan
---
 man/man8/ip-link.8.in | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index b9137fb..2283b71 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -349,10 +349,30 @@ where is the physical device to which VLAN device is bound. - specifies whether the VLAN device state is bound to the physical device state. .BI ingress-qos-map " QOS-MAP " -- defines a mapping between priority code points on incoming frames. The format is FROM:TO with multiple mappings separated by spaces. +- defines a mapping of VLAN header prio field to the Linux internal packet +priority on incoming frames. The format is FROM:TO with multiple mappings +separated by spaces. .BI egress-qos-map " QOS-MAP " -- the same as ingress-qos-map but for outgoing frames. +- defines a mapping of Linux internal packet priority to VLAN header prio field +but for outgoing frames. The format is the same as for ingress-qos-map. +.in +4 + +Linux packet priority can be set by +.BR iptables "(8)": +.in +4 +.sp +.B iptables +-t mangle -A POSTROUTING [...] -j CLASSIFY --set-class 0:4 +.sp +.in -4 +and this "4" priority can be used in the egress qos mapping to set VLAN prio "5": +.sp +.in +4 +.B ip +link set veth0.10 type vlan egress 4:5 +.in -4 +.in -4 .in -8 .TP
@@ -1090,7 +1110,8 @@ IEEE 802.15.4 device wpan0. .br .BR ip (8), .BR ip-netns (8), -.BR ethtool (8) +.BR ethtool (8), +.BR iptables (8) .SH AUTHOR Original Manpage by Michail Litvak
--
2.4.2
[patch net-next 1/3] mlxsw: Remove duplicate included header
From: Ido Schimmel

Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: Elad Raz
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 09325b7..0415ff6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -48,7 +48,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
--
1.9.3
[patch net-next 3/3] mlxsw: adjust log messages level in __mlxsw_emad_transmit
From: Jiri Pirko

When a transmit fails, it is an error, not a warning. Do not warn when a timeout happens, as that is handled by a counter.

Signed-off-by: Jiri Pirko
Signed-off-by: Ido Schimmel
Signed-off-by: Elad Raz
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 6ee3f45..dfafb83 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -382,8 +382,8 @@ static int __mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core, err = mlxsw_core_skb_transmit(mlxsw_core->driver_priv, skb, tx_info); if (err) { - dev_warn(mlxsw_core->bus_info->dev, "Failed to transmit EMAD (tid=%llx)\n", -mlxsw_core->emad.tid); + dev_err(mlxsw_core->bus_info->dev, "Failed to transmit EMAD (tid=%llx)\n", + mlxsw_core->emad.tid); dev_kfree_skb(skb); return err; }
@@ -393,8 +393,8 @@ static int __mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core, !(mlxsw_core->emad.trans_active), msecs_to_jiffies(MLXSW_EMAD_TIMEOUT_MS)); if (!ret) { - dev_warn(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n", -mlxsw_core->emad.tid); + dev_dbg(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n", + mlxsw_core->emad.tid); mlxsw_core->emad.trans_active = false; mlxsw_core->emad.stats.timeouts++; return -EIO;
--
1.9.3
[patch net-next 0/3] mlxsw: small driver update
From: Jiri Pirko

Ido Schimmel (1):
  mlxsw: Remove duplicate included header

Jiri Pirko (2):
  mlxsw: expose EMAD transactions statistics via debugfs
  mlxsw: adjust log messages level in __mlxsw_emad_transmit

 drivers/net/ethernet/mellanox/mlxsw/core.c | 60 ++
 1 file changed, 52 insertions(+), 8 deletions(-)

--
1.9.3
[patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs
From: Jiri Pirko

Signed-off-by: Jiri Pirko
Signed-off-by: Ido Schimmel
Signed-off-by: Elad Raz
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 51 --
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 0415ff6..6ee3f45 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -98,6 +98,12 @@ struct mlxsw_core { bool trans_active; struct mutex lock; /* One EMAD transaction at a time. */ bool use_emad; + struct { + u64 trans; + u32 fails; + u32 retries; + u32 timeouts; + } stats; } emad; struct mlxsw_core_pcpu_stats __percpu *pcpu_stats; struct dentry *dbg_dir;
@@ -390,6 +396,7 @@ static int __mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core, dev_warn(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n", mlxsw_core->emad.tid); mlxsw_core->emad.trans_active = false; + mlxsw_core->emad.stats.timeouts++; return -EIO; }
@@ -463,8 +470,10 @@ retry: if (!err || err != -EAGAIN) goto out; } - if (n_retry++ < MLXSW_EMAD_MAX_RETRY) + if (n_retry++ < MLXSW_EMAD_MAX_RETRY) { + mlxsw_core->emad.stats.retries++; goto retry; + } out: dev_kfree_skb(skb);
@@ -671,6 +680,35 @@ static const struct file_operations mlxsw_core_rx_stats_dbg_ops = { .llseek = seq_lseek }; +static int mlxsw_core_emad_stats_dbg_read(struct seq_file *file, void *data) +{ + struct mlxsw_core *mlxsw_core = file->private; + + if (mutex_lock_interruptible(&mlxsw_core->emad.lock)) + return -EINTR; + seq_printf(file, "transactions: %llu\n", mlxsw_core->emad.stats.trans); + seq_printf(file, "fails: %u\n", mlxsw_core->emad.stats.fails); + seq_printf(file, "retries: %u\n", mlxsw_core->emad.stats.retries); + seq_printf(file, "timeouts: %u\n", mlxsw_core->emad.stats.timeouts); + mutex_unlock(&mlxsw_core->emad.lock); + return 0; +} + +static int mlxsw_core_emad_stats_dbg_open(struct inode *inode, struct file *f) +{ + struct mlxsw_core *mlxsw_core = inode->i_private;
+ + return single_open(f, mlxsw_core_emad_stats_dbg_read, mlxsw_core); +} + +static const struct file_operations mlxsw_core_emad_stats_dbg_ops = { + .owner = THIS_MODULE, + .open = mlxsw_core_emad_stats_dbg_open, + .release = single_release, + .read = seq_read, + .llseek = seq_lseek +}; + static void mlxsw_core_buf_dump_dbg(struct mlxsw_core *mlxsw_core, const char *buf, size_t size) {
@@ -768,6 +806,8 @@ static int mlxsw_core_debugfs_init(struct mlxsw_core *mlxsw_core) mlxsw_core->dbg.psid_blob.size = sizeof(bus_info->psid); debugfs_create_blob("psid", S_IRUGO, mlxsw_core->dbg_dir, &mlxsw_core->dbg.psid_blob); + debugfs_create_file("emad_stats", S_IRUGO, mlxsw_core->dbg_dir, + mlxsw_core, &mlxsw_core_emad_stats_dbg_ops); return 0; }
@@ -1107,8 +1147,10 @@ retry: err = mlxsw_cmd_access_reg(mlxsw_core, in_mbox, out_mbox); if (!err) { err = mlxsw_emad_process_status(mlxsw_core, out_mbox); - if (err == -EAGAIN && n_retry++ < MLXSW_EMAD_MAX_RETRY) + if (err == -EAGAIN && n_retry++ < MLXSW_EMAD_MAX_RETRY) { + mlxsw_core->emad.stats.retries++; goto retry; + } } if (!err)
@@ -1137,6 +1179,7 @@ static int mlxsw_core_reg_access(struct mlxsw_core *mlxsw_core, return -EINTR; } + mlxsw_core->emad.stats.trans++; cur_tid = mlxsw_core->emad.tid; dev_dbg(mlxsw_core->bus_info->dev, "Reg access (tid=%llx,reg_id=%x(%s),type=%s)\n", cur_tid, reg->id, mlxsw_reg_id_str(reg->id),
@@ -1153,10 +1196,12 @@ static int mlxsw_core_reg_access(struct mlxsw_core *mlxsw_core, err = mlxsw_core_reg_access_emad(mlxsw_core, reg, payload, type); - if (err) + if (err) { dev_err(mlxsw_core->bus_info->dev, "Reg access failed (tid=%llx,reg_id=%x(%s),type=%s)\n", cur_tid, reg->id, mlxsw_reg_id_str(reg->id), mlxsw_core_reg_access_type_str(type)); + mlxsw_core->emad.stats.fails++; + } mutex_unlock(&mlxsw_core->emad.lock); return err;
--
1.9.3
[PATCH] net-next: Fix warning while make xmldocs caused by skbuff.c
This patch fixes the following warnings:

.//net/core/skbuff.c:407: warning: No description found for parameter 'len'
.//net/core/skbuff.c:407: warning: Excess function parameter 'length' description in '__netdev_alloc_skb'
.//net/core/skbuff.c:476: warning: No description found for parameter 'len'
.//net/core/skbuff.c:476: warning: Excess function parameter 'length' description in '__napi_alloc_skb'

Signed-off-by: Masanari Iida
---
 net/core/skbuff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7b84330..dad4dd3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -392,7 +392,7 @@ EXPORT_SYMBOL(napi_alloc_frag);
 /**
  * __netdev_alloc_skb - allocate an skbuff for rx on a specific device
  * @dev: network device to receive on
- * @length: length to allocate
+ * @len: length to allocate
  * @gfp_mask: get_free_pages mask, passed to alloc_skb
  *
  * Allocate a new &sk_buff and assign it a usage count of one. The
@@ -461,7 +461,7 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 /**
  * __napi_alloc_skb - allocate skbuff for rx in a specific NAPI instance
  * @napi: napi instance this buffer was allocated for
- * @length: length to allocate
+ * @len: length to allocate
  * @gfp_mask: get_free_pages mask, passed to alloc_skb and alloc_pages
  *
  * Allocate a new sk_buff for use in NAPI receive. This buffer will
--
2.5.0.234.gefc8a62
Re: [linux-sunxi] Re: [PATCH] net: sun4i-emac: Claim emac sram
On Mon, Aug 24, 2015 at 11:17:43AM +0200, Hans de Goede wrote:
> Hi,
>
> On 24-08-15 09:46, Maxime Ripard wrote:
> >Hi Hans,
> >
> >On Sun, Aug 23, 2015 at 08:31:38PM +0200, Hans de Goede wrote:
> >>Claim the emac sram ourselves, rather than relying on the bootloader
> >>having mapped the sram to the emac controller during boot.
> >>
> >>Signed-off-by: Hans de Goede
> >>---
> >> drivers/net/ethernet/allwinner/sun4i-emac.c | 13 +++--
> >> 1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/drivers/net/ethernet/allwinner/sun4i-emac.c
> >>b/drivers/net/ethernet/allwinner/sun4i-emac.c
> >>index bab01c84..48ce83e 100644
> >>--- a/drivers/net/ethernet/allwinner/sun4i-emac.c
> >>+++ b/drivers/net/ethernet/allwinner/sun4i-emac.c
> >>@@ -28,6 +28,7 @@
> >> #include
> >> #include
> >> #include
> >>+#include
> >>
> >> #include "sun4i-emac.h"
> >>
> >>@@ -857,11 +858,17 @@ static int emac_probe(struct platform_device *pdev)
> >>
> >>	clk_prepare_enable(db->clk);
> >>
> >>+	ret = sunxi_sram_claim(&pdev->dev);
> >>+	if (ret) {
> >>+		dev_err(&pdev->dev, "Error couldn't map SRAM to device\n");
> >>+		goto out;
> >
> >Shouldn't you disable your clock too?
>
> You're right, but that is a pre-existing problem, iow an unrelated
> issue.
>
> I've put doing a follow-up patch for this on my todo list.

Thanks.
Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH
Eugene Shatokhin writes:

> 19.08.2015 15:31, Bjørn Mork wrote:
>> Eugene Shatokhin writes:
>>
>>> The problem is not in the reordering but rather in the fact that
>>> "dev->flags = 0" is not necessarily atomic
>>> w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.
>>>
>>> So the following might be possible, although unlikely:
>>>
>>> CPU0                      CPU1
>>>                           clear_bit: read dev->flags
>>>                           clear_bit: clear EVENT_RX_KILL in the read value
>>>
>>> dev->flags=0;
>>>
>>>                           clear_bit: write updated dev->flags
>>>
>>> As a result, dev->flags may become non-zero again.
>>
>> Ah, right. Thanks for explaining.
>>
>>> I cannot prove yet that this is an impossible situation. If anyone
>>> can, please explain. If so, this part of the patch will not be needed.
>>
>> I wonder if we could simply move the dev->flags = 0 down a few lines to
>> fix both issues? It doesn't seem to do anything useful except for
>> resetting the flags to a sane initial state after the device is down.
>>
>> Stopping the tasklet rescheduling etc depends only on netif_running(),
>> which will be false when usbnet_stop is called. There is no need to
>> touch dev->flags for this to happen.
>
> That was one of the first ideas we discussed here. Unfortunately, it
> is probably not so simple.
>
> Setting dev->flags to 0 makes some delayed operations do nothing and,
> among other things, not to reschedule usbnet_bh().

Yes, but I believe that is merely a side effect. You should never need to clear multiple flags to get the desired behaviour.

> As you can see in drivers/net/usb/usbnet.c, usbnet_bh() can be called
> as a tasklet function and as a timer function in a number of
> situations (look for the usage of dev->bh and dev->delay there).
>
> netif_running() is indeed false when usbnet_stop() runs, usbnet_stop()
> also disables Tx. This seems to be enough for many cases where
> usbnet_bh() is scheduled, but I am not so sure about the remaining
> ones, namely:
>
> 1. A work function, usbnet_deferred_kevent(), may reschedule
> usbnet_bh(). Looks like the workqueue is only stopped in
> usbnet_disconnect(), so a work item might be processed while
> usbnet_stop() works. Setting dev->flags to 0 makes the work function
> do nothing, by the way. See also the comment in usbnet_stop() about
> this.
>
> A work item may be placed to this workqueue in a number of ways, by
> both usbnet module and the mini-drivers. It is not too easy to track
> all these situations.

That's an understatement :)

> 2. rx_complete() and tx_complete() may schedule execution of
> usbnet_bh() as a tasklet or a timer function. These two are URB
> completion callbacks.
>
> It seems, new Rx and Tx URBs cannot be submitted when usbnet_stop()
> clears dev->flags, indeed. But it does not prevent the completion
> handlers for the previously submitted URBs from running concurrently
> with usbnet_stop(). The latter waits for them to complete (via
> usbnet_terminate_urbs(dev)) but only if FLAG_AVOID_UNLINK_URBS is not
> set in info->flags. rndis_wlan, however, sets this flag for a few
> hardware models. So - no guarantees here as well.

FLAG_AVOID_UNLINK_URBS looks like it should be replaced by the newer ability to keep the status urb active. I believe that must have been the real reason for adding it, based on the commit message and the effect the flag will have:

commit 1487cd5e76337555737cbc55d7d83f41460d198f
Author: Jussi Kivilinna
Date:   Thu Jul 30 19:41:20 2009 +0300

    usbnet: allow "minidriver" to prevent urb unlinking on usbnet_stop

    rndis_wlan devices freeze after running usbnet_stop several times. It appears that firmware freezes in state where it does not respond to any RNDIS commands and device have to be physically unplugged/replugged. This patch lets minidrivers to disable unlink_urbs on usbnet_stop through new info flag.

    Signed-off-by: Jussi Kivilinna
    Cc: David Brownell
    Signed-off-by: John W. Linville

The rx urbs will not be resubmitted in any case, and there are of course no tx urbs being submitted. So the only effect of this flag is on the status/interrupt urb, which I can imagine some RNDIS devices want active all the time.

So FLAG_AVOID_UNLINK_URBS should probably be removed and replaced by calls to usbnet_status_start() and usbnet_status_stop(). This will require testing on some of the devices with the original firmware problem, however.

In any case: I do not think this flag should be considered when trying to make usbnet_stop behaviour saner. Its only purpose is to deliberately break usbnet_stop by not actually stopping.

> If someone could list the particular bits of dev->flags that should be
> cleared to make sure no deferred call could reschedule usbnet_bh(),
> etc... Well, it would be enough to clear these first and use
> dev->flags = 0 later, after tasklet_kill() and del_timer_sync(). I
> cannot point out these particular bits now.

I don'