Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists
On Mon, 28 Sep 2015 11:30:00 -0500 (CDT) Christoph Lameter wrote:

> On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:
>
> > Not knowing SLUB as well as you, it took me several hours to realize
> > init_object() didn't overwrite the freepointer in the object. Thus, I
> > think these comments make the reader aware of not-so-obvious
> > side-effects of SLAB_POISON and SLAB_RED_ZONE.
>
> From the source:
>
> /*
>  * Object layout:
>  *
>  * object address
>  *      Bytes of the object to be managed.
>  *      If the freepointer may overlay the object then the free
>  *      pointer is the first word of the object.
>  *
>  *      Poisoning uses 0x6b (POISON_FREE) and the last byte is
>  *      0xa5 (POISON_END)
>  *
>  * object + s->object_size
>  *      Padding to reach word boundary. This is also used for Redzoning.
>  *      Padding is extended by another word if Redzoning is enabled and
>  *      object_size == inuse.
>  *
>  *      We fill with 0xbb (RED_INACTIVE) for inactive objects and with
>  *      0xcc (RED_ACTIVE) for objects in use.
>  *
>  * object + s->inuse
>  *      Meta data starts here.
>  *
>  *      A. Free pointer (if we cannot overwrite object on free)
>  *      B. Tracking data for SLAB_STORE_USER
>  *      C. Padding to reach required alignment boundary or at mininum
>  *              one word if debugging is on to be able to detect writes
>  *              before the word boundary.

Okay, I will remove the comment.

The best doc on SLUB and SLAB layout comes from your slides titled:
 "Slab allocators in the Linux Kernel: SLAB, SLOB, SLUB"

Let's gracefully add a link to the slides here:
 http://events.linuxfoundation.org/sites/events/files/slides/slaballocators.pdf

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
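The poisoning scheme in the quoted layout comment can be modelled in a few lines of userspace Python. This is only a sketch, not kernel code: the byte values come from the comment above, while the function names, the red zone size parameter and the flat-buffer representation are assumptions made for illustration. The model deliberately leaves out the free pointer and the metadata area after s->inuse.

```python
POISON_FREE = 0x6B   # fill byte for the body of a free object
POISON_END = 0xA5    # marker stored in the last byte of the object
RED_INACTIVE = 0xBB  # red zone fill for an inactive (free) object


def poisoned_free_object(object_size, redzone_size):
    """Return the expected bytes of a free object followed by its red zone."""
    if object_size < 1:
        raise ValueError("object_size must be at least 1")
    buf = bytearray([POISON_FREE] * (object_size - 1))
    buf.append(POISON_END)
    buf.extend([RED_INACTIVE] * redzone_size)
    return buf


def find_corruption(buf, object_size):
    """Return the offsets whose contents differ from the expected pattern."""
    expected = poisoned_free_object(object_size, len(buf) - object_size)
    return [i for i, (got, want) in enumerate(zip(buf, expected)) if got != want]
```

A write into a free object or into its red zone shows up as a list of offsets, which is roughly the kind of information a poison check can report.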
Re: [net] skbuff: Fix skb checksum flag on skb pull
On Tue, Sep 29, 2015 at 03:27:03AM +0300, Pravin Shelar wrote:
> On Mon, Sep 28, 2015 at 2:46 AM, Andrew Vagin wrote:
> > Hi,
> >
> > With this patch, I can't connect two local tcp ipv6 sockets.
> >
> > [root@fc22-vm criu]# strace -e network python ipv6.py
> > socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
> > bind(3, {sa_family=AF_INET6, sin6_port=htons(8976), inet_pton(AF_INET6,
> > "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
> > listen(3, 1) = 0
> > socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 4
> > connect(4, {sa_family=AF_INET6, sin6_port=htons(8976), inet_pton(AF_INET6,
> > "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT
> > (Connection timed out)
> >
> > [root@fc22-vm criu]# cat ipv6.py
> > import socket
> >
> > srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
> > srv.bind(("::0", 8976))
> > srv.listen(1)
> > c = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
> > c.connect(("::1", 8976))
> >
>
> Can you try following patch.
> https://patchwork.ozlabs.org/patch/523632/

It works for me. Thanks.
Re: [PATCH] [net] orinoco_usb:Fix error handling in ezusb_probe()
RUC_Soft_Sec writes:

> Current code assigns 0 to variable 'retval', which makes ezusb_probe() to
> return success even if alloc_orinocodev() fails.
>
> The related code snippets in mantis_dma_init() is as following.
>
> 1573 static int ezusb_probe(struct usb_interface *interface,
> 1574                        const struct usb_device_id *id)
> 1575 {
>
>
>
> 1583         int retval = 0;
> 1584         int i;
> 1585
> 1586         priv = alloc_orinocodev(sizeof(*upriv), &udev->dev,
> 1587                                 ezusb_hard_reset, NULL);
> 1588         if (!priv) {
> 1589                 err("Couldn't allocate orinocodev");
> 1590                 goto exit;
> 1591         }
> ...
>
> 1729  exit:
> 1730         if (fw_entry) {
> 1731                 firmware.code = NULL;
> 1732                 firmware.size = 0;
> 1733                 release_firmware(fw_entry);
> 1734         }
> 1735         usb_set_intfdata(interface, upriv);
> 1736         return retval;
> 1737 }
>
> Fix it by checking the return value from alloc_orinocodev() and assigns
> '-ENOMEM' to variable 'retval' in the case of error.
>
> Signed-off-by: Zhang Yan
> ---
>  orinoco_usb.c |    1 +
>  1 file changed, 1 insertion(+)
> diff --git a/orinoco_usb.c b/orinoco_usb.c

The patch looks corrupted. And the from header doesn't contain a proper
name.

--
Kalle Valo
Re: unregister_netdevice warnings when deleting netns
        Hello,

On Mon, 28 Sep 2015, Eric W. Biederman wrote:

> Julian Anastasov writes:
>
> > On Mon, 28 Sep 2015, Anand Gurram wrote:
> >
> >> I am currently using kernel version 3.16.7 on a linux switch.
> >> While creating and destroying network namespaces I am observing below logs
> >> on the console
> >> "unregister_netdevice: waiting for lo to become free. Usage count = 1"
> >>
> >> Can you please suggest and provide instructions on how to debug this issue.
> >> If any fix already available can you please point me to the link.
> >
> > There are two commits from Linux 4.2 that may help:
> >
> > commit e9e4dd3267d0 ("net: do not process device backlog during
> > unregistration")
> > commit 2c17d27c36dc ("net: call rcu_read_lock early in process_backlog")
> >
> If that message repeats indefinitely it means there is a leaked
> reference to the network namespaces lo device.
>
> If the message just spits out a few times and then goes away it simply
> means that something is taking a while to cleanup and drop it's
> reference.
>
> This is slightly complicated by the fact that it is not uncommon when a
> network device goes away to redirect all references to itself to the lo
> device.

        Yes, there is a small chance that, with forwarding disabled
(i.e. when the presence of "ipv4: Avoid crashing in ip_error" does not
matter), a packet in flight leaves a new reference somewhere without
crashing. But it may be another problem, of course.

Regards

--
Julian Anastasov
Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
On Mon, 2015-09-28 at 12:37 -0700, Tom Herbert wrote:
> I think it's easier to just call skb_checksum_help from the driver
> when the packet is actually sent to the device (should be no cost for
> late binding).

That's true for checksum. Not for things like TSO though, and I wonder
if it's worth keeping it simple and doing it *all*
in .ndo_features_check()?

> > Note that 'seeded with an IPv[46] pseudo header' isn't quite
> > sufficient. Some hardware like 8139cp is explicitly told to do a UDP or
> > a TCP checksum with a bit in the descriptor, so any UDP-like or TCP
> > -like checksum works out fine.
> >
> UDP or TCP can be determined from csum_offset, e.g. 16=>TCP 6=>UDP

Kind of. There'll be false positives there too, though. That was
actually the basis of my first attempt to address this, at
http://lists.openwall.net/netdev/2013/01/14/36

--
dwmw2
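The csum_offset heuristic discussed above, and why it can give false positives, can be illustrated with a small userspace model. This is a sketch only: the two offsets are the ones quoted in the thread, the function name is hypothetical, and the mention of UDP-Lite and DCCP reflects their published header layouts (both keep the checksum at byte offset 6) rather than anything stated in the thread.

```python
TCP_CSUM_OFFSET = 16  # checksum field offset inside the TCP header
UDP_CSUM_OFFSET = 6   # checksum field offset inside the UDP header


def guess_l4_from_csum_offset(csum_offset):
    """Guess the transport protocol from the checksum field offset alone.

    Returns "tcp", "udp" or None. The guess is only a heuristic: other
    protocols, such as UDP-Lite and DCCP, also keep their checksum at
    offset 6 and would be reported as "udp".
    """
    if csum_offset == TCP_CSUM_OFFSET:
        return "tcp"
    if csum_offset == UDP_CSUM_OFFSET:
        return "udp"
    return None
```

A driver relying on such a guess would treat a UDP-Lite packet like UDP, which is the kind of false positive the reply warns about.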
Re: mwifiex: Make mwifiex_dbg a function, reduce object size
> The mwifiex_dbg macro has two tests that could be consolidated
> into a function reducing overall object size ~10KB (~4%).
>
> So convert the macro into a function.
>
> $ size drivers/net/wireless/mwifiex/built-in.o* (x86-64 defconfig)
>    text    data     bss     dec     hex filename
>  233102    8628    4809  246539   3c30b drivers/net/wireless/mwifiex/built-in.o.new
>  243949    8628    4809  257386   3ed6a drivers/net/wireless/mwifiex/built-in.o.old
>
> Signed-off-by: Joe Perches

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
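The "~10KB (~4%)" figure can be checked from the quoted size output. The sketch below reads the run-together data/bss column as 8628 and 4809, which is the only split consistent with the dec totals (246539 and 257386); the variable names are illustrative.

```python
# Section sizes in bytes, taken from the quoted `size` output.
old = {"text": 243949, "data": 8628, "bss": 4809}
new = {"text": 233102, "data": 8628, "bss": 4809}

old_total = sum(old.values())  # matches the "dec" column for .old
new_total = sum(new.values())  # matches the "dec" column for .new

saved_bytes = old_total - new_total
saved_percent = 100.0 * saved_bytes / old_total

print(f"saved {saved_bytes} bytes ({saved_percent:.1f}%)")
```

All of the saving comes from the text section, since data and bss are unchanged.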
Re: [PATCH net-next 0/6] make non-modular code explicitly non-modular
Hi Paul,

On Mon, 28 Sep 2015, Paul Gortmaker wrote:

> On 28/09/2015 (Mon 23:09) Geert Uytterhoeven wrote:
>
> > Hi Paul,
...
> >
> > Why did you choose this approach?
> > What about changing the "bool"s to "tristate"s in Kconfig instead?
>
> Long answer is here:
>
> https://lkml.org/lkml/2015/8/24/888

You wrote, "If there was demand for them to be tristate, then it would
have happened by now." I don't follow your reasoning. You might just as
well remove entire drivers and then argue, "If there was demand for
drivers without bugs, then someone would have written them by now".

Perhaps you meant, "If there was sufficient demand for them to be
tristate, then sufficient resources would have been marshalled, as
required to get an enhancement written, tested, submitted, reviewed and
merged in the mainline kernel."

>
> To summarize, it adds functionality to code I can't test, and with 300
> or so of these, it already has been a large time sink. Add to that
> extending the functionality and testing the new functionality, and it
> does not scale. Plus if something hasn't allowed tristate for over 10
> years, where is the value in adding it now?

There is value to be gained by completing the tristate support, and there
is value destroyed by removing the partial tristate support.

I'm not involved in building distro kernels, but I know that Debian's
would benefit from these tristates, because it would reduce the size of
the m68k multi-platform kernel binary.

And even if it is dead code you aim to remove, a lot of people have worked
on it (according to git blame), including myself. We should not disregard
that effort when we could leverage it instead.

For the macmace driver in particular, I did the platform driver
conversion, and it should work as a module. I did not change it to
tristate at the time because I did not want to deal with the question of
the 'psc' global, which lacks an EXPORT_SYMBOL(psc).

Anyway, I'll send a patch if Geert doesn't do so first.

>
> > I gave it a try, and with some small changes the three m68k ethernet
> > drivers build fine as modular drivers. I can send patches if you like
> > it.
>
> Per above, I don't see the value in it, but if you want to do it and
> test it and own submitting the patches, then I can drop the
> corresponding ones from my queue.

I can't test right now but I have the hardware and will attend to any
issues if need be. I do not expect any issues, because the modular option
seems to involve the same code paths in the driver.

If the CONFIG_MACMACE=m option was implemented badly and did not work
correctly, at least it couldn't be called a regression, presuming that 'm'
builds okay, and that the default was 'y' or 'n'.

> Either way we get the code matching the Kconfig which is what I'm after
> out of this.

Yes, me too.

>
> Note that if you do decide to do this, the one driver really needs more
> than just tristate one line change, it had super ancient init code that
> predates module_init and probably needs an update.

I think the solution for mac8390 is to do in the modular case exactly what
Space.c does in the built-in case. That would mean that the modular driver
would support only one card, just like the built-in driver.

(That limitation is a problem which affects all Nubus card drivers,
because they have to do all their own bus matching, because Nubus still
lacks the necessary driver model support.)

I haven't looked at amd/hplance, but I expect that the issues are similar.
Geert, do you plan to send patches for any of these drivers?

Regards,
Finn

>
> Thanks,
> Paul.
> --
> >
> > Thanks!
> >
> > Gr{oetje,eeting}s,
> >
> > Geert
Re: unregister_netdevice warnings when deleting netns
Anand Gurram writes:

>>If the message just spits out a few times and then goes away it simply
>>means that something is taking a while to cleanup and drop it's
>>reference.
>
> The message just spits out few times and then goes away, I am trying
> to debug why cleanup is taking long,
> and where it is still referenced. Any pointers in debugging such
> issues will be of great help.

The one thing I have done in the past is to instrument dev_hold and
dev_put and look where in the code the stragglers are coming from (when
I can reproduce the issue reliably).

Sometimes people have addressed this class of issue with code review,
but with a slow cleanup you can't catch this by finding a missing
dev_put.

It takes some creativity to find these as people rarely make the same
mistake twice.

Eric
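The idea of instrumenting dev_hold and dev_put can be sketched as a toy userspace model that records who holds a reference, so that stragglers can be listed while the "waiting for lo to become free" message is still being printed. Everything here is hypothetical: the class, the method names and the notion of an "owner" tag are assumptions for illustration, and real kernel instrumentation would more likely log call sites or stack traces.

```python
from collections import Counter


class RefTracker:
    """Toy stand-in for an instrumented dev_hold()/dev_put() pair."""

    def __init__(self):
        self.balance = Counter()  # owner tag -> references still held

    def hold(self, owner):
        """Record that `owner` took one reference."""
        self.balance[owner] += 1

    def put(self, owner):
        """Record that `owner` dropped one reference."""
        if self.balance[owner] <= 0:
            raise RuntimeError(f"unbalanced put from {owner!r}")
        self.balance[owner] -= 1

    @property
    def refcnt(self):
        """Total number of outstanding references."""
        return sum(self.balance.values())

    def outstanding(self):
        """Return the owners that still hold references, with their counts."""
        return {owner: n for owner, n in self.balance.items() if n > 0}
```

Dumping outstanding() while the usage count is stuck at 1 would point at the subsystem that has not yet dropped its reference.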
Re: BUG: Mellanox crash with iommu=soft and swiotlb=force
On Tue, Sep 29, 2015 at 12:59:35AM +0300, Or Gerlitz wrote:
> On Tue, Sep 29, 2015 at 12:04 AM, Christoffer Dall wrote:
> > Hi,
> >
> > In doing some performance experiments I found that using a 10G Mellanox
> > MX354A Dual port FDR CX3 device on a server running Apache and running
> > ab against that server causes the system to crash with 'iommu=soft
> > swiotlb=force'. The same behavior is seen without these options on Dom0
> > running under Xen.
> >
> > I have tried this on v4.0 and v4.3-rc3.
>
> Woops, needs looking indeed. Unfortunately many people in the team are
> off for the Sukkot holiday with real backing to business coming to
> play on Oct 6th -- not sure we can really respond on that before.
>
> Are you running over ARM? which? if not, is that x86 64bit?
>
I'm running on x86_64.

Thanks,
-Christoffer
RE: [RFC v2 net-next 05/10] qede: Add basic network device support
> > >> > +struct qede_rx_queue {
> > >> > +       __le16 *hw_cons_ptr;
> > >>
> > >> The __ variants of constants should be reserved for use in user
> > >> visible API's
> > >
> > > Really? If so, this needs to be fixed not only here but in lots of
> > > places in the series [e.g., entire HW HSI uses __le variants instead of
> > > le].
> > > But why is it so? I.e., I understand that __le16 is defined in the
> > > uapi directory and thus accessible to users, but why the distinction?
> >
> > Because it shows whether the type is something exposed to userspace or not.
> >
> > If there are places where this is done incorrectly in the tree, it is
> > not a legitimate reason for you to do so as well.
>
> Obviously.
> We'll fix all of those for next version.

I've taken a look and I couldn't find reference to 'le16' anywhere under
drivers/net/ethernet/. And 'le16' is actually a fs/ntfs/types.h definition.
What am I missing here?
Re: [v3,1/2] airo: fix IW_AUTH_ALG_OPEN_SYSTEM
> IW_AUTH_ALG_OPEN_SYSTEM is ambiguous in set_auth for WEP as
> wpa_supplicant uses it for both no encryption and WEP open system.
> Cache the last mode set (only of these two) and use it here.
>
> This allows wpa_supplicant to work with unencrypted APs.
>
> Signed-off-by: Ondrej Zary

Thanks, 2 patches applied to wireless-drivers-next.git:

4a0f2ea79797 airo: fix IW_AUTH_ALG_OPEN_SYSTEM
2b8fa9e870b7 airo: Implement netif_carrier_on/off

Kalle Valo
Re: [PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk
Hi Daniel,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

config: m68k-sun3_defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout bcddf1d1557b51bef5ef395b5b7dd7b512794e2f
        # save the attached .config to linux build tree
        make.cross ARCH=m68k

All warnings (new ones prefixed by >>):

   net/netfilter/nft_meta.c: In function 'nft_meta_get_eval':
>> net/netfilter/nft_meta.c:34:15: warning: unused variable 'sk' [-Wunused-variable]
     struct sock *sk = pkt->sk;
                  ^

vim +/sk +34 net/netfilter/nft_meta.c

    18  #include
    19  #include
    20  #include
    21  #include
    22  #include
    23  #include /* for TCP_TIME_WAIT */
    24  #include
    25  #include
    26
    27  void nft_meta_get_eval(const struct nft_expr *expr,
    28                         struct nft_regs *regs,
    29                         const struct nft_pktinfo *pkt)
    30  {
    31          const struct nft_meta *priv = nft_expr_priv(expr);
    32          const struct net_device *in = pkt->in, *out = pkt->out;
    33          struct sk_buff *skb = pkt->skb;
  > 34          struct sock *sk = pkt->sk;
    35          u32 *dest = &regs->data[priv->dreg];
    36
    37          switch (priv->key) {
    38          case NFT_META_LEN:
    39                  *dest = skb->len;
    40                  break;
    41          case NFT_META_PROTOCOL:
    42                  *dest = 0;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: unregister_netdevice warnings when deleting netns
Thanks Julian,

I will check if these two commits work for me.

>I think, they will appear in other stable versions too...
Yes, I saw them in other versions, the fix which is suggested in those
branches didn't work for me. Hope the above two commits help.

Regards,
Anand

On Tue, Sep 29, 2015 at 12:42 AM, Julian Anastasov wrote:
>
>         Hello,
>
> On Mon, 28 Sep 2015, Anand Gurram wrote:
>
>> I am currently using kernel version 3.16.7 on a linux switch.
>> While creating and destroying network namespaces I am observing below logs
>> on the console
>> "unregister_netdevice: waiting for lo to become free. Usage count = 1"
>>
>> Can you please suggest and provide instructions on how to debug this issue.
>> If any fix already available can you please point me to the link.
>
>         There are two commits from Linux 4.2 that may help:
>
> commit e9e4dd3267d0 ("net: do not process device backlog during
> unregistration")
> commit 2c17d27c36dc ("net: call rcu_read_lock early in process_backlog")
>
>         For now I see them only in 3.2.71+ and 3.12.48+.
> I think, they will appear in other stable versions too...
>
> Regards
>
> --
> Julian Anastasov
Re: unregister_netdevice warnings when deleting netns
>If the message just spits out a few times and then goes away it simply
>means that something is taking a while to cleanup and drop it's
>reference.

The message just spits out few times and then goes away, I am trying
to debug why cleanup is taking long,
and where it is still referenced. Any pointers in debugging such
issues will be of great help.

Best Regards,
Anand

On Tue, Sep 29, 2015 at 3:05 AM, Eric W. Biederman wrote:
> Julian Anastasov writes:
>
>>       Hello,
>>
>> On Mon, 28 Sep 2015, Anand Gurram wrote:
>>
>>> I am currently using kernel version 3.16.7 on a linux switch.
>>> While creating and destroying network namespaces I am observing below logs
>>> on the console
>>> "unregister_netdevice: waiting for lo to become free. Usage count = 1"
>>>
>>> Can you please suggest and provide instructions on how to debug this issue.
>>> If any fix already available can you please point me to the link.
>>
>>       There are two commits from Linux 4.2 that may help:
>>
>> commit e9e4dd3267d0 ("net: do not process device backlog during
>> unregistration")
>> commit 2c17d27c36dc ("net: call rcu_read_lock early in process_backlog")
>>
>>       For now I see them only in 3.2.71+ and 3.12.48+.
>> I think, they will appear in other stable versions too...
>
> If that message repeats indefinitely it means there is a leaked
> reference to the network namespaces lo device.
>
> If the message just spits out a few times and then goes away it simply
> means that something is taking a while to cleanup and drop it's
> reference.
>
> This is slightly complicated by the fact that it is not uncommon when a
> network device goes away to redirect all references to itself to the lo
> device.
>
> Eric
Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
On Mon, 2015-09-28 at 20:04 -0700, Tom Herbert wrote:
>
> > I've been pondering a bit of a redesign in this space. I think the
> > skb struct should be explicit in its instructions to hardware for
> > which offloads to do for each packet.
> >
> > In this way, the stack would be *directly* telling the drivers what to
> > do (and what not to do), solving all sorts of bugs and really improving
> > driver reliability and implementation.
> >
> Doesn't CHECKSUM_PARTIAL with csum_offset and csum_start already tell
> the driver unambiguously what to do wrt checksum offload?

Right. That's precisely what we *do* have. But as things stand, we
can't *use* it to its full capability.

It's fine for decent devices which can handle such explicit
instructions (advertised by the NETIF_F_HW_CSUM feature).

The problem is the crappy devices that can *only* checksum UDP and TCP
frames, advertised with the NETIF_F_IP{V6,}_CSUM features.

We make a primitive attempt *not* to feed arbitrary checksum requests
to such hardware. But we fail — we end up feeding *all* Legacy IP
packets to a NETIF_F_IP_CSUM device, and *all* IPv6 packets to a
NETIF_F_IPV6_CSUM device, regardless of whether they're *actually* TCP
or UDP packets.

That's the problem I'm trying to solve. And then we *can* make full use
of the generic checksum offload (I had it working for ICMPv6 at one
point: http://lists.openwall.net/netdev/2013/01/14/38 ).

--
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
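The gap described above can be modelled as two predicates: the loose check that only looks at the network-layer protocol, and a stricter one that also requires the transport protocol to be TCP or UDP on limited devices. This is a userspace sketch under stated assumptions: the feature names mirror NETIF_F_HW_CSUM, NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM from the message, but the `Feature` enum, both function names and the string protocol labels are hypothetical and are not the kernel's can_checksum_protocol() implementation.

```python
from enum import Flag, auto


class Feature(Flag):
    NONE = 0
    HW_CSUM = auto()    # generic offload: any csum_start/csum_offset
    IP_CSUM = auto()    # TCP/UDP over IPv4 only
    IPV6_CSUM = auto()  # TCP/UDP over IPv6 only


def loose_can_offload(features, l3, l4):
    """Model of the behaviour complained about: only the L3 protocol is checked."""
    if features & Feature.HW_CSUM:
        return True
    if l3 == "ipv4":
        return bool(features & Feature.IP_CSUM)
    if l3 == "ipv6":
        return bool(features & Feature.IPV6_CSUM)
    return False


def strict_can_offload(features, l3, l4):
    """Stricter model: limited devices only get packets that really are TCP or UDP."""
    if features & Feature.HW_CSUM:
        return True
    return l4 in ("tcp", "udp") and loose_can_offload(features, l3, l4)
```

With the loose check an ICMP packet over IPv4 would be handed to an IP_CSUM-only device; the strict check would leave it to a software fallback, while a HW_CSUM device still accepts everything.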
Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists
On Mon, 28 Sep 2015 11:28:15 -0500 (CDT) Christoph Lameter wrote:

> On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:
>
> > > Do you really need separate parameters for freelist_head? If you just want
> > > to deal with one object pass it as freelist_head and set cnt = 1?
> >
> > Yes, I need it. We need to know both the head and tail of the list to
> > splice it.
>
> Ok so this is to avoid having to scan the list to its end?

True.

> x is the end
> of the list and freelist_head the beginning. That is weird.

Yes, it is a bit weird... the bulk free of freelists comes out as a
second-class citizen.

Okay, I'll try to change the slab_free() and __slab_free() calls to
have a "head" and "tail". And let tail be NULL on single object free,
to allow compiler to do constant propagation (thus keeping existing
fastpath unaffected). (The same code should be generated, but we will
have a more intuitive API).

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
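Why both ends of the detached list are needed can be shown with a toy singly linked freelist. This is a userspace sketch, not the SLUB code: the class and function names are hypothetical, and the "tail is None for a single object" convention simply mirrors the API proposed in the message above.

```python
class FreeObject:
    """A free object whose `next` field plays the role of the free pointer."""

    def __init__(self, name):
        self.name = name
        self.next = None


def build_chain(names):
    """Link objects in order and return (head, tail) of the chain."""
    head = tail = None
    for name in names:
        obj = FreeObject(name)
        if head is None:
            head = obj
        else:
            tail.next = obj
        tail = obj
    return head, tail


def splice(freelist, head, tail=None):
    """Prepend the chain head..tail to freelist in constant time.

    With tail=None a single object is freed, so head is also the tail.
    Knowing the tail avoids walking the chain to find its end.
    """
    if tail is None:
        tail = head
    tail.next = freelist
    return head


def names(head):
    """Collect object names by following the free pointers."""
    out = []
    while head is not None:
        out.append(head.name)
        head = head.next
    return out
```

Without the tail, the splice would have to scan from head to the last object before it could attach the old freelist, which is the scan the reply confirms it wants to avoid.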
[PATCH 0/5] net: m68k: Allow modular build
        Hi David, Paul,

This patch series makes the remaining m68k Ethernet drivers modular.
It's an alternative to the last 3 patches of Paul Gortmaker's series
"[PATCH net-next 0/6] make non-modular code explicitly non-modular".

Note that "[PATCH 5/5] net: macmace: Allow modular build" depends on
"[PATCH 4/5] m68k/mac: Export Peripheral System Controller (PSC) base
address to modules". Feel free to take the dependency through the netdev
tree to avoid modular build breakage.

This was compile-tested only (mac_defconfig + allmodconfig) due to lack of
hardware.

Thanks!

Geert Uytterhoeven (5):
  net: mac8390: Allow modular build
  net: 7990: Export lance_poll() to modules
  net: hplance: Allow modular build
  m68k/mac: Export Peripheral System Controller (PSC) base address to
    modules
  net: macmace: Allow modular build

 arch/m68k/mac/psc.c                 |  1 +
 drivers/net/ethernet/8390/Kconfig   |  2 +-
 drivers/net/ethernet/8390/mac8390.c | 32 ++--
 drivers/net/ethernet/amd/7990.c     |  1 +
 drivers/net/ethernet/amd/Kconfig    |  2 +-
 drivers/net/ethernet/apple/Kconfig  |  2 +-
 6 files changed, 15 insertions(+), 25 deletions(-)

--
1.9.1

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
[PATCH 3/5] net: hplance: Allow modular build
Signed-off-by: Geert Uytterhoeven
---
 drivers/net/ethernet/amd/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig
index afc62ea804fc35d4..0038709fd317d83c 100644
--- a/drivers/net/ethernet/amd/Kconfig
+++ b/drivers/net/ethernet/amd/Kconfig
@@ -100,7 +100,7 @@ config DECLANCE
 	  DEPCA series. (This chipset is better known via the NE2100 cards.)
 
 config HPLANCE
-	bool "HP on-board LANCE support"
+	tristate "HP on-board LANCE support"
 	depends on DIO
 	select CRC32
 	---help---
--
1.9.1
[PATCH 5/5] net: macmace: Allow modular build
Signed-off-by: Geert Uytterhoeven
---
This depends on "[PATCH 4/5] m68k/mac: Export Peripheral System
Controller (PSC) base address to modules".
---
 drivers/net/ethernet/apple/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/apple/Kconfig b/drivers/net/ethernet/apple/Kconfig
index d19a41b0c6d26691..31071297896c96b5 100644
--- a/drivers/net/ethernet/apple/Kconfig
+++ b/drivers/net/ethernet/apple/Kconfig
@@ -51,7 +51,7 @@ config BMAC
 	  will be called bmac.
 
 config MACMACE
-	bool "Macintosh (AV) onboard MACE ethernet"
+	tristate "Macintosh (AV) onboard MACE ethernet"
 	depends on MAC
 	select CRC32
 	---help---
--
1.9.1
[PATCH 2/5] net: 7990: Export lance_poll() to modules
If CONFIG_HPLANCE=m and CONFIG_NET_POLL_CONTROLLER=y:

    ERROR: "lance_poll" [drivers/net/ethernet/amd/hplance.ko] undefined!

Add the missing export to fix this.

Signed-off-by: Geert Uytterhoeven
---
 drivers/net/ethernet/amd/7990.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/amd/7990.c b/drivers/net/ethernet/amd/7990.c
index 98a10d555b793e02..66d0b73c39c03ba2 100644
--- a/drivers/net/ethernet/amd/7990.c
+++ b/drivers/net/ethernet/amd/7990.c
@@ -661,6 +661,7 @@ void lance_poll(struct net_device *dev)
 	spin_unlock(&lp->devlock);
 	lance_interrupt(dev->irq, dev);
 }
+EXPORT_SYMBOL_GPL(lance_poll);
 #endif
 
 MODULE_LICENSE("GPL");
--
1.9.1
[PATCH 4/5] m68k/mac: Export Peripheral System Controller (PSC) base address to modules
If CONFIG_MACMACE=m:

    ERROR: psc [drivers/net/ethernet/apple/macmace.ko] undefined!

Add the missing export to fix this.

Signed-off-by: Geert Uytterhoeven
---
I'm OK with this going in through the netdev tree, as "net: macmace:
Allow modular build" depends on it.
---
 arch/m68k/mac/psc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/m68k/mac/psc.c b/arch/m68k/mac/psc.c
index cd38f29955c87421..2290c0cae48beb8a 100644
--- a/arch/m68k/mac/psc.c
+++ b/arch/m68k/mac/psc.c
@@ -29,6 +29,7 @@
 
 int psc_present;
 volatile __u8 *psc;
+EXPORT_SYMBOL_GPL(psc);
 
 /*
  * Debugging dump, used in various places to see what's going on.
--
1.9.1
[PATCH 1/5] net: mac8390: Allow modular build
The modular driver supports only one card, just like the built-in
driver.

Note that this limitation is a problem which affects all Nubus card
drivers, because they have to do all their own bus matching, because
Nubus still lacks the necessary driver model support.

Suggested-by: Finn Thain
Signed-off-by: Geert Uytterhoeven
---
 drivers/net/ethernet/8390/Kconfig   |  2 +-
 drivers/net/ethernet/8390/mac8390.c | 32 ++--
 2 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/8390/Kconfig b/drivers/net/ethernet/8390/Kconfig
index edf72258ab1ddabe..29c3075bfb052f1d 100644
--- a/drivers/net/ethernet/8390/Kconfig
+++ b/drivers/net/ethernet/8390/Kconfig
@@ -64,7 +64,7 @@ config ARM_ETHERH
 	  should say Y to this option if you wish to use it with Linux.
 
 config MAC8390
-	bool "Macintosh NS 8390 based ethernet cards"
+	tristate "Macintosh NS 8390 based ethernet cards"
 	depends on MAC
 	select CRC32
 	---help---
diff --git a/drivers/net/ethernet/8390/mac8390.c b/drivers/net/ethernet/8390/mac8390.c
index 65cf60f6718c52fa..b9283901136e974a 100644
--- a/drivers/net/ethernet/8390/mac8390.c
+++ b/drivers/net/ethernet/8390/mac8390.c
@@ -454,34 +454,22 @@ MODULE_AUTHOR("David Huggins-Daines and others");
 MODULE_DESCRIPTION("Macintosh NS8390-based Nubus Ethernet driver");
 MODULE_LICENSE("GPL");
 
-/* overkill, of course */
-static struct net_device *dev_mac8390[15];
-int init_module(void)
+static struct net_device *dev_mac8390;
+
+int __init init_module(void)
 {
-	int i;
-	for (i = 0; i < 15; i++) {
-		struct net_device *dev = mac8390_probe(-1);
-		if (IS_ERR(dev))
-			break;
-		dev_mac890[i] = dev;
-	}
-	if (!i) {
-		pr_notice("No useable cards found, driver NOT installed.\n");
-		return -ENODEV;
+	dev_mac8390 = mac8390_probe(-1);
+	if (IS_ERR(dev_mac8390)) {
+		pr_warn("mac8390: No card found\n");
+		return PTR_ERR(dev_mac8390);
 	}
 	return 0;
 }
 
-void cleanup_module(void)
+void __exit cleanup_module(void)
 {
-	int i;
-	for (i = 0; i < 15; i++) {
-		struct net_device *dev = dev_mac890[i];
-		if (dev) {
-			unregister_netdev(dev);
-			free_netdev(dev);
-		}
-	}
+	unregister_netdev(dev_mac8390);
+	free_netdev(dev_mac8390);
 }
 
 #endif /* MODULE */
--
1.9.1
Re: Per-flow IPv4 ECMP
On Mon, 28 Sep 2015 06:48:09 -0700 roopa wrote:

> On 9/28/15, 1:57 AM, Matthew Dupre wrote:
> > Hi,
> >
> > I'm interested in the Linux kernel's support for per-flow IPv4 ECMP
> > (i.e. consistent path selection based on a hash of the connection
> > tuple). I'd been led to believe[1] that this depended on the route
> > cache, which was removed in 3.6.
> >
> > However, I tested a route with multiple next hops on a 3.10 and
> > 3.13 kernel, and ECMP was per-flow! Obviously I'm pleased that
> > this was the case, but I'd like to understand why this is
> > supported, and whether I can rely on it in future.
> >
> > Could anyone give me a little clarification on whether this is now
> > supported by some means other than the route cache, and whether
> > that support is intended to be continued?
> >
> This is being worked on currently by Peter Nørlund
> https://lwn.net/Articles/657431/

Hi,

AFAIK if you create a socket on the machine having the multipath route,
each socket will seemingly be mapped to particular path, and it will
behave as per-flow. But if you use the machine as a router, each packet
is forwarded independently, potentially hitting different paths each
time.

Best Regards
 Peter Nørlund
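"Consistent path selection based on a hash of the connection tuple" can be sketched in userspace as follows. This is an illustration only: the use of CRC32, the modulo selection and both function names are assumptions, and the sketch says nothing about how any particular kernel version actually chooses a next hop.

```python
import ipaddress
import struct
import zlib


def flow_hash(src, dst, proto, sport, dport):
    """Hash the 5-tuple so that every packet of a flow gets the same value."""
    key = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + struct.pack("!BHH", proto, sport, dport)
    )
    return zlib.crc32(key)


def select_nexthop(nexthops, flow):
    """Pick one next hop for the flow; identical tuples always map to the same hop."""
    return nexthops[flow_hash(*flow) % len(nexthops)]
```

Per-packet forwarding, by contrast, would choose a next hop independently for each packet, so packets of one connection could take different paths.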
RE: [PATCH v4 net-next 0/2] ipv4: Hash-based multipath routing
From: Peter Nørlund
> Sent: 29 September 2015 12:29
...
> As for using L4 hashing with anycast, CloudFlare apparently does L4
> hashing - they could have disabled it, but they didn't. Besides,
> analysis of my own load balancers showed that only one in every
> 500,000,000 packets is fragmented. And even if I hit a fragmented
> packet, it is only a problem if the packet hits the wrong load
> balancer, and if that load balancer hasn't been updated with the state
> from another load balancer (that is, one of the very first packets). It
> is still a possible scenario though - especially with large HTTP
> cookies or file uploads. But apparently it is a common problem that IP
> fragments get dropped on the Internet, so I suspect that ECMP+Anycast
> sites are just part of the pool of problematic sites for people with
> fragments.

Fragmentation is usually more of an issue with UDP than TCP.
Some SIP messages can get fragmented...

David
Re: [PATCH net-next 0/6] make non-modular code explicitly non-modular
[Re: [PATCH net-next 0/6] make non-modular code explicitly non-modular] On 29/09/2015 (Tue 16:32) Finn Thain wrote: > > Hi Paul, > > On Mon, 28 Sep 2015, Paul Gortmaker wrote: > > > On 28/09/2015 (Mon 23:09) Geert Uytterhoeven wrote: > > > > > Hi Paul, > ... > > > > > > Why did you choose this approach? > > > What about changing the "bool"s to "tristate"s in Kconfig instead? > > > > Long answer is here: > > > > https://lkml.org/lkml/2015/8/24/888 > > You wrote, "If there was demand for them to be tristate, then it would > have happened by now." I don't follow your reasoning. You might just as > well remove entire drivers and then argue, "If there was demand for > drivers without bugs, then someone would have written them by now". I don't see those two sentences being alike, but in the end it does not matter, since Geert has decided to do the conversion and test it. And whatever code gets removed is never truly gone anyway; it lives on in the git history forever. Thanks, Paul. -- > > Perhaps you meant, "If there was sufficient demand for them to be > tristate, then sufficient resources would have been marshalled, as > required to get an enhancement written, tested, submitted, reviewed and > merged in the mainline kernel." > > > > > To summarize, it adds functionality to code I can't test, and with 300 > > or so of these, it already has been a large time sink. Add to that > > extending the functionality and testing the new functionality, and it > > does not scale. Plus if something hasn't allowed tristate for over 10 > > years, where is the value in adding it now? > > There is value to be gained by completing the tristate support, and there > is value destroyed by removing the partial tristate support. > > I'm not involved in building distro kernels, but I know that Debian's > would benefit from these tristates, because it would reduce the size of > the m68k multi-platform kernel binary. 
> > And even if it is dead code you aim to remove, a lot of people have worked > on it (according to git blame), including myself. We should not disregard > that effort when we could leverage it instead. > > For the macmace driver in particular, I did the platform driver > conversion, and it should work as a module. I did not change it to > tristate at the time because I did not want to deal with the question of > the 'psc' global, which lacks an EXPORT_SYMBOL(psc). Anyway, I'll send a > patch if Geert doesn't do so first. > > > > > > I gave it a try, and with some small changes the three m68k ethernet > > > drivers build fine as modular drivers. I can send patches if you like > > > it. > > > > Per above, I don't see the value in it, but if you want to do it and > > test it and own submitting the patches, then I can drop the > > corresponding ones from my queue. > > I can't test right now but I have the hardware and will attend to any > issues if need be. I do not expect any issues, because the modular option > seems to involve the same code paths in the driver. > > If the CONFIG_MACMACE=m option was implemented badly and did not work > correctly, at least it couldn't be called a regression, presuming that 'm' > builds okay, and that the default was 'y' or 'n'. > > > Either way we get the code matching the Kconfig which is what I'm after > > out of this. > > Yes, me too. > > > > > Note that if you do decide to do this, the one driver really needs more > > than just tristate one line change, it had super ancient init code that > > predates module_init and probably needs an update. > > I think the solution for mac8390 is to do in the modular case exactly what > Space.c does in the built-in case. That would mean that the modular driver > would support only one card, just like the built-in driver. 
(That > limitation is a problem which affects all Nubus card drivers, because they > have to do all their own bus matching, because Nubus still lacks the > necessary driver model support.) > > I haven't looked at amd/hplance, but I expect that the issues are similar. > > Geert, do plan to send patches for any of these drivers? > > Regards, > Finn > > > > > Thanks, > > Paul. > > -- > > > > > > > > Thanks! > > > > > > Gr{oetje,eeting}s, > > > > > > Geert > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 net-next 0/2] ipv4: Hash-based multipath routing
On Mon, 28 Sep 2015 19:55:41 -0700 (PDT) David Miller wrote:

> From: David Miller
> Date: Mon, 28 Sep 2015 19:33:55 -0700 (PDT)
>
> > From: Peter Nørlund
> > Date: Wed, 23 Sep 2015 21:49:35 +0200
> >
> >> When the routing cache was removed in 3.6, the IPv4 multipath
> >> algorithm changed from more or less being destination-based into
> >> being quasi-random per-packet scheduling. This increases the risk
> >> of out-of-order packets and makes it impossible to use multipath
> >> together with anycast services.
> >>
> >> This patch series replaces the old implementation with flow-based
> >> load balancing based on a hash over the source and destination
> >> addresses.
> >
> > This isn't perfect but it's a significant step in the right
> > direction. So I'm going to apply this to net-next now and we can
> > make incremental improvements upon it.
>
> Actually, I had to revert, this doesn't build:
>
> [davem@localhost net-next]$ make -s -j8
> Setup is 16876 bytes (padded to 16896 bytes).
> System is 10011 kB
> CRC 324f2811
> Kernel: arch/x86/boot/bzImage is ready (#337)
> ERROR: "__ip_route_output_key_hash" [net/dccp/dccp_ipv4.ko] undefined!
> scripts/Makefile.modpost:90: recipe for target '__modpost' failed
> make[1]: *** [__modpost] Error 1
> Makefile:1095: recipe for target 'modules' failed
> make: *** [modules] Error 2

Sorry! I forgot to update the EXPORT_SYMBOL_GPL line.

In the meantime I've been doing some thinking (and measuring).
Considering that the broader goal is to make IPv6 and IPv4 behave as
identically as possible, it is probably not such a bad idea to just use
the flow dissector + modulo in the IPv4 code too - the patch will be
simpler than the current one. I fear the performance impact of the flow
dissector though - some of my earlier measurements showed that it was
5-6 times slower than the simple one I used. But maybe it is better to
streamline the IPv4/IPv6 multipath first and then improve upon it
afterward (make it work, make it right, make it fast).

As for using L4 hashing with anycast, CloudFlare apparently does L4
hashing - they could have disabled it, but they didn't. Besides,
analysis of my own load balancers showed that only one in every
500,000,000 packets is fragmented. And even if I hit a fragmented
packet, it is only a problem if the packet hits the wrong load
balancer, and if that load balancer hasn't been updated with the state
from another load balancer (that is, one of the very first packets). It
is still a possible scenario though - especially with large HTTP
cookies or file uploads. But apparently it is a common problem that IP
fragments get dropped on the Internet, so I suspect that ECMP+Anycast
sites are just part of the pool of problematic sites for people with
fragments.

I'm still unsettled as to whether the ICMP handling belongs in the
kernel or not. The above breakage was in the ICMP part of the patchset,
so judging from that, I guess it wasn't out of the question. But in the
"IPv4 and IPv6 should behave identically" mindset, it probably belongs
in a separate, future patchset, adding ICMP handling to both IPv4 and
IPv6 - and it is actually more important for IPv6 than IPv4 since PMTUD
cannot be disabled.

Best Regards,
Peter Nørlund
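The fragmentation concern with L4 hashing can be made concrete with a small sketch. It assumes, for illustration only, that a non-first IP fragment carries no port information; struct pkt, l3_hash(), l4_hash() and pick_path() are hypothetical names, and the multiplicative hash is a stand-in rather than the flow dissector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet summary; not a kernel structure. */
struct pkt {
	uint32_t saddr, daddr;
	uint16_t sport, dport;	/* only meaningful when has_l4 is true */
	bool has_l4;		/* false for non-first IP fragments */
};

/* L3 hash: addresses only, so every fragment of a datagram agrees. */
static uint32_t l3_hash(const struct pkt *p)
{
	return (p->saddr ^ p->daddr) * 2654435761u;
}

/* L4 hash: also folds in the ports when they are available. */
static uint32_t l4_hash(const struct pkt *p)
{
	uint32_t ports = p->has_l4 ?
		(((uint32_t)p->sport << 16) | p->dport) : 0;

	return (p->saddr ^ p->daddr ^ ports) * 2654435761u;
}

/* Hash + modulo path selection. */
static unsigned int pick_path(uint32_t hash, unsigned int n_paths)
{
	assert(n_paths > 0);
	return hash % n_paths;
}
```

With the L3 hash, a first fragment and a later fragment of the same datagram always select the same path; with the L4 hash their hash values differ, so they may or may not land on the same path depending on the number of paths.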
[PATCH RFC 4/7] net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains
Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the destination socket for IPv4 unicast and multicast ports have been looked up. Signed-off-by: Daniel Mack--- net/ipv4/netfilter/nf_tables_ipv4.c | 10 +- net/ipv4/tcp_ipv4.c | 8 net/ipv4/udp.c | 15 +++ 3 files changed, 28 insertions(+), 5 deletions(-) diff --git a/net/ipv4/netfilter/nf_tables_ipv4.c b/net/ipv4/netfilter/nf_tables_ipv4.c index abee60a..2e65664 100644 --- a/net/ipv4/netfilter/nf_tables_ipv4.c +++ b/net/ipv4/netfilter/nf_tables_ipv4.c @@ -50,11 +50,11 @@ struct nft_af_info nft_af_ipv4 __read_mostly = { .owner = THIS_MODULE, .nops = 1, .hooks = { - [NF_INET_LOCAL_IN] = nft_do_chain_ipv4, - [NF_INET_LOCAL_OUT] = nft_ipv4_output, - [NF_INET_FORWARD] = nft_do_chain_ipv4, - [NF_INET_PRE_ROUTING] = nft_do_chain_ipv4, - [NF_INET_POST_ROUTING] = nft_do_chain_ipv4, + [NF_INET_LOCAL_IN] = nft_do_chain_ipv4, + [NF_INET_LOCAL_OUT] = nft_ipv4_output, + [NF_INET_FORWARD] = nft_do_chain_ipv4, + [NF_INET_PRE_ROUTING] = nft_do_chain_ipv4, + [NF_INET_POST_ROUTING] = nft_do_chain_ipv4, [NF_INET_LOCAL_SOCKET_IN] = nft_do_chain_ipv4, }, }; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 93898e0..83bc7b3 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -78,6 +78,7 @@ #include #include +#include #include #include #include @@ -1594,6 +1595,13 @@ int tcp_v4_rcv(struct sk_buff *skb) if (!sk) goto no_tcp_socket; + ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) { + sock_put(sk); + return 0; + } + process: if (sk->sk_state == TCP_TIME_WAIT) goto do_time_wait; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index f7d1d5e..57c7571 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -97,6 +97,7 @@ #include #include #include +#include #include #include #include @@ -1633,7 +1634,14 @@ static void flush_stack(struct sock **stack, unsigned int count, struct sock *sk; for (i = 0; i < count; i++) { + int ret; sk = stack[i]; + + ret = 
nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) + continue; + if (likely(!skb1)) skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC); @@ -1820,6 +1828,13 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, if (sk) { int ret; + ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) { + sock_put(sk); + return 0; + } + if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk)) skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check, inet_compute_pseudo); -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 5/7] net: tcp_ipv6, udp_ipv6: hook up LOCAL_SOCKET_IN netfilter chains
Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the destination socket for IPv6 unicast and multicast ports have been looked up. Signed-off-by: Daniel Mack--- net/ipv6/netfilter/nf_tables_ipv6.c | 14 -- net/ipv6/tcp_ipv6.c | 8 net/ipv6/udp.c | 9 + 3 files changed, 25 insertions(+), 6 deletions(-) diff --git a/net/ipv6/netfilter/nf_tables_ipv6.c b/net/ipv6/netfilter/nf_tables_ipv6.c index c8148ba..53c7923 100644 --- a/net/ipv6/netfilter/nf_tables_ipv6.c +++ b/net/ipv6/netfilter/nf_tables_ipv6.c @@ -49,11 +49,12 @@ struct nft_af_info nft_af_ipv6 __read_mostly = { .owner = THIS_MODULE, .nops = 1, .hooks = { - [NF_INET_LOCAL_IN] = nft_do_chain_ipv6, - [NF_INET_LOCAL_OUT] = nft_ipv6_output, - [NF_INET_FORWARD] = nft_do_chain_ipv6, - [NF_INET_PRE_ROUTING] = nft_do_chain_ipv6, - [NF_INET_POST_ROUTING] = nft_do_chain_ipv6, + [NF_INET_LOCAL_IN] = nft_do_chain_ipv6, + [NF_INET_LOCAL_OUT] = nft_ipv6_output, + [NF_INET_FORWARD] = nft_do_chain_ipv6, + [NF_INET_PRE_ROUTING] = nft_do_chain_ipv6, + [NF_INET_POST_ROUTING] = nft_do_chain_ipv6, + [NF_INET_LOCAL_SOCKET_IN] = nft_do_chain_ipv6, }, }; EXPORT_SYMBOL_GPL(nft_af_ipv6); @@ -95,7 +96,8 @@ static const struct nf_chain_type filter_ipv6 = { (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_FORWARD) | (1 << NF_INET_PRE_ROUTING) | - (1 << NF_INET_POST_ROUTING), + (1 << NF_INET_POST_ROUTING) | + (1 << NF_INET_LOCAL_SOCKET_IN), }; static int __init nf_tables_ipv6_init(void) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 97d9314..0b0706d 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include #include @@ -1392,6 +1393,13 @@ static int tcp_v6_rcv(struct sk_buff *skb) if (!sk) goto no_tcp_socket; + ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) { + sock_put(sk); + return 0; + } + process: if (sk->sk_state == TCP_TIME_WAIT) goto do_time_wait; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c 
index 0aba654..99df081 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -746,7 +747,15 @@ static void flush_stack(struct sock **stack, unsigned int count, unsigned int i; for (i = 0; i < count; i++) { + int ret; + sk = stack[i]; + + ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) + continue; + if (likely(!skb1)) skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC); if (!skb1) { -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the input demux is complete and the final destination socket (if any) has been determined. This helps filtering packets based on information stored in the destination socket, such as cgroup controller supplied net class IDs. Note that rules in such chains are not processed in case the local listen socket cannot be determined. Hence, if no application is listening on a specific task, the resulting error code that is sent back to the remote peer can't be controlled with rules in NF_INET_LOCAL_SOCKET_IN chains. Signed-off-by: Daniel Mack--- include/uapi/linux/netfilter.h | 1 + net/ipv4/netfilter/iptable_filter.c | 1 + net/ipv4/netfilter/nf_tables_ipv4.c | 4 +++- net/netfilter/nf_tables_inet.c | 3 ++- 4 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h index d93f949..96c3f8b 100644 --- a/include/uapi/linux/netfilter.h +++ b/include/uapi/linux/netfilter.h @@ -49,6 +49,7 @@ enum nf_inet_hooks { NF_INET_FORWARD, NF_INET_LOCAL_OUT, NF_INET_POST_ROUTING, + NF_INET_LOCAL_SOCKET_IN, NF_INET_NUMHOOKS }; diff --git a/net/ipv4/netfilter/iptable_filter.c b/net/ipv4/netfilter/iptable_filter.c index a0f3bec..d65616a5 100644 --- a/net/ipv4/netfilter/iptable_filter.c +++ b/net/ipv4/netfilter/iptable_filter.c @@ -21,6 +21,7 @@ MODULE_AUTHOR("Netfilter Core Team "); MODULE_DESCRIPTION("iptables filter table"); #define FILTER_VALID_HOOKS ((1 << NF_INET_LOCAL_IN) | \ + (1 << NF_INET_LOCAL_SOCKET_IN) | \ (1 << NF_INET_FORWARD) | \ (1 << NF_INET_LOCAL_OUT)) diff --git a/net/ipv4/netfilter/nf_tables_ipv4.c b/net/ipv4/netfilter/nf_tables_ipv4.c index aa180d3..abee60a 100644 --- a/net/ipv4/netfilter/nf_tables_ipv4.c +++ b/net/ipv4/netfilter/nf_tables_ipv4.c @@ -55,6 +55,7 @@ struct nft_af_info nft_af_ipv4 __read_mostly = { [NF_INET_FORWARD] = nft_do_chain_ipv4, [NF_INET_PRE_ROUTING] = nft_do_chain_ipv4, [NF_INET_POST_ROUTING] = nft_do_chain_ipv4, + 
[NF_INET_LOCAL_SOCKET_IN] = nft_do_chain_ipv4, }, }; EXPORT_SYMBOL_GPL(nft_af_ipv4); @@ -96,7 +97,8 @@ static const struct nf_chain_type filter_ipv4 = { (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_FORWARD) | (1 << NF_INET_PRE_ROUTING) | - (1 << NF_INET_POST_ROUTING), + (1 << NF_INET_POST_ROUTING) | + (1 << NF_INET_LOCAL_SOCKET_IN), }; static int __init nf_tables_ipv4_init(void) diff --git a/net/netfilter/nf_tables_inet.c b/net/netfilter/nf_tables_inet.c index 9dd2d21..5544196 100644 --- a/net/netfilter/nf_tables_inet.c +++ b/net/netfilter/nf_tables_inet.c @@ -75,7 +75,8 @@ static const struct nf_chain_type filter_inet = { (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_FORWARD) | (1 << NF_INET_PRE_ROUTING) | - (1 << NF_INET_POST_ROUTING), + (1 << NF_INET_POST_ROUTING) | + (1 << NF_INET_LOCAL_SOCKET_IN), }; static int __init nf_tables_inet_init(void) -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
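Each chain type advertises the hooks it accepts as a bitmask with one bit per enum value, which is why the patch extends both the enum and the masks. A minimal sketch of that check, assuming toy names (enum toy_hook, TOY_FILTER_VALID_HOOKS and hook_allowed() are hypothetical, modelled on the iptable_filter mask in the diff):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy copy of the hook enum, with the new hook appended before the count. */
enum toy_hook {
	TOY_PRE_ROUTING,
	TOY_LOCAL_IN,
	TOY_FORWARD,
	TOY_LOCAL_OUT,
	TOY_POST_ROUTING,
	TOY_LOCAL_SOCKET_IN,
	TOY_NUMHOOKS
};

/* One bit per permitted hook, in the style of FILTER_VALID_HOOKS. */
#define TOY_FILTER_VALID_HOOKS ((1u << TOY_LOCAL_IN) | \
				(1u << TOY_LOCAL_SOCKET_IN) | \
				(1u << TOY_FORWARD) | \
				(1u << TOY_LOCAL_OUT))

/* Returns true if a chain with this mask may be attached to hook h. */
static bool hook_allowed(unsigned int mask, enum toy_hook h)
{
	assert(h < TOY_NUMHOOKS);
	return (mask & (1u << h)) != 0;
}
```

Appending the new value just before the count keeps the existing hook numbers stable, which matters because the enum lives in a uapi header.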
[PATCH RFC 7/7] net: dccp: hook up LOCAL_SOCKET_IN netfilter chains
Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the destination socket for DCCP packets have been looked up. Signed-off-by: Daniel Mack--- net/dccp/ipv4.c | 14 +- net/dccp/ipv6.c | 14 +- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index ccf4c56..9746138 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -807,7 +808,7 @@ static int dccp_v4_rcv(struct sk_buff *skb) const struct dccp_hdr *dh; const struct iphdr *iph; struct sock *sk; - int min_cov; + int ret, min_cov; /* Step 1: Check header basics */ @@ -857,6 +858,17 @@ static int dccp_v4_rcv(struct sk_buff *skb) /* * Step 2: +* ... or any LOCAL_SOCKET_IN rule disagrees ... +*/ + ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) { + sock_put(sk); + return 0; + } + + /* +* Step 2: * ... or S.state == TIMEWAIT, * Generate Reset(No Connection) unless P.type == Reset * Drop packet and return diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 5165571..63b51e6 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -14,6 +14,7 @@ #include #include +#include #include #include @@ -691,7 +692,7 @@ static int dccp_v6_rcv(struct sk_buff *skb) { const struct dccp_hdr *dh; struct sock *sk; - int min_cov; + int ret, min_cov; /* Step 1: Check header basics */ @@ -732,6 +733,17 @@ static int dccp_v6_rcv(struct sk_buff *skb) /* * Step 2: +* ... or any LOCAL_SOCKET_IN rule disagrees ... +*/ + ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) { + sock_put(sk); + return 0; + } + + /* +* Step 2: * ... 
or S.state == TIMEWAIT, * Generate Reset(No Connection) unless P.type == Reset * Drop packet and return -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 0/7] netfilter: introduce new chain type for local socket input
Here is a patch set that enables full support for match rules that
take into account information about the local receiver socket. Such
rules allow administrators to implement per-application or
per-container firewalls which filter any type of network traffic
directed to or originating from a set of processes on a system,
independent of, for instance, local or remote port numbers.

In theory, such rules are already supported through the 'meta' and
'socket' rule types, but they currently do not work for ingress packets
delivered to unestablished listener sockets. NF_INET_LOCAL_IN chains
are iterated once the IP stack decides a packet is directed to the
local system, but before the local listener socket is determined.
Consequently, filter rules that are based on information derived from
the listener socket cannot be used reliably.

This patch set introduces a new chain type (NF_INET_LOCAL_SOCKET_IN)
that is iterated at a later point in time than NF_INET_LOCAL_IN, after
the listener socket demux has succeeded. Chains of this type are hence
only looked at _if_ there is a local listener.

The input paths for TCP and UDP for IPv4 and IPv6 are patched for the
new hook-up, as well as SCTP and DCCP.

Possible performance penalties for setups in which this new type is not
used need to be considered, but I lack a good test case for that. I'm
sure some people reading this do have proper test scenarios they can
run with these patches applied. I'd be very interested in these
numbers.

For SCTP and DCCP, I admittedly lack a proper test case as well, and
for UDP, I'm aware of a possible deadlock due to nf_hook() being called
under hslot->lock when the stack is flushed preliminarily from
__udp[46]_lib_mcast_deliver(). That's fixable, but I've kept it simple
for this RFC.

Only nftables is supported so far, but enabling iptables as well would
be straightforward. I also have trivial patches for libnftnl and
nftables to enable the userspace part.

I'd appreciate some feedback about this approach.

Thanks,
Daniel

Daniel Mack (7):
  netfilter: add socket to struct nft_pktinfo
  netfilter: nft_meta: look at pkt->sk rather than skb->sk
  netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains
  net: tcp_ipv6, udp_ipv6: hook up LOCAL_SOCKET_IN netfilter chains
  net: sctp: hook up LOCAL_SOCKET_IN netfilter chains
  net: dccp: hook up LOCAL_SOCKET_IN netfilter chains

 include/net/netfilter/nf_tables.h | 2 ++
 include/uapi/linux/netfilter.h | 1 +
 net/dccp/ipv4.c | 14 +-
 net/dccp/ipv6.c | 14 +-
 net/ipv4/netfilter/iptable_filter.c | 1 +
 net/ipv4/netfilter/nf_tables_ipv4.c | 14 --
 net/ipv4/tcp_ipv4.c | 8
 net/ipv4/udp.c | 15 +++
 net/ipv6/netfilter/nf_tables_ipv6.c | 14 --
 net/ipv6/tcp_ipv6.c | 8
 net/ipv6/udp.c | 9 +
 net/netfilter/nf_tables_inet.c | 3 ++-
 net/netfilter/nft_meta.c | 7 ---
 net/sctp/input.c | 11 ++-
 14 files changed, 102 insertions(+), 19 deletions(-)

--
2.5.0
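The ordering described in this cover letter can be modelled with a toy input path: a rule that needs the socket cannot decide anything at the earlier hook, but can once the demux has found a listener. Everything below is a hypothetical model (toy_sock, toy_rule, toy_input() and the classid value 42 are invented for illustration); it only mirrors the "verdict != 1 means drop and release the reference" shape of the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Toy socket: a cgroup-style class id plus a reference count. */
struct toy_sock {
	unsigned int classid;
	int refcnt;
};

/* A rule sees the socket known at its hook (NULL before the demux)
 * and returns 1 to accept the packet, anything else to drop it.
 */
typedef int (*toy_rule)(const struct toy_sock *sk);

/* Example rule: drop traffic destined for sockets in class 42. */
static int drop_classid_42(const struct toy_sock *sk)
{
	if (sk == NULL)
		return 1;	/* no socket yet: the rule cannot match */
	return sk->classid == 42 ? 0 : 1;
}

/* Returns 1 if the packet is delivered, 0 if it is dropped. */
static int toy_input(struct toy_sock *listener, toy_rule local_in,
		     toy_rule local_socket_in)
{
	/* Earlier hook: runs before the listener is known. */
	if (local_in && local_in(NULL) != 1)
		return 0;

	/* Demux: without a listener the later chain is never consulted. */
	if (listener == NULL)
		return 0;
	listener->refcnt++;

	/* Later hook: runs with the socket; drop releases the reference. */
	if (local_socket_in && local_socket_in(listener) != 1) {
		listener->refcnt--;
		return 0;
	}

	listener->refcnt--;	/* delivered; reference released afterwards */
	return 1;
}
```

In this model the same rule is a no-op when attached at the earlier hook and effective at the later one, which is the gap the new chain type is meant to close.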
[PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk
pkt->sk is set to whatever was passed to nf_hook() by the caller, and
for post demux chains, this is the one that should be looked at, as
skb->sk is still NULL at this point in time.

Signed-off-by: Daniel Mack
---
 net/netfilter/nft_meta.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index cb2f13e..f195bee 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -29,8 +29,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		       const struct nft_pktinfo *pkt)
 {
 	const struct nft_meta *priv = nft_expr_priv(expr);
-	const struct sk_buff *skb = pkt->skb;
 	const struct net_device *in = pkt->in, *out = pkt->out;
+	struct sk_buff *skb = pkt->skb;
+	struct sock *sk = pkt->sk;
 	u32 *dest = &regs->data[priv->dreg];
 
 	switch (priv->key) {
@@ -168,9 +169,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		break;
 #ifdef CONFIG_CGROUP_NET_CLASSID
 	case NFT_META_CGROUP:
-		if (skb->sk == NULL || !sk_fullsock(skb->sk))
+		if (sk == NULL || !sk_fullsock(sk))
 			goto err;
-		*dest = skb->sk->sk_classid;
+		*dest = sk->sk_classid;
 		break;
 #endif
 	default:
--
2.5.0
[PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo
The high-level netfilter hook API already enables users to pass a
socket, but that information is lost when the chains are walked.

In order to let internal eval callbacks use the passed socket rather
than skb->sk, add a pointer of type 'struct sock' to 'struct
nft_pktinfo' and set that field via nft_set_pktinfo(). This allows us
to run filter chains from situations where skb->sk is unset.

Fall back to skb->sk in case state->sk is NULL, so filter callbacks can
be written in a generic way.

Signed-off-by: Daniel Mack
---
 include/net/netfilter/nf_tables.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index aa8bee7..05e97ed 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -13,6 +13,7 @@
 #define NFT_JUMP_STACK_SIZE	16
 
 struct nft_pktinfo {
+	struct sock *sk;
 	struct sk_buff *skb;
 	const struct net_device *in;
 	const struct net_device *out;
@@ -29,6 +30,7 @@ static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
 				   struct sk_buff *skb,
 				   const struct nf_hook_state *state)
 {
+	pkt->sk = state->sk ?: skb->sk;
 	pkt->skb = skb;
 	pkt->in = pkt->xt.in = state->in;
 	pkt->out = pkt->xt.out = state->out;
--
2.5.0
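The fallback in nft_set_pktinfo() uses the GNU C "a ?: b" shorthand, which yields a when it is non-NULL and b otherwise. A self-contained sketch of the same selection, written with the portable spelling; the toy_* structures and toy_set_pktinfo() are hypothetical reductions of the kernel types, kept only to the fields involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct toy_sock { int id; };
struct toy_skb { struct toy_sock *sk; };
struct toy_hook_state { struct toy_sock *sk; };
struct toy_pktinfo {
	struct toy_sock *sk;
	struct toy_skb *skb;
};

/* Prefer the socket handed to the hook; fall back to the one on the skb. */
static void toy_set_pktinfo(struct toy_pktinfo *pkt, struct toy_skb *skb,
			    const struct toy_hook_state *state)
{
	assert(pkt != NULL && skb != NULL && state != NULL);
	/* Portable equivalent of: pkt->sk = state->sk ?: skb->sk; */
	pkt->sk = state->sk ? state->sk : skb->sk;
	pkt->skb = skb;
}
```

With this fallback, callbacks can read pkt->sk unconditionally: hooks that pass a socket get that socket, and hooks that do not keep the old skb->sk behaviour.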
[PATCH RFC 6/7] net: sctp: hook up LOCAL_SOCKET_IN netfilter chains
Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for SCTP packets have been looked up.

Signed-off-by: Daniel Mack
---
 net/sctp/input.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index b6493b3..0652406 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -45,6 +45,7 @@
 #include /* For struct list_head */
 #include
 #include
+#include
 #include /* For struct timeval */
 #include
 #include
@@ -115,7 +116,7 @@ int sctp_rcv(struct sk_buff *skb)
 	struct sctphdr *sh;
 	union sctp_addr src;
 	union sctp_addr dest;
-	int family;
+	int ret, family;
 	struct sctp_af *af;
 	struct net *net = dev_net(skb->dev);
 
@@ -180,6 +181,14 @@ int sctp_rcv(struct sk_buff *skb)
 	rcvr = asoc ? &asoc->base : &ep->base;
 	sk = rcvr->sk;
 
+	/* Iterate through rules in LOCAL_SOCKET_IN,
+	 * now that the receiver is known.
+	 */
+	ret = nf_hook(family == AF_INET ? NFPROTO_IPV4 : NFPROTO_IPV6,
+		      NF_INET_LOCAL_SOCKET_IN, sk, skb, skb->dev, NULL, NULL);
+	if (ret != 1)
+		goto discard_release;
+
 	/*
 	 * If a frame arrives on an interface and the receiving socket is
 	 * bound to another interface, via SO_BINDTODEVICE, treat it as OOTB
--
2.5.0
Re: 4.3-rc3 Regression: NFS access stall by commit 6ae459bdaaee
On Tue, 29 Sep 2015 02:35:04 +0200, Pravin Shelar wrote:
>
> On Mon, Sep 28, 2015 at 6:12 AM, Takashi Iwai wrote:
> > [I resent this since the previous mail didn't go out properly, as it
> > seems; apologies if you already read it, please disregard]
> >
> > Hi,
> >
> > I noticed that NFS access from my workstation slowed down drastically,
> > almost stalls, with the fresh 4.3-rc3. There are no particular kernel
> > errors / warnings.
> >
> > Then I performed a git bisection, and it led to the commit:
> > 6ae459bdaaeebc632b16e54dcbabb490c6931d61
> > skbuff: Fix skb checksum flag on skb pull
> >
> > Reverting this commit from 4.3-rc3 fixed the issue indeed.
> >
> > Could you take a look at this? I added Trond to Cc in case he might
> > already know of it.
> >
> I sent out a fix for a similar issue. Can you try the posted patch?
> https://patchwork.ozlabs.org/patch/523632/

Yes, the patch fixes the problem, thanks. Feel free to take my
tested-by tag:

Tested-by: Takashi Iwai

But I guess the real fix is only the first chunk and the latter is
nothing but a cleanup? If so, it'd be better to split it.

Takashi
[PATCH 1/1] xfrm: Fix state threshold configuration from userspace
Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry
timer (XFRMA_ETIMER_THRESH) of a state without having to set other
attributes like replay counter and byte lifetime. Changing these other
values while traffic flows will break the state.

Signed-off-by: Michael Rossberg
---
 net/xfrm/xfrm_user.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a8de9e3..24e06a2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1928,8 +1928,10 @@ static int xfrm_new_ae(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct nlattr *rp = attrs[XFRMA_REPLAY_VAL];
 	struct nlattr *re = attrs[XFRMA_REPLAY_ESN_VAL];
 	struct nlattr *lt = attrs[XFRMA_LTIME_VAL];
+	struct nlattr *et = attrs[XFRMA_ETIMER_THRESH];
+	struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
-	if (!lt && !rp && !re)
+	if (!lt && !rp && !re && !et && !rt)
 		return err;
 
 	/* pedantic mode - thou shalt sayeth replaceth */
--
2.1.4
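The effect of the changed condition can be checked in isolation. The sketch below reduces the five netlink attributes to presence flags; struct toy_ae_attrs, accepted_before() and accepted_after() are hypothetical names, and the real function continues with further checks that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Presence flags standing in for the parsed netlink attributes. */
struct toy_ae_attrs {
	bool replay_val;	/* XFRMA_REPLAY_VAL */
	bool replay_esn_val;	/* XFRMA_REPLAY_ESN_VAL */
	bool ltime_val;		/* XFRMA_LTIME_VAL */
	bool etimer_thresh;	/* XFRMA_ETIMER_THRESH */
	bool replay_thresh;	/* XFRMA_REPLAY_THRESH */
};

/* Old condition: at least one of lt, rp, re must be present. */
static bool accepted_before(const struct toy_ae_attrs *a)
{
	assert(a != NULL);
	return a->ltime_val || a->replay_val || a->replay_esn_val;
}

/* New condition: a threshold attribute alone is also sufficient. */
static bool accepted_after(const struct toy_ae_attrs *a)
{
	assert(a != NULL);
	return accepted_before(a) || a->etimer_thresh || a->replay_thresh;
}
```

A request carrying only a threshold attribute is rejected by the old predicate and passes the new one, while a request with no attributes at all is still rejected by both.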
[PATCH net-next 4/6] net: switchdev: pass callback to dump operation
Similar to the notifier_call callback of a notifier_block, change the function signature of switchdev dump operation to: int switchdev_port_obj_dump(struct net_device *dev, enum switchdev_obj_id id, void *obj, int (*cb)(void *obj)); This allows the caller to pass and expect back a specific switchdev_obj_* structure instead of the generic switchdev_obj one. Drivers implementation of dump operation can now expect this specific structure and call the callback with it. Drivers have been changed accordingly. Signed-off-by: Vivien Didelot--- drivers/net/ethernet/rocker/rocker.c | 21 + include/net/switchdev.h | 9 +--- net/dsa/slave.c | 26 +++-- net/switchdev/switchdev.c| 45 ++-- 4 files changed, 53 insertions(+), 48 deletions(-) diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c index 78fd443..107adb6 100644 --- a/drivers/net/ethernet/rocker/rocker.c +++ b/drivers/net/ethernet/rocker/rocker.c @@ -4538,10 +4538,10 @@ static int rocker_port_obj_del(struct net_device *dev, } static int rocker_port_fdb_dump(const struct rocker_port *rocker_port, - struct switchdev_obj *obj) + struct switchdev_obj_fdb *fdb, + int (*cb)(void *obj)) { struct rocker *rocker = rocker_port->rocker; - struct switchdev_obj_fdb *fdb = >u.fdb; struct rocker_fdb_tbl_entry *found; struct hlist_node *tmp; unsigned long lock_flags; @@ -4556,7 +4556,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port, fdb->ndm_state = NUD_REACHABLE; fdb->vid = rocker_port_vlan_to_vid(rocker_port, found->key.vlan_id); - err = obj->cb(obj); + err = cb(fdb); if (err) break; } @@ -4566,9 +4566,9 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port, } static int rocker_port_vlan_dump(const struct rocker_port *rocker_port, -struct switchdev_obj *obj) +struct switchdev_obj_vlan *vlan, + int (*cb)(void *obj)) { - struct switchdev_obj_vlan *vlan = >u.vlan; u16 vid; int err = 0; @@ -4579,7 +4579,7 @@ static int rocker_port_vlan_dump(const struct 
rocker_port *rocker_port, if (rocker_vlan_id_is_internal(htons(vid))) vlan->flags |= BRIDGE_VLAN_INFO_PVID; vlan->vid_begin = vlan->vid_end = vid; - err = obj->cb(obj); + err = cb(vlan); if (err) break; } @@ -4588,17 +4588,18 @@ static int rocker_port_vlan_dump(const struct rocker_port *rocker_port, } static int rocker_port_obj_dump(struct net_device *dev, - struct switchdev_obj *obj) + enum switchdev_obj_id id, void *obj, + int (*cb)(void *obj)) { const struct rocker_port *rocker_port = netdev_priv(dev); int err = 0; - switch (obj->id) { + switch (id) { case SWITCHDEV_OBJ_PORT_FDB: - err = rocker_port_fdb_dump(rocker_port, obj); + err = rocker_port_fdb_dump(rocker_port, obj, cb); break; case SWITCHDEV_OBJ_PORT_VLAN: - err = rocker_port_vlan_dump(rocker_port, obj); + err = rocker_port_vlan_dump(rocker_port, obj, cb); break; default: err = -EOPNOTSUPP; diff --git a/include/net/switchdev.h b/include/net/switchdev.h index 9ef7c56..0a80f2a 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -120,7 +120,8 @@ struct switchdev_ops { int (*switchdev_port_obj_del)(struct net_device *dev, struct switchdev_obj *obj); int (*switchdev_port_obj_dump)(struct net_device *dev, - struct switchdev_obj *obj); + enum switchdev_obj_id id, void *obj, + int (*cb)(void *obj)); }; enum switchdev_notifier_type { @@ -152,7 +153,8 @@ int switchdev_port_attr_set(struct net_device *dev, struct switchdev_attr *attr); int switchdev_port_obj_add(struct net_device *dev, struct switchdev_obj *obj); int switchdev_port_obj_del(struct net_device *dev, struct switchdev_obj *obj); -int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj); +int switchdev_port_obj_dump(struct net_device *dev, enum switchdev_obj_id id, +
[PATCH net-next 0/6] net: switchdev: use specific switchdev_obj_*
This patchset changes the switchdev add, del, and dump operations from this:

    int (*switchdev_port_obj_add)(struct net_device *dev,
                                  struct switchdev_obj *obj,
                                  struct switchdev_trans *trans);
    int (*switchdev_port_obj_del)(struct net_device *dev,
                                  struct switchdev_obj *obj);
    int (*switchdev_port_obj_dump)(struct net_device *dev,
                                   struct switchdev_obj *obj);

to something similar to the notifier_call callback of a notifier_block:

    int (*switchdev_port_obj_add)(struct net_device *dev,
                                  enum switchdev_obj_id id, const void *obj,
                                  struct switchdev_trans *trans);
    int (*switchdev_port_obj_del)(struct net_device *dev,
                                  enum switchdev_obj_id id, const void *obj);
    int (*switchdev_port_obj_dump)(struct net_device *dev,
                                   enum switchdev_obj_id id, void *obj,
                                   int (*cb)(void *obj));

This allows the caller to pass and expect back a specific switchdev_obj_* structure (e.g. switchdev_obj_fdb) instead of the generic switchdev_obj one. This will simplify pushing the callback function down to the drivers.

The first 3 patches get rid of the dev parameter of the dump callback, since it is not always needed (e.g. vlan_dump) and some drivers (such as DSA drivers) may not have easy access to it.

Patches 4 and 5 implement the change in the switchdev operations and their users.

Patch 6 extracts the inner switchdev_obj_* structures from switchdev_obj and removes the latter.
Vivien Didelot (6):
  net: switchdev: remove dev in port_vlan_dump_put
  net: switchdev: move dev in switchdev_fdb_dump
  net: switchdev: remove dev from switchdev_obj cb
  net: switchdev: pass callback to dump operation
  net: switchdev: abstract object in add/del ops
  net: switchdev: extract struct switchdev_obj_*

 drivers/net/ethernet/rocker/rocker.c |  42
 include/net/switchdev.h              |  80 ---
 net/bridge/br_fdb.c                  |  11 +--
 net/bridge/br_vlan.c                 |  24 ++---
 net/dsa/slave.c                      |  46 +
 net/switchdev/switchdev.c            | 184 ---
 6 files changed, 186 insertions(+), 201 deletions(-)

-- 
2.5.3
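To make the new dump signature from the cover letter concrete, here is a minimal userspace sketch. It is not the switchdev code: the types obj_id and obj_vlan, the fixed VLAN range 1..3, and the helper names are hypothetical stand-ins. Only the shape of the dump operation (object id, untyped object pointer, separate callback) follows the signature quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum obj_id { OBJ_PORT_VLAN, OBJ_PORT_FDB };

struct obj_vlan {
	uint16_t vid_begin;
	uint16_t vid_end;
};

/*
 * New-style dump operation: the caller passes the object id, a pointer
 * to the specific object structure, and the callback as separate
 * arguments. The driver fills the object and calls cb() once per entry.
 */
int port_obj_dump(enum obj_id id, void *obj, int (*cb)(void *obj))
{
	struct obj_vlan *vlan = obj;
	uint16_t vid;
	int err = 0;

	if (id != OBJ_PORT_VLAN)
		return -1;	/* the kernel code returns -EOPNOTSUPP here */

	for (vid = 1; vid <= 3; vid++) {	/* pretend VLANs 1..3 exist */
		vlan->vid_begin = vlan->vid_end = vid;
		err = cb(vlan);
		if (err)
			break;
	}
	return err;
}

static int vlan_seen;

/* Caller-side callback: it knows the object is a struct obj_vlan. */
int count_vlan_cb(void *obj)
{
	struct obj_vlan *vlan = obj;

	if (vlan->vid_begin == vlan->vid_end)
		vlan_seen++;
	return 0;
}

/* Caller: owns the specific object and hands it to the dump op. */
int dump_all_vlans(void)
{
	struct obj_vlan vlan = { 0 };

	vlan_seen = 0;
	if (port_obj_dump(OBJ_PORT_VLAN, &vlan, count_vlan_cb))
		return -1;
	return vlan_seen;
}
```

Because the callback is no longer stored inside a generic object, the caller and the driver only need to agree on the object id to know which structure sits behind the void pointer.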
[PATCH net-next 00/14] tcp: listener refactoring preparations
This patch series makes changes to TCP/DCCP stacks so that we can switch listener code to lockless mode.

This is done by marking const the listener socket in all appropriate paths.

FastOpen code had to be changed to not dynamically allocate a very small structure to make code simpler for following changes.

Eric Dumazet (14):
  tcp/dccp: constify send_synack and send_reset socket argument
  tcp: remove unused len argument from tcp_rcv_state_process()
  tcp: remove tcp_rcv_state_process() tcp_hdr argument
  dccp: use inet6_csk_route_req() helper
  inet: constify inet_csk_route_child_sock() socket argument
  inet: constify __inet_inherit_port() sock argument
  net: constify sk_gfp_atomic() sock argument
  dccp: constify dccp_create_openreq_child() sock argument
  tcp: constify tcp_create_openreq_child() socket argument
  tcp/dccp: constify syn_recv_sock() method sock argument
  tcp: cookie_init_sequence() cleanups
  tcp: constify tcp_v{4|6}_route_req() sock argument
  tcp: constify tcp_syn_flood_action() socket argument
  tcp: prepare fastopen code for upcoming listener changes

 include/linux/tcp.h                 | 22 --
 include/net/inet6_connection_sock.h |  2 +-
 include/net/inet_connection_sock.h  |  5 +++--
 include/net/inet_hashtables.h       |  2 +-
 include/net/request_sock.h          | 16 ++--
 include/net/sock.h                  |  2 +-
 include/net/tcp.h                   | 28 ++--
 net/core/request_sock.c             |  9 -
 net/dccp/dccp.h                     |  6 +++---
 net/dccp/ipv4.c                     |  5 +++--
 net/dccp/ipv6.c                     | 24 +++-
 net/dccp/minisocks.c                |  4 ++--
 net/ipv4/af_inet.c                  | 10 +++---
 net/ipv4/inet_connection_sock.c     | 19 +--
 net/ipv4/inet_hashtables.c          |  2 +-
 net/ipv4/syncookies.c               |  6 +-
 net/ipv4/tcp.c                      | 14 ++
 net/ipv4/tcp_fastopen.c             | 10 +-
 net/ipv4/tcp_input.c                | 17 +
 net/ipv4/tcp_ipv4.c                 | 13 +++--
 net/ipv4/tcp_minisocks.c            |  7 ---
 net/ipv6/inet6_connection_sock.c    |  8 +---
 net/ipv6/syncookies.c               |  5 +
 net/ipv6/tcp_ipv6.c                 | 33 ++---
 24 files changed, 118 insertions(+), 151 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0
[PATCH net-next 03/14] tcp: remove tcp_rcv_state_process() tcp_hdr argument
Factorize code to get tcp header from skb. It makes no sense to duplicate code in callers.

Signed-off-by: Eric Dumazet
---
 include/net/tcp.h        | 3 +--
 net/ipv4/tcp_input.c     | 4 ++--
 net/ipv4/tcp_ipv4.c      | 2 +-
 net/ipv4/tcp_minisocks.c | 2 +-
 net/ipv6/tcp_ipv6.c      | 2 +-
 5 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1cfdedbe47e1..1fe0bd458cb4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -365,8 +365,7 @@ void tcp_wfree(struct sk_buff *skb);
 void tcp_write_timer_handler(struct sock *sk);
 void tcp_delack_timer_handler(struct sock *sk);
 int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
-int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
-			  const struct tcphdr *th);
+int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb);
 void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 			 const struct tcphdr *th, unsigned int len);
 void tcp_rcv_space_adjust(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index dcbddf12f4b3..67b27aee8d28 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5698,11 +5698,11 @@ reset_and_undo:
  *	address independent.
  */
 
-int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
-			  const struct tcphdr *th)
+int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
+	const struct tcphdr *th = tcp_hdr(skb);
 	struct request_sock *req;
 	int queued = 0;
 	bool acceptable;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7e5ae1e01009..67c0dc8bddbf 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1420,7 +1420,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 	} else
 		sock_rps_save_rxhash(sk, skb);
 
-	if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb))) {
+	if (tcp_rcv_state_process(sk, skb)) {
 		rsk = sk;
 		goto reset;
 	}
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 9c7c61cf7462..139668cc2347 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -821,7 +821,7 @@ int tcp_child_process(struct sock *parent, struct sock *child,
 	int state = child->sk_state;
 
 	if (!sock_owned_by_user(child)) {
-		ret = tcp_rcv_state_process(child, skb, tcp_hdr(skb));
+		ret = tcp_rcv_state_process(child, skb);
 		/* Wakeup parent, send SIGIO */
 		if (state == TCP_SYN_RECV && child->sk_state != state)
 			parent->sk_data_ready(parent);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index b6e473f0f62e..334d548a0cf6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1272,7 +1272,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 	} else
 		sock_rps_save_rxhash(sk, skb);
 
-	if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb)))
+	if (tcp_rcv_state_process(sk, skb))
 		goto reset;
 	if (opt_skb)
 		goto ipv6_pktoptions;
-- 
2.6.0.rc2.230.g3dd15c0
[PATCH net-next 04/14] dccp: use inet6_csk_route_req() helper
Before changing dccp_v6_request_recv_sock() sock argument to const, we need to get rid of security_sk_classify_flow(), and it seems doable by reusing inet6_csk_route_req() helper. We need to add a proto parameter to inet6_csk_route_req(), not assume it is TCP. Signed-off-by: Eric Dumazet--- include/net/inet6_connection_sock.h | 2 +- net/dccp/ipv6.c | 17 +++-- net/ipv6/inet6_connection_sock.c| 8 +--- net/ipv6/tcp_ipv6.c | 7 --- 4 files changed, 13 insertions(+), 21 deletions(-) diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h index 81d937e820c4..79b2a4c09ca6 100644 --- a/include/net/inet6_connection_sock.h +++ b/include/net/inet6_connection_sock.h @@ -26,7 +26,7 @@ int inet6_csk_bind_conflict(const struct sock *sk, const struct inet_bind_bucket *tb, bool relax); struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6, - const struct request_sock *req); + const struct request_sock *req, u8 proto); struct request_sock *inet6_csk_search_req(struct sock *sk, const __be16 rport, diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index aa719e700961..0966bc08d362 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -462,22 +462,11 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, if (sk_acceptq_is_full(sk)) goto out_overflow; - if (dst == NULL) { - struct in6_addr *final_p, final; + if (!dst) { struct flowi6 fl6; - memset(, 0, sizeof(fl6)); - fl6.flowi6_proto = IPPROTO_DCCP; - fl6.daddr = ireq->ir_v6_rmt_addr; - final_p = fl6_update_dst(, np->opt, ); - fl6.saddr = ireq->ir_v6_loc_addr; - fl6.flowi6_oif = sk->sk_bound_dev_if; - fl6.fl6_dport = ireq->ir_rmt_port; - fl6.fl6_sport = htons(ireq->ir_num); - security_sk_classify_flow(sk, flowi6_to_flowi()); - - dst = ip6_dst_lookup_flow(sk, , final_p); - if (IS_ERR(dst)) + dst = inet6_csk_route_req(sk, , req, IPPROTO_DCCP); + if (!dst) goto out; } diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 
91b7d33f508b..163bfef3e5db 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -67,15 +67,16 @@ EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict); struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6, - const struct request_sock *req) + const struct request_sock *req, + u8 proto) { struct inet_request_sock *ireq = inet_rsk(req); - struct ipv6_pinfo *np = inet6_sk(sk); + const struct ipv6_pinfo *np = inet6_sk(sk); struct in6_addr *final_p, final; struct dst_entry *dst; memset(fl6, 0, sizeof(*fl6)); - fl6->flowi6_proto = IPPROTO_TCP; + fl6->flowi6_proto = proto; fl6->daddr = ireq->ir_v6_rmt_addr; final_p = fl6_update_dst(fl6, np->opt, ); fl6->saddr = ireq->ir_v6_loc_addr; @@ -91,6 +92,7 @@ struct dst_entry *inet6_csk_route_req(const struct sock *sk, return dst; } +EXPORT_SYMBOL(inet6_csk_route_req); /* * request_sock (formerly open request) hash tables. diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 334d548a0cf6..092a23ef1feb 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -447,7 +447,8 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst, int err = -ENOMEM; /* First, grab a route. */ - if (!dst && (dst = inet6_csk_route_req(sk, fl6, req)) == NULL) + if (!dst && (dst = inet6_csk_route_req(sk, fl6, req, + IPPROTO_TCP)) == NULL) goto done; skb = tcp_make_synack(sk, dst, req, foc); @@ -694,7 +695,7 @@ static struct dst_entry *tcp_v6_route_req(struct sock *sk, struct flowi *fl, { if (strict) *strict = true; - return inet6_csk_route_req(sk, >u.ip6, req); + return inet6_csk_route_req(sk, >u.ip6, req, IPPROTO_TCP); } struct request_sock_ops tcp6_request_sock_ops __read_mostly = { @@ -1058,7 +1059,7 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, goto out_overflow; if (!dst) { - dst = inet6_csk_route_req(sk, , req); + dst = inet6_csk_route_req(sk, , req, IPPROTO_TCP); if (!dst) goto out; } -- 2.6.0.rc2.230.g3dd15c0 -- To
[PATCH net-next 14/14] tcp: prepare fastopen code for upcoming listener changes
While auditing TCP stack for upcoming 'lockless' listener changes, I found I had to change fastopen_init_queue() to properly init the object before publishing it. Otherwise an other cpu could try to lock the spinlock before it gets properly initialized. Instead of adding appropriate barriers, just remove dynamic memory allocations : - Structure is 28 bytes on 64bit arches. Using additional 8 bytes for holding a pointer seems overkill. - Two listeners can share same cache line and performance would suffer. If we really want to save few bytes, we would instead dynamically allocate whole struct request_sock_queue in the future. Signed-off-by: Eric Dumazet--- include/linux/tcp.h | 22 -- include/net/request_sock.h | 7 ++- net/core/request_sock.c | 9 - net/ipv4/af_inet.c | 10 +++--- net/ipv4/inet_connection_sock.c | 17 - net/ipv4/tcp.c | 14 ++ net/ipv4/tcp_fastopen.c | 10 +- net/ipv4/tcp_ipv4.c | 2 +- net/ipv6/tcp_ipv6.c | 4 ++-- 9 files changed, 35 insertions(+), 60 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index fcb573be75d9..e442e6e9a365 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -382,25 +382,11 @@ static inline bool tcp_passive_fastopen(const struct sock *sk) tcp_sk(sk)->fastopen_rsk != NULL); } -extern void tcp_sock_destruct(struct sock *sk); - -static inline int fastopen_init_queue(struct sock *sk, int backlog) +static inline void fastopen_queue_tune(struct sock *sk, int backlog) { - struct request_sock_queue *queue = - _csk(sk)->icsk_accept_queue; - - if (queue->fastopenq == NULL) { - queue->fastopenq = kzalloc( - sizeof(struct fastopen_queue), - sk->sk_allocation); - if (queue->fastopenq == NULL) - return -ENOMEM; - - sk->sk_destruct = tcp_sock_destruct; - spin_lock_init(>fastopenq->lock); - } - queue->fastopenq->max_qlen = backlog; - return 0; + struct request_sock_queue *queue = _csk(sk)->icsk_accept_queue; + + queue->fastopenq.max_qlen = backlog; } static inline void tcp_saved_syn_free(struct tcp_sock *tp) diff 
--git a/include/net/request_sock.h b/include/net/request_sock.h index c146b5284786..d2544de329bd 100644 --- a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -180,11 +180,8 @@ struct request_sock_queue { struct request_sock *rskq_accept_tail; u8 rskq_defer_accept; struct listen_sock *listen_opt; - struct fastopen_queue *fastopenq; /* This is non-NULL iff TFO has been -* enabled on this listener. Check -* max_qlen != 0 in fastopen_queue -* to determine if TFO is enabled -* right at this moment. + struct fastopen_queue fastopenq; /* Check max_qlen != 0 to determine +* if TFO is enabled. */ /* temporary alignment, our goal is to get rid of this lock */ diff --git a/net/core/request_sock.c b/net/core/request_sock.c index b42f0e26f89e..e22cfa4ed25f 100644 --- a/net/core/request_sock.c +++ b/net/core/request_sock.c @@ -59,6 +59,13 @@ int reqsk_queue_alloc(struct request_sock_queue *queue, get_random_bytes(>hash_rnd, sizeof(lopt->hash_rnd)); spin_lock_init(>syn_wait_lock); + + spin_lock_init(>fastopenq.lock); + queue->fastopenq.rskq_rst_head = NULL; + queue->fastopenq.rskq_rst_tail = NULL; + queue->fastopenq.qlen = 0; + queue->fastopenq.max_qlen = 0; + queue->rskq_accept_head = NULL; lopt->nr_table_entries = nr_table_entries; lopt->max_qlen_log = ilog2(nr_table_entries); @@ -174,7 +181,7 @@ void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req, struct sock *lsk = req->rsk_listener; struct fastopen_queue *fastopenq; - fastopenq = inet_csk(lsk)->icsk_accept_queue.fastopenq; + fastopenq = _csk(lsk)->icsk_accept_queue.fastopenq; tcp_sk(sk)->fastopen_rsk = NULL; spin_lock_bh(>lock); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 8a556643b874..3af85eecbe11 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -219,17 +219,13 @@ int inet_listen(struct socket *sock, int backlog) * shutdown() (rather than close()). 
*/ if ((sysctl_tcp_fastopen & TFO_SERVER_ENABLE) != 0 && - !inet_csk(sk)->icsk_accept_queue.fastopenq) { + !inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) {
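The design choice described in this patch (embed the small fastopen structure in its parent and initialize it together with the parent, instead of allocating it lazily and publishing a pointer) can be sketched in plain userspace C. Everything below is a hypothetical stand-in: the _stub types, the lock_initialized flag that replaces spin_lock_init(), and the helper names other than fastopen_queue_tune are assumptions for illustration, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fastopen_queue. */
struct fastopen_queue_stub {
	int lock_initialized;	/* stands in for the spinlock state */
	int qlen;
	int max_qlen;		/* 0 means TFO is not enabled */
};

/* Hypothetical stand-in for struct request_sock_queue. */
struct request_queue_stub {
	struct fastopen_queue_stub fastopenq;	/* embedded, not a pointer */
};

/*
 * All fields, including the lock, are set up when the parent queue is
 * set up. No other CPU can observe a half-initialized object through a
 * freshly published pointer, because there is no pointer to publish.
 */
void queue_init(struct request_queue_stub *q)
{
	q->fastopenq.lock_initialized = 1;
	q->fastopenq.qlen = 0;
	q->fastopenq.max_qlen = 0;
}

/* Enabling TFO becomes a plain store and cannot fail with -ENOMEM. */
void fastopen_queue_tune(struct request_queue_stub *q, int backlog)
{
	q->fastopenq.max_qlen = backlog;
}

/* "Is TFO enabled" is a value check instead of a NULL check. */
int fastopen_enabled(const struct request_queue_stub *q)
{
	return q->fastopenq.max_qlen != 0;
}
```

The trade-off is the one named in the changelog: the parent grows by the size of the embedded structure, but the allocation, its error path, and the publication ordering problem all go away.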
[PATCH net-next 13/14] tcp: constify tcp_syn_flood_action() socket argument
tcp_syn_flood_action() will soon be called with unlocked socket.

In order to avoid SYN flood warning being emitted multiple times, use xchg().

Extend max_qlen_log and synflood_warned fields in struct listen_sock to u32.

Signed-off-by: Eric Dumazet
---
 include/net/request_sock.h | 5 ++---
 net/ipv4/tcp_input.c       | 9 +
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 90247ec7955b..c146b5284786 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -129,9 +129,8 @@ struct listen_sock {
 	atomic_t		qlen_dec; /* qlen = qlen_inc - qlen_dec */
 	atomic_t		young_dec;
 
-	u8			max_qlen_log ____cacheline_aligned_in_smp;
-	u8			synflood_warned;
-	/* 2 bytes hole, try to use */
+	u32			max_qlen_log ____cacheline_aligned_in_smp;
+	u32			synflood_warned;
 	u32			hash_rnd;
 	u32			nr_table_entries;
 	struct request_sock	*syn_table[0];
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 67b27aee8d28..e58cbcd2f07e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6064,7 +6064,7 @@ EXPORT_SYMBOL(inet_reqsk_alloc);
 /*
  * Return true if a syncookie should be sent
  */
-static bool tcp_syn_flood_action(struct sock *sk,
+static bool tcp_syn_flood_action(const struct sock *sk,
 				 const struct sk_buff *skb,
 				 const char *proto)
 {
@@ -6082,11 +6082,12 @@ static bool tcp_syn_flood_action(struct sock *sk,
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPREQQFULLDROP);
 
 	lopt = inet_csk(sk)->icsk_accept_queue.listen_opt;
-	if (!lopt->synflood_warned && sysctl_tcp_syncookies != 2) {
-		lopt->synflood_warned = 1;
+	if (!lopt->synflood_warned &&
+	    sysctl_tcp_syncookies != 2 &&
+	    xchg(&lopt->synflood_warned, 1) == 0)
 		pr_info("%s: Possible SYN flooding on port %d. %s. Check SNMP counters.\n",
 			proto, ntohs(tcp_hdr(skb)->dest), msg);
-	}
+
 	return want_cookie;
 }
 
-- 
2.6.0.rc2.230.g3dd15c0
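The warn-once pattern in the hunk above (a cheap plain read, then an atomic exchange that only one caller can win) can be illustrated outside the kernel with C11 atomics. This is a sketch under assumptions: atomic_exchange() stands in for the kernel's xchg(), and listener_stub, should_warn_once() and count_warnings() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the listener state holding the flag. */
struct listener_stub {
	atomic_uint synflood_warned;
};

/*
 * Returns true for exactly one caller, even if several threads race
 * here without holding a lock.
 */
bool should_warn_once(struct listener_stub *l)
{
	/* Fast path: once the flag is set, skip the atomic write. */
	if (atomic_load_explicit(&l->synflood_warned, memory_order_relaxed))
		return false;

	/* Only the caller that observes the old value 0 gets to warn. */
	return atomic_exchange(&l->synflood_warned, 1) == 0;
}

/* Single-threaded usage example: count how many attempts would warn. */
int count_warnings(int attempts)
{
	struct listener_stub l;
	int i, warned = 0;

	atomic_init(&l.synflood_warned, 0);
	for (i = 0; i < attempts; i++)
		if (should_warn_once(&l))
			warned++;
	return warned;
}
```

A plain "test then set" would let two unlocked callers both see 0 and both print; the exchange makes the test and the set a single step.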
[PATCH net-next 11/14] tcp: cookie_init_sequence() cleanups
Some common IPv4/IPv6 code can be factorized. Also constify cookie_init_sequence() socket argument. Signed-off-by: Eric Dumazet--- include/net/tcp.h | 19 ++- net/ipv4/syncookies.c | 6 +- net/ipv6/syncookies.c | 5 + 3 files changed, 12 insertions(+), 18 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index a1d2f5d6a430..5aa6672c6f5b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -491,8 +491,9 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb); /* syncookies: remember time of last synqueue overflow * But do not dirty this field too often (once per second is enough) + * It is racy as we do not hold a lock, but race is very minor. */ -static inline void tcp_synq_overflow(struct sock *sk) +static inline void tcp_synq_overflow(const struct sock *sk) { unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; unsigned long now = jiffies; @@ -519,8 +520,7 @@ static inline u32 tcp_cookie_time(void) u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th, u16 *mssp); -__u32 cookie_v4_init_sequence(struct sock *sk, const struct sk_buff *skb, - __u16 *mss); +__u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss); __u32 cookie_init_timestamp(struct request_sock *req); bool cookie_timestamp_decode(struct tcp_options_received *opt); bool cookie_ecn_ok(const struct tcp_options_received *opt, @@ -533,8 +533,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb); u32 __cookie_v6_init_sequence(const struct ipv6hdr *iph, const struct tcphdr *th, u16 *mssp); -__u32 cookie_v6_init_sequence(struct sock *sk, const struct sk_buff *skb, - __u16 *mss); +__u32 cookie_v6_init_sequence(const struct sk_buff *skb, __u16 *mss); #endif /* tcp_output.c */ @@ -1709,7 +1708,7 @@ struct tcp_request_sock_ops { const struct sock *sk_listener, struct sk_buff *skb); #ifdef CONFIG_SYN_COOKIES - __u32 (*cookie_init_seq)(struct sock *sk, const struct sk_buff *skb, + __u32 (*cookie_init_seq)(const struct 
sk_buff *skb, __u16 *mss); #endif struct dst_entry *(*route_req)(struct sock *sk, struct flowi *fl, @@ -1725,14 +1724,16 @@ struct tcp_request_sock_ops { #ifdef CONFIG_SYN_COOKIES static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops, -struct sock *sk, struct sk_buff *skb, +const struct sock *sk, struct sk_buff *skb, __u16 *mss) { - return ops->cookie_init_seq(sk, skb, mss); + tcp_synq_overflow(sk); + NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT); + return ops->cookie_init_seq(skb, mss); } #else static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops, -struct sock *sk, struct sk_buff *skb, +const struct sock *sk, struct sk_buff *skb, __u16 *mss) { return 0; diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index 6595affded20..6b97b5f6457c 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -192,15 +192,11 @@ u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th, } EXPORT_SYMBOL_GPL(__cookie_v4_init_sequence); -__u32 cookie_v4_init_sequence(struct sock *sk, const struct sk_buff *skb, - __u16 *mssp) +__u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mssp) { const struct iphdr *iph = ip_hdr(skb); const struct tcphdr *th = tcp_hdr(skb); - tcp_synq_overflow(sk); - NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT); - return __cookie_v4_init_sequence(iph, th, mssp); } diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c index 2461b3ff9551..7606eba83e7b 100644 --- a/net/ipv6/syncookies.c +++ b/net/ipv6/syncookies.c @@ -114,14 +114,11 @@ u32 __cookie_v6_init_sequence(const struct ipv6hdr *iph, } EXPORT_SYMBOL_GPL(__cookie_v6_init_sequence); -__u32 cookie_v6_init_sequence(struct sock *sk, const struct sk_buff *skb, __u16 *mssp) +__u32 cookie_v6_init_sequence(const struct sk_buff *skb, __u16 *mssp) { const struct ipv6hdr *iph = ipv6_hdr(skb); const struct tcphdr *th = tcp_hdr(skb); - tcp_synq_overflow(sk); - NET_INC_STATS_BH(sock_net(sk), 
LINUX_MIB_SYNCOOKIESSENT); - return __cookie_v6_init_sequence(iph, th, mssp); } -- 2.6.0.rc2.230.g3dd15c0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at
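The factorization in this patch moves the bookkeeping shared by IPv4 and IPv6 into the common cookie_init_sequence() wrapper, so the per-family hooks no longer need the socket. A minimal userspace sketch of that shape follows. The _stub types, the cookies_sent counter (standing in for the SNMP counter update), and the cookie and MSS values are arbitrary assumptions for illustration, not the kernel's computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the listener socket and the packet. */
struct listener_stub {
	unsigned long cookies_sent;	/* stands in for the SNMP counter */
};

struct skb_stub {
	uint32_t seed;
};

/* Per-family hook: after the cleanup it only needs the packet. */
struct req_ops_stub {
	uint32_t (*cookie_init_seq)(const struct skb_stub *skb, uint16_t *mss);
};

uint32_t v4_cookie_init_seq(const struct skb_stub *skb, uint16_t *mss)
{
	*mss = 1460;			/* arbitrary example value */
	return skb->seed ^ 0x44444444u;	/* arbitrary example value */
}

uint32_t v6_cookie_init_seq(const struct skb_stub *skb, uint16_t *mss)
{
	*mss = 1440;			/* arbitrary example value */
	return skb->seed ^ 0x66666666u;	/* arbitrary example value */
}

/* Common wrapper: shared side effects happen once, here. */
uint32_t cookie_init_sequence(const struct req_ops_stub *ops,
			      struct listener_stub *listener,
			      const struct skb_stub *skb, uint16_t *mss)
{
	listener->cookies_sent++;
	return ops->cookie_init_seq(skb, mss);
}

/* Usage example: one cookie per family, both counted by the wrapper. */
unsigned long send_two_cookies(void)
{
	const struct req_ops_stub v4 = { .cookie_init_seq = v4_cookie_init_seq };
	const struct req_ops_stub v6 = { .cookie_init_seq = v6_cookie_init_seq };
	struct listener_stub listener = { 0 };
	struct skb_stub skb = { .seed = 1 };
	uint16_t mss;

	cookie_init_sequence(&v4, &listener, &skb, &mss);
	cookie_init_sequence(&v6, &listener, &skb, &mss);
	return listener.cookies_sent;
}
```

Unlike the patch, the sketch takes a non-const listener so the counter update stays simple; the point is only that the duplicated statements live in one place.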
[PATCH net-next 09/14] tcp: constify tcp_create_openreq_child() socket argument
This method does not touch the listener socket.

Signed-off-by: Eric Dumazet
---
 include/net/tcp.h        | 2 +-
 net/ipv4/tcp_minisocks.c | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1fe0bd458cb4..85995c1291d0 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -450,7 +450,7 @@ void tcp_v4_send_check(struct sock *sk, struct sk_buff *skb);
 void tcp_v4_mtu_reduced(struct sock *sk);
 void tcp_req_err(struct sock *sk, u32 seq);
 int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
-struct sock *tcp_create_openreq_child(struct sock *sk,
+struct sock *tcp_create_openreq_child(const struct sock *sk,
 				      struct request_sock *req,
 				      struct sk_buff *skb);
 void tcp_ca_openreq_child(struct sock *sk, const struct dst_entry *dst);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 139668cc2347..897e34273ba3 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -441,7 +441,9 @@ EXPORT_SYMBOL_GPL(tcp_ca_openreq_child);
  * Actually, we could lots of memory writes here. tp of listening
  * socket contains all necessary default parameters.
  */
-struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
+struct sock *tcp_create_openreq_child(const struct sock *sk,
+				      struct request_sock *req,
+				      struct sk_buff *skb)
 {
 	struct sock *newsk = inet_csk_clone_lock(sk, req, GFP_ATOMIC);
 
-- 
2.6.0.rc2.230.g3dd15c0
[PATCH net-next 10/14] tcp/dccp: constify syn_recv_sock() method sock argument
We'll soon no longer hold listener socket lock, these functions do not modify the socket in any way. Signed-off-by: Eric Dumazet--- include/net/inet_connection_sock.h | 2 +- include/net/tcp.h | 2 +- net/dccp/dccp.h| 2 +- net/dccp/ipv4.c| 3 ++- net/dccp/ipv6.c| 5 +++-- net/ipv4/tcp_ipv4.c| 2 +- net/ipv6/tcp_ipv6.c| 5 +++-- 7 files changed, 12 insertions(+), 9 deletions(-) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 187cef7e56d5..ee54f21a8113 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -41,7 +41,7 @@ struct inet_connection_sock_af_ops { int (*rebuild_header)(struct sock *sk); void(*sk_rx_dst_set)(struct sock *sk, const struct sk_buff *skb); int (*conn_request)(struct sock *sk, struct sk_buff *skb); - struct sock *(*syn_recv_sock)(struct sock *sk, struct sk_buff *skb, + struct sock *(*syn_recv_sock)(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst); u16 net_header_len; diff --git a/include/net/tcp.h b/include/net/tcp.h index 85995c1291d0..a1d2f5d6a430 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -454,7 +454,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk, struct request_sock *req, struct sk_buff *skb); void tcp_ca_openreq_child(struct sock *sk, const struct dst_entry *dst); -struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb, +struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst); int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb); diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index 2409619b7043..e1f823451565 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -276,7 +276,7 @@ struct sock *dccp_create_openreq_child(const struct sock *sk, int dccp_v4_do_rcv(struct sock *sk, struct sk_buff *skb); -struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb, +struct sock 
*dccp_v4_request_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst); struct sock *dccp_check_req(struct sock *sk, struct sk_buff *skb, diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 00a14fa4270a..5b7818c63cec 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -390,7 +390,8 @@ static inline u64 dccp_v4_init_sequence(const struct sk_buff *skb) * * This is the equivalent of TCP's tcp_v4_syn_recv_sock */ -struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb, +struct sock *dccp_v4_request_recv_sock(const struct sock *sk, + struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst) { diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 0966bc08d362..e8753aa3b7a4 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -408,13 +408,14 @@ drop: return -1; } -static struct sock *dccp_v6_request_recv_sock(struct sock *sk, +static struct sock *dccp_v6_request_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst) { struct inet_request_sock *ireq = inet_rsk(req); - struct ipv6_pinfo *newnp, *np = inet6_sk(sk); + struct ipv6_pinfo *newnp; + const struct ipv6_pinfo *np = inet6_sk(sk); struct inet_sock *newinet; struct dccp6_sock *newdp6; struct sock *newsk; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 67c0dc8bddbf..ee0239e190cf 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1242,7 +1242,7 @@ EXPORT_SYMBOL(tcp_v4_conn_request); * The three way handshake has completed - we got a valid synack - * now create the new socket. 
*/ -struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb, +struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst) { diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 092a23ef1feb..2330c7be6323 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -986,12 +986,13 @@ drop: return 0; /* don't send reset */ } -static struct sock *tcp_v6_syn_recv_sock(struct sock *sk,
[PATCH net-next 12/14] tcp: constify tcp_v{4|6}_route_req() sock argument
These functions do not change the listener socket. Goal is to make sure tcp_conn_request() is not messing with listener in a racy way.

Signed-off-by: Eric Dumazet
---
 include/net/tcp.h   | 2 +-
 net/ipv4/tcp_ipv4.c | 3 ++-
 net/ipv6/tcp_ipv6.c | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5aa6672c6f5b..2c7dfe52f473 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1711,7 +1711,7 @@ struct tcp_request_sock_ops {
 	__u32 (*cookie_init_seq)(const struct sk_buff *skb,
 				 __u16 *mss);
 #endif
-	struct dst_entry *(*route_req)(struct sock *sk, struct flowi *fl,
+	struct dst_entry *(*route_req)(const struct sock *sk, struct flowi *fl,
 				       const struct request_sock *req,
 				       bool *strict);
 	__u32 (*init_seq)(const struct sk_buff *skb);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ee0239e190cf..f551e9e862db 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1180,7 +1180,8 @@ static void tcp_v4_init_req(struct request_sock *req,
 	ireq->opt = tcp_v4_save_options(skb);
 }
 
-static struct dst_entry *tcp_v4_route_req(struct sock *sk, struct flowi *fl,
+static struct dst_entry *tcp_v4_route_req(const struct sock *sk,
+					  struct flowi *fl,
 					  const struct request_sock *req,
 					  bool *strict)
 {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2330c7be6323..97bc26e0cd0f 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -689,7 +689,8 @@ static void tcp_v6_init_req(struct request_sock *req,
 	}
 }
 
-static struct dst_entry *tcp_v6_route_req(struct sock *sk, struct flowi *fl,
+static struct dst_entry *tcp_v6_route_req(const struct sock *sk,
+					  struct flowi *fl,
 					  const struct request_sock *req,
 					  bool *strict)
 {
-- 
2.6.0.rc2.230.g3dd15c0
[PATCH net-next 2/6] net: switchdev: move dev in switchdev_fdb_dump
The FDB dump callback requires the related net_device so move it to the struct switchdev_fdb_dump superset instead of using a callback param.

With this done, it'll be simpler to change the dump function signature.

Signed-off-by: Vivien Didelot
---
 net/switchdev/switchdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 56d34ed..c0e2047 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -858,6 +858,7 @@ EXPORT_SYMBOL_GPL(switchdev_port_fdb_del);
 
 struct switchdev_fdb_dump {
 	struct switchdev_obj obj;
+	struct net_device *dev;
 	struct sk_buff *skb;
 	struct netlink_callback *cb;
 	int idx;
@@ -887,7 +888,7 @@ static int switchdev_port_fdb_dump_cb(struct net_device *dev,
 	ndm->ndm_pad2    = 0;
 	ndm->ndm_flags   = NTF_SELF;
 	ndm->ndm_type    = 0;
-	ndm->ndm_ifindex = dev->ifindex;
+	ndm->ndm_ifindex = dump->dev->ifindex;
 	ndm->ndm_state   = obj->u.fdb.ndm_state;
 
 	if (nla_put(dump->skb, NDA_LLADDR, ETH_ALEN, obj->u.fdb.addr))
@@ -927,6 +928,7 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 			.id = SWITCHDEV_OBJ_PORT_FDB,
 			.cb = switchdev_port_fdb_dump_cb,
 		},
+		.dev = dev,
 		.skb = skb,
 		.cb = cb,
 		.idx = idx,
-- 
2.5.3
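The "superset struct" idea in this patch is that per-dump context (here the net_device) rides along in a structure that embeds the generic object, and the callback recovers it from the object pointer. A minimal userspace sketch, assuming a container_of()-style lookup: the macro below is a userspace equivalent of the kernel helper, and net_device_stub, obj, fdb_dump and the function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for net_device and switchdev_obj. */
struct net_device_stub {
	int ifindex;
};

struct obj {
	int id;
};

/* The superset: generic object first, per-dump context after it. */
struct fdb_dump {
	struct obj obj;
	struct net_device_stub *dev;
	int idx;
};

/*
 * The callback only receives the generic object. It recovers the
 * enclosing dump context, and with it the device, from that pointer,
 * so the device does not have to be a callback parameter.
 */
int fdb_dump_cb(struct obj *obj)
{
	struct fdb_dump *dump = container_of(obj, struct fdb_dump, obj);

	dump->idx++;
	return dump->dev->ifindex;
}

/* Usage example: set up a dump context and invoke the callback. */
int dump_ifindex(int ifindex)
{
	struct net_device_stub dev = { .ifindex = ifindex };
	struct fdb_dump dump = {
		.obj = { .id = 1 },
		.dev = &dev,
		.idx = 0,
	};

	return fdb_dump_cb(&dump.obj);
}
```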
Re: [PATCH v2 1/1] net sysfs: Print link speed as signed integer
On 28/09/15 14:05, Alexander Stein wrote:
> Otherwise 4294967295 (MBit/s) (-1) will be printed when there is no link.
> Documentation/ABI/testing/sysfs-class-net does not state if this shall be
> signed or unsigned.
> Also remove the now unused variable fmt_udec.
[...]
> -		ret = sprintf(buf, fmt_udec, ethtool_cmd_speed());
> +		ret = sprintf(buf, fmt_dec, ethtool_cmd_speed());

If we print anything numeric, why is zero not appropriate (which would still be unsigned)?

-- 
Cheers,
Jeremy
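The two outputs under discussion come from printing the same 32-bit pattern with an unsigned versus a signed conversion. A small userspace sketch, assuming the "unknown speed" value is the all-ones pattern that reads as -1 when signed, as the quoted changelog implies; format_speed() and SPEED_UNKNOWN_STUB are hypothetical names, not the sysfs code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the "no link / unknown speed" value. */
#define SPEED_UNKNOWN_STUB (-1)

/*
 * Format a 32-bit speed value the way a sysfs attribute would, either
 * as unsigned or as signed decimal. The conversion of an out-of-range
 * value to int32_t is implementation-defined in C; common compilers
 * wrap it to -1, which is the behavior assumed here.
 */
int format_speed(char *buf, size_t len, uint32_t speed, int as_signed)
{
	if (as_signed)
		return snprintf(buf, len, "%" PRId32 "\n", (int32_t)speed);
	return snprintf(buf, len, "%" PRIu32 "\n", speed);
}
```

With the unsigned conversion the buffer holds "4294967295", with the signed one "-1". Jeremy's suggestion of reporting zero would avoid the question of signedness altogether, at the cost of no longer distinguishing "unknown" from a literal zero.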
[PATCH net-next 3/6] net: switchdev: remove dev from switchdev_obj cb
The net_device associated to a dump operation does not have to be passed
to the callback. switchdev stores it in a superset struct, if needed.

Also some drivers (such as DSA drivers) may not have easy access to it.

This will simplify pushing the callback function down to the drivers.

Signed-off-by: Vivien Didelot
---
 drivers/net/ethernet/rocker/rocker.c | 4 ++--
 include/net/switchdev.h              | 2 +-
 net/dsa/slave.c                      | 4 ++--
 net/switchdev/switchdev.c            | 6 ++
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index d3f6632..78fd443 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4556,7 +4556,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
 		fdb->ndm_state = NUD_REACHABLE;
 		fdb->vid = rocker_port_vlan_to_vid(rocker_port,
 						   found->key.vlan_id);
-		err = obj->cb(rocker_port->dev, obj);
+		err = obj->cb(obj);
 		if (err)
 			break;
 	}
@@ -4579,7 +4579,7 @@ static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
 		if (rocker_vlan_id_is_internal(htons(vid)))
 			vlan->flags |= BRIDGE_VLAN_INFO_PVID;
 		vlan->vid_begin = vlan->vid_end = vid;
-		err = obj->cb(rocker_port->dev, obj);
+		err = obj->cb(obj);
 		if (err)
 			break;
 	}
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 1820787..9ef7c56 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -66,7 +66,7 @@ enum switchdev_obj_id {
 
 struct switchdev_obj {
 	enum switchdev_obj_id id;
-	int (*cb)(struct net_device *dev, struct switchdev_obj *obj);
+	int (*cb)(struct switchdev_obj *obj);
 	union {
 		struct switchdev_obj_vlan {		/* PORT_VLAN */
 			u16 flags;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index f18cae5..0b47647 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -334,7 +334,7 @@ static int dsa_slave_port_vlan_dump(struct net_device *dev,
 		if (test_bit(p->port, untagged))
 			vlan->flags |= BRIDGE_VLAN_INFO_UNTAGGED;
 
-		err = obj->cb(dev, obj);
+		err = obj->cb(obj);
 		if (err)
 			break;
 	}
@@ -397,7 +397,7 @@ static int dsa_slave_port_fdb_dump(struct net_device *dev,
 		obj->u.fdb.vid = vid;
 		obj->u.fdb.ndm_state = is_static ? NUD_NOARP : NUD_REACHABLE;
 
-		ret = obj->cb(dev, obj);
+		ret = obj->cb(obj);
 		if (ret < 0)
 			break;
 	}
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index c0e2047..93f4971 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -514,8 +514,7 @@ static int switchdev_port_vlan_dump_put(struct switchdev_vlan_dump *dump)
 	return 0;
 }
 
-static int switchdev_port_vlan_dump_cb(struct net_device *dev,
-				       struct switchdev_obj *obj)
+static int switchdev_port_vlan_dump_cb(struct switchdev_obj *obj)
 {
 	struct switchdev_vlan_dump *dump =
 		container_of(obj, struct switchdev_vlan_dump, obj);
@@ -864,8 +863,7 @@ struct switchdev_fdb_dump {
 	int idx;
 };
 
-static int switchdev_port_fdb_dump_cb(struct net_device *dev,
-				      struct switchdev_obj *obj)
+static int switchdev_port_fdb_dump_cb(struct switchdev_obj *obj)
 {
 	struct switchdev_fdb_dump *dump =
 		container_of(obj, struct switchdev_fdb_dump, obj);
-- 
2.5.3
[PATCH net-next 1/6] net: switchdev: remove dev in port_vlan_dump_put
The static switchdev_port_vlan_dump_put function doesn't need the
net_device parameter, so remove it.

Signed-off-by: Vivien Didelot
---
 net/switchdev/switchdev.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 00ee547..56d34ed 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -484,8 +484,7 @@ struct switchdev_vlan_dump {
 	u16 end;
 };
 
-static int switchdev_port_vlan_dump_put(struct net_device *dev,
-					struct switchdev_vlan_dump *dump)
+static int switchdev_port_vlan_dump_put(struct switchdev_vlan_dump *dump)
 {
 	struct bridge_vlan_info vinfo;
 
@@ -531,7 +530,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
 		for (dump->begin = dump->end = vlan->vid_begin;
 		     dump->begin <= vlan->vid_end;
 		     dump->begin++, dump->end++) {
-			err = switchdev_port_vlan_dump_put(dev, dump);
+			err = switchdev_port_vlan_dump_put(dump);
 			if (err)
 				return err;
 		}
@@ -543,7 +542,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
 				/* prepend */
 				dump->begin = vlan->vid_begin;
 			} else {
-				err = switchdev_port_vlan_dump_put(dev, dump);
+				err = switchdev_port_vlan_dump_put(dump);
 				dump->flags = vlan->flags;
 				dump->begin = vlan->vid_begin;
 				dump->end = vlan->vid_end;
@@ -555,7 +554,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
 				/* append */
 				dump->end = vlan->vid_end;
 			} else {
-				err = switchdev_port_vlan_dump_put(dev, dump);
+				err = switchdev_port_vlan_dump_put(dump);
 				dump->flags = vlan->flags;
 				dump->begin = vlan->vid_begin;
 				dump->end = vlan->vid_end;
@@ -588,7 +587,7 @@ static int switchdev_port_vlan_fill(struct sk_buff *skb, struct net_device *dev,
 			goto err_out;
 		if (filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED)
 			/* last one */
-			err = switchdev_port_vlan_dump_put(dev, &dump);
+			err = switchdev_port_vlan_dump_put(&dump);
 	}
 
 err_out:
-- 
2.5.3
[PATCH net-next 06/14] inet: constify __inet_inherit_port() sock argument
socket is not touched, make it const.

Signed-off-by: Eric Dumazet
---
 include/net/inet_hashtables.h | 2 +-
 net/ipv4/inet_hashtables.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index b07d126694a7..3fb778d7c875 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -199,7 +199,7 @@ static inline int inet_sk_listen_hashfn(const struct sock *sk)
 }
 
 /* Caller must disable local BH processing. */
-int __inet_inherit_port(struct sock *sk, struct sock *child);
+int __inet_inherit_port(const struct sock *sk, struct sock *child);
 
 void inet_put_port(struct sock *sk);
 
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 89120196a949..56742e995dd3 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -126,7 +126,7 @@ void inet_put_port(struct sock *sk)
 }
 EXPORT_SYMBOL(inet_put_port);
 
-int __inet_inherit_port(struct sock *sk, struct sock *child)
+int __inet_inherit_port(const struct sock *sk, struct sock *child)
 {
 	struct inet_hashinfo *table = sk->sk_prot->h.hashinfo;
 	unsigned short port = inet_sk(child)->inet_num;
-- 
2.6.0.rc2.230.g3dd15c0
[PATCH net-next 05/14] inet: constify inet_csk_route_child_sock() socket argument
The socket points to the (shared) listener.

Signed-off-by: Eric Dumazet
---
 include/net/inet_connection_sock.h | 3 ++-
 net/ipv4/inet_connection_sock.c    | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 00c3ced6ee55..187cef7e56d5 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -268,7 +268,8 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum);
 
 struct dst_entry *inet_csk_route_req(const struct sock *sk, struct flowi4 *fl4,
 				     const struct request_sock *req);
-struct dst_entry *inet_csk_route_child_sock(struct sock *sk, struct sock *newsk,
+struct dst_entry *inet_csk_route_child_sock(const struct sock *sk,
+					    struct sock *newsk,
 					    const struct request_sock *req);
 
 static inline void inet_csk_reqsk_queue_add(struct sock *sk,
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index ba2f90d90cb5..694a5e8f4f9f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -439,7 +439,7 @@ no_route:
 }
 EXPORT_SYMBOL_GPL(inet_csk_route_req);
 
-struct dst_entry *inet_csk_route_child_sock(struct sock *sk,
+struct dst_entry *inet_csk_route_child_sock(const struct sock *sk,
 					    struct sock *newsk,
 					    const struct request_sock *req)
 {
-- 
2.6.0.rc2.230.g3dd15c0
[PATCH net-next 08/14] dccp: constify dccp_create_openreq_child() sock argument
socket no longer needs to be read/write

Signed-off-by: Eric Dumazet
---
 net/dccp/dccp.h      | 2 +-
 net/dccp/minisocks.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index 8ed1df2771bd..2409619b7043 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -270,7 +270,7 @@ int dccp_reqsk_init(struct request_sock *rq, struct dccp_sock const *dp,
 
 int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
 
-struct sock *dccp_create_openreq_child(struct sock *sk,
+struct sock *dccp_create_openreq_child(const struct sock *sk,
 				       const struct request_sock *req,
 				       const struct sk_buff *skb);
 
diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index 9bfd0dc1e6cb..d10aace43672 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -72,7 +72,7 @@ void dccp_time_wait(struct sock *sk, int state, int timeo)
 		dccp_done(sk);
 }
 
-struct sock *dccp_create_openreq_child(struct sock *sk,
+struct sock *dccp_create_openreq_child(const struct sock *sk,
 				       const struct request_sock *req,
 				       const struct sk_buff *skb)
 {
-- 
2.6.0.rc2.230.g3dd15c0
[PATCH net-next 07/14] net: constify sk_gfp_atomic() sock argument
Signed-off-by: Eric Dumazet
---
 include/net/sock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 94dff7f566f5..dfe2eb8e1132 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -759,7 +759,7 @@ static inline int sk_memalloc_socks(void)
 
 #endif
 
-static inline gfp_t sk_gfp_atomic(struct sock *sk, gfp_t gfp_mask)
+static inline gfp_t sk_gfp_atomic(const struct sock *sk, gfp_t gfp_mask)
 {
 	return GFP_ATOMIC | (sk->sk_allocation & __GFP_MEMALLOC);
 }
-- 
2.6.0.rc2.230.g3dd15c0
[PATCH net-next 02/14] tcp: remove unused len argument from tcp_rcv_state_process()
Once we realize tcp_rcv_synsent_state_process() does not use its 'len' argument and we get rid of it, then it becomes clear this argument is no longer used in tcp_rcv_state_process() Signed-off-by: Eric Dumazet--- include/net/tcp.h| 2 +- net/ipv4/tcp_input.c | 6 +++--- net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/tcp_minisocks.c | 3 +-- net/ipv6/tcp_ipv6.c | 2 +- 5 files changed, 7 insertions(+), 8 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index cdbf63d3c5cf..1cfdedbe47e1 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -366,7 +366,7 @@ void tcp_write_timer_handler(struct sock *sk); void tcp_delack_timer_handler(struct sock *sk); int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg); int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb, - const struct tcphdr *th, unsigned int len); + const struct tcphdr *th); void tcp_rcv_established(struct sock *sk, struct sk_buff *skb, const struct tcphdr *th, unsigned int len); void tcp_rcv_space_adjust(struct sock *sk); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4964d53907e9..dcbddf12f4b3 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5472,7 +5472,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack, } static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, -const struct tcphdr *th, unsigned int len) +const struct tcphdr *th) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); @@ -5699,7 +5699,7 @@ reset_and_undo: */ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb, - const struct tcphdr *th, unsigned int len) + const struct tcphdr *th) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); @@ -5749,7 +5749,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb, goto discard; case TCP_SYN_SENT: - queued = tcp_rcv_synsent_state_process(sk, skb, th, len); + queued = tcp_rcv_synsent_state_process(sk, skb, th); if 
(queued >= 0) return queued; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 4300d0132b9f..7e5ae1e01009 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1420,7 +1420,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) } else sock_rps_save_rxhash(sk, skb); - if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len)) { + if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb))) { rsk = sk; goto reset; } diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index e4fe62b6b106..9c7c61cf7462 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -821,8 +821,7 @@ int tcp_child_process(struct sock *parent, struct sock *child, int state = child->sk_state; if (!sock_owned_by_user(child)) { - ret = tcp_rcv_state_process(child, skb, tcp_hdr(skb), - skb->len); + ret = tcp_rcv_state_process(child, skb, tcp_hdr(skb)); /* Wakeup parent, send SIGIO */ if (state == TCP_SYN_RECV && child->sk_state != state) parent->sk_data_ready(parent); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index c47e5c87a2a8..b6e473f0f62e 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1272,7 +1272,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb) } else sock_rps_save_rxhash(sk, skb); - if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len)) + if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb))) goto reset; if (opt_skb) goto ipv6_pktoptions; -- 2.6.0.rc2.230.g3dd15c0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 01/14] tcp/dccp: constify send_synack and send_reset socket argument
None of these functions need to change the socket, make it const. Signed-off-by: Eric Dumazet--- include/net/request_sock.h | 4 ++-- net/dccp/dccp.h| 2 +- net/dccp/ipv4.c| 2 +- net/dccp/ipv6.c| 2 +- net/dccp/minisocks.c | 2 +- net/ipv4/tcp_ipv4.c| 4 ++-- net/ipv6/tcp_ipv6.c| 12 ++-- 7 files changed, 14 insertions(+), 14 deletions(-) diff --git a/include/net/request_sock.h b/include/net/request_sock.h index 181f97f9fe1c..90247ec7955b 100644 --- a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -34,9 +34,9 @@ struct request_sock_ops { char*slab_name; int (*rtx_syn_ack)(const struct sock *sk, struct request_sock *req); - void(*send_ack)(struct sock *sk, struct sk_buff *skb, + void(*send_ack)(const struct sock *sk, struct sk_buff *skb, struct request_sock *req); - void(*send_reset)(struct sock *sk, + void(*send_reset)(const struct sock *sk, struct sk_buff *skb); void(*destructor)(struct request_sock *req); void(*syn_ack_timeout)(const struct request_sock *req); diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index 31e96df500d1..8ed1df2771bd 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -229,7 +229,7 @@ void dccp_v4_send_check(struct sock *sk, struct sk_buff *skb); int dccp_retransmit_skb(struct sock *sk); void dccp_send_ack(struct sock *sk); -void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb, +void dccp_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, struct request_sock *rsk); void dccp_send_sync(struct sock *sk, const u64 seq, diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index a46ae9c69ccf..00a14fa4270a 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -527,7 +527,7 @@ out: return err; } -static void dccp_v4_ctl_send_reset(struct sock *sk, struct sk_buff *rxskb) +static void dccp_v4_ctl_send_reset(const struct sock *sk, struct sk_buff *rxskb) { int err; const struct iphdr *rxiph; diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 4fa199dc69a3..aa719e700961 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -234,7 
+234,7 @@ static void dccp_v6_reqsk_destructor(struct request_sock *req) kfree_skb(inet_rsk(req)->pktopts); } -static void dccp_v6_ctl_send_reset(struct sock *sk, struct sk_buff *rxskb) +static void dccp_v6_ctl_send_reset(const struct sock *sk, struct sk_buff *rxskb) { const struct ipv6hdr *rxip6h; struct sk_buff *skb; diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c index 838f524cf11a..9bfd0dc1e6cb 100644 --- a/net/dccp/minisocks.c +++ b/net/dccp/minisocks.c @@ -236,7 +236,7 @@ int dccp_child_process(struct sock *parent, struct sock *child, EXPORT_SYMBOL_GPL(dccp_child_process); -void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb, +void dccp_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, struct request_sock *rsk) { DCCP_BUG("DCCP-ACK packets are never sent in LISTEN/RESPOND state"); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index a23ba7daecbf..4300d0132b9f 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -576,7 +576,7 @@ EXPORT_SYMBOL(tcp_v4_send_check); * Exception: precedence violation. We do not implement it in any case. 
*/ -static void tcp_v4_send_reset(struct sock *sk, struct sk_buff *skb) +static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) { const struct tcphdr *th = tcp_hdr(skb); struct { @@ -795,7 +795,7 @@ static void tcp_v4_timewait_ack(struct sock *sk, struct sk_buff *skb) inet_twsk_put(tw); } -static void tcp_v4_reqsk_send_ack(struct sock *sk, struct sk_buff *skb, +static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, struct request_sock *req) { /* sk->sk_state == TCP_LISTEN -> for regular TCP_SYN_RECV diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 16fb299dcab8..c47e5c87a2a8 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -70,8 +70,8 @@ #include #include -static voidtcp_v6_send_reset(struct sock *sk, struct sk_buff *skb); -static voidtcp_v6_reqsk_send_ack(struct sock *sk, struct sk_buff *skb, +static voidtcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb); +static voidtcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, struct request_sock *req); static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb); @@ -724,7 +724,7 @@ static const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops = { .queue_hash_add =
[PATCH net-next 6/6] net: switchdev: extract struct switchdev_obj_*
Now that switchdev and its drivers directly use specific switchdev_obj_*
structures, move them out of the switchdev_obj union and get rid of this
outer structure.

Signed-off-by: Vivien Didelot
---
 include/net/switchdev.h | 53 -
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 230fcfc..8a3bacc 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -64,30 +64,29 @@ enum switchdev_obj_id {
 	SWITCHDEV_OBJ_PORT_FDB,
 };
 
-struct switchdev_obj {
-	enum switchdev_obj_id id;
-	int (*cb)(struct switchdev_obj *obj);
-	union {
-		struct switchdev_obj_vlan {		/* PORT_VLAN */
-			u16 flags;
-			u16 vid_begin;
-			u16 vid_end;
-		} vlan;
-		struct switchdev_obj_ipv4_fib {		/* IPV4_FIB */
-			u32 dst;
-			int dst_len;
-			struct fib_info *fi;
-			u8 tos;
-			u8 type;
-			u32 nlflags;
-			u32 tb_id;
-		} ipv4_fib;
-		struct switchdev_obj_fdb {		/* PORT_FDB */
-			const unsigned char *addr;
-			u16 vid;
-			u16 ndm_state;
-		} fdb;
-	} u;
+/* SWITCHDEV_OBJ_PORT_VLAN */
+struct switchdev_obj_vlan {
+	u16 flags;
+	u16 vid_begin;
+	u16 vid_end;
+};
+
+/* SWITCHDEV_OBJ_IPV4_FIB */
+struct switchdev_obj_ipv4_fib {
+	u32 dst;
+	int dst_len;
+	struct fib_info *fi;
+	u8 tos;
+	u8 type;
+	u32 nlflags;
+	u32 tb_id;
+};
+
+/* SWITCHDEV_OBJ_PORT_FDB */
+struct switchdev_obj_fdb {
+	const unsigned char *addr;
+	u16 vid;
+	u16 ndm_state;
 };
 
 void switchdev_trans_item_enqueue(struct switchdev_trans *trans,
@@ -102,11 +101,11 @@ void *switchdev_trans_item_dequeue(struct switchdev_trans *trans);
  *
  * @switchdev_port_attr_set: Set a port attribute (see switchdev_attr).
  *
- * @switchdev_port_obj_add: Add an object to port (see switchdev_obj).
+ * @switchdev_port_obj_add: Add an object to port (see switchdev_obj_*).
  *
- * @switchdev_port_obj_del: Delete an object from port (see switchdev_obj).
+ * @switchdev_port_obj_del: Delete an object from port (see switchdev_obj_*).
  *
- * @switchdev_port_obj_dump: Dump port objects (see switchdev_obj).
+ * @switchdev_port_obj_dump: Dump port objects (see switchdev_obj_*).
  */
 struct switchdev_ops {
 	int	(*switchdev_port_attr_get)(struct net_device *dev,
-- 
2.5.3
[PATCH v2] net/mlx4: Handle return codes in mlx4_qp_attach_common
Both new_steering_entry() and existing_steering_entry() return values
based on their success or failure, but currently they fall through
silently. This can make troubleshooting difficult, as we were unable to
tell which one of these two functions returned errors or specifically
what code was returned.

This patch remedies that situation by passing the return codes to err,
which is returned by mlx4_qp_attach_common() itself. This also addresses
a leak in the call to mlx4_bitmap_free().

Signed-off-by: Robb Manes
---
Sorry about the poor formatting; I should have used git-send properly.

 drivers/net/ethernet/mellanox/mlx4/mcg.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index bd9ea0d..1d4e2e0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -1184,10 +1184,11 @@ out:
 	if (prot == MLX4_PROT_ETH) {
 		/* manage the steering entry for promisc mode */
 		if (new_entry)
-			new_steering_entry(dev, port, steer, index, qp->qpn);
+			err = new_steering_entry(dev, port, steer,
+						 index, qp->qpn);
 		else
-			existing_steering_entry(dev, port, steer,
-						index, qp->qpn);
+			err = existing_steering_entry(dev, port, steer,
+						      index, qp->qpn);
 	}
 	if (err && link && index != -1) {
 		if (index < dev->caps.num_mgms)
-- 
2.4.3
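The pattern the patch describes can be sketched in isolation as follows. This is a simplified user-space illustration, not the mlx4 code: new_entry_sketch(), existing_entry_sketch(), attach_sketch() and the error values they return are all hypothetical. It only shows that once the helper's return code is stored in err, the shared error path that follows can react to it.

```c
#include <errno.h>

/* Hypothetical helpers standing in for the two steering-entry functions. */
static int new_entry_sketch(int fail)
{
	return fail ? -ENOMEM : 0;
}

static int existing_entry_sketch(int fail)
{
	return fail ? -EINVAL : 0;
}

/* Counts how often the cleanup branch ran, so the effect is observable. */
static int cleanup_runs;

static int attach_sketch(int is_new, int fail)
{
	int err;

	/* Keep the helper's return code instead of dropping it. */
	if (is_new)
		err = new_entry_sketch(fail);
	else
		err = existing_entry_sketch(fail);

	/* The shared error path now also runs for helper failures. */
	if (err)
		cleanup_runs++;

	return err;
}
```

A failing call such as attach_sketch(1, 1) returns the helper's own code and takes the cleanup branch, so the caller can tell which helper failed and why.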
question about potential integer truncation in mwifiex_set_wapi_ie and mwifiex_set_wps_ie
hi all,

in drivers/net/wireless/mwifiex/sta_ioctl.c the following functions

	mwifiex_set_wpa_ie_helper
	mwifiex_set_wapi_ie
	mwifiex_set_wps_ie

can truncate the incoming ie_len argument from u16 to u8 when it gets
stored in mwifiex_private.wpa_ie_len, mwifiex_private.wapi_ie_len and
mwifiex_private.wps_ie_len, respectively.

based on some light code reading it seems a length value of 256 is valid
(IEEE_MAX_IE_SIZE and MWIFIEX_MAX_VSIE_LEN seem to limit it) and thus would
get truncated to 0 when stored in those u8 fields. the question is whether
this is intentional or a bug somewhere.

FTR, this issue was detected with the upcoming version of the size overflow
plugin we have in PaX/grsecurity and there're a handful of similar cases in
the tree where potentially unwanted or unnecessary integer truncations
occur, this being one of these.

any opinion/help is welcome!

cheers,
 PaX Team
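The truncation described above is easy to reproduce outside the driver. The sketch below uses a hypothetical struct priv_sketch with a single 8-bit length field. The checked variant is only one possible way to reject oversized lengths, not a statement about how mwifiex does or should handle it.

```c
#include <stdint.h>

/* Hypothetical private struct with an 8-bit length field, as described. */
struct priv_sketch {
	uint8_t ie_len;
};

/* Unchecked store: a 16-bit length is silently reduced modulo 256. */
static void set_len_unchecked(struct priv_sketch *p, uint16_t ie_len)
{
	p->ie_len = (uint8_t)ie_len;
}

/* Checked store: reject lengths the 8-bit field cannot represent. */
static int set_len_checked(struct priv_sketch *p, uint16_t ie_len)
{
	if (ie_len > UINT8_MAX)
		return -1;

	p->ie_len = (uint8_t)ie_len;
	return 0;
}
```

Storing 256 through set_len_unchecked() leaves ie_len at 0, which matches the case raised in the mail. set_len_checked() returns -1 for the same input and leaves the field untouched.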
[PATCH net-next 5/6] net: switchdev: abstract object in add/del ops
Similar to the notifier_call callback of a notifier_block, change the function signature of switchdev add and del operations to: int switchdev_port_obj_add/del(struct net_device *dev, enum switchdev_obj_id id, void *obj); This allows the caller to pass a specific switchdev_obj_* structure instead of the generic switchdev_obj one. Drivers implementation of these operations and switchdev have been changed accordingly. Signed-off-by: Vivien Didelot--- drivers/net/ethernet/rocker/rocker.c | 21 +++--- include/net/switchdev.h | 18 -- net/bridge/br_fdb.c | 11 ++-- net/bridge/br_vlan.c | 24 +++ net/dsa/slave.c | 20 +++--- net/switchdev/switchdev.c| 122 --- 6 files changed, 99 insertions(+), 117 deletions(-) diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c index 107adb6..9773f5b 100644 --- a/drivers/net/ethernet/rocker/rocker.c +++ b/drivers/net/ethernet/rocker/rocker.c @@ -4437,26 +4437,25 @@ static int rocker_port_fdb_add(struct rocker_port *rocker_port, } static int rocker_port_obj_add(struct net_device *dev, - struct switchdev_obj *obj, + enum switchdev_obj_id id, const void *obj, struct switchdev_trans *trans) { struct rocker_port *rocker_port = netdev_priv(dev); const struct switchdev_obj_ipv4_fib *fib4; int err = 0; - switch (obj->id) { + switch (id) { case SWITCHDEV_OBJ_PORT_VLAN: - err = rocker_port_vlans_add(rocker_port, trans, - >u.vlan); + err = rocker_port_vlans_add(rocker_port, trans, obj); break; case SWITCHDEV_OBJ_IPV4_FIB: - fib4 = >u.ipv4_fib; + fib4 = obj; err = rocker_port_fib_ipv4(rocker_port, trans, htonl(fib4->dst), fib4->dst_len, fib4->fi, fib4->tb_id, 0); break; case SWITCHDEV_OBJ_PORT_FDB: - err = rocker_port_fdb_add(rocker_port, trans, >u.fdb); + err = rocker_port_fdb_add(rocker_port, trans, obj); break; default: err = -EOPNOTSUPP; @@ -4509,25 +4508,25 @@ static int rocker_port_fdb_del(struct rocker_port *rocker_port, } static int rocker_port_obj_del(struct net_device *dev, - struct switchdev_obj *obj) + 
enum switchdev_obj_id id, const void *obj) { struct rocker_port *rocker_port = netdev_priv(dev); const struct switchdev_obj_ipv4_fib *fib4; int err = 0; - switch (obj->id) { + switch (id) { case SWITCHDEV_OBJ_PORT_VLAN: - err = rocker_port_vlans_del(rocker_port, >u.vlan); + err = rocker_port_vlans_del(rocker_port, obj); break; case SWITCHDEV_OBJ_IPV4_FIB: - fib4 = >u.ipv4_fib; + fib4 = obj; err = rocker_port_fib_ipv4(rocker_port, NULL, htonl(fib4->dst), fib4->dst_len, fib4->fi, fib4->tb_id, ROCKER_OP_FLAG_REMOVE); break; case SWITCHDEV_OBJ_PORT_FDB: - err = rocker_port_fdb_del(rocker_port, NULL, >u.fdb); + err = rocker_port_fdb_del(rocker_port, NULL, obj); break; default: err = -EOPNOTSUPP; diff --git a/include/net/switchdev.h b/include/net/switchdev.h index 0a80f2a..230fcfc 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -115,10 +115,12 @@ struct switchdev_ops { struct switchdev_attr *attr, struct switchdev_trans *trans); int (*switchdev_port_obj_add)(struct net_device *dev, - struct switchdev_obj *obj, + enum switchdev_obj_id id, + const void *obj, struct switchdev_trans *trans); int (*switchdev_port_obj_del)(struct net_device *dev, - struct switchdev_obj *obj); + enum switchdev_obj_id id, + const void *obj); int (*switchdev_port_obj_dump)(struct net_device *dev, enum switchdev_obj_id id, void *obj, int (*cb)(void *obj)); @@ -151,8 +153,10 @@ int
[PATCH v2 net-next 5/6] net: switchdev: abstract object in add/del ops
Similar to the notifier_call callback of a notifier_block, change the function signature of switchdev add and del operations to: int switchdev_port_obj_add/del(struct net_device *dev, enum switchdev_obj_id id, void *obj); This allows the caller to pass a specific switchdev_obj_* structure instead of the generic switchdev_obj one. Drivers implementation of these operations and switchdev have been changed accordingly. Signed-off-by: Vivien Didelot--- drivers/net/ethernet/rocker/rocker.c | 21 +++--- include/net/switchdev.h | 18 -- net/bridge/br_fdb.c | 11 ++-- net/bridge/br_vlan.c | 24 +++ net/dsa/slave.c | 20 +++--- net/switchdev/switchdev.c| 122 --- 6 files changed, 99 insertions(+), 117 deletions(-) diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c index 107adb6..9773f5b 100644 --- a/drivers/net/ethernet/rocker/rocker.c +++ b/drivers/net/ethernet/rocker/rocker.c @@ -4437,26 +4437,25 @@ static int rocker_port_fdb_add(struct rocker_port *rocker_port, } static int rocker_port_obj_add(struct net_device *dev, - struct switchdev_obj *obj, + enum switchdev_obj_id id, const void *obj, struct switchdev_trans *trans) { struct rocker_port *rocker_port = netdev_priv(dev); const struct switchdev_obj_ipv4_fib *fib4; int err = 0; - switch (obj->id) { + switch (id) { case SWITCHDEV_OBJ_PORT_VLAN: - err = rocker_port_vlans_add(rocker_port, trans, - >u.vlan); + err = rocker_port_vlans_add(rocker_port, trans, obj); break; case SWITCHDEV_OBJ_IPV4_FIB: - fib4 = >u.ipv4_fib; + fib4 = obj; err = rocker_port_fib_ipv4(rocker_port, trans, htonl(fib4->dst), fib4->dst_len, fib4->fi, fib4->tb_id, 0); break; case SWITCHDEV_OBJ_PORT_FDB: - err = rocker_port_fdb_add(rocker_port, trans, >u.fdb); + err = rocker_port_fdb_add(rocker_port, trans, obj); break; default: err = -EOPNOTSUPP; @@ -4509,25 +4508,25 @@ static int rocker_port_fdb_del(struct rocker_port *rocker_port, } static int rocker_port_obj_del(struct net_device *dev, - struct switchdev_obj *obj) + 
enum switchdev_obj_id id, const void *obj) { struct rocker_port *rocker_port = netdev_priv(dev); const struct switchdev_obj_ipv4_fib *fib4; int err = 0; - switch (obj->id) { + switch (id) { case SWITCHDEV_OBJ_PORT_VLAN: - err = rocker_port_vlans_del(rocker_port, >u.vlan); + err = rocker_port_vlans_del(rocker_port, obj); break; case SWITCHDEV_OBJ_IPV4_FIB: - fib4 = >u.ipv4_fib; + fib4 = obj; err = rocker_port_fib_ipv4(rocker_port, NULL, htonl(fib4->dst), fib4->dst_len, fib4->fi, fib4->tb_id, ROCKER_OP_FLAG_REMOVE); break; case SWITCHDEV_OBJ_PORT_FDB: - err = rocker_port_fdb_del(rocker_port, NULL, >u.fdb); + err = rocker_port_fdb_del(rocker_port, NULL, obj); break; default: err = -EOPNOTSUPP; diff --git a/include/net/switchdev.h b/include/net/switchdev.h index a2f57fb..bcadac3 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -115,10 +115,12 @@ struct switchdev_ops { struct switchdev_attr *attr, struct switchdev_trans *trans); int (*switchdev_port_obj_add)(struct net_device *dev, - struct switchdev_obj *obj, + enum switchdev_obj_id id, + const void *obj, struct switchdev_trans *trans); int (*switchdev_port_obj_del)(struct net_device *dev, - struct switchdev_obj *obj); + enum switchdev_obj_id id, + const void *obj); int (*switchdev_port_obj_dump)(struct net_device *dev, enum switchdev_obj_id id, void *obj, int (*cb)(void *obj)); @@ -151,8 +153,10 @@ int
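As a rough illustration of the calling convention this patch moves to, the sketch below dispatches on an object id and treats the untyped pointer as the matching structure. All names ending in _sketch are hypothetical and heavily reduced; the real structures carry more fields and the real operations also take a net_device and, for add, a transaction.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced versions of the object ids and structures. */
enum obj_id_sketch {
	OBJ_PORT_VLAN_SKETCH,
	OBJ_PORT_FDB_SKETCH,
	OBJ_UNKNOWN_SKETCH,
};

struct obj_vlan_sketch {
	uint16_t vid_begin;
	uint16_t vid_end;
};

struct obj_fdb_sketch {
	uint16_t vid;
};

/* The id tells the callee which concrete type obj points to. */
static int port_obj_add_sketch(enum obj_id_sketch id, const void *obj)
{
	const struct obj_vlan_sketch *vlan;
	const struct obj_fdb_sketch *fdb;

	switch (id) {
	case OBJ_PORT_VLAN_SKETCH:
		vlan = obj;
		/* Number of VLAN ids covered by the range. */
		return vlan->vid_end - vlan->vid_begin + 1;
	case OBJ_PORT_FDB_SKETCH:
		fdb = obj;
		return fdb->vid;
	default:
		return -EOPNOTSUPP;
	}
}
```

The trade-off of this style is that the compiler no longer checks the object type: caller and callee have to agree on it through the id alone.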
Re: [RFT] geneve: implement support for IPv6-based tunnels
On Mon, 28 Sep 2015 15:20:33 -0400, John W. Linville wrote:
> > To be really useful, geneve should open both IPv4 and IPv6 socket when
> > it's metadata based. Take a look at my recent patchset that does this
> > for vxlan: http://thread.gmane.org/gmane.linux.network/379282
>
> OK, that seems simple enough. So we should just assume that a metadata
> tunnel could do either protocol at any time? Or are there more rules
> than that?

That should be it, on egress. On ingress, udp_tun_rx_dst needs to be
called with the appropriate family which seems to be missing from your
patch, too (there's AF_INET unconditionally, currently).

 Jiri

-- 
Jiri Benc
Re: [PATCH net-next] net: dsa: fix preparation of a port STP update
On Tue, Sep 29, 2015 at 12:38:36PM -0400, Vivien Didelot wrote:
> Because of the default 0 value of ret in dsa_slave_port_attr_set, a
> driver may return -EOPNOTSUPP from the commit phase of a STP state,
> which triggers a WARN() from switchdev.
>
> This happened on a 6185 switch which does not support hardware bridging.
>
> Reported-by: Andrew Lunn
> Signed-off-by: Vivien Didelot

Acked-by: Andrew Lunn
Fixes: 3563606258cf ("switchdev: convert STP update to switchdev attr set")

David: This should be included in the next -rc.

Thanks
	Andrew

> ---
>  net/dsa/slave.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 0ae427c..02a3af8 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -453,12 +453,17 @@ static int dsa_slave_port_attr_set(struct net_device
> *dev,
>  				   struct switchdev_attr *attr,
>  				   struct switchdev_trans *trans)
>  {
> -	int ret = 0;
> +	struct dsa_slave_priv *p = netdev_priv(dev);
> +	struct dsa_switch *ds = p->parent;
> +	int ret;
>
>  	switch (attr->id) {
>  	case SWITCHDEV_ATTR_PORT_STP_STATE:
> -		if (switchdev_trans_ph_commit(trans))
> -			ret = dsa_slave_stp_update(dev, attr->u.stp_state);
> +		if (switchdev_trans_ph_prepare(trans))
> +			ret = ds->drv->port_stp_update ? 0 : -EOPNOTSUPP;
> +		else
> +			ret = ds->drv->port_stp_update(ds, p->port,
> +						       attr->u.stp_state);
>  		break;
>  	default:
>  		ret = -EOPNOTSUPP;
> --
> 2.6.0
>
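The prepare/commit split used by the fix can be sketched in user space as below. This is an illustration under assumptions, not the DSA code: enum phase_sketch, struct drv_sketch and attr_set_stp_sketch() are hypothetical names, and the driver hook is reduced to a port number and a state byte. The point is that "not supported" is reported during prepare, so commit never has to fail for that reason.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical two-phase transaction marker. */
enum phase_sketch {
	PHASE_PREPARE_SKETCH,
	PHASE_COMMIT_SKETCH,
};

/* Hypothetical driver ops; the hook may be absent on some hardware. */
struct drv_sketch {
	int (*port_stp_update)(int port, unsigned char state);
};

static int attr_set_stp_sketch(const struct drv_sketch *drv,
			       enum phase_sketch phase,
			       int port, unsigned char state)
{
	/* Prepare: only report whether the commit can be carried out. */
	if (phase == PHASE_PREPARE_SKETCH)
		return drv->port_stp_update ? 0 : -EOPNOTSUPP;

	/* Commit: reached only after prepare succeeded, so the hook exists. */
	return drv->port_stp_update(port, state);
}

/* Trivial hook used for illustration: it always succeeds. */
static int stp_update_ok_sketch(int port, unsigned char state)
{
	(void)port;
	(void)state;
	return 0;
}
```

A driver without the hook fails in the prepare phase with -EOPNOTSUPP, and the caller is expected to skip the commit phase in that case.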
Re: [MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists
On 09/29/2015 08:48 AM, Jesper Dangaard Brouer wrote:
> Make it possible to free a freelist with several objects by adjusting
> API of slab_free() and __slab_free() to have head, tail and an objects
> counter (cnt).
>
> Tail being NULL indicate single object free of head object. This
> allow compiler inline constant propagation in slab_free() and
> slab_free_freelist_hook() to avoid adding any overhead in case of
> single object free.
>
> This allows a freelist with several objects (all within the same
> slab-page) to be free'ed using a single locked cmpxchg_double in
> __slab_free() and with an unlocked cmpxchg_double in slab_free().
>
> Object debugging on the free path is also extended to handle these
> freelists. When CONFIG_SLUB_DEBUG is enabled it will also detect if
> objects don't belong to the same slab-page.
>
> These changes are needed for the next patch to bulk free the detached
> freelists it introduces and constructs.
>
> Micro benchmarking showed no performance reduction due to this change,
> when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).
>
> Signed-off-by: Jesper Dangaard Brouer
> Signed-off-by: Alexander Duyck
>
> ---
> V4:
>  - Change API per req of Christoph Lameter
>  - Remove comments in init_object.
>
>  mm/slub.c | 87 -
>  1 file changed, 69 insertions(+), 18 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 1cf98d89546d..7c2abc33fd4e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1063,11 +1063,15 @@ bad:
>  	return 0;
>  }
>
> +/* Supports checking bulk free of a constructed freelist */
>  static noinline struct kmem_cache_node *free_debug_processing(
> -	struct kmem_cache *s, struct page *page, void *object,
> +	struct kmem_cache *s, struct page *page,
> +	void *head, void *tail, int bulk_cnt,
>  	unsigned long addr, unsigned long *flags)
>  {
>  	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
> +	void *object = head;
> +	int cnt = 0;
>
>  	spin_lock_irqsave(&n->list_lock, *flags);
>  	slab_lock(page);
> @@ -1075,6 +1079,9 @@ static noinline struct kmem_cache_node *free_debug_processing(
>  	if (!check_slab(s, page))
>  		goto fail;
>
> +next_object:
> +	cnt++;
> +
>  	if (!check_valid_pointer(s, page, object)) {
>  		slab_err(s, page, "Invalid object pointer 0x%p", object);
>  		goto fail;
> @@ -1105,8 +1112,19 @@ static noinline struct kmem_cache_node *free_debug_processing(
>  	if (s->flags & SLAB_STORE_USER)
>  		set_track(s, object, TRACK_FREE, addr);
>  	trace(s, page, object, 0);
> +	/* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
>  	init_object(s, object, SLUB_RED_INACTIVE);
> +
> +	/* Reached end of constructed freelist yet? */
> +	if (object != tail) {
> +		object = get_freepointer(s, object);
> +		goto next_object;
> +	}
>  out:
> +	if (cnt != bulk_cnt)
> +		slab_err(s, page, "Bulk freelist count(%d) invalid(%d)\n",
> +			 bulk_cnt, cnt);
> +
>  	slab_unlock(page);
>  	/*
>  	 * Keep node_lock to preserve integrity
> @@ -1210,7 +1228,8 @@ static inline int alloc_debug_processing(struct kmem_cache *s,
>  	struct page *page, void *object, unsigned long addr) { return 0; }
>
>  static inline struct kmem_cache_node *free_debug_processing(
> -	struct kmem_cache *s, struct page *page, void *object,
> +	struct kmem_cache *s, struct page *page,
> +	void *head, void *tail, int bulk_cnt,
>  	unsigned long addr, unsigned long *flags) { return NULL; }
>
>  static inline int slab_pad_check(struct kmem_cache *s, struct page *page)
> @@ -1306,6 +1325,31 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x)
>  	kasan_slab_free(s, x);
>  }
>
> +/* Compiler cannot detect that slab_free_freelist_hook() can be
> + * removed if slab_free_hook() evaluates to nothing. Thus, we need to
> + * catch all relevant config debug options here.
> + */

Is it actually generating nothing but a pointer walking loop or is there
a bit of code cruft that is being evaluated inside the loop?

> +#if defined(CONFIG_KMEMCHECK) || \
> +	defined(CONFIG_LOCKDEP) || \
> +	defined(CONFIG_DEBUG_KMEMLEAK) || \
> +	defined(CONFIG_DEBUG_OBJECTS_FREE) || \
> +	defined(CONFIG_KASAN)
> +static inline void slab_free_freelist_hook(struct kmem_cache *s,
> +					   void *head, void *tail)
> +{
> +	void *object = head;
> +	void *tail_obj = tail ? : head;
> +
> +	do {
> +		slab_free_hook(s, object);
> +	} while ((object != tail_obj) &&
> +		 (object = get_freepointer(s, object)));
> +}
> +#else
> +static inline void slab_free_freelist_hook(struct kmem_cache *s, void *obj_tail,
> +					   void *freelist_head) {}
> +#endif
> +

Instead of messing around with an #else you might just wrap
[MM PATCH V4 3/6] slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG
The #ifdef of CONFIG_SLUB_DEBUG is located very far from
the associated #else. For readability mark it with a comment.

Signed-off-by: Jesper Dangaard Brouer
Acked-by: Christoph Lameter
---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 024eed32da2c..1cf98d89546d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1202,7 +1202,7 @@ unsigned long kmem_cache_flags(unsigned long object_size,
 	return flags;
 }

-#else
+#else /* !CONFIG_SLUB_DEBUG */
 static inline void setup_object_debug(struct kmem_cache *s,
 			struct page *page, void *object) {}
[MM PATCH V4 0/6] Further optimizing SLAB/SLUB bulking
The most important part of this patchset is the introduction of what I
call a detached freelist, for improving SLUB performance of object
freeing in the "slowpath" of kmem_cache_free_bulk.

Tagging patchset with "V4" to avoid confusion with "V2":
 (V2) http://thread.gmane.org/gmane.linux.kernel.mm/137469

Addressing comments from:
 ("V3") http://thread.gmane.org/gmane.linux.kernel.mm/139268

I've added Christoph Lameter's ACKs from the previous review.
 * Only patch 5 is changed significantly and needs review.
 * Benchmarked, performance is the same

Notes for patches:
 * First two patches (from Christoph) are already in AKPM MMOTS.
 * Patch 3 is trivial
 * Patch 4 is a repost, implements bulking for SLAB.
   - http://thread.gmane.org/gmane.linux.kernel.mm/138220
 * Patch 5 and 6 are the important patches
   - Patch 5 handles "freelists" in slab_free() and __slab_free().
   - Patch 6 introduces detached freelists, and a significant
     performance improvement

Patches should be ready for the MM-tree, as I'm now handling kmem
debug support.

Based on top of commit 519f526d39 in net-next, but I've tested that it
applies on top of mmotm-2015-09-18-16-08.

The benchmarking tools are available here:
 https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm
 See: slab_bulk_test0{1,2,3}.c

This was joint work with Alexander Duyck while still at Red Hat.

This patchset is part of my network stack use-case. I'll post the
network side of the patchset as soon as I've cleaned it up, rebased it
on net-next and re-run all the benchmarks.

---

Christoph Lameter (2):
      slub: create new ___slab_alloc function that can be called with irqs disabled
      slub: Avoid irqoff/on in bulk allocation

Jesper Dangaard Brouer (4):
      slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG
      slab: implement bulking for SLAB allocator
      slub: support for bulk free with SLUB freelists
      slub: optimize bulk slowpath free by detached freelist

 mm/slab.c |   87 ++--
 mm/slub.c |  263 +++--
 2 files changed, 249 insertions(+), 101 deletions(-)

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
[PATCH v2 net-next 2/6] net: switchdev: move dev in switchdev_fdb_dump
The FDB dump callback requires the related net_device so move it to the
struct switchdev_fdb_dump superset instead of using a callback param.

With this done, it'll be simpler to change the dump function signature.

Signed-off-by: Vivien Didelot
---
 net/switchdev/switchdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 56d34ed..c0e2047 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -858,6 +858,7 @@ EXPORT_SYMBOL_GPL(switchdev_port_fdb_del);

 struct switchdev_fdb_dump {
 	struct switchdev_obj obj;
+	struct net_device *dev;
 	struct sk_buff *skb;
 	struct netlink_callback *cb;
 	int idx;
@@ -887,7 +888,7 @@ static int switchdev_port_fdb_dump_cb(struct net_device *dev,
 	ndm->ndm_pad2    = 0;
 	ndm->ndm_flags   = NTF_SELF;
 	ndm->ndm_type    = 0;
-	ndm->ndm_ifindex = dev->ifindex;
+	ndm->ndm_ifindex = dump->dev->ifindex;
 	ndm->ndm_state   = obj->u.fdb.ndm_state;

 	if (nla_put(dump->skb, NDA_LLADDR, ETH_ALEN, obj->u.fdb.addr))
@@ -927,6 +928,7 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 			.id = SWITCHDEV_OBJ_PORT_FDB,
 			.cb = switchdev_port_fdb_dump_cb,
 		},
+		.dev = dev,
 		.skb = skb,
 		.cb = cb,
 		.idx = idx,
--
2.6.0
[PATCH net-next] net: dsa: fix preparation of a port STP update
Because of the default 0 value of ret in dsa_slave_port_attr_set, a
driver may return -EOPNOTSUPP from the commit phase of a STP state,
which triggers a WARN() from switchdev.

This happened on a 6185 switch which does not support hardware bridging.

Reported-by: Andrew Lunn
Signed-off-by: Vivien Didelot
---
 net/dsa/slave.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0ae427c..02a3af8 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -453,12 +453,17 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 				   struct switchdev_attr *attr,
 				   struct switchdev_trans *trans)
 {
-	int ret = 0;
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+	int ret;

 	switch (attr->id) {
 	case SWITCHDEV_ATTR_PORT_STP_STATE:
-		if (switchdev_trans_ph_commit(trans))
-			ret = dsa_slave_stp_update(dev, attr->u.stp_state);
+		if (switchdev_trans_ph_prepare(trans))
+			ret = ds->drv->port_stp_update ? 0 : -EOPNOTSUPP;
+		else
+			ret = ds->drv->port_stp_update(ds, p->port,
+						       attr->u.stp_state);
 		break;
 	default:
 		ret = -EOPNOTSUPP;
--
2.6.0
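The prepare/commit split that this patch relies on can be illustrated outside the kernel. The following is only a user-space sketch under stated assumptions: `enum trans_phase`, `struct drv_ops`, `port_stp_attr_set()`, `set_stp_state()` and `demo_stp_update()` are hypothetical stand-ins, not switchdev or DSA API. It shows the idea from the commit message: report -EOPNOTSUPP during prepare, so that the commit phase is never reached for a driver that lacks the operation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two transaction phases. */
enum trans_phase { TRANS_PREPARE, TRANS_COMMIT };

/* Hypothetical driver ops; port_stp_update is NULL when unsupported. */
struct drv_ops {
	int (*port_stp_update)(int port, int state);
};

/* Demo driver callback: records the last state it was asked to set. */
static int demo_last_state = -1;

static int demo_stp_update(int port, int state)
{
	(void)port;
	demo_last_state = state;
	return 0;
}

/* Same shape as the patched handler: validate in prepare, act in commit. */
static int port_stp_attr_set(const struct drv_ops *ops, enum trans_phase phase,
			     int port, int state)
{
	if (phase == TRANS_PREPARE)
		return ops->port_stp_update ? 0 : -EOPNOTSUPP;

	return ops->port_stp_update(port, state);
}

/* Caller side: commit is only attempted when prepare succeeded. */
static int set_stp_state(const struct drv_ops *ops, int port, int state)
{
	int err = port_stp_attr_set(ops, TRANS_PREPARE, port, state);

	if (err)
		return err;
	return port_stp_attr_set(ops, TRANS_COMMIT, port, state);
}
```

With an ops table whose `port_stp_update` is NULL, `set_stp_state()` returns -EOPNOTSUPP from the prepare step and the commit step never dereferences the missing callback.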
[PATCH v2 net-next 1/6] net: switchdev: remove dev in port_vlan_dump_put
The static switchdev_port_vlan_dump_put function does not need the
net_device parameter, so remove it.

Signed-off-by: Vivien Didelot
---
 net/switchdev/switchdev.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 00ee547..56d34ed 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -484,8 +484,7 @@ struct switchdev_vlan_dump {
 	u16 end;
 };

-static int switchdev_port_vlan_dump_put(struct net_device *dev,
-					struct switchdev_vlan_dump *dump)
+static int switchdev_port_vlan_dump_put(struct switchdev_vlan_dump *dump)
 {
 	struct bridge_vlan_info vinfo;

@@ -531,7 +530,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
 		for (dump->begin = dump->end = vlan->vid_begin;
 		     dump->begin <= vlan->vid_end;
 		     dump->begin++, dump->end++) {
-			err = switchdev_port_vlan_dump_put(dev, dump);
+			err = switchdev_port_vlan_dump_put(dump);
 			if (err)
 				return err;
 		}
@@ -543,7 +542,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
 			/* prepend */
 			dump->begin = vlan->vid_begin;
 		} else {
-			err = switchdev_port_vlan_dump_put(dev, dump);
+			err = switchdev_port_vlan_dump_put(dump);
 			dump->flags = vlan->flags;
 			dump->begin = vlan->vid_begin;
 			dump->end = vlan->vid_end;
@@ -555,7 +554,7 @@ static int switchdev_port_vlan_dump_cb(struct net_device *dev,
 			/* append */
 			dump->end = vlan->vid_end;
 		} else {
-			err = switchdev_port_vlan_dump_put(dev, dump);
+			err = switchdev_port_vlan_dump_put(dump);
 			dump->flags = vlan->flags;
 			dump->begin = vlan->vid_begin;
 			dump->end = vlan->vid_end;
@@ -588,7 +587,7 @@ static int switchdev_port_vlan_fill(struct sk_buff *skb, struct net_device *dev,
 		goto err_out;
 	if (filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED)
 		/* last one */
-		err = switchdev_port_vlan_dump_put(dev, &dump);
+		err = switchdev_port_vlan_dump_put(&dump);
 }

 err_out:
--
2.6.0
[PATCH v2 net-next 4/6] net: switchdev: pass callback to dump operation
Similar to the notifier_call callback of a notifier_block, change the
function signature of the switchdev dump operation to:

    int switchdev_port_obj_dump(struct net_device *dev,
                                enum switchdev_obj_id id, void *obj,
                                int (*cb)(void *obj));

This allows the caller to pass and expect back a specific
switchdev_obj_* structure instead of the generic switchdev_obj one.

Driver implementations of the dump operation can now expect this
specific structure and call the callback with it. Drivers have been
changed accordingly.

Signed-off-by: Vivien Didelot
---
 drivers/net/ethernet/rocker/rocker.c | 21 +
 include/net/switchdev.h              |  9 +---
 net/dsa/slave.c                      | 26 +++--
 net/switchdev/switchdev.c            | 45 ++--
 4 files changed, 53 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 78fd443..107adb6 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4538,10 +4538,10 @@ static int rocker_port_obj_del(struct net_device *dev,
 }

 static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
-				struct switchdev_obj *obj)
+				struct switchdev_obj_fdb *fdb,
+				int (*cb)(void *obj))
 {
 	struct rocker *rocker = rocker_port->rocker;
-	struct switchdev_obj_fdb *fdb = &obj->u.fdb;
 	struct rocker_fdb_tbl_entry *found;
 	struct hlist_node *tmp;
 	unsigned long lock_flags;
@@ -4556,7 +4556,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
 		fdb->ndm_state = NUD_REACHABLE;
 		fdb->vid = rocker_port_vlan_to_vid(rocker_port,
 						   found->key.vlan_id);
-		err = obj->cb(obj);
+		err = cb(fdb);
 		if (err)
 			break;
 	}
@@ -4566,9 +4566,9 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
 }

 static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
-				 struct switchdev_obj *obj)
+				 struct switchdev_obj_vlan *vlan,
+				 int (*cb)(void *obj))
 {
-	struct switchdev_obj_vlan *vlan = &obj->u.vlan;
 	u16 vid;
 	int err = 0;

@@ -4579,7 +4579,7 @@ static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
 		if (rocker_vlan_id_is_internal(htons(vid)))
 			vlan->flags |= BRIDGE_VLAN_INFO_PVID;
 		vlan->vid_begin = vlan->vid_end = vid;
-		err = obj->cb(obj);
+		err = cb(vlan);
 		if (err)
 			break;
 	}
@@ -4588,17 +4588,18 @@ static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
 }

 static int rocker_port_obj_dump(struct net_device *dev,
-				struct switchdev_obj *obj)
+				enum switchdev_obj_id id, void *obj,
+				int (*cb)(void *obj))
 {
 	const struct rocker_port *rocker_port = netdev_priv(dev);
 	int err = 0;

-	switch (obj->id) {
+	switch (id) {
 	case SWITCHDEV_OBJ_PORT_FDB:
-		err = rocker_port_fdb_dump(rocker_port, obj);
+		err = rocker_port_fdb_dump(rocker_port, obj, cb);
 		break;
 	case SWITCHDEV_OBJ_PORT_VLAN:
-		err = rocker_port_vlan_dump(rocker_port, obj);
+		err = rocker_port_vlan_dump(rocker_port, obj, cb);
 		break;
 	default:
 		err = -EOPNOTSUPP;
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 9ef7c56..a2f57fb 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -120,7 +120,8 @@ struct switchdev_ops {
 	int	(*switchdev_port_obj_del)(struct net_device *dev,
 					  struct switchdev_obj *obj);
 	int	(*switchdev_port_obj_dump)(struct net_device *dev,
-					   struct switchdev_obj *obj);
+					   enum switchdev_obj_id id, void *obj,
+					   int (*cb)(void *obj));
 };

 enum switchdev_notifier_type {
@@ -152,7 +153,8 @@ int switchdev_port_attr_set(struct net_device *dev,
 			    struct switchdev_attr *attr);
 int switchdev_port_obj_add(struct net_device *dev, struct switchdev_obj *obj);
 int switchdev_port_obj_del(struct net_device *dev, struct switchdev_obj *obj);
-int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj);
+int switchdev_port_obj_dump(struct net_device *dev, enum switchdev_obj_id id,
+
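To make the new callback shape from this patch concrete, here is a minimal user-space sketch. Every name in it (`enum obj_id`, `struct obj_vlan`, `struct vlan_dump`, `drv_port_obj_dump()`, `vlan_dump_cb()`) is a hypothetical stand-in rather than the switchdev API, and recovering the caller's context with a `container_of`-style macro is an assumption about how a caller might carry private state, since `cb` only receives the object pointer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical object ids and object type, modelled on the patch. */
enum obj_id { OBJ_PORT_VLAN, OBJ_PORT_FDB };

struct obj_vlan {
	unsigned short vid_begin;
	unsigned short vid_end;
};

/* Caller-side context: the specific object embedded in a larger struct. */
struct vlan_dump {
	struct obj_vlan obj;		/* handed to the driver */
	int count;			/* caller-private state */
	unsigned short last_vid;
};

/* Local equivalent of the kernel's container_of(), for illustration only. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback: receives the specific object and recovers its enclosing context. */
static int vlan_dump_cb(void *obj)
{
	struct obj_vlan *vlan = obj;
	struct vlan_dump *dump = demo_container_of(vlan, struct vlan_dump, obj);

	dump->count++;
	dump->last_vid = vlan->vid_end;
	return 0;
}

/* Hypothetical driver: reports VLANs 1..3 through the callback. */
static int drv_port_obj_dump(enum obj_id id, void *obj, int (*cb)(void *obj))
{
	struct obj_vlan *vlan = obj;
	unsigned short vid;
	int err = 0;

	if (id != OBJ_PORT_VLAN)
		return -EOPNOTSUPP;

	for (vid = 1; vid <= 3; vid++) {
		vlan->vid_begin = vlan->vid_end = vid;
		err = cb(vlan);
		if (err)
			break;
	}
	return err;
}
```

The driver never sees `struct vlan_dump`; it fills the object it was handed and calls `cb` with it, which is the decoupling the commit message describes.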
Re: unregister_netdevice warnings when deleting netns
Hi Julian and Eric,

I tried both of the patches you suggested, but the issue is still seen.
I am observing the same warning message on the console:
"unregister_netdevice: waiting for lo to become free. Usage count = 1".

>Sometimes people have addressed this class of issue with code review,
>but with a slow cleanup you can't catch this by finding a missing
>dev_put.

Yeah, since a slow cleanup is happening, I am currently unable to
trace this just by keeping a count of dev_hold and dev_put calls.

Actually, at the time the namespace is torn down there is an active
TCP connection present. When this TCP connection is not present, we do
not see this issue.

Any additional ideas and suggestions on debugging in the above scenario?

Best Regards,
Anand

On Tue, Sep 29, 2015 at 12:14 PM, Eric W. Biederman wrote:
> Anand Gurram writes:
>
>>>If the message just spits out a few times and then goes away it simply
>>>means that something is taking a while to cleanup and drop it's
>>>reference.
>>
>> The message just spits out few times and then goes away, I am trying
>> to debug why cleanup is taking long,
>> and where it is still referenced. Any pointers in debugging such
>> issues will be of great help.
>
> The one thing I have done in the past is to instrument dev_hold
> and dev_put and look where in the code the stragglers are coming from
> (when I can reproduce the issue reliably).
>
> Sometimes people have addressed this class of issue with code review,
> but with a slow cleanup you can't catch this by finding a missing
> dev_put.
>
> It takes some creativity to find these as people rarely make the same
> mistake twice.
>
> Eric
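The instrumentation idea Eric describes (tracking where holds and puts come from, rather than only the total count) might look roughly like the following in a user-space model. This is purely an illustrative sketch: `struct demo_dev`, `traced_hold()`, `traced_put()` and `ref_site_balance()` are hypothetical helpers, not kernel functions, and the call-site tags would have to be chosen by whoever adds the instrumentation.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SITES 16

/* Hypothetical per-call-site balance: holds minus puts seen from a tag. */
struct ref_site {
	const char *tag;
	int balance;
};

static struct ref_site ref_sites[MAX_SITES];
static int ref_site_cnt;

/* Hypothetical stand-in for a refcounted device. */
struct demo_dev {
	int refcnt;
};

/* Find the entry for a tag, creating it on first use. */
static struct ref_site *ref_site_get(const char *tag)
{
	int i;

	for (i = 0; i < ref_site_cnt; i++)
		if (strcmp(ref_sites[i].tag, tag) == 0)
			return &ref_sites[i];
	if (ref_site_cnt == MAX_SITES)
		return NULL;
	ref_sites[ref_site_cnt].tag = tag;
	ref_sites[ref_site_cnt].balance = 0;
	return &ref_sites[ref_site_cnt++];
}

static void traced_hold(struct demo_dev *dev, const char *tag)
{
	struct ref_site *site = ref_site_get(tag);

	dev->refcnt++;
	if (site)
		site->balance++;
}

static void traced_put(struct demo_dev *dev, const char *tag)
{
	struct ref_site *site = ref_site_get(tag);

	dev->refcnt--;
	if (site)
		site->balance--;
}

/* A non-zero balance points at the subsystem still holding a reference. */
static int ref_site_balance(const char *tag)
{
	struct ref_site *site = ref_site_get(tag);

	return site ? site->balance : 0;
}
```

With a slow cleanup, the interesting output is which tag keeps a positive balance while the "waiting for lo to become free" message repeats.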
Re: [PATCH net-next 4/6] net: switchdev: pass callback to dump operation
Hi Vivien,

[auto build test results on next-20150929 -- if it's inappropriate base, please ignore]

config: i386-randconfig-s1-201539 (attached as .config)
reproduce:
        git checkout b215cce51157820c4fb92ecfdc72f281a4286676
        # save the attached .config to linux build tree
        make ARCH=i386

All error/warnings (new ones prefixed by >>):

   In file included from net/core/rtnetlink.c:47:0:
>> include/net/switchdev.h:216:1: error: expected identifier or '(' before '{' token
    {
    ^
>> include/net/switchdev.h:213:19: warning: 'switchdev_port_obj_dump' declared 'static' but never defined [-Wunused-function]
    static inline int switchdev_port_obj_dump(struct net_device *dev,
                      ^

vim +216 include/net/switchdev.h

491d0f15 Scott Feldman       2015-05-10  207  static inline int switchdev_port_obj_del(struct net_device *dev,
491d0f15 Scott Feldman       2015-05-10  208  					 struct switchdev_obj *obj)
491d0f15 Scott Feldman       2015-05-10  209  {
491d0f15 Scott Feldman       2015-05-10  210  	return -EOPNOTSUPP;
491d0f15 Scott Feldman       2015-05-10  211  }
491d0f15 Scott Feldman       2015-05-10  212
45d4122c Samudrala, Sridhar  2015-05-13 @213  static inline int switchdev_port_obj_dump(struct net_device *dev,
b215cce5 Vivien Didelot      2015-09-29  214  					  enum switchdev_obj_id id, void *obj,
b215cce5 Vivien Didelot      2015-09-29  215  					  int (*cb)(void *obj));
45d4122c Samudrala, Sridhar  2015-05-13 @216  {
45d4122c Samudrala, Sridhar  2015-05-13  217  	return -EOPNOTSUPP;
45d4122c Samudrala, Sridhar  2015-05-13  218  }
45d4122c Samudrala, Sridhar  2015-05-13  219

:: The code at line 216 was first introduced by commit
:: 45d4122ca7cdb3a4b91f392605cd22cfa75f1d99 switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.

:: TO: Samudrala, Sridhar <sridhar.samudr...@intel.com>
:: CC: David S. Miller <da...@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: Binary data
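Both diagnostics in this report are consistent with the stray ';' at the end of line 215: it turns line 213 into a declaration only, so the '{' on line 216 no longer belongs to any function. A stand-alone sketch of the stub without that semicolon follows. The forward declaration of `struct net_device` and the two-value enum are simplified local stand-ins so the fragment compiles on its own; they are not the real header contents.

```c
#include <errno.h>

/* Simplified local stand-ins so the fragment compiles outside the kernel. */
struct net_device;
enum switchdev_obj_id { SWITCHDEV_OBJ_PORT_VLAN, SWITCHDEV_OBJ_PORT_FDB };

/* No ';' after the parameter list: the body below now belongs to it. */
static inline int switchdev_port_obj_dump(struct net_device *dev,
					  enum switchdev_obj_id id, void *obj,
					  int (*cb)(void *obj))
{
	(void)dev;
	(void)id;
	(void)obj;
	(void)cb;
	return -EOPNOTSUPP;
}
```

The `(void)` casts are only there to keep this sketch quiet under stricter warning flags; the kernel stub does not need them.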
[PATCH] testptp: Silence compiler warnings on ppc64
When compiling Documentation/ptp/testptp.c the following compiler
warnings are printed out:

Documentation/ptp/testptp.c: In function ‘main’:
Documentation/ptp/testptp.c:367:11: warning: format ‘%lld’ expects argument of
 type ‘long long int’, but argument 3 has type ‘__s64’ [-Wformat=]
           event.t.sec, event.t.nsec);
           ^
Documentation/ptp/testptp.c:505:5: warning: format ‘%lld’ expects argument of
 type ‘long long int’, but argument 2 has type ‘__s64’ [-Wformat=]
     (pct+2*i)->sec, (pct+2*i)->nsec);
     ^
Documentation/ptp/testptp.c:507:5: warning: format ‘%lld’ expects argument of
 type ‘long long int’, but argument 2 has type ‘__s64’ [-Wformat=]
     (pct+2*i+1)->sec, (pct+2*i+1)->nsec);
     ^
Documentation/ptp/testptp.c:509:5: warning: format ‘%lld’ expects argument of
 type ‘long long int’, but argument 2 has type ‘__s64’ [-Wformat=]
     (pct+2*i+2)->sec, (pct+2*i+2)->nsec);

This happens because __s64 is by default defined as "long" on ppc64,
not as "long long". However, to fix these warnings, it's possible to
define the __SANE_USERSPACE_TYPES__ so that __s64 gets defined to
"long long" on ppc64, too.

Signed-off-by: Thomas Huth
---
 Documentation/ptp/testptp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index 2bc8abc..6c6247a 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -18,6 +18,7 @@
  * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 #define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__	/* For PPC64, to get LL64 types */
 #include
 #include
 #include
--
1.8.3.1
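For comparison, a different way to avoid this class of warning is to cast the value explicitly at the call site, so the argument matches `%lld` whichever underlying type the platform picks for its 64-bit typedef. This is not what the patch does (the patch defines `__SANE_USERSPACE_TYPES__`); the sketch below only illustrates the mismatch, and `struct demo_ts` and `format_ts()` are hypothetical names using `int64_t` in place of `__s64`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a PTP-style timestamp. */
struct demo_ts {
	int64_t sec;	/* may be 'long' or 'long long' depending on the ABI */
	uint32_t nsec;
};

/*
 * Format "sec.nsec" portably: the explicit casts make the arguments match
 * the %lld and %u conversions on every platform.
 */
static int format_ts(char *buf, size_t len, const struct demo_ts *t)
{
	return snprintf(buf, len, "%lld.%09u",
			(long long)t->sec, (unsigned int)t->nsec);
}
```

The macro approach in the patch has the advantage of leaving the existing printf calls in testptp.c untouched.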
[MM PATCH V4 6/6] slub: optimize bulk slowpath free by detached freelist
This change focuses on improving the speed of object freeing in the
"slowpath" of kmem_cache_free_bulk.

The calls slab_free (fastpath) and __slab_free (slowpath) have been
extended with support for bulk free, which amortizes the overhead of
the (locked) cmpxchg_double.

To use the new bulking feature, we build what I call a detached
freelist. The detached freelist takes advantage of three properties:

 1) the free function call owns the object that is about to be freed,
    thus writing into this memory is synchronization-free.

 2) many freelists can co-exist side-by-side in the same slab-page,
    each with a separate head pointer.

 3) it is the visibility of the head pointer that needs synchronization.

Given these properties, the brilliant part is that the detached
freelist can be constructed without any need for synchronization. The
freelist is constructed directly in the page objects, without any
synchronization needed. The detached freelist is allocated on the
stack of the function call kmem_cache_free_bulk. Thus, the freelist
head pointer is not visible to other CPUs.

All objects in a SLUB freelist must belong to the same slab-page.
Thus, constructing the detached freelist is about matching objects
that belong to the same slab-page. The bulk free array is scanned in
a progressive manner with a limited look-ahead facility.

Kmem debug support is handled in the call to slab_free().

Notice kmem_cache_free_bulk no longer needs to disable IRQs. This
only slowed down single free bulk by approx 3 cycles.

Performance data:
 Benchmarked[1] obj size 256 bytes on CPU i7-4790K @ 4.00GHz

SLUB fastpath single object quick reuse: 47 cycles(tsc) 11.931 ns

To get stable and comparable numbers, the kernel has been booted with
"slab_nomerge" (this also improves performance for larger bulk sizes).

Performance data, compared against fallback bulking:

 bulk - fallback bulk              - improvement with this patch
   1 -  62 cycles(tsc) 15.662 ns -  49 cycles(tsc) 12.407 ns - improved 21.0%
   2 -  55 cycles(tsc) 13.935 ns -  30 cycles(tsc)  7.506 ns - improved 45.5%
   3 -  53 cycles(tsc) 13.341 ns -  23 cycles(tsc)  5.865 ns - improved 56.6%
   4 -  52 cycles(tsc) 13.081 ns -  20 cycles(tsc)  5.048 ns - improved 61.5%
   8 -  50 cycles(tsc) 12.627 ns -  18 cycles(tsc)  4.659 ns - improved 64.0%
  16 -  49 cycles(tsc) 12.412 ns -  17 cycles(tsc)  4.495 ns - improved 65.3%
  30 -  49 cycles(tsc) 12.484 ns -  18 cycles(tsc)  4.533 ns - improved 63.3%
  32 -  50 cycles(tsc) 12.627 ns -  18 cycles(tsc)  4.707 ns - improved 64.0%
  34 -  96 cycles(tsc) 24.243 ns -  23 cycles(tsc)  5.976 ns - improved 76.0%
  48 -  83 cycles(tsc) 20.818 ns -  21 cycles(tsc)  5.329 ns - improved 74.7%
  64 -  74 cycles(tsc) 18.700 ns -  20 cycles(tsc)  5.127 ns - improved 73.0%
 128 -  90 cycles(tsc) 22.734 ns -  27 cycles(tsc)  6.833 ns - improved 70.0%
 158 -  99 cycles(tsc) 24.776 ns -  30 cycles(tsc)  7.583 ns - improved 69.7%
 250 - 104 cycles(tsc) 26.089 ns -  37 cycles(tsc)  9.280 ns - improved 64.4%

Performance data, compared against current in-kernel bulking:

 bulk - curr in-kernel  - improvement with this patch
   1 -  46 cycles(tsc) -  49 cycles(tsc) - improved (cycles:-3) -6.5%
   2 -  27 cycles(tsc) -  30 cycles(tsc) - improved (cycles:-3) -11.1%
   3 -  21 cycles(tsc) -  23 cycles(tsc) - improved (cycles:-2) -9.5%
   4 -  18 cycles(tsc) -  20 cycles(tsc) - improved (cycles:-2) -11.1%
   8 -  17 cycles(tsc) -  18 cycles(tsc) - improved (cycles:-1) -5.9%
  16 -  18 cycles(tsc) -  17 cycles(tsc) - improved (cycles: 1) 5.6%
  30 -  18 cycles(tsc) -  18 cycles(tsc) - improved (cycles: 0) 0.0%
  32 -  18 cycles(tsc) -  18 cycles(tsc) - improved (cycles: 0) 0.0%
  34 -  78 cycles(tsc) -  23 cycles(tsc) - improved (cycles:55) 70.5%
  48 -  60 cycles(tsc) -  21 cycles(tsc) - improved (cycles:39) 65.0%
  64 -  49 cycles(tsc) -  20 cycles(tsc) - improved (cycles:29) 59.2%
 128 -  69 cycles(tsc) -  27 cycles(tsc) - improved (cycles:42) 60.9%
 158 -  79 cycles(tsc) -  30 cycles(tsc) - improved (cycles:49) 62.0%
 250 -  86 cycles(tsc) -  37 cycles(tsc) - improved (cycles:49) 57.0%

Performance with normal SLUB merging is significantly slower for
larger bulking. This is believed to (primarily) be an effect of not
having to share the per-CPU data-structures, as tuning per-CPU size
can achieve similar performance.

 bulk - slab_nomerge   - normal SLUB merge
   1 -  49 cycles(tsc) -  49 cycles(tsc) - merge slower with cycles:0
   2 -  30 cycles(tsc) -  30 cycles(tsc) - merge slower with cycles:0
   3 -  23 cycles(tsc) -  23 cycles(tsc) - merge slower with cycles:0
   4 -  20 cycles(tsc) -  20 cycles(tsc) - merge slower with cycles:0
   8 -  18 cycles(tsc) -  18 cycles(tsc) - merge slower with cycles:0
  16 -  17 cycles(tsc) -  17 cycles(tsc) - merge slower with cycles:0
  30 -  18 cycles(tsc) -  23 cycles(tsc) - merge slower with cycles:5
  32 -  18 cycles(tsc) -  22 cycles(tsc) - merge slower with cycles:4
  34 -  23 cycles(tsc) -  22 cycles(tsc) - merge slower with cycles:-1
  48 -  21
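The detached-freelist construction described in this commit message can be modelled in user space. The sketch below is a simplification under stated assumptions, not the patch's code: a "slab-page" is approximated by rounding the object address down to a 4096-byte boundary, the free pointer is stored in the first word of each object, and the array is scanned in full instead of with the limited look-ahead the message mentions. `struct detached_freelist`, `page_of()` and `build_detached_freelist()` are names chosen for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: one "slab-page" per 4 KiB */

/* Lives on the caller's stack, so no other thread can see the head. */
struct detached_freelist {
	void *head;
	void *tail;
	int cnt;
};

static uintptr_t page_of(const void *obj)
{
	return (uintptr_t)obj & ~(uintptr_t)(DEMO_PAGE_SIZE - 1);
}

/*
 * Link every object in p[] that shares a page with the first remaining
 * object. Consumed entries are set to NULL so that a later call picks up
 * the next page. Objects must be pointer-aligned and pointer-sized or
 * larger, because the link is written into the object itself.
 */
static int build_detached_freelist(void **p, size_t size,
				   struct detached_freelist *df)
{
	uintptr_t page;
	size_t i = 0;

	df->head = NULL;
	df->tail = NULL;
	df->cnt = 0;

	while (i < size && !p[i])
		i++;
	if (i == size)
		return 0;

	page = page_of(p[i]);
	for (; i < size; i++) {
		void *obj = p[i];

		if (!obj || page_of(obj) != page)
			continue;
		*(void **)obj = df->head;	/* caller owns obj: no locking */
		df->head = obj;
		if (!df->tail)
			df->tail = obj;
		df->cnt++;
		p[i] = NULL;
	}
	return df->cnt;
}
```

A bulk free would call this repeatedly until it returns 0, handing each (head, tail, cnt) triple to a single synchronized free operation; only that hand-over needs synchronization, which is property 3 in the list above.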
[PATCH v2 net-next 6/6] net: switchdev: extract struct switchdev_obj_*
Now that switchdev and its drivers directly use specific switchdev_obj_*
structures, move them out of the switchdev_obj union and get rid of this
outer structure.

Signed-off-by: Vivien Didelot
---
 include/net/switchdev.h | 53 -
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index bcadac3..e11425e 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -64,30 +64,29 @@ enum switchdev_obj_id {
 	SWITCHDEV_OBJ_PORT_FDB,
 };

-struct switchdev_obj {
-	enum switchdev_obj_id id;
-	int (*cb)(struct switchdev_obj *obj);
-	union {
-		struct switchdev_obj_vlan {		/* PORT_VLAN */
-			u16 flags;
-			u16 vid_begin;
-			u16 vid_end;
-		} vlan;
-		struct switchdev_obj_ipv4_fib {		/* IPV4_FIB */
-			u32 dst;
-			int dst_len;
-			struct fib_info *fi;
-			u8 tos;
-			u8 type;
-			u32 nlflags;
-			u32 tb_id;
-		} ipv4_fib;
-		struct switchdev_obj_fdb {		/* PORT_FDB */
-			const unsigned char *addr;
-			u16 vid;
-			u16 ndm_state;
-		} fdb;
-	} u;
+/* SWITCHDEV_OBJ_PORT_VLAN */
+struct switchdev_obj_vlan {
+	u16 flags;
+	u16 vid_begin;
+	u16 vid_end;
+};
+
+/* SWITCHDEV_OBJ_IPV4_FIB */
+struct switchdev_obj_ipv4_fib {
+	u32 dst;
+	int dst_len;
+	struct fib_info *fi;
+	u8 tos;
+	u8 type;
+	u32 nlflags;
+	u32 tb_id;
+};
+
+/* SWITCHDEV_OBJ_PORT_FDB */
+struct switchdev_obj_fdb {
+	const unsigned char *addr;
+	u16 vid;
+	u16 ndm_state;
 };

 void switchdev_trans_item_enqueue(struct switchdev_trans *trans,
@@ -102,11 +101,11 @@ void *switchdev_trans_item_dequeue(struct switchdev_trans *trans);
  *
  * @switchdev_port_attr_set: Set a port attribute (see switchdev_attr).
  *
- * @switchdev_port_obj_add: Add an object to port (see switchdev_obj).
+ * @switchdev_port_obj_add: Add an object to port (see switchdev_obj_*).
  *
- * @switchdev_port_obj_del: Delete an object from port (see switchdev_obj).
+ * @switchdev_port_obj_del: Delete an object from port (see switchdev_obj_*).
  *
- * @switchdev_port_obj_dump: Dump port objects (see switchdev_obj).
+ * @switchdev_port_obj_dump: Dump port objects (see switchdev_obj_*).
  */
 struct switchdev_ops {
 	int	(*switchdev_port_attr_get)(struct net_device *dev,
--
2.6.0
Re: [MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists
On Tue, 29 Sep 2015 09:38:30 -0700 Alexander Duyck wrote:

> On 09/29/2015 08:48 AM, Jesper Dangaard Brouer wrote:
> > Make it possible to free a freelist with several objects by adjusting
> > API of slab_free() and __slab_free() to have head, tail and an objects
> > counter (cnt).
> >
> > Tail being NULL indicate single object free of head object. This
> > allow compiler inline constant propagation in slab_free() and
> > slab_free_freelist_hook() to avoid adding any overhead in case of
> > single object free.
> >
> > This allows a freelist with several objects (all within the same
> > slab-page) to be free'ed using a single locked cmpxchg_double in
> > __slab_free() and with an unlocked cmpxchg_double in slab_free().
> >
> > Object debugging on the free path is also extended to handle these
> > freelists. When CONFIG_SLUB_DEBUG is enabled it will also detect if
> > objects don't belong to the same slab-page.
> >
> > These changes are needed for the next patch to bulk free the detached
> > freelists it introduces and constructs.
> >
> > Micro benchmarking showed no performance reduction due to this change,
> > when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).
> >
> > Signed-off-by: Jesper Dangaard Brouer
> > Signed-off-by: Alexander Duyck
> >
> > ---
> > V4:
> >   - Change API per req of Christoph Lameter
> >   - Remove comments in init_object.
> >
[...]
> >
> > +/* Compiler cannot detect that slab_free_freelist_hook() can be
> > + * removed if slab_free_hook() evaluates to nothing. Thus, we need to
> > + * catch all relevant config debug options here.
> > + */
>
> Is it actually generating nothing but a pointer walking loop or is there
> a bit of code cruft that is being evaluated inside the loop?

If any of the defines are activated, then slab_free_hook(s, object)
will generate some code.

In the case of a single object free, the compiler sees that it can
remove the loop, and also notices if slab_free_hook() evaluates to
nothing.

The compiler is not smart enough to remove the loop for the
multi-object case, even though it can see that slab_free_hook()
evaluates to nothing (in that case it does a pointer walk without any
code evaluation). Thus, I need this construct.

> > +#if defined(CONFIG_KMEMCHECK) || \
> > +	defined(CONFIG_LOCKDEP) || \
> > +	defined(CONFIG_DEBUG_KMEMLEAK) || \
> > +	defined(CONFIG_DEBUG_OBJECTS_FREE) || \
> > +	defined(CONFIG_KASAN)
> > +static inline void slab_free_freelist_hook(struct kmem_cache *s,
> > +					   void *head, void *tail)
> > +{
> > +	void *object = head;
> > +	void *tail_obj = tail ? : head;
> > +
> > +	do {
> > +		slab_free_hook(s, object);
> > +	} while ((object != tail_obj) &&
> > +		 (object = get_freepointer(s, object)));
> > +}
> > +#else
> > +static inline void slab_free_freelist_hook(struct kmem_cache *s, void *obj_tail,
> > +					   void *freelist_head) {}
> > +#endif
> > +
>
> Instead of messing around with an #else you might just wrap the contents
> of slab_free_freelist_hook in the #if/#endif instead of the entire
> function declaration.

I had it that way in an earlier version of the patch, but I liked it
better this way.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
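The head/tail walk discussed in this reply can be tried in isolation. Below is a minimal user-space sketch, assuming the free pointer is the first member of each object; `struct demo_obj`, `demo_free_hook()` and `demo_freelist_hook()` are hypothetical names for this illustration. It uses the portable `tail ? tail : head` form in place of the GNU `?:` shorthand from the patch.

```c
#include <stddef.h>

/* Hypothetical object: the free pointer occupies the first word. */
struct demo_obj {
	struct demo_obj *next;
	int hooked;	/* counts how often the per-object hook ran */
};

/* Stand-in for the per-object hook. */
static void demo_free_hook(struct demo_obj *obj)
{
	obj->hooked++;
}

/*
 * Run the hook on every object from head to tail (inclusive).
 * tail == NULL means a single-object free of head.
 * Returns the number of objects visited.
 */
static int demo_freelist_hook(struct demo_obj *head, struct demo_obj *tail)
{
	struct demo_obj *object = head;
	struct demo_obj *tail_obj = tail ? tail : head;
	int cnt = 0;

	do {
		demo_free_hook(object);
		cnt++;
	} while (object != tail_obj && (object = object->next));

	return cnt;
}
```

When `tail` is a compile-time NULL, `tail_obj == head` and the loop condition is false after the first iteration, which is the property that lets the compiler drop the loop in the single-object case.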
Fw: [Bug 105221] New: system panics under load on mlx4_en interfaces
Begin forwarded message:

Date: Tue, 29 Sep 2015 07:19:32 +
From: "bugzilla-dae...@bugzilla.kernel.org"
To: "shemmin...@linux-foundation.org"
Subject: [Bug 105221] New: system panics under load on mlx4_en interfaces

https://bugzilla.kernel.org/show_bug.cgi?id=105221

Bug ID: 105221
Summary: system panics under load on mlx4_en interfaces
Product: Networking
Version: 2.5
Kernel Version: 4.3.0-rc3-vanilla
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Other
Assignee: shemmin...@linux-foundation.org
Reporter: tho...@drewermann.org
Regression: No

We are using HP ProLiant DL320e Gen8 with a dual port ConnectX-2 EN network
Mellanox NIC (P/N: MNPH29D_A2-A5) installed. BIOS, iLO, microcode and NIC
firmware are up to date. Already tried to change interrupts. All offloading
features are currently disabled:

Features for eth2:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]

When putting load on those NICs we are receiving a kernel panic. The issue
can be reproduced at any time. Kernel version doesn't make any difference.

[ 176.892495] [ cut here ]
[ 176.892513] kernel BUG at net/core/skbuff.c:2097!
[ 176.892525] invalid opcode: [#1] SMP
[ 176.892538] Modules linked in: cpufreq_stats cpufreq_userspace
 cpufreq_powersave iptable_filter cpufreq_conservative xt_CT nf_conntrack
 iptable_raw ip_tables x_tables nfsd auth_rpcgss oid_registry nfs_acl nfs
 lockd grace fscache sunrpc ip_gre ip_tunnel gre intel_rapl iosf_mbi
 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha256_ssse3
 sha256_generic hmac drbg ansi_cprng aesni_intel mgag200 aes_x86_64 lrw ttm
 drm_kms_helper gf128mul glue_helper drm ablk_helper iTCO_wdt cryptd
 iTCO_vendor_support joydev evdev psmouse ie31200_edac serio_raw hpilo
 i2c_algo_bit edac_core lpc_ich hpwdt snd_pcm snd_timer snd 8250_fintek
 soundcore pcspkr mfd_core ipmi_si ipmi_msghandler shpchp button
 pcc_cpufreq acpi_cpufreq processor acpi_power_meter 8021q
[ 176.892778] garp mrp stp llc dummy autofs4 ext4 crc16 mbcache jbd2 dm_mod
 mlx4_en vxlan ip6_udp_tunnel udp_tunnel sg sd_mod uas usb_storage scsi_mod
 hid_generic usbhid hid crc32c_intel mlx4_core ehci_pci uhci_hcd tg3
 ehci_hcd ptp pps_core libphy usbcore usb_common thermal
[ 176.892868] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-rc3-vanillaice #1
[ 176.892885] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
[ 176.892902] task: 81814540 ti: 8180 task.ti: 8180
[ 176.892919] RIP: 0010:[] [] __skb_checksum+0x2d6/0x2f0
[ 176.892942] RSP: 0018:8802474038f8 EFLAGS: 00010286
[ 176.892955] RAX: 12f3 RBX: 12f3 RCX: 0ec6
[ 176.892972] RDX: 88022ce1d980 RSI: 12f3 RDI: 8800afed4400
[ 176.892988] RBP: R08: 880247403978 R09: 12f3
[ 176.893005] R10: 88022ce1d300 R11: 0002 R12:
[ 176.893021] R13: R14: 12f3 R15:
[ 176.893038] FS: () GS:88024740() knlGS:
[ 176.893056] CS: 0010 DS: ES: CR0: 80050033
[ 176.893070] CR2: 7f42a19c CR3: 0180d000 CR4: 001406f0
[ 176.893086] Stack:
[ 176.893092] b0ddb200 880247403978 12f3 81814540
[ 176.893113] 81814540 81814540 8800
[ 176.893134] 0246 8800afed4400
[MM PATCH V4 4/6] slab: implement bulking for SLAB allocator
Implement a basic approach of bulking in the SLAB allocator. Simply
use local_irq_{disable,enable} and call single alloc/free in a loop.
This simple implementation approach is surprisingly fast.

Notice the normal SLAB fastpath is: 96 cycles (24.119 ns). The table
below shows that single object bulking only takes 42 cycles. This can be
explained by the bulk API's requirement to be called from a known
interrupt context, that is with interrupts enabled. This allows us to
avoid the expensive (37 cycles) local_irq_{save,restore}, and instead
use the much faster (7 cycles) local_irq_{disable,enable}.

Benchmarked[1] obj size 256 bytes on CPU i7-4790K @ 4.00GHz:

bulk - Current                   - simple SLAB bulk implementation
   1 - 115 cycles(tsc) 28.812 ns - 42 cycles(tsc) 10.715 ns - improved 63.5%
   2 - 103 cycles(tsc) 25.956 ns - 27 cycles(tsc)  6.985 ns - improved 73.8%
   3 - 101 cycles(tsc) 25.336 ns - 22 cycles(tsc)  5.733 ns - improved 78.2%
   4 - 100 cycles(tsc) 25.147 ns - 21 cycles(tsc)  5.319 ns - improved 79.0%
   8 -  98 cycles(tsc) 24.616 ns - 18 cycles(tsc)  4.620 ns - improved 81.6%
  16 -  97 cycles(tsc) 24.408 ns - 17 cycles(tsc)  4.344 ns - improved 82.5%
  30 -  98 cycles(tsc) 24.641 ns - 16 cycles(tsc)  4.202 ns - improved 83.7%
  32 -  98 cycles(tsc) 24.607 ns - 16 cycles(tsc)  4.199 ns - improved 83.7%
  34 -  98 cycles(tsc) 24.605 ns - 18 cycles(tsc)  4.579 ns - improved 81.6%
  48 -  97 cycles(tsc) 24.463 ns - 17 cycles(tsc)  4.405 ns - improved 82.5%
  64 -  97 cycles(tsc) 24.370 ns - 17 cycles(tsc)  4.384 ns - improved 82.5%
 128 -  99 cycles(tsc) 24.763 ns - 19 cycles(tsc)  4.755 ns - improved 80.8%
 158 -  98 cycles(tsc) 24.708 ns - 18 cycles(tsc)  4.723 ns - improved 81.6%
 250 - 101 cycles(tsc) 25.342 ns - 20 cycles(tsc)  5.035 ns - improved 80.2%

Also notice how well bulking maintains the performance when the bulk
size increases (which is a sore spot for the SLUB allocator).

Increasing the bulk size further:
 20 cycles(tsc)  5.214 ns (bulk: 512)
 30 cycles(tsc)  7.734 ns (bulk: 768)
 40 cycles(tsc) 10.244 ns (bulk:1024)
 72 cycles(tsc) 18.049 ns (bulk:2048)
 90 cycles(tsc) 22.585 ns (bulk:4096)

It is not recommended to perform large bulking with SLAB, as
local interrupts are disabled for the entire period. If this kind
of use-case evolves, this interface should be adjusted to
mitigate/reduce the interrupts-off period.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c

Signed-off-by: Jesper Dangaard Brouer
Acked-by: Christoph Lameter
---
 mm/slab.c | 87 +++--
 1 file changed, 62 insertions(+), 25 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index c77ebe6cc87c..21da6b1ccae3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3234,11 +3234,15 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 #endif /* CONFIG_NUMA */
 
 static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
+slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller,
+	   bool irq_off_needed)
 {
 	unsigned long save_flags;
 	void *objp;
 
+	/* Compiler need to remove irq_off_needed branch statements */
+	BUILD_BUG_ON(!__builtin_constant_p(irq_off_needed));
+
 	flags &= gfp_allowed_mask;
 
 	lockdep_trace_alloc(flags);
@@ -3249,9 +3253,11 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	cachep = memcg_kmem_get_cache(cachep, flags);
 
 	cache_alloc_debugcheck_before(cachep, flags);
-	local_irq_save(save_flags);
+	if (irq_off_needed)
+		local_irq_save(save_flags);
 	objp = __do_cache_alloc(cachep, flags);
-	local_irq_restore(save_flags);
+	if (irq_off_needed)
+		local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	kmemleak_alloc_recursive(objp, cachep->object_size, 1, cachep->flags,
 				 flags);
@@ -3407,7 +3413,7 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
  */
 void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 {
-	void *ret = slab_alloc(cachep, flags, _RET_IP_);
+	void *ret = slab_alloc(cachep, flags, _RET_IP_, true);
 
 	trace_kmem_cache_alloc(_RET_IP_, ret,
 			       cachep->object_size, cachep->size, flags);
@@ -3416,16 +3422,23 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 }
 EXPORT_SYMBOL(kmem_cache_alloc);
 
-void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
-{
-	__kmem_cache_free_bulk(s, size, p);
-}
-EXPORT_SYMBOL(kmem_cache_free_bulk);
-
+/* Note that interrupts must be enabled when calling this function. */
 bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
-								void **p)
+
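The irq_off_needed idea in this patch (one shared inline body, with a constant flag choosing whether the body protects itself) can be modelled in userspace. This is a minimal sketch under assumptions, not the SLAB code: fake_irq_save(), fake_irq_restore(), alloc_common(), alloc_one() and alloc_bulk() are hypothetical names, and the "allocation" is a placeholder computation. A counter stands in for the cost of the save/restore pair.

```c
#include <assert.h>
#include <stdbool.h>

static int irq_save_calls;	/* counts simulated interrupt-disable operations */

static void fake_irq_save(void)    { irq_save_calls++; }
static void fake_irq_restore(void) { }

/*
 * Shared body. Every caller passes a literal for irq_off_needed, so an
 * optimizing compiler can drop the untaken branch after inlining.
 */
static inline int alloc_common(int value, bool irq_off_needed)
{
	int ret;

	if (irq_off_needed)
		fake_irq_save();
	ret = value * 2;	/* placeholder for the real allocation work */
	if (irq_off_needed)
		fake_irq_restore();
	return ret;
}

/* Single-object entry point: has to protect itself on every call. */
static int alloc_one(int value)
{
	return alloc_common(value, true);
}

/* Bulk entry point: disable once, then loop without per-object cost. */
static int alloc_bulk(const int *in, int *out, int n)
{
	int i;

	fake_irq_save();
	for (i = 0; i < n; i++)
		out[i] = alloc_common(in[i], false);
	fake_irq_restore();
	return n;
}
```

Calling alloc_one() n times costs n simulated save/restore pairs, while alloc_bulk() costs one regardless of n, which is the effect the benchmark table attributes to bulking.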
[v4.3.0-rc2->3] Regression: BIG networking performance loss
With kernels vmlinuz-4.3.0-rc2-00228-gd4a748a and earlier, it is no problem
for me to stream HD videos (700-800 KByte/s) from YouTube. With the same
video material and kernels vmlinuz-4.3.0-rc2-00438-gd8cc397 and later, I only
reach 70-80 KByte/s. That is one-tenth of the previous rate.

The merges between 00228 -> 00438 are:

d8cc397 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
c91d707 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
bcba282 Merge tag 'usb-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
fb740f9 Merge tag 'tty-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
b11e7b8 Merge tag 'staging-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
7c1efea Merge tag 'driver-core-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
64b796e Merge tag 'char-misc-4.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
518a7cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI
Express Gigabit Ethernet Controller (rev 07)
Driver: r8169.

Thanks,
Jörg
[PATCH v2 net-next 0/6] net: switchdev: use specific switchdev_obj_*
This patchset changes switchdev add, del, dump operations from this:

    int (*switchdev_port_obj_add)(struct net_device *dev,
                                  struct switchdev_obj *obj,
                                  struct switchdev_trans *trans);
    int (*switchdev_port_obj_del)(struct net_device *dev,
                                  struct switchdev_obj *obj);
    int (*switchdev_port_obj_dump)(struct net_device *dev,
                                   struct switchdev_obj *obj);

to something similar to the notifier_call callback of a notifier_block:

    int (*switchdev_port_obj_add)(struct net_device *dev,
                                  enum switchdev_obj_id id, const void *obj,
                                  struct switchdev_trans *trans);
    int (*switchdev_port_obj_del)(struct net_device *dev,
                                  enum switchdev_obj_id id, const void *obj);
    int (*switchdev_port_obj_dump)(struct net_device *dev,
                                   enum switchdev_obj_id id, void *obj,
                                   int (*cb)(void *obj));

This allows the caller to pass and expect back a specific switchdev_obj_*
structure (e.g. switchdev_obj_fdb) instead of the generic switchdev_obj one.

This will simplify pushing the callback function down to the drivers.

The first 3 patches get rid of the dev parameter of the dump callback, since
it is not always needed (e.g. vlan_dump) and some drivers (such as DSA
drivers) may not have easy access to it.

Patches 4 and 5 implement the change in the switchdev operations and its
users.

Patch 6 extracts the inner switchdev_obj_* structures from switchdev_obj and
removes this last one.

v2: fix error spotted by kbuild (extra ';' inline switchdev_port_obj_dump).

Vivien Didelot (6):
  net: switchdev: remove dev in port_vlan_dump_put
  net: switchdev: move dev in switchdev_fdb_dump
  net: switchdev: remove dev from switchdev_obj cb
  net: switchdev: pass callback to dump operation
  net: switchdev: abstract object in add/del ops
  net: switchdev: extract struct switchdev_obj_*

 drivers/net/ethernet/rocker/rocker.c |  42
 include/net/switchdev.h              |  80 ---
 net/bridge/br_fdb.c                  |  11 +--
 net/bridge/br_vlan.c                 |  24 ++---
 net/dsa/slave.c                      |  46 +
 net/switchdev/switchdev.c            | 184 ---
 6 files changed, 186 insertions(+), 201 deletions(-)

-- 
2.6.0
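The proposed "id plus opaque object pointer" calling convention can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: enum obj_id, struct obj_vlan, struct obj_fdb, struct port_state and port_obj_add() are simplified, hypothetical stand-ins and not the kernel's switchdev definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the switchdev object types. */
enum obj_id { OBJ_PORT_VLAN, OBJ_PORT_FDB };

struct obj_vlan { unsigned short vid_begin, vid_end; };
struct obj_fdb  { unsigned char addr[6]; unsigned short vid; };

/* Hypothetical per-port state, standing in for programmed hardware. */
struct port_state {
	int vlans;
	int fdb_entries;
	int last_fdb_vid;
};

/*
 * The id tells the callee which specific structure the untyped
 * pointer refers to, so no generic wrapper object is needed.
 */
static int port_obj_add(struct port_state *st, enum obj_id id, const void *obj)
{
	switch (id) {
	case OBJ_PORT_VLAN: {
		const struct obj_vlan *vlan = obj;

		st->vlans += vlan->vid_end - vlan->vid_begin + 1;
		return 0;
	}
	case OBJ_PORT_FDB: {
		const struct obj_fdb *fdb = obj;

		st->fdb_entries++;
		st->last_fdb_vid = fdb->vid;
		return 0;
	}
	}
	return -1;	/* unknown object id */
}
```

The trade-off of this design is that the compiler no longer checks that id and the pointed-to type agree, so each callee has to treat the id as the single source of truth.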
[PATCH v2 net-next 3/6] net: switchdev: remove dev from switchdev_obj cb
The net_device associated to a dump operation does not have to be passed
to the callback. switchdev stores it in a superset struct, if needed.

Also some drivers (such as DSA drivers) may not have easy access to it.

This will simplify pushing the callback function down to the drivers.

Signed-off-by: Vivien Didelot
---
 drivers/net/ethernet/rocker/rocker.c | 4 ++--
 include/net/switchdev.h              | 2 +-
 net/dsa/slave.c                      | 4 ++--
 net/switchdev/switchdev.c            | 6 ++
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index d3f6632..78fd443 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4556,7 +4556,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
 		fdb->ndm_state = NUD_REACHABLE;
 		fdb->vid = rocker_port_vlan_to_vid(rocker_port,
 						   found->key.vlan_id);
-		err = obj->cb(rocker_port->dev, obj);
+		err = obj->cb(obj);
 		if (err)
 			break;
 	}
@@ -4579,7 +4579,7 @@ static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
 		if (rocker_vlan_id_is_internal(htons(vid)))
 			vlan->flags |= BRIDGE_VLAN_INFO_PVID;
 		vlan->vid_begin = vlan->vid_end = vid;
-		err = obj->cb(rocker_port->dev, obj);
+		err = obj->cb(obj);
 		if (err)
 			break;
 	}
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 1820787..9ef7c56 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -66,7 +66,7 @@ enum switchdev_obj_id {
 
 struct switchdev_obj {
 	enum switchdev_obj_id id;
-	int (*cb)(struct net_device *dev, struct switchdev_obj *obj);
+	int (*cb)(struct switchdev_obj *obj);
 	union {
 		struct switchdev_obj_vlan {		/* PORT_VLAN */
 			u16 flags;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index f18cae5..0b47647 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -334,7 +334,7 @@ static int dsa_slave_port_vlan_dump(struct net_device *dev,
 		if (test_bit(p->port, untagged))
 			vlan->flags |= BRIDGE_VLAN_INFO_UNTAGGED;
 
-		err = obj->cb(dev, obj);
+		err = obj->cb(obj);
 		if (err)
 			break;
 	}
@@ -397,7 +397,7 @@ static int dsa_slave_port_fdb_dump(struct net_device *dev,
 		obj->u.fdb.vid = vid;
 		obj->u.fdb.ndm_state = is_static ? NUD_NOARP : NUD_REACHABLE;
 
-		ret = obj->cb(dev, obj);
+		ret = obj->cb(obj);
 		if (ret < 0)
 			break;
 	}
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index c0e2047..93f4971 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -514,8 +514,7 @@ static int switchdev_port_vlan_dump_put(struct switchdev_vlan_dump *dump)
 	return 0;
 }
 
-static int switchdev_port_vlan_dump_cb(struct net_device *dev,
-				       struct switchdev_obj *obj)
+static int switchdev_port_vlan_dump_cb(struct switchdev_obj *obj)
 {
 	struct switchdev_vlan_dump *dump =
 		container_of(obj, struct switchdev_vlan_dump, obj);
@@ -864,8 +863,7 @@ struct switchdev_fdb_dump {
 	int idx;
 };
 
-static int switchdev_port_fdb_dump_cb(struct net_device *dev,
-				      struct switchdev_obj *obj)
+static int switchdev_port_fdb_dump_cb(struct switchdev_obj *obj)
 {
 	struct switchdev_fdb_dump *dump =
 		container_of(obj, struct switchdev_fdb_dump, obj);
-- 
2.6.0
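The "superset struct" mentioned in the commit message relies on the container_of idiom: the callback receives only the embedded object and recovers its enclosing structure, which is where the device lives. A minimal userspace sketch follows, assuming a locally defined container_of macro and hypothetical types (struct dump_obj, struct vlan_dump, with a device name string standing in for struct net_device *).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the idiom: step back from a member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct switchdev_obj. */
struct dump_obj {
	int (*cb)(struct dump_obj *obj);
	int vid;
};

/* Hypothetical superset struct: the caller's private context plus obj. */
struct vlan_dump {
	const char *dev_name;	/* stands in for struct net_device * */
	int seen;
	struct dump_obj obj;
};

/* Core-side callback: no dev argument, the context is recovered from obj. */
static int vlan_dump_cb(struct dump_obj *obj)
{
	struct vlan_dump *dump = container_of(obj, struct vlan_dump, obj);

	dump->seen++;
	return dump->dev_name ? 0 : -1;
}

/* Driver-side loop: it only ever sees the embedded object. */
static int driver_dump(struct dump_obj *obj, int first, int last)
{
	int vid, err = 0;

	for (vid = first; vid <= last; vid++) {
		obj->vid = vid;
		err = obj->cb(obj);
		if (err)
			break;
	}
	return err;
}
```

The driver never needs a device pointer to invoke the callback, which matches the motivation given for DSA drivers.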
[PATCH net-next v2] net: Add support for filtering neigh dump by master device
Add support for filtering neighbor dumps by master device by adding
the NDA_MASTER attribute to the dump request. A new netlink flag,
NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
request and output is filtered as requested.

Signed-off-by: David Ahern
---
v2
- added NLM_F_DUMP_FILTERED flag for userspace feedback that request is
  supported

This method works for other filters as well and other dump commands.

Works fine for all combinations of new and old kernel and new and old ip:
1. new ip command on old kernel, NDA_MASTER attribute is ignored
2. old ip command on new kernel, NDA_MASTER attribute is not present
3. new ip on new kernel ... goodness ensues by limiting data to only what
   user wants

 include/uapi/linux/netlink.h |  1 +
 net/core/neighbour.c         | 32 +++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 6f3fe16cd22a..f095155d8749 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -54,6 +54,7 @@ struct nlmsghdr {
 #define NLM_F_ACK		4	/* Reply with ack, with zero or error code */
 #define NLM_F_ECHO		8	/* Echo this request		*/
 #define NLM_F_DUMP_INTR		16	/* Dump was inconsistent due to sequence change */
+#define NLM_F_DUMP_FILTERED	32	/* Dump was filtered as requested */
 
 /* Modifiers to GET request */
 #define NLM_F_ROOT	0x100	/* specify tree	root	*/
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2b515ba7e94f..8c57fdf4d68e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2235,14 +2235,42 @@ static void neigh_update_notify(struct neighbour *neigh)
 	__neigh_notify(neigh, RTM_NEWNEIGH, 0);
 }
 
+static bool neigh_master_filtered(struct net_device *dev, int master_idx)
+{
+	struct net_device *master;
+
+	if (!master_idx)
+		return false;
+
+	master = netdev_master_upper_dev_get(dev);
+	if (!master || master->ifindex != master_idx)
+		return true;
+
+	return false;
+}
+
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 			    struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);
+	const struct nlmsghdr *nlh = cb->nlh;
+	struct nlattr *tb[NDA_MAX + 1];
 	struct neighbour *n;
 	int rc, h, s_h = cb->args[1];
 	int idx, s_idx = idx = cb->args[2];
 	struct neigh_hash_table *nht;
+	int filter_master_idx = 0;
+	unsigned int flags = NLM_F_MULTI;
+	int err;
+
+	err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL);
+	if (!err) {
+		if (tb[NDA_MASTER])
+			filter_master_idx = nla_get_u32(tb[NDA_MASTER]);
+
+		if (filter_master_idx)
+			flags |= NLM_F_DUMP_FILTERED;
+	}
 
 	rcu_read_lock_bh();
 	nht = rcu_dereference_bh(tbl->nht);
@@ -2255,12 +2283,14 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 		     n = rcu_dereference_bh(n->next)) {
 			if (!net_eq(dev_net(n->dev), net))
 				continue;
+			if (neigh_master_filtered(n->dev, filter_master_idx))
+				continue;
 			if (idx < s_idx)
 				goto next;
 			if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
 					    cb->nlh->nlmsg_seq,
 					    RTM_NEWNEIGH,
-					    NLM_F_MULTI) < 0) {
+					    flags) < 0) {
 				rc = -1;
 				goto out;
 			}
-- 
1.9.1
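The filter predicate and the reply flag in this patch can be modelled in a few lines of userspace C. This is only a sketch under assumptions: struct dev, master_filtered() and dump_flags() are hypothetical, the master link is a plain pointer instead of netdev_master_upper_dev_get(), and the EX_ prefixed constants mirror the values shown in the diff (NLM_F_MULTI is 2 in linux/netlink.h, NLM_F_DUMP_FILTERED is 32 per this patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_NLM_F_MULTI		2	/* mirrors NLM_F_MULTI */
#define EX_NLM_F_DUMP_FILTERED	32	/* mirrors the value added by the patch */

/* Hypothetical stand-in for struct net_device with its upper master. */
struct dev {
	int ifindex;
	const struct dev *master;
};

/* Returns true when the entry must be skipped from the dump. */
static bool master_filtered(const struct dev *dev, int master_idx)
{
	if (!master_idx)
		return false;	/* no filter requested: keep everything */

	if (!dev->master || dev->master->ifindex != master_idx)
		return true;

	return false;
}

/* Flags placed on each reply message of the dump. */
static unsigned int dump_flags(int master_idx)
{
	unsigned int flags = EX_NLM_F_MULTI;

	if (master_idx)
		flags |= EX_NLM_F_DUMP_FILTERED;
	return flags;
}
```

A userspace tool talking to an older kernel would see replies without the filtered bit and would know it has to filter the results itself.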
[MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists
Make it possible to free a freelist with several objects by adjusting
the API of slab_free() and __slab_free() to have head, tail and an
objects counter (cnt).

Tail being NULL indicates a single object free of the head object. This
allows compiler inline constant propagation in slab_free() and
slab_free_freelist_hook() to avoid adding any overhead in case of
single object free.

This allows a freelist with several objects (all within the same
slab-page) to be free'ed using a single locked cmpxchg_double in
__slab_free() and with an unlocked cmpxchg_double in slab_free().

Object debugging on the free path is also extended to handle these
freelists. When CONFIG_SLUB_DEBUG is enabled it will also detect if
objects don't belong to the same slab-page.

These changes are needed for the next patch to bulk free the detached
freelists it introduces and constructs.

Micro benchmarking showed no performance reduction due to this change,
when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Alexander Duyck

---
V4:
 - Change API per req of Christoph Lameter
 - Remove comments in init_object.

 mm/slub.c | 87 -
 1 file changed, 69 insertions(+), 18 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1cf98d89546d..7c2abc33fd4e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1063,11 +1063,15 @@ bad:
 	return 0;
 }
 
+/* Supports checking bulk free of a constructed freelist */
 static noinline struct kmem_cache_node *free_debug_processing(
-	struct kmem_cache *s, struct page *page, void *object,
+	struct kmem_cache *s, struct page *page,
+	void *head, void *tail, int bulk_cnt,
 	unsigned long addr, unsigned long *flags)
 {
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+	void *object = head;
+	int cnt = 0;
 
 	spin_lock_irqsave(&n->list_lock, *flags);
 	slab_lock(page);
@@ -1075,6 +1079,9 @@ static noinline struct kmem_cache_node *free_debug_processing(
 	if (!check_slab(s, page))
 		goto fail;
 
+next_object:
+	cnt++;
+
 	if (!check_valid_pointer(s, page, object)) {
 		slab_err(s, page, "Invalid object pointer 0x%p", object);
 		goto fail;
@@ -1105,8 +1112,19 @@ static noinline struct kmem_cache_node *free_debug_processing(
 	if (s->flags & SLAB_STORE_USER)
 		set_track(s, object, TRACK_FREE, addr);
 	trace(s, page, object, 0);
+	/* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
 	init_object(s, object, SLUB_RED_INACTIVE);
+
+	/* Reached end of constructed freelist yet? */
+	if (object != tail) {
+		object = get_freepointer(s, object);
+		goto next_object;
+	}
 out:
+	if (cnt != bulk_cnt)
+		slab_err(s, page, "Bulk freelist count(%d) invalid(%d)\n",
+			 bulk_cnt, cnt);
+
 	slab_unlock(page);
 	/*
 	 * Keep node_lock to preserve integrity
@@ -1210,7 +1228,8 @@ static inline int alloc_debug_processing(struct kmem_cache *s,
 	struct page *page, void *object, unsigned long addr) { return 0; }
 
 static inline struct kmem_cache_node *free_debug_processing(
-	struct kmem_cache *s, struct page *page, void *object,
+	struct kmem_cache *s, struct page *page,
+	void *head, void *tail, int bulk_cnt,
 	unsigned long addr, unsigned long *flags) { return NULL; }
 
 static inline int slab_pad_check(struct kmem_cache *s, struct page *page)
@@ -1306,6 +1325,31 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x)
 	kasan_slab_free(s, x);
 }
 
+/* Compiler cannot detect that slab_free_freelist_hook() can be
+ * removed if slab_free_hook() evaluates to nothing. Thus, we need to
+ * catch all relevant config debug options here.
+ */
+#if defined(CONFIG_KMEMCHECK) ||		\
+	defined(CONFIG_LOCKDEP)	||		\
+	defined(CONFIG_DEBUG_KMEMLEAK) ||	\
+	defined(CONFIG_DEBUG_OBJECTS_FREE) ||	\
+	defined(CONFIG_KASAN)
+static inline void slab_free_freelist_hook(struct kmem_cache *s,
+					   void *head, void *tail)
+{
+	void *object = head;
+	void *tail_obj = tail ? : head;
+
+	do {
+		slab_free_hook(s, object);
+	} while ((object != tail_obj) &&
+		 (object = get_freepointer(s, object)));
+}
+#else
+static inline void slab_free_freelist_hook(struct kmem_cache *s, void *obj_tail,
+					   void *freelist_head) {}
+#endif
+
 static void setup_object(struct kmem_cache *s, struct page *page,
 				void *object)
 {
@@ -2586,10 +2630,11 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
  * handling required then we can return immediately.
  */
 static void
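The core idea of freeing a whole pre-linked chain with one atomic operation can be sketched with C11 atomics. This is a deliberately simplified model and an assumption-laden one: SLUB uses cmpxchg_double to update the freelist together with a second word, whereas the sketch below updates a single list head with a plain compare-and-swap. struct node, freelist, free_chain() and freelist_len() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
};

/* Shared list head; static storage starts out as an empty list. */
static _Atomic(struct node *) freelist;

/*
 * Splice the already linked chain head..tail in front of the shared
 * list. However long the chain is, only one successful
 * compare-and-swap is needed.
 */
static void free_chain(struct node *head, struct node *tail)
{
	struct node *old = atomic_load(&freelist);

	do {
		tail->next = old;	/* hook the chain onto the current list */
	} while (!atomic_compare_exchange_weak(&freelist, &old, head));
}

/* Helper for checking the result; not safe against concurrent updates. */
static int freelist_len(void)
{
	struct node *p;
	int n = 0;

	for (p = atomic_load(&freelist); p; p = p->next)
		n++;
	return n;
}
```

Passing the same node as head and tail degenerates to an ordinary single-object push, so one code path covers both cases.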
[MM PATCH V4 2/6] slub: Avoid irqoff/on in bulk allocation
From: Christoph Lameter

NOTICE: Accepted by AKPM
 http://ozlabs.org/~akpm/mmots/broken-out/slub-avoid-irqoff-on-in-bulk-allocation.patch

Use the new function that can do allocation while
interrupts are disabled. Avoids irq on/off sequences.

Signed-off-by: Christoph Lameter
Signed-off-by: Jesper Dangaard Brouer
---
 mm/slub.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 02cfb3a5983e..024eed32da2c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2821,30 +2821,23 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 		void *object = c->freelist;
 
 		if (unlikely(!object)) {
-			local_irq_enable();
 			/*
 			 * Invoking slow path likely have side-effect
 			 * of re-populating per CPU c->freelist
 			 */
-			p[i] = __slab_alloc(s, flags, NUMA_NO_NODE,
+			p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
 					    _RET_IP_, c);
-			if (unlikely(!p[i])) {
-				__kmem_cache_free_bulk(s, i, p);
-				return false;
-			}
-			local_irq_disable();
+			if (unlikely(!p[i]))
+				goto error;
+
 			c = this_cpu_ptr(s->cpu_slab);
 			continue; /* goto for-loop */
 		}
 
 		/* kmem_cache debug support */
 		s = slab_pre_alloc_hook(s, flags);
-		if (unlikely(!s)) {
-			__kmem_cache_free_bulk(s, i, p);
-			c->tid = next_tid(c->tid);
-			local_irq_enable();
-			return false;
-		}
+		if (unlikely(!s))
+			goto error;
 
 		c->freelist = get_freepointer(s, object);
 		p[i] = object;
@@ -2864,6 +2857,11 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	}
 
 	return true;
+
+error:
+	__kmem_cache_free_bulk(s, i, p);
+	local_irq_enable();
+	return false;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);
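The patch folds two duplicated failure paths into one error label that frees the i objects obtained so far. The same all-or-nothing shape can be shown in userspace with malloc. This is a sketch under assumptions: alloc_bulk(), free_bulk(), the live_objects counter and the fail_at injection parameter are hypothetical and exist only to make the unwind observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static int live_objects;	/* objects currently allocated and not freed */

static void free_bulk(size_t size, void **p)
{
	size_t i;

	for (i = 0; i < size; i++) {
		free(p[i]);
		p[i] = NULL;
		live_objects--;
	}
}

/*
 * All-or-nothing bulk allocation. fail_at injects a failure at that
 * index; pass a value >= size for the normal case.
 */
static bool alloc_bulk(size_t size, void **p, size_t fail_at)
{
	size_t i;

	for (i = 0; i < size; i++) {
		p[i] = (i == fail_at) ? NULL : malloc(16);
		if (!p[i])
			goto error;
		live_objects++;
	}
	return true;

error:
	/* Single exit path: undo the i allocations that succeeded. */
	free_bulk(i, p);
	return false;
}
```

Because every failure jumps to the same label, any state that has to be restored on failure (interrupt state in the patch, the counter here) is handled in exactly one place.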
[MM PATCH V4 1/6] slub: create new ___slab_alloc function that can be called with irqs disabled
From: Christoph Lameter

NOTICE: Accepted by AKPM
 http://ozlabs.org/~akpm/mmots/broken-out/slub-create-new-___slab_alloc-function-that-can-be-called-with-irqs-disabled.patch

Bulk alloc needs a function like that because it enables interrupts before
calling __slab_alloc which promptly disables them again using the expensive
local_irq_save().

Signed-off-by: Christoph Lameter
Signed-off-by: Jesper Dangaard Brouer
---
 mm/slub.c | 44 +---
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f614b5dc396b..02cfb3a5983e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2298,23 +2298,15 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
  * And if we were unable to get a new slab from the partial slab lists then
  * we need to allocate a new slab. This is the slowest path since it involves
  * a call to the page allocator and the setup of a new slab.
+ *
+ * Version of __slab_alloc to use when we know that interrupts are
+ * already disabled (which is the case for bulk allocation).
  */
-static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
+static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
 	void *freelist;
 	struct page *page;
-	unsigned long flags;
-
-	local_irq_save(flags);
-#ifdef CONFIG_PREEMPT
-	/*
-	 * We may have been preempted and rescheduled on a different
-	 * cpu before disabling interrupts. Need to reload cpu area
-	 * pointer.
-	 */
-	c = this_cpu_ptr(s->cpu_slab);
-#endif
 
 	page = c->page;
 	if (!page)
@@ -2372,7 +2364,6 @@ load_freelist:
 	VM_BUG_ON(!c->page->frozen);
 	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
-	local_irq_restore(flags);
 	return freelist;
 
 new_slab:
@@ -2389,7 +2380,6 @@ new_slab:
 
 	if (unlikely(!freelist)) {
 		slab_out_of_memory(s, gfpflags, node);
-		local_irq_restore(flags);
 		return NULL;
 	}
 
@@ -2405,11 +2395,35 @@ new_slab:
 	deactivate_slab(s, page, get_freepointer(s, freelist));
 	c->page = NULL;
 	c->freelist = NULL;
-	local_irq_restore(flags);
 	return freelist;
 }
 
 /*
+ * Another one that disabled interrupt and compensates for possible
+ * cpu changes by refetching the per cpu area pointer.
+ */
+static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
+			  unsigned long addr, struct kmem_cache_cpu *c)
+{
+	void *p;
+	unsigned long flags;
+
+	local_irq_save(flags);
+#ifdef CONFIG_PREEMPT
+	/*
+	 * We may have been preempted and rescheduled on a different
+	 * cpu before disabling interrupts. Need to reload cpu area
+	 * pointer.
+	 */
+	c = this_cpu_ptr(s->cpu_slab);
+#endif
+
+	p = ___slab_alloc(s, gfpflags, node, addr, c);
+	local_irq_restore(flags);
+	return p;
+}
+
+/*
  * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
  * have the fastpath folded into their functions. So no function call
  * overhead for requests that can be satisfied on the fastpath.
Re: [PATCH bluetooth-next 1/4] netlink: add nla_get for le32 and le64
Hi Dave,

> This patch adds missing inline wrappers for nla_get_le32 and
> nla_get_le64. The 802.15.4 MAC byteorder is little endian and we keep
> the byteorder for fields like address configuration in the same
> byteorder as it comes from the MAC layer.
>
> To provide these fields for nl802154 userspace applications, we need
> these inline wrappers for netlink.
>
> Cc: David S. Miller
> Signed-off-by: Alexander Aring
> ---
> include/net/netlink.h | 18 ++
> 1 file changed, 18 insertions(+)
>
> diff --git a/include/net/netlink.h b/include/net/netlink.h
> index 2a5dbcc..0e31727 100644
> --- a/include/net/netlink.h
> +++ b/include/net/netlink.h
> @@ -1004,6 +1004,15 @@ static inline __be32 nla_get_be32(const struct nlattr *nla)
>  }
>
>  /**
> + * nla_get_le32 - return payload of __le32 attribute
> + * @nla: __le32 netlink attribute
> + */
> +static inline __le32 nla_get_le32(const struct nlattr *nla)
> +{
> +	return *(__le32 *) nla_data(nla);
> +}
> +
> +/**
>   * nla_get_u16 - return payload of u16 attribute
>   * @nla: u16 netlink attribute
>   */
> @@ -1066,6 +1075,15 @@ static inline __be64 nla_get_be64(const struct nlattr *nla)
>  }
>
>  /**
> + * nla_get_le64 - return payload of __le64 attribute
> + * @nla: __le64 netlink attribute
> + */
> +static inline __le64 nla_get_le64(const struct nlattr *nla)
> +{
> +	return *(__le64 *) nla_data(nla);
> +}
> +
> +/**
>   * nla_get_s32 - return payload of s32 attribute
>   * @nla: s32 netlink attribute
>   */

do you have any objections to me taking this change through the
bluetooth-next tree?

Regards

Marcel
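The quoted helpers return the payload still in little-endian form (typed __le32 / __le64), leaving the conversion to host order to the caller. For a userspace consumer, a host-order value can be obtained independently of the machine's endianness by assembling the bytes explicitly. The sketch below assumes nothing about netlink itself: get_le32() and get_le64() are hypothetical helpers operating on a raw payload buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 32-bit payload regardless of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Decode a little-endian 64-bit payload from its two 32-bit halves. */
static uint64_t get_le64(const unsigned char *p)
{
	return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}
```

Building the value byte by byte also avoids the alignment assumptions that a pointer cast into the buffer would make.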
Re: [PATCH] net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set
From: David Ahern
Date: Mon, 28 Sep 2015 10:12:13 -0700

> Wolfgang reported that IPv6 stack is ignoring oif in output route lookups:
...
> The stack does consider the oif but a mismatch in rt6_device_match is not
> considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags.
>
> Cc: Wolfgang Nothdurft
> Signed-off-by: David Ahern

Applied, thank you.
Re: RESEND: [PATCH v3 net-next] sky2: use random address if EEPROM is bad
From: Liviu Dudau
Date: Mon, 28 Sep 2015 17:51:51 +0100

> On some embedded systems the EEPROM does not contain a valid MAC address.
> In that case it is better to fallback to a generated mac address and
> let init scripts fix the value later.
>
> Reported-by: Liviu Dudau
> Signed-off-by: Stephen Hemminger
> [Changed handcoded setup to use eth_hw_addr_random() and to save new address
> into HW]
> Signed-off-by: Liviu Dudau

Applied, thanks.
[PATCH 0/3] use napi_schedule_irqoff()
This patch set is meant to replace the calls to napi_schedule with
napi_schedule_irqoff as this should help to reduce the interrupt overhead
slightly by removing the unneeded call to local_irq_save and
local_irq_restore.

---

Alexander Duyck (3):
      ixgbe/ixgbevf: use napi_schedule_irqoff()
      i40e/i40evf: use napi_schedule_irqoff()
      fm10k: use napi_schedule_irqoff()

 drivers/net/ethernet/intel/fm10k/fm10k_pci.c      | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c       | 6 --
 drivers/net/ethernet/intel/i40evf/i40evf_main.c   | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 4 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
 5 files changed, 9 insertions(+), 7 deletions(-)
[PATCH 2/3] i40e/i40evf: use napi_schedule_irqoff()
The i40e_intr and i40e/i40evf_msix_clean_rings functions run from hard
interrupt context or with interrupts already disabled in netpoll.

They can use napi_schedule_irqoff() instead of napi_schedule()

Signed-off-by: Alexander Duyck
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     |    6 ++++--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |    2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 484226e0365d..3cc97d4f5f70 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3281,7 +3281,7 @@ static irqreturn_t i40e_msix_clean_rings(int irq, void *data)
 	if (!q_vector->tx.ring && !q_vector->rx.ring)
 		return IRQ_HANDLED;
 
-	napi_schedule(&q_vector->napi);
+	napi_schedule_irqoff(&q_vector->napi);
 
 	return IRQ_HANDLED;
 }
@@ -3450,6 +3450,8 @@ static irqreturn_t i40e_intr(int irq, void *data)
 
 	/* only q0 is used in MSI/Legacy mode, and none are used in MSIX */
 	if (icr0 & I40E_PFINT_ICR0_QUEUE_0_MASK) {
+		struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
+		struct i40e_q_vector *q_vector = vsi->q_vectors[0];
 
 		/* temporarily disable queue cause for NAPI processing */
 		u32 qval = rd32(hw, I40E_QINT_RQCTL(0));
@@ -3462,7 +3464,7 @@ static irqreturn_t i40e_intr(int irq, void *data)
 		wr32(hw, I40E_QINT_TQCTL(0), qval);
 
 		if (!test_bit(__I40E_DOWN, &pf->state))
-			napi_schedule(&pf->vsi[pf->lan_vsi]->q_vectors[0]->napi);
+			napi_schedule_irqoff(&q_vector->napi);
 	}
 
 	if (icr0 & I40E_PFINT_ICR0_ADMINQ_MASK) {
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 5e1336321c2f..4b3db099f58c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -334,7 +334,7 @@ static irqreturn_t i40evf_msix_clean_rings(int irq, void *data)
 	if (!q_vector->tx.ring && !q_vector->rx.ring)
 		return IRQ_HANDLED;
 
-	napi_schedule(&q_vector->napi);
+	napi_schedule_irqoff(&q_vector->napi);
 
 	return IRQ_HANDLED;
 }
[PATCH 3/3] fm10k: use napi_schedule_irqoff()
The fm10k_msix_clean_rings function runs from hard interrupt context or
with interrupts already disabled in netpoll.

It can use napi_schedule_irqoff() instead of napi_schedule()

Signed-off-by: Alexander Duyck
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 74be792f3f1b..5fbffbaefe32 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -846,7 +846,7 @@ static irqreturn_t fm10k_msix_clean_rings(int __always_unused irq, void *data)
 	struct fm10k_q_vector *q_vector = data;
 
 	if (q_vector->rx.count || q_vector->tx.count)
-		napi_schedule(&q_vector->napi);
+		napi_schedule_irqoff(&q_vector->napi);
 
 	return IRQ_HANDLED;
 }
Re: [PATCH net] skbuff: Fix skb checksum partial check.
From: Pravin B Shelar
Date: Mon, 28 Sep 2015 17:24:25 -0700

> Earlier patch 6ae459bda tried to detect void checksum partial
> skb by comparing pull length to checksum offset. But it does
> not work for all cases since checksum-offset depends on
> updates to skb->data.
>
> Following patch fixes it by validating checksum start offset
> after skb-data pointer is updated. Negative value of checksum
> offset start means there is no need to checksum.
>
> Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
> Reported-by: Andrew Vagin
> Signed-off-by: Pravin B Shelar
> ---
> This and 6ae459bda patches needs to be backported to stable.

Applied and both queued up for -stable, thanks.
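The rule described in the quoted commit message (re-check the checksum start offset only after the data pointer has moved, and treat a negative offset as "nothing left to checksum") can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct pkt_model, csum_start_offset() and pull_and_check() are hypothetical names invented for illustration, not the kernel's sk_buff or its helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few packet fields involved (not sk_buff). */
struct pkt_model {
	int headroom;      /* bytes between buffer head and current data pointer */
	int csum_start;    /* where checksumming starts, measured from buffer head */
	bool csum_partial; /* true while a partial checksum is still pending */
};

/* Checksum start relative to the current data pointer. */
static int csum_start_offset(const struct pkt_model *p)
{
	return p->csum_start - p->headroom;
}

/* Pull len bytes off the front, then re-validate the partial-checksum state. */
static void pull_and_check(struct pkt_model *p, int len)
{
	assert(len >= 0);
	p->headroom += len; /* the data pointer moves first */
	if (p->csum_partial && csum_start_offset(p) < 0)
		p->csum_partial = false; /* checksum start was pulled away */
}
```

Because the offset is computed after the pull, the check no longer depends on comparing the pull length against an offset that is itself relative to the moving data pointer.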
Re: [PATCH net-next v2 00/11] net: L3 master device
On 9/29/15 5:23 PM, David Miller wrote:
> From: David Ahern
> Date: Mon, 28 Sep 2015 10:16:50 -0700
>
>> v2
>> - rebased to top of net-next
>> - addressed Nik's comments (checking master, removing extra lines, and
>>   flipping the order of patches 1 and 2)
>
> This still needs some work:
>
> ERROR: "l3mdev_master_ifindex_rcu" [net/ipv6/ipv6.ko] undefined!
> scripts/Makefile.modpost:90: recipe for target '__modpost' failed
> make[1]: *** [__modpost] Error 1
> Makefile:1095: recipe for target 'modules' failed
> make: *** [modules] Error 2

ugh. All of my builds have CONFIG_IPV6=y. Will kick out a v3 later.
Re: [PATCH 3/3] 802.1AD: Flow handling, actions, vlan parsing and netlink attributes
On Fri, Sep 25, 2015 at 3:35 PM, Thomas F Herbert wrote:
> Pravin,
>
> Another comment and question. Please see inline below.
>
> Thanks,
>
> --Tom
>
> On 9/24/15 7:42 PM, Pravin Shelar wrote:
>>
>> On Thu, Sep 24, 2015 at 10:58 AM, Thomas F Herbert
>> wrote:
>>>
>>> Add support for 802.1ad including the ability to push and pop double
>>> tagged vlans. Add support for 802.1ad to netlink parsing and flow
>>> conversion. Uses double nested encap attributes to represent double
>>> tagged vlan. Inner TPID encoded along with ctci in nested attributes.
>>>
>>> Signed-off-by: Thomas F Herbert
>>> ---
>>>   net/openvswitch/flow.c         |  83 +
>>>   net/openvswitch/flow.h         |   5 ++
>>>   net/openvswitch/flow_netlink.c | 166
>>> ++---
>>>   3 files changed, 230 insertions(+), 24 deletions(-)
>>>
...
>>> @@ -1320,6 +1437,7 @@ static int __ovs_nla_put_key(const struct
>>> sw_flow_key *swkey,
>>>   {
>>>          struct ovs_key_ethernet *eth_key;
>>>          struct nlattr *nla, *encap;
>>> +       struct nlattr *in_encap = NULL;
>>>
>>>          if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
>>>                  goto nla_put_failure;
>>> @@ -1368,17 +1486,42 @@ static int __ovs_nla_put_key(const struct
>>> sw_flow_key *swkey,
>>>          ether_addr_copy(eth_key->eth_src, output->eth.src);
>>>          ether_addr_copy(eth_key->eth_dst, output->eth.dst);
>>>
>>> -       if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
>>> +       if (swkey->eth.tci || eth_type_vlan(swkey->eth.type)) {
>>>                  __be16 eth_type;
>>> -               eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
>>> +
>>> +               if (swkey->eth.cvlan.ctci ||
>>> +                   eth_type_vlan(swkey->eth.cvlan.c_tpid))
>>> +                       eth_type = !is_mask ? htons(ETH_P_8021AD) :
>>> +                                       htons(0xffff);
>>> +               else
>>> +                       eth_type = !is_mask ? htons(ETH_P_8021Q) :
>>> +                                       htons(0xffff);
>>> +
>>
>> Here we can directly dump output->eth.type to netlink. No need to
>> check for inner encap.
>
> The eth.type is set to the inner encapsulated protocol, not to the tpid. We
> don't "know" what the outer tpid is, so I assume it is 802.1Q. To address this
> situation, do you think I should add the outer tpid to sw_flow_key?
> Also see comment above in flow.h.
>

With the addition of nested vlan, we need to add outer tpid. This will
simplify vlan netlink serialization too.
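The quoted hunk picks the outer TPID from whether an inner (customer) tag is present. A small userspace sketch of that selection follows. The names TPID_8021Q, TPID_8021AD, is_vlan_tpid() and outer_tpid() are hypothetical and only mirror the logic under discussion; the all-ones mask value is assumed to be 0xffff. The two TPID values themselves (0x8100 and 0x88A8) are the standard 802.1Q and 802.1ad EtherTypes.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define TPID_8021Q  0x8100 /* customer VLAN tag (802.1Q) */
#define TPID_8021AD 0x88A8 /* service VLAN tag (802.1ad) */

/* True if a network-order EtherType is one of the two VLAN TPIDs. */
static bool is_vlan_tpid(uint16_t ethertype_be)
{
	return ethertype_be == htons(TPID_8021Q) ||
	       ethertype_be == htons(TPID_8021AD);
}

/*
 * Outer TPID to report, in network byte order.
 * For a mask the field is all ones (assumed 0xffff); for a key it is
 * 802.1ad when an inner tag exists and 802.1Q otherwise.
 */
static uint16_t outer_tpid(bool has_inner_vlan, bool is_mask)
{
	if (is_mask)
		return htons(0xffff);
	return has_inner_vlan ? htons(TPID_8021AD) : htons(TPID_8021Q);
}
```

Storing the outer TPID in the flow key, as suggested in the reply, would make this guess unnecessary, since the value could be dumped directly.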
Poor IPv6 TCP performance in 4.3-rc3
Hi,

I'm seeing really poor IPv6 performance compared to IPv4. I've checked
using two different ARM platforms - an iMX6 platform using the FEC driver,
and an Armada 38x using mvneta.

The following was captured using iperf between the target system and my
laptop. The problem only occurs one-way. The 4.3-rc3 platform is running
iperf in server mode, the laptop is in client mode.

Armada 38x:
ipv6: [  4]  0.0-23.9 sec   170 KBytes  58.3 Kbits/sec
ipv4: [  4]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec

iMX6Q:
ipv6: [  4]  0.0-11.1 sec   640 KBytes   474 Kbits/sec
ipv4: [  4]  0.0-10.0 sec   655 MBytes   549 Mbits/sec

iMX6D with 4.2:
ipv6: [  4]  0.0-10.0 sec   685 MBytes   574 Mbits/sec
ipv4: [  4]  0.0-10.0 sec   696 MBytes   583 Mbits/sec

It looks like there's an IPv6 regression between 4.2 and 4.3-rc3.

Turning GRO off on Armada 38x gives:
ipv6: [  4]  0.0-10.0 sec  1.08 GBytes   923 Mbits/sec
ipv4: [  5]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec

I haven't started to debug yet, but I thought I'd post a heads-up in case
it's a known problem. I'll try to get some packet logs on Thursday, and
I'll try to bisect.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
Re: [PATCH net-next] bridge: vlan: add per-vlan struct and move to rhashtables
From: Nikolay Aleksandrov
Date: Fri, 25 Sep 2015 19:00:11 +0200

> This patch changes the bridge vlan implementation to use rhashtables
> instead of bitmaps.

This seems to be taking the code in a good direction, and I'm kinda
happy to see more rhashtable users in the tree as well.

So, applied to net-next, thanks.
Re: [PATCH v2 1/1] net sysfs: Print link speed as signed integer
From: Alexander Stein
Date: Mon, 28 Sep 2015 15:05:33 +0200

> Otherwise 4294967295 (MBit/s) (-1) will be printed when there is no link.
> Documentation/ABI/testing/sysfs-class-net does not state if this shall be
> signed or unsigned.
> Also remove the now unused variable fmt_udec.
>
> Signed-off-by: Alexander Stein

Applied, thanks.
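The two values in the quoted commit message are the same 32-bit pattern printed with different format specifiers. A minimal userspace sketch, assuming a 32-bit int; format_speed() is a hypothetical helper written for illustration and is not the sysfs show function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a link speed either as unsigned or as signed decimal.
 * Returns the number of characters written, as snprintf() does.
 */
static int format_speed(char *buf, size_t len, uint32_t speed, int as_signed)
{
	assert(buf != NULL && len > 0);
	if (as_signed)
		return snprintf(buf, len, "%d\n", (int)speed);
	return snprintf(buf, len, "%u\n", (unsigned int)speed);
}
```

Passing 0xFFFFFFFF (the "no link" value, -1 when read as signed) yields "4294967295" in the unsigned form and "-1" in the signed form, which is the difference the patch addresses.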
[PATCH net-next 0/6] ila: Optimization to preserve value of early demux
In the current implementation of ILA, LWT is used to perform translation
on both the input and output paths. This is functional, however there is
a big performance hit in the receive path. Early demux occurs before the
routing lookup (a hit actually obviates the route lookup). Therefore the
stack currently performs early demux before translation so that a local
connection with ILA addresses is never matched. Note that this issue is
not just with ILA, but pretty much any translated or encapsulated packet
handled by LWT would miss the opportunity for early demux. Solving the
general problem seems non-trivial since we would need to move the route
lookup before early demux, which would negate its value.

This patch set addresses the issue for ILA by adding a fast locator
lookup that occurs before early demux. This is done by creating an XFRM
hook to perform address translation early in the receive path. For the
backend we implement an rhashtable that contains identifier to locator
mappings. The table also allows more specific matches that include
original locator and interface.

This patch set:
 - Add an rhashtable function to atomically replace an element.
   This is useful to implement sub-trees from a table entry
   without needing to use a special anchor structure as the
   table entry.
 - Add a start callback for starting a netlink dump.
 - Creates an ila directory under net/ipv6 and moves ila.c to it.
   ila.c is split into ila_common.c and ila_lwt.c.
 - Implement a table to do identifier->locator mapping. This is
   an rhashtable.
 - Configuration for the table with netlink.
 - Add XFRM xlat_addr facility. This includes a callback registration
   function and hook to call registered callbacks.
 - Call xfrm6_xlat_addr from ipv6_rcv before NF_HOOK and routing.
Testing:

Running 200 netperf TCP_RR streams

No ILA, baseline
   85.72% CPU utilization
   1861945 tps
   93/163/330 50/90/99% latencies

ILA before fix (LWT on both input and output)
   83.47% CPU utilization
   16583186 tps (-11% from baseline)
   107/183/338 50/90/99% latencies

ILA after fix (hook for input)
   84.97% CPU utilization
   1833948 tps (-1.5% from baseline)
   95/164/331 50/90/99% latencies

Hacked DNPT to do ILA
   80.94% CPU utilization
   1683315 tps (-10% from baseline)
   104/179/350 50/90/99% latencies

Tom Herbert (6):
  ila: Create net/ipv6/ila directory
  rhashtable: add function to replace an element
  netlink: add a start callback for starting a netlink dump
  xfrm: Add xfrm6 address translation function
  ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  ila: Add support for xfrm6_xlat_addr

 include/linux/netlink.h    |   2 +
 include/linux/rhashtable.h |  82 ++
 include/net/genetlink.h    |   2 +
 include/net/xfrm.h         |  25 ++
 include/uapi/linux/ila.h   |  22 ++
 net/ipv6/Kconfig           |   5 +
 net/ipv6/Makefile          |   3 +-
 net/ipv6/ila.c             | 229
 net/ipv6/ila/Makefile      |   7 +
 net/ipv6/ila/ila.h         |  48
 net/ipv6/ila/ila_common.c  | 103
 net/ipv6/ila/ila_lwt.c     | 152 +++
 net/ipv6/ila/ila_xlat.c    | 642 +
 net/ipv6/ip6_input.c       |   3 +
 net/ipv6/xfrm6_policy.c    |   7 +
 net/ipv6/xfrm6_xlat_addr.c |  66 +
 net/netlink/af_netlink.c   |   4 +
 net/netlink/genetlink.c    |  16 ++
 18 files changed, 1188 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_xlat.c
 create mode 100644 net/ipv6/xfrm6_xlat_addr.c

-- 
2.4.6
[PATCH net-next 2/6] rhashtable: add function to replace an element
Add the rhashtable_replace_fast function. This replaces one object in
the table with another atomically. The hashes of the new and old objects
must be equal.

Signed-off-by: Tom Herbert
---
 include/linux/rhashtable.h | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..77deece 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -819,4 +819,86 @@ out:
 	return err;
 }
 
+/* Internal function, please use rhashtable_replace_fast() instead */
+static inline int __rhashtable_replace_fast(
+	struct rhashtable *ht, struct bucket_table *tbl,
+	struct rhash_head *obj_old, struct rhash_head *obj_new,
+	const struct rhashtable_params params)
+{
+	struct rhash_head __rcu **pprev;
+	struct rhash_head *he;
+	spinlock_t *lock;
+	unsigned int hash;
+	int err = -ENOENT;
+
+	/* Minimally, the old and new objects must have same hash
+	 * (which should mean identifiers are the same).
+	 */
+	hash = rht_head_hashfn(ht, tbl, obj_old, params);
+	if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
+		return -EINVAL;
+
+	lock = rht_bucket_lock(tbl, hash);
+
+	spin_lock_bh(lock);
+
+	pprev = &tbl->buckets[hash];
+	rht_for_each(he, tbl, hash) {
+		if (he != obj_old) {
+			pprev = &he->next;
+			continue;
+		}
+
+		rcu_assign_pointer(obj_new->next, obj_old->next);
+		rcu_assign_pointer(*pprev, obj_new);
+		err = 0;
+		break;
+	}
+
+	spin_unlock_bh(lock);
+
+	return err;
+}
+
+/**
+ * rhashtable_replace_fast - replace an object in hash table
+ * @ht:		hash table
+ * @obj_old:	pointer to hash head inside object being replaced
+ * @obj_new:	pointer to hash head inside object which is new
+ * @params:	hash table parameters
+ *
+ * Replacing an object doesn't affect the number of elements in the hash table
+ * or bucket, so we don't need to worry about shrinking or expanding the
+ * table here.
+ *
+ * Returns zero on success, -ENOENT if the entry could not be found,
+ * -EINVAL if hash is not the same for the old and new objects.
+ */
+static inline int rhashtable_replace_fast(
+	struct rhashtable *ht, struct rhash_head *obj_old,
+	struct rhash_head *obj_new,
+	const struct rhashtable_params params)
+{
+	struct bucket_table *tbl;
+	int err;
+
+	rcu_read_lock();
+
+	tbl = rht_dereference_rcu(ht->tbl, ht);
+
+	/* Because we have already taken (and released) the bucket
+	 * lock in old_tbl, if we find that future_tbl is not yet
+	 * visible then that guarantees the entry to still be in
+	 * the old tbl if it exists.
+	 */
+	while ((err = __rhashtable_replace_fast(ht, tbl, obj_old,
+						obj_new, params)) &&
+	       (tbl = rht_dereference_rcu(tbl->future_tbl, ht)))
+		;
+
+	rcu_read_unlock();
+
+	return err;
+}
+
 #endif /* _LINUX_RHASHTABLE_H */
-- 
2.4.6
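The core of the patch is the pprev walk: track the pointer that leads to the current entry, then swing it over to the new object once the new object has adopted the old one's successor. A userspace sketch of just that chain manipulation follows. It assumes a plain singly linked bucket chain with no locking and no RCU; struct node and chain_replace() are hypothetical names, not part of the rhashtable API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an object embedding a hash head. */
struct node {
	struct node *next;
	int key;
	int value;
};

/*
 * Replace old_n with new_n in the chain starting at *bucket.
 * Returns 0 on success, -1 if old_n is not on the chain.
 * The kernel version does the same walk under the bucket lock and
 * publishes both pointers with rcu_assign_pointer().
 */
static int chain_replace(struct node **bucket, struct node *old_n,
			 struct node *new_n)
{
	struct node **pprev = bucket;
	struct node *he;

	assert(bucket && old_n && new_n);
	for (he = *bucket; he; he = he->next) {
		if (he != old_n) {
			pprev = &he->next;
			continue;
		}
		new_n->next = old_n->next; /* new node adopts the tail first */
		*pprev = new_n;            /* then the predecessor points at it */
		return 0;
	}
	return -1;
}
```

Setting new_n->next before updating *pprev is what lets a concurrent reader in the real code see either the old or the new object, but never a chain with the tail missing.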