Re: [PATCH v7 net-next 01/16] gso: Remove arbitrary checks for unsupported GSO
On Fri, May 20, 2016 at 6:02 PM, Hannes Frederic Sowa wrote: > Hello, > > On 18.05.2016 18:06, Tom Herbert wrote: >> In several gso_segment functions there are checks of gso_type against >> a seemingly arbitrary list of SKB_GSO_* flags. This seems like an >> attempt to identify unsupported GSO types, but since the stack is >> the one that set these GSO types in the first place this seems >> unnecessary to do. If a combination isn't valid in the first >> place that stack should not allow setting it. >> >> This is a code simplication especially for add new GSO types. > > I couldn't still wrap my head around this. > > I wonder if this is safe in case of if the packet is generated from an > untrusted virtual machine over virtio_net? The problem is the test is kind of pointless. I think the original point of the tests was to prevent the case of IPIP and TSO6 or SIT and TSO4. However since we dropped IPIP and SIT and instead replaced them with the IPXIP6 and IPXIP4 there shouldn't be any combinations that don't work now other than UDP being combined with TCP segmentation. If we are wanting to prevent that case we might place something earlier in the segmentation code that would perform an and with all the L4 GSO types that should be mutually exclusive, verify that value is non-zero, and then and that value with itself - 1 to verify that exactly 1 bit is set in that grouping and no more than that. - Alex
Re: [PATCH 2/2] ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path.
> On May 21, 2016, at 1:48 AM, David Miller wrote: > > From: Haishuang Yan > Date: Wed, 18 May 2016 18:05:52 +0800 > >> In gre6 xmit path, we are sending a GRE packet, so set fl6 proto >> to IPPROTO_GRE properly. >> >> Signed-off-by: Haishuang Yan > > I think it would be a lot better to initialize the flow protocol field > properly in ip6gre_tnl_link_config() instead of fixing it up every > single transmit. > > I agree, I will modify the changes in v2. Thanks Haishuang
Re: IPv6 extension header privileges
On (05/21/16 02:20), Hannes Frederic Sowa wrote: > > There are some options inherently protocol depending like the jumbo > payload option, which should be under control of the kernel, or the > router alert option for igmp, which causes packets to be steered towards > the slow/software path of routers, which can be used for DoS attacks. > > Setting CALIPSO options in IPv6 on packets as users would defeat the > whole CALIPSO model, etc. > > The RFC3542 requires at least some of the options in dst/hop-by-hop "requires" is a strong word. 3542 declares it as a "may" (lower case). The only thing required strongly is IPV6_NEXTHOP itself. I suspect 3542 was written at a time when hbh and dst opt were loosely defined and the "may" is just a place-holder (i.e., it's not even a MAY) > > AFAIK people worried about the parsing overhead and thus decided to > block it for ordinary users. That's probably more likely, esp for hbh options. It may also be interesting to find out what BSD does in these cases. --Sowmini
Re: [PATCH v7 net-next 01/16] gso: Remove arbitrary checks for unsupported GSO
Hello, On 18.05.2016 18:06, Tom Herbert wrote: > In several gso_segment functions there are checks of gso_type against > a seemingly arbitrary list of SKB_GSO_* flags. This seems like an > attempt to identify unsupported GSO types, but since the stack is > the one that set these GSO types in the first place this seems > unnecessary to do. If a combination isn't valid in the first > place that stack should not allow setting it. > > This is a code simplication especially for add new GSO types. I couldn't still wrap my head around this. I wonder if this is safe in case of if the packet is generated from an untrusted virtual machine over virtio_net? Bye, Hannes
Re: off-by-one in DecodeQ931
I'm a bit new to this; is this patch OK? On Tue, May 3, 2016 at 1:12 AM, Toby DiPasquale wrote: > On Mon, Apr 25, 2016 at 11:29 AM, Florian Westphal wrote: >> -> sz (size_t) will underflow here >> >> I'd suggest to change the if (sz < 1) to if (sz < 2) to >> resolve this, the while loop below has to be taken anyway. > > Thanks, Florian! Updated patch below: > > Signed-off-by: Toby DiPasquale > > diff --git a/net/netfilter/nf_conntrack_h323_asn1.c > b/net/netfilter/nf_conntrack_h323_asn1.c > index bcd5ed6..89b2e46 100644 > --- a/net/netfilter/nf_conntrack_h323_asn1.c > +++ b/net/netfilter/nf_conntrack_h323_asn1.c > @@ -846,9 +846,10 @@ int DecodeQ931(unsigned char *buf, size_t sz, Q931 *q931) > sz -= len; > > /* Message Type */ > - if (sz < 1) > + if (sz < 2) > return H323_ERROR_BOUND; > q931->MessageType = *p++; > + sz--; > PRINT("MessageType = %02X\n", q931->MessageType); > if (*p & 0x80) { > p++; -- Toby DiPasquale
Re: IPv6 extension header privileges
On 21.05.2016 00:37, Tom Herbert wrote: > Hi, > > In ipv6_sockglue.c I noticed: > > /* hop-by-hop / destination options are privileged option */ > retv = -EPERM; > if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW)) >break; > > Can anyone provide that rationale as to why these are privileged ops? There are some options inherently protocol depending like the jumbo payload option, which should be under control of the kernel, or the router alert option for igmp, which causes packets to be steered towards the slow/software path of routers, which can be used for DoS attacks. Setting CALIPSO options in IPv6 on packets as users would defeat the whole CALIPSO model, etc. The RFC3542 requires at least some of the options in dst/hop-by-hop extensions to be only be settable by privileged users. More fine-grained parsing and setting of those options has never been implemented. AFAIK people worried about the parsing overhead and thus decided to block it for ordinary users. Bye, Hannes
[GIT] Networking
1) Tunneling fixes from Tom Herbert and Alexander Duyck. 2) AF_UNIX updates some struct sock bit fields with the socket lock, whereas setsockopt() sets overlapping ones with locking. Seperate out the synchronized vs. the AF_UNIX unsynchronized ones to avoid corruption. From Andrey Ryabinin. 3) Mount BPF filesystem with mount_nodev rather than mount_ns, from Eric bieferman. 4) A couple kmemdup conversions, from Muhammad Falak R Wani. 5) BPF verifier fixes from Alexei Starovoitov. 6) Don't let tunneled UDP packets get stuck in socket queues, if something goes wrong during the encapsulation just drop the packet rather than signalling an error up the call stack. From Hannes Frederic Sowa. 7) SKB ref after free in batman-adv, from Florian Westphal. 8) TCP iSCSI, ocfs2, rds, and tipc have to disable BH in it's TCP callbacks since the TCP stack runs pre-emptibly now. From Eric Dumazet. 9) Fix crash in fixed_phy_add, from Rabin Vincent. 10) Fix length checks in xen-netback, from Paul Durrant. 11) Fix mixup in KEY vs. KEYID macsec attributes, from Sabrina Dubroca. 12) RDS connection spamming bug fixes from Sowmini Varadhan. Please pull, thanks a lot! The following changes since commit 07b75260ebc2c789724c594d7eaf0194fa47b3be: Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus (2016-05-19 10:02:26 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git for you to fetch changes up to 95829b3a9c0b1d88778b23bc2afdf5a83de066ff: net: suppress warnings on dev_alloc_skb (2016-05-20 19:58:32 -0400) Alan Liu (1): ath10k: add max_tx_power for QCA6174 WLAN.RM.2.0 firmware Alexander Duyck (2): ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with FOU/GUE intel: Add support for IPv6 IP-in-IP offload Alexei Starovoitov (2): bpf: support decreasing order in direct packet access bpf: teach verifier to recognize imm += ptr pattern Amit Ghadge (1): net: Fix coding style warnings and errors. Andrey Ryabinin (1): net: sock: move ->sk_shutdown out of bitfields. Anilkumar Kolli (1): ath10k: fix kernel panic, move arvifs list head init before htt init Antonio Quartulli (1): batman-adv: make sure ELP/OGM orig MAC is updated on address change Arnd Bergmann (1): mlx5: avoid unused variable warning Bjorn Andersson (1): wcn36xx: Set SMD timeout to 10 seconds Christian Daudt (1): brcmfmac: Add 4356 sdio support Dan Carpenter (3): rtlwifi: rtl818x: silence uninitialized variable warning airo: prevent potential underflow in airo_set_freq() atmel: potential underflow in atmel_set_freq() Daniel Borkmann (1): bpf: rather use get_random_int for randomizations David S. Miller (8): Merge branch 'tcp_bh_fixes' Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Revert "net: pegasus: remove dead coding" Merge branch 'GREoIPV6' Merge branch 'rds-conn-spamming' Merge branch 'GREoIPV6-followups' Merge tag 'wireless-drivers-next-for-davem-2016-05-13' of git://git.kernel.org/.../kvalo/wireless-drivers-next Merge branch 'bpf-verifier-fixes' Denys Vlasenko (1): rtlwifi: rtl818x: Deinline indexed IO functions, save 21568 bytes Emmanuel Grumbach (4): iwlwifi: mvm: allow a debug knob for Tx A-MSDU even if rate control forbids it iwlwifi: remove IWLWIFI_DEBUG_EXPERIMENTAL_UCODE iwlwifi: don't access a nonexistent register upon assert iwlwifi: add default value to disable_11ac mod param description Eric Dumazet (4): scsi_tcp: block BH in TCP callbacks ocfs2/cluster: block BH in TCP callbacks rds: tcp: block BH in TCP callbacks tipc: block BH in TCP callbacks Eric W. Biederman (1): bpf: Use mount_nodev not mount_ns to mount the bpf filesystem Fabio Estevam (1): Revert "phy: add support for a reset-gpio specification" Florian Westphal (1): batman-adv: fix skb deref after free Golan Ben-Ami (1): iwlwifi: mvm: add more registers to dump upon error Gregory Greenman (2): iwlwifi: consider VHT 160MHz while parsing NVM iwlwifi: turn on SGI support for VHT 160MHz Guy Mishol (1): wlcore/wl12xx: Fix fw logger over sdio Haim Dreyfuss (4): iwlwifi: Rename 9560 to 9260 and add new PCI IDs for it iwlwifi: allow combining different phy images with mac images iwlwifi: Fix firmware name maximum length definition iwlwifi: pcie: don't wake up the NIC when writing CSRs in MSIX mode Hannes Frederic Sowa (1): udp: prevent skbs lingering in tunnel socket queues Helmut Schaa (4): ath9k: reuse ar9003_hw_tx_power_regwrite for tx99 setup ath9k: Move TX99 config option under ath9k debugging ath9k: Simplify ar9003_hw_tx99_set_txpower ath9
Re: [PATCHv2] net: suppress warnings on dev_alloc_skb
From: Neil Horman Date: Thu, 19 May 2016 11:30:54 -0400 > Noticed an allocation failure in a network driver the other day on a 32 bit > system: ... > Thought that perhaps the big splat in the logs wasn't really necessecary, as > all call sites for dev_alloc_skb: > > a) check the return code for the function > > and > > b) either print their own error message or have a recovery path that makes the > warning moot. > > Fix it by modifying dev_alloc_pages to pass __GFP_NOWARN as a gfp flag to > suppress the warning > > applies to the net tree > > Signed-off-by: Neil Horman Applied, thanks Neil.
Re: [PATCH net] udp: prevent skbs lingering in tunnel socket queues
From: Hannes Frederic Sowa Date: Thu, 19 May 2016 15:58:33 +0200 > In case we find a socket with encapsulation enabled we should call > the encap_recv function even if just a udp header without payload is > available. The callbacks are responsible for correctly verifying and > dropping the packets. > > Also, in case the header validation fails for geneve and vxlan we > shouldn't put the skb back into the socket queue, no one will pick > them up there. Instead we can simply discard them in the respective > encap_recv functions. > > Signed-off-by: Hannes Frederic Sowa Looks good, applied and queued up for -stable. Thanks!
Re: [PATCH net-next 0/2] bpf: verifier fixes
From: Alexei Starovoitov Date: Thu, 19 May 2016 18:17:12 -0700 > Further testing of 'direct packet access' uncovered > several usability issues. Fix them. Series applied, thanks Alexei.
Re: [PATCH] net:liquidio: use kmemdup
From: Muhammad Falak R Wani Date: Thu, 19 May 2016 19:22:49 +0530 > Use kmemdup when some other buffer is immediately copied into allocated > region. It replaces call to allocation followed by memcpy, by a single > call to kmemdup. > > Signed-off-by: Muhammad Falak R Wani Applied.
Re: [PATCH] net: usb: ch9200: use kmemdup
From: Muhammad Falak R Wani Date: Thu, 19 May 2016 19:26:50 +0530 > Use kmemdup when some other buffer is immediately copied into allocated > region. It replaces call to allocation followed by memcpy, by a single > call to kmemdup. > > Signed-off-by: Muhammad Falak R Wani Applied.
Re: [PATCH] ps3_gelic: use kmemdup
From: Muhammad Falak R Wani Date: Thu, 19 May 2016 19:24:41 +0530 > Use kmemdup when some other buffer is immediately copied into allocated > region. It replaces call to allocation followed by memcpy, by a single > call to kmemdup. > > Signed-off-by: Muhammad Falak R Wani Applied.
Re: [PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
From: ebied...@xmission.com (Eric W. Biederman) Date: Fri, 20 May 2016 17:22:48 -0500 > > While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the > bpf filesystem. Looking at the code I saw a broken usage of mount_ns > with current->nsproxy->mnt_ns. As the code does not acquire a > reference to the mount namespace it can not possibly be correct to > store the mount namespace on the superblock as it does. > > Replace mount_ns with mount_nodev so that each mount of the bpf > filesystem returns a distinct instance, and the code is not buggy. > > In discussion with Hannes Frederic Sowa it was reported that the use > of mount_ns was an attempt to have one bpf instance per mount > namespace, in an attempt to keep resources that pin resources from > hiding. That intent simply does not work, the vfs is not built to > allow that kind of behavior. Which means that the bpf filesystem > really is buggy both semantically and in it's implemenation as it does > not nor can it implement the original intent. > > This change is userspace visible, but my experience with similar > filesystems leads me to believe nothing will break with a model of each > mount of the bpf filesystem is distinct from all others. > > Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") > Cc: Hannes Frederic Sowa > Acked-by: Daniel Borkmann > Signed-off-by: "Eric W. Biederman" Applied and queued up for -stable, thanks everyone.
Re: pull-request: wireless-drivers-next 2016-05-13 (take two)
From: Kalle Valo Date: Thu, 19 May 2016 15:45:08 +0300 > this the second version of the last pull request to net-next for 4.7, > which got postponed due to the recent iwlwifi merge conflict. Now that > Linus fixed the merge problem in his tree I actually didn't have to fix > anything in my tree anymore. So that's why I still use the same tag as > in my previous pull request. > > The only dependency is that you need Linus' iwlwifi fix in your tree > before you pull this: > > 0e034f5c4bc4 iwlwifi: fix mis-merge that breaks the driver > > So if you could pull latest Linus' tree to net-next that would solve > that. I just did a test merge of that on your net-next tree and didn't > see any conflicts, so this pull should go smoothly. Also these patches > have been in linux-next at least a week now. But please let me know if > you see any issues. Done, pulled, thanks a lot! > And I think a lesson learned from this is that I need to immediately > merge wireless-drivers to wireless-drivers-next if there are any > conflicts between the trees. Or how do you suggest to handle cases like > that? You shouldn't have to worry about this kind of stuff at all, and let's just hope I handle the conflict better next time. :-)
Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
From: Eric Dumazet Date: Thu, 19 May 2016 05:35:20 -0700 > From: Eric Dumazet > > Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host > agent [1] are problematic at scale : > > For each qdisc/class found in the dump, we currently lock the root qdisc > spinlock in order to get stats. Sampling stats every 5 seconds from > thousands of HTB classes is a challenge when the root qdisc spinlock is > under high pressure. > > These stats are using u64 or u32 fields, so reading integral values > should not prevent writers from doing concurrent updates if the kernel > arch is a 64bit one. > > Being able to atomically fetch all counters like packets and bytes sent > at the expense of interfering in fast path (queue and dequeue packets) > is simply not worth the pain, as the values are generally stale after 1 > usec. > > These lock acquisitions slow down the fast path by 10 to 20 % > > An audit of existing qdiscs showed that sch_fq_codel is the only qdisc > that might need the qdisc lock in fq_codel_dump_stats() and > fq_codel_dump_class_stats() > > gnet_dump_force_lock() call is added there and could be added to other > qdisc stat handlers if needed. > > [1] > http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf > > Signed-off-by: Eric Dumazet I guess the off-by-one situations are not a big enough deal to code new locking or memory barrier for, so I'm fine with this. Please resubmit when I open net-next back up, thanks.
Re: [PATCH net V2] tuntap: correctly wake up process during uninit
From: Jason Wang Date: Thu, 19 May 2016 13:36:51 +0800 > We used to check dev->reg_state against NETREG_REGISTERED after each > time we are woke up. But after commit 9e641bdcfa4e ("net-tun: > restructure tun_do_read for better sleep/wakeup efficiency"), it uses > skb_recv_datagram() which does not check dev->reg_state. This will > result if we delete a tun/tap device after a process is blocked in the > reading. The device will wait for the reference count which was held > by that process for ever. > > Fixes this by using RCV_SHUTDOWN which will be checked during > sk_recv_datagram() before trying to wake up the process during uninit. > > Fixes: 9e641bdcfa4e ("net-tun: restructure tun_do_read for better > sleep/wakeup efficiency") > Cc: Eric Dumazet > Cc: Xi Wang > Cc: Michael S. Tsirkin > Signed-off-by: Jason Wang > --- > - The patch is needed for -stable. > - Changes from v1: remove unnecessary NETREG_REGISTERED check in tun_do_read() Applied and queued up for -stable.
Re: [PATCH net,stable v2] net: cdc_ncm: update datagram size after changing mtu
From: Robert Dobrowolski Date: Thu, 19 May 2016 11:56:09 +0200 > From: Rafal Redzimski > > Current implementation updates the mtu size and notify cdc_ncm > device using USB_CDC_SET_MAX_DATAGRAM_SIZE request about datagram > size change instead of changing rx_urb_size. > > Whenever mtu is being changed, datagram size should also be > updated. Also updating maxmtu formula so it takes max_datagram_size with > use of cdc_ncm_max_dgram_size() and not ctx. > > Signed-off-by: Robert Dobrowolski > Signed-off-by: Rafal Redzimski Appied, thanks.
Re: [PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
On 21.05.2016 00:22, Eric W. Biederman wrote: > > While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the > bpf filesystem. Looking at the code I saw a broken usage of mount_ns > with current->nsproxy->mnt_ns. As the code does not acquire a > reference to the mount namespace it can not possibly be correct to > store the mount namespace on the superblock as it does. > > Replace mount_ns with mount_nodev so that each mount of the bpf > filesystem returns a distinct instance, and the code is not buggy. > > In discussion with Hannes Frederic Sowa it was reported that the use > of mount_ns was an attempt to have one bpf instance per mount > namespace, in an attempt to keep resources that pin resources from > hiding. That intent simply does not work, the vfs is not built to > allow that kind of behavior. Which means that the bpf filesystem > really is buggy both semantically and in it's implemenation as it does > not nor can it implement the original intent. > > This change is userspace visible, but my experience with similar > filesystems leads me to believe nothing will break with a model of each > mount of the bpf filesystem is distinct from all others. > > Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") > Cc: Hannes Frederic Sowa > Acked-by: Daniel Borkmann > Signed-off-by: "Eric W. Biederman" Acked-by: Hannes Frederic Sowa Thanks Eric!
Re: [net-next PATCH 0/2] Follow-ups for GUEoIPv6 patches
From: Alexander Duyck Date: Wed, 18 May 2016 10:44:40 -0700 > This patch series is meant to be applied after: > [PATCH v7 net-next 00/16] ipv6: Enable GUEoIPv6 and more fixes for v6 > tunneling > > The first patch addresses an issue we already resolved in the GREv4 and is > now present in GREv6 with the introduction of FOU/GUE for IPv6 based GRE > tunnels. > > The second patch goes through and enables IPv6 tunnel offloads for the Intel > NICs that already support the IPv4 based IP-in-IP tunnel offloads. I have > only done a bit of touch testing but have seen ~20 Gb/s over an i40e > interface using a v4-in-v6 tunnel, and I have verified IPv6 GRE is still > passing traffic at around the same rate. I plan to do further testing but > with these patches present it should enable a wider audience to be able to > test the new features introduced in Tom's patchset with hardware offloads. Series applied, thanks Alex.
Re: [PATCH net 0/2] RDS: TCP: connection spamming fixes
From: Sowmini Varadhan Date: Wed, 18 May 2016 10:06:22 -0700 > We have been testing the RDS-TCP code with a connection spammer > that sends incoming SYNs to the RDS listen port well after > an rds-tcp connection has been established, and found a few > race-windows that are fixed by this patch series. > > Patch 1 avoids a null pointer deref when an incoming SYN > shows up when a netns is being dismantled, or when the > rds-tcp module is being unloaded. > > Patch 2 addresses the case when a SYN is received after the > connection arbitration algorithm has converged: the incoming > SYN should not needlessly quiesce the transmit path, and it > should not result in needless TCP connection resets due to > re-execution of the connection arbitration logic. Series applied, thanks.
Re: [RFC][PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
Hannes Frederic Sowa writes: > On 18.05.2016 22:43, Daniel Borkmann wrote: >> Eric, please send the patch officially and feel free to add my Ack. Done. >> Given >> the circumstances, moving to mount_nodev() seems the best way forward. To >> also address above mentioned concern from Hannes, we need to remove the >> FS_USERNS_MOUNT flag along with the change. It looks like the fix is best >> addressed in a single patch if you want to include it. If not, we can >> otherwise send it separately as well, I don't mind. > > I agree. Would make most sense to make the change in one patch. Later on > we can reason about if it makes sense to use the net namespace to split > bpf maps and programs or maybe even introduce a new primitive for that. I will let you two take care of the FS_USERNS_MOUNT flag. Removal of the FS_USERNS_MOUNT flag because it was added prematurely is a completely different analysis of consequences and possible regressions in userspace. The two changes should be kept separate to make it easy to handle the unlikely case that either of them cause a regression. But I have not objections to removing the FS_USERNS_MOUNT flag from the bpf filesystem. Eric
IPv6 extension header privileges
Hi, In ipv6_sockglue.c I noticed: /* hop-by-hop / destination options are privileged option */ retv = -EPERM; if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW)) break; Can anyone provide that rationale as to why these are privileged ops? Thanks, Tom
[PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the bpf filesystem. Looking at the code I saw a broken usage of mount_ns with current->nsproxy->mnt_ns. As the code does not acquire a reference to the mount namespace it can not possibly be correct to store the mount namespace on the superblock as it does. Replace mount_ns with mount_nodev so that each mount of the bpf filesystem returns a distinct instance, and the code is not buggy. In discussion with Hannes Frederic Sowa it was reported that the use of mount_ns was an attempt to have one bpf instance per mount namespace, in an attempt to keep resources that pin resources from hiding. That intent simply does not work, the vfs is not built to allow that kind of behavior. Which means that the bpf filesystem really is buggy both semantically and in it's implemenation as it does not nor can it implement the original intent. This change is userspace visible, but my experience with similar filesystems leads me to believe nothing will break with a model of each mount of the bpf filesystem is distinct from all others. Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Cc: Hannes Frederic Sowa Acked-by: Daniel Borkmann Signed-off-by: "Eric W. Biederman" --- kernel/bpf/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 8f94ca1860cf..55d923688f85 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -378,7 +378,7 @@ static int bpf_fill_super(struct super_block *sb, void *data, int silent) static struct dentry *bpf_mount(struct file_system_type *type, int flags, const char *dev_name, void *data) { - return mount_ns(type, flags, current->nsproxy->mnt_ns, bpf_fill_super); + return mount_nodev(type, flags, data, bpf_fill_super); } static struct file_system_type bpf_fs_type = { -- 2.8.1
Re: [PATCH v4] net: sock: move ->sk_shutdown out of bitfields.
From: Andrey Ryabinin Date: Wed, 18 May 2016 19:19:27 +0300 > ->sk_shutdown bits share one bitfield with some other bits in sock struct, > such as ->sk_no_check_[r,t]x, ->sk_userlocks ... > sock_setsockopt() may write to these bits, while holding the socket lock. > > In case of AF_UNIX sockets, we change ->sk_shutdown bits while holding only > unix_state_lock(). So concurrent setsockopt() and shutdown() may lead > to corrupting these bits. > > Fix this by moving ->sk_shutdown bits out of bitfield into a separate byte. > This will not change the 'struct sock' size since ->sk_shutdown moved into > previously unused 16-bit hole. > > Signed-off-by: Andrey Ryabinin > Suggested-by: Hannes Frederic Sowa Looks good, applied and queued up for -stable. Thanks.
Re: [PATCH v7 net-next 00/16] ipv6: Enable GUEoIPv6 and more fixes for v6 tunneling
From: Tom Herbert Date: Wed, 18 May 2016 09:06:08 -0700 > This patch set: > - Fixes GRE6 to process translate flags correctly from configuration > - Adds support for GSO and GRO for ip6ip6 and ip4ip6 > - Add support for FOU and GUE in IPv6 > - Support GRE, ip6ip6 and ip4ip6 over FOU/GUE > - Fixes ip6_input to deal with UDP encapsulations > - Some other minor fixes Series applied, thanks Tom.
Re: [PATCH] Revert "phy: add support for a reset-gpio specification"
From: Fabio Estevam Date: Wed, 18 May 2016 13:05:00 -0300 > Commit da47b4572056 ("phy: add support for a reset-gpio specification") > causes the following xtensa qemu crash according to Guenter Roeck: ... > This reverts commit da47b4572056487fd7941c26f73b3e8815ff712a. > > Reported-by: Guenter Roeck > Signed-off-by: Fabio Estevam Applied, thanks.
Re: [PATCH net-next] xen-netback: only deinitialized hash if it was initialized
From: Paul Durrant Date: Wed, 18 May 2016 15:55:42 +0100 > A domain with a frontend that does not implement a control ring has been > seen to cause a crash during domain save. This was apparently because > the call to xenvif_deinit_hash() in xenvif_disconnect_ctrl() is made > regardless of whether a control ring was connected, and hence > xenvif_hash_init() was called. > > This patch brings the call to xenvif_deinit_hash() in > xenvif_disconnect_ctrl() inside the if clause that checks whether the > control ring event channel was connected. This is sufficient to ensure > it is only called if xenvif_init_hash() was called previously. > > Signed-off-by: Paul Durrant > Reported-by: Boris Ostrovsky > Tested-by: Boris Ostrovsky Applied, thanks.
Re: [RFC] net: remove busylock
On Fri, 2016-05-20 at 19:49 +0200, Jesper Dangaard Brouer wrote: > On Fri, 20 May 2016 07:16:55 -0700 > Eric Dumazet wrote: > > > Since bonding pretends to be multiqueue, TCQ_F_ONETXQUEUE is not set > > on sch->flags when HTB is installed at the bonding device root. > > If would be cool if you could run a test with removed busylock and > allow HTB to bulk dequeue. I added a /sys/class/net/eth0/tx_bulk_limit to tune the number of packets that we could bulk dequeue from a virtual device (no BQL on them) 200 TCP_RR through HTB on eth0 (bonding device) 1) busylock enabled With tx_bulk_limit set to 8, we get 12.7 % increase. lpaa23:~# for f in `seq 1 16`; do echo $f|tee /sys/class/net/eth0/tx_bulk_limit; sar -n DEV 3 3|grep eth0|grep Average; done 1 Average: eth0 868625.67 868487.44 57055.69 56826.37 0.00 0.00 0.56 2 Average: eth0 927081.67 926920.78 60923.83 60649.90 0.00 0.00 0.44 3 Average: eth0 957678.11 957554.00 62877.04 62653.89 0.00 0.00 0.56 4 Average: eth0 966912.44 966747.33 63532.72 63255.51 0.00 0.00 0.56 5 Average: eth0 973137.56 972950.44 63958.31 63661.39 0.00 0.00 0.44 6 Average: eth0 958589.22 958449.44 62961.79 62712.56 0.00 0.00 0.67 7 Average: eth0 960737.67 960672.22 62968.34 62857.97 0.00 0.00 0.44 8 Average: eth0 979271.78 979201.67 64199.47 64070.84 0.00 0.00 0.56 9 Average: eth0 982464.33 982390.33 64418.42 64278.93 0.00 0.00 0.56 10 Average: eth0 982698.00 982578.22 64495.25 64291.28 0.00 0.00 0.44 11 Average: eth0 981862.22 981746.00 64438.16 64236.31 0.00 0.00 0.56 12 Average: eth0 983277.44 983096.33 64632.79 64327.79 0.00 0.00 0.44 13 Average: eth0 981221.11 981018.00 64521.82 64189.26 0.00 0.00 0.67 14 Average: eth0 981754.11 981555.89 64553.19 64224.39 0.00 0.00 0.44 15 Average: eth0 982484.33 982301.67 64572.00 64273.38 0.00 0.00 0.44 16 Average: eth0 978529.56 978326.67 64350.89 64013.39 0.00 0.00 0.67 2) busylock disabled Well, bulk dequeue helps, but does not close the gap. (busylock is needed) lpaa23:~# for f in `seq 1 16`; do echo $f|tee /sys/class/net/eth0/tx_bulk_limit; sar -n DEV 3 3|grep eth0|grep Average; done 1 Average: eth0 795408.44 795407.67 52044.66 52045.93 0.00 0.00 0.56 2 Average: eth0 843411.78 843415.11 55185.23 55184.51 0.00 0.00 0.56 3 Average: eth0 876175.89 876175.00 57329.50 57327.98 0.00 0.00 0.44 4 Average: eth0 890631.22 890629.44 58274.58 58274.25 0.00 0.00 0.67 5 Average: eth0 900672.00 900668.89 58931.29 58930.54 0.00 0.00 0.44 6 Average: eth0 908325.78 908328.22 59432.97 59431.76 0.00 0.00 0.56 7 Average: eth0 913895.33 913885.11 59796.89 59795.46 0.00 0.00 0.56 8 Average: eth0 914429.11 914433.56 59832.26 59831.23 0.00 0.00 0.67 9 Average: eth0 918701.11 918699.67 60110.68 60110.36 0.00 0.00 0.55 10 Average: eth0 920382.33 920376.56 60223.31 60220.54 0.00 0.00 0.67 11 Average: eth0 914341.67 914344.67 59826.25 59825.90 0.00 0.00 0.67 12 Average: eth0 912697.00 912693.78 59718.77 59717.44 0.00 0.00 0.44 13 Average: eth0 917392.56 917385.00 60025.79 60024.34 0.00 0.00 0.44 14 Average: eth0 918232.89 918233.78 60081.04 60079.94 0.00 0.00 0.67 15 Average: eth0 918377.11 918381.00 60091.14 60089.79 0.00 0.00 0.44 16 Average: eth0 913817.56 913812.33 59792.09 59790.66 0.00 0.00 0.56
Re: [PATCH 1/3] netfilter: ipset: use setup_timer() and mod_timer().
On Sat, 14 May 2016, Muhammad Falak R Wani wrote: > Use setup_timer() and instead of init_timer(), being the preferred way > of setting up a timer. > > Also, quoting the mod_timer() function comment: > -> mod_timer() is a more efficient way to update the expire field of an >active timer (if the timer is inactive it will be activated). > > Use setup_timer() and mod_timer() to setup and arm a timer, making the > code compact and easier to read. > > Signed-off-by: Muhammad Falak R Wani > --- > net/netfilter/ipset/ip_set_bitmap_gen.h | 7 ++- > 1 file changed, 2 insertions(+), 5 deletions(-) The patch and the other too in the series as well are applied in the ipset git repository and will be submitted for kernel inclusion. Thanks. Best regards, Jozsef - E-mail : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences H-1525 Budapest 114, POB. 49, Hungary
Re: [PATCH net-next 5/7] Driver: Vmxnet3: Add support for get_coalesce, set_coalesce ethtool operations
On Sun, 8 May 2016, Ben Hutchings wrote: > > Would a patch that maps 0 to 'no coalescing' be acceptable? That is: > > > > rx-usecs = 0 -> coalescing disabled. > > rx-usecs = 1 -> default (chosen by the device). > > rx-usecs = 2 -> adaptive coalescing. > > rx-usecs = 3 -> static coalescing. > > I still don't like it much. For the 3 special values (0 isn't really > special): > > 1 = default: When the driver sets the virtual device to this mode, can it > then read back what the actual settings are, or are they hidden? If it can, > then userland can also read the defaults and explicitly return to them later. > But I do see the usefulness of an explicit request to reset to defaults. > > 2 = adaptive coalescing: There are already fields to request adaptive > coalescing; you should support them. > > 3 = static coalescing: I don't understand what this means. static refers to the number of packets to batch before raising an interrupt - which maps to existing tx_max_coaleced_frames. Have sent out v2 of the patch that no longer uses special values of rx-usecs to distinguish between the coalescing modes. Existing ethtool_coalesce fields are used as appropriate instead. In v2, driver can no longer issue "revert to defaults" command (think such a mechanism might be useful addition but belongs to ethtool framework). Thanks, Shri
Re: [PATCH] net: alx: use custom skb allocator
On Fri, May 20, 2016 at 03:56:23PM +0800, Feng Tang wrote: > Hi, > > Please ignore this patch. > > I found the problem and made the patch with kernel 4.4 with Ubuntu 12.04 > on Lenovo Y580. > > After submitting the patch, I tried to google the datasheet for > Atheros 8161 dev, and found there was already a kernel bugzilla > https://bugzilla.kernel.org/show_bug.cgi?id=70761 > that had a fix from Jarod Wilson which get merged in v4.5 (commit c406700cd) > > Per my test, the new 4.6.0 kernel works fine with Jarod's patch. Might not hurt to get this some testing by people for whom my patch didn't fix things. It seems to help for a fair number of people, but not everyone, so perhaps this would be of use as well. Personally, my own alx-driven hardware didn't have a problem to begin with, so I'm not able to do more than make sure things don't regress w/already functioning hardware. > On Fri, May 20, 2016 at 01:41:03PM +0800, Feng Tang wrote: > > This patch follows Eric Dumazet's commit 7b70176421 to fix one > > exactly same bug in alx driver, that the network link will > > be lost in 1-5 minutes after the device is up. > > > > Following is a git log from Eric's 7b70176421: > > > > "We had reports ( https://bugzilla.kernel.org/show_bug.cgi?id=54021 ) > > that using high order pages for skb allocations is problematic for atl1c > > > > We do not know exactly what the problem is, but we suspect that crossing > > 4K pages is not well supported by this hardware. > > > > Use a custom allocator, using page allocator and 2K fragments for > > optimal stack behavior. We might make this allocator generic > > in future kernels." > > > > And my debug shows the same suspect, most of the errors happen > > when there is a RX buffer address with 0x..f80, hopefully > > this will get noticed and fixed from silicon side. > > > > My victim is a Lenovo Y580 Laptop with Atheros ALX AR8161 etherenet > > device(PCI ID 1969:1091), with this patch the ethernet dev > > works just fine > > > > Signed-off-by: Feng Tang > > --- > > drivers/net/ethernet/atheros/alx/alx.h | 4 +++ > > drivers/net/ethernet/atheros/alx/main.c | 48 > > - > > 2 files changed, 51 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/atheros/alx/alx.h > > b/drivers/net/ethernet/atheros/alx/alx.h > > index 8fc93c5..d02c424 100644 > > --- a/drivers/net/ethernet/atheros/alx/alx.h > > +++ b/drivers/net/ethernet/atheros/alx/alx.h > > @@ -96,6 +96,10 @@ struct alx_priv { > > unsigned int rx_ringsz; > > unsigned int rxbuf_size; > > > > + struct page *rx_page; > > + unsigned int rx_page_offset; > > + unsigned int rx_frag_size; > > + > > struct napi_struct napi; > > struct alx_tx_queue txq; > > struct alx_rx_queue rxq; > > diff --git a/drivers/net/ethernet/atheros/alx/main.c > > b/drivers/net/ethernet/atheros/alx/main.c > > index 9fe8b5e..c98acdc 100644 > > --- a/drivers/net/ethernet/atheros/alx/main.c > > +++ b/drivers/net/ethernet/atheros/alx/main.c > > @@ -70,6 +70,35 @@ static void alx_free_txbuf(struct alx_priv *alx, int > > entry) > > } > > } > > > > +static struct sk_buff *alx_alloc_skb(struct alx_priv *alx, gfp_t gfp) > > +{ > > + struct sk_buff *skb; > > + struct page *page; > > + > > + if (alx->rx_frag_size > PAGE_SIZE) > > + return __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp); > > + > > + page = alx->rx_page; > > + if (!page) { > > + alx->rx_page = page = alloc_page(gfp); > > + if (unlikely(!page)) > > + return NULL; > > + alx->rx_page_offset = 0; > > + } > > + > > + skb = build_skb(page_address(page) + alx->rx_page_offset, > > + alx->rx_frag_size); > > + if (likely(skb)) { > > + alx->rx_page_offset += alx->rx_frag_size; > > + if (alx->rx_page_offset >= PAGE_SIZE) > > + alx->rx_page = NULL; > > + else > > + get_page(page); > > + } > > + return skb; > > +} > > + > > + > > static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t gfp) > > { > > struct alx_rx_queue *rxq = &alx->rxq; > > @@ -86,7 +115,7 @@ static int alx_refill_rx_ring(struct alx_priv *alx, > > gfp_t gfp) > > while (!cur_buf->skb && next != rxq->read_idx) { > > struct alx_rfd *rfd = &rxq->rfd[cur]; > > > > - skb = __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp); > > + skb = alx_alloc_skb(alx, gfp); > > if (!skb) > > break; > > dma = dma_map_single(&alx->hw.pdev->dev, > > @@ -124,6 +153,7 @@ static int alx_refill_rx_ring(struct alx_priv *alx, > > gfp_t gfp) > > alx_write_mem16(&alx->hw, ALX_RFD_PIDX, cur); > > } > > > > + > > return count; > > } > > > > @@ -592,6 +622,11 @@ static void alx_free_rings(struct alx_priv *alx) > > kfree(alx->txq.bufs); > > kfree(alx->rxq.bufs); > > > > + if (alx->rx_p
Re: [PATCH] mlx5: avoid unused variable warning
From: Arnd Bergmann Date: Wed, 18 May 2016 16:21:07 +0200 > When CONFIG_NET_CLS_ACT is disabled, we get a new warning in the mlx5 > ethernet driver because the tc_for_each_action() loop never references > the iterator: > > mellanox/mlx5/core/en_tc.c: In function 'mlx5e_stats_flower': > mellanox/mlx5/core/en_tc.c:431:20: error: unused variable 'a' > [-Werror=unused-variable] > struct tc_action *a; > > This changes the dummy tc_for_each_action() macro by adding a > cast to void, letting the compiler know that the variable is > intentionally declared but not used here. I could not come up > with a nicer workaround, but this seems to do the trick. > > Signed-off-by: Arnd Bergmann > Fixes: aad7e08d39bd ("net/mlx5e: Hardware offloaded flower filter statistics > support") > Fixes: 00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef") Applied, thanks Arnd.
Re: [PATCH net] bpf: rather use get_random_int for randomizations
From: Daniel Borkmann Date: Wed, 18 May 2016 14:14:28 +0200 > Start address randomization and blinding in BPF currently use > prandom_u32(). prandom_u32() values are not exposed to unpriviledged > user space to my knowledge, but given other kernel facilities such as > ASLR, stack canaries, etc make use of stronger get_random_int(), we > better make use of it here as well given blinding requests successively > new random values. get_random_int() has minimal entropy pool depletion, > is not cryptographically secure, but doesn't need to be for our use > cases here. > > Suggested-by: Hannes Frederic Sowa > Signed-off-by: Daniel Borkmann Ok, applied, thanks Daniel.
Re: [PATCH net 1/1] qede: Fix DMA address APIs usage
From: Manish Chopra Date: Wed, 18 May 2016 07:43:57 -0400 > Driver incorrectly uses dma_unmap_addr_set() to set > a variable which is in truth a dma_addr_t > [i.e not defined using DEFINE_DMA_UNMAP_ADDR()] and is > being used by the driver flows other than unmapping > physical addresses. This patch fixes driver fastpath > where CONFIG_NEED_DMA_MAP_STATE is not set. > > Signed-off-by: Manish Chopra > Signed-off-by: Yuval Mintz Applied, thanks.
Re: [PATCH net] macsec: fix netlink attribute for key id
From: Sabrina Dubroca Date: Wed, 18 May 2016 13:34:40 +0200 > In my last commit I replaced MACSEC_SA_ATTR_KEYID by > MACSEC_SA_ATTR_KEY. > > Fixes: 8acca6acebd0 ("macsec: key identifier is 128 bits, not 64") > Signed-off-by: Sabrina Dubroca Applied and queued up for -stable, thanks.
Re: [PATCH net-next] xen-netback: correct length checks on hash copy_ops
From: Paul Durrant Date: Wed, 18 May 2016 08:53:01 +0100 > The length checks on the grant table copy_ops for setting hash key and > hash mapping are checking the local 'len' value which is correct in > the case of the former but not the latter. This was picked up by > static analysis checks. > > This patch replaces checks of 'len' with 'copy_op.len' in both cases > to correct the incorrect check, keep the two checks consistent, and to > make it clear what the checks are for. > > Signed-off-by: Paul Durrant > Reported-by: Dan Carpenter Applied.
Re: [PATCHv3] phy: fix crash in fixed_phy_add()
From: Rabin Vincent Date: Wed, 18 May 2016 12:47:31 +0200 > From: Rabin Vincent > > Since e7f4dc3536a ("mdio: Move allocation of interrupts into core"), > platforms which call fixed_phy_add() before fixed_mdio_bus_init() is > called (for example, because the platform code and the fixed_phy driver > use the same initcall level) crash in fixed_phy_add() since the > ->mii_bus is not allocated. > > Also since e7f4dc3536a, these interrupts are initalized to polling by > default. The few (old) platforms which directly use fixed_phy_add() > from their platform code all pass PHY_POLL for the irq argument, so we > can keep these platforms not crashing by simply not attempting to set > the irq if PHY_POLL is passed. > > Also, even if problems have not been reported on more modern platforms > which used fixed_phy_register() from drivers' probe functions, we return > -EPROBE_DEFER if the MDIO bus is not yet registered so that the probe is > retried later. > > Fixes: e7f4dc3536a400 ("mdio: Move allocation of interrupts into core") > Signed-off-by: Rabin Vincent > --- > v3: One more suggestion from Andrew: check mii bus state Applied, thanks.
Re: [RFC] net: remove busylock
On Fri, 20 May 2016 07:16:55 -0700 Eric Dumazet wrote: > Since bonding pretends to be multiqueue, TCQ_F_ONETXQUEUE is not set > on sch->flags when HTB is installed at the bonding device root. If would be cool if you could run a test with removed busylock and allow HTB to bulk dequeue. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH 2/2] ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path.
From: Haishuang Yan Date: Wed, 18 May 2016 18:05:52 +0800 > In gre6 xmit path, we are sending a GRE packet, so set fl6 proto > to IPPROTO_GRE properly. > > Signed-off-by: Haishuang Yan I think it would be a lot better to initialize the flow protocol field properly in ip6gre_tnl_link_config() instead of fixing it up every single transmit.
Re: [PATCH 1/2] ip6_gre: Fix MTU setting for ip6gretap
From: Haishuang Yan Date: Wed, 18 May 2016 18:05:51 +0800 > When creat an ip6gretap interface with an unreachable route, > the MTU is about 14 bytes larger than what was needed. > > If the remote address is reachable: > ping6 2001:0:130::1 -c 2 > PING 2001:0:130::1(2001:0:130::1) 56 data bytes > 64 bytes from 2001:0:130::1: icmp_seq=1 ttl=64 time=1.46 ms > 64 bytes from 2001:0:130::1: icmp_seq=2 ttl=64 time=81.1 ms > > --- 2001:0:130::1 ping statistics --- > 2 packets transmitted, 2 received, 0% packet loss, time 1001ms > rtt min/avg/max/mdev = 1.465/41.316/81.167/39.851 ms > > ip link add ip6gretap1 type ip6gretap\ > local 2001:0:130::2 remote 2001:0:130::1 > ip link show ip6gretap1 > 11: ip6gretap1@NONE: mtu 1434 ... > link/ether c2:f3:f8:c1:2c:bf brd ff:ff:ff:ff:ff:ff > > The MTU value 1434 is right. But if we delete the direct route: > ip -6 route del 2001:0:130::/64 > ping6 2001:0:130::1 -c 2 > connect: Network is unreachable > ip link add ip6gretap1 type ip6gretap\ > local 2001:0:130::2 remote 2001:0:130::1 > ip link show ip6gretap1 > 12: ip6gretap1@NONE: mtu 1448 ... > link/ether 7e:e1:d2:c4:06:5e brd ff:ff:ff:ff:ff:ff > > Now, the MTU value 1448 is larger than what was needed. > > This patch fix the issue in this situation. > > Signed-off-by: Haishuang Yan It's great that you have provided precise examples of how to reproduce the problem. But you need to explain in your commit message how your fix works. Why does simply checking the device type and subtracting the ethernet header length work here? What is the full context for your fix and why does it work?
Re: [PATCH] net: sock: Add option for memory optimized hints.
On Fri, 2016-05-20 at 15:36 +0200, peter enderborg wrote: > From: Peter Enderborg > > When sending data the socket allocates memory for > payload on a cache or a page alloc. The page alloc > then might trigger compation that takes long time. > This can be avoided with smaller chunks. But > userspace can not know what is the right size for > the smaller sends. For this we add a SIZEHINT > getsocketopt where the userspace can get the size > for send that will fit into one page (order 0) or > the max for a slab cache allocation. Very obscure thing if you ask me. Exposing kernel internals like that seems to work around a more serious issue. What kind of sockets would ever use KMALLOC_MAX_CACHE_SIZE allocations exactly ? af_unix and tcp are definitely clean. 28d6427109d13b0f447cba5761f88d3548e83605 net: attempt high order allocations in sock_alloc_send_pskb() e370a7236321773245c5522d8bb299380830d3b2 af_unix: improve STREAM behavior with fragmented memory eb6a24816b247c0be6b2e97e68933072874bbe54 af_unix: reduce high order page allocations
[PATCH][3.13.y-ckt][PATCH 1/1] bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr
From: Toshiaki Makita BugLink: http://bugs.launchpad.net/bugs/1581585 br_fdb_changeaddr() assumes that there is at most one local entry per port per vlan. It used to be true, but since commit 36fd2b63e3b4 ("bridge: allow creating/deleting fdb entries via netlink"), it has not been so. Therefore, the function might fail to search a correct previous address to be deleted and delete an arbitrary local entry if user has added local entries manually. Example of problematic case: ip link set eth0 address ee:ff:12:34:56:78 brctl addif br0 eth0 bridge fdb add 12:34:56:78:90:ab dev eth0 master ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the address 12:34:56:78:90:ab might be deleted instead of ee:ff:12:34:56:78, the original mac address of eth0. Address this issue by introducing a new flag, added_by_user, to struct net_bridge_fdb_entry. Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases like: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master brctl addif br0 eth1 brctl delif br0 eth0 In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff, but it also should have been added by "brctl addif br0 eth1" originally, so we don't delete it and treat it a new kernel-created entry. Signed-off-by: Toshiaki Makita Signed-off-by: David S. Miller (backported from commit a5642ab4744bc8c5a8c7ce7c6e30c01bd6bbc691) Signed-off-by: Joseph Salisbury --- net/bridge/br_fdb.c | 16 net/bridge/br_input.c | 8 net/bridge/br_private.h | 3 ++- 3 files changed, 18 insertions(+), 9 deletions(-) diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 62c444d..551a27d 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -104,7 +104,7 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) struct net_bridge_fdb_entry *f; f = hlist_entry(h, struct net_bridge_fdb_entry, hlist); - if (f->dst == p && f->is_local) { + if (f->dst == p && f->is_local && !f->added_by_user) { /* maybe another port has same hw addr? */ struct net_bridge_port *op; u16 vid = f->vlan_id; @@ -247,6 +247,7 @@ void br_fdb_delete_by_port(struct net_bridge *br, ether_addr_equal(op->dev->dev_addr, f->addr.addr)) { f->dst = op; + f->added_by_user = 0; goto skip_delete; } } @@ -397,6 +398,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head, fdb->vlan_id = vid; fdb->is_local = 0; fdb->is_static = 0; + fdb->added_by_user = 0; fdb->updated = fdb->used = jiffies; hlist_add_head_rcu(&fdb->hlist, head); } @@ -447,7 +449,7 @@ int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, } void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr, u16 vid) + const unsigned char *addr, u16 vid, bool added_by_user) { struct hlist_head *head = &br->hash[br_mac_hash(addr, vid)]; struct net_bridge_fdb_entry *fdb; @@ -473,13 +475,18 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source, /* fastpath: update of existing entry */ fdb->dst = source; fdb->updated = jiffies; + if (unlikely(added_by_user)) + fdb->added_by_user = 1; } } else { spin_lock(&br->hash_lock); if (likely(!fdb_find(head, addr, vid))) { fdb = fdb_create(head, source, addr, vid); - if (fdb) + if (fdb) { + if (unlikely(added_by_user)) + fdb->added_by_user = 1; fdb_notify(br, fdb, RTM_NEWNEIGH); + } } /* else we lose race and someone else inserts * it first, don't bother updating @@ -648,6 +655,7 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr, modified = true; } + fdb->added_by_user = 1; fdb->used = jiffies; if (modified) { @@ -666,7 +674,7 @@ static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge_port *p, if (ndm->ndm
[PATCH][v3.13.y-ckt][PATCH 0/1] bridge: notify user space after fdb update
BugLink: http://bugs.launchpad.net/bugs/1581585 Hello, Please consider including upstream commit c65c7a306610ee7c13669a8f5601b472c19dc6f1 in the next v3.13.y-ckt release. It was included in the mainline tree as of v3.14-rc3. It has been tested and confirmed to resolve http://bugs.launchpad.net/bugs/1581585 . This commit also reques commit a5642ab4744 as a prereq. Commit a5642ab4744 does not apply cleanly to v3.13-y.ckt, so I performed a backport, which is in email 1/1 Prereq: commit a5642ab4744bc8c5a8c7ce7c6e30c01bd6bbc691 Author: Toshiaki Makita Date: Fri Feb 7 16:48:18 2014 +0900 bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr Real fix: commit c65c7a306610ee7c13669a8f5601b472c19dc6f1 Author: Jon Maxwell Date: Thu May 29 17:27:16 2014 +1000 bridge: notify user space after fdb update Regards, Joe Salisbury
Re: [PATCH net v2] fou: avoid using sk_user_data before it is initialised
On Thu, May 19, 2016 at 10:57 PM, Simon Horman wrote: > During initialisation sk->sk_user_data should not be used before > it is initialised. > > Found by bisection after noticing the following: > > $ ip fou add port ipproto 47 > [0.383417] BUG: unable to handle kernel NULL pointer dereference at > 0008 > [0.384132] IP: [] fou_nl_cmd_add_port+0x1e1/0x230 > [0.384707] PGD 1fafc067 PUD 1fb72067 PMD 0 > [0.385110] Oops: 0002 [#1] SMP > [0.385387] Modules linked in: > [0.385667] CPU: 0 PID: 55 Comm: ip Not tainted 4.6.0-03623-g0b7962a6c4a3 > #430 > [0.386244] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.7.5-20140531_083030-gandalf 04/01/2014 > [0.386244] task: 88001fb9cac0 ti: 88001fbc8000 task.ti: > 88001fbc8000 > [0.386244] RIP: 0010:[] [] > fou_nl_cmd_add_port+0x1e1/0x230 > [0.386244] RSP: 0018:88001fbcbb78 EFLAGS: 00010246 > [0.386244] RAX: 0001 RBX: 88001fb8eb40 RCX: > 002f > [0.386244] RDX: RSI: RDI: > 880019fcafc0 > [0.386244] RBP: 880019fcaf80 R08: 8130c370 R09: > 880019fcaf80 > [0.386244] R10: 880019e12b8c R11: R12: > 0004 > [0.386244] R13: 0014 R14: 88001fb1a300 R15: > 816634c0 > [0.386244] FS: 7f016eb4d700() GS:88001a20() > knlGS: > [0.386244] CS: 0010 DS: ES: CR0: 80050033 > [0.386244] CR2: 0008 CR3: 1fb69000 CR4: > 06b0 > [0.386244] Stack: > [0.386244] 88001faaea24 8800192426c0 0002002f0001 > > [0.386244] > b822 > [0.386244] 81461480 88001faaea14 0004 > 812b0e17 > [0.386244] Call Trace: > [0.386244] [] ? genl_family_rcv_msg+0x197/0x320 > [0.386244] [] ? genl_family_rcv_msg+0x320/0x320 > [0.386244] [] ? genl_rcv_msg+0x70/0xb0 > [0.386244] [] ? netlink_rcv_skb+0xa1/0xc0 > [0.386244] [] ? genl_rcv+0x24/0x40 > [0.386244] [] ? netlink_unicast+0x143/0x1d0 > [0.386244] [] ? netlink_sendmsg+0x366/0x390 > [0.386244] [] ? rw_copy_check_uvector+0x68/0x110 > [0.386244] [] ? sock_sendmsg+0x10/0x20 > [0.386244] [] ? ___sys_sendmsg+0x1f1/0x200 > [0.386244] [] ? pipe_write+0x1a0/0x420 > [0.386244] [] ? do_filp_open+0x92/0xe0 > [0.386244] [] ? __sys_sendmsg+0x41/0x70 > [0.386244] [] ? entry_SYSCALL_64_fastpath+0x13/0x8f > [0.386244] Code: 4c 24 12 48 8b 93 28 02 00 00 48 c7 83 68 03 00 00 e0 76 > 32 81 48 c7 83 78 03 00 00 50 61 32 81 48 c7 83 80 03 00 00 e0 64 32 81 <88> > 4a 08 e9 20 ff ff ff 4c 89 e7 bb 8e ff ff ff e8 1a 34 07 00 > [0.386244] RIP [] fou_nl_cmd_add_port+0x1e1/0x230 > [0.386244] RSP > [0.386244] CR2: 0008 > [0.407176] ---[ end trace 13bf0d24a4b7f9c3 ]--- > > Fixes: d92283e338f6 ("fou: change to use UDP socket GRO") > Signed-off-by: Simon Horman Acked-by: Cong Wang Thanks!
Re: [PATCH] net: sock: Add option for memory optimized hints.
It is not appropriate to submit new features at this time as we are in the merge window and we are only bug fixes. Please resubmit this when the net-next tree opens back up.
Re: [PATCH 1/1] net: pegasus: remove dead coding
From: Petko Manolov Date: Fri, 20 May 2016 10:33:47 +0300 > On 16-05-19 11:35:42, David Miller wrote: >> From: Heinrich Schuchardt >> Date: Wed, 18 May 2016 02:13:30 +0200 >> >> > (!count || count < 4) is always true. >> > So let's remove the coding which is dead at least since 2005. >> > >> > Signed-off-by: Heinrich Schuchardt >> >> Applied. > > David, the patch you applied is broken. It seems that you didn't follow the > discussion from the past couple of days. Please revert it. I did.
Re: [PATCH] net: pegasus: remove unused variables and labels
From: Arnd Bergmann Date: Fri, 20 May 2016 10:22:30 +0200 > The latest dead-code removal was slightly incomplete and > left a few things behind that we now get compiler warnings > for: > > drivers/net/usb/pegasus.c: In function 'read_bulk_callback': > drivers/net/usb/pegasus.c:475:1: error: label 'goon' defined but not used > [-Werror=unused-label] > drivers/net/usb/pegasus.c:446:8: error: unused variable 'pkt_len' > [-Werror=unused-variable] > drivers/net/usb/pegasus.c:445:6: error: unused variable 'buf' > [-Werror=unused-variable] > drivers/net/usb/pegasus.c:443:17: error: unused variable 'count' > [-Werror=unused-variable] > > This removes them as well. > > Signed-off-by: Arnd Bergmann > Fixes: e00be9e4d0ff ("net: pegasus: remove dead coding") The patch in question was broken and has been reverted.
ip rule duplicates
Hi, When we add the same rule again with flag NLM_F_EXCL we expect that we receive error: RTNETLINK answers: File exists This behaviour is already in ip routing part. I have noticed that iproute2 when adds new rule it attach flag NLM_F_EXCL to call. (see http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/tree/ip/iprule.c#n334) Next thing what I found is that this flag is not handled from kernel side. I implemented this feature and I tested this with qemu x86 on: linux-4.5.4 linux (git) linux-stable (git) Tested with ipv4 and ipv6. current behaviour with ipv4: localhost ~ # ip rule 0: from all lookup local 32766: from all lookup main 32767: from all lookup default localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 localhost ~ # ip rule 0: from all lookup local 1005: from 10.46.177.97 lookup 104 1005: from 10.46.177.97 lookup 104 32766: from all lookup main 32767: from all lookup default expected behavior after patch: localhost ~ # ip rule 0:from all lookup local 32766:from all lookup main 32767:from all lookup default localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 RTNETLINK answers: File exists localhost ~ # ip rule 0:from all lookup local 1005:from 10.46.177.97 lookup 104 32766:from all lookup main 32767:from all lookup default There was already topic regarding this but I don't see any changes merged and problem still occurs. (see http://marc.info/?l=linux-netdev&m=113577886110391&w=2) -- Best regards, Mateusz Bajorski >From 9c3f80dceec414ff31d0c38d0107dec279fc9894 Mon Sep 17 00:00:00 2001 From: Mateusz Bajorski Date: Fri, 20 May 2016 14:29:56 +0200 Subject: [PATCH] Added NLM_F_EXCL support to fib_nl_newrule Signed-off-by: Mateusz Bajorski --- net/core/fib_rules.c | 41 + 1 file changed, 41 insertions(+) diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index 840aceb..c1bc07cd 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -291,6 +291,47 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh) if (err < 0) goto errout; + if (nlh->nlmsg_flags & NLM_F_EXCL) { + list_for_each_entry(rule, &ops->rules_list, list) { + if (frh->action && (frh->action != rule->action)) +continue; + + if (frh_get_table(frh, tb) && +(frh_get_table(frh, tb) != rule->table)) +continue; + + if (tb[FRA_PRIORITY] && +(rule->pref != nla_get_u32(tb[FRA_PRIORITY]))) +continue; + + if (tb[FRA_IIFNAME] && +nla_strcmp(tb[FRA_IIFNAME], rule->iifname)) +continue; + + if (tb[FRA_OIFNAME] && +nla_strcmp(tb[FRA_OIFNAME], rule->oifname)) +continue; + + if (tb[FRA_FWMARK] && +(rule->mark != nla_get_u32(tb[FRA_FWMARK]))) +continue; + + if (tb[FRA_FWMASK] && +(rule->mark_mask != nla_get_u32(tb[FRA_FWMASK]))) +continue; + + if (tb[FRA_TUN_ID] && +(rule->tun_id != nla_get_be64(tb[FRA_TUN_ID]))) +continue; + + if (!ops->compare(rule, frh, tb)) +continue; + + err = -EEXIST; + goto errout; + } + } + rule = kzalloc(ops->rule_size, GFP_KERNEL); if (rule == NULL) { err = -ENOMEM; -- 2.6.4
[PATCH] ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n
Commit fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob") moves the default TTL assignment, and as side-effect IPv4 TTL now has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y). The sysctl_ip_default_ttl is fundamental for IP to work properly, as it provides the TTL to be used as default. The defautl TTL may be used in ip_selected_ttl, through the following flow: ip_select_ttl ip4_dst_hoplimit net->ipv4.sysctl_ip_default_ttl This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl in net_init_net, called during ipv4's initialization. Without this commit, a kernel built without sysctl support will send all IP packets with zero TTL (unless a TTL is explicitly set, e.g. with setsockopt). Given a similar issue might appear on the other knobs that were namespaceify, this commit also moves them. Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob") Signed-off-by: Ezequiel Garcia --- net/ipv4/af_inet.c | 8 net/ipv4/sysctl_net_ipv4.c | 4 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 2e6e65fc4d20..8d334816f89d 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1697,6 +1697,14 @@ static __net_init int inet_init_net(struct net *net) */ net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1); net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0); + + /* Default values for sysctl-controlled parameters. +* We set them here, in case sysctl is not compiled. +*/ + net->ipv4.sysctl_ip_default_ttl = IPDEFTTL; + net->ipv4.sysctl_ip_dynaddr = 0; + net->ipv4.sysctl_ip_early_demux = 1; + return 0; } diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index bb0419582b8d..1cb67de106fe 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -999,10 +999,6 @@ static __net_init int ipv4_sysctl_init_net(struct net *net) if (!net->ipv4.sysctl_local_reserved_ports) goto err_ports; - net->ipv4.sysctl_ip_default_ttl = IPDEFTTL; - net->ipv4.sysctl_ip_dynaddr = 0; - net->ipv4.sysctl_ip_early_demux = 1; - return 0; err_ports: -- 2.7.0
Re: [RFC] net: remove busylock
On 16-05-20 06:11 AM, Eric Dumazet wrote: > On Fri, 2016-05-20 at 09:29 +0200, Jesper Dangaard Brouer wrote: > > >> The hole idea behind allowing bulk qdisc dequeue, was to mitigate this, >> by allowing dequeue to do more work, while holding the lock. >> >> You mention HTB. Notice HTB does not take advantage of bulk dequeue. >> Have you tried to enable/allow HTB to bulk dequeue? >> > > Well, __QDISC___STATE_RUNNING means exactly that : one cpu is dequeueing > many packets from the qdisc and tx them to the device. > > It is generic for any kind of qdisc. > > HTB bulk dequeue would have to call ->dequeue() mutiple times. If you do > this while holding qdisc spinlock, you block other cpus from doing > concurrent ->enqueue(), adding latencies (always the same trade off...) > > HTB wont be anytime soon have separate protections for the ->enqueue() > and the ->dequeue(). Have you looked at this monster ? I did, many > times... > I came to the conclusion that we just need to rewrite a new modern version of HTB at some point. Easier said than done however. > Note that I am working on a patch to transform __QDISC___STATE_RUNNING > to a seqcount do that we can grab stats without holding the qdisc lock. > > > >
[PATCH] Added NLM_F_EXCL support to fib_nl_newrule
When adding rule with NLM_F_EXCL flag then check if the same rule exist. If yes then exit with -EEXIST. This is already implemented in iproute2: if (cmd == RTM_NEWRULE) { req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL; req.r.rtm_type = RTN_UNICAST; } Tested ipv4 and ipv6 with net-next linux on qemu x86 expected behavior after patch: localhost ~ # ip rule 0:from all lookup local 32766:from all lookup main 32767:from all lookup default localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 RTNETLINK answers: File exists localhost ~ # ip rule 0:from all lookup local 1005:from 10.46.177.97 lookup 104 32766:from all lookup main 32767:from all lookup default There was already topic regarding this but I don't see any changes merged and problem still occurs. https://lkml.kernel.org/r/1135778809.5944.7.camel+%28%29+localhost+%21+localdomain Signed-off-by: Mateusz Bajorski --- net/core/fib_rules.c | 41 + 1 file changed, 41 insertions(+) diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index 840aceb..b9816a3 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -291,6 +291,47 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh) if (err < 0) goto errout; + if (nlh->nlmsg_flags & NLM_F_EXCL) { + list_for_each_entry(rule, &ops->rules_list, list) { + if (frh->action && (frh->action != rule->action)) + continue; + + if (frh_get_table(frh, tb) && + frh_get_table(frh, tb) != rule->table) + continue; + + if (tb[FRA_PRIORITY] && + rule->pref != nla_get_u32(tb[FRA_PRIORITY])) + continue; + + if (tb[FRA_IIFNAME] && + nla_strcmp(tb[FRA_IIFNAME], rule->iifname)) + continue; + + if (tb[FRA_OIFNAME] && + nla_strcmp(tb[FRA_OIFNAME], rule->oifname)) + continue; + + if (tb[FRA_FWMARK] && + rule->mark != nla_get_u32(tb[FRA_FWMARK])) + continue; + + if (tb[FRA_FWMASK] && + rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])) + continue; + + if (tb[FRA_TUN_ID] && + rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])) + continue; + + if (!ops->compare(rule, frh, tb)) + continue; + + err = -EEXIST; + goto errout; + } + } + rule = kzalloc(ops->rule_size, GFP_KERNEL); if (rule == NULL) { err = -ENOMEM; -- 2.6.4
Re: [RFC] net: remove busylock
On Fri, 2016-05-20 at 06:47 -0700, Eric Dumazet wrote: > On Fri, 2016-05-20 at 06:11 -0700, Eric Dumazet wrote: > > On Fri, 2016-05-20 at 09:29 +0200, Jesper Dangaard Brouer wrote: > > > > > > > The hole idea behind allowing bulk qdisc dequeue, was to mitigate this, > > > by allowing dequeue to do more work, while holding the lock. > > > > > > You mention HTB. Notice HTB does not take advantage of bulk dequeue. > > > Have you tried to enable/allow HTB to bulk dequeue? > > > > > > > Well, __QDISC___STATE_RUNNING means exactly that : one cpu is dequeueing > > many packets from the qdisc and tx them to the device. > > > > It is generic for any kind of qdisc. > > > > HTB bulk dequeue would have to call ->dequeue() mutiple times. If you do > > this while holding qdisc spinlock, you block other cpus from doing > > concurrent ->enqueue(), adding latencies (always the same trade off...) > > > > HTB wont be anytime soon have separate protections for the ->enqueue() > > and the ->dequeue(). Have you looked at this monster ? I did, many > > times... > > > > Note that I am working on a patch to transform __QDISC___STATE_RUNNING > > to a seqcount do that we can grab stats without holding the qdisc lock. > > Slide note : __qdisc_run() could probably avoid a __netif_schedule() > when it breaks the loop, if another cpu is busy spinning on qdisc lock. > > -> Less (spurious) TX softirq invocations, so less chance to trigger the > infamous ksoftirqd bug we discussed lately. Also note that in our case, we have HTB on a bonding device, and FQ/pacing on slaves. Since bonding pretends to be multiqueue, TCQ_F_ONETXQUEUE is not set on sch->flags when HTB is installed at the bonding device root. We might add a flag to tell qdisc layer that a device is virtual and could benefit from bulk dequeue, since the ultimate TX queue is located on another (physical) netdev, eventually MQ enabled.
Re: [RFC] net: remove busylock
On Fri, 2016-05-20 at 06:11 -0700, Eric Dumazet wrote: > On Fri, 2016-05-20 at 09:29 +0200, Jesper Dangaard Brouer wrote: > > > > The hole idea behind allowing bulk qdisc dequeue, was to mitigate this, > > by allowing dequeue to do more work, while holding the lock. > > > > You mention HTB. Notice HTB does not take advantage of bulk dequeue. > > Have you tried to enable/allow HTB to bulk dequeue? > > > > Well, __QDISC___STATE_RUNNING means exactly that : one cpu is dequeueing > many packets from the qdisc and tx them to the device. > > It is generic for any kind of qdisc. > > HTB bulk dequeue would have to call ->dequeue() mutiple times. If you do > this while holding qdisc spinlock, you block other cpus from doing > concurrent ->enqueue(), adding latencies (always the same trade off...) > > HTB wont be anytime soon have separate protections for the ->enqueue() > and the ->dequeue(). Have you looked at this monster ? I did, many > times... > > Note that I am working on a patch to transform __QDISC___STATE_RUNNING > to a seqcount do that we can grab stats without holding the qdisc lock. Slide note : __qdisc_run() could probably avoid a __netif_schedule() when it breaks the loop, if another cpu is busy spinning on qdisc lock. -> Less (spurious) TX softirq invocations, so less chance to trigger the infamous ksoftirqd bug we discussed lately.
[PATCH] net: sock: Add option for memory optimized hints.
From: Peter Enderborg When sending data the socket allocates memory for payload on a cache or a page alloc. The page alloc then might trigger compation that takes long time. This can be avoided with smaller chunks. But userspace can not know what is the right size for the smaller sends. For this we add a SIZEHINT getsocketopt where the userspace can get the size for send that will fit into one page (order 0) or the max for a slab cache allocation. Signed-off-by: Peter Enderborg --- include/uapi/asm-generic/socket.h | 2 ++ include/uapi/linux/socket.h | 9 + net/core/sock.c | 17 + 3 files changed, 28 insertions(+) diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h index 67d632f..f6a4921 100644 --- a/include/uapi/asm-generic/socket.h +++ b/include/uapi/asm-generic/socket.h @@ -92,4 +92,6 @@ #define SO_CNX_ADVICE 53 +#define SO_SIZEHINT54 + #endif /* __ASM_GENERIC_SOCKET_H */ diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h index 76ab0c6..16db7e8 100644 --- a/include/uapi/linux/socket.h +++ b/include/uapi/linux/socket.h @@ -1,6 +1,8 @@ #ifndef _UAPI_LINUX_SOCKET_H #define _UAPI_LINUX_SOCKET_H +#include + /* * Desired design of maximum size and alignment (see RFC2553) */ @@ -18,4 +20,11 @@ struct __kernel_sockaddr_storage { /* _SS_MAXSIZE value minus size of ss_family */ } __attribute__ ((aligned(_K_SS_ALIGNSIZE))); /* force desired alignment */ +struct sock_sizehint { + __u32 order_zero_size; + /* max payload size that can fit into one page in kernel */ + __u32 cache_size; + /* max payload size that can fit in socket slab cache */ +}; + #endif /* _UAPI_LINUX_SOCKET_H */ diff --git a/net/core/sock.c b/net/core/sock.c index 7e73c26..4c9cd92 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1254,6 +1254,23 @@ int sock_getsockopt(struct socket *sock, int level, int optname, v.val = sk->sk_incoming_cpu; break; + case SO_SIZEHINT: + { + struct sock_sizehint hint; + + if (len > sizeof(hint)) + len = sizeof(hint); + + hint.order_zero_size = PAGE_SIZE - + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + hint.cache_size = KMALLOC_MAX_CACHE_SIZE - + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + + if (copy_to_user(optval, &hint, len)) + return -EFAULT; + goto lenout; + } + default: /* We implement the SO_SNDLOWAT etc to not be settable * (1003.1g 7). -- 2.4.2
Re: [RFC] net: remove busylock
On Fri, 2016-05-20 at 09:29 +0200, Jesper Dangaard Brouer wrote: > The hole idea behind allowing bulk qdisc dequeue, was to mitigate this, > by allowing dequeue to do more work, while holding the lock. > > You mention HTB. Notice HTB does not take advantage of bulk dequeue. > Have you tried to enable/allow HTB to bulk dequeue? > Well, __QDISC___STATE_RUNNING means exactly that : one cpu is dequeueing many packets from the qdisc and tx them to the device. It is generic for any kind of qdisc. HTB bulk dequeue would have to call ->dequeue() mutiple times. If you do this while holding qdisc spinlock, you block other cpus from doing concurrent ->enqueue(), adding latencies (always the same trade off...) HTB wont be anytime soon have separate protections for the ->enqueue() and the ->dequeue(). Have you looked at this monster ? I did, many times... Note that I am working on a patch to transform __QDISC___STATE_RUNNING to a seqcount do that we can grab stats without holding the qdisc lock.
Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
On Fri, 2016-05-20 at 06:01 -0700, Eric Dumazet wrote: > Tricky, since sch_direct_xmit() releases the qdisc spinlock and grabs it > again, while owning the ' running seqcount' > > Needs more LOCKDEP tricks ;) > That would be : @@ -137,10 +137,10 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, HARD_TX_UNLOCK(dev, txq); } else { - spin_lock(root_lock); + spin_lock_nested(root_lock, SINGLE_DEPTH_NESTING); return qdisc_qlen(q); } - spin_lock(root_lock); + spin_lock_nested(root_lock, SINGLE_DEPTH_NESTING); if (dev_xmit_complete(ret)) { /* Driver sent out skb successfully or skb was consumed */
Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
On Fri, 2016-05-20 at 05:44 -0700, Eric Dumazet wrote: > On Thu, 2016-05-19 at 19:45 -0700, Eric Dumazet wrote: > > On Thu, 2016-05-19 at 18:50 -0700, Cong Wang wrote: > > > On Thu, May 19, 2016 at 5:35 AM, Eric Dumazet > > > wrote: > > > > > > > > These stats are using u64 or u32 fields, so reading integral values > > > > should not prevent writers from doing concurrent updates if the kernel > > > > arch is a 64bit one. > > > > > > > > Being able to atomically fetch all counters like packets and bytes sent > > > > at the expense of interfering in fast path (queue and dequeue packets) > > > > is simply not worth the pain, as the values are generally stale after 1 > > > > usec. > > > > > > I think one purpose of this lock is to make sure we have an atomic > > > snapshot of these counters as a whole. IOW, we may need another > > > lock rather than the qdisc root lock to guarantee this. > > > > Right, this was stated in the changelog. > > > > I played a bit at changing qdisc->__state to a seqcount. > > > > But this would add 2 additional smp_wmb() barriers. > > Although this would allow the mechanism to be used both on 32bit an > 64bit kernels. > > This would also add LOCKDEP annotations which can be nice for debugging. > > Also the seqcount value >> 1 would give us the number of __qdisc_run() > and we could compute packets/(seqcount>>1) to get the average number of > packets processed per round. Tricky, since sch_direct_xmit() releases the qdisc spinlock and grabs it again, while owning the ' running seqcount' Needs more LOCKDEP tricks ;)
Re: [PATCH] ptp: use memdup_user().
On Fri, May 20, 2016 at 05:51:02PM +0530, Muhammad Falak R Wani wrote: > Use memdup_user to duplicate a memory region from user-space to > kernel-space, instead of open coding using kmalloc & copy_from_user. > > Signed-off-by: Muhammad Falak R Wani Acked-by: Richard Cochran
Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
On Thu, 2016-05-19 at 19:45 -0700, Eric Dumazet wrote: > On Thu, 2016-05-19 at 18:50 -0700, Cong Wang wrote: > > On Thu, May 19, 2016 at 5:35 AM, Eric Dumazet > > wrote: > > > > > > These stats are using u64 or u32 fields, so reading integral values > > > should not prevent writers from doing concurrent updates if the kernel > > > arch is a 64bit one. > > > > > > Being able to atomically fetch all counters like packets and bytes sent > > > at the expense of interfering in fast path (queue and dequeue packets) > > > is simply not worth the pain, as the values are generally stale after 1 > > > usec. > > > > I think one purpose of this lock is to make sure we have an atomic > > snapshot of these counters as a whole. IOW, we may need another > > lock rather than the qdisc root lock to guarantee this. > > Right, this was stated in the changelog. > > I played a bit at changing qdisc->__state to a seqcount. > > But this would add 2 additional smp_wmb() barriers. Although this would allow the mechanism to be used both on 32bit an 64bit kernels. This would also add LOCKDEP annotations which can be nice for debugging. Also the seqcount value >> 1 would give us the number of __qdisc_run() and we could compute packets/(seqcount>>1) to get the average number of packets processed per round. diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 941ec99cd3b6..471095beca09 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4610,6 +4610,7 @@ static int bond_check_params(struct bond_params *params) static struct lock_class_key bonding_netdev_xmit_lock_key; static struct lock_class_key bonding_netdev_addr_lock_key; static struct lock_class_key bonding_tx_busylock_key; +static struct lock_class_key bonding_qdisc_running; static void bond_set_lockdep_class_one(struct net_device *dev, struct netdev_queue *txq, @@ -4625,6 +4626,7 @@ static void bond_set_lockdep_class(struct net_device *dev) &bonding_netdev_addr_lock_key); netdev_for_each_tx_queue(dev, bond_set_lockdep_class_one, NULL); dev->qdisc_tx_busylock = &bonding_tx_busylock_key; + dev->qdisc_running = &bonding_qdisc_running; } /* Called from registration process */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c148edfe4965..e06646b69b06 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1862,6 +1862,7 @@ struct net_device { #endif struct phy_device *phydev; struct lock_class_key *qdisc_tx_busylock; + struct lock_class_key *qdisc_running; boolproto_down; }; #define to_net_dev(d) container_of(d, struct net_device, dev) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index a1fd76c22a59..bff8d895ef8a 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -29,13 +29,6 @@ enum qdisc_state_t { __QDISC_STATE_THROTTLED, }; -/* - * following bits are only changed while qdisc lock is held - */ -enum qdisc___state_t { - __QDISC___STATE_RUNNING = 1, -}; - struct qdisc_size_table { struct rcu_head rcu; struct list_headlist; @@ -93,7 +86,7 @@ struct Qdisc { unsigned long state; struct sk_buff_head q; struct gnet_stats_basic_packed bstats; - unsigned int__state; + seqcount_t running; struct gnet_stats_queue qstats; struct rcu_head rcu_head; int padded; @@ -104,20 +97,20 @@ struct Qdisc { static inline bool qdisc_is_running(const struct Qdisc *qdisc) { - return (qdisc->__state & __QDISC___STATE_RUNNING) ? true : false; + return (raw_read_seqcount(&qdisc->running) & 1) ? true : false; } static inline bool qdisc_run_begin(struct Qdisc *qdisc) { if (qdisc_is_running(qdisc)) return false; - qdisc->__state |= __QDISC___STATE_RUNNING; + write_seqcount_begin(&qdisc->running); return true; } static inline void qdisc_run_end(struct Qdisc *qdisc) { - qdisc->__state &= ~__QDISC___STATE_RUNNING; + write_seqcount_end(&qdisc->running); } static inline bool qdisc_may_bulk(const struct Qdisc *qdisc) diff --git a/net/core/dev.c b/net/core/dev.c index 904ff431d570..55b414dead29 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3075,7 +3075,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, /* * Heuristic to force contended enqueues to serialize on a * separate lock before trying to get qdisc main lock. -* This permits __QDISC___STATE_RUNNING owner to get the lock more +* This permits qdisc->running owner to get the lock more * often and dequeue packets faster. */ contended = qdisc_is_running(q); diff --git a/net/sched/sch_generic.c b/
rxrpc: Simplify connect() implementation and simplify sendmsg() op
Hi Dave, Are you okay with taking this into net-next? I no longer take away the ability to do connect(), but now it does nothing more than specify a default address and mark the socket as being client only. As before, the default address is overridden if sendmsg() is given an address. The other significant UAPI change is that I deal better with a race between two threads trying to register calls with the same user ID (one of them will get an error). David --- rxrpc: Simplify connect() implementation and simplify sendmsg() op Simplify the RxRPC connect() implementation. It will just note the destination address it is given, and if a sendmsg() comes along with no address, this will be assigned as the address. No transport struct will be held internally, which will allow us to remove this later. Simplify sendmsg() also. Whilst a call is active, userspace refers to it by a private unique user ID specified in a control message. When sendmsg() sees a user ID that doesn't map to an extant call, it creates a new call for that user ID and attempts to add it. If, when we try to add it, the user ID is now registered, we now reject the message with -EEXIST. We should never see this situation unless two threads are racing, trying to create a call with the same ID - which would be an error. It also isn't required to provide sendmsg() with an address - provided the control message data holds a user ID that maps to a currently active call. Signed-off-by: David Howells --- include/linux/rxrpc.h | 18 ++-- net/rxrpc/af_rxrpc.c | 177 +++ net/rxrpc/ar-call.c | 158 +++ net/rxrpc/ar-connection.c | 17 net/rxrpc/ar-internal.h | 22 ++--- net/rxrpc/ar-output.c | 186 +- 6 files changed, 234 insertions(+), 344 deletions(-) diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h index a53915cd5581..1e8f216e2cf1 100644 --- a/include/linux/rxrpc.h +++ b/include/linux/rxrpc.h @@ -40,16 +40,18 @@ struct sockaddr_rxrpc { /* * RxRPC control messages + * - If neither abort or accept are specified, the message is a data message. * - terminal messages mean that a user call ID tag can be recycled + * - s/r/- indicate whether these are applicable to sendmsg() and/or recvmsg() */ -#define RXRPC_USER_CALL_ID 1 /* user call ID specifier */ -#define RXRPC_ABORT2 /* abort request / notification [terminal] */ -#define RXRPC_ACK 3 /* [Server] RPC op final ACK received [terminal] */ -#define RXRPC_NET_ERROR5 /* network error received [terminal] */ -#define RXRPC_BUSY 6 /* server busy received [terminal] */ -#define RXRPC_LOCAL_ERROR 7 /* local error generated [terminal] */ -#define RXRPC_NEW_CALL 8 /* [Server] new incoming call notification */ -#define RXRPC_ACCEPT 9 /* [Server] accept request */ +#define RXRPC_USER_CALL_ID 1 /* sr: user call ID specifier */ +#define RXRPC_ABORT2 /* sr: abort request / notification [terminal] */ +#define RXRPC_ACK 3 /* -r: [Service] RPC op final ACK received [terminal] */ +#define RXRPC_NET_ERROR5 /* -r: network error received [terminal] */ +#define RXRPC_BUSY 6 /* -r: server busy received [terminal] */ +#define RXRPC_LOCAL_ERROR 7 /* -r: local error generated [terminal] */ +#define RXRPC_NEW_CALL 8 /* -r: [Service] new incoming call notification */ +#define RXRPC_ACCEPT 9 /* s-: [Service] accept request */ /* * RxRPC security levels diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index e45e94ca030f..484b5ee16a50 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -137,33 +137,33 @@ static int rxrpc_bind(struct socket *sock, struct sockaddr *saddr, int len) lock_sock(&rx->sk); - if (rx->sk.sk_state != RXRPC_UNCONNECTED) { + if (rx->sk.sk_state != RXRPC_UNBOUND) { ret = -EINVAL; goto error_unlock; } memcpy(&rx->srx, srx, sizeof(rx->srx)); - /* Find or create a local transport endpoint to use */ local = rxrpc_lookup_local(&rx->srx); if (IS_ERR(local)) { ret = PTR_ERR(local); goto error_unlock; } - rx->local = local; - if (srx->srx_service) { + if (rx->srx.srx_service) { write_lock_bh(&local->services_lock); list_for_each_entry(prx, &local->services, listen_link) { - if (prx->srx.srx_service == srx->srx_service) + if (prx->srx.srx_service == rx->srx.srx_service) goto service_in_use; } + rx->local = local; list_add_tail(&rx->listen_link, &loc
[PATCH] ptp: use memdup_user().
Use memdup_user to duplicate a memory region from user-space to kernel-space, instead of open coding using kmalloc & copy_from_user. Signed-off-by: Muhammad Falak R Wani --- drivers/ptp/ptp_chardev.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c index 579fd65..0b1ac6b 100644 --- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -208,14 +208,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg) break; case PTP_SYS_OFFSET: - sysoff = kmalloc(sizeof(*sysoff), GFP_KERNEL); - if (!sysoff) { - err = -ENOMEM; - break; - } - if (copy_from_user(sysoff, (void __user *)arg, - sizeof(*sysoff))) { - err = -EFAULT; + sysoff = memdup_user((void __user *)arg, sizeof(*sysoff)); + if (IS_ERR(sysoff)) { + err = PTR_ERR(sysoff); break; } if (sysoff->n_samples > PTP_MAX_SAMPLES) { -- 1.9.1
[PATCH] wan: cosa: use memdup_user().
Use memdup_user to duplicate a memory region from user-space to kernel-space, instead of open coding using kmalloc & copy_from_user. Signed-off-by: Muhammad Falak R Wani --- drivers/net/wan/cosa.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/net/wan/cosa.c b/drivers/net/wan/cosa.c index b87fe0a..fb37439 100644 --- a/drivers/net/wan/cosa.c +++ b/drivers/net/wan/cosa.c @@ -876,15 +876,10 @@ static ssize_t cosa_write(struct file *file, count = COSA_MTU; /* Allocate the buffer */ - kbuf = kmalloc(count, GFP_KERNEL|GFP_DMA); - if (kbuf == NULL) { + kbuf = memdup_user(buf, count); + if (IS_ERR(kbuf)) { up(&chan->wsem); - return -ENOMEM; - } - if (copy_from_user(kbuf, buf, count)) { - up(&chan->wsem); - kfree(kbuf); - return -EFAULT; + return PTR_ERR(kbuf); } chan->tx_status=0; cosa_start_tx(chan, kbuf, count); -- 1.9.1
[PATCH V2 4.8 1/2] brcmutil: add field storing control channel to the struct brcmu_chan
Our d11 code supports encoding/decoding channel info into/from chanspec format used by firmware. Current implementation is quite misleading because of the way "chnum" field is used. When encoding channel info, "chnum" has to be filled by a caller with *center* channel number. However when decoding chanspec the same field is filled with a *control* channel number. 1) This can be confusing. It's expected for information to be the same after encoding and decoding. 2) It doesn't allow accessing all info when decoding. Some functions may need to know both channel numbers, e.g. cfg80211 callback getting current channel. Solve this by adding a separated field for control channel. Signed-off-by: Rafał Miłecki Reviewed-by: Arend van Spriel --- V2: Update commit description message. --- .../broadcom/brcm80211/brcmfmac/cfg80211.c | 17 + .../net/wireless/broadcom/brcm80211/brcmfmac/p2p.c | 10 +- .../net/wireless/broadcom/brcm80211/brcmutil/d11.c | 18 ++ .../broadcom/brcm80211/include/brcmu_d11.h | 22 ++ 4 files changed, 46 insertions(+), 21 deletions(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index d0631b6..597495d 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -2734,7 +2734,7 @@ static s32 brcmf_inform_single_bss(struct brcmf_cfg80211_info *cfg, if (!bi->ctl_ch) { ch.chspec = le16_to_cpu(bi->chanspec); cfg->d11inf.decchspec(&ch); - bi->ctl_ch = ch.chnum; + bi->ctl_ch = ch.control_ch_num; } channel = bi->ctl_ch; @@ -2852,7 +2852,7 @@ static s32 brcmf_inform_ibss(struct brcmf_cfg80211_info *cfg, else band = wiphy->bands[NL80211_BAND_5GHZ]; - freq = ieee80211_channel_to_frequency(ch.chnum, band->band); + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band->band); cfg->channel = freq; notify_channel = ieee80211_get_channel(wiphy, freq); @@ -2862,7 +2862,7 @@ static s32 brcmf_inform_ibss(struct brcmf_cfg80211_info *cfg, notify_ielen = le32_to_cpu(bi->ie_length); notify_signal = (s16)le16_to_cpu(bi->RSSI) * 100; - brcmf_dbg(CONN, "channel: %d(%d)\n", ch.chnum, freq); + brcmf_dbg(CONN, "channel: %d(%d)\n", ch.control_ch_num, freq); brcmf_dbg(CONN, "capability: %X\n", notify_capability); brcmf_dbg(CONN, "beacon interval: %d\n", notify_interval); brcmf_dbg(CONN, "signal: %d\n", notify_signal); @@ -5280,7 +5280,7 @@ brcmf_bss_roaming_done(struct brcmf_cfg80211_info *cfg, else band = wiphy->bands[NL80211_BAND_5GHZ]; - freq = ieee80211_channel_to_frequency(ch.chnum, band->band); + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band->band); notify_channel = ieee80211_get_channel(wiphy, freq); done: @@ -5802,14 +5802,15 @@ static int brcmf_construct_chaninfo(struct brcmf_cfg80211_info *cfg, channel = band->channels; index = band->n_channels; for (j = 0; j < band->n_channels; j++) { - if (channel[j].hw_value == ch.chnum) { + if (channel[j].hw_value == ch.control_ch_num) { index = j; break; } } channel[index].center_freq = - ieee80211_channel_to_frequency(ch.chnum, band->band); - channel[index].hw_value = ch.chnum; + ieee80211_channel_to_frequency(ch.control_ch_num, + band->band); + channel[index].hw_value = ch.control_ch_num; /* assuming the chanspecs order is HT20, * HT40 upper, HT40 lower, and VHT80. @@ -5911,7 +5912,7 @@ static int brcmf_enable_bw40_2g(struct brcmf_cfg80211_info *cfg) if (WARN_ON(ch.bw != BRCMU_CHAN_BW_40)) continue; for (j = 0; j < band->n_channels; j++) { - if (band->channels[j].hw_value == ch.chnum) + if (band->channels[j].hw_value == ch.control_ch_num) break; } if (WARN_ON(j == band->n_channels)) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c index a70cda6..1652a48 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c @@ -1246,7 +1246,7 @@ bool brcmf_p2p_scan_finding_common_channel(struct brcmf_cfg80211_info *cfg,
[PATCH V2 4.8 2/2] brcmfmac: support get_channel cfg80211 callback
This is important for brcmfmac as some of released firmwares (e.g. brcmfmac4366b-pcie.bin) may pick different channel than requested. This has been tested with BCM4366B1 in D-Link DIR-885L. Signed-off-by: Rafał Miłecki --- V2: Check if ndev isn't NULL, update description. --- .../broadcom/brcm80211/brcmfmac/cfg80211.c | 63 ++ 1 file changed, 63 insertions(+) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index 597495d..299a404 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -4892,6 +4892,68 @@ exit: return err; } +static int brcmf_cfg80211_get_channel(struct wiphy *wiphy, + struct wireless_dev *wdev, + struct cfg80211_chan_def *chandef) +{ + struct brcmf_cfg80211_info *cfg = wiphy_to_cfg(wiphy); + struct net_device *ndev = wdev->netdev; + struct brcmf_if *ifp; + struct brcmu_chan ch; + enum nl80211_band band = 0; + enum nl80211_chan_width width = 0; + u32 chanspec; + int freq, err; + + if (!ndev) + return -ENODEV; + ifp = netdev_priv(ndev); + + err = brcmf_fil_iovar_int_get(ifp, "chanspec", &chanspec); + if (err) { + brcmf_err("chanspec failed (%d)\n", err); + return err; + } + + ch.chspec = chanspec; + cfg->d11inf.decchspec(&ch); + + switch (ch.band) { + case BRCMU_CHAN_BAND_2G: + band = NL80211_BAND_2GHZ; + break; + case BRCMU_CHAN_BAND_5G: + band = NL80211_BAND_5GHZ; + break; + } + + switch (ch.bw) { + case BRCMU_CHAN_BW_80: + width = NL80211_CHAN_WIDTH_80; + break; + case BRCMU_CHAN_BW_40: + width = NL80211_CHAN_WIDTH_40; + break; + case BRCMU_CHAN_BW_20: + width = NL80211_CHAN_WIDTH_20; + break; + case BRCMU_CHAN_BW_80P80: + width = NL80211_CHAN_WIDTH_80P80; + break; + case BRCMU_CHAN_BW_160: + width = NL80211_CHAN_WIDTH_160; + break; + } + + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band); + chandef->chan = ieee80211_get_channel(wiphy, freq); + chandef->width = width; + chandef->center_freq1 = ieee80211_channel_to_frequency(ch.chnum, band); + chandef->center_freq2 = 0; + + return 0; +} + static int brcmf_cfg80211_crit_proto_start(struct wiphy *wiphy, struct wireless_dev *wdev, enum nl80211_crit_proto_id proto, @@ -5054,6 +5116,7 @@ static struct cfg80211_ops brcmf_cfg80211_ops = { .mgmt_tx = brcmf_cfg80211_mgmt_tx, .remain_on_channel = brcmf_p2p_remain_on_channel, .cancel_remain_on_channel = brcmf_cfg80211_cancel_remain_on_channel, + .get_channel = brcmf_cfg80211_get_channel, .start_p2p_device = brcmf_p2p_start_device, .stop_p2p_device = brcmf_p2p_stop_device, .crit_proto_start = brcmf_cfg80211_crit_proto_start, -- 1.8.4.5
Re: [PATCH 4.8 2/2] brcmfmac: support get_channel cfg80211 callback
On 20 May 2016 at 09:42, Arend Van Spriel wrote: > On 19-5-2016 13:02, Rafał Miłecki wrote: >> This is important for brcmfmac as the firmware may pick different >> channel than requested. This has been tested with BCM4366B1 (in D-Link >> DIR-885L). > > Can you elaborate? Is this for AP or STA mode? This happens when using 5 GHz PHY with one AP interface. It seems firmware respects: band, channel width & control channel location. However it picks center channel in a more or less random way. E.g. I configured hostapd to setup AP using 36 control channel with VHT80 (chanspec 0xe02a). Almost every time I was restarting AP I got firmware picking different chanspecs, all cases listed below: 0xe03a BND_5G | BW_80 | SB_LL | 58 0xe06a BND_5G | BW_80 | SB_LL | 106 0xe07a BND_5G | BW_80 | SB_LL | 122 0xe09b BND_5G | BW_80 | SB_LL | 155 I'm a bit disappointed seeing this FullMAC firmware doing such tricks on its own. I would prefer to simply relay on hostapd doing this kind of channel switching. I saw many times hostapd e.g. respecting my 40/80 MHz channel but switching control one in order to avoid collisions with another AP's control frames on the same frequency. On the other hand I never got this problem with BCM43602 using Broadcom's official firmware, so it's some new feature you enabled when building brcmfmac4366b-pcie.bin. >> Signed-off-by: Rafał Miłecki >> --- >> .../broadcom/brcm80211/brcmfmac/cfg80211.c | 59 >> ++ >> 1 file changed, 59 insertions(+) >> >> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c >> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c >> index 597495d..4fb9e3a 100644 >> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c >> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c >> @@ -4892,6 +4892,64 @@ exit: >> return err; >> } >> >> +static int brcmf_cfg80211_get_channel(struct wiphy *wiphy, >> + struct wireless_dev *wdev, >> + struct cfg80211_chan_def *chandef) >> +{ >> + struct brcmf_cfg80211_info *cfg = wiphy_to_cfg(wiphy); >> + struct net_device *ndev = wdev->netdev; >> + struct brcmf_if *ifp = netdev_priv(ndev); > > Can this operation be done on a P2P_DEVICE interface, ie. wdev->netdev > == NULL? I don't have any experience with P2P, thanks a lot for pointing this to me! >> + struct brcmu_chan ch; >> + enum nl80211_band band = 0; >> + enum nl80211_chan_width width = 0; >> + u32 chanspec; >> + int freq, err; >> + >> + err = brcmf_fil_iovar_int_get(ifp, "chanspec", &chanspec); >> + if (err) { >> + brcmf_err("chanspec failed (%d)\n", err); >> + return err; >> + } >> + >> + ch.chspec = chanspec; >> + cfg->d11inf.decchspec(&ch); >> + >> + switch (ch.band) { >> + case BRCMU_CHAN_BAND_2G: >> + band = NL80211_BAND_2GHZ; >> + break; >> + case BRCMU_CHAN_BAND_5G: >> + band = NL80211_BAND_5GHZ; >> + break; >> + } >> + >> + switch (ch.bw) { >> + case BRCMU_CHAN_BW_80: >> + width = NL80211_CHAN_WIDTH_80; >> + break; >> + case BRCMU_CHAN_BW_40: >> + width = NL80211_CHAN_WIDTH_40; >> + break; >> + case BRCMU_CHAN_BW_20: >> + width = NL80211_CHAN_WIDTH_20; >> + break; >> + case BRCMU_CHAN_BW_80P80: >> + width = NL80211_CHAN_WIDTH_80P80; >> + break; > > Not much sense to support this given that center_freq2 is set to zero below. OK.
Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)
Hi, On Fri, May 20, 2016 at 12:36 PM, Marc Zyngier wrote: > On 20/05/16 11:30, Andre Heider wrote: >> Hi, >> >> On Fri, May 20, 2016 at 10:14 AM, Giuseppe CAVALLARO >> wrote: >>> On 5/20/2016 9:56 AM, Marc Zyngier wrote: On 20/05/16 06:44, Andre Heider wrote: > > Giuseppe, Alexandre, et al., > > On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier > wrote: >> >> On Thu, 17 Mar 2016 00:56:40 +0100 >> Bert Lindner wrote: >>> >>> On 2016-03-16 18:42, Marc Zyngier wrote: On 16/03/16 15:10, Bert Lindner wrote: > > On 2016-03-16 14:10, Andreas Färber wrote: >> >> Am 16.03.2016 um 13:09 schrieb Robin Murphy: >>> >>> On 16/03/16 11:39, Marc Zyngier wrote: On 16/03/16 11:19, Bert Lindner wrote: > > ... > > For the board sun7i-a20-olinuxino-lime2, there seems to be a > problem > with the eth0 PHY in mainline kernel 4.5.0 that developed since > 4.5.0-rc7. Ethernet does not work, although eth0 is reported: > > ... > > [9.767125] NET: Registered protocol family 10 > [ 10.357405] libphy: PHY stmmac-0: not found > [ 10.362382] eth0: Could not attach to PHY > [ 10.366557] stmmac_open: Cannot attach to PHY (error: -19) > > ... >> >> v4 fixes for 4.5 are here: >> >> https://patchwork.ozlabs.org/patch/598195/ (revert) >> https://patchwork.ozlabs.org/patch/598196/ > > ... Good to know, thanks. Could you also give the potential fix a go (as mentioned by Andreas)? Just to make sure that whatever gets merged next will actually fix the issue. >>> >>> >>> Yes sure, it took a while because I had to travel. Confirmed, the >>> v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2: >>> >>> root@lime2-079f:~# cat /proc/version >>> Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version >>> 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC >>> 2016 >>> >>> dmesg: >>> [8.245273] NET: Registered protocol family 10 >>> [9.297406] RX IPC Checksum Offload disabled >>> [9.297460] No MAC Management Counters available >>> [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready >>> [ 16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up - >>> 1Gbps/Full - flow control rx/tx >>> [ 16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready >>> >>> The board is connected to my laptop rather than to a switch, so that >>> might be where the flow control message comes from (not sure). Anyway >>> ethernet works. >> >> >> Cool, many thanks for taking the time to test and report. >> >> Hopefully Giuseppe will get this merged quickly enough in mainline, and >> it should then trickle into a 4.5-stable release (cc-ing stable on >> these patches would probably be a good idea, BTW). > > > stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since > v4.5 [0], including all five stable releases :( All the A20 platforms are dead, actually. > The v4.5 patches quoted above are already +4 weeks old, could we > please get them into stable? For that, the maintainer would have needed to CC stable, which he didn't. I'd expect someone who cares to send these patches to stable. It'd be better if the maintainer would do it himself though. >>> >>> >>> sure, I can send the patches to stable (sorry if I missed to add >>> stable ML on CC). >>> >>> Andre, I have not clear if the train of patches actually fix the >>> issue or if you need my support to fix something else. In that case >>> I need some input for debugging (e.g. kernel log). >> >> Bert already confirmed that those two patches fixes stmmac on his >> Lime2, so I assume that it fixes the issue for all A20 platforms. >> >>> let me know, is it enough to re-send the patches only? >> >> Just a resend with cc:stable :) > > Not quite. Please read Documentation/stable_kernel_rules.txt, and the > section that concerns networking patches (and then consult > Documentation/networking/netdev-FAQ.txt which has all the details). Right, it's just been a while since I sent a patch. What I really meant was "do what's neccessary for -stable inclusion" :) Regards, Andre
Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)
On 20/05/16 11:30, Andre Heider wrote: > Hi, > > On Fri, May 20, 2016 at 10:14 AM, Giuseppe CAVALLARO > wrote: >> On 5/20/2016 9:56 AM, Marc Zyngier wrote: >>> >>> On 20/05/16 06:44, Andre Heider wrote: Giuseppe, Alexandre, et al., On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier wrote: > > On Thu, 17 Mar 2016 00:56:40 +0100 > Bert Lindner wrote: >> >> On 2016-03-16 18:42, Marc Zyngier wrote: >>> >>> On 16/03/16 15:10, Bert Lindner wrote: On 2016-03-16 14:10, Andreas Färber wrote: > > Am 16.03.2016 um 13:09 schrieb Robin Murphy: >> >> On 16/03/16 11:39, Marc Zyngier wrote: >>> >>> On 16/03/16 11:19, Bert Lindner wrote: ... For the board sun7i-a20-olinuxino-lime2, there seems to be a problem with the eth0 PHY in mainline kernel 4.5.0 that developed since 4.5.0-rc7. Ethernet does not work, although eth0 is reported: ... [9.767125] NET: Registered protocol family 10 [ 10.357405] libphy: PHY stmmac-0: not found [ 10.362382] eth0: Could not attach to PHY [ 10.366557] stmmac_open: Cannot attach to PHY (error: -19) ... > > v4 fixes for 4.5 are here: > > https://patchwork.ozlabs.org/patch/598195/ (revert) > https://patchwork.ozlabs.org/patch/598196/ ... >>> >>> Good to know, thanks. Could you also give the potential fix a go (as >>> mentioned by Andreas)? Just to make sure that whatever gets merged >>> next >>> will actually fix the issue. >> >> >> Yes sure, it took a while because I had to travel. Confirmed, the >> v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2: >> >> root@lime2-079f:~# cat /proc/version >> Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version >> 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC >> 2016 >> >> dmesg: >> [8.245273] NET: Registered protocol family 10 >> [9.297406] RX IPC Checksum Offload disabled >> [9.297460] No MAC Management Counters available >> [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready >> [ 16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up - >> 1Gbps/Full - flow control rx/tx >> [ 16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready >> >> The board is connected to my laptop rather than to a switch, so that >> might be where the flow control message comes from (not sure). Anyway >> ethernet works. > > > Cool, many thanks for taking the time to test and report. > > Hopefully Giuseppe will get this merged quickly enough in mainline, and > it should then trickle into a 4.5-stable release (cc-ing stable on > these patches would probably be a good idea, BTW). stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since v4.5 [0], including all five stable releases :( >>> >>> >>> All the A20 platforms are dead, actually. >>> The v4.5 patches quoted above are already +4 weeks old, could we please get them into stable? >>> >>> >>> For that, the maintainer would have needed to CC stable, which he >>> didn't. I'd expect someone who cares to send these patches to stable. >>> It'd be better if the maintainer would do it himself though. >> >> >> sure, I can send the patches to stable (sorry if I missed to add >> stable ML on CC). >> >> Andre, I have not clear if the train of patches actually fix the >> issue or if you need my support to fix something else. In that case >> I need some input for debugging (e.g. kernel log). > > Bert already confirmed that those two patches fixes stmmac on his > Lime2, so I assume that it fixes the issue for all A20 platforms. > >> let me know, is it enough to re-send the patches only? > > Just a resend with cc:stable :) Not quite. Please read Documentation/stable_kernel_rules.txt, and the section that concerns networking patches (and then consult Documentation/networking/netdev-FAQ.txt which has all the details). Thanks, M. -- Jazz is not dead. It just smells funny...
Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)
Hi, On Fri, May 20, 2016 at 10:14 AM, Giuseppe CAVALLARO wrote: > On 5/20/2016 9:56 AM, Marc Zyngier wrote: >> >> On 20/05/16 06:44, Andre Heider wrote: >>> >>> Giuseppe, Alexandre, et al., >>> >>> On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier >>> wrote: On Thu, 17 Mar 2016 00:56:40 +0100 Bert Lindner wrote: > > On 2016-03-16 18:42, Marc Zyngier wrote: >> >> On 16/03/16 15:10, Bert Lindner wrote: >>> >>> On 2016-03-16 14:10, Andreas Färber wrote: Am 16.03.2016 um 13:09 schrieb Robin Murphy: > > On 16/03/16 11:39, Marc Zyngier wrote: >> >> On 16/03/16 11:19, Bert Lindner wrote: >>> >>> ... >>> >>> For the board sun7i-a20-olinuxino-lime2, there seems to be a >>> problem >>> with the eth0 PHY in mainline kernel 4.5.0 that developed since >>> 4.5.0-rc7. Ethernet does not work, although eth0 is reported: >>> >>> ... >>> >>> [9.767125] NET: Registered protocol family 10 >>> [ 10.357405] libphy: PHY stmmac-0: not found >>> [ 10.362382] eth0: Could not attach to PHY >>> [ 10.366557] stmmac_open: Cannot attach to PHY (error: -19) >>> >>> ... v4 fixes for 4.5 are here: https://patchwork.ozlabs.org/patch/598195/ (revert) https://patchwork.ozlabs.org/patch/598196/ >>> >>> ... >> >> Good to know, thanks. Could you also give the potential fix a go (as >> mentioned by Andreas)? Just to make sure that whatever gets merged >> next >> will actually fix the issue. > > > Yes sure, it took a while because I had to travel. Confirmed, the > v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2: > > root@lime2-079f:~# cat /proc/version > Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version > 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC > 2016 > > dmesg: > [8.245273] NET: Registered protocol family 10 > [9.297406] RX IPC Checksum Offload disabled > [9.297460] No MAC Management Counters available > [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up - > 1Gbps/Full - flow control rx/tx > [ 16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > > The board is connected to my laptop rather than to a switch, so that > might be where the flow control message comes from (not sure). Anyway > ethernet works. Cool, many thanks for taking the time to test and report. Hopefully Giuseppe will get this merged quickly enough in mainline, and it should then trickle into a 4.5-stable release (cc-ing stable on these patches would probably be a good idea, BTW). >>> >>> >>> stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since >>> v4.5 [0], including all five stable releases :( >> >> >> All the A20 platforms are dead, actually. >> >>> The v4.5 patches quoted above are already +4 weeks old, could we >>> please get them into stable? >> >> >> For that, the maintainer would have needed to CC stable, which he >> didn't. I'd expect someone who cares to send these patches to stable. >> It'd be better if the maintainer would do it himself though. > > > sure, I can send the patches to stable (sorry if I missed to add > stable ML on CC). > > Andre, I have not clear if the train of patches actually fix the > issue or if you need my support to fix something else. In that case > I need some input for debugging (e.g. kernel log). Bert already confirmed that those two patches fixes stmmac on his Lime2, so I assume that it fixes the issue for all A20 platforms. > let me know, is it enough to re-send the patches only? Just a resend with cc:stable :) Thanks! Andre
Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
On Fri, May 20, 2016 at 11:20:04AM +0200, Jiri Benc wrote: > On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote: [...] > > 3. With regards to the mirroring part of your question, I need to check > >on that and possibly its broken. But my thinking is that a mirroring > >vport would regarded in the same way as any other vport in this respect > >and the presence of the "layer3" flag would control whether an Ethernet > >header is present or not. > > > >It may be the case that its not possible to use a tunnel vport as a > >mirroring vport. And as all other vports currently do not support > >"layer3" then currently an Ethernet header would always be present > >on output to a mirror. > > We can just require a mirror port to be always L2 and output the L3 > packets with zero Ethernet addresses. For mirroring, this should be > okay(?) Yes, I expect that is a good approach. I'll look over the code to see about making that so if its not already the case.
Re: [PATCH] net: pegasus: remove unused variables and labels
On Friday 20 May 2016 12:32:23 Petko Manolov wrote: > Guys, come on. This code is not dead. This code is executed every time an > ethernet packet is received. It takes care of various error statistics. More > importantly, it sends the actual (reported by the adapter) packet length to > the > network layer along with the packet. > > This patch removes skb_put() and netif_rx() calls and effectively kills the > RX > path. Not to mention that the driver was not even compiled before sending > the > patch upstream. > > The only sensible, although cosmetic, change would be to replace: > > if (!count || count < 4) > > with > > if (count < 4) > > even though GCC takes care and it optimizes away "!count" condition. > > Please revert this patch before Linus pulls from the network tree. > Agreed. I failed to check the commit that introduced the warning for the more serious problem. Please revert e00be9e4d0ff, it just makes no sense. Arnd
RE: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
Hi Arnd, Any comments? Please reply when you see the email since we want this eSDHC issue to be fixed soon. All the patches are Freescale-specific and is to fix the eSDHC driver. Could we let them merged first if you were talking about a new way of abstracting getting SoC version. Thanks :) Best regards, Yangbo Lu > -Original Message- > From: Scott Wood [mailto:o...@buserror.net] > Sent: Wednesday, May 11, 2016 11:26 AM > To: Arnd Bergmann; linux-arm-ker...@lists.infradead.org > Cc: Yangbo Lu; linuxppc-...@lists.ozlabs.org; Mark Rutland; > devicet...@vger.kernel.org; ulf.hans...@linaro.org; Russell King; Bhupesh > Sharma; netdev@vger.kernel.org; Joerg Roedel; Kumar Gala; linux- > m...@vger.kernel.org; linux-ker...@vger.kernel.org; Yang-Leo Li; > io...@lists.linux-foundation.org; Rob Herring; linux-...@vger.kernel.org; > Claudiu Manoil; Santosh Shilimkar; Xiaobo Xie; linux-...@vger.kernel.org; > Qiang Zhao > Subject: Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240- > R1.0-R2.0 > > On Thu, 2016-05-05 at 13:10 +0200, Arnd Bergmann wrote: > > On Thursday 05 May 2016 09:41:32 Yangbo Lu wrote: > > > > -Original Message- > > > > From: Arnd Bergmann [mailto:a...@arndb.de] > > > > Sent: Thursday, May 05, 2016 4:32 PM > > > > To: linuxppc-...@lists.ozlabs.org > > > > Cc: Yangbo Lu; linux-...@vger.kernel.org; > > > > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; > > > > linux-ker...@vger.kernel.org; linux-...@vger.kernel.org; > > > > linux-...@vger.kernel.org; iommu@lists.linux- foundation.org; > > > > netdev@vger.kernel.org; Mark Rutland; ulf.hans...@linaro.org; > > > > Russell King; Bhupesh Sharma; Joerg Roedel; Santosh Shilimkar; > > > > Yang-Leo Li; Scott Wood; Rob Herring; Claudiu Manoil; Kumar Gala; > > > > Xiaobo Xie; Qiang Zhao > > > > Subject: Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for > > > > T4240- > > > > R1.0-R2.0 > > > > > > > > On Thursday 05 May 2016 11:12:30 Yangbo Lu wrote: > > > > > IIRC, it is the same IP block as i.MX and Arnd's point is this > > > > > won't even compile on !PPC. It is things like this that prevent > > > > > sharing the driver. > > > > > > The whole point of using the MMIO SVR instead of the PPC SPR is so > > > that it will work on ARM... The guts driver should build on any > > > platform as long as OF is enabled, and if it doesn't find a node to > > > bind to it will return 0 for SVR, and the eSDHC driver will continue > > > (after printing an error that should be removed) without the ability > > > to test for errata based on SVR. > > > > It feels like a bad design to have to come up with a different method > > for each SoC type here when they all do the same thing and want to > > identify some variant of the chip to do device specific quirks. > > > > As far as I'm concerned, every driver in drivers/soc that needs to > > export a symbol to be used by a device driver is an indication that we > > don't have the right set of abstractions yet. There are cases that are > > not worth abstracting because the functionality is rather obscure and > > only a couple of drivers for one particular chip ever need it. > > > > Finding out the version of the SoC does not look like this case. > > I'm open to new ways of abstracting this, but can that please be > discussed after these patches are merged? This patchset is fixing a > problem, the existing abstraction is unappealing and not widely adopted, > a new abstraction is not ready, and we're only touching code for our > hardware. > > Oh, and the existing abstraction isn't even "existing". I don't see any > examples where soc_device is being used like this -- or even any way for > a driver (the one consuming the information, not the soc "driver") to get > a reference to the soc_device that's been registered short of searching > for the device object by name -- and you're asking for new functionality > in drivers/base/soc.c. > > > > > I think the first four patches take care of building for ARM, but > > > > the problem remains if you want to enable COMPILE_TEST as we need > > > > for certain automated checking. > > > > > > What specific problem is there with COMPILE_TEST? > > > > COMPILE_TEST is solvable here and the way it is implemented in this > > case (selecting FSL_GUTS from the driver) indeed looks like it works > > correctly, but it's still awkward that this means building the SoC > > specific ID stuff into the vmlinux binary for any driver that uses > > something like that for a particular SoC. > > Please keep in mind that this is a Freescale-specific driver... it's not > as if we're attaching this dependency to common SDHCI code. > > > > > > > > Dealing with Si revs is a common problem. We should have a > > > > > common solution. There is soc_device for this purpose. > > > > > > > > Exactly. The last time this came up, I think we agreed to > > > > implement a helper using glob_match() on the soc_device strings. > > > > Unfortunately this hasn't happened
Re: [PATCH] net: pegasus: remove unused variables and labels
Guys, come on. This code is not dead. This code is executed every time an ethernet packet is received. It takes care of various error statistics. More importantly, it sends the actual (reported by the adapter) packet length to the network layer along with the packet. This patch removes skb_put() and netif_rx() calls and effectively kills the RX path. Not to mention that the driver was not even compiled before sending the patch upstream. The only sensible, although cosmetic, change would be to replace: if (!count || count < 4) with if (count < 4) even though GCC takes care and it optimizes away "!count" condition. Please revert this patch before Linus pulls from the network tree. Petko On 16-05-20 10:22:30, Arnd Bergmann wrote: > The latest dead-code removal was slightly incomplete and > left a few things behind that we now get compiler warnings > for: > > drivers/net/usb/pegasus.c: In function 'read_bulk_callback': > drivers/net/usb/pegasus.c:475:1: error: label 'goon' defined but not used > [-Werror=unused-label] > drivers/net/usb/pegasus.c:446:8: error: unused variable 'pkt_len' > [-Werror=unused-variable] > drivers/net/usb/pegasus.c:445:6: error: unused variable 'buf' > [-Werror=unused-variable] > drivers/net/usb/pegasus.c:443:17: error: unused variable 'count' > [-Werror=unused-variable] > > This removes them as well. > > Signed-off-by: Arnd Bergmann > Fixes: e00be9e4d0ff ("net: pegasus: remove dead coding") > --- > drivers/net/usb/pegasus.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c > index 1903d2e2b62d..df6d90ab7ca7 100644 > --- a/drivers/net/usb/pegasus.c > +++ b/drivers/net/usb/pegasus.c > @@ -440,10 +440,8 @@ static void read_bulk_callback(struct urb *urb) > { > pegasus_t *pegasus = urb->context; > struct net_device *net; > - int rx_status, count = urb->actual_length; > + int rx_status; > int status = urb->status; > - u8 *buf = urb->transfer_buffer; > - __u16 pkt_len; > > if (!pegasus) > return; > @@ -472,7 +470,6 @@ static void read_bulk_callback(struct urb *urb) > netif_dbg(pegasus, rx_err, net, "RX status %d\n", status); > } > > -goon: > usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb, > usb_rcvbulkpipe(pegasus->usb, 1), > pegasus->rx_skb->data, PEGASUS_MTU, > -- > 2.7.0 > >
Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote: > 1. push_eth adds an Ethernet header with all-zero addresses and >the Ethernet type as determined from skb->protocol which is in >turn determined by the tunnel header (we have discussed that >bit before). > >In principle it is pushed when needed. And this happens automatically >as controlled by user-space. > >It is possible to modify the Ethernet addresses using a custom rule. >(I need to exercise that more often.) > > 2. For the GRE part of the scenario above it is important to know that with >the accompanying user-space patch set OvS user-space the user-space >representation of a vport (from now on simply vport) may be "layer3" or >not. > >This allows OvS user-space to determine if an Ethernet header should be >present or not on receive. And if it needs to be present or not on >transmit. This allows it to automatically use pop_eth and push_eth to >control the presence of an Ethernet header so its there when it needs to >be and not when it doesn't. > >So if a GRE vport is "layer3" then no Ethernet header should be >present on transmit, regardless of where the packet came from. And >conversely if the GRE vport is not "layer3" then an Ethernet header >should be present. I think this works for me. Thanks a lot for answering my questions! > 3. With regards to the mirroring part of your connection, I need to check >on that and possibly its broken. But my thinking is that a mirroring >vport would regarded in the same way as any other vport in this respect >and the presence of the "layer3" flag would control whether an Ethernet >header is present or not. > >It may be the case that its not possible to use a tunnel vport as a >mirroring vport. And as all other vports currently do not support >"layer3" then currently an Ethernet header would always be present >on output to a mirror. We can just require a mirror port to be always L2 and output the L3 packets with zero Ethernet addresses. For mirroring, this should be okay(?) Jiri
Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
On Fri, May 20, 2016 at 10:39:39AM +0200, Jiri Benc wrote: > On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote: > > My understanding is that currently OvS handles access ports using a > > push_vlan action. > > When needed (i.e. when the packet goes to a non-access port), yes. > > > And that this patch set in conjunction with its > > user-space counterpart should ensure that a push_eth action occurs first. > > This is the context of my remarks above. > > Okay, works for me principally. If it's later found insufficient, > relaxing push_vlan and pop_vlan to work also for L3 frames is still > easily possible without breaking compatibility. Right. I'm all for allowing extension later if the need arises. > Out of curiosity (and without looking at the user space patchset), what > will the pushed Ethernet header contain? E.g., consider the following > scenario: two GRE ports, both set as access ports with the same tag, > and a mirror port mirroring everything. Now an IP packet without inner > Ethernet header is received on one of the GRE interfaces. > > Will the packet be output to the second GRE port? Will it be sent out > without the inner Ethernet header? Are custom rules necessary, or will > NORMAL take care of this? What will be sent to the mirror port? Let me take a stab at answering that without running any tests. 1. push_eth adds an Ethernet header with all-zero addresses and the Ethernet type as determined from skb->protocol which is in turn determined by the tunnel header (we have discussed that bit before). In principle it is pushed when needed. And this happens automatically as controlled by user-space. It is possible to modify the Ethernet addresses using a custom rule. (I need to exercise that more often.) 2. For the GRE part of the scenario above it is important to know that with the accompanying user-space patch set OvS user-space the user-space representation of a vport (from now on simply vport) may be "layer3" or not. This allows OvS user-space to determine if an Ethernet header should be present or not on receive. And if it needs to be present or not on transmit. This allows it to automatically use pop_eth and push_eth to control the presence of an Ethernet header so its there when it needs to be and not when it doesn't. So if a GRE vport is "layer3" then no Ethernet header should be present on transmit, regardless of where the packet came from. And conversely if the GRE vport is not "layer3" then an Ethernet header should be present. 3. With regards to the mirroring part of your connection, I need to check on that and possibly its broken. But my thinking is that a mirroring vport would regarded in the same way as any other vport in this respect and the presence of the "layer3" flag would control whether an Ethernet header is present or not. It may be the case that its not possible to use a tunnel vport as a mirroring vport. And as all other vports currently do not support "layer3" then currently an Ethernet header would always be present on output to a mirror.
Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote: > My understanding is that currently OvS handles access ports using a > push_vlan action. When needed (i.e. when the packet goes to a non-access port), yes. > And that this patch set in conjunction with its > user-space counterpart should ensure that a push_eth action occurs first. > This is the context of my remarks above. Okay, works for me principally. If it's later found insufficient, relaxing push_vlan and pop_vlan to work also for L3 frames is still easily possible without breaking compatibility. Out of curiosity (and without looking at the user space patchset), what will the pushed Ethernet header contain? E.g., consider the following scenario: two GRE ports, both set as access ports with the same tag, and a mirror port mirroring everything. Now an IP packet without inner Ethernet header is received on one of the GRE interfaces. Will the packet be output to the second GRE port? Will it be sent out without the inner Ethernet header? Are custom rules necessary, or will NORMAL take care of this? What will be sent to the mirror port? Thanks! Jiri
[PATCH] net: pegasus: remove unused variables and labels
The latest dead-code removal was slightly incomplete and left a few things behind that we now get compiler warnings for: drivers/net/usb/pegasus.c: In function 'read_bulk_callback': drivers/net/usb/pegasus.c:475:1: error: label 'goon' defined but not used [-Werror=unused-label] drivers/net/usb/pegasus.c:446:8: error: unused variable 'pkt_len' [-Werror=unused-variable] drivers/net/usb/pegasus.c:445:6: error: unused variable 'buf' [-Werror=unused-variable] drivers/net/usb/pegasus.c:443:17: error: unused variable 'count' [-Werror=unused-variable] This removes them as well. Signed-off-by: Arnd Bergmann Fixes: e00be9e4d0ff ("net: pegasus: remove dead coding") --- drivers/net/usb/pegasus.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c index 1903d2e2b62d..df6d90ab7ca7 100644 --- a/drivers/net/usb/pegasus.c +++ b/drivers/net/usb/pegasus.c @@ -440,10 +440,8 @@ static void read_bulk_callback(struct urb *urb) { pegasus_t *pegasus = urb->context; struct net_device *net; - int rx_status, count = urb->actual_length; + int rx_status; int status = urb->status; - u8 *buf = urb->transfer_buffer; - __u16 pkt_len; if (!pegasus) return; @@ -472,7 +470,6 @@ static void read_bulk_callback(struct urb *urb) netif_dbg(pegasus, rx_err, net, "RX status %d\n", status); } -goon: usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb, usb_rcvbulkpipe(pegasus->usb, 1), pegasus->rx_skb->data, PEGASUS_MTU, -- 2.7.0
Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
On Fri, May 20, 2016 at 05:11:23PM +0900, Simon Horman wrote: > On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote: > > On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote: > > > The second option does seem rather tempting although I'm not sure > > > that it actually plays out in the access-port scenario at this time. > > > > We support gre ports to be access ports currently. With conversion to > > ipgre, this needs to continue working. It's no problem for frames with > > the Ethernet header but now we have a situation where a port is tagged, > > thus the user expects that packets received on that port will behave > > accordingly. I don't think we can make some packets honor this and some > > ignore this; and we can't disallow gre to be an access port. > > > > How do you plan to solve this? By user space always pushing an ethernet > > header before push_vlan? > > Yes. That is my understanding of how OvS currently handles access ports but > I have a feeling that either I am mistaken or that you are referring to a > slightly different scenario. Hi again. I apologise for having sent my previous email a little too quickly. My understanding is that currently OvS handles access ports using a push_vlan action. And that this patch set in conjunction with its user-space counterpart should ensure that a push_eth action occurs first. This is the context of my remarks above.
Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)
Hello On 5/20/2016 9:56 AM, Marc Zyngier wrote: On 20/05/16 06:44, Andre Heider wrote: Giuseppe, Alexandre, et al., On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier wrote: On Thu, 17 Mar 2016 00:56:40 +0100 Bert Lindner wrote: On 2016-03-16 18:42, Marc Zyngier wrote: On 16/03/16 15:10, Bert Lindner wrote: On 2016-03-16 14:10, Andreas Färber wrote: Am 16.03.2016 um 13:09 schrieb Robin Murphy: On 16/03/16 11:39, Marc Zyngier wrote: On 16/03/16 11:19, Bert Lindner wrote: ... For the board sun7i-a20-olinuxino-lime2, there seems to be a problem with the eth0 PHY in mainline kernel 4.5.0 that developed since 4.5.0-rc7. Ethernet does not work, although eth0 is reported: ... [9.767125] NET: Registered protocol family 10 [ 10.357405] libphy: PHY stmmac-0: not found [ 10.362382] eth0: Could not attach to PHY [ 10.366557] stmmac_open: Cannot attach to PHY (error: -19) ... v4 fixes for 4.5 are here: https://patchwork.ozlabs.org/patch/598195/ (revert) https://patchwork.ozlabs.org/patch/598196/ ... Good to know, thanks. Could you also give the potential fix a go (as mentioned by Andreas)? Just to make sure that whatever gets merged next will actually fix the issue. Yes sure, it took a while because I had to travel. Confirmed, the v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2: root@lime2-079f:~# cat /proc/version Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC 2016 dmesg: [8.245273] NET: Registered protocol family 10 [9.297406] RX IPC Checksum Offload disabled [9.297460] No MAC Management Counters available [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready The board is connected to my laptop rather than to a switch, so that might be where the flow control message comes from (not sure). Anyway ethernet works. Cool, many thanks for taking the time to test and report. Hopefully Giuseppe will get this merged quickly enough in mainline, and it should then trickle into a 4.5-stable release (cc-ing stable on these patches would probably be a good idea, BTW). stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since v4.5 [0], including all five stable releases :( All the A20 platforms are dead, actually. The v4.5 patches quoted above are already +4 weeks old, could we please get them into stable? For that, the maintainer would have needed to CC stable, which he didn't. I'd expect someone who cares to send these patches to stable. It'd be better if the maintainer would do it himself though. sure, I can send the patches to stable (sorry if I missed to add stable ML on CC). Andre, I have not clear if the train of patches actually fix the issue or if you need my support to fix something else. In that case I need some input for debugging (e.g. kernel log). let me know, is it enough to re-send the patches only? Regards Peppe Thanks, M.
Re: [net-next PATCH V1 0/3] net: enable use of kmem_cache_alloc_bulk in network stack
On Mon, 09 May 2016 15:44:24 +0200 Jesper Dangaard Brouer wrote: > This patchset enables use of the slab/kmem_cache bulk alloc API, and > completes the use the slab/kmem_cache bulking API in the network stack. > > I've not included the patches that introduce a SKB bulk hint, which > would beneficial for drivers like mlx5. Lets see if we can agree on > this patchset first. Conclusion: Guess we could not agree on this patchset. The main problem seems to be that the patch always bulk allocated 8 SKBs, without knowing if we actually need them. My earlier patchset also included a "hint" interface, as mlx5 driver knows after processing the RX queue, how many SKBs it needs, thus it can request to bulk alloc the needed amount. (The only problem with mlx5 is that preallocating SKBs into it's RX ring is a broken scheme). A better scheme would be, if the driver on RX knows how many packets are available in the RX queue. Then we can bulk alloc the needed amount of SKBs. Thus, we can circle back to this once the driver can provide this information. (This goes nicely hand-in-hand with my idea of pulling out avail RX packet-pages from the RX ring, and start prefetch'es to hide the cache-misses). -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote: > On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote: > > The second option does seem rather tempting although I'm not sure > > that it actually plays out in the access-port scenario at this time. > > We support gre ports to be access ports currently. With conversion to > ipgre, this needs to continue working. It's no problem for frames with > the Ethernet header but now we have a situation where a port is tagged, > thus the user expects that packets received on that port will behave > accordingly. I don't think we can make some packets honor this and some > ignore this; and we can't disallow gre to be an access port. > > How do you plan to solve this? By user space always pushing an ethernet > header before push_vlan? Yes. That is my understanding of how OvS currently handles access ports but I have a feeling that either I am mistaken or that you are referring to a slightly different scenario.
Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote: > The second option does seem rather tempting although I'm not sure > that it actually plays out in the access-port scenario at this time. We support gre ports to be access ports currently. With conversion to ipgre, this needs to continue working. It's no problem for frames with the Ethernet header but now we have a situation where a port is tagged, thus the user expects that packets received on that port will behave accordingly. I don't think we can make some packets honor this and some ignore this; and we can't disallow gre to be an access port. How do you plan to solve this? By user space always pushing an ethernet header before push_vlan? Jiri
Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)
On 20/05/16 06:44, Andre Heider wrote: > Giuseppe, Alexandre, et al., > > On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier wrote: >> On Thu, 17 Mar 2016 00:56:40 +0100 >> Bert Lindner wrote: >>> On 2016-03-16 18:42, Marc Zyngier wrote: On 16/03/16 15:10, Bert Lindner wrote: > On 2016-03-16 14:10, Andreas Färber wrote: >> Am 16.03.2016 um 13:09 schrieb Robin Murphy: >>> On 16/03/16 11:39, Marc Zyngier wrote: On 16/03/16 11:19, Bert Lindner wrote: > ... > For the board sun7i-a20-olinuxino-lime2, there seems to be a problem > with the eth0 PHY in mainline kernel 4.5.0 that developed since > 4.5.0-rc7. Ethernet does not work, although eth0 is reported: > ... > [9.767125] NET: Registered protocol family 10 > [ 10.357405] libphy: PHY stmmac-0: not found > [ 10.362382] eth0: Could not attach to PHY > [ 10.366557] stmmac_open: Cannot attach to PHY (error: -19) > ... >> v4 fixes for 4.5 are here: >> >> https://patchwork.ozlabs.org/patch/598195/ (revert) >> https://patchwork.ozlabs.org/patch/598196/ > ... Good to know, thanks. Could you also give the potential fix a go (as mentioned by Andreas)? Just to make sure that whatever gets merged next will actually fix the issue. >>> >>> Yes sure, it took a while because I had to travel. Confirmed, the >>> v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2: >>> >>> root@lime2-079f:~# cat /proc/version >>> Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version >>> 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC 2016 >>> >>> dmesg: >>> [8.245273] NET: Registered protocol family 10 >>> [9.297406] RX IPC Checksum Offload disabled >>> [9.297460] No MAC Management Counters available >>> [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready >>> [ 16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up - >>> 1Gbps/Full - flow control rx/tx >>> [ 16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready >>> >>> The board is connected to my laptop rather than to a switch, so that >>> might be where the flow control message comes from (not sure). Anyway >>> ethernet works. >> >> Cool, many thanks for taking the time to test and report. >> >> Hopefully Giuseppe will get this merged quickly enough in mainline, and >> it should then trickle into a 4.5-stable release (cc-ing stable on >> these patches would probably be a good idea, BTW). > > stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since > v4.5 [0], including all five stable releases :( All the A20 platforms are dead, actually. > The v4.5 patches quoted above are already +4 weeks old, could we > please get them into stable? For that, the maintainer would have needed to CC stable, which he didn't. I'd expect someone who cares to send these patches to stable. It'd be better if the maintainer would do it himself though. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH] net: alx: use custom skb allocator
Hi, Please ignore this patch. I found the problem and made the patch with kernel 4.4 with Ubuntu 12.04 on Lenovo Y580. After submitting the patch, I tried to google the datasheet for Atheros 8161 dev, and found there was already a kernel bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=70761 that had a fix from Jarod Wilson which get merged in v4.5 (commit c406700cd) Per my test, the new 4.6.0 kernel works fine with Jarod's patch. Thanks, Feng On Fri, May 20, 2016 at 01:41:03PM +0800, Feng Tang wrote: > This patch follows Eric Dumazet's commit 7b70176421 to fix one > exactly same bug in alx driver, that the network link will > be lost in 1-5 minutes after the device is up. > > Following is a git log from Eric's 7b70176421: > > "We had reports ( https://bugzilla.kernel.org/show_bug.cgi?id=54021 ) > that using high order pages for skb allocations is problematic for atl1c > > We do not know exactly what the problem is, but we suspect that crossing > 4K pages is not well supported by this hardware. > > Use a custom allocator, using page allocator and 2K fragments for > optimal stack behavior. We might make this allocator generic > in future kernels." > > And my debug shows the same suspect, most of the errors happen > when there is a RX buffer address with 0x..f80, hopefully > this will get noticed and fixed from silicon side. > > My victim is a Lenovo Y580 Laptop with Atheros ALX AR8161 etherenet > device(PCI ID 1969:1091), with this patch the ethernet dev > works just fine > > Signed-off-by: Feng Tang > --- > drivers/net/ethernet/atheros/alx/alx.h | 4 +++ > drivers/net/ethernet/atheros/alx/main.c | 48 > - > 2 files changed, 51 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/atheros/alx/alx.h > b/drivers/net/ethernet/atheros/alx/alx.h > index 8fc93c5..d02c424 100644 > --- a/drivers/net/ethernet/atheros/alx/alx.h > +++ b/drivers/net/ethernet/atheros/alx/alx.h > @@ -96,6 +96,10 @@ struct alx_priv { > unsigned int rx_ringsz; > unsigned int rxbuf_size; > > + struct page *rx_page; > + unsigned int rx_page_offset; > + unsigned int rx_frag_size; > + > struct napi_struct napi; > struct alx_tx_queue txq; > struct alx_rx_queue rxq; > diff --git a/drivers/net/ethernet/atheros/alx/main.c > b/drivers/net/ethernet/atheros/alx/main.c > index 9fe8b5e..c98acdc 100644 > --- a/drivers/net/ethernet/atheros/alx/main.c > +++ b/drivers/net/ethernet/atheros/alx/main.c > @@ -70,6 +70,35 @@ static void alx_free_txbuf(struct alx_priv *alx, int entry) > } > } > > +static struct sk_buff *alx_alloc_skb(struct alx_priv *alx, gfp_t gfp) > +{ > + struct sk_buff *skb; > + struct page *page; > + > + if (alx->rx_frag_size > PAGE_SIZE) > + return __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp); > + > + page = alx->rx_page; > + if (!page) { > + alx->rx_page = page = alloc_page(gfp); > + if (unlikely(!page)) > + return NULL; > + alx->rx_page_offset = 0; > + } > + > + skb = build_skb(page_address(page) + alx->rx_page_offset, > + alx->rx_frag_size); > + if (likely(skb)) { > + alx->rx_page_offset += alx->rx_frag_size; > + if (alx->rx_page_offset >= PAGE_SIZE) > + alx->rx_page = NULL; > + else > + get_page(page); > + } > + return skb; > +} > + > + > static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t gfp) > { > struct alx_rx_queue *rxq = &alx->rxq; > @@ -86,7 +115,7 @@ static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t > gfp) > while (!cur_buf->skb && next != rxq->read_idx) { > struct alx_rfd *rfd = &rxq->rfd[cur]; > > - skb = __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp); > + skb = alx_alloc_skb(alx, gfp); > if (!skb) > break; > dma = dma_map_single(&alx->hw.pdev->dev, > @@ -124,6 +153,7 @@ static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t > gfp) > alx_write_mem16(&alx->hw, ALX_RFD_PIDX, cur); > } > > + > return count; > } > > @@ -592,6 +622,11 @@ static void alx_free_rings(struct alx_priv *alx) > kfree(alx->txq.bufs); > kfree(alx->rxq.bufs); > > + if (alx->rx_page) { > + put_page(alx->rx_page); > + alx->rx_page = NULL; > + } > + > dma_free_coherent(&alx->hw.pdev->dev, > alx->descmem.size, > alx->descmem.virt, > @@ -646,6 +681,7 @@ static int alx_request_irq(struct alx_priv *alx) > alx->dev->name, alx); > if (!err) > goto out; > + > /* fall back to legacy interrupt */ > pci_disable_msi(alx->hw.pdev); > } > @@ -689,6 +725,7 @@ static int alx_init_sw(s
Re: [PATCH] brcmfmac: use kmemdup
On 19-5-2016 15:59, Muhammad Falak R Wani wrote: > Use kmemdup when some other buffer is immediately copied into allocated > region. It replaces call to allocation followed by memcpy, by a single > call to kmemdup. Acked-by: Arend van Spriel > Signed-off-by: Muhammad Falak R Wani > --- > drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > index d0631b6..705adaa 100644 > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > @@ -6699,11 +6699,10 @@ struct brcmf_cfg80211_info > *brcmf_cfg80211_attach(struct brcmf_pub *drvr, > return NULL; > } > > - ops = kzalloc(sizeof(*ops), GFP_KERNEL); > + ops = kmemdup(&brcmf_cfg80211_ops, sizeof(*ops), GFP_KERNEL); > if (!ops) > return NULL; > > - memcpy(ops, &brcmf_cfg80211_ops, sizeof(*ops)); > ifp = netdev_priv(ndev); > #ifdef CONFIG_PM > if (brcmf_feat_is_enabled(ifp, BRCMF_FEAT_WOWL_GTK)) >
Re: [PATCH 4.8 1/2] brcmutil: add field storing control channel to the struct brcmu_chan
On 19-5-2016 13:02, Rafał Miłecki wrote: > Our d11 code supports encoding/decoding channel info into/from chanspec > format used by firmware. Current implementation is quite misleading > because of the way "chnum" field is used. > When encoding channel info, "chnum" has to be filled by a caller with > *center* channel number. However when decoding chanspec the same field > is filled with a *control* channel number. > > This can be confusing and doesn't allow accessing all info when > decoding. Solve it by adding a separated field for control channel. The need to "access all info" is probably the other patch so you might hint here that this change is needed for the .get_channel() callback. Reviewed-by: Arend van Spriel > Signed-off-by: Rafał Miłecki > --- > .../broadcom/brcm80211/brcmfmac/cfg80211.c | 17 + > .../net/wireless/broadcom/brcm80211/brcmfmac/p2p.c | 10 +- > .../net/wireless/broadcom/brcm80211/brcmutil/d11.c | 18 ++ > .../broadcom/brcm80211/include/brcmu_d11.h | 22 > ++ > 4 files changed, 46 insertions(+), 21 deletions(-) > > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > index d0631b6..597495d 100644 > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > @@ -2734,7 +2734,7 @@ static s32 brcmf_inform_single_bss(struct > brcmf_cfg80211_info *cfg, > if (!bi->ctl_ch) { > ch.chspec = le16_to_cpu(bi->chanspec); > cfg->d11inf.decchspec(&ch); > - bi->ctl_ch = ch.chnum; > + bi->ctl_ch = ch.control_ch_num; > } > channel = bi->ctl_ch; > > @@ -2852,7 +2852,7 @@ static s32 brcmf_inform_ibss(struct brcmf_cfg80211_info > *cfg, > else > band = wiphy->bands[NL80211_BAND_5GHZ]; > > - freq = ieee80211_channel_to_frequency(ch.chnum, band->band); > + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band->band); > cfg->channel = freq; > notify_channel = ieee80211_get_channel(wiphy, freq); > > @@ -2862,7 +2862,7 @@ static s32 brcmf_inform_ibss(struct brcmf_cfg80211_info > *cfg, > notify_ielen = le32_to_cpu(bi->ie_length); > notify_signal = (s16)le16_to_cpu(bi->RSSI) * 100; > > - brcmf_dbg(CONN, "channel: %d(%d)\n", ch.chnum, freq); > + brcmf_dbg(CONN, "channel: %d(%d)\n", ch.control_ch_num, freq); > brcmf_dbg(CONN, "capability: %X\n", notify_capability); > brcmf_dbg(CONN, "beacon interval: %d\n", notify_interval); > brcmf_dbg(CONN, "signal: %d\n", notify_signal); > @@ -5280,7 +5280,7 @@ brcmf_bss_roaming_done(struct brcmf_cfg80211_info *cfg, > else > band = wiphy->bands[NL80211_BAND_5GHZ]; > > - freq = ieee80211_channel_to_frequency(ch.chnum, band->band); > + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band->band); > notify_channel = ieee80211_get_channel(wiphy, freq); > > done: > @@ -5802,14 +5802,15 @@ static int brcmf_construct_chaninfo(struct > brcmf_cfg80211_info *cfg, > channel = band->channels; > index = band->n_channels; > for (j = 0; j < band->n_channels; j++) { > - if (channel[j].hw_value == ch.chnum) { > + if (channel[j].hw_value == ch.control_ch_num) { > index = j; > break; > } > } > channel[index].center_freq = > - ieee80211_channel_to_frequency(ch.chnum, band->band); > - channel[index].hw_value = ch.chnum; > + ieee80211_channel_to_frequency(ch.control_ch_num, > +band->band); > + channel[index].hw_value = ch.control_ch_num; > > /* assuming the chanspecs order is HT20, >* HT40 upper, HT40 lower, and VHT80. > @@ -5911,7 +5912,7 @@ static int brcmf_enable_bw40_2g(struct > brcmf_cfg80211_info *cfg) > if (WARN_ON(ch.bw != BRCMU_CHAN_BW_40)) > continue; > for (j = 0; j < band->n_channels; j++) { > - if (band->channels[j].hw_value == ch.chnum) > + if (band->channels[j].hw_value == > ch.control_ch_num) > break; > } > if (WARN_ON(j == band->n_channels)) > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c > b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c > index a70cda6..1652a48 100644 > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c > @@ -1246,7 +1246,7 @@ bool brcmf_p2p_scan_finding_common_cha
Re: [PATCH 4.8 2/2] brcmfmac: support get_channel cfg80211 callback
On 19-5-2016 13:02, Rafał Miłecki wrote: > This is important for brcmfmac as the firmware may pick different > channel than requested. This has been tested with BCM4366B1 (in D-Link > DIR-885L). Can you elaborate? Is this for AP or STA mode? > Signed-off-by: Rafał Miłecki > --- > .../broadcom/brcm80211/brcmfmac/cfg80211.c | 59 > ++ > 1 file changed, 59 insertions(+) > > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > index 597495d..4fb9e3a 100644 > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c > @@ -4892,6 +4892,64 @@ exit: > return err; > } > > +static int brcmf_cfg80211_get_channel(struct wiphy *wiphy, > + struct wireless_dev *wdev, > + struct cfg80211_chan_def *chandef) > +{ > + struct brcmf_cfg80211_info *cfg = wiphy_to_cfg(wiphy); > + struct net_device *ndev = wdev->netdev; > + struct brcmf_if *ifp = netdev_priv(ndev); Can this operation be done on a P2P_DEVICE interface, ie. wdev->netdev == NULL? > + struct brcmu_chan ch; > + enum nl80211_band band = 0; > + enum nl80211_chan_width width = 0; > + u32 chanspec; > + int freq, err; > + > + err = brcmf_fil_iovar_int_get(ifp, "chanspec", &chanspec); > + if (err) { > + brcmf_err("chanspec failed (%d)\n", err); > + return err; > + } > + > + ch.chspec = chanspec; > + cfg->d11inf.decchspec(&ch); > + > + switch (ch.band) { > + case BRCMU_CHAN_BAND_2G: > + band = NL80211_BAND_2GHZ; > + break; > + case BRCMU_CHAN_BAND_5G: > + band = NL80211_BAND_5GHZ; > + break; > + } > + > + switch (ch.bw) { > + case BRCMU_CHAN_BW_80: > + width = NL80211_CHAN_WIDTH_80; > + break; > + case BRCMU_CHAN_BW_40: > + width = NL80211_CHAN_WIDTH_40; > + break; > + case BRCMU_CHAN_BW_20: > + width = NL80211_CHAN_WIDTH_20; > + break; > + case BRCMU_CHAN_BW_80P80: > + width = NL80211_CHAN_WIDTH_80P80; > + break; Not much sense to support this given that center_freq2 is set to zero below. Regards, Arend > + case BRCMU_CHAN_BW_160: > + width = NL80211_CHAN_WIDTH_160; > + break; > + } > + > + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band); > + chandef->chan = ieee80211_get_channel(wiphy, freq); > + chandef->width = width; > + chandef->center_freq1 = ieee80211_channel_to_frequency(ch.chnum, band); > + chandef->center_freq2 = 0; > + > + return 0; > +} > + > static int brcmf_cfg80211_crit_proto_start(struct wiphy *wiphy, > struct wireless_dev *wdev, > enum nl80211_crit_proto_id proto, > @@ -5054,6 +5112,7 @@ static struct cfg80211_ops brcmf_cfg80211_ops = { > .mgmt_tx = brcmf_cfg80211_mgmt_tx, > .remain_on_channel = brcmf_p2p_remain_on_channel, > .cancel_remain_on_channel = brcmf_cfg80211_cancel_remain_on_channel, > + .get_channel = brcmf_cfg80211_get_channel, > .start_p2p_device = brcmf_p2p_start_device, > .stop_p2p_device = brcmf_p2p_stop_device, > .crit_proto_start = brcmf_cfg80211_crit_proto_start, >
Re: [PATCH 1/1] net: pegasus: remove dead coding
On 16-05-19 11:35:42, David Miller wrote: > From: Heinrich Schuchardt > Date: Wed, 18 May 2016 02:13:30 +0200 > > > (!count || count < 4) is always true. > > So let's remove the coding which is dead at least since 2005. > > > > Signed-off-by: Heinrich Schuchardt > > Applied. David, the patch you applied is broken. It seems that you didn't follow the discussion from the past couple of days. Please revert it. Petko
Re: [RFC] net: remove busylock
On Thu, 19 May 2016 11:03:32 -0700 Alexander Duyck wrote: > On Thu, May 19, 2016 at 10:08 AM, Eric Dumazet wrote: > > busylock was added at the time we had expensive ticket spinlocks > > > > (commit 79640a4ca6955e3ebdb7038508fa7a0cd7fa5527 ("net: add additional > > lock to qdisc to increase throughput") > > > > Now kernel spinlocks are MCS, this busylock things is no longer > > relevant. It is slowing down things a bit. > > > > > > With HTB qdisc, here are the numbers for 200 concurrent TCP_RR, on a host > > with 48 hyperthreads. > > [...] > > The main point of the busy lock is to deal with the bulk throughput > case, not the latency case which would be relatively well behaved. > The problem wasn't really related to lock bouncing slowing things > down. It was the fairness between the threads that was killing us > because the dequeue needs to have priority. Yes, exactly. > The main problem that the busy lock solved was the fact that you could > start a number of stream tests equal to the number of CPUs in a given > system and the result was that the performance would drop off a cliff > and you would drop almost all the packets for almost all the streams > because the qdisc never had a chance to drain because it would be CPU > - 1 enqueues, followed by 1 dequeue. Notice 1 enqueue does not guarantee 1 dequeue. If one CPU has entered dequeue/xmit state (setting __QDISC___STATE_RUNNING) then other CPUs will just enqueue. Thus, N CPUs will enqueue (a packet each), and 1 CPU will dequeue a packet. The problem arise due to the fairness of the ticket spinlock, and AFAIK the MCS lock is even more fair. All enqueue's get their turn before the dequeue can move forward. And the qdisc dequeue CPU have an unfortunate pattern of releasing the qdisc root_lock and acquiring it again (qdisc_restart+sch_direct_xmit). Thus, while dequeue is waiting re-aquire the root_lock, N CPUs will enqueue packets. The hole idea behind allowing bulk qdisc dequeue, was to mitigate this, by allowing dequeue to do more work, while holding the lock. You mention HTB. Notice HTB does not take advantage of bulk dequeue. Have you tried to enable/allow HTB to bulk dequeue? > What we need if we are going to get rid of busy lock would be some > sort of priority locking mechanism that would allow the dequeue thread > to jump to the head of the line if it is attempting to take the lock. > Otherwise you end up spending all your time enqueuing packets into > oblivion because the qdiscs just overflow without the busy lock in > place. Exactly. The qdisc locking scheme is designed to only allow one dequeuing CPU to run (via state __QDISC___STATE_RUNNING). Jamal told me this was an optimization. Maybe this optimization "broke" when locking got fair? I don't want to offend the original qdisc designers, but when I look at the qdisc locking code, I keep thinking this scheme is broken. Wouldn't it be better to have seperate an enqueue lock and a dequeue lock? (with a producer/consumer queue). Once we get John's lockless qdisc scheme, then the individual qdiscs can implement a locking scheme like this, and we can keep the old locking scheme intact for legacy qdisc. Or we can clean up this locking scheme and update the legacy qdisc to use a MPMC queue? -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer