Re: [PATCH v7 net-next 01/16] gso: Remove arbitrary checks for unsupported GSO

2016-05-20 Thread Alexander Duyck
On Fri, May 20, 2016 at 6:02 PM, Hannes Frederic Sowa
 wrote:
> Hello,
>
> On 18.05.2016 18:06, Tom Herbert wrote:
>> In several gso_segment functions there are checks of gso_type against
>> a seemingly arbitrary list of SKB_GSO_* flags. This seems like an
>> attempt to identify unsupported GSO types, but since the stack is
>> the one that set these GSO types in the first place this seems
>> unnecessary to do. If a combination isn't valid in the first
>> place that stack should not allow setting it.
>>
>> This is a code simplication especially for add new GSO types.
>
> I couldn't still wrap my head around this.
>
> I wonder if this is safe in case of if the packet is generated from an
> untrusted virtual machine over virtio_net?

The problem is the test is kind of pointless.  I think the original
point of the tests was to prevent the case of IPIP and TSO6 or SIT and
TSO4.  However since we dropped IPIP and SIT and instead replaced them
with the IPXIP6 and IPXIP4 there shouldn't be any combinations that
don't work now other than UDP being combined with TCP segmentation.

If we are wanting to prevent that case we might place something
earlier in the segmentation code that would perform an and with all
the L4 GSO types that should be mutually exclusive, verify that value
is non-zero, and then and that value with itself - 1 to verify that
exactly 1 bit is set in that grouping and no more than that.

- Alex


Re: [PATCH 2/2] ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path.

2016-05-20 Thread 严海双

> On May 21, 2016, at 1:48 AM, David Miller  wrote:
> 
> From: Haishuang Yan 
> Date: Wed, 18 May 2016 18:05:52 +0800
> 
>> In gre6 xmit path, we are sending a GRE packet, so set fl6 proto
>> to IPPROTO_GRE properly.
>> 
>> Signed-off-by: Haishuang Yan 
> 
> I think it would be a lot better to initialize the flow protocol field
> properly in ip6gre_tnl_link_config() instead of fixing it up every
> single transmit.
> 
> 

I agree, I will modify the changes in v2.

Thanks

Haishuang




Re: IPv6 extension header privileges

2016-05-20 Thread Sowmini Varadhan
On (05/21/16 02:20), Hannes Frederic Sowa wrote:
> 
> There are some options inherently protocol depending like the jumbo
> payload option, which should be under control of the kernel, or the
> router alert option for igmp, which causes packets to be steered towards
> the slow/software path of routers, which can be used for DoS attacks.
> 
> Setting CALIPSO options in IPv6 on packets as users would defeat the
> whole CALIPSO model, etc.
> 
> The RFC3542 requires at least some of the options in dst/hop-by-hop

"requires" is a strong word. 3542 declares it as a  "may" (lower case).
The only thing required strongly is IPV6_NEXTHOP itself.

I suspect 3542 was written at a time when hbh and dst opt were loosely
defined and the "may" is just a place-holder (i.e., it's not even a MAY)

> 
> AFAIK people worried about the parsing overhead and thus decided to
> block it for ordinary users.

That's probably more likely, esp for hbh options. It may also be 
interesting to find out what BSD does in these cases. 

--Sowmini 


Re: [PATCH v7 net-next 01/16] gso: Remove arbitrary checks for unsupported GSO

2016-05-20 Thread Hannes Frederic Sowa
Hello,

On 18.05.2016 18:06, Tom Herbert wrote:
> In several gso_segment functions there are checks of gso_type against
> a seemingly arbitrary list of SKB_GSO_* flags. This seems like an
> attempt to identify unsupported GSO types, but since the stack is
> the one that set these GSO types in the first place this seems
> unnecessary to do. If a combination isn't valid in the first
> place that stack should not allow setting it.
> 
> This is a code simplication especially for add new GSO types.

I couldn't still wrap my head around this.

I wonder if this is safe in case of if the packet is generated from an
untrusted virtual machine over virtio_net?

Bye,
Hannes



Re: off-by-one in DecodeQ931

2016-05-20 Thread Toby DiPasquale
I'm a bit new to this; is this patch OK?

On Tue, May 3, 2016 at 1:12 AM, Toby DiPasquale  wrote:
> On Mon, Apr 25, 2016 at 11:29 AM, Florian Westphal  wrote:
>> -> sz (size_t) will underflow here
>>
>> I'd suggest to change the if (sz < 1) to if (sz < 2) to
>> resolve this, the while loop below has to be taken anyway.
>
> Thanks, Florian! Updated patch below:
>
> Signed-off-by: Toby DiPasquale 
>
> diff --git a/net/netfilter/nf_conntrack_h323_asn1.c
> b/net/netfilter/nf_conntrack_h323_asn1.c
> index bcd5ed6..89b2e46 100644
> --- a/net/netfilter/nf_conntrack_h323_asn1.c
> +++ b/net/netfilter/nf_conntrack_h323_asn1.c
> @@ -846,9 +846,10 @@ int DecodeQ931(unsigned char *buf, size_t sz, Q931 *q931)
> sz -= len;
>
> /* Message Type */
> -   if (sz < 1)
> +   if (sz < 2)
> return H323_ERROR_BOUND;
> q931->MessageType = *p++;
> +   sz--;
> PRINT("MessageType = %02X\n", q931->MessageType);
> if (*p & 0x80) {
> p++;



-- 
Toby DiPasquale


Re: IPv6 extension header privileges

2016-05-20 Thread Hannes Frederic Sowa
On 21.05.2016 00:37, Tom Herbert wrote:
> Hi,
> 
> In ipv6_sockglue.c I noticed:
> 
> /* hop-by-hop / destination options are privileged option */
> retv = -EPERM;
> if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
>break;
> 
> Can anyone provide that rationale as to why these are privileged ops?

There are some options inherently protocol depending like the jumbo
payload option, which should be under control of the kernel, or the
router alert option for igmp, which causes packets to be steered towards
the slow/software path of routers, which can be used for DoS attacks.

Setting CALIPSO options in IPv6 on packets as users would defeat the
whole CALIPSO model, etc.

The RFC3542 requires at least some of the options in dst/hop-by-hop
extensions to be only be settable by privileged users. More fine-grained
parsing and setting of those options has never been implemented.

AFAIK people worried about the parsing overhead and thus decided to
block it for ordinary users.

Bye,
Hannes



[GIT] Networking

2016-05-20 Thread David Miller

1) Tunneling fixes from Tom Herbert and Alexander Duyck.

2) AF_UNIX updates some struct sock bit fields with the socket lock,
   whereas setsockopt() sets overlapping ones with locking.  Seperate
   out the synchronized vs. the AF_UNIX unsynchronized ones to avoid
   corruption.  From Andrey Ryabinin.

3) Mount BPF filesystem with mount_nodev rather than mount_ns, from
   Eric bieferman.

4) A couple kmemdup conversions, from Muhammad Falak R Wani.

5) BPF verifier fixes from Alexei Starovoitov.

6) Don't let tunneled UDP packets get stuck in socket queues, if
   something goes wrong during the encapsulation just drop the
   packet rather than signalling an error up the call stack.
   From Hannes Frederic Sowa.

7) SKB ref after free in batman-adv, from Florian Westphal.

8) TCP iSCSI, ocfs2, rds, and tipc have to disable BH in it's TCP
   callbacks since the TCP stack runs pre-emptibly now.  From Eric
   Dumazet.

9) Fix crash in fixed_phy_add, from Rabin Vincent.

10) Fix length checks in xen-netback, from Paul Durrant.

11) Fix mixup in KEY vs. KEYID macsec attributes, from Sabrina
Dubroca.

12) RDS connection spamming bug fixes from Sowmini Varadhan.

Please pull, thanks a lot!

The following changes since commit 07b75260ebc2c789724c594d7eaf0194fa47b3be:

  Merge branch 'upstream' of 
git://git.linux-mips.org/pub/scm/ralf/upstream-linus (2016-05-19 10:02:26 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 95829b3a9c0b1d88778b23bc2afdf5a83de066ff:

  net: suppress warnings on dev_alloc_skb (2016-05-20 19:58:32 -0400)


Alan Liu (1):
  ath10k: add max_tx_power for QCA6174 WLAN.RM.2.0 firmware

Alexander Duyck (2):
  ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with 
FOU/GUE
  intel: Add support for IPv6 IP-in-IP offload

Alexei Starovoitov (2):
  bpf: support decreasing order in direct packet access
  bpf: teach verifier to recognize imm += ptr pattern

Amit Ghadge (1):
  net: Fix coding style warnings and errors.

Andrey Ryabinin (1):
  net: sock: move ->sk_shutdown out of bitfields.

Anilkumar Kolli (1):
  ath10k: fix kernel panic, move arvifs list head init before htt init

Antonio Quartulli (1):
  batman-adv: make sure ELP/OGM orig MAC is updated on address change

Arnd Bergmann (1):
  mlx5: avoid unused variable warning

Bjorn Andersson (1):
  wcn36xx: Set SMD timeout to 10 seconds

Christian Daudt (1):
  brcmfmac: Add 4356 sdio support

Dan Carpenter (3):
  rtlwifi: rtl818x: silence uninitialized variable warning
  airo: prevent potential underflow in airo_set_freq()
  atmel: potential underflow in atmel_set_freq()

Daniel Borkmann (1):
  bpf: rather use get_random_int for randomizations

David S. Miller (8):
  Merge branch 'tcp_bh_fixes'
  Merge tag 'batman-adv-fix-for-davem' of 
git://git.open-mesh.org/linux-merge
  Revert "net: pegasus: remove dead coding"
  Merge branch 'GREoIPV6'
  Merge branch 'rds-conn-spamming'
  Merge branch 'GREoIPV6-followups'
  Merge tag 'wireless-drivers-next-for-davem-2016-05-13' of 
git://git.kernel.org/.../kvalo/wireless-drivers-next
  Merge branch 'bpf-verifier-fixes'

Denys Vlasenko (1):
  rtlwifi: rtl818x: Deinline indexed IO functions, save 21568 bytes

Emmanuel Grumbach (4):
  iwlwifi: mvm: allow a debug knob for Tx A-MSDU even if rate control 
forbids it
  iwlwifi: remove IWLWIFI_DEBUG_EXPERIMENTAL_UCODE
  iwlwifi: don't access a nonexistent register upon assert
  iwlwifi: add default value to disable_11ac mod param description

Eric Dumazet (4):
  scsi_tcp: block BH in TCP callbacks
  ocfs2/cluster: block BH in TCP callbacks
  rds: tcp: block BH in TCP callbacks
  tipc: block BH in TCP callbacks

Eric W. Biederman (1):
  bpf: Use mount_nodev not mount_ns to mount the bpf filesystem

Fabio Estevam (1):
  Revert "phy: add support for a reset-gpio specification"

Florian Westphal (1):
  batman-adv: fix skb deref after free

Golan Ben-Ami (1):
  iwlwifi: mvm: add more registers to dump upon error

Gregory Greenman (2):
  iwlwifi: consider VHT 160MHz while parsing NVM
  iwlwifi: turn on SGI support for VHT 160MHz

Guy Mishol (1):
  wlcore/wl12xx: Fix fw logger over sdio

Haim Dreyfuss (4):
  iwlwifi: Rename 9560 to 9260 and add new PCI IDs for it
  iwlwifi: allow combining different phy images with mac images
  iwlwifi: Fix firmware name maximum length definition
  iwlwifi: pcie: don't wake up the NIC when writing CSRs in MSIX mode

Hannes Frederic Sowa (1):
  udp: prevent skbs lingering in tunnel socket queues

Helmut Schaa (4):
  ath9k: reuse ar9003_hw_tx_power_regwrite for tx99 setup
  ath9k: Move TX99 config option under ath9k debugging
  ath9k: Simplify ar9003_hw_tx99_set_txpower
  ath9

Re: [PATCHv2] net: suppress warnings on dev_alloc_skb

2016-05-20 Thread David Miller
From: Neil Horman 
Date: Thu, 19 May 2016 11:30:54 -0400

> Noticed an allocation failure in a network driver the other day on a 32 bit
> system:
 ...
> Thought that perhaps the big splat in the logs wasn't really necessecary, as
> all call sites for dev_alloc_skb:
> 
> a) check the return code for the function
> 
> and
> 
> b) either print their own error message or have a recovery path that makes the
> warning moot.
> 
> Fix it by modifying dev_alloc_pages to pass __GFP_NOWARN as a gfp flag to
> suppress the warning
> 
> applies to the net tree
> 
> Signed-off-by: Neil Horman 

Applied, thanks Neil.


Re: [PATCH net] udp: prevent skbs lingering in tunnel socket queues

2016-05-20 Thread David Miller
From: Hannes Frederic Sowa 
Date: Thu, 19 May 2016 15:58:33 +0200

> In case we find a socket with encapsulation enabled we should call
> the encap_recv function even if just a udp header without payload is
> available. The callbacks are responsible for correctly verifying and
> dropping the packets.
> 
> Also, in case the header validation fails for geneve and vxlan we
> shouldn't put the skb back into the socket queue, no one will pick
> them up there.  Instead we can simply discard them in the respective
> encap_recv functions.
> 
> Signed-off-by: Hannes Frederic Sowa 

Looks good, applied and queued up for -stable.

Thanks!


Re: [PATCH net-next 0/2] bpf: verifier fixes

2016-05-20 Thread David Miller
From: Alexei Starovoitov 
Date: Thu, 19 May 2016 18:17:12 -0700

> Further testing of 'direct packet access' uncovered
> several usability issues. Fix them.

Series applied, thanks Alexei.


Re: [PATCH] net:liquidio: use kmemdup

2016-05-20 Thread David Miller
From: Muhammad Falak R Wani 
Date: Thu, 19 May 2016 19:22:49 +0530

> Use kmemdup when some other buffer is immediately copied into allocated
> region. It replaces call to allocation followed by memcpy, by a single
> call to kmemdup.
> 
> Signed-off-by: Muhammad Falak R Wani 

Applied.


Re: [PATCH] net: usb: ch9200: use kmemdup

2016-05-20 Thread David Miller
From: Muhammad Falak R Wani 
Date: Thu, 19 May 2016 19:26:50 +0530

> Use kmemdup when some other buffer is immediately copied into allocated
> region. It replaces call to allocation followed by memcpy, by a single
> call to kmemdup.
> 
> Signed-off-by: Muhammad Falak R Wani 

Applied.


Re: [PATCH] ps3_gelic: use kmemdup

2016-05-20 Thread David Miller
From: Muhammad Falak R Wani 
Date: Thu, 19 May 2016 19:24:41 +0530

> Use kmemdup when some other buffer is immediately copied into allocated
> region. It replaces call to allocation followed by memcpy, by a single
> call to kmemdup.
> 
> Signed-off-by: Muhammad Falak R Wani 

Applied.


Re: [PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem

2016-05-20 Thread David Miller
From: ebied...@xmission.com (Eric W. Biederman)
Date: Fri, 20 May 2016 17:22:48 -0500

> 
> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
> with current->nsproxy->mnt_ns. As the code does not acquire a
> reference to the mount namespace it can not possibly be correct to
> store the mount namespace on the superblock as it does.
> 
> Replace mount_ns with mount_nodev so that each mount of the bpf
> filesystem returns a distinct instance, and the code is not buggy.
> 
> In discussion with Hannes Frederic Sowa it was reported that the use
> of mount_ns was an attempt to have one bpf instance per mount
> namespace, in an attempt to keep resources that pin resources from
> hiding.  That intent simply does not work, the vfs is not built to
> allow that kind of behavior.  Which means that the bpf filesystem
> really is buggy both semantically and in it's implemenation as it does
> not nor can it implement the original intent.
> 
> This change is userspace visible, but my experience with similar
> filesystems leads me to believe nothing will break with a model of each
> mount of the bpf filesystem is distinct from all others.
> 
> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
> Cc: Hannes Frederic Sowa 
> Acked-by: Daniel Borkmann 
> Signed-off-by: "Eric W. Biederman" 

Applied and queued up for -stable, thanks everyone.


Re: pull-request: wireless-drivers-next 2016-05-13 (take two)

2016-05-20 Thread David Miller
From: Kalle Valo 
Date: Thu, 19 May 2016 15:45:08 +0300

> this the second version of the last pull request to net-next for 4.7,
> which got postponed due to the recent iwlwifi merge conflict. Now that
> Linus fixed the merge problem in his tree I actually didn't have to fix
> anything in my tree anymore. So that's why I still use the same tag as
> in my previous pull request.
> 
> The only dependency is that you need Linus' iwlwifi fix in your tree
> before you pull this:
> 
> 0e034f5c4bc4 iwlwifi: fix mis-merge that breaks the driver
> 
> So if you could pull latest Linus' tree to net-next that would solve
> that. I just did a test merge of that on your net-next tree and didn't
> see any conflicts, so this pull should go smoothly. Also these patches
> have been in linux-next at least a week now. But please let me know if
> you see any issues.

Done, pulled, thanks a lot!

> And I think a lesson learned from this is that I need to immediately
> merge wireless-drivers to wireless-drivers-next if there are any
> conflicts between the trees. Or how do you suggest to handle cases like
> that?

You shouldn't have to worry about this kind of stuff at all, and let's
just hope I handle the conflict better next time. :-)



Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

2016-05-20 Thread David Miller
From: Eric Dumazet 
Date: Thu, 19 May 2016 05:35:20 -0700

> From: Eric Dumazet 
> 
> Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
> agent [1] are problematic at scale :
> 
> For each qdisc/class found in the dump, we currently lock the root qdisc
> spinlock in order to get stats. Sampling stats every 5 seconds from
> thousands of HTB classes is a challenge when the root qdisc spinlock is
> under high pressure.
> 
> These stats are using u64 or u32 fields, so reading integral values
> should not prevent writers from doing concurrent updates if the kernel
> arch is a 64bit one.
> 
> Being able to atomically fetch all counters like packets and bytes sent
> at the expense of interfering in fast path (queue and dequeue packets)
> is simply not worth the pain, as the values are generally stale after 1
> usec.
> 
> These lock acquisitions slow down the fast path by 10 to 20 %
> 
> An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
> that might need the qdisc lock in fq_codel_dump_stats() and
> fq_codel_dump_class_stats()
> 
> gnet_dump_force_lock() call is added there and could be added to other
> qdisc stat handlers if needed.
> 
> [1]
> http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
> 
> Signed-off-by: Eric Dumazet 

I guess the off-by-one situations are not a big enough deal to code new
locking or memory barrier for, so I'm fine with this.

Please resubmit when I open net-next back up, thanks.


Re: [PATCH net V2] tuntap: correctly wake up process during uninit

2016-05-20 Thread David Miller
From: Jason Wang 
Date: Thu, 19 May 2016 13:36:51 +0800

> We used to check dev->reg_state against NETREG_REGISTERED after each
> time we are woke up. But after commit 9e641bdcfa4e ("net-tun:
> restructure tun_do_read for better sleep/wakeup efficiency"), it uses
> skb_recv_datagram() which does not check dev->reg_state. This will
> result if we delete a tun/tap device after a process is blocked in the
> reading. The device will wait for the reference count which was held
> by that process for ever.
> 
> Fixes this by using RCV_SHUTDOWN which will be checked during
> sk_recv_datagram() before trying to wake up the process during uninit.
> 
> Fixes: 9e641bdcfa4e ("net-tun: restructure tun_do_read for better
> sleep/wakeup efficiency")
> Cc: Eric Dumazet 
> Cc: Xi Wang 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
> ---
> - The patch is needed for -stable.
> - Changes from v1: remove unnecessary NETREG_REGISTERED check in tun_do_read()

Applied and queued up for -stable.


Re: [PATCH net,stable v2] net: cdc_ncm: update datagram size after changing mtu

2016-05-20 Thread David Miller
From: Robert Dobrowolski 
Date: Thu, 19 May 2016 11:56:09 +0200

> From: Rafal Redzimski 
> 
> Current implementation updates the mtu size and notify cdc_ncm
> device using USB_CDC_SET_MAX_DATAGRAM_SIZE request about datagram
> size change instead of changing rx_urb_size.
> 
> Whenever mtu is being changed, datagram size should also be
> updated. Also updating maxmtu formula so it takes max_datagram_size with
> use of cdc_ncm_max_dgram_size() and not ctx.
> 
> Signed-off-by: Robert Dobrowolski 
> Signed-off-by: Rafal Redzimski 

Appied, thanks.


Re: [PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem

2016-05-20 Thread Hannes Frederic Sowa
On 21.05.2016 00:22, Eric W. Biederman wrote:
> 
> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
> with current->nsproxy->mnt_ns. As the code does not acquire a
> reference to the mount namespace it can not possibly be correct to
> store the mount namespace on the superblock as it does.
> 
> Replace mount_ns with mount_nodev so that each mount of the bpf
> filesystem returns a distinct instance, and the code is not buggy.
> 
> In discussion with Hannes Frederic Sowa it was reported that the use
> of mount_ns was an attempt to have one bpf instance per mount
> namespace, in an attempt to keep resources that pin resources from
> hiding.  That intent simply does not work, the vfs is not built to
> allow that kind of behavior.  Which means that the bpf filesystem
> really is buggy both semantically and in it's implemenation as it does
> not nor can it implement the original intent.
> 
> This change is userspace visible, but my experience with similar
> filesystems leads me to believe nothing will break with a model of each
> mount of the bpf filesystem is distinct from all others.
> 
> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
> Cc: Hannes Frederic Sowa 
> Acked-by: Daniel Borkmann 
> Signed-off-by: "Eric W. Biederman" 

Acked-by: Hannes Frederic Sowa 

Thanks Eric!




Re: [net-next PATCH 0/2] Follow-ups for GUEoIPv6 patches

2016-05-20 Thread David Miller
From: Alexander Duyck 
Date: Wed, 18 May 2016 10:44:40 -0700

> This patch series is meant to be applied after:
> [PATCH v7 net-next 00/16] ipv6: Enable GUEoIPv6 and more fixes for v6 
> tunneling
> 
> The first patch addresses an issue we already resolved in the GREv4 and is
> now present in GREv6 with the introduction of FOU/GUE for IPv6 based GRE
> tunnels.
> 
> The second patch goes through and enables IPv6 tunnel offloads for the Intel
> NICs that already support the IPv4 based IP-in-IP tunnel offloads.  I have
> only done a bit of touch testing but have seen ~20 Gb/s over an i40e
> interface using a v4-in-v6 tunnel, and I have verified IPv6 GRE is still
> passing traffic at around the same rate.  I plan to do further testing but
> with these patches present it should enable a wider audience to be able to
> test the new features introduced in Tom's patchset with hardware offloads.

Series applied, thanks Alex.


Re: [PATCH net 0/2] RDS: TCP: connection spamming fixes

2016-05-20 Thread David Miller
From: Sowmini Varadhan 
Date: Wed, 18 May 2016 10:06:22 -0700

> We have been testing the RDS-TCP code with a connection spammer
> that sends incoming SYNs to the RDS listen port well after 
> an rds-tcp connection has been established, and found a few 
> race-windows that are fixed by this patch series.
> 
> Patch 1 avoids a null pointer deref when an incoming SYN 
> shows up when a netns is being dismantled, or when the 
> rds-tcp module is being unloaded. 
> 
> Patch 2 addresses the case when a SYN is received after the
> connection arbitration algorithm has converged: the incoming
> SYN should not needlessly quiesce the transmit path, and it
> should not result in needless TCP connection resets due to
> re-execution of the connection arbitration logic.

Series applied, thanks.


Re: [RFC][PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem

2016-05-20 Thread Eric W. Biederman
Hannes Frederic Sowa  writes:

> On 18.05.2016 22:43, Daniel Borkmann wrote:
>> Eric, please send the patch officially and feel free to add my Ack.

Done.

>> Given
>> the circumstances, moving to mount_nodev() seems the best way forward. To
>> also address above mentioned concern from Hannes, we need to remove the
>> FS_USERNS_MOUNT flag along with the change. It looks like the fix is best
>> addressed in a single patch if you want to include it. If not, we can
>> otherwise send it separately as well, I don't mind.
>
> I agree. Would make most sense to make the change in one patch. Later on
> we can reason about if it makes sense to use the net namespace to split
> bpf maps and programs or maybe even introduce a new primitive for that.

I will let you two take care of the FS_USERNS_MOUNT flag.

Removal of the FS_USERNS_MOUNT flag because it was added prematurely is
a completely different analysis of consequences and possible regressions
in userspace.  The two changes should be kept separate to make it easy
to handle the unlikely case that either of them cause a regression.

But I have not objections to removing the FS_USERNS_MOUNT flag from the
bpf filesystem.

Eric



IPv6 extension header privileges

2016-05-20 Thread Tom Herbert
Hi,

In ipv6_sockglue.c I noticed:

/* hop-by-hop / destination options are privileged option */
retv = -EPERM;
if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
   break;

Can anyone provide that rationale as to why these are privileged ops?

Thanks,
Tom


[PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem

2016-05-20 Thread Eric W. Biederman

While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
with current->nsproxy->mnt_ns. As the code does not acquire a
reference to the mount namespace it can not possibly be correct to
store the mount namespace on the superblock as it does.

Replace mount_ns with mount_nodev so that each mount of the bpf
filesystem returns a distinct instance, and the code is not buggy.

In discussion with Hannes Frederic Sowa it was reported that the use
of mount_ns was an attempt to have one bpf instance per mount
namespace, in an attempt to keep resources that pin resources from
hiding.  That intent simply does not work, the vfs is not built to
allow that kind of behavior.  Which means that the bpf filesystem
really is buggy both semantically and in it's implemenation as it does
not nor can it implement the original intent.

This change is userspace visible, but my experience with similar
filesystems leads me to believe nothing will break with a model of each
mount of the bpf filesystem is distinct from all others.

Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
Cc: Hannes Frederic Sowa 
Acked-by: Daniel Borkmann 
Signed-off-by: "Eric W. Biederman" 
---
 kernel/bpf/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 8f94ca1860cf..55d923688f85 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -378,7 +378,7 @@ static int bpf_fill_super(struct super_block *sb, void 
*data, int silent)
 static struct dentry *bpf_mount(struct file_system_type *type, int flags,
const char *dev_name, void *data)
 {
-   return mount_ns(type, flags, current->nsproxy->mnt_ns, bpf_fill_super);
+   return mount_nodev(type, flags, data, bpf_fill_super);
 }
 
 static struct file_system_type bpf_fs_type = {
-- 
2.8.1



Re: [PATCH v4] net: sock: move ->sk_shutdown out of bitfields.

2016-05-20 Thread David Miller
From: Andrey Ryabinin 
Date: Wed, 18 May 2016 19:19:27 +0300

> ->sk_shutdown bits share one bitfield with some other bits in sock struct,
> such as ->sk_no_check_[r,t]x, ->sk_userlocks ...
> sock_setsockopt() may write to these bits, while holding the socket lock.
> 
> In case of AF_UNIX sockets, we change ->sk_shutdown bits while holding only
> unix_state_lock(). So concurrent setsockopt() and shutdown() may lead
> to corrupting these bits.
> 
> Fix this by moving ->sk_shutdown bits out of bitfield into a separate byte.
> This will not change the 'struct sock' size since ->sk_shutdown moved into
> previously unused 16-bit hole.
> 
> Signed-off-by: Andrey Ryabinin 
> Suggested-by: Hannes Frederic Sowa 

Looks good, applied and queued up for -stable.

Thanks.


Re: [PATCH v7 net-next 00/16] ipv6: Enable GUEoIPv6 and more fixes for v6 tunneling

2016-05-20 Thread David Miller
From: Tom Herbert 
Date: Wed, 18 May 2016 09:06:08 -0700

> This patch set:
>   - Fixes GRE6 to process translate flags correctly from configuration
>   - Adds support for GSO and GRO for ip6ip6 and ip4ip6
>   - Add support for FOU and GUE in IPv6
>   - Support GRE, ip6ip6 and ip4ip6 over FOU/GUE
>   - Fixes ip6_input to deal with UDP encapsulations
>   - Some other minor fixes

Series applied, thanks Tom.


Re: [PATCH] Revert "phy: add support for a reset-gpio specification"

2016-05-20 Thread David Miller
From: Fabio Estevam 
Date: Wed, 18 May 2016 13:05:00 -0300

> Commit da47b4572056 ("phy: add support for a reset-gpio specification")
> causes the following xtensa qemu crash according to Guenter Roeck:
 ...
> This reverts commit da47b4572056487fd7941c26f73b3e8815ff712a.
> 
> Reported-by: Guenter Roeck 
> Signed-off-by: Fabio Estevam 

Applied, thanks.


Re: [PATCH net-next] xen-netback: only deinitialized hash if it was initialized

2016-05-20 Thread David Miller
From: Paul Durrant 
Date: Wed, 18 May 2016 15:55:42 +0100

> A domain with a frontend that does not implement a control ring has been
> seen to cause a crash during domain save. This was apparently because
> the call to xenvif_deinit_hash() in xenvif_disconnect_ctrl() is made
> regardless of whether a control ring was connected, and hence
> xenvif_hash_init() was called.
> 
> This patch brings the call to xenvif_deinit_hash() in
> xenvif_disconnect_ctrl() inside the if clause that checks whether the
> control ring event channel was connected. This is sufficient to ensure
> it is only called if xenvif_init_hash() was called previously.
> 
> Signed-off-by: Paul Durrant 
> Reported-by: Boris Ostrovsky 
> Tested-by: Boris Ostrovsky 

Applied, thanks.


Re: [RFC] net: remove busylock

2016-05-20 Thread Eric Dumazet
On Fri, 2016-05-20 at 19:49 +0200, Jesper Dangaard Brouer wrote:
> On Fri, 20 May 2016 07:16:55 -0700
> Eric Dumazet  wrote:
> 
> > Since bonding pretends to be multiqueue, TCQ_F_ONETXQUEUE is not set
> > on sch->flags when HTB is installed at the bonding device root.
> 
> If would be cool if you could run a test with removed busylock and
> allow HTB to bulk dequeue.

I added a /sys/class/net/eth0/tx_bulk_limit to tune the number of
packets that we could bulk dequeue from a virtual device 
(no BQL on them)

200 TCP_RR through HTB on eth0 (bonding device)

1) busylock enabled


With tx_bulk_limit set to 8, we get 12.7 % increase.

lpaa23:~# for f in `seq 1 16`; do echo $f|tee 
/sys/class/net/eth0/tx_bulk_limit; sar -n DEV 3 3|grep eth0|grep Average; done
1
Average: eth0 868625.67 868487.44  57055.69  56826.37  0.00  
0.00  0.56
2
Average: eth0 927081.67 926920.78  60923.83  60649.90  0.00  
0.00  0.44
3
Average: eth0 957678.11 957554.00  62877.04  62653.89  0.00  
0.00  0.56
4
Average: eth0 966912.44 966747.33  63532.72  63255.51  0.00  
0.00  0.56
5
Average: eth0 973137.56 972950.44  63958.31  63661.39  0.00  
0.00  0.44
6
Average: eth0 958589.22 958449.44  62961.79  62712.56  0.00  
0.00  0.67
7
Average: eth0 960737.67 960672.22  62968.34  62857.97  0.00  
0.00  0.44
8
Average: eth0 979271.78 979201.67  64199.47  64070.84  0.00  
0.00  0.56
9
Average: eth0 982464.33 982390.33  64418.42  64278.93  0.00  
0.00  0.56
10
Average: eth0 982698.00 982578.22  64495.25  64291.28  0.00  
0.00  0.44
11
Average: eth0 981862.22 981746.00  64438.16  64236.31  0.00  
0.00  0.56
12
Average: eth0 983277.44 983096.33  64632.79  64327.79  0.00  
0.00  0.44
13
Average: eth0 981221.11 981018.00  64521.82  64189.26  0.00  
0.00  0.67
14
Average: eth0 981754.11 981555.89  64553.19  64224.39  0.00  
0.00  0.44
15
Average: eth0 982484.33 982301.67  64572.00  64273.38  0.00  
0.00  0.44
16
Average: eth0 978529.56 978326.67  64350.89  64013.39  0.00  
0.00  0.67


2) busylock disabled


Well, bulk dequeue helps, but does not close the gap. 
(busylock is needed)

lpaa23:~# for f in `seq 1 16`; do echo $f|tee 
/sys/class/net/eth0/tx_bulk_limit; sar -n DEV 3 3|grep eth0|grep Average; done
1
Average: eth0 795408.44 795407.67  52044.66  52045.93  0.00  
0.00  0.56
2
Average: eth0 843411.78 843415.11  55185.23  55184.51  0.00  
0.00  0.56
3
Average: eth0 876175.89 876175.00  57329.50  57327.98  0.00  
0.00  0.44
4
Average: eth0 890631.22 890629.44  58274.58  58274.25  0.00  
0.00  0.67
5
Average: eth0 900672.00 900668.89  58931.29  58930.54  0.00  
0.00  0.44
6
Average: eth0 908325.78 908328.22  59432.97  59431.76  0.00  
0.00  0.56
7
Average: eth0 913895.33 913885.11  59796.89  59795.46  0.00  
0.00  0.56
8
Average: eth0 914429.11 914433.56  59832.26  59831.23  0.00  
0.00  0.67
9
Average: eth0 918701.11 918699.67  60110.68  60110.36  0.00  
0.00  0.55
10
Average: eth0 920382.33 920376.56  60223.31  60220.54  0.00  
0.00  0.67
11
Average: eth0 914341.67 914344.67  59826.25  59825.90  0.00  
0.00  0.67
12
Average: eth0 912697.00 912693.78  59718.77  59717.44  0.00  
0.00  0.44
13
Average: eth0 917392.56 917385.00  60025.79  60024.34  0.00  
0.00  0.44
14
Average: eth0 918232.89 918233.78  60081.04  60079.94  0.00  
0.00  0.67
15
Average: eth0 918377.11 918381.00  60091.14  60089.79  0.00  
0.00  0.44
16
Average: eth0 913817.56 913812.33  59792.09  59790.66  0.00  
0.00  0.56




Re: [PATCH 1/3] netfilter: ipset: use setup_timer() and mod_timer().

2016-05-20 Thread Jozsef Kadlecsik
On Sat, 14 May 2016, Muhammad Falak R Wani wrote:

> Use setup_timer() and instead of init_timer(), being the preferred way
> of setting up a timer.
> 
> Also, quoting the mod_timer() function comment:
> -> mod_timer() is a more efficient way to update the expire field of an
>active timer (if the timer is inactive it will be activated).
> 
> Use setup_timer() and mod_timer() to setup and arm a timer, making the
> code compact and easier to read.
> 
> Signed-off-by: Muhammad Falak R Wani 
> ---
>  net/netfilter/ipset/ip_set_bitmap_gen.h | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)

The patch and the other too in the series as well are applied in the ipset 
git repository and will be submitted for kernel inclusion. Thanks.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH net-next 5/7] Driver: Vmxnet3: Add support for get_coalesce, set_coalesce ethtool operations

2016-05-20 Thread Shrikrishna Khare


On Sun, 8 May 2016, Ben Hutchings wrote:

> > Would a patch that maps 0 to 'no coalescing' be acceptable? That is:
> > 
> > rx-usecs = 0 -> coalescing disabled.
> > rx-usecs = 1 -> default (chosen by the device).
> > rx-usecs = 2 -> adaptive coalescing.
> > rx-usecs = 3 -> static coalescing.
> 
> I still don't like it much.  For the 3 special values (0 isn't really
> special):
> 
> 1 = default: When the driver sets the virtual device to this mode, can it 
> then read back what the actual settings are, or are they hidden?  If it can, 
> then userland can also read the defaults and explicitly return to them later. 
>  But I do see the usefulness of an explicit request to reset to defaults.
> 
> 2 = adaptive coalescing: There are already fields to request adaptive 
> coalescing; you should support them.
> 
> 3 = static coalescing: I don't understand what this means.

static refers to the number of packets to batch before raising an 
interrupt - which maps to existing tx_max_coaleced_frames.

Have sent out v2 of the patch that no longer uses special values of 
rx-usecs to distinguish between the coalescing modes. Existing 
ethtool_coalesce fields are used as appropriate instead.

In v2, driver can no longer issue "revert to defaults" command 
(think such a mechanism might be useful addition but belongs to ethtool 
framework).


Thanks,
Shri

Re: [PATCH] net: alx: use custom skb allocator

2016-05-20 Thread Jarod Wilson
On Fri, May 20, 2016 at 03:56:23PM +0800, Feng Tang wrote:
> Hi,
> 
> Please ignore this patch. 
> 
> I found the problem and made the patch with kernel 4.4 with Ubuntu 12.04
> on Lenovo Y580. 
> 
> After submitting the patch, I tried to google the datasheet for
> Atheros 8161 dev, and found there was already a kernel bugzilla 
> https://bugzilla.kernel.org/show_bug.cgi?id=70761
> that had a fix from Jarod Wilson which get merged in v4.5 (commit c406700cd) 
> 
> Per my test, the new 4.6.0 kernel works fine with Jarod's patch. 

Might not hurt to get this some testing by people for whom my patch didn't
fix things. It seems to help for a fair number of people, but not
everyone, so perhaps this would be of use as well.

Personally, my own alx-driven hardware didn't have a problem to begin
with, so I'm not able to do more than make sure things don't regress
w/already functioning hardware.


> On Fri, May 20, 2016 at 01:41:03PM +0800, Feng Tang wrote:
> > This patch follows Eric Dumazet's commit 7b70176421 to fix one
> > exactly same bug in alx driver, that the network link will
> > be lost in 1-5 minutes after the device is up.
> > 
> > Following is a git log from Eric's 7b70176421:
> > 
> > "We had reports ( https://bugzilla.kernel.org/show_bug.cgi?id=54021 )
> > that using high order pages for skb allocations is problematic for atl1c
> > 
> > We do not know exactly what the problem is, but we suspect that crossing
> > 4K pages is not well supported by this hardware.
> > 
> > Use a custom allocator, using page allocator and 2K fragments for
> > optimal stack behavior. We might make this allocator generic
> >  in future kernels."
> > 
> > And my debug shows the same suspect, most of the errors happen
> > when there is a RX buffer address with 0x..f80, hopefully
> > this will get noticed and fixed from silicon side.
> > 
> > My victim is a Lenovo Y580 Laptop with Atheros ALX AR8161 etherenet
> > device(PCI ID 1969:1091), with this patch the ethernet dev
> > works just fine
> > 
> > Signed-off-by: Feng Tang 
> > ---
> >  drivers/net/ethernet/atheros/alx/alx.h  |  4 +++
> >  drivers/net/ethernet/atheros/alx/main.c | 48 
> > -
> >  2 files changed, 51 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/atheros/alx/alx.h 
> > b/drivers/net/ethernet/atheros/alx/alx.h
> > index 8fc93c5..d02c424 100644
> > --- a/drivers/net/ethernet/atheros/alx/alx.h
> > +++ b/drivers/net/ethernet/atheros/alx/alx.h
> > @@ -96,6 +96,10 @@ struct alx_priv {
> > unsigned int rx_ringsz;
> > unsigned int rxbuf_size;
> >  
> > +   struct page  *rx_page;
> > +   unsigned int rx_page_offset;
> > +   unsigned int rx_frag_size;
> > +
> > struct napi_struct napi;
> > struct alx_tx_queue txq;
> > struct alx_rx_queue rxq;
> > diff --git a/drivers/net/ethernet/atheros/alx/main.c 
> > b/drivers/net/ethernet/atheros/alx/main.c
> > index 9fe8b5e..c98acdc 100644
> > --- a/drivers/net/ethernet/atheros/alx/main.c
> > +++ b/drivers/net/ethernet/atheros/alx/main.c
> > @@ -70,6 +70,35 @@ static void alx_free_txbuf(struct alx_priv *alx, int 
> > entry)
> > }
> >  }
> >  
> > +static struct sk_buff *alx_alloc_skb(struct alx_priv *alx, gfp_t gfp)
> > +{
> > +   struct sk_buff *skb;
> > +   struct page *page;
> > +
> > +   if (alx->rx_frag_size > PAGE_SIZE)
> > +   return __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp);
> > +
> > +   page = alx->rx_page;
> > +   if (!page) {
> > +   alx->rx_page = page = alloc_page(gfp);
> > +   if (unlikely(!page))
> > +   return NULL;
> > +   alx->rx_page_offset = 0;
> > +   }
> > +
> > +   skb = build_skb(page_address(page) + alx->rx_page_offset,
> > +   alx->rx_frag_size);
> > +   if (likely(skb)) {
> > +   alx->rx_page_offset += alx->rx_frag_size;
> > +   if (alx->rx_page_offset >= PAGE_SIZE)
> > +   alx->rx_page = NULL;
> > +   else
> > +   get_page(page);
> > +   }
> > +   return skb;
> > +}
> > +
> > +
> >  static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t gfp)
> >  {
> > struct alx_rx_queue *rxq = &alx->rxq;
> > @@ -86,7 +115,7 @@ static int alx_refill_rx_ring(struct alx_priv *alx, 
> > gfp_t gfp)
> > while (!cur_buf->skb && next != rxq->read_idx) {
> > struct alx_rfd *rfd = &rxq->rfd[cur];
> >  
> > -   skb = __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp);
> > +   skb = alx_alloc_skb(alx, gfp);
> > if (!skb)
> > break;
> > dma = dma_map_single(&alx->hw.pdev->dev,
> > @@ -124,6 +153,7 @@ static int alx_refill_rx_ring(struct alx_priv *alx, 
> > gfp_t gfp)
> > alx_write_mem16(&alx->hw, ALX_RFD_PIDX, cur);
> > }
> >  
> > +
> > return count;
> >  }
> >  
> > @@ -592,6 +622,11 @@ static void alx_free_rings(struct alx_priv *alx)
> > kfree(alx->txq.bufs);
> > kfree(alx->rxq.bufs);
> >  
> > +   if (alx->rx_p

Re: [PATCH] mlx5: avoid unused variable warning

2016-05-20 Thread David Miller
From: Arnd Bergmann 
Date: Wed, 18 May 2016 16:21:07 +0200

> When CONFIG_NET_CLS_ACT is disabled, we get a new warning in the mlx5
> ethernet driver because the tc_for_each_action() loop never references
> the iterator:
> 
> mellanox/mlx5/core/en_tc.c: In function 'mlx5e_stats_flower':
> mellanox/mlx5/core/en_tc.c:431:20: error: unused variable 'a' 
> [-Werror=unused-variable]
>   struct tc_action *a;
> 
> This changes the dummy tc_for_each_action() macro by adding a
> cast to void, letting the compiler know that the variable is
> intentionally declared but not used here. I could not come up
> with a nicer workaround, but this seems to do the trick.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: aad7e08d39bd ("net/mlx5e: Hardware offloaded flower filter statistics 
> support")
> Fixes: 00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef")

Applied, thanks Arnd.


Re: [PATCH net] bpf: rather use get_random_int for randomizations

2016-05-20 Thread David Miller
From: Daniel Borkmann 
Date: Wed, 18 May 2016 14:14:28 +0200

> Start address randomization and blinding in BPF currently use
> prandom_u32(). prandom_u32() values are not exposed to unpriviledged
> user space to my knowledge, but given other kernel facilities such as
> ASLR, stack canaries, etc make use of stronger get_random_int(), we
> better make use of it here as well given blinding requests successively
> new random values. get_random_int() has minimal entropy pool depletion,
> is not cryptographically secure, but doesn't need to be for our use
> cases here.
> 
> Suggested-by: Hannes Frederic Sowa 
> Signed-off-by: Daniel Borkmann 

Ok, applied, thanks Daniel.


Re: [PATCH net 1/1] qede: Fix DMA address APIs usage

2016-05-20 Thread David Miller
From: Manish Chopra 
Date: Wed, 18 May 2016 07:43:57 -0400

> Driver incorrectly uses dma_unmap_addr_set() to set
> a variable which is in truth a dma_addr_t
> [i.e not defined using DEFINE_DMA_UNMAP_ADDR()] and is
> being used by the driver flows other than unmapping
> physical addresses. This patch fixes driver fastpath
> where CONFIG_NEED_DMA_MAP_STATE is not set.
> 
> Signed-off-by: Manish Chopra 
> Signed-off-by: Yuval Mintz 

Applied, thanks.


Re: [PATCH net] macsec: fix netlink attribute for key id

2016-05-20 Thread David Miller
From: Sabrina Dubroca 
Date: Wed, 18 May 2016 13:34:40 +0200

> In my last commit I replaced MACSEC_SA_ATTR_KEYID by
> MACSEC_SA_ATTR_KEY.
> 
> Fixes: 8acca6acebd0 ("macsec: key identifier is 128 bits, not 64")
> Signed-off-by: Sabrina Dubroca 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next] xen-netback: correct length checks on hash copy_ops

2016-05-20 Thread David Miller
From: Paul Durrant 
Date: Wed, 18 May 2016 08:53:01 +0100

> The length checks on the grant table copy_ops for setting hash key and
> hash mapping are checking the local 'len' value which is correct in
> the case of the former but not the latter. This was picked up by
> static analysis checks.
> 
> This patch replaces checks of 'len' with 'copy_op.len' in both cases
> to correct the incorrect check, keep the two checks consistent, and to
> make it clear what the checks are for.
> 
> Signed-off-by: Paul Durrant 
> Reported-by: Dan Carpenter 

Applied.


Re: [PATCHv3] phy: fix crash in fixed_phy_add()

2016-05-20 Thread David Miller
From: Rabin Vincent 
Date: Wed, 18 May 2016 12:47:31 +0200

> From: Rabin Vincent 
> 
> Since e7f4dc3536a ("mdio: Move allocation of interrupts into core"),
> platforms which call fixed_phy_add() before fixed_mdio_bus_init() is
> called (for example, because the platform code and the fixed_phy driver
> use the same initcall level) crash in fixed_phy_add() since the
> ->mii_bus is not allocated.
> 
> Also since e7f4dc3536a, these interrupts are initalized to polling by
> default.  The few (old) platforms which directly use fixed_phy_add()
> from their platform code all pass PHY_POLL for the irq argument, so we
> can keep these platforms not crashing by simply not attempting to set
> the irq if PHY_POLL is passed.
> 
> Also, even if problems have not been reported on more modern platforms
> which used fixed_phy_register() from drivers' probe functions, we return
> -EPROBE_DEFER if the MDIO bus is not yet registered so that the probe is
> retried later.
> 
> Fixes: e7f4dc3536a400 ("mdio: Move allocation of interrupts into core")
> Signed-off-by: Rabin Vincent 
> ---
> v3: One more suggestion from Andrew: check mii bus state

Applied, thanks.


Re: [RFC] net: remove busylock

2016-05-20 Thread Jesper Dangaard Brouer
On Fri, 20 May 2016 07:16:55 -0700
Eric Dumazet  wrote:

> Since bonding pretends to be multiqueue, TCQ_F_ONETXQUEUE is not set
> on sch->flags when HTB is installed at the bonding device root.

If would be cool if you could run a test with removed busylock and
allow HTB to bulk dequeue.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH 2/2] ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path.

2016-05-20 Thread David Miller
From: Haishuang Yan 
Date: Wed, 18 May 2016 18:05:52 +0800

> In gre6 xmit path, we are sending a GRE packet, so set fl6 proto
> to IPPROTO_GRE properly.
> 
> Signed-off-by: Haishuang Yan 

I think it would be a lot better to initialize the flow protocol field
properly in ip6gre_tnl_link_config() instead of fixing it up every
single transmit.



Re: [PATCH 1/2] ip6_gre: Fix MTU setting for ip6gretap

2016-05-20 Thread David Miller
From: Haishuang Yan 
Date: Wed, 18 May 2016 18:05:51 +0800

> When creat an ip6gretap interface with an unreachable route,
> the MTU is about 14 bytes larger than what was needed.
> 
> If the remote address is reachable:
> ping6 2001:0:130::1 -c 2
> PING 2001:0:130::1(2001:0:130::1) 56 data bytes
> 64 bytes from 2001:0:130::1: icmp_seq=1 ttl=64 time=1.46 ms
> 64 bytes from 2001:0:130::1: icmp_seq=2 ttl=64 time=81.1 ms
> 
> --- 2001:0:130::1 ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 1001ms
> rtt min/avg/max/mdev = 1.465/41.316/81.167/39.851 ms
> 
> ip link add ip6gretap1 type ip6gretap\
>  local 2001:0:130::2 remote 2001:0:130::1
> ip link show ip6gretap1
> 11: ip6gretap1@NONE:  mtu 1434 ...
> link/ether c2:f3:f8:c1:2c:bf brd ff:ff:ff:ff:ff:ff
> 
> The MTU value 1434 is right. But if we delete the direct route:
> ip -6 route del 2001:0:130::/64
> ping6 2001:0:130::1 -c 2
> connect: Network is unreachable
> ip link add ip6gretap1 type ip6gretap\
>  local 2001:0:130::2 remote 2001:0:130::1
> ip link show ip6gretap1
> 12: ip6gretap1@NONE:  mtu 1448 ...
> link/ether 7e:e1:d2:c4:06:5e brd ff:ff:ff:ff:ff:ff
> 
> Now, the MTU value 1448 is larger than what was needed.
> 
> This patch fix the issue in this situation.
> 
> Signed-off-by: Haishuang Yan 

It's great that you have provided precise examples of how to reproduce
the problem.

But you need to explain in your commit message how your fix works.
Why does simply checking the device type and subtracting the ethernet
header length work here?  What is the full context for your fix and
why does it work?


Re: [PATCH] net: sock: Add option for memory optimized hints.

2016-05-20 Thread Eric Dumazet
On Fri, 2016-05-20 at 15:36 +0200, peter enderborg wrote:
> From: Peter Enderborg 
> 
> When sending data the socket allocates memory for
> payload on a cache or a page alloc. The page alloc
> then might trigger compation that takes long time.
> This can be avoided with smaller chunks. But
> userspace can not know what is the right size for
> the smaller sends. For this we add a SIZEHINT
> getsocketopt where the userspace can get the size
> for send that will fit into one page (order 0) or
> the max for a slab cache allocation.


Very obscure thing if you ask me. Exposing kernel internals like that
seems to work around a more serious issue.

What kind of sockets would ever use KMALLOC_MAX_CACHE_SIZE  allocations
exactly ?

af_unix and tcp are definitely clean.

28d6427109d13b0f447cba5761f88d3548e83605 net: attempt high order allocations in 
sock_alloc_send_pskb()
e370a7236321773245c5522d8bb299380830d3b2 af_unix: improve STREAM behavior with 
fragmented memory
eb6a24816b247c0be6b2e97e68933072874bbe54 af_unix: reduce high order page 
allocations




[PATCH][3.13.y-ckt][PATCH 1/1] bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr

2016-05-20 Thread Joseph Salisbury
From: Toshiaki Makita 

BugLink: http://bugs.launchpad.net/bugs/1581585

br_fdb_changeaddr() assumes that there is at most one local entry per port
per vlan. It used to be true, but since commit 36fd2b63e3b4 ("bridge: allow
creating/deleting fdb entries via netlink"), it has not been so.
Therefore, the function might fail to search a correct previous address
to be deleted and delete an arbitrary local entry if user has added local
entries manually.

Example of problematic case:
  ip link set eth0 address ee:ff:12:34:56:78
  brctl addif br0 eth0
  bridge fdb add 12:34:56:78:90:ab dev eth0 master
  ip link set eth0 address aa:bb:cc:dd:ee:ff
Then, the address 12:34:56:78:90:ab might be deleted instead of
ee:ff:12:34:56:78, the original mac address of eth0.

Address this issue by introducing a new flag, added_by_user, to struct
net_bridge_fdb_entry.

Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases
like:
  ip link set eth0 address 12:34:56:78:90:ab
  ip link set eth1 address aa:bb:cc:dd:ee:ff
  brctl addif br0 eth0
  bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master
  brctl addif br0 eth1
  brctl delif br0 eth0
In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff,
but it also should have been added by "brctl addif br0 eth1" originally,
so we don't delete it and treat it a new kernel-created entry.

Signed-off-by: Toshiaki Makita 
Signed-off-by: David S. Miller 
(backported from commit a5642ab4744bc8c5a8c7ce7c6e30c01bd6bbc691)
Signed-off-by: Joseph Salisbury 
---
 net/bridge/br_fdb.c | 16 
 net/bridge/br_input.c   |  8 
 net/bridge/br_private.h |  3 ++-
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 62c444d..551a27d 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -104,7 +104,7 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const 
unsigned char *newaddr)
struct net_bridge_fdb_entry *f;
 
f = hlist_entry(h, struct net_bridge_fdb_entry, hlist);
-   if (f->dst == p && f->is_local) {
+   if (f->dst == p && f->is_local && !f->added_by_user) {
/* maybe another port has same hw addr? */
struct net_bridge_port *op;
u16 vid = f->vlan_id;
@@ -247,6 +247,7 @@ void br_fdb_delete_by_port(struct net_bridge *br,
ether_addr_equal(op->dev->dev_addr,
 f->addr.addr)) {
f->dst = op;
+   f->added_by_user = 0;
goto skip_delete;
}
}
@@ -397,6 +398,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct 
hlist_head *head,
fdb->vlan_id = vid;
fdb->is_local = 0;
fdb->is_static = 0;
+   fdb->added_by_user = 0;
fdb->updated = fdb->used = jiffies;
hlist_add_head_rcu(&fdb->hlist, head);
}
@@ -447,7 +449,7 @@ int br_fdb_insert(struct net_bridge *br, struct 
net_bridge_port *source,
 }
 
 void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
-  const unsigned char *addr, u16 vid)
+  const unsigned char *addr, u16 vid, bool added_by_user)
 {
struct hlist_head *head = &br->hash[br_mac_hash(addr, vid)];
struct net_bridge_fdb_entry *fdb;
@@ -473,13 +475,18 @@ void br_fdb_update(struct net_bridge *br, struct 
net_bridge_port *source,
/* fastpath: update of existing entry */
fdb->dst = source;
fdb->updated = jiffies;
+   if (unlikely(added_by_user))
+   fdb->added_by_user = 1;
}
} else {
spin_lock(&br->hash_lock);
if (likely(!fdb_find(head, addr, vid))) {
fdb = fdb_create(head, source, addr, vid);
-   if (fdb)
+   if (fdb) {
+   if (unlikely(added_by_user))
+   fdb->added_by_user = 1;
fdb_notify(br, fdb, RTM_NEWNEIGH);
+   }
}
/* else  we lose race and someone else inserts
 * it first, don't bother updating
@@ -648,6 +655,7 @@ static int fdb_add_entry(struct net_bridge_port *source, 
const __u8 *addr,
 
modified = true;
}
+   fdb->added_by_user = 1;
 
fdb->used = jiffies;
if (modified) {
@@ -666,7 +674,7 @@ static int __br_fdb_add(struct ndmsg *ndm, struct 
net_bridge_port *p,
if (ndm->ndm

[PATCH][v3.13.y-ckt][PATCH 0/1] bridge: notify user space after fdb update

2016-05-20 Thread Joseph Salisbury
BugLink: http://bugs.launchpad.net/bugs/1581585

Hello,

Please consider including upstream commit 
c65c7a306610ee7c13669a8f5601b472c19dc6f1
in the next v3.13.y-ckt release.  It was included in the mainline tree as of 
v3.14-rc3.  It has been tested and confirmed to resolve
http://bugs.launchpad.net/bugs/1581585 .

This commit also reques commit a5642ab4744 as a prereq.  Commit  a5642ab4744 
does not apply cleanly to v3.13-y.ckt, so I performed a backport, which is
in email 1/1


Prereq:
commit a5642ab4744bc8c5a8c7ce7c6e30c01bd6bbc691
Author: Toshiaki Makita 
Date:   Fri Feb 7 16:48:18 2014 +0900

bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr

Real fix:
commit c65c7a306610ee7c13669a8f5601b472c19dc6f1
Author: Jon Maxwell 
Date:   Thu May 29 17:27:16 2014 +1000

bridge: notify user space after fdb update

Regards,

Joe Salisbury


Re: [PATCH net v2] fou: avoid using sk_user_data before it is initialised

2016-05-20 Thread Cong Wang
On Thu, May 19, 2016 at 10:57 PM, Simon Horman
 wrote:
> During initialisation sk->sk_user_data should not be used before
> it is initialised.
>
> Found by bisection after noticing the following:
>
> $ ip fou add port  ipproto 47
> [0.383417] BUG: unable to handle kernel NULL pointer dereference at 
> 0008
> [0.384132] IP: [] fou_nl_cmd_add_port+0x1e1/0x230
> [0.384707] PGD 1fafc067 PUD 1fb72067 PMD 0
> [0.385110] Oops: 0002 [#1] SMP
> [0.385387] Modules linked in:
> [0.385667] CPU: 0 PID: 55 Comm: ip Not tainted 4.6.0-03623-g0b7962a6c4a3 
> #430
> [0.386244] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.7.5-20140531_083030-gandalf 04/01/2014
> [0.386244] task: 88001fb9cac0 ti: 88001fbc8000 task.ti: 
> 88001fbc8000
> [0.386244] RIP: 0010:[]  [] 
> fou_nl_cmd_add_port+0x1e1/0x230
> [0.386244] RSP: 0018:88001fbcbb78  EFLAGS: 00010246
> [0.386244] RAX: 0001 RBX: 88001fb8eb40 RCX: 
> 002f
> [0.386244] RDX:  RSI:  RDI: 
> 880019fcafc0
> [0.386244] RBP: 880019fcaf80 R08: 8130c370 R09: 
> 880019fcaf80
> [0.386244] R10: 880019e12b8c R11:  R12: 
> 0004
> [0.386244] R13: 0014 R14: 88001fb1a300 R15: 
> 816634c0
> [0.386244] FS:  7f016eb4d700() GS:88001a20() 
> knlGS:
> [0.386244] CS:  0010 DS:  ES:  CR0: 80050033
> [0.386244] CR2: 0008 CR3: 1fb69000 CR4: 
> 06b0
> [0.386244] Stack:
> [0.386244]  88001faaea24 8800192426c0 0002002f0001 
> 
> [0.386244]     
> b822
> [0.386244]  81461480 88001faaea14 0004 
> 812b0e17
> [0.386244] Call Trace:
> [0.386244]  [] ? genl_family_rcv_msg+0x197/0x320
> [0.386244]  [] ? genl_family_rcv_msg+0x320/0x320
> [0.386244]  [] ? genl_rcv_msg+0x70/0xb0
> [0.386244]  [] ? netlink_rcv_skb+0xa1/0xc0
> [0.386244]  [] ? genl_rcv+0x24/0x40
> [0.386244]  [] ? netlink_unicast+0x143/0x1d0
> [0.386244]  [] ? netlink_sendmsg+0x366/0x390
> [0.386244]  [] ? rw_copy_check_uvector+0x68/0x110
> [0.386244]  [] ? sock_sendmsg+0x10/0x20
> [0.386244]  [] ? ___sys_sendmsg+0x1f1/0x200
> [0.386244]  [] ? pipe_write+0x1a0/0x420
> [0.386244]  [] ? do_filp_open+0x92/0xe0
> [0.386244]  [] ? __sys_sendmsg+0x41/0x70
> [0.386244]  [] ? entry_SYSCALL_64_fastpath+0x13/0x8f
> [0.386244] Code: 4c 24 12 48 8b 93 28 02 00 00 48 c7 83 68 03 00 00 e0 76 
> 32 81 48 c7 83 78 03 00 00 50 61 32 81 48 c7 83 80 03 00 00 e0 64 32 81 <88> 
> 4a 08 e9 20 ff ff ff 4c 89 e7 bb 8e ff ff ff e8 1a 34 07 00
> [0.386244] RIP  [] fou_nl_cmd_add_port+0x1e1/0x230
> [0.386244]  RSP 
> [0.386244] CR2: 0008
> [0.407176] ---[ end trace 13bf0d24a4b7f9c3 ]---
>
> Fixes: d92283e338f6 ("fou: change to use UDP socket GRO")
> Signed-off-by: Simon Horman 

Acked-by: Cong Wang 

Thanks!


Re: [PATCH] net: sock: Add option for memory optimized hints.

2016-05-20 Thread David Miller

It is not appropriate to submit new features at this time as we are in
the merge window and we are only bug fixes.

Please resubmit this when the net-next tree opens back up.


Re: [PATCH 1/1] net: pegasus: remove dead coding

2016-05-20 Thread David Miller
From: Petko Manolov 
Date: Fri, 20 May 2016 10:33:47 +0300

> On 16-05-19 11:35:42, David Miller wrote:
>> From: Heinrich Schuchardt 
>> Date: Wed, 18 May 2016 02:13:30 +0200
>> 
>> > (!count || count < 4) is always true.
>> > So let's remove the coding which is dead at least since 2005.
>> > 
>> > Signed-off-by: Heinrich Schuchardt 
>> 
>> Applied.
> 
> David, the patch you applied is broken.  It seems that you didn't follow the 
> discussion from the past couple of days.  Please revert it.

I did.


Re: [PATCH] net: pegasus: remove unused variables and labels

2016-05-20 Thread David Miller
From: Arnd Bergmann 
Date: Fri, 20 May 2016 10:22:30 +0200

> The latest dead-code removal was slightly incomplete and
> left a few things behind that we now get compiler warnings
> for:
> 
> drivers/net/usb/pegasus.c: In function 'read_bulk_callback':
> drivers/net/usb/pegasus.c:475:1: error: label 'goon' defined but not used 
> [-Werror=unused-label]
> drivers/net/usb/pegasus.c:446:8: error: unused variable 'pkt_len' 
> [-Werror=unused-variable]
> drivers/net/usb/pegasus.c:445:6: error: unused variable 'buf' 
> [-Werror=unused-variable]
> drivers/net/usb/pegasus.c:443:17: error: unused variable 'count' 
> [-Werror=unused-variable]
> 
> This removes them as well.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: e00be9e4d0ff ("net: pegasus: remove dead coding")

The patch in question was broken and has been reverted.


ip rule duplicates

2016-05-20 Thread Mateusz Bajorski

Hi,

When we add the same rule again with flag NLM_F_EXCL we expect that we 
receive error:

RTNETLINK answers: File exists
This behaviour is already in ip routing part.

I have noticed that iproute2 when adds new rule it attach flag 
NLM_F_EXCL to call.
(see 
http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/tree/ip/iprule.c#n334)


Next thing what I found is that this flag is not handled from kernel side.

I implemented this feature and I tested this with qemu x86 on:
linux-4.5.4
linux (git)
linux-stable (git)
Tested with ipv4 and ipv6.

current behaviour with ipv4:
localhost ~ # ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
localhost ~ # ip rule
0: from all lookup local
1005: from 10.46.177.97 lookup 104
1005: from 10.46.177.97 lookup 104
32766: from all lookup main
32767: from all lookup default

expected behavior after patch:
localhost ~ # ip rule
0:from all lookup local
32766:from all lookup main
32767:from all lookup default
localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
RTNETLINK answers: File exists
localhost ~ # ip rule
0:from all lookup local
1005:from 10.46.177.97 lookup 104
32766:from all lookup main
32767:from all lookup default


There was already topic regarding this but I don't see any changes 
merged and problem still occurs.

(see http://marc.info/?l=linux-netdev&m=113577886110391&w=2)

--

Best regards,

Mateusz Bajorski
>From 9c3f80dceec414ff31d0c38d0107dec279fc9894 Mon Sep 17 00:00:00 2001
From: Mateusz Bajorski 
Date: Fri, 20 May 2016 14:29:56 +0200
Subject: [PATCH] Added NLM_F_EXCL support to fib_nl_newrule

Signed-off-by: Mateusz Bajorski 
---
 net/core/fib_rules.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 840aceb..c1bc07cd 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -291,6 +291,47 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 	if (err < 0)
 		goto errout;
 
+	if (nlh->nlmsg_flags & NLM_F_EXCL) {
+		list_for_each_entry(rule, &ops->rules_list, list) {
+			if (frh->action && (frh->action != rule->action))
+continue;
+
+			if (frh_get_table(frh, tb) &&
+(frh_get_table(frh, tb) != rule->table))
+continue;
+
+			if (tb[FRA_PRIORITY] &&
+(rule->pref != nla_get_u32(tb[FRA_PRIORITY])))
+continue;
+
+			if (tb[FRA_IIFNAME] &&
+nla_strcmp(tb[FRA_IIFNAME], rule->iifname))
+continue;
+
+			if (tb[FRA_OIFNAME] &&
+nla_strcmp(tb[FRA_OIFNAME], rule->oifname))
+continue;
+
+			if (tb[FRA_FWMARK] &&
+(rule->mark != nla_get_u32(tb[FRA_FWMARK])))
+continue;
+
+			if (tb[FRA_FWMASK] &&
+(rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
+continue;
+
+			if (tb[FRA_TUN_ID] &&
+(rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
+continue;
+
+			if (!ops->compare(rule, frh, tb))
+continue;
+
+			err = -EEXIST;
+			goto errout;
+		}
+	}
+
 	rule = kzalloc(ops->rule_size, GFP_KERNEL);
 	if (rule == NULL) {
 		err = -ENOMEM;
-- 
2.6.4



[PATCH] ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n

2016-05-20 Thread Ezequiel Garcia
Commit fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
moves the default TTL assignment, and as side-effect IPv4 TTL now
has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).

The sysctl_ip_default_ttl is fundamental for IP to work properly,
as it provides the TTL to be used as default. The defautl TTL may be
used in ip_selected_ttl, through the following flow:

  ip_select_ttl
ip4_dst_hoplimit
  net->ipv4.sysctl_ip_default_ttl

This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
in net_init_net, called during ipv4's initialization.

Without this commit, a kernel built without sysctl support will send
all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
with setsockopt).

Given a similar issue might appear on the other knobs that were
namespaceify, this commit also moves them.

Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
Signed-off-by: Ezequiel Garcia 
---
 net/ipv4/af_inet.c | 8 
 net/ipv4/sysctl_net_ipv4.c | 4 
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 2e6e65fc4d20..8d334816f89d 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1697,6 +1697,14 @@ static __net_init int inet_init_net(struct net *net)
 */
net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1);
net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0);
+
+   /* Default values for sysctl-controlled parameters.
+* We set them here, in case sysctl is not compiled.
+*/
+   net->ipv4.sysctl_ip_default_ttl = IPDEFTTL;
+   net->ipv4.sysctl_ip_dynaddr = 0;
+   net->ipv4.sysctl_ip_early_demux = 1;
+
return 0;
 }
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index bb0419582b8d..1cb67de106fe 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -999,10 +999,6 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
if (!net->ipv4.sysctl_local_reserved_ports)
goto err_ports;
 
-   net->ipv4.sysctl_ip_default_ttl = IPDEFTTL;
-   net->ipv4.sysctl_ip_dynaddr = 0;
-   net->ipv4.sysctl_ip_early_demux = 1;
-
return 0;
 
 err_ports:
-- 
2.7.0



Re: [RFC] net: remove busylock

2016-05-20 Thread John Fastabend
On 16-05-20 06:11 AM, Eric Dumazet wrote:
> On Fri, 2016-05-20 at 09:29 +0200, Jesper Dangaard Brouer wrote:
> 
> 
>> The hole idea behind allowing bulk qdisc dequeue, was to mitigate this,
>> by allowing dequeue to do more work, while holding the lock.
>>
>> You mention HTB.  Notice HTB does not take advantage of bulk dequeue.
>> Have you tried to enable/allow HTB to bulk dequeue?
>>
> 
> Well, __QDISC___STATE_RUNNING means exactly that : one cpu is dequeueing
> many packets from the qdisc and tx them to the device.
> 
> It is generic for any kind of qdisc.
> 
> HTB bulk dequeue would have to call ->dequeue() mutiple times. If you do
> this while holding qdisc spinlock, you block other cpus from doing
> concurrent ->enqueue(), adding latencies (always the same trade off...)
> 
> HTB wont be anytime soon have separate protections for the ->enqueue()
> and the ->dequeue(). Have you looked at this monster ? I did, many
> times...
> 

I came to the conclusion that we just need to rewrite a new modern
version of HTB at some point. Easier said than done however.

> Note that I am working on a patch to transform __QDISC___STATE_RUNNING
> to a seqcount do that we can grab stats without holding the qdisc lock.
> 
> 
> 
> 



[PATCH] Added NLM_F_EXCL support to fib_nl_newrule

2016-05-20 Thread Mateusz Bajorski
When adding rule with NLM_F_EXCL flag then check if the same rule exist.
If yes then exit with -EEXIST.

This is already implemented in iproute2:
if (cmd == RTM_NEWRULE) {
req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL;
req.r.rtm_type = RTN_UNICAST;
}

Tested ipv4 and ipv6 with net-next linux on qemu x86

expected behavior after patch:
localhost ~ # ip rule
0:from all lookup local
32766:from all lookup main
32767:from all lookup default
localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
RTNETLINK answers: File exists
localhost ~ # ip rule
0:from all lookup local
1005:from 10.46.177.97 lookup 104
32766:from all lookup main
32767:from all lookup default

There was already topic regarding this but I don't see any changes
merged and problem still occurs.
https://lkml.kernel.org/r/1135778809.5944.7.camel+%28%29+localhost+%21+localdomain

Signed-off-by: Mateusz Bajorski 
---
 net/core/fib_rules.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 840aceb..b9816a3 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -291,6 +291,47 @@ static int fib_nl_newrule(struct sk_buff *skb, struct 
nlmsghdr* nlh)
if (err < 0)
goto errout;
 
+   if (nlh->nlmsg_flags & NLM_F_EXCL) {
+   list_for_each_entry(rule, &ops->rules_list, list) {
+   if (frh->action && (frh->action != rule->action))
+   continue;
+
+   if (frh_get_table(frh, tb) &&
+   frh_get_table(frh, tb) != rule->table)
+   continue;
+
+   if (tb[FRA_PRIORITY] &&
+   rule->pref != nla_get_u32(tb[FRA_PRIORITY]))
+   continue;
+
+   if (tb[FRA_IIFNAME] &&
+   nla_strcmp(tb[FRA_IIFNAME], rule->iifname))
+   continue;
+
+   if (tb[FRA_OIFNAME] &&
+   nla_strcmp(tb[FRA_OIFNAME], rule->oifname))
+   continue;
+
+   if (tb[FRA_FWMARK] &&
+   rule->mark != nla_get_u32(tb[FRA_FWMARK]))
+   continue;
+
+   if (tb[FRA_FWMASK] &&
+   rule->mark_mask != nla_get_u32(tb[FRA_FWMASK]))
+   continue;
+
+   if (tb[FRA_TUN_ID] &&
+   rule->tun_id != nla_get_be64(tb[FRA_TUN_ID]))
+   continue;
+
+   if (!ops->compare(rule, frh, tb))
+   continue;
+
+   err = -EEXIST;
+   goto errout;
+   }
+   }
+
rule = kzalloc(ops->rule_size, GFP_KERNEL);
if (rule == NULL) {
err = -ENOMEM;
-- 
2.6.4


Re: [RFC] net: remove busylock

2016-05-20 Thread Eric Dumazet
On Fri, 2016-05-20 at 06:47 -0700, Eric Dumazet wrote:
> On Fri, 2016-05-20 at 06:11 -0700, Eric Dumazet wrote:
> > On Fri, 2016-05-20 at 09:29 +0200, Jesper Dangaard Brouer wrote:
> > 
> > 
> > > The hole idea behind allowing bulk qdisc dequeue, was to mitigate this,
> > > by allowing dequeue to do more work, while holding the lock.
> > > 
> > > You mention HTB.  Notice HTB does not take advantage of bulk dequeue.
> > > Have you tried to enable/allow HTB to bulk dequeue?
> > > 
> > 
> > Well, __QDISC___STATE_RUNNING means exactly that : one cpu is dequeueing
> > many packets from the qdisc and tx them to the device.
> > 
> > It is generic for any kind of qdisc.
> > 
> > HTB bulk dequeue would have to call ->dequeue() mutiple times. If you do
> > this while holding qdisc spinlock, you block other cpus from doing
> > concurrent ->enqueue(), adding latencies (always the same trade off...)
> > 
> > HTB wont be anytime soon have separate protections for the ->enqueue()
> > and the ->dequeue(). Have you looked at this monster ? I did, many
> > times...
> > 
> > Note that I am working on a patch to transform __QDISC___STATE_RUNNING
> > to a seqcount do that we can grab stats without holding the qdisc lock.
> 
> Slide note : __qdisc_run() could probably avoid a __netif_schedule()
> when it breaks the loop, if another cpu is busy spinning on qdisc lock.
> 
> -> Less (spurious) TX softirq invocations, so less chance to trigger the
> infamous ksoftirqd bug we discussed lately.

Also note that in our case, we have HTB on a bonding device, and
FQ/pacing on slaves.

Since bonding pretends to be multiqueue, TCQ_F_ONETXQUEUE is not set
on sch->flags when HTB is installed at the bonding device root.

We might add a flag to tell qdisc layer that a device is virtual and
could benefit from bulk dequeue, since the ultimate TX queue is located
on another (physical) netdev, eventually MQ enabled.






Re: [RFC] net: remove busylock

2016-05-20 Thread Eric Dumazet
On Fri, 2016-05-20 at 06:11 -0700, Eric Dumazet wrote:
> On Fri, 2016-05-20 at 09:29 +0200, Jesper Dangaard Brouer wrote:
> 
> 
> > The hole idea behind allowing bulk qdisc dequeue, was to mitigate this,
> > by allowing dequeue to do more work, while holding the lock.
> > 
> > You mention HTB.  Notice HTB does not take advantage of bulk dequeue.
> > Have you tried to enable/allow HTB to bulk dequeue?
> > 
> 
> Well, __QDISC___STATE_RUNNING means exactly that : one cpu is dequeueing
> many packets from the qdisc and tx them to the device.
> 
> It is generic for any kind of qdisc.
> 
> HTB bulk dequeue would have to call ->dequeue() mutiple times. If you do
> this while holding qdisc spinlock, you block other cpus from doing
> concurrent ->enqueue(), adding latencies (always the same trade off...)
> 
> HTB wont be anytime soon have separate protections for the ->enqueue()
> and the ->dequeue(). Have you looked at this monster ? I did, many
> times...
> 
> Note that I am working on a patch to transform __QDISC___STATE_RUNNING
> to a seqcount do that we can grab stats without holding the qdisc lock.

Slide note : __qdisc_run() could probably avoid a __netif_schedule()
when it breaks the loop, if another cpu is busy spinning on qdisc lock.

-> Less (spurious) TX softirq invocations, so less chance to trigger the
infamous ksoftirqd bug we discussed lately.





[PATCH] net: sock: Add option for memory optimized hints.

2016-05-20 Thread peter enderborg

From: Peter Enderborg 

When sending data the socket allocates memory for
payload on a cache or a page alloc. The page alloc
then might trigger compation that takes long time.
This can be avoided with smaller chunks. But
userspace can not know what is the right size for
the smaller sends. For this we add a SIZEHINT
getsocketopt where the userspace can get the size
for send that will fit into one page (order 0) or
the max for a slab cache allocation.

Signed-off-by: Peter Enderborg 
---
 include/uapi/asm-generic/socket.h |  2 ++
 include/uapi/linux/socket.h   |  9 +
 net/core/sock.c   | 17 +
 3 files changed, 28 insertions(+)

diff --git a/include/uapi/asm-generic/socket.h 
b/include/uapi/asm-generic/socket.h
index 67d632f..f6a4921 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -92,4 +92,6 @@

 #define SO_CNX_ADVICE  53

+#define SO_SIZEHINT54
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h
index 76ab0c6..16db7e8 100644
--- a/include/uapi/linux/socket.h
+++ b/include/uapi/linux/socket.h
@@ -1,6 +1,8 @@
 #ifndef _UAPI_LINUX_SOCKET_H
 #define _UAPI_LINUX_SOCKET_H

+#include 
+
 /*
  * Desired design of maximum size and alignment (see RFC2553)
  */
@@ -18,4 +20,11 @@ struct __kernel_sockaddr_storage {
/* _SS_MAXSIZE value minus size of ss_family */
 } __attribute__ ((aligned(_K_SS_ALIGNSIZE)));  /* force desired alignment */

+struct sock_sizehint {
+   __u32   order_zero_size;
+   /* max payload size that can fit into one page in kernel */
+   __u32   cache_size;
+   /* max payload size that can fit in socket slab cache */
+};
+
 #endif /* _UAPI_LINUX_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 7e73c26..4c9cd92 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1254,6 +1254,23 @@ int sock_getsockopt(struct socket *sock, int level, int 
optname,
v.val = sk->sk_incoming_cpu;
break;

+   case SO_SIZEHINT:
+   {
+   struct sock_sizehint hint;
+
+   if (len > sizeof(hint))
+   len = sizeof(hint);
+
+   hint.order_zero_size = PAGE_SIZE -
+   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+   hint.cache_size =  KMALLOC_MAX_CACHE_SIZE -
+   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+   if (copy_to_user(optval, &hint, len))
+   return -EFAULT;
+   goto lenout;
+   }
+
default:
/* We implement the SO_SNDLOWAT etc to not be settable
 * (1003.1g 7).
--
2.4.2



Re: [RFC] net: remove busylock

2016-05-20 Thread Eric Dumazet
On Fri, 2016-05-20 at 09:29 +0200, Jesper Dangaard Brouer wrote:


> The hole idea behind allowing bulk qdisc dequeue, was to mitigate this,
> by allowing dequeue to do more work, while holding the lock.
> 
> You mention HTB.  Notice HTB does not take advantage of bulk dequeue.
> Have you tried to enable/allow HTB to bulk dequeue?
> 

Well, __QDISC___STATE_RUNNING means exactly that : one cpu is dequeueing
many packets from the qdisc and tx them to the device.

It is generic for any kind of qdisc.

HTB bulk dequeue would have to call ->dequeue() mutiple times. If you do
this while holding qdisc spinlock, you block other cpus from doing
concurrent ->enqueue(), adding latencies (always the same trade off...)

HTB wont be anytime soon have separate protections for the ->enqueue()
and the ->dequeue(). Have you looked at this monster ? I did, many
times...

Note that I am working on a patch to transform __QDISC___STATE_RUNNING
to a seqcount do that we can grab stats without holding the qdisc lock.






Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

2016-05-20 Thread Eric Dumazet
On Fri, 2016-05-20 at 06:01 -0700, Eric Dumazet wrote:

> Tricky, since sch_direct_xmit() releases the qdisc spinlock and grabs it
> again, while owning the ' running seqcount'
> 
> Needs more LOCKDEP tricks ;)
> 

That would be :

@@ -137,10 +137,10 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
HARD_TX_UNLOCK(dev, txq);
} else {
-   spin_lock(root_lock);
+   spin_lock_nested(root_lock, SINGLE_DEPTH_NESTING);
return qdisc_qlen(q);
}
-   spin_lock(root_lock);
+   spin_lock_nested(root_lock, SINGLE_DEPTH_NESTING);
 
if (dev_xmit_complete(ret)) {
/* Driver sent out skb successfully or skb was consumed */




Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

2016-05-20 Thread Eric Dumazet
On Fri, 2016-05-20 at 05:44 -0700, Eric Dumazet wrote:
> On Thu, 2016-05-19 at 19:45 -0700, Eric Dumazet wrote:
> > On Thu, 2016-05-19 at 18:50 -0700, Cong Wang wrote:
> > > On Thu, May 19, 2016 at 5:35 AM, Eric Dumazet  
> > > wrote:
> > > >
> > > > These stats are using u64 or u32 fields, so reading integral values
> > > > should not prevent writers from doing concurrent updates if the kernel
> > > > arch is a 64bit one.
> > > >
> > > > Being able to atomically fetch all counters like packets and bytes sent
> > > > at the expense of interfering in fast path (queue and dequeue packets)
> > > > is simply not worth the pain, as the values are generally stale after 1
> > > > usec.
> > > 
> > > I think one purpose of this lock is to make sure we have an atomic
> > > snapshot of these counters as a whole. IOW, we may need another
> > > lock rather than the qdisc root lock to guarantee this.
> > 
> > Right, this was stated in the changelog.
> > 
> > I played a bit at changing qdisc->__state to a seqcount.
> > 
> > But this would add 2 additional smp_wmb() barriers.
> 
> Although this would allow the mechanism to be used both on 32bit an
> 64bit kernels.
> 
> This would also add LOCKDEP annotations which can be nice for debugging.
> 
> Also the seqcount value >> 1 would give us the number of __qdisc_run()
> and we could compute packets/(seqcount>>1) to get the average number of
> packets processed per round.

Tricky, since sch_direct_xmit() releases the qdisc spinlock and grabs it
again, while owning the ' running seqcount'

Needs more LOCKDEP tricks ;)




Re: [PATCH] ptp: use memdup_user().

2016-05-20 Thread Richard Cochran
On Fri, May 20, 2016 at 05:51:02PM +0530, Muhammad Falak R Wani wrote:
> Use memdup_user to duplicate a memory region from user-space to
> kernel-space, instead of open coding using kmalloc & copy_from_user.
> 
> Signed-off-by: Muhammad Falak R Wani 

Acked-by: Richard Cochran 


Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

2016-05-20 Thread Eric Dumazet
On Thu, 2016-05-19 at 19:45 -0700, Eric Dumazet wrote:
> On Thu, 2016-05-19 at 18:50 -0700, Cong Wang wrote:
> > On Thu, May 19, 2016 at 5:35 AM, Eric Dumazet  
> > wrote:
> > >
> > > These stats are using u64 or u32 fields, so reading integral values
> > > should not prevent writers from doing concurrent updates if the kernel
> > > arch is a 64bit one.
> > >
> > > Being able to atomically fetch all counters like packets and bytes sent
> > > at the expense of interfering in fast path (queue and dequeue packets)
> > > is simply not worth the pain, as the values are generally stale after 1
> > > usec.
> > 
> > I think one purpose of this lock is to make sure we have an atomic
> > snapshot of these counters as a whole. IOW, we may need another
> > lock rather than the qdisc root lock to guarantee this.
> 
> Right, this was stated in the changelog.
> 
> I played a bit at changing qdisc->__state to a seqcount.
> 
> But this would add 2 additional smp_wmb() barriers.

Although this would allow the mechanism to be used both on 32bit an
64bit kernels.

This would also add LOCKDEP annotations which can be nice for debugging.

Also the seqcount value >> 1 would give us the number of __qdisc_run()
and we could compute packets/(seqcount>>1) to get the average number of
packets processed per round.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 941ec99cd3b6..471095beca09 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4610,6 +4610,7 @@ static int bond_check_params(struct bond_params *params)
 static struct lock_class_key bonding_netdev_xmit_lock_key;
 static struct lock_class_key bonding_netdev_addr_lock_key;
 static struct lock_class_key bonding_tx_busylock_key;
+static struct lock_class_key bonding_qdisc_running;
 
 static void bond_set_lockdep_class_one(struct net_device *dev,
   struct netdev_queue *txq,
@@ -4625,6 +4626,7 @@ static void bond_set_lockdep_class(struct net_device *dev)
  &bonding_netdev_addr_lock_key);
netdev_for_each_tx_queue(dev, bond_set_lockdep_class_one, NULL);
dev->qdisc_tx_busylock = &bonding_tx_busylock_key;
+   dev->qdisc_running = &bonding_qdisc_running;
 }
 
 /* Called from registration process */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c148edfe4965..e06646b69b06 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1862,6 +1862,7 @@ struct net_device {
 #endif
struct phy_device   *phydev;
struct lock_class_key   *qdisc_tx_busylock;
+   struct lock_class_key   *qdisc_running;
boolproto_down;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a1fd76c22a59..bff8d895ef8a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -29,13 +29,6 @@ enum qdisc_state_t {
__QDISC_STATE_THROTTLED,
 };
 
-/*
- * following bits are only changed while qdisc lock is held
- */
-enum qdisc___state_t {
-   __QDISC___STATE_RUNNING = 1,
-};
-
 struct qdisc_size_table {
struct rcu_head rcu;
struct list_headlist;
@@ -93,7 +86,7 @@ struct Qdisc {
unsigned long   state;
struct sk_buff_head q;
struct gnet_stats_basic_packed bstats;
-   unsigned int__state;
+   seqcount_t  running;
struct gnet_stats_queue qstats;
struct rcu_head rcu_head;
int padded;
@@ -104,20 +97,20 @@ struct Qdisc {
 
 static inline bool qdisc_is_running(const struct Qdisc *qdisc)
 {
-   return (qdisc->__state & __QDISC___STATE_RUNNING) ? true : false;
+   return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
 }
 
 static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 {
if (qdisc_is_running(qdisc))
return false;
-   qdisc->__state |= __QDISC___STATE_RUNNING;
+   write_seqcount_begin(&qdisc->running);
return true;
 }
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
 {
-   qdisc->__state &= ~__QDISC___STATE_RUNNING;
+   write_seqcount_end(&qdisc->running);
 }
 
 static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff431d570..55b414dead29 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3075,7 +3075,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, 
struct Qdisc *q,
/*
 * Heuristic to force contended enqueues to serialize on a
 * separate lock before trying to get qdisc main lock.
-* This permits __QDISC___STATE_RUNNING owner to get the lock more
+* This permits qdisc->running owner to get the lock more
 * often and dequeue packets faster.
 */
contended = qdisc_is_running(q);
diff --git a/net/sched/sch_generic.c b/

rxrpc: Simplify connect() implementation and simplify sendmsg() op

2016-05-20 Thread David Howells
Hi Dave,

Are you okay with taking this into net-next?

I no longer take away the ability to do connect(), but now it does nothing
more than specify a default address and mark the socket as being client
only.  As before, the default address is overridden if sendmsg() is given
an address.

The other significant UAPI change is that I deal better with a race between
two threads trying to register calls with the same user ID (one of them
will get an error).

David
---
rxrpc: Simplify connect() implementation and simplify sendmsg() op

Simplify the RxRPC connect() implementation.  It will just note the
destination address it is given, and if a sendmsg() comes along with no
address, this will be assigned as the address.  No transport struct will be
held internally, which will allow us to remove this later.

Simplify sendmsg() also.  Whilst a call is active, userspace refers to it
by a private unique user ID specified in a control message.  When sendmsg()
sees a user ID that doesn't map to an extant call, it creates a new call
for that user ID and attempts to add it.  If, when we try to add it, the
user ID is now registered, we now reject the message with -EEXIST.  We
should never see this situation unless two threads are racing, trying to
create a call with the same ID - which would be an error.

It also isn't required to provide sendmsg() with an address - provided the
control message data holds a user ID that maps to a currently active call.

Signed-off-by: David Howells 
---
 include/linux/rxrpc.h |   18 ++--
 net/rxrpc/af_rxrpc.c  |  177 +++
 net/rxrpc/ar-call.c   |  158 +++
 net/rxrpc/ar-connection.c |   17 
 net/rxrpc/ar-internal.h   |   22 ++---
 net/rxrpc/ar-output.c |  186 +-
 6 files changed, 234 insertions(+), 344 deletions(-)

diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h
index a53915cd5581..1e8f216e2cf1 100644
--- a/include/linux/rxrpc.h
+++ b/include/linux/rxrpc.h
@@ -40,16 +40,18 @@ struct sockaddr_rxrpc {
 
 /*
  * RxRPC control messages
+ * - If neither abort or accept are specified, the message is a data message.
  * - terminal messages mean that a user call ID tag can be recycled
+ * - s/r/- indicate whether these are applicable to sendmsg() and/or recvmsg()
  */
-#define RXRPC_USER_CALL_ID 1   /* user call ID specifier */
-#define RXRPC_ABORT2   /* abort request / notification 
[terminal] */
-#define RXRPC_ACK  3   /* [Server] RPC op final ACK received 
[terminal] */
-#define RXRPC_NET_ERROR5   /* network error received 
[terminal] */
-#define RXRPC_BUSY 6   /* server busy received [terminal] */
-#define RXRPC_LOCAL_ERROR  7   /* local error generated [terminal] */
-#define RXRPC_NEW_CALL 8   /* [Server] new incoming call 
notification */
-#define RXRPC_ACCEPT   9   /* [Server] accept request */
+#define RXRPC_USER_CALL_ID 1   /* sr: user call ID specifier */
+#define RXRPC_ABORT2   /* sr: abort request / notification 
[terminal] */
+#define RXRPC_ACK  3   /* -r: [Service] RPC op final ACK 
received [terminal] */
+#define RXRPC_NET_ERROR5   /* -r: network error received 
[terminal] */
+#define RXRPC_BUSY 6   /* -r: server busy received [terminal] 
*/
+#define RXRPC_LOCAL_ERROR  7   /* -r: local error generated [terminal] 
*/
+#define RXRPC_NEW_CALL 8   /* -r: [Service] new incoming call 
notification */
+#define RXRPC_ACCEPT   9   /* s-: [Service] accept request */
 
 /*
  * RxRPC security levels
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index e45e94ca030f..484b5ee16a50 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -137,33 +137,33 @@ static int rxrpc_bind(struct socket *sock, struct 
sockaddr *saddr, int len)
 
lock_sock(&rx->sk);
 
-   if (rx->sk.sk_state != RXRPC_UNCONNECTED) {
+   if (rx->sk.sk_state != RXRPC_UNBOUND) {
ret = -EINVAL;
goto error_unlock;
}
 
memcpy(&rx->srx, srx, sizeof(rx->srx));
 
-   /* Find or create a local transport endpoint to use */
local = rxrpc_lookup_local(&rx->srx);
if (IS_ERR(local)) {
ret = PTR_ERR(local);
goto error_unlock;
}
 
-   rx->local = local;
-   if (srx->srx_service) {
+   if (rx->srx.srx_service) {
write_lock_bh(&local->services_lock);
list_for_each_entry(prx, &local->services, listen_link) {
-   if (prx->srx.srx_service == srx->srx_service)
+   if (prx->srx.srx_service == rx->srx.srx_service)
goto service_in_use;
}
 
+   rx->local = local;
list_add_tail(&rx->listen_link, &loc

[PATCH] ptp: use memdup_user().

2016-05-20 Thread Muhammad Falak R Wani
Use memdup_user to duplicate a memory region from user-space to
kernel-space, instead of open coding using kmalloc & copy_from_user.

Signed-off-by: Muhammad Falak R Wani 
---
 drivers/ptp/ptp_chardev.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 579fd65..0b1ac6b 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -208,14 +208,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
break;
 
case PTP_SYS_OFFSET:
-   sysoff = kmalloc(sizeof(*sysoff), GFP_KERNEL);
-   if (!sysoff) {
-   err = -ENOMEM;
-   break;
-   }
-   if (copy_from_user(sysoff, (void __user *)arg,
-  sizeof(*sysoff))) {
-   err = -EFAULT;
+   sysoff = memdup_user((void __user *)arg, sizeof(*sysoff));
+   if (IS_ERR(sysoff)) {
+   err = PTR_ERR(sysoff);
break;
}
if (sysoff->n_samples > PTP_MAX_SAMPLES) {
-- 
1.9.1



[PATCH] wan: cosa: use memdup_user().

2016-05-20 Thread Muhammad Falak R Wani
Use memdup_user to duplicate a memory region from user-space to
kernel-space, instead of open coding using kmalloc & copy_from_user.

Signed-off-by: Muhammad Falak R Wani 
---
 drivers/net/wan/cosa.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wan/cosa.c b/drivers/net/wan/cosa.c
index b87fe0a..fb37439 100644
--- a/drivers/net/wan/cosa.c
+++ b/drivers/net/wan/cosa.c
@@ -876,15 +876,10 @@ static ssize_t cosa_write(struct file *file,
count = COSA_MTU;

/* Allocate the buffer */
-   kbuf = kmalloc(count, GFP_KERNEL|GFP_DMA);
-   if (kbuf == NULL) {
+   kbuf = memdup_user(buf, count);
+   if (IS_ERR(kbuf)) {
up(&chan->wsem);
-   return -ENOMEM;
-   }
-   if (copy_from_user(kbuf, buf, count)) {
-   up(&chan->wsem);
-   kfree(kbuf);
-   return -EFAULT;
+   return PTR_ERR(kbuf);
}
chan->tx_status=0;
cosa_start_tx(chan, kbuf, count);
-- 
1.9.1



[PATCH V2 4.8 1/2] brcmutil: add field storing control channel to the struct brcmu_chan

2016-05-20 Thread Rafał Miłecki
Our d11 code supports encoding/decoding channel info into/from chanspec
format used by firmware. Current implementation is quite misleading
because of the way "chnum" field is used.
When encoding channel info, "chnum" has to be filled by a caller with
*center* channel number. However when decoding chanspec the same field
is filled with a *control* channel number.

1) This can be confusing. It's expected for information to be the same
   after encoding and decoding.
2) It doesn't allow accessing all info when decoding. Some functions may
   need to know both channel numbers, e.g. cfg80211 callback getting
   current channel.
Solve this by adding a separated field for control channel.

Signed-off-by: Rafał Miłecki 
Reviewed-by: Arend van Spriel 
---
V2: Update commit description message.
---
 .../broadcom/brcm80211/brcmfmac/cfg80211.c | 17 +
 .../net/wireless/broadcom/brcm80211/brcmfmac/p2p.c | 10 +-
 .../net/wireless/broadcom/brcm80211/brcmutil/d11.c | 18 ++
 .../broadcom/brcm80211/include/brcmu_d11.h | 22 ++
 4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index d0631b6..597495d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -2734,7 +2734,7 @@ static s32 brcmf_inform_single_bss(struct 
brcmf_cfg80211_info *cfg,
if (!bi->ctl_ch) {
ch.chspec = le16_to_cpu(bi->chanspec);
cfg->d11inf.decchspec(&ch);
-   bi->ctl_ch = ch.chnum;
+   bi->ctl_ch = ch.control_ch_num;
}
channel = bi->ctl_ch;
 
@@ -2852,7 +2852,7 @@ static s32 brcmf_inform_ibss(struct brcmf_cfg80211_info 
*cfg,
else
band = wiphy->bands[NL80211_BAND_5GHZ];
 
-   freq = ieee80211_channel_to_frequency(ch.chnum, band->band);
+   freq = ieee80211_channel_to_frequency(ch.control_ch_num, band->band);
cfg->channel = freq;
notify_channel = ieee80211_get_channel(wiphy, freq);
 
@@ -2862,7 +2862,7 @@ static s32 brcmf_inform_ibss(struct brcmf_cfg80211_info 
*cfg,
notify_ielen = le32_to_cpu(bi->ie_length);
notify_signal = (s16)le16_to_cpu(bi->RSSI) * 100;
 
-   brcmf_dbg(CONN, "channel: %d(%d)\n", ch.chnum, freq);
+   brcmf_dbg(CONN, "channel: %d(%d)\n", ch.control_ch_num, freq);
brcmf_dbg(CONN, "capability: %X\n", notify_capability);
brcmf_dbg(CONN, "beacon interval: %d\n", notify_interval);
brcmf_dbg(CONN, "signal: %d\n", notify_signal);
@@ -5280,7 +5280,7 @@ brcmf_bss_roaming_done(struct brcmf_cfg80211_info *cfg,
else
band = wiphy->bands[NL80211_BAND_5GHZ];
 
-   freq = ieee80211_channel_to_frequency(ch.chnum, band->band);
+   freq = ieee80211_channel_to_frequency(ch.control_ch_num, band->band);
notify_channel = ieee80211_get_channel(wiphy, freq);
 
 done:
@@ -5802,14 +5802,15 @@ static int brcmf_construct_chaninfo(struct 
brcmf_cfg80211_info *cfg,
channel = band->channels;
index = band->n_channels;
for (j = 0; j < band->n_channels; j++) {
-   if (channel[j].hw_value == ch.chnum) {
+   if (channel[j].hw_value == ch.control_ch_num) {
index = j;
break;
}
}
channel[index].center_freq =
-   ieee80211_channel_to_frequency(ch.chnum, band->band);
-   channel[index].hw_value = ch.chnum;
+   ieee80211_channel_to_frequency(ch.control_ch_num,
+  band->band);
+   channel[index].hw_value = ch.control_ch_num;
 
/* assuming the chanspecs order is HT20,
 * HT40 upper, HT40 lower, and VHT80.
@@ -5911,7 +5912,7 @@ static int brcmf_enable_bw40_2g(struct 
brcmf_cfg80211_info *cfg)
if (WARN_ON(ch.bw != BRCMU_CHAN_BW_40))
continue;
for (j = 0; j < band->n_channels; j++) {
-   if (band->channels[j].hw_value == ch.chnum)
+   if (band->channels[j].hw_value == 
ch.control_ch_num)
break;
}
if (WARN_ON(j == band->n_channels))
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
index a70cda6..1652a48 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
@@ -1246,7 +1246,7 @@ bool brcmf_p2p_scan_finding_common_channel(struct 
brcmf_cfg80211_info *cfg,
   

[PATCH V2 4.8 2/2] brcmfmac: support get_channel cfg80211 callback

2016-05-20 Thread Rafał Miłecki
This is important for brcmfmac as some of released firmwares (e.g.
brcmfmac4366b-pcie.bin) may pick different channel than requested. This
has been tested with BCM4366B1 in D-Link DIR-885L.

Signed-off-by: Rafał Miłecki 
---
V2: Check if ndev isn't NULL, update description.
---
 .../broadcom/brcm80211/brcmfmac/cfg80211.c | 63 ++
 1 file changed, 63 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 597495d..299a404 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -4892,6 +4892,68 @@ exit:
return err;
 }
 
+static int brcmf_cfg80211_get_channel(struct wiphy *wiphy,
+ struct wireless_dev *wdev,
+ struct cfg80211_chan_def *chandef)
+{
+   struct brcmf_cfg80211_info *cfg = wiphy_to_cfg(wiphy);
+   struct net_device *ndev = wdev->netdev;
+   struct brcmf_if *ifp;
+   struct brcmu_chan ch;
+   enum nl80211_band band = 0;
+   enum nl80211_chan_width width = 0;
+   u32 chanspec;
+   int freq, err;
+
+   if (!ndev)
+   return -ENODEV;
+   ifp = netdev_priv(ndev);
+
+   err = brcmf_fil_iovar_int_get(ifp, "chanspec", &chanspec);
+   if (err) {
+   brcmf_err("chanspec failed (%d)\n", err);
+   return err;
+   }
+
+   ch.chspec = chanspec;
+   cfg->d11inf.decchspec(&ch);
+
+   switch (ch.band) {
+   case BRCMU_CHAN_BAND_2G:
+   band = NL80211_BAND_2GHZ;
+   break;
+   case BRCMU_CHAN_BAND_5G:
+   band = NL80211_BAND_5GHZ;
+   break;
+   }
+
+   switch (ch.bw) {
+   case BRCMU_CHAN_BW_80:
+   width = NL80211_CHAN_WIDTH_80;
+   break;
+   case BRCMU_CHAN_BW_40:
+   width = NL80211_CHAN_WIDTH_40;
+   break;
+   case BRCMU_CHAN_BW_20:
+   width = NL80211_CHAN_WIDTH_20;
+   break;
+   case BRCMU_CHAN_BW_80P80:
+   width = NL80211_CHAN_WIDTH_80P80;
+   break;
+   case BRCMU_CHAN_BW_160:
+   width = NL80211_CHAN_WIDTH_160;
+   break;
+   }
+
+   freq = ieee80211_channel_to_frequency(ch.control_ch_num, band);
+   chandef->chan = ieee80211_get_channel(wiphy, freq);
+   chandef->width = width;
+   chandef->center_freq1 = ieee80211_channel_to_frequency(ch.chnum, band);
+   chandef->center_freq2 = 0;
+
+   return 0;
+}
+
 static int brcmf_cfg80211_crit_proto_start(struct wiphy *wiphy,
   struct wireless_dev *wdev,
   enum nl80211_crit_proto_id proto,
@@ -5054,6 +5116,7 @@ static struct cfg80211_ops brcmf_cfg80211_ops = {
.mgmt_tx = brcmf_cfg80211_mgmt_tx,
.remain_on_channel = brcmf_p2p_remain_on_channel,
.cancel_remain_on_channel = brcmf_cfg80211_cancel_remain_on_channel,
+   .get_channel = brcmf_cfg80211_get_channel,
.start_p2p_device = brcmf_p2p_start_device,
.stop_p2p_device = brcmf_p2p_stop_device,
.crit_proto_start = brcmf_cfg80211_crit_proto_start,
-- 
1.8.4.5



Re: [PATCH 4.8 2/2] brcmfmac: support get_channel cfg80211 callback

2016-05-20 Thread Rafał Miłecki
On 20 May 2016 at 09:42, Arend Van Spriel  wrote:
> On 19-5-2016 13:02, Rafał Miłecki wrote:
>> This is important for brcmfmac as the firmware may pick different
>> channel than requested. This has been tested with BCM4366B1 (in D-Link
>> DIR-885L).
>
> Can you elaborate? Is this for AP or STA mode?

This happens when using 5 GHz PHY with one AP interface. It seems
firmware respects: band, channel width & control channel location.
However it picks center channel in a more or less random way.

E.g. I configured hostapd to setup AP using 36 control channel with
VHT80 (chanspec 0xe02a). Almost every time I was restarting AP I got
firmware picking different chanspecs, all cases listed below:
0xe03a BND_5G | BW_80 | SB_LL | 58
0xe06a BND_5G | BW_80 | SB_LL | 106
0xe07a BND_5G | BW_80 | SB_LL | 122
0xe09b BND_5G | BW_80 | SB_LL | 155

I'm a bit disappointed seeing this FullMAC firmware doing such tricks
on its own. I would prefer to simply relay on hostapd doing this kind
of channel switching. I saw many times hostapd e.g. respecting my
40/80 MHz channel but switching control one in order to avoid
collisions with another AP's control frames on the same frequency.

On the other hand I never got this problem with BCM43602 using
Broadcom's official firmware, so it's some new feature you enabled
when building brcmfmac4366b-pcie.bin.


>> Signed-off-by: Rafał Miłecki 
>> ---
>>  .../broadcom/brcm80211/brcmfmac/cfg80211.c | 59 
>> ++
>>  1 file changed, 59 insertions(+)
>>
>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
>> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
>> index 597495d..4fb9e3a 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
>> @@ -4892,6 +4892,64 @@ exit:
>>   return err;
>>  }
>>
>> +static int brcmf_cfg80211_get_channel(struct wiphy *wiphy,
>> +   struct wireless_dev *wdev,
>> +   struct cfg80211_chan_def *chandef)
>> +{
>> + struct brcmf_cfg80211_info *cfg = wiphy_to_cfg(wiphy);
>> + struct net_device *ndev = wdev->netdev;
>> + struct brcmf_if *ifp = netdev_priv(ndev);
>
> Can this operation be done on a P2P_DEVICE interface, ie. wdev->netdev
> == NULL?

I don't have any experience with P2P, thanks a lot for pointing this to me!


>> + struct brcmu_chan ch;
>> + enum nl80211_band band = 0;
>> + enum nl80211_chan_width width = 0;
>> + u32 chanspec;
>> + int freq, err;
>> +
>> + err = brcmf_fil_iovar_int_get(ifp, "chanspec", &chanspec);
>> + if (err) {
>> + brcmf_err("chanspec failed (%d)\n", err);
>> + return err;
>> + }
>> +
>> + ch.chspec = chanspec;
>> + cfg->d11inf.decchspec(&ch);
>> +
>> + switch (ch.band) {
>> + case BRCMU_CHAN_BAND_2G:
>> + band = NL80211_BAND_2GHZ;
>> + break;
>> + case BRCMU_CHAN_BAND_5G:
>> + band = NL80211_BAND_5GHZ;
>> + break;
>> + }
>> +
>> + switch (ch.bw) {
>> + case BRCMU_CHAN_BW_80:
>> + width = NL80211_CHAN_WIDTH_80;
>> + break;
>> + case BRCMU_CHAN_BW_40:
>> + width = NL80211_CHAN_WIDTH_40;
>> + break;
>> + case BRCMU_CHAN_BW_20:
>> + width = NL80211_CHAN_WIDTH_20;
>> + break;
>> + case BRCMU_CHAN_BW_80P80:
>> + width = NL80211_CHAN_WIDTH_80P80;
>> + break;
>
> Not much sense to support this given that center_freq2 is set to zero below.

OK.


Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)

2016-05-20 Thread Andre Heider
Hi,

On Fri, May 20, 2016 at 12:36 PM, Marc Zyngier  wrote:
> On 20/05/16 11:30, Andre Heider wrote:
>> Hi,
>>
>> On Fri, May 20, 2016 at 10:14 AM, Giuseppe CAVALLARO
>>  wrote:
>>> On 5/20/2016 9:56 AM, Marc Zyngier wrote:

 On 20/05/16 06:44, Andre Heider wrote:
>
> Giuseppe, Alexandre, et al.,
>
> On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier 
> wrote:
>>
>> On Thu, 17 Mar 2016 00:56:40 +0100
>> Bert Lindner  wrote:
>>>
>>> On 2016-03-16 18:42, Marc Zyngier wrote:

 On 16/03/16 15:10, Bert Lindner wrote:
>
> On 2016-03-16 14:10, Andreas Färber wrote:
>>
>> Am 16.03.2016 um 13:09 schrieb Robin Murphy:
>>>
>>> On 16/03/16 11:39, Marc Zyngier wrote:

 On 16/03/16 11:19, Bert Lindner wrote:
>
> ...
>
> For the board sun7i-a20-olinuxino-lime2, there seems to be a
> problem
> with the eth0 PHY in mainline kernel 4.5.0 that developed since
> 4.5.0-rc7. Ethernet does not work, although eth0 is reported:
>
> ...
>
> [9.767125] NET: Registered protocol family 10
> [   10.357405] libphy: PHY stmmac-0: not found
> [   10.362382] eth0: Could not attach to PHY
> [   10.366557] stmmac_open: Cannot attach to PHY (error: -19)
>
> ...
>>
>> v4 fixes for 4.5 are here:
>>
>> https://patchwork.ozlabs.org/patch/598195/ (revert)
>> https://patchwork.ozlabs.org/patch/598196/
>
> ...

 Good to know, thanks. Could you also give the potential fix a go (as
 mentioned by Andreas)? Just to make sure that whatever gets merged
 next
 will actually fix the issue.
>>>
>>>
>>> Yes sure, it took a while because I had to travel. Confirmed, the
>>> v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2:
>>>
>>> root@lime2-079f:~# cat /proc/version
>>> Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version
>>> 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC
>>> 2016
>>>
>>> dmesg:
>>> [8.245273] NET: Registered protocol family 10
>>> [9.297406]  RX IPC Checksum Offload disabled
>>> [9.297460]  No MAC Management Counters available
>>> [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up -
>>> 1Gbps/Full - flow control rx/tx
>>> [   16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>
>>> The board is connected to my laptop rather than to a switch, so that
>>> might be where the flow control message comes from (not sure). Anyway
>>> ethernet works.
>>
>>
>> Cool, many thanks for taking the time to test and report.
>>
>> Hopefully Giuseppe will get this merged quickly enough in mainline, and
>> it should then trickle into a 4.5-stable release (cc-ing stable on
>> these patches would probably be a good idea, BTW).
>
>
> stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since
> v4.5 [0], including all five stable releases :(


 All the A20 platforms are dead, actually.

> The v4.5 patches quoted above are already +4 weeks old, could we
> please get them into stable?


 For that, the maintainer would have needed to CC stable, which he
 didn't. I'd expect someone who cares to send these patches to stable.
 It'd be better if the maintainer would do it himself though.
>>>
>>>
>>> sure, I can send the patches to stable (sorry if I missed to add
>>> stable ML on CC).
>>>
>>> Andre, I have not clear if the train of patches actually fix the
>>> issue or if you need my support to fix something else. In that case
>>> I need some input for debugging (e.g. kernel log).
>>
>> Bert already confirmed that those two patches fixes stmmac on his
>> Lime2, so I assume that it fixes the issue for all A20 platforms.
>>
>>> let me know, is it enough to re-send the patches only?
>>
>> Just a resend with cc:stable :)
>
> Not quite. Please read Documentation/stable_kernel_rules.txt, and the
> section that concerns networking patches (and then consult
> Documentation/networking/netdev-FAQ.txt which has all the details).

Right, it's just been a while since I sent a patch.
What I really meant was "do what's neccessary for -stable inclusion" :)

Regards,
Andre


Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)

2016-05-20 Thread Marc Zyngier
On 20/05/16 11:30, Andre Heider wrote:
> Hi,
> 
> On Fri, May 20, 2016 at 10:14 AM, Giuseppe CAVALLARO
>  wrote:
>> On 5/20/2016 9:56 AM, Marc Zyngier wrote:
>>>
>>> On 20/05/16 06:44, Andre Heider wrote:

 Giuseppe, Alexandre, et al.,

 On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier 
 wrote:
>
> On Thu, 17 Mar 2016 00:56:40 +0100
> Bert Lindner  wrote:
>>
>> On 2016-03-16 18:42, Marc Zyngier wrote:
>>>
>>> On 16/03/16 15:10, Bert Lindner wrote:

 On 2016-03-16 14:10, Andreas Färber wrote:
>
> Am 16.03.2016 um 13:09 schrieb Robin Murphy:
>>
>> On 16/03/16 11:39, Marc Zyngier wrote:
>>>
>>> On 16/03/16 11:19, Bert Lindner wrote:

 ...

 For the board sun7i-a20-olinuxino-lime2, there seems to be a
 problem
 with the eth0 PHY in mainline kernel 4.5.0 that developed since
 4.5.0-rc7. Ethernet does not work, although eth0 is reported:

 ...

 [9.767125] NET: Registered protocol family 10
 [   10.357405] libphy: PHY stmmac-0: not found
 [   10.362382] eth0: Could not attach to PHY
 [   10.366557] stmmac_open: Cannot attach to PHY (error: -19)

 ...
>
> v4 fixes for 4.5 are here:
>
> https://patchwork.ozlabs.org/patch/598195/ (revert)
> https://patchwork.ozlabs.org/patch/598196/

 ...
>>>
>>> Good to know, thanks. Could you also give the potential fix a go (as
>>> mentioned by Andreas)? Just to make sure that whatever gets merged
>>> next
>>> will actually fix the issue.
>>
>>
>> Yes sure, it took a while because I had to travel. Confirmed, the
>> v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2:
>>
>> root@lime2-079f:~# cat /proc/version
>> Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version
>> 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC
>> 2016
>>
>> dmesg:
>> [8.245273] NET: Registered protocol family 10
>> [9.297406]  RX IPC Checksum Offload disabled
>> [9.297460]  No MAC Management Counters available
>> [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>> [   16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up -
>> 1Gbps/Full - flow control rx/tx
>> [   16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>
>> The board is connected to my laptop rather than to a switch, so that
>> might be where the flow control message comes from (not sure). Anyway
>> ethernet works.
>
>
> Cool, many thanks for taking the time to test and report.
>
> Hopefully Giuseppe will get this merged quickly enough in mainline, and
> it should then trickle into a 4.5-stable release (cc-ing stable on
> these patches would probably be a good idea, BTW).


 stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since
 v4.5 [0], including all five stable releases :(
>>>
>>>
>>> All the A20 platforms are dead, actually.
>>>
 The v4.5 patches quoted above are already +4 weeks old, could we
 please get them into stable?
>>>
>>>
>>> For that, the maintainer would have needed to CC stable, which he
>>> didn't. I'd expect someone who cares to send these patches to stable.
>>> It'd be better if the maintainer would do it himself though.
>>
>>
>> sure, I can send the patches to stable (sorry if I missed to add
>> stable ML on CC).
>>
>> Andre, I have not clear if the train of patches actually fix the
>> issue or if you need my support to fix something else. In that case
>> I need some input for debugging (e.g. kernel log).
> 
> Bert already confirmed that those two patches fixes stmmac on his
> Lime2, so I assume that it fixes the issue for all A20 platforms.
> 
>> let me know, is it enough to re-send the patches only?
> 
> Just a resend with cc:stable :)

Not quite. Please read Documentation/stable_kernel_rules.txt, and the
section that concerns networking patches (and then consult
Documentation/networking/netdev-FAQ.txt which has all the details).

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)

2016-05-20 Thread Andre Heider
Hi,

On Fri, May 20, 2016 at 10:14 AM, Giuseppe CAVALLARO
 wrote:
> On 5/20/2016 9:56 AM, Marc Zyngier wrote:
>>
>> On 20/05/16 06:44, Andre Heider wrote:
>>>
>>> Giuseppe, Alexandre, et al.,
>>>
>>> On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier 
>>> wrote:

 On Thu, 17 Mar 2016 00:56:40 +0100
 Bert Lindner  wrote:
>
> On 2016-03-16 18:42, Marc Zyngier wrote:
>>
>> On 16/03/16 15:10, Bert Lindner wrote:
>>>
>>> On 2016-03-16 14:10, Andreas Färber wrote:

 Am 16.03.2016 um 13:09 schrieb Robin Murphy:
>
> On 16/03/16 11:39, Marc Zyngier wrote:
>>
>> On 16/03/16 11:19, Bert Lindner wrote:
>>>
>>> ...
>>>
>>> For the board sun7i-a20-olinuxino-lime2, there seems to be a
>>> problem
>>> with the eth0 PHY in mainline kernel 4.5.0 that developed since
>>> 4.5.0-rc7. Ethernet does not work, although eth0 is reported:
>>>
>>> ...
>>>
>>> [9.767125] NET: Registered protocol family 10
>>> [   10.357405] libphy: PHY stmmac-0: not found
>>> [   10.362382] eth0: Could not attach to PHY
>>> [   10.366557] stmmac_open: Cannot attach to PHY (error: -19)
>>>
>>> ...

 v4 fixes for 4.5 are here:

 https://patchwork.ozlabs.org/patch/598195/ (revert)
 https://patchwork.ozlabs.org/patch/598196/
>>>
>>> ...
>>
>> Good to know, thanks. Could you also give the potential fix a go (as
>> mentioned by Andreas)? Just to make sure that whatever gets merged
>> next
>> will actually fix the issue.
>
>
> Yes sure, it took a while because I had to travel. Confirmed, the
> v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2:
>
> root@lime2-079f:~# cat /proc/version
> Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version
> 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC
> 2016
>
> dmesg:
> [8.245273] NET: Registered protocol family 10
> [9.297406]  RX IPC Checksum Offload disabled
> [9.297460]  No MAC Management Counters available
> [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up -
> 1Gbps/Full - flow control rx/tx
> [   16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> The board is connected to my laptop rather than to a switch, so that
> might be where the flow control message comes from (not sure). Anyway
> ethernet works.


 Cool, many thanks for taking the time to test and report.

 Hopefully Giuseppe will get this merged quickly enough in mainline, and
 it should then trickle into a 4.5-stable release (cc-ing stable on
 these patches would probably be a good idea, BTW).
>>>
>>>
>>> stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since
>>> v4.5 [0], including all five stable releases :(
>>
>>
>> All the A20 platforms are dead, actually.
>>
>>> The v4.5 patches quoted above are already +4 weeks old, could we
>>> please get them into stable?
>>
>>
>> For that, the maintainer would have needed to CC stable, which he
>> didn't. I'd expect someone who cares to send these patches to stable.
>> It'd be better if the maintainer would do it himself though.
>
>
> sure, I can send the patches to stable (sorry if I missed to add
> stable ML on CC).
>
> Andre, I have not clear if the train of patches actually fix the
> issue or if you need my support to fix something else. In that case
> I need some input for debugging (e.g. kernel log).

Bert already confirmed that those two patches fixes stmmac on his
Lime2, so I assume that it fixes the issue for all A20 platforms.

> let me know, is it enough to re-send the patches only?

Just a resend with cc:stable :)

Thanks!
Andre


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Simon Horman
On Fri, May 20, 2016 at 11:20:04AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote:

[...]

> > 3. With regards to the mirroring part of your question, I need to check
> >on that and possibly its broken. But my thinking is that a mirroring
> >vport would regarded in the same way as any other vport in this respect
> >and the presence of the "layer3" flag would control whether an Ethernet
> >header is present or not.
> > 
> >It may be the case that its not possible to use a tunnel vport as a
> >mirroring vport. And as all other vports currently do not support
> >"layer3" then currently an Ethernet header would always be present
> >on output to a mirror.
> 
> We can just require a mirror port to be always L2 and output the L3
> packets with zero Ethernet addresses. For mirroring, this should be
> okay(?)

Yes, I expect that is a good approach.

I'll look over the code to see about making that so if its not already the
case.


Re: [PATCH] net: pegasus: remove unused variables and labels

2016-05-20 Thread Arnd Bergmann
On Friday 20 May 2016 12:32:23 Petko Manolov wrote:
> Guys, come on.  This code is not dead.  This code is executed every time an 
> ethernet packet is received.  It takes care of various error statistics. More 
> importantly, it sends the actual (reported by the adapter) packet length to 
> the 
> network layer along with the packet.
> 
> This patch removes skb_put() and netif_rx() calls and effectively kills the 
> RX 
> path.  Not to mention that the driver was not even compiled before sending 
> the 
> patch upstream.
> 
> The only sensible, although cosmetic, change would be to replace:
> 
> if (!count || count < 4)
> 
> with
> 
> if (count < 4)
> 
> even though GCC takes care and it optimizes away "!count" condition.
> 
> Please revert this patch before Linus pulls from the network tree.
> 

Agreed. I failed to check the commit that introduced the warning for
the more serious problem.

Please revert e00be9e4d0ff, it just makes no sense.

Arnd


RE: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-05-20 Thread Yangbo Lu
Hi Arnd,

Any comments? 
Please reply when you see the email since we want this eSDHC issue to be fixed 
soon.

All the patches are Freescale-specific and is to fix the eSDHC driver.
Could we let them merged first if you were talking about a new way of 
abstracting getting SoC version.


Thanks :)


Best regards,
Yangbo Lu

> -Original Message-
> From: Scott Wood [mailto:o...@buserror.net]
> Sent: Wednesday, May 11, 2016 11:26 AM
> To: Arnd Bergmann; linux-arm-ker...@lists.infradead.org
> Cc: Yangbo Lu; linuxppc-...@lists.ozlabs.org; Mark Rutland;
> devicet...@vger.kernel.org; ulf.hans...@linaro.org; Russell King; Bhupesh
> Sharma; netdev@vger.kernel.org; Joerg Roedel; Kumar Gala; linux-
> m...@vger.kernel.org; linux-ker...@vger.kernel.org; Yang-Leo Li;
> io...@lists.linux-foundation.org; Rob Herring; linux-...@vger.kernel.org;
> Claudiu Manoil; Santosh Shilimkar; Xiaobo Xie; linux-...@vger.kernel.org;
> Qiang Zhao
> Subject: Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-
> R1.0-R2.0
> 
> On Thu, 2016-05-05 at 13:10 +0200, Arnd Bergmann wrote:
> > On Thursday 05 May 2016 09:41:32 Yangbo Lu wrote:
> > > > -Original Message-
> > > > From: Arnd Bergmann [mailto:a...@arndb.de]
> > > > Sent: Thursday, May 05, 2016 4:32 PM
> > > > To: linuxppc-...@lists.ozlabs.org
> > > > Cc: Yangbo Lu; linux-...@vger.kernel.org;
> > > > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > > > linux-ker...@vger.kernel.org; linux-...@vger.kernel.org;
> > > > linux-...@vger.kernel.org; iommu@lists.linux- foundation.org;
> > > > netdev@vger.kernel.org; Mark Rutland; ulf.hans...@linaro.org;
> > > > Russell King; Bhupesh Sharma; Joerg Roedel; Santosh Shilimkar;
> > > > Yang-Leo Li; Scott Wood; Rob Herring; Claudiu Manoil; Kumar Gala;
> > > > Xiaobo Xie; Qiang Zhao
> > > > Subject: Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for
> > > > T4240-
> > > > R1.0-R2.0
> > > >
> > > > On Thursday 05 May 2016 11:12:30 Yangbo Lu wrote:
> > > > > IIRC, it is the same IP block as i.MX and Arnd's point is this
> > > > > won't even compile on !PPC. It is things like this that prevent
> > > > > sharing the driver.
> > >
> > > The whole point of using the MMIO SVR instead of the PPC SPR is so
> > > that it will work on ARM...  The guts driver should build on any
> > > platform as long as OF is enabled, and if it doesn't find a node to
> > > bind to it will return 0 for SVR, and the eSDHC driver will continue
> > > (after printing an error that should be removed) without the ability
> > > to test for errata based on SVR.
> >
> > It feels like a bad design to have to come up with a different method
> > for each SoC type here when they all do the same thing and want to
> > identify some variant of the chip to do device specific quirks.
> >
> > As far as I'm concerned, every driver in drivers/soc that needs to
> > export a symbol to be used by a device driver is an indication that we
> > don't have the right set of abstractions yet. There are cases that are
> > not worth abstracting because the functionality is rather obscure and
> > only a couple of drivers for one particular chip ever need it.
> >
> > Finding out the version of the SoC does not look like this case.
> 
> I'm open to new ways of abstracting this, but can that please be
> discussed after these patches are merged?  This patchset is fixing a
> problem, the existing abstraction is unappealing and not widely adopted,
> a new abstraction is not ready, and we're only touching code for our
> hardware.
> 
> Oh, and the existing abstraction isn't even "existing".  I don't see any
> examples where soc_device is being used like this -- or even any way for
> a driver (the one consuming the information, not the soc "driver") to get
> a reference to the soc_device that's been registered short of searching
> for the device object by name -- and you're asking for new functionality
> in drivers/base/soc.c.
> 
> > > > I think the first four patches take care of building for ARM, but
> > > > the problem remains if you want to enable COMPILE_TEST as we need
> > > > for certain automated checking.
> > >
> > > What specific problem is there with COMPILE_TEST?
> >
> > COMPILE_TEST is solvable here and the way it is implemented in this
> > case (selecting FSL_GUTS from the driver) indeed looks like it works
> > correctly, but it's still awkward that this means building the SoC
> > specific ID stuff into the vmlinux binary for any driver that uses
> > something like that for a particular SoC.
> 
> Please keep in mind that this is a Freescale-specific driver... it's not
> as if we're attaching this dependency to common SDHCI code.
> 
> >
> > > > > Dealing with Si revs is a common problem. We should have a
> > > > > common solution. There is soc_device for this purpose.
> > > >
> > > > Exactly. The last time this came up, I think we agreed to
> > > > implement a helper using glob_match() on the soc_device strings.
> > > > Unfortunately this hasn't happened 

Re: [PATCH] net: pegasus: remove unused variables and labels

2016-05-20 Thread Petko Manolov
Guys, come on.  This code is not dead.  This code is executed every time an 
ethernet packet is received.  It takes care of various error statistics. More 
importantly, it sends the actual (reported by the adapter) packet length to the 
network layer along with the packet.

This patch removes skb_put() and netif_rx() calls and effectively kills the RX 
path.  Not to mention that the driver was not even compiled before sending the 
patch upstream.

The only sensible, although cosmetic, change would be to replace:

if (!count || count < 4)

with

if (count < 4)

even though GCC takes care and it optimizes away "!count" condition.

Please revert this patch before Linus pulls from the network tree.


Petko



On 16-05-20 10:22:30, Arnd Bergmann wrote:
> The latest dead-code removal was slightly incomplete and
> left a few things behind that we now get compiler warnings
> for:
> 
> drivers/net/usb/pegasus.c: In function 'read_bulk_callback':
> drivers/net/usb/pegasus.c:475:1: error: label 'goon' defined but not used 
> [-Werror=unused-label]
> drivers/net/usb/pegasus.c:446:8: error: unused variable 'pkt_len' 
> [-Werror=unused-variable]
> drivers/net/usb/pegasus.c:445:6: error: unused variable 'buf' 
> [-Werror=unused-variable]
> drivers/net/usb/pegasus.c:443:17: error: unused variable 'count' 
> [-Werror=unused-variable]
> 
> This removes them as well.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: e00be9e4d0ff ("net: pegasus: remove dead coding")
> ---
>  drivers/net/usb/pegasus.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
> index 1903d2e2b62d..df6d90ab7ca7 100644
> --- a/drivers/net/usb/pegasus.c
> +++ b/drivers/net/usb/pegasus.c
> @@ -440,10 +440,8 @@ static void read_bulk_callback(struct urb *urb)
>  {
>   pegasus_t *pegasus = urb->context;
>   struct net_device *net;
> - int rx_status, count = urb->actual_length;
> + int rx_status;
>   int status = urb->status;
> - u8 *buf = urb->transfer_buffer;
> - __u16 pkt_len;
>  
>   if (!pegasus)
>   return;
> @@ -472,7 +470,6 @@ static void read_bulk_callback(struct urb *urb)
>   netif_dbg(pegasus, rx_err, net, "RX status %d\n", status);
>   }
>  
> -goon:
>   usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
> usb_rcvbulkpipe(pegasus->usb, 1),
> pegasus->rx_skb->data, PEGASUS_MTU,
> -- 
> 2.7.0
> 
> 


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Jiri Benc
On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote:
> 1. push_eth adds an Ethernet header with all-zero addresses and
>the Ethernet type as determined from skb->protocol which is in
>turn determined by the tunnel header (we have discussed that
>bit before).
> 
>In principle it is pushed when needed. And this happens automatically
>as controlled by user-space.
> 
>It is possible to modify the Ethernet addresses using a custom rule.
>(I need to exercise that more often.)
> 
> 2. For the GRE part of the scenario above it is important to know that with
>the accompanying user-space patch set OvS user-space the user-space
>representation of a vport (from now on simply vport) may be "layer3" or
>not.
> 
>This allows OvS user-space to determine if an Ethernet header should be
>present or not on receive. And if it needs to be present or not on
>transmit. This allows it to automatically use pop_eth and push_eth to
>control the presence of an Ethernet header so its there when it needs to
>be and not when it doesn't.
> 
>So if a GRE vport is "layer3" then no Ethernet header should be
>present on transmit, regardless of where the packet came from. And
>conversely if the GRE vport is not "layer3" then an Ethernet header
>should be present.

I think this works for me. Thanks a lot for answering my questions!

> 3. With regards to the mirroring part of your connection, I need to check
>on that and possibly its broken. But my thinking is that a mirroring
>vport would regarded in the same way as any other vport in this respect
>and the presence of the "layer3" flag would control whether an Ethernet
>header is present or not.
> 
>It may be the case that its not possible to use a tunnel vport as a
>mirroring vport. And as all other vports currently do not support
>"layer3" then currently an Ethernet header would always be present
>on output to a mirror.

We can just require a mirror port to be always L2 and output the L3
packets with zero Ethernet addresses. For mirroring, this should be
okay(?)

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Simon Horman
On Fri, May 20, 2016 at 10:39:39AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote:
> > My understanding is that currently OvS handles access ports using a
> > push_vlan action.
> 
> When needed (i.e. when the packet goes to a non-access port), yes.
> 
> > And that this patch set in conjunction with its
> > user-space counterpart should ensure that a push_eth action occurs first.
> > This is the context of my remarks above.
> 
> Okay, works for me principally. If it's later found insufficient,
> relaxing push_vlan and pop_vlan to work also for L3 frames is still
> easily possible without breaking compatibility.

Right. I'm all for allowing extension later if the need arises.

> Out of curiosity (and without looking at the user space patchset), what
> will the pushed Ethernet header contain? E.g., consider the following
> scenario: two GRE ports, both set as access ports with the same tag,
> and a mirror port mirroring everything. Now an IP packet without inner
> Ethernet header is received on one of the GRE interfaces.
> 
> Will the packet be output to the second GRE port? Will it be sent out
> without the inner Ethernet header? Are custom rules necessary, or will
> NORMAL take care of this? What will be sent to the mirror port?

Let me take a stab at answering that without running any tests.

1. push_eth adds an Ethernet header with all-zero addresses and
   the Ethernet type as determined from skb->protocol which is in
   turn determined by the tunnel header (we have discussed that
   bit before).

   In principle it is pushed when needed. And this happens automatically
   as controlled by user-space.

   It is possible to modify the Ethernet addresses using a custom rule.
   (I need to exercise that more often.)

2. For the GRE part of the scenario above it is important to know that with
   the accompanying user-space patch set OvS user-space the user-space
   representation of a vport (from now on simply vport) may be "layer3" or
   not.

   This allows OvS user-space to determine if an Ethernet header should be
   present or not on receive. And if it needs to be present or not on
   transmit. This allows it to automatically use pop_eth and push_eth to
   control the presence of an Ethernet header so its there when it needs to
   be and not when it doesn't.

   So if a GRE vport is "layer3" then no Ethernet header should be
   present on transmit, regardless of where the packet came from. And
   conversely if the GRE vport is not "layer3" then an Ethernet header
   should be present.

3. With regards to the mirroring part of your connection, I need to check
   on that and possibly its broken. But my thinking is that a mirroring
   vport would regarded in the same way as any other vport in this respect
   and the presence of the "layer3" flag would control whether an Ethernet
   header is present or not.

   It may be the case that its not possible to use a tunnel vport as a
   mirroring vport. And as all other vports currently do not support
   "layer3" then currently an Ethernet header would always be present
   on output to a mirror.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Jiri Benc
On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote:
> My understanding is that currently OvS handles access ports using a
> push_vlan action.

When needed (i.e. when the packet goes to a non-access port), yes.

> And that this patch set in conjunction with its
> user-space counterpart should ensure that a push_eth action occurs first.
> This is the context of my remarks above.

Okay, works for me principally. If it's later found insufficient,
relaxing push_vlan and pop_vlan to work also for L3 frames is still
easily possible without breaking compatibility.

Out of curiosity (and without looking at the user space patchset), what
will the pushed Ethernet header contain? E.g., consider the following
scenario: two GRE ports, both set as access ports with the same tag,
and a mirror port mirroring everything. Now an IP packet without inner
Ethernet header is received on one of the GRE interfaces.

Will the packet be output to the second GRE port? Will it be sent out
without the inner Ethernet header? Are custom rules necessary, or will
NORMAL take care of this? What will be sent to the mirror port?

Thanks!

 Jiri


[PATCH] net: pegasus: remove unused variables and labels

2016-05-20 Thread Arnd Bergmann
The latest dead-code removal was slightly incomplete and
left a few things behind that we now get compiler warnings
for:

drivers/net/usb/pegasus.c: In function 'read_bulk_callback':
drivers/net/usb/pegasus.c:475:1: error: label 'goon' defined but not used 
[-Werror=unused-label]
drivers/net/usb/pegasus.c:446:8: error: unused variable 'pkt_len' 
[-Werror=unused-variable]
drivers/net/usb/pegasus.c:445:6: error: unused variable 'buf' 
[-Werror=unused-variable]
drivers/net/usb/pegasus.c:443:17: error: unused variable 'count' 
[-Werror=unused-variable]

This removes them as well.

Signed-off-by: Arnd Bergmann 
Fixes: e00be9e4d0ff ("net: pegasus: remove dead coding")
---
 drivers/net/usb/pegasus.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index 1903d2e2b62d..df6d90ab7ca7 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -440,10 +440,8 @@ static void read_bulk_callback(struct urb *urb)
 {
pegasus_t *pegasus = urb->context;
struct net_device *net;
-   int rx_status, count = urb->actual_length;
+   int rx_status;
int status = urb->status;
-   u8 *buf = urb->transfer_buffer;
-   __u16 pkt_len;
 
if (!pegasus)
return;
@@ -472,7 +470,6 @@ static void read_bulk_callback(struct urb *urb)
netif_dbg(pegasus, rx_err, net, "RX status %d\n", status);
}
 
-goon:
usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
  usb_rcvbulkpipe(pegasus->usb, 1),
  pegasus->rx_skb->data, PEGASUS_MTU,
-- 
2.7.0



Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Simon Horman
On Fri, May 20, 2016 at 05:11:23PM +0900, Simon Horman wrote:
> On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote:
> > On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> > > The second option does seem rather tempting although I'm not sure
> > > that it actually plays out in the access-port scenario at this time.
> > 
> > We support gre ports to be access ports currently. With conversion to
> > ipgre, this needs to continue working. It's no problem for frames with
> > the Ethernet header but now we have a situation where a port is tagged,
> > thus the user expects that packets received on that port will behave
> > accordingly. I don't think we can make some packets honor this and some
> > ignore this; and we can't disallow gre to be an access port.
> > 
> > How do you plan to solve this? By user space always pushing an ethernet
> > header before push_vlan?
> 
> Yes. That is my understanding of how OvS currently handles access ports but
> I have a feeling that either I am mistaken or that you are referring to a
> slightly different scenario.

Hi again.

I apologise for having sent my previous email a little too quickly.

My understanding is that currently OvS handles access ports using a
push_vlan action. And that this patch set in conjunction with its
user-space counterpart should ensure that a push_eth action occurs first.
This is the context of my remarks above.


Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)

2016-05-20 Thread Giuseppe CAVALLARO

Hello

On 5/20/2016 9:56 AM, Marc Zyngier wrote:

On 20/05/16 06:44, Andre Heider wrote:

Giuseppe, Alexandre, et al.,

On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier  wrote:

On Thu, 17 Mar 2016 00:56:40 +0100
Bert Lindner  wrote:

On 2016-03-16 18:42, Marc Zyngier wrote:

On 16/03/16 15:10, Bert Lindner wrote:

On 2016-03-16 14:10, Andreas Färber wrote:

Am 16.03.2016 um 13:09 schrieb Robin Murphy:

On 16/03/16 11:39, Marc Zyngier wrote:

On 16/03/16 11:19, Bert Lindner wrote:

...

For the board sun7i-a20-olinuxino-lime2, there seems to be a problem
with the eth0 PHY in mainline kernel 4.5.0 that developed since
4.5.0-rc7. Ethernet does not work, although eth0 is reported:

...

[9.767125] NET: Registered protocol family 10
[   10.357405] libphy: PHY stmmac-0: not found
[   10.362382] eth0: Could not attach to PHY
[   10.366557] stmmac_open: Cannot attach to PHY (error: -19)

...

v4 fixes for 4.5 are here:

https://patchwork.ozlabs.org/patch/598195/ (revert)
https://patchwork.ozlabs.org/patch/598196/

...

Good to know, thanks. Could you also give the potential fix a go (as
mentioned by Andreas)? Just to make sure that whatever gets merged next
will actually fix the issue.


Yes sure, it took a while because I had to travel. Confirmed, the
v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2:

root@lime2-079f:~# cat /proc/version
Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version
4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC 2016

dmesg:
[8.245273] NET: Registered protocol family 10
[9.297406]  RX IPC Checksum Offload disabled
[9.297460]  No MAC Management Counters available
[9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up -
1Gbps/Full - flow control rx/tx
[   16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

The board is connected to my laptop rather than to a switch, so that
might be where the flow control message comes from (not sure). Anyway
ethernet works.


Cool, many thanks for taking the time to test and report.

Hopefully Giuseppe will get this merged quickly enough in mainline, and
it should then trickle into a 4.5-stable release (cc-ing stable on
these patches would probably be a good idea, BTW).


stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since
v4.5 [0], including all five stable releases :(


All the A20 platforms are dead, actually.


The v4.5 patches quoted above are already +4 weeks old, could we
please get them into stable?


For that, the maintainer would have needed to CC stable, which he
didn't. I'd expect someone who cares to send these patches to stable.
It'd be better if the maintainer would do it himself though.


sure, I can send the patches to stable (sorry if I missed to add
stable ML on CC).

Andre, I have not clear if the train of patches actually fix the
issue or if you need my support to fix something else. In that case
I need some input for debugging (e.g. kernel log).

let me know, is it enough to re-send the patches only?

Regards
Peppe



Thanks,

M.





Re: [net-next PATCH V1 0/3] net: enable use of kmem_cache_alloc_bulk in network stack

2016-05-20 Thread Jesper Dangaard Brouer
On Mon, 09 May 2016 15:44:24 +0200
Jesper Dangaard Brouer  wrote:

> This patchset enables use of the slab/kmem_cache bulk alloc API, and
> completes the use the slab/kmem_cache bulking API in the network stack.
> 
> I've not included the patches that introduce a SKB bulk hint, which
> would beneficial for drivers like mlx5.  Lets see if we can agree on
> this patchset first.

Conclusion: Guess we could not agree on this patchset.

The main problem seems to be that the patch always bulk allocated 8
SKBs, without knowing if we actually need them.  My earlier patchset
also included a "hint" interface, as mlx5 driver knows after processing
the RX queue, how many SKBs it needs, thus it can request to bulk alloc
the needed amount. (The only problem with mlx5 is that preallocating
SKBs into it's RX ring is a broken scheme).

A better scheme would be, if the driver on RX knows how many packets
are available in the RX queue.  Then we can bulk alloc the needed
amount of SKBs.  Thus, we can circle back to this once the driver can
provide this information. (This goes nicely hand-in-hand with my idea
of pulling out avail RX packet-pages from the RX ring, and start
prefetch'es to hide the cache-misses).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Simon Horman
On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> > The second option does seem rather tempting although I'm not sure
> > that it actually plays out in the access-port scenario at this time.
> 
> We support gre ports to be access ports currently. With conversion to
> ipgre, this needs to continue working. It's no problem for frames with
> the Ethernet header but now we have a situation where a port is tagged,
> thus the user expects that packets received on that port will behave
> accordingly. I don't think we can make some packets honor this and some
> ignore this; and we can't disallow gre to be an access port.
> 
> How do you plan to solve this? By user space always pushing an ethernet
> header before push_vlan?

Yes. That is my understanding of how OvS currently handles access ports but
I have a feeling that either I am mistaken or that you are referring to a
slightly different scenario.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Jiri Benc
On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> The second option does seem rather tempting although I'm not sure
> that it actually plays out in the access-port scenario at this time.

We support gre ports to be access ports currently. With conversion to
ipgre, this needs to continue working. It's no problem for frames with
the Ethernet header but now we have a situation where a port is tagged,
thus the user expects that packets received on that port will behave
accordingly. I don't think we can make some packets honor this and some
ignore this; and we can't disallow gre to be an access port.

How do you plan to solve this? By user space always pushing an ethernet
header before push_vlan?

 Jiri


Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)

2016-05-20 Thread Marc Zyngier
On 20/05/16 06:44, Andre Heider wrote:
> Giuseppe, Alexandre, et al.,
> 
> On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier  wrote:
>> On Thu, 17 Mar 2016 00:56:40 +0100
>> Bert Lindner  wrote:
>>> On 2016-03-16 18:42, Marc Zyngier wrote:
 On 16/03/16 15:10, Bert Lindner wrote:
> On 2016-03-16 14:10, Andreas Färber wrote:
>> Am 16.03.2016 um 13:09 schrieb Robin Murphy:
>>> On 16/03/16 11:39, Marc Zyngier wrote:
 On 16/03/16 11:19, Bert Lindner wrote:
> ...
> For the board sun7i-a20-olinuxino-lime2, there seems to be a problem
> with the eth0 PHY in mainline kernel 4.5.0 that developed since
> 4.5.0-rc7. Ethernet does not work, although eth0 is reported:
> ...
> [9.767125] NET: Registered protocol family 10
> [   10.357405] libphy: PHY stmmac-0: not found
> [   10.362382] eth0: Could not attach to PHY
> [   10.366557] stmmac_open: Cannot attach to PHY (error: -19)
> ...
>> v4 fixes for 4.5 are here:
>>
>> https://patchwork.ozlabs.org/patch/598195/ (revert)
>> https://patchwork.ozlabs.org/patch/598196/
> ...
 Good to know, thanks. Could you also give the potential fix a go (as
 mentioned by Andreas)? Just to make sure that whatever gets merged next
 will actually fix the issue.
>>>
>>> Yes sure, it took a while because I had to travel. Confirmed, the
>>> v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2:
>>>
>>> root@lime2-079f:~# cat /proc/version
>>> Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version
>>> 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC 2016
>>>
>>> dmesg:
>>> [8.245273] NET: Registered protocol family 10
>>> [9.297406]  RX IPC Checksum Offload disabled
>>> [9.297460]  No MAC Management Counters available
>>> [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up -
>>> 1Gbps/Full - flow control rx/tx
>>> [   16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>
>>> The board is connected to my laptop rather than to a switch, so that
>>> might be where the flow control message comes from (not sure). Anyway
>>> ethernet works.
>>
>> Cool, many thanks for taking the time to test and report.
>>
>> Hopefully Giuseppe will get this merged quickly enough in mainline, and
>> it should then trickle into a 4.5-stable release (cc-ing stable on
>> these patches would probably be a good idea, BTW).
> 
> stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since
> v4.5 [0], including all five stable releases :(

All the A20 platforms are dead, actually.

> The v4.5 patches quoted above are already +4 weeks old, could we
> please get them into stable?

For that, the maintainer would have needed to CC stable, which he
didn't. I'd expect someone who cares to send these patches to stable.
It'd be better if the maintainer would do it himself though.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] net: alx: use custom skb allocator

2016-05-20 Thread Feng Tang
Hi,

Please ignore this patch. 

I found the problem and made the patch with kernel 4.4 with Ubuntu 12.04
on Lenovo Y580. 

After submitting the patch, I tried to google the datasheet for
Atheros 8161 dev, and found there was already a kernel bugzilla 
https://bugzilla.kernel.org/show_bug.cgi?id=70761
that had a fix from Jarod Wilson which get merged in v4.5 (commit c406700cd) 

Per my test, the new 4.6.0 kernel works fine with Jarod's patch. 

Thanks,
Feng

On Fri, May 20, 2016 at 01:41:03PM +0800, Feng Tang wrote:
> This patch follows Eric Dumazet's commit 7b70176421 to fix one
> exactly same bug in alx driver, that the network link will
> be lost in 1-5 minutes after the device is up.
> 
> Following is a git log from Eric's 7b70176421:
> 
> "We had reports ( https://bugzilla.kernel.org/show_bug.cgi?id=54021 )
> that using high order pages for skb allocations is problematic for atl1c
> 
> We do not know exactly what the problem is, but we suspect that crossing
> 4K pages is not well supported by this hardware.
> 
> Use a custom allocator, using page allocator and 2K fragments for
> optimal stack behavior. We might make this allocator generic
>  in future kernels."
> 
> And my debug shows the same suspect, most of the errors happen
> when there is a RX buffer address with 0x..f80, hopefully
> this will get noticed and fixed from silicon side.
> 
> My victim is a Lenovo Y580 Laptop with Atheros ALX AR8161 etherenet
> device(PCI ID 1969:1091), with this patch the ethernet dev
> works just fine
> 
> Signed-off-by: Feng Tang 
> ---
>  drivers/net/ethernet/atheros/alx/alx.h  |  4 +++
>  drivers/net/ethernet/atheros/alx/main.c | 48 
> -
>  2 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/atheros/alx/alx.h 
> b/drivers/net/ethernet/atheros/alx/alx.h
> index 8fc93c5..d02c424 100644
> --- a/drivers/net/ethernet/atheros/alx/alx.h
> +++ b/drivers/net/ethernet/atheros/alx/alx.h
> @@ -96,6 +96,10 @@ struct alx_priv {
>   unsigned int rx_ringsz;
>   unsigned int rxbuf_size;
>  
> + struct page  *rx_page;
> + unsigned int rx_page_offset;
> + unsigned int rx_frag_size;
> +
>   struct napi_struct napi;
>   struct alx_tx_queue txq;
>   struct alx_rx_queue rxq;
> diff --git a/drivers/net/ethernet/atheros/alx/main.c 
> b/drivers/net/ethernet/atheros/alx/main.c
> index 9fe8b5e..c98acdc 100644
> --- a/drivers/net/ethernet/atheros/alx/main.c
> +++ b/drivers/net/ethernet/atheros/alx/main.c
> @@ -70,6 +70,35 @@ static void alx_free_txbuf(struct alx_priv *alx, int entry)
>   }
>  }
>  
> +static struct sk_buff *alx_alloc_skb(struct alx_priv *alx, gfp_t gfp)
> +{
> + struct sk_buff *skb;
> + struct page *page;
> +
> + if (alx->rx_frag_size > PAGE_SIZE)
> + return __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp);
> +
> + page = alx->rx_page;
> + if (!page) {
> + alx->rx_page = page = alloc_page(gfp);
> + if (unlikely(!page))
> + return NULL;
> + alx->rx_page_offset = 0;
> + }
> +
> + skb = build_skb(page_address(page) + alx->rx_page_offset,
> + alx->rx_frag_size);
> + if (likely(skb)) {
> + alx->rx_page_offset += alx->rx_frag_size;
> + if (alx->rx_page_offset >= PAGE_SIZE)
> + alx->rx_page = NULL;
> + else
> + get_page(page);
> + }
> + return skb;
> +}
> +
> +
>  static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t gfp)
>  {
>   struct alx_rx_queue *rxq = &alx->rxq;
> @@ -86,7 +115,7 @@ static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t 
> gfp)
>   while (!cur_buf->skb && next != rxq->read_idx) {
>   struct alx_rfd *rfd = &rxq->rfd[cur];
>  
> - skb = __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp);
> + skb = alx_alloc_skb(alx, gfp);
>   if (!skb)
>   break;
>   dma = dma_map_single(&alx->hw.pdev->dev,
> @@ -124,6 +153,7 @@ static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t 
> gfp)
>   alx_write_mem16(&alx->hw, ALX_RFD_PIDX, cur);
>   }
>  
> +
>   return count;
>  }
>  
> @@ -592,6 +622,11 @@ static void alx_free_rings(struct alx_priv *alx)
>   kfree(alx->txq.bufs);
>   kfree(alx->rxq.bufs);
>  
> + if (alx->rx_page) {
> + put_page(alx->rx_page);
> + alx->rx_page = NULL;
> + }
> +
>   dma_free_coherent(&alx->hw.pdev->dev,
> alx->descmem.size,
> alx->descmem.virt,
> @@ -646,6 +681,7 @@ static int alx_request_irq(struct alx_priv *alx)
> alx->dev->name, alx);
>   if (!err)
>   goto out;
> +
>   /* fall back to legacy interrupt */
>   pci_disable_msi(alx->hw.pdev);
>   }
> @@ -689,6 +725,7 @@ static int alx_init_sw(s

Re: [PATCH] brcmfmac: use kmemdup

2016-05-20 Thread Arend Van Spriel
On 19-5-2016 15:59, Muhammad Falak R Wani wrote:
> Use kmemdup when some other buffer is immediately copied into allocated
> region. It replaces call to allocation followed by memcpy, by a single
> call to kmemdup.

Acked-by: Arend van Spriel 
> Signed-off-by: Muhammad Falak R Wani 
> ---
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> index d0631b6..705adaa 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> @@ -6699,11 +6699,10 @@ struct brcmf_cfg80211_info 
> *brcmf_cfg80211_attach(struct brcmf_pub *drvr,
>   return NULL;
>   }
>  
> - ops = kzalloc(sizeof(*ops), GFP_KERNEL);
> + ops = kmemdup(&brcmf_cfg80211_ops, sizeof(*ops), GFP_KERNEL);
>   if (!ops)
>   return NULL;
>  
> - memcpy(ops, &brcmf_cfg80211_ops, sizeof(*ops));
>   ifp = netdev_priv(ndev);
>  #ifdef CONFIG_PM
>   if (brcmf_feat_is_enabled(ifp, BRCMF_FEAT_WOWL_GTK))
> 


Re: [PATCH 4.8 1/2] brcmutil: add field storing control channel to the struct brcmu_chan

2016-05-20 Thread Arend Van Spriel
On 19-5-2016 13:02, Rafał Miłecki wrote:
> Our d11 code supports encoding/decoding channel info into/from chanspec
> format used by firmware. Current implementation is quite misleading
> because of the way "chnum" field is used.
> When encoding channel info, "chnum" has to be filled by a caller with
> *center* channel number. However when decoding chanspec the same field
> is filled with a *control* channel number.
> 
> This can be confusing and doesn't allow accessing all info when
> decoding. Solve it by adding a separated field for control channel.

The need to "access all info" is probably the other patch so you might
hint here that this change is needed for the .get_channel() callback.

Reviewed-by: Arend van Spriel 
> Signed-off-by: Rafał Miłecki 
> ---
>  .../broadcom/brcm80211/brcmfmac/cfg80211.c | 17 +
>  .../net/wireless/broadcom/brcm80211/brcmfmac/p2p.c | 10 +-
>  .../net/wireless/broadcom/brcm80211/brcmutil/d11.c | 18 ++
>  .../broadcom/brcm80211/include/brcmu_d11.h | 22 
> ++
>  4 files changed, 46 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> index d0631b6..597495d 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> @@ -2734,7 +2734,7 @@ static s32 brcmf_inform_single_bss(struct 
> brcmf_cfg80211_info *cfg,
>   if (!bi->ctl_ch) {
>   ch.chspec = le16_to_cpu(bi->chanspec);
>   cfg->d11inf.decchspec(&ch);
> - bi->ctl_ch = ch.chnum;
> + bi->ctl_ch = ch.control_ch_num;
>   }
>   channel = bi->ctl_ch;
>  
> @@ -2852,7 +2852,7 @@ static s32 brcmf_inform_ibss(struct brcmf_cfg80211_info 
> *cfg,
>   else
>   band = wiphy->bands[NL80211_BAND_5GHZ];
>  
> - freq = ieee80211_channel_to_frequency(ch.chnum, band->band);
> + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band->band);
>   cfg->channel = freq;
>   notify_channel = ieee80211_get_channel(wiphy, freq);
>  
> @@ -2862,7 +2862,7 @@ static s32 brcmf_inform_ibss(struct brcmf_cfg80211_info 
> *cfg,
>   notify_ielen = le32_to_cpu(bi->ie_length);
>   notify_signal = (s16)le16_to_cpu(bi->RSSI) * 100;
>  
> - brcmf_dbg(CONN, "channel: %d(%d)\n", ch.chnum, freq);
> + brcmf_dbg(CONN, "channel: %d(%d)\n", ch.control_ch_num, freq);
>   brcmf_dbg(CONN, "capability: %X\n", notify_capability);
>   brcmf_dbg(CONN, "beacon interval: %d\n", notify_interval);
>   brcmf_dbg(CONN, "signal: %d\n", notify_signal);
> @@ -5280,7 +5280,7 @@ brcmf_bss_roaming_done(struct brcmf_cfg80211_info *cfg,
>   else
>   band = wiphy->bands[NL80211_BAND_5GHZ];
>  
> - freq = ieee80211_channel_to_frequency(ch.chnum, band->band);
> + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band->band);
>   notify_channel = ieee80211_get_channel(wiphy, freq);
>  
>  done:
> @@ -5802,14 +5802,15 @@ static int brcmf_construct_chaninfo(struct 
> brcmf_cfg80211_info *cfg,
>   channel = band->channels;
>   index = band->n_channels;
>   for (j = 0; j < band->n_channels; j++) {
> - if (channel[j].hw_value == ch.chnum) {
> + if (channel[j].hw_value == ch.control_ch_num) {
>   index = j;
>   break;
>   }
>   }
>   channel[index].center_freq =
> - ieee80211_channel_to_frequency(ch.chnum, band->band);
> - channel[index].hw_value = ch.chnum;
> + ieee80211_channel_to_frequency(ch.control_ch_num,
> +band->band);
> + channel[index].hw_value = ch.control_ch_num;
>  
>   /* assuming the chanspecs order is HT20,
>* HT40 upper, HT40 lower, and VHT80.
> @@ -5911,7 +5912,7 @@ static int brcmf_enable_bw40_2g(struct 
> brcmf_cfg80211_info *cfg)
>   if (WARN_ON(ch.bw != BRCMU_CHAN_BW_40))
>   continue;
>   for (j = 0; j < band->n_channels; j++) {
> - if (band->channels[j].hw_value == ch.chnum)
> + if (band->channels[j].hw_value == 
> ch.control_ch_num)
>   break;
>   }
>   if (WARN_ON(j == band->n_channels))
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> index a70cda6..1652a48 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> @@ -1246,7 +1246,7 @@ bool brcmf_p2p_scan_finding_common_cha

Re: [PATCH 4.8 2/2] brcmfmac: support get_channel cfg80211 callback

2016-05-20 Thread Arend Van Spriel
On 19-5-2016 13:02, Rafał Miłecki wrote:
> This is important for brcmfmac as the firmware may pick different
> channel than requested. This has been tested with BCM4366B1 (in D-Link
> DIR-885L).

Can you elaborate? Is this for AP or STA mode?

> Signed-off-by: Rafał Miłecki 
> ---
>  .../broadcom/brcm80211/brcmfmac/cfg80211.c | 59 
> ++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> index 597495d..4fb9e3a 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> @@ -4892,6 +4892,64 @@ exit:
>   return err;
>  }
>  
> +static int brcmf_cfg80211_get_channel(struct wiphy *wiphy,
> +   struct wireless_dev *wdev,
> +   struct cfg80211_chan_def *chandef)
> +{
> + struct brcmf_cfg80211_info *cfg = wiphy_to_cfg(wiphy);
> + struct net_device *ndev = wdev->netdev;
> + struct brcmf_if *ifp = netdev_priv(ndev);

Can this operation be done on a P2P_DEVICE interface, ie. wdev->netdev
== NULL?

> + struct brcmu_chan ch;
> + enum nl80211_band band = 0;
> + enum nl80211_chan_width width = 0;
> + u32 chanspec;
> + int freq, err;
> +
> + err = brcmf_fil_iovar_int_get(ifp, "chanspec", &chanspec);
> + if (err) {
> + brcmf_err("chanspec failed (%d)\n", err);
> + return err;
> + }
> +
> + ch.chspec = chanspec;
> + cfg->d11inf.decchspec(&ch);
> +
> + switch (ch.band) {
> + case BRCMU_CHAN_BAND_2G:
> + band = NL80211_BAND_2GHZ;
> + break;
> + case BRCMU_CHAN_BAND_5G:
> + band = NL80211_BAND_5GHZ;
> + break;
> + }
> +
> + switch (ch.bw) {
> + case BRCMU_CHAN_BW_80:
> + width = NL80211_CHAN_WIDTH_80;
> + break;
> + case BRCMU_CHAN_BW_40:
> + width = NL80211_CHAN_WIDTH_40;
> + break;
> + case BRCMU_CHAN_BW_20:
> + width = NL80211_CHAN_WIDTH_20;
> + break;
> + case BRCMU_CHAN_BW_80P80:
> + width = NL80211_CHAN_WIDTH_80P80;
> + break;

Not much sense to support this given that center_freq2 is set to zero below.

Regards,
Arend

> + case BRCMU_CHAN_BW_160:
> + width = NL80211_CHAN_WIDTH_160;
> + break;
> + }
> +
> + freq = ieee80211_channel_to_frequency(ch.control_ch_num, band);
> + chandef->chan = ieee80211_get_channel(wiphy, freq);
> + chandef->width = width;
> + chandef->center_freq1 = ieee80211_channel_to_frequency(ch.chnum, band);
> + chandef->center_freq2 = 0;
> +
> + return 0;
> +}
> +
>  static int brcmf_cfg80211_crit_proto_start(struct wiphy *wiphy,
>  struct wireless_dev *wdev,
>  enum nl80211_crit_proto_id proto,
> @@ -5054,6 +5112,7 @@ static struct cfg80211_ops brcmf_cfg80211_ops = {
>   .mgmt_tx = brcmf_cfg80211_mgmt_tx,
>   .remain_on_channel = brcmf_p2p_remain_on_channel,
>   .cancel_remain_on_channel = brcmf_cfg80211_cancel_remain_on_channel,
> + .get_channel = brcmf_cfg80211_get_channel,
>   .start_p2p_device = brcmf_p2p_start_device,
>   .stop_p2p_device = brcmf_p2p_stop_device,
>   .crit_proto_start = brcmf_cfg80211_crit_proto_start,
> 


Re: [PATCH 1/1] net: pegasus: remove dead coding

2016-05-20 Thread Petko Manolov
On 16-05-19 11:35:42, David Miller wrote:
> From: Heinrich Schuchardt 
> Date: Wed, 18 May 2016 02:13:30 +0200
> 
> > (!count || count < 4) is always true.
> > So let's remove the coding which is dead at least since 2005.
> > 
> > Signed-off-by: Heinrich Schuchardt 
> 
> Applied.

David, the patch you applied is broken.  It seems that you didn't follow the 
discussion from the past couple of days.  Please revert it.


Petko


Re: [RFC] net: remove busylock

2016-05-20 Thread Jesper Dangaard Brouer
On Thu, 19 May 2016 11:03:32 -0700
Alexander Duyck  wrote:

> On Thu, May 19, 2016 at 10:08 AM, Eric Dumazet  wrote:
> > busylock was added at the time we had expensive ticket spinlocks
> >
> > (commit 79640a4ca6955e3ebdb7038508fa7a0cd7fa5527 ("net: add additional
> > lock to qdisc to increase throughput")
> >
> > Now kernel spinlocks are MCS, this busylock things is no longer
> > relevant. It is slowing down things a bit.
> >
> >
> > With HTB qdisc, here are the numbers for 200 concurrent TCP_RR, on a host 
> > with 48 hyperthreads.
> >
[...]
> 
> The main point of the busy lock is to deal with the bulk throughput
> case, not the latency case which would be relatively well behaved.
> The problem wasn't really related to lock bouncing slowing things
> down.  It was the fairness between the threads that was killing us
> because the dequeue needs to have priority.

Yes, exactly.
 
> The main problem that the busy lock solved was the fact that you could
> start a number of stream tests equal to the number of CPUs in a given
> system and the result was that the performance would drop off a cliff
> and you would drop almost all the packets for almost all the streams
> because the qdisc never had a chance to drain because it would be CPU
> - 1 enqueues, followed by 1 dequeue.

Notice 1 enqueue does not guarantee 1 dequeue.  If one CPU has entered
dequeue/xmit state (setting __QDISC___STATE_RUNNING) then other CPUs
will just enqueue.  Thus, N CPUs will enqueue (a packet each), and 1 CPU
will dequeue a packet.

The problem arise due to the fairness of the ticket spinlock, and AFAIK
the MCS lock is even more fair.  All enqueue's get their turn before
the dequeue can move forward.  And the qdisc dequeue CPU have an
unfortunate pattern of releasing the qdisc root_lock and acquiring it
again (qdisc_restart+sch_direct_xmit). Thus, while dequeue is waiting
re-aquire the root_lock, N CPUs will enqueue packets.

The hole idea behind allowing bulk qdisc dequeue, was to mitigate this,
by allowing dequeue to do more work, while holding the lock.

You mention HTB.  Notice HTB does not take advantage of bulk dequeue.
Have you tried to enable/allow HTB to bulk dequeue?


> What we need if we are going to get rid of busy lock would be some
> sort of priority locking mechanism that would allow the dequeue thread
> to jump to the head of the line if it is attempting to take the lock.
> Otherwise you end up spending all your time enqueuing packets into
> oblivion because the qdiscs just overflow without the busy lock in
> place.

Exactly. The qdisc locking scheme is designed to only allow one
dequeuing CPU to run (via state __QDISC___STATE_RUNNING).  Jamal told
me this was an optimization.  Maybe this optimization "broke" when
locking got fair?

I don't want to offend the original qdisc designers, but when I look at
the qdisc locking code, I keep thinking this scheme is broken.

Wouldn't it be better to have seperate an enqueue lock and a dequeue
lock? (with a producer/consumer queue).

Once we get John's lockless qdisc scheme, then the individual qdiscs
can implement a locking scheme like this, and we can keep the old
locking scheme intact for legacy qdisc. Or we can clean up this locking
scheme and update the legacy qdisc to use a MPMC queue?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer