Re: general protection fault in skb_segment

2017-12-29 Thread Willem de Bruijn
> syzkaller hit the following crash on
> 37759fa6d0fa9e4d6036d19ac12f555bfc0aeafd
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers

Reproduced with the C reproducer on v4.15-rc1 and mainline
going back at least to v4.8, but not v4.7. SCTP GSO was
introduced in v4.8-rc1, so a patch in this set is likely the starting
point. Indeed crashes at 90017accff61 ("sctp: Add GSO support"),
but not at 90017accff61~4.

The reproducer with its sandbox removed shows this invocation in strace -f

# strace -f ./repro2
[... skipped ...]
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
open("/dev/net/tun", O_RDONLY)  = 4
fcntl(4, F_DUPFD, 3)= 5
socket(PF_PACKET, SOCK_RAW|SOCK_CLOEXEC, 8) = 6
ioctl(4, TUNSETIFF, 0x20e63000) = 0
ioctl(3, SIOCSIFFLAGS, {ifr_name="syz0",
ifr_flags=IFF_UP|IFF_PROMISC|IFF_ALLMULTI}) = 0
setsockopt(6, SOL_PACKET, 0xf /* PACKET_??? */, [4096], 4) = 0
ioctl(6, SIOCGIFINDEX, {ifr_name="syz0", ifr_index=24}) = 0
bind(6, {sa_family=AF_PACKET, proto=, if24, pkttype=PACKET_HOST,
addr(6)={1, aa00}, 20) = 0
dup2(6, 5)  = 5
write(5, "\0\201\1\0\350\367\0\0\3\0E\364\0 \0d\0\0\7\2042\342\0\0\0
\177\0\0\1\0\t"..., 42

where 0xf in setsockopt is PACKET_VNET_HDR

So this is a packet socket writing something that apparently looks
like an SCTP packet, is only 42 bytes long, but has GSO set in its
virtio_net_hdr struct.

It crashes in skb_segment seemingly on a NULL list_skb.

(gdb) list *(skb_segment+0x2a4)
0x8167cc24 is in skb_segment (net/core/skbuff.c:3566).
3561if (hsize < 0)
3562hsize = 0;
3563if (hsize > len || !sg)
3564hsize = len;
3565
3566if (!hsize && i >= nfrags && skb_headlen(list_skb) &&
3567(skb_headlen(list_skb) == len || sg)) {
3568BUG_ON(skb_headlen(list_skb) > len);
3569
3570i = 0;

Likely there is a hidden assumption about SCTP GSO packets that does
not hold for such packets generated by PF_PACKET.

SCTP GSO introduced the GSO_BY_FRAGS mss value, so the code
takes a different path for SCTP packets generated by the SCTP stack.

PF_PACKET does not necessarily set gso_size to GSO_BY_FRAGS, so
does not take the branch that requires list_skb to be non-zero here:

if (unlikely(mss == GSO_BY_FRAGS)) {
len = list_skb->len;
} else {
len = head_skb->len - offset;
if (len > mss)
len = mss;
}

hsize = skb_headlen(head_skb) - offset;
if (hsize < 0)
hsize = 0;
if (hsize > len || !sg)
hsize = len;

if (!hsize && i >= nfrags && skb_headlen(list_skb) &&
(skb_headlen(list_skb) == len || sg)) {

Somewhat tangential, but any PF_PACKET socket can set this
magic gso_size value in its virtio_net_hdr, so if it is assumed to
be an SCTP GSO specific option, setting it for a TCP GSO packet
may also cause unexpected results.

The crash requires a kernel with CONFIG_IP_SCTP enabled.


RE: [PATCH v3] igb: Free IRQs when device is hotplugged

2017-12-29 Thread Brown, Aaron F
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Lyude Paul
> Sent: Tuesday, December 12, 2017 11:32 AM
> To: intel-wired-...@lists.osuosl.org
> Cc: Fujinaka, Todd ; Stephen Hemminger
> ; sta...@vger.kernel.org; Kirsher, Jeffrey T
> ; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH v3] igb: Free IRQs when device is hotplugged
> 
> Recently I got a Caldigit TS3 Thunderbolt 3 dock, and noticed that upon
> hotplugging my kernel would immediately crash due to igb:
> 
> [  680.825801] kernel BUG at drivers/pci/msi.c:352!
> [  680.828388] invalid opcode:  [#1] SMP
> [  680.829194] Modules linked in: igb(O) thunderbolt i2c_algo_bit joydev vfat
> fat btusb btrtl btbcm btintel bluetooth ecdh_generic hp_wmi
> sparse_keymap rfkill wmi_bmof iTCO_wdt intel_rapl
> x86_pkg_temp_thermal coretemp crc32_pclmul snd_pcm rtsx_pci_ms
> mei_me snd_timer memstick snd pcspkr mei soundcore i2c_i801 tpm_tis
> psmouse shpchp wmi tpm_tis_core tpm video hp_wireless acpi_pad
> rtsx_pci_sdmmc mmc_core crc32c_intel serio_raw rtsx_pci mfd_core
> xhci_pci xhci_hcd i2c_hid i2c_core [last unloaded: igb]
> [  680.831085] CPU: 1 PID: 78 Comm: kworker/u16:1 Tainted: G   O
> 4.15.0-rc3Lyude-Test+ #6
> [  680.831596] Hardware name: HP HP ZBook Studio G4/826B, BIOS P71 Ver.
> 01.03 06/09/2017
> [  680.832168] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [  680.832687] RIP: 0010:free_msi_irqs+0x180/0x1b0
> [  680.833271] RSP: 0018:c930fbf0 EFLAGS: 00010286
> [  680.833761] RAX: 8803405f9c00 RBX: 88033e3d2e40 RCX:
> 002c
> [  680.834278] RDX:  RSI: 00ac RDI:
> 880340be2178
> [  680.834832] RBP:  R08: 880340be1ff0 R09:
> 8803405f9c00
> [  680.835342] R10:  R11: 0040 R12:
> 88033d63a298
> [  680.835822] R13: 88033d63a000 R14: 0060 R15:
> 880341959000
> [  680.836332] FS:  () GS:88034f44()
> knlGS:
> [  680.836817] CS:  0010 DS:  ES:  CR0: 80050033
> [  680.837360] CR2: 55e64044afdf CR3: 01c09002 CR4:
> 003606e0
> [  680.837954] Call Trace:
> [  680.838853]  pci_disable_msix+0xce/0xf0
> [  680.839616]  igb_reset_interrupt_capability+0x5d/0x60 [igb]
> [  680.840278]  igb_remove+0x9d/0x110 [igb]
> [  680.840764]  pci_device_remove+0x36/0xb0
> [  680.841279]  device_release_driver_internal+0x157/0x220
> [  680.841739]  pci_stop_bus_device+0x7d/0xa0
> [  680.842255]  pci_stop_bus_device+0x2b/0xa0
> [  680.842722]  pci_stop_bus_device+0x3d/0xa0
> [  680.843189]  pci_stop_and_remove_bus_device+0xe/0x20
> [  680.843627]  trim_stale_devices+0xf3/0x140
> [  680.844086]  trim_stale_devices+0x94/0x140
> [  680.844532]  trim_stale_devices+0xa6/0x140
> [  680.845031]  ? get_slot_status+0x90/0xc0
> [  680.845536]  acpiphp_check_bridge.part.5+0xfe/0x140
> [  680.846021]  acpiphp_hotplug_notify+0x175/0x200
> [  680.846581]  ? free_bridge+0x100/0x100
> [  680.847113]  acpi_device_hotplug+0x8a/0x490
> [  680.847535]  acpi_hotplug_work_fn+0x1a/0x30
> [  680.848076]  process_one_work+0x182/0x3a0
> [  680.848543]  worker_thread+0x2e/0x380
> [  680.848963]  ? process_one_work+0x3a0/0x3a0
> [  680.849373]  kthread+0x111/0x130
> [  680.849776]  ? kthread_create_worker_on_cpu+0x50/0x50
> [  680.850188]  ret_from_fork+0x1f/0x30
> [  680.850601] Code: 43 14 85 c0 0f 84 d5 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 
> 14 0f
> 86 c5 fe ff ff 8b 7b 10 01 ef e8 b7 e4 d2 ff 48 83 78 70 00 74 e3 <0f> 0b 49 
> 8d b5
> a0 00 00 00 e8 62 6f d3 ff e9 c7 fe ff ff 48 8b
> [  680.851497] RIP: free_msi_irqs+0x180/0x1b0 RSP: c930fbf0
> 
> As it turns out, normally the freeing of IRQs that would fix this is called
> inside of the scope of __igb_close(). However, since the device is
> already gone by the point we try to unregister the netdevice from the
> driver due to a hotplug we end up seeing that the netif isn't present
> and thus, forget to free any of the device IRQs.
> 
> So: make sure that if we're in the process of dismantling the netdev, we
> always allow __igb_close() to be called so that IRQs may be freed
> normally. Additionally, only allow igb_close() to be called from
> __igb_close() if it hasn't already been called for the given adapter.
> 
> Signed-off-by: Lyude Paul 
> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach")
> Cc: Todd Fujinaka 
> Cc: Stephen Hemminger 
> Cc: sta...@vger.kernel.org
> ---
> Changes since v2:
>   - Remove hunk in __igb_close() that was left over by accident, it's
> not needed
> 
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Tested-by: Aaron Brown 


Re: [PATCH net-next v5 0/5] Introduce NETIF_F_GRO_HW

2017-12-29 Thread Michael Chan
On Fri, Dec 29, 2017 at 7:12 AM, Alexander Duyck
 wrote:
> On Fri, Dec 29, 2017 at 4:43 AM, Sabrina Dubroca  wrote:
>> 2017-12-22, 10:14:32 -0800, Alexander Duyck wrote:
>>> On Fri, Dec 22, 2017 at 6:57 AM, Sabrina Dubroca  
>>> wrote:
>>> > IIUC, with the patches that were applied, each driver can define
>>> > whether GRO_HW depends on GRO? From a user's perspective, this
>>> > inconsistent behavior is going to be quite confusing.
>>> >
>>> > Worse than inconsistent behavior, it looks like a driver deciding that
>>> > GRO_HW doesn't depend on GRO is going to introduce a change of
>>> > behavior.  Previously, when GRO was disabled, there wouldn't be any
>>> > packet over MTU handed to the network stack.  Now, even if GRO is
>>> > disabled, GRO_HW might still be enabled, so we might get over-MTU
>>> > packets because of hardware GRO.
>>>
>>> This isn't actually true. LRO was still handling packets larger than
>>> MTU over even when GRO was disabled.
>>
>> Sure, LRO will also cause that, but we're speaking in the context of
>> GRO here, which means no LRO.
>
> We are talking about GRO_HW. Which is basically aggregation being
> performed in hardware. The choice of name is unfortunate though since
> it implies this is GRO when what is actually happening is just
> GRO-like. Really the only difference between LRO and GRO_HW is that
> the frames are better formed in the final result. It is a matter of
> what is performed versus where it is performed.

I think the name GRO_HW is perfectly fine.  It is GRO aggregation done
in hardware, and hardware providing extra information to the driver to
setup the SKB just like GRO.  I don't know what better name to call it
than GRO_HW.


Re: [PATCH iproute2 4/4] man: routel/routef: don't mention filesystem paths

2017-12-29 Thread Stephen Hemminger
On Fri, 29 Dec 2017 23:01:25 +0100
Luca Boccassi  wrote:

> The filesytem paths to these scripts might be different on various
> distros, so don't mention it in the manpages. It is not really useful
> information anyway.
> 
> Originally submitted as Debian bug:
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561424
> 
> Reported-by: jida...@jidanni.org
> Signed-off-by: Luca Boccassi 
> ---
>  man/man8/routel.8 | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/man/man8/routel.8 b/man/man8/routel.8
> index 82d580fb..2270eacb 100644
> --- a/man/man8/routel.8
> +++ b/man/man8/routel.8
> @@ -17,11 +17,6 @@ The routel script will list routes in a format that some 
> might consider easier t
>  .br
>  The routef script does not take any arguments and will simply flush the 
> routing table down the drain. Beware! This means deleting all routes which 
> will make your network unusable!
>  
> -.SH "FILES"
> -.LP
> -\fI/usr/bin/routef\fP
> -.br
> -\fI/usr/bin/routel\fP
>  .SH "AUTHORS"
>  .LP
>  The routel script was written by Stephen R. van den Berg , 
> 1999/04/18 and donated to the public domain.

Sure hardcode paths are not good.
Alternative would be generate man page like ipaddress is.


Re: [PATCH iproute2 3/4] man: ip-address: document 15-char limit for LABEL

2017-12-29 Thread Stephen Hemminger
On Fri, 29 Dec 2017 23:01:24 +0100
Luca Boccassi  wrote:

> Trying to set a label longer than 15 characters returns an error:
>  RTNETLINK answers: Numerical result out of range
> 
> Document the limit in the manpage.
> 
> Originally reported as a Debian bug:
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661886
> 
> Reported-by: Gabor Kiss 
> Signed-off-by: Luca Boccassi 
> ---
>  man/man8/ip-address.8.in | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/man/man8/ip-address.8.in b/man/man8/ip-address.8.in
> index eaa179c6..e7f14533 100644
> --- a/man/man8/ip-address.8.in
> +++ b/man/man8/ip-address.8.in
> @@ -190,6 +190,7 @@ Each address may be tagged with a label string.
>  In order to preserve compatibility with Linux-2.0 net aliases,
>  this string must coincide with the name of the device or must be prefixed
>  with the device name followed by colon.
> +The maximum allowed length is 15 characters.

Since these are aliases, lets be precise:
The maximum allowed total length of label is 15 characters.


Re: [PATCH iproute2 2/4] man: add more keywords to ip.8 short description

2017-12-29 Thread Stephen Hemminger
On Fri, 29 Dec 2017 23:01:23 +0100
Luca Boccassi  wrote:

> A Debian user suggested adding more network-related keywords to the
> ip manpage, so that manpage-scraping and indexing software like
> apropos can do a better job of categorizing the programs.
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877983
> 
> Suggested-by: Lynoure Braakman 
> Signed-off-by: Luca Boccassi 
> ---
>  man/man8/ip.8 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/man/man8/ip.8 b/man/man8/ip.8
> index ae018fdf..94e64319 100644
> --- a/man/man8/ip.8
> +++ b/man/man8/ip.8
> @@ -1,6 +1,6 @@
>  .TH IP 8 "20 Dec 2011" "iproute2" "Linux"
>  .SH NAME
> -ip \- show / manipulate routing, devices, policy routing and tunnels
> +ip \- show / manipulate routing, network devices, policy routing, interfaces 
> (ethernet and more) and tunnels

Let's just keep it short and drop out policy routing (since that is part of 
routing anyway
and drop the (ethernet and more).



Re: iproute2 net-next

2017-12-29 Thread Stephen Hemminger
On Fri, 29 Dec 2017 09:58:23 +0100
Jiri Pirko  wrote:

> Fri, Dec 29, 2017 at 12:46:31AM CET, dan...@iogearbox.net wrote:
> >On 12/26/2017 10:35 AM, Leon Romanovsky wrote:  
> >> On Mon, Dec 25, 2017 at 10:14:26PM -0800, Stephen Hemminger wrote:  
> >>> On Tue, 26 Dec 2017 06:47:43 +0200
> >>> Leon Romanovsky  wrote:
> >>>  
>  On Mon, Dec 25, 2017 at 10:49:19AM -0800, Stephen Hemminger wrote:  
> > David Ahern has agreed to take over managing the net-next branch of 
> > iproute2.
> > The new location is:
> >  
> > https://git.kernel.org/pub/scm/linux/kernel/git/dsahern/iproute2-next.git/
> >
> > In the past, I have accepted new features into iproute2 master branch, 
> > but
> > am changing the policy so that outside of the merge window (up until 
> > -rc1)
> > new features will get put into net-next to get some more review and 
> > testing
> > time. This means that things like the proposed batch streaming mode will
> > go through net-next.  
> 
>  Did you consider to create one shared repo for the iproute2 to allow
>  multiple committers workflow?  
> >>>
> >>> For now having separate trees is best, there is no need for multiple
> >>> committers the load is very light.
> >>>  
>  It will be much convenient for the users to have one place for
>  master/stable/net-next branches, instead of actually following two
>  different repositories.  
> >>>
> >>> If you are doing network development, you already need to deal with
> >>> multiple repo's on the kernel side so there is no difference.  
> >> 
> >> I agree with you that one extra "git remote add .." is not so huge and
> >> all people who develop for the netdev will do it. My concern is about
> >> Documentation and newcomers, who will have a hard time to find a right
> >> tree.  
> >
> >I guess it would certainly help to identify the official repo to rebase
> >against much quicker if it would be under a common group on korg e.g.
> >
> >  * iproute2/iproute2.git - for current cycle
> >  * iproute2/iproute2-next.git- for net-next bits
> >
> >and also be in line with other tooling (ethtool and others), even if
> >not as high volume, but it would make it unambiguous right away from
> >the other, private iproute2 repos on korg, imho. Just a thought.  
> 
> +1
> 
> I was about to suggest this. This is nice opportunity to do such change.
> 
> 
> >  
>  Example, of such shared repo:
>  BPF: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/
>  Bluetooth: 
>  https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/
>  RDMA: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/  
> >>>
> >>> Most of these are high volume or vendor silo'd which is not the case 
> >>> here.  
> >Cheers,
> >Daniel  

Good news
kup does support links so could make links from personal to iproute2 directory

Bad news
kup won't allow me to make iproute2 directory right now. Will have to wait for
Konstantin


[PATCH net-next 0/2] net: stmmac: Couple of debug prints improvements

2017-12-29 Thread Florian Fainelli
Hi all,

While working on a particular problem, I had to turn on debug prints and found
them to be useful, but could deserve some improvements in order to help debug
situations.

Florian Fainelli (2):
  net: stmmac: Pad ring number with zeroes in display_ring()
  net: stmmac: Allow debug prints of frame_len/COE

 drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c| 2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 5 ++---
 4 files changed, 5 insertions(+), 6 deletions(-)

-- 
2.14.1



[PATCH net-next 1/2] net: stmmac: Pad ring number with zeroes in display_ring()

2017-12-29 Thread Florian Fainelli
Make the printing of the ring number consistent and properly aligned by
padding the ring number with up to 3 zeroes, which covers the maximum
ring size. This makes it a lot easier to see outliers in debug prints.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
index 7e089bf906b4..2fd8456999f6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
@@ -406,7 +406,7 @@ static void dwmac4_display_ring(void *head, unsigned int 
size, bool rx)
pr_info("%s descriptor ring:\n", rx ? "RX" : "TX");
 
for (i = 0; i < size; i++) {
-   pr_info("%d [0x%x]: 0x%x 0x%x 0x%x 0x%x\n",
+   pr_info("%03d [0x%x]: 0x%x 0x%x 0x%x 0x%x\n",
i, (unsigned int)virt_to_phys(p),
le32_to_cpu(p->des0), le32_to_cpu(p->des1),
le32_to_cpu(p->des2), le32_to_cpu(p->des3));
diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c 
b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
index 2a828a312814..b47cb5c4da51 100644
--- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
@@ -428,7 +428,7 @@ static void enh_desc_display_ring(void *head, unsigned int 
size, bool rx)
u64 x;
 
x = *(u64 *)ep;
-   pr_info("%d [0x%x]: 0x%x 0x%x 0x%x 0x%x\n",
+   pr_info("%03d [0x%x]: 0x%x 0x%x 0x%x 0x%x\n",
i, (unsigned int)virt_to_phys(ep),
(unsigned int)x, (unsigned int)(x >> 32),
ep->basic.des2, ep->basic.des3);
diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c 
b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index db4cee57bb24..ebd9e5e00f16 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -288,7 +288,7 @@ static void ndesc_display_ring(void *head, unsigned int 
size, bool rx)
u64 x;
 
x = *(u64 *)p;
-   pr_info("%d [0x%x]: 0x%x 0x%x 0x%x 0x%x",
+   pr_info("%03d [0x%x]: 0x%x 0x%x 0x%x 0x%x",
i, (unsigned int)virt_to_phys(p),
(unsigned int)x, (unsigned int)(x >> 32),
p->des2, p->des3);
-- 
2.14.1



[PATCH net-next 2/2] net: stmmac: Allow debug prints of frame_len/COE

2017-12-29 Thread Florian Fainelli
There is no reason not to allow printing the frame_len/COE value and put
that under a check for ETH_FRAME_LEN, drop it so we can see what the
descriptor reports.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 0323d672e1c5..d9c98fd810bb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3436,9 +3436,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, 
u32 queue)
if (netif_msg_rx_status(priv)) {
netdev_dbg(priv->dev, "\tdesc: %p [entry %d] 
buff=0x%x\n",
   p, entry, des);
-   if (frame_len > ETH_FRAME_LEN)
-   netdev_dbg(priv->dev, "frame size %d, 
COE: %d\n",
-  frame_len, status);
+   netdev_dbg(priv->dev, "frame size %d, COE: 
%d\n",
+  frame_len, status);
}
 
/* The zero-copy is always used for all the sizes
-- 
2.14.1



Re: b53 tags on bpi-r1 (bcm53125)

2017-12-29 Thread Florian Fainelli
Le 12/29/17 à 13:56, Florian Fainelli a écrit :
> Le 12/29/17 à 12:21, Florian Fainelli a écrit :
>> Hi Jochen,
>>
>> Le 12/18/17 à 02:44, Jochen Friedrich a écrit :
>>> Hi Florian,
>>>
>>> unfortunately, this doesn't make any difference.
>>>
>>> Just out of curiosity, BPI-R1 has pull-down resistors on LED6 and 7
>>> (MII_MODE0/1). However, the public available 53125U sheet doesn't
>>> document these pins:
>>>
>>> LED[6] IMP_MODE[0] Pull-up Active low (since IMP Mode is not used)
>>> LED[7] IMP_MODE[1] Pull-up Active low (since IMP Mode is not used)
>>>
>>> Is this MII mode maybe incompatible with Broadcom tags?
>>
>> AFAICT, it should not, this is largely independent from enabling
>> Broadcom tags.
>>
>> I have now reproduced this on my BPI-R1 as well and am looking at what
>> might be going wrong.
> 
> OK, so I have been able to find out what was going on. On BCM53125 and
> earlier switches, we need to do a couple of things for Broadcom tags to
> be usable:
> 
> - turn on managed mode (SM_SW_FWD_MODE)
> - configure Port 8 for IMP mode (B53_GLOBAL_CONFIG, setting bit
> GC_FRM_MGMT_PORT_MII)
> 
> After doing that, I can now see the correct outgoing packets on my host,
> however, the replies don't seem to be delivered to the per-port DSA
> network devices, and I suspect it's because of stmmac, so I am
> investigating this now.
> 

So stmmac was indeed part of the problem. I had to clear the
GMAC_CONTROL_ACS bit in GMAC_CORE_INIT in order to allow stmmac to
properly receive packets, otherwise, packets were truncated to 8 bytes
on reception which I assume is because the Broadcom tags may look like
some sort of weird LLC/SNAP packet which was confusing the hardware.
This is all working correctly now after a bunch of changes that I will
submit in the next few days.

Preliminary patches available here:

https://github.com/ffainelli/linux/commits/stmmac-fixes

Thank you very much for your patience!
-- 
Florian


RE: ATTENTION!!!

2017-12-29 Thread Loretta Robles


From: Loretta Robles
Sent: Friday, December 29, 2017 1:01 PM
To: Loretta Robles
Subject: ATTENTION!!!

You have been randomly selected for a donation. Contact soriz4...@gmail.com for 
claims.


Re: KMSAN reports use of uninitialized memory in pfkey_sendmsg()

2017-12-29 Thread Eric Biggers
On Fri, Dec 29, 2017 at 05:49:34PM +0100, Dmitry Vyukov wrote:
> On Fri, Dec 29, 2017 at 5:48 PM, Alexander Potapenko  
> wrote:
> > Hi all,
> >
> > KMSAN reports a use of uninitialized value on the following program:
> >
> > ==
> > // autogenerated by syzkaller (http://github.com/google/syzkaller)
> >
> > #include 
> > #include 
> > #include 
> >
> > int main()
> > {
> >   int sock = socket(PF_KEY, SOCK_RAW, 2);
> >   char buf[24];
> >   memset(buf, 0, 24);
> >   buf[0] = 2;
> >   buf[1] = 4;
> >   buf[4] = 3;
> >   buf[16] = 1;
> >   buf[18] = 0x17;  // SADB_X_EXT_NAT_T_OA
> >   struct msghdr hdr;
> >   memset(&hdr, 0, sizeof(struct msghdr));
> >   struct iovec iov;
> >   hdr.msg_iov = &iov;
> >   hdr.msg_iovlen = 1;
> >   iov.iov_base = buf;
> >   iov.iov_len = 0x18;
> >   sendmsg(sock, &hdr, 0);
> > }
> > ==
> >
> > the report is below:
> >
> > ==
> > BUG: KMSAN: use of uninitialized memory in pfkey_sendmsg+0x11ad/0x1900
> > CPU: 0 PID: 2946 Comm: probe Not tainted 4.13.0+ #3626
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > Call Trace:
> >  show_stack+0x12f/0x180 arch/x86/kernel/dumpstack.c:177
> >  __dump_stack lib/dump_stack.c:16
> >  dump_stack+0x185/0x1d0 lib/dump_stack.c:52
> >  kmsan_report+0x137/0x1c0 mm/kmsan/kmsan.c:1066
> >  __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:676
> >  verify_address_len net/key/af_key.c:404
> >  parse_exthdrs net/key/af_key.c:532
> >  pfkey_process net/key/af_key.c:2811
> >  pfkey_sendmsg+0x11ad/0x1900 net/key/af_key.c:3654
> >  sock_sendmsg_nosec net/socket.c:633
> >  sock_sendmsg net/socket.c:643
> >  ___sys_sendmsg+0xed5/0x1330 net/socket.c:2035
> >  __sys_sendmsg net/socket.c:2069
> >  SYSC_sendmsg+0x2a6/0x3d0 net/socket.c:2080
> >  SyS_sendmsg+0x54/0x80 net/socket.c:2076
> >  do_syscall_64+0x2f4/0x420 arch/x86/entry/common.c:284
> >  entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
> > RIP: 0033:0x401140
> > RSP: 002b:7ffe52abc9e8 EFLAGS: 0246 ORIG_RAX: 002e
> > RAX: ffda RBX: 004002b0 RCX: 00401140
> > RDX:  RSI: 7ffe52abca10 RDI: 0003
> > RBP: 7ffe52abca80 R08: 0002 R09: 0004
> > R10: 0004 R11: 0246 R12: 
> > R13: 004063e0 R14: 00406470 R15: 
> > origin:
> >  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:63
> >  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303
> >  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213
> >  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:339
> >  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:346
> >  slab_post_alloc_hook mm/slab.h:442
> >  slab_alloc_node mm/slub.c:2724
> >  __kmalloc_node_track_caller+0xa46/0x1230 mm/slub.c:4335
> >  __kmalloc_reserve net/core/skbuff.c:139
> >  __alloc_skb+0x2c6/0x9f0 net/core/skbuff.c:232
> >  alloc_skb ./include/linux/skbuff.h:904
> >  pfkey_sendmsg+0x201/0x1900 net/key/af_key.c:3641
> >  sock_sendmsg_nosec net/socket.c:633
> >  sock_sendmsg net/socket.c:643
> >  ___sys_sendmsg+0xed5/0x1330 net/socket.c:2035
> >  __sys_sendmsg net/socket.c:2069
> >  SYSC_sendmsg+0x2a6/0x3d0 net/socket.c:2080
> >  SyS_sendmsg+0x54/0x80 net/socket.c:2076
> >  do_syscall_64+0x2f4/0x420 arch/x86/entry/common.c:284
> >  return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
> > ==
> >
> > Apparently pfkey_sendmsg reads skb past the end of the buffer copied
> > from the userspace.
> > Could someone please take a look?
> 

Thanks for reporting this!  The problem is that verify_address_len() doesn't
verify that the buffer has space for the ->sa_family field before reading it.
I've sent out a patch.

I also noticed a similar bug in parse_exthdrs(); it doesn't verify that the
buffer has space for the 'struct sadb_ext' before reading it.  I've sent out a
patch for that as well.

Eric


[PATCH] af_key: fix buffer overread in parse_exthdrs()

2017-12-29 Thread Eric Biggers
From: Eric Biggers 

If a message sent to a PF_KEY socket ended with an incomplete extension
header (fewer than 4 bytes remaining), then parse_exthdrs() read past
the end of the message, into uninitialized memory.  Fix it by returning
-EINVAL in this case.

Reproducer:

#include 
#include 
#include 

int main()
{
int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
char buf[17] = { 0 };
struct sadb_msg *msg = (void *)buf;

msg->sadb_msg_version = PF_KEY_V2;
msg->sadb_msg_type = SADB_DELETE;
msg->sadb_msg_len = 2;

write(sock, buf, 17);
}

Cc: sta...@vger.kernel.org
Signed-off-by: Eric Biggers 
---
 net/key/af_key.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 596499cc8b2f..d40861a048fe 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -516,6 +516,9 @@ static int parse_exthdrs(struct sk_buff *skb, const struct 
sadb_msg *hdr, void *
uint16_t ext_type;
int ext_len;
 
+   if (len < sizeof(*ehdr))
+   return -EINVAL;
+
ext_len  = ehdr->sadb_ext_len;
ext_len *= sizeof(uint64_t);
ext_type = ehdr->sadb_ext_type;
-- 
2.15.1



[PATCH] af_key: fix buffer overread in verify_address_len()

2017-12-29 Thread Eric Biggers
From: Eric Biggers 

If a message sent to a PF_KEY socket ended with one of the extensions
that takes a 'struct sadb_address' but there were not enough bytes
remaining in the message for the ->sa_family member of the 'struct
sockaddr' which is supposed to follow, then verify_address_len() read
past the end of the message, into uninitialized memory.  Fix it by
returning -EINVAL in this case.

This bug was found using syzkaller with KMSAN.

Reproducer:

#include 
#include 
#include 

int main()
{
int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
char buf[24] = { 0 };
struct sadb_msg *msg = (void *)buf;
struct sadb_address *addr = (void *)(msg + 1);

msg->sadb_msg_version = PF_KEY_V2;
msg->sadb_msg_type = SADB_DELETE;
msg->sadb_msg_len = 3;
addr->sadb_address_len = 1;
addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;

write(sock, buf, 24);
}

Reported-by: Alexander Potapenko 
Cc: sta...@vger.kernel.org
Signed-off-by: Eric Biggers 
---
 net/key/af_key.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 3dffb892d52c..596499cc8b2f 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -401,6 +401,11 @@ static int verify_address_len(const void *p)
 #endif
int len;
 
+   if (sp->sadb_address_len <
+   DIV_ROUND_UP(sizeof(*sp) + offsetofend(typeof(*addr), sa_family),
+sizeof(uint64_t)))
+   return -EINVAL;
+
switch (addr->sa_family) {
case AF_INET:
len = DIV_ROUND_UP(sizeof(*sp) + sizeof(*sin), 
sizeof(uint64_t));
-- 
2.15.1



Re: [PATCH v3 bpf-next 1/2] tools/bpftool: use version from the kernel source tree

2017-12-29 Thread Daniel Borkmann
On 12/27/2017 08:16 PM, Roman Gushchin wrote:
> Bpftool determines it's own version based on the kernel
> version, which is picked from the linux/version.h header.
> 
> It's strange to use the version of the installed kernel
> headers, and makes much more sense to use the version
> of the actual source tree, where bpftool sources are.
> 
> Fix this by building kernelversion target and use
> the resulting string as bpftool version.

Series applied to bpf-next, thanks everyone!


Re: [RFT net-next v3 3/5] net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration

2017-12-29 Thread Martin Blumenstingl
Hi Jerome,

On Fri, Dec 29, 2017 at 6:57 PM, Jerome Brunet  wrote:
> On Thu, 2017-12-28 at 23:21 +0100, Martin Blumenstingl wrote:
>> While testing the dwmac-meson8b with an RGMII PHY on Meson8b we
>> discovered that the m25_div is not actually a divider but rather a gate.
>> This matches with the datasheet which describes bit 10 as "Generate
>> 25MHz clock for PHY". Back when the driver was written it was assumed
>> that this was a divider (which could divide by 5 or 10) because other
>> clock bits in the datasheet were documented incorrectly.
>
> Maybe this bit 10 is indeed a 5/10 divider, as amlogic claims it is. Maybe, as
> Emiliano suggested, the output rate of div250 actually needs to be 250Mhz in
> RGMII, before being divided by 10 to produce the 25MHz of div25
>
> IOW, maybe we need this intermediate rate.
I am starting to believe that you (Emiliano and Jerome) are both right
does anyone of you have access to a scope so we can measure the actual
clock output?

> It would not be surprising, 1GBps usually requires a 125MHz clock somewhere.
this could mean that two clocks are derived from m250_div (names are
not final obviously):
- phy_ref_clk (25MHz or 50MHz)
- rgmii_tx_clk (fixed divide by 2, 125MHz)

> This is still doable:
> * Keep the divider
> * drop CLK_SET_RATE_PARENT on div25
> * call set_rate on div250 with 250MHz then on div25 with 25Mhz
yep, I will try this next
this would also be work with the assumption that the rgmii_tx_clk is
derived from m250_div

>
>>
>> Tests have shown that without bit 10 set the RTL8211F RGMII PHY on
>> Odroid-C1 (using a Meson8b SoC) does not work.
>> On GXBB and newer SoCs (where the driver was initially tested with RGMII
>> PHYs) this is not a problem because the input clock is running at 1GHz.
>> The m250_div clock's biggest possible divider is 7 (3-bit divider, with
>> value 0 being reserved). Thus we end up with a m250_div of 4 and a
>> "m25_div" of 10 (= register value 1).
>>
>> Instead it turns out that the Ethernet IP block seems to have a fixed
>> "divide by 10" clock internally. This means that bit 10 is a gate clock
>> which enables the RGMII clock output.
>>
>> This replaces the "m25_div" clock with a clock gate called "m25_en"
>> which ensures that we can set this bit to 1 whenever we enable RGMII
>> mode. This however means that we are now missing a "divide by 10" after
>> the m250_div (and before our new m25_en), otherwise the common clock
>> framework thinks that the rate of the m25_en clock is 10-times higher
>> than it is in the actual hardware. That is solved by adding a
>> fixed-factor clock which divides the m250_div output by 10.
>>
>> Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 
>> 8b / GXBB DWMAC")
>> Reported-by: Emiliano Ingrassia 
>> Signed-off-by: Martin Blumenstingl 
>> ---
>>  .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 66 
>> +-
>>  1 file changed, 38 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c 
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
>> index 1c14210df465..7199e8c08536 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
>> @@ -40,9 +40,7 @@
>>  #define PRG_ETH0_CLK_M250_DIV_SHIFT7
>>  #define PRG_ETH0_CLK_M250_DIV_WIDTH3
>>
>> -/* divides the result of m25_sel by either 5 (bit unset) or 10 (bit set) */
>> -#define PRG_ETH0_CLK_M25_DIV_SHIFT 10
>> -#define PRG_ETH0_CLK_M25_DIV_WIDTH 1
>> +#define PRG_ETH0_GENERATE_25M_PHY_CLOCK10
>>
>>  #define PRG_ETH0_INVERTED_RMII_CLK BIT(11)
>>  #define PRG_ETH0_TX_AND_PHY_REF_CLKBIT(12)
>> @@ -63,8 +61,11 @@ struct meson8b_dwmac {
>> struct clk_divider  m250_div;
>> struct clk  *m250_div_clk;
>>
>> -   struct clk_divider  m25_div;
>> -   struct clk  *m25_div_clk;
>> +   struct clk_fixed_factor fixed_div10;
>> +   struct clk  *fixed_div10_clk;
>> +
>> +   struct clk_gate m25_en;
>> +   struct clk  *m25_en_clk;
>
> Maybe it could be the topic of another series but we don't need to keep all
> those structures around, thanks to devm
>
> all clk_divider, fixed_factor, gate and mux can go away
> You only need to keep  the'struct clk*' you are going to use later on
>
> at the moment it would be m25_en_clk only.
let's get the whole thing to work first, then I will have another look
at the struct members (it already annoyed me too)


PS: on another note: the title of this series and most patches is
wrong as I just discovered:
the 25MHz clock is *NOT* the RGMII clock, but it's the "PHY reference
clock". Hardkernel calls it "ETH_PHY_REF_CLK_25MOUT" in their
Odroid-C1 schematics [4] on page 23, which is the only actual
description I could find for this pin (on the Khadas VIM2 schematics
for example there's a 25MHz clock seemingly coming out of nowhere)

I used th

Re: [RFT net-next v3 0/5] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-29 Thread Emiliano Ingrassia
Hi Jerome, Hi Martin,

On Fri, Dec 29, 2017 at 07:04:23PM +0100, Jerome Brunet wrote:
> On Fri, 2017-12-29 at 02:31 +0100, Emiliano Ingrassia wrote:
> > Hi Martin, Hi Dave,
> > 
> > On Thu, Dec 28, 2017 at 11:21:23PM +0100, Martin Blumenstingl wrote:
> > > Hi Dave,
> > > 
> > > please do not apply this series until it got a Tested-by from Emiliano.
> > > 
> > > 
> > > Hi Emiliano,
> > > 
> > > you reported [0] that you couldn't get dwmac-meson8b to work on your
> > > Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
> > > I think I was able to find a fix: it consists of two patches (which you
> > > find in this series)
> > > 
> > > Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
> > > only partially test this (I could only check if the clocks were
> > > calculated correctly when using a dummy 52394Hz input clock instead
> > > of MPLL2).
> > > 
> > > Could you please give this series a try and let me know about the
> > > results?
> > > You obviously still need your two "ARM: dts: meson8b" patches which
> > > - add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi
> > > - enable Ethernet on the Odroid-C1
> > > 
> > > When testing on Meson8b this also needs a fix for the MPLL clock driver:
> > > "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
> > > https://patchwork.kernel.org/patch/10131677/
> > > 
> > > 
> > > I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
> > > and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
> > > fine (so let's hope that this also fixes your Meson8b issue :)).
> > > 
> > > 
> > > changes since v1 at [1]:
> > > - changed the subject of the cover-letter to indicate that this is all
> > >   about the RGMII clock
> > > - added PATCH #1 which ensures that we don't unnecessarily change the
> > >   parent clocks in RMII mode (and also makes the code easier to
> > >   understand)
> > > - changed subject of PATCH #2 (formerly PATCH #1) to state that this
> > >   is about the RGMII clock
> > > - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
> > > - replaced PATCH #3 (formerly PATCH #2) with one that sets
> > >   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
> > >   on Meson8b correctly
> > > 
> > > changes since v2 at [2]:
> > > - added PATCH #2 to make the following patch easier
> > > - Emiliano reported that there's currently another bug in the
> > >   dwmac-meson8b driver which prevents it from working with RGMII PHYs on
> > >   Meson8b: bit 10 of the PRG_ETH0 register is configures a clock gate
> > >   (instead of a divide by 5 or divide by 10 clock divider). This has not
> > >   been visible on GXBB and later due to the input clock which always led
> > >   to a selection of "divide by 10" (which is done internally in the IP
> > >   block, but the bit actually means "enable RGMII clock output").
> > >   PATCH #3 was added to address this issue.
> > > - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
> > >   updated and the patch itself rebased because the m25_div clock was
> > >   removed with the new PATCH #3 (so some of the statements were not
> > >   valid anymore)
> > > 
> > 
> > Here is the clk_summary relative to ethernet on Odroid-C1+
> > with this new series applied:
> > 
> > xtal112400  
> > 0 0
> >  sys_pll00  12  0 0
> >   cpu_clk   00  12  0 0
> >  vid_pll00   73200  0 0
> >  fixed_pll  22  255000  0 0
> >   mpll2 11   24701  
> > 0 0
> >c941.ethernet#m250_sel   11   24701  0 0
> > c941.ethernet#m250_div  11   24701  
> > 0 0
> >  c941.ethernet#fixed_div10  112470  0 0
> >   c941.ethernet#m25_en  112470  
> > 0 0
> > 
> > The ethernet prg0 register is set to 0x74A1 which should be correct with
> > respect to the information contained in the S805 SoC manual.
> > Actually, the ethernet is not yet fully functional.
> > Trying to ping the board, I can see ARP request from host to board using
> > tcpdump. However, the host can't see any response.
> 
> If the rx path is ok-ish, I suppose the clock setting applied is good.
> Maybe you could try to play with the tx delay (BIT 5-6 of the register) ?
>

Thanks for the suggestion. Finally the ethernet works correctly using 4
ns as tx-delay.
The clock summary is the same reported above. The prg0 ethernet register
value is 0x74c1 as expected.

I would like to thanks Martin for the support!
As soon as this patch series [v3] will be submitted, I'll submit my patch for
the device tree.
Let me know if you have any qu

Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-29 Thread Antoine Tenart
Hi Russell,

On Thu, Dec 28, 2017 at 06:59:21PM +, Russell King - ARM Linux wrote:
> On Thu, Dec 28, 2017 at 11:04:16AM +0100, Antoine Tenart wrote:
> > 
> > That's not what I remembered. You had some valid points, and others
> > related to PHY modes the driver wasn't supporting before the phylink
> > transition. My understanding of this was that you wanted a full
> > featured support while I only wanted to convert the already supported
> > modes.
> 
> You are mistaken - you can get a full refresher on where things were
> at via https://patchwork.kernel.org/patch/9963971/

I read it again and still have the same feeling. There's been a
misunderstanding at some point. Anyway, let's move forward :)

> 1. I asked for details about what mvpp2.c supports that phylink does
>not (as you indicated that there were certain things that mvpp2
>supports that phylink does not.)  I'm still awaiting a response.

I don't remember PHY modes supported in the PPv2 driver that aren't
supported in PHYLINK. I think this point is the main misunderstanding. I
thought you wanted me to support modes unsupported in the PPv2 driver
before. But you explained quite well what these comments were about
below.

So I guess this point is resolved (aka I'll have to take your comments
into account for the v2).

> 2. 25th Sept, you indicated that you would get someone to test
>an issue related to in-band AN. No results of that testing have
>been forthcoming.

That's right. I asked someone to make a test, but did not get an answer.
And because the PHYLINK patch stalled on my side I kinda forget about
it. I'll try again to have this test made.

> I am not after a full featured support, what I'm after is ensuring
> that phylink is (a) used correctly and (b) implementations using it
> are correct.  Part of that is ensuring that users don't introduce
> unexpected failure conditions.
> 
> So, when you do this in the validate() callback:
> 
>  +   phylink_set(mask, 1000baseX_Full);
> 
> and then do this in the mac_config() callback:
> 
>  +   if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
>  +   port->phy_interface != PHY_INTERFACE_MODE_SGMII)
>  +   return;
> 
> and this in the link_state() callback:
> 
>  +   if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
>  +   port->phy_interface != PHY_INTERFACE_MODE_SGMII)
>  +   return 0;
> 
> the result is that phylink thinks that you support 1000base-X modes,
> and it will call mac_config() asking for 1000base-X, but you silently
> ignore that, leaving the hardware configured in whatever state it was.
> That leads to a silent failure as far as the user is concerned.
> 
> So, if you do not intend to support 1000base-X initially, don't
> allow it in the validate callback until you do.
> 
> It gets worse, because the return in link_state() means that phylink
> thinks that the link is up if it has requested 1000base-X, which it
> won't be unless you've properly configured it.
> 
> It's this kind of unreliability that I was concerned about in your
> patch.  I'm not demanding "full featured implementation" but I do
> want you to use it correctly.

Thanks for the detailed explanations!

> > > What I'm most concerned about, given the bindings for comphy that
> > > have been merged, is that Free Electrons is pushing forward seemingly
> > > with no regard to the requirement that the serdes lanes are dynamically
> > > reconfigurable, and that's a basic requirement for SFP, and for the
> > > 88x3310 PHYs on the Macchiatobin platform.
> > 
> > The main idea behind the comphy driver is to provide a way to
> > reconfigure the serdes lanes at runtime. Could you develop what are
> > blocking points to properly support SFP, regarding the current comphy
> > support?
> 
> If it supports serdes lane mode reconfiguration (iow, switching between
> 1000base-X, 2500base-X, SGMII, 10G-KR), then that's all that's required.

It does, and the PPv2 driver already ask the COMPHY driver to perform
these reconfigurations (when using the 10G/1G interface on the mcbin for
example).

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


Re: [PATCH net-next 2/2] l2tp: add peer_offset parameter

2017-12-29 Thread Lorenzo Bianconi
> Sorry for only just seeing this (vacation).
>
>
> On 28/12/17 19:45, Guillaume Nault wrote:
>>
>> On Thu, Dec 28, 2017 at 07:23:48PM +0100, Lorenzo Bianconi wrote:
>>>
>>> On Dec 28, Guillaume Nault wrote:

 After a quick review of L2TPv3 and pseudowires RFCs, I still don't see
 how adding some padding between the L2TPv3 header and the payload could
 constitute a valid frame. Of course, the base feature is already there,
 but after a quick test, it looks like the padding bits aren't
 initialised and leak memory.
>>>
>>> Do you mean for L2TPv2 or L2TPv3? For L2TPv3 offset/peer_offset are
>>> initialized
>>> in l2tp_nl_cmd_session_create()
>>>
>> That's the offsets stored in the l2tp_session_cfg structure. But I was
>> talking about the xmit path: l2tp_build_l2tpv3_header() doesn't
>> initialise the padding between the header and the payload. So when
>> someone activates this option, then every transmitted packet leaks
>> memory on the wire.
>>
>>> Setting session data offset is already supported in L2TP kernel module
>>> (and could be already used by userspace applications);
>>> for L2TPv2 there is an optional 16-bit value in the header while for
>>> L2TPv3
>>> the offset is configured by userspace.
>>> At the moment the kernel (for L2TPv3) uses offset for both tx and rx
>>> side.
>>> Userspace (iproute2) allows to distinguish tx offset (offset) from rx one
>>> (peer_offset) but since the rx part is not handled at the moment
>>> (I fixed peer_offset support in iproute2, I have not sent the patch
>>> upstream yet, attached below)
>>> this leads to a misalignment between tunnel endpoints.
>>> You can easily reproduce the issue using this setup (and the below patch
>>> for iproute2):
>>>
>>> ip l2tp add tunnel local  remote  tunnel_id 
>>> peer_tunnel_id  udp_sport  udp_dport 
>>> ip l2tp add tunnel local  remote  tunnel_id 
>>> peer_tunnel_id  udp_sport  udp_dport 
>>>
>>> ip l2tp add session name l2tp0 tunnel_id  session_id 
>>> peer_session_id  offset 8 peer_offset 16
>>> ip l2tp add session name l2tp0 tunnel_id  session_id 
>>> peer_session_id  offset 16 peer_offset 8
>>>
>> Yes, I'm well aware of that. And thanks for having worked on a full
>> solution including iproute2. But does one really need to set
>> asymetrical offset values? It doesn't look wrong to require setting the
>> same value on both sides. Other options need this, like "l2spec_type".
>>
>> Here we have an option that:
>>* creates invalid packets (AFAIK),
>>* is buggy and leaks memory on the network,
>>* doesn't seem to have any use case (even the manpage
>>  says "This is hardly ever used").
>>
>> So I'm sorry, but I don't see the point in expanding this option to
>> allow even stranger setups. If there's a use case, then fine.
>> Otherwise, let's just acknowledge that the "peer_offset" option of
>> iproute2 is a noop (and maybe remove it from the manpage).
>>
>> And the kernel "offset" option needs to be fixed. Actually, I wouldn't
>> mind if it was converted to be a noop, or even rejected. L2TP already
>> has its share of unused features that complicate the code and hamper
>> evolution and bug fixing. As I said earlier, if it's buggy, unused and
>> can't even produce valid packets, then why bothering with it?
>>
>> But that's just my point of view. James, do you have an opinion on
>> this?
>
>
> I agree, Guillaume.
>
> The L2TPv3 protocol RFC dropped the configurable offset of L2TPv2 - instead,
> the Layer-2-Specific-Sublayer is supposed to handle any transport-specific
> data alignment requirements. I think a configurable offset has found its way
> into iproute2 l2tp commands by mistake, perhaps because the netlink API
> defines an attribute for it, but which was only intended for use with
> L2TPv2. For L2TPv2, we only configure the offset for transmitted packets. In
> received packets, the offset (if present) is obtained from the L2TPv2 header
> in each received packet. There is no need to add a peer-offset netlink
> attribute to set the offset expected in received packets.
>
> Lorenzo, is this being added to fix interoperability with another L2TPv3
> implementation? If so, can you share more details?
>

Hi James,

I introduced peer_offset parameter to fix a specific setup where
tunnel endpoints
running L2TPv3 would use different values for tx offset (since in
iproute2 there is no
restriction on it), not to fix a given an interoperability issue.
I introduced this feature since:
 - offset has been added for long time to L2TPv3 implementation
   (commit f7faffa3ff8ef6ae712ef16312b8a2aa7a1c95fe and
   commit 309795f4bec2d69cd507a631f82065c2198a0825) and I wanted to
preserve UABI
 - have the same degree of freedom for offset parameter we have in
L2TPv2 and fix the issue
   described above

Now what we can do I guess is:
- as suggested by Guillaume drop completely the offset support without removing
  netlink attribute in order to not break UABI
- fix offset support initializing properly padding bit

[PATCH iproute2 2/4] man: add more keywords to ip.8 short description

2017-12-29 Thread Luca Boccassi
A Debian user suggested adding more network-related keywords to the
ip manpage, so that manpage-scraping and indexing software like
apropos can do a better job of categorizing the programs.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877983

Suggested-by: Lynoure Braakman 
Signed-off-by: Luca Boccassi 
---
 man/man8/ip.8 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/ip.8 b/man/man8/ip.8
index ae018fdf..94e64319 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -1,6 +1,6 @@
 .TH IP 8 "20 Dec 2011" "iproute2" "Linux"
 .SH NAME
-ip \- show / manipulate routing, devices, policy routing and tunnels
+ip \- show / manipulate routing, network devices, policy routing, interfaces 
(ethernet and more) and tunnels
 .SH SYNOPSIS
 
 .ad l
-- 
2.14.2



[PATCH iproute2 3/4] man: ip-address: document 15-char limit for LABEL

2017-12-29 Thread Luca Boccassi
Trying to set a label longer than 15 characters returns an error:
 RTNETLINK answers: Numerical result out of range

Document the limit in the manpage.

Originally reported as a Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661886

Reported-by: Gabor Kiss 
Signed-off-by: Luca Boccassi 
---
 man/man8/ip-address.8.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/man/man8/ip-address.8.in b/man/man8/ip-address.8.in
index eaa179c6..e7f14533 100644
--- a/man/man8/ip-address.8.in
+++ b/man/man8/ip-address.8.in
@@ -190,6 +190,7 @@ Each address may be tagged with a label string.
 In order to preserve compatibility with Linux-2.0 net aliases,
 this string must coincide with the name of the device or must be prefixed
 with the device name followed by colon.
+The maximum allowed length is 15 characters.
 
 .TP
 .BI scope " SCOPE_VALUE"
-- 
2.14.2



[PATCH iproute2 4/4] man: routel/routef: don't mention filesystem paths

2017-12-29 Thread Luca Boccassi
The filesytem paths to these scripts might be different on various
distros, so don't mention it in the manpages. It is not really useful
information anyway.

Originally submitted as Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561424

Reported-by: jida...@jidanni.org
Signed-off-by: Luca Boccassi 
---
 man/man8/routel.8 | 5 -
 1 file changed, 5 deletions(-)

diff --git a/man/man8/routel.8 b/man/man8/routel.8
index 82d580fb..2270eacb 100644
--- a/man/man8/routel.8
+++ b/man/man8/routel.8
@@ -17,11 +17,6 @@ The routel script will list routes in a format that some 
might consider easier t
 .br
 The routef script does not take any arguments and will simply flush the 
routing table down the drain. Beware! This means deleting all routes which will 
make your network unusable!
 
-.SH "FILES"
-.LP
-\fI/usr/bin/routef\fP
-.br
-\fI/usr/bin/routel\fP
 .SH "AUTHORS"
 .LP
 The routel script was written by Stephen R. van den Berg , 
1999/04/18 and donated to the public domain.
-- 
2.14.2



[PATCH iproute2 1/4] man: drop references to Debian-specific paths

2017-12-29 Thread Luca Boccassi
Documentation should be distribution-agnostic - any specific quirks
should be handled by downstream maintainers, if necessary.
Remove mentions of Debian paths and package names.

Signed-off-by: Luca Boccassi 
---
 man/man8/lnstat.8 | 3 +--
 man/man8/ss.8 | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/man/man8/lnstat.8 b/man/man8/lnstat.8
index acd5f4a2..b98241bf 100644
--- a/man/man8/lnstat.8
+++ b/man/man8/lnstat.8
@@ -254,8 +254,7 @@ Number of hash table list traversals for output traffic. 
Deprecated since IP
 route cache removal, therefore always zero.
 
 .SH SEE ALSO
-.BR ip (8),
-and /usr/share/doc/iproute-doc/README.lnstat (package iproute-doc on Debian)
+.BR ip (8)
 .br
 .SH AUTHOR
 lnstat was written by Harald Welte .
diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 0d526734..973afbe0 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -327,7 +327,7 @@ Read filter information from FILE.
 Each line of FILE is interpreted like single command line option. If FILE is - 
stdin is used.
 .TP
 .B FILTER := [ state STATE-FILTER ] [ EXPRESSION ]
-Please take a look at the official documentation (Debian package iproute-doc) 
for details regarding filters.
+Please take a look at the official documentation for details regarding filters.
 
 .SH STATE-FILTER
 
@@ -382,7 +382,6 @@ Find all local processes connected to X server.
 List all the tcp sockets in state FIN-WAIT-1 for our apache to network 
193.233.7/24 and look at their timers.
 .SH SEE ALSO
 .BR ip (8),
-.BR /usr/share/doc/iproute-doc/ss.html " (package iproute�doc)",
 .br
 .BR RFC " 793 "
 - https://tools.ietf.org/rfc/rfc793.txt (TCP states)
-- 
2.14.2



Re: b53 tags on bpi-r1 (bcm53125)

2017-12-29 Thread Florian Fainelli
Le 12/29/17 à 12:21, Florian Fainelli a écrit :
> Hi Jochen,
> 
> Le 12/18/17 à 02:44, Jochen Friedrich a écrit :
>> Hi Florian,
>>
>> unfortunately, this doesn't make any difference.
>>
>> Just out of curiosity, BPI-R1 has pull-down resistors on LED6 and 7
>> (MII_MODE0/1). However, the public available 53125U sheet doesn't
>> document these pins:
>>
>> LED[6] IMP_MODE[0] Pull-up Active low (since IMP Mode is not used)
>> LED[7] IMP_MODE[1] Pull-up Active low (since IMP Mode is not used)
>>
>> Is this MII mode maybe incompatible with Broadcom tags?
> 
> AFAICT, it should not, this is largely independent from enabling
> Broadcom tags.
> 
> I have now reproduced this on my BPI-R1 as well and am looking at what
> might be going wrong.

OK, so I have been able to find out what was going on. On BCM53125 and
earlier switches, we need to do a couple of things for Broadcom tags to
be usable:

- turn on managed mode (SM_SW_FWD_MODE)
- configure Port 8 for IMP mode (B53_GLOBAL_CONFIG, setting bit
GC_FRM_MGMT_PORT_MII)

After doing that, I can now see the correct outgoing packets on my host,
however, the replies don't seem to be delivered to the per-port DSA
network devices, and I suspect it's because of stmmac, so I am
investigating this now.
-- 
Florian


[PATCH] p54spi: Delete an error message for a failed memory allocation in p54spi_rx()

2017-12-29 Thread SF Markus Elfring
From: Markus Elfring 
Date: Fri, 29 Dec 2017 22:33:10 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/intersil/p54/p54spi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/intersil/p54/p54spi.c 
b/drivers/net/wireless/intersil/p54/p54spi.c
index e41bf042352e..900cfaacf25e 100644
--- a/drivers/net/wireless/intersil/p54/p54spi.c
+++ b/drivers/net/wireless/intersil/p54/p54spi.c
@@ -367,7 +367,6 @@ static int p54spi_rx(struct p54s_priv *priv)
skb = dev_alloc_skb(len + 4);
if (!skb) {
p54spi_sleep(priv);
-   dev_err(&priv->spi->dev, "could not alloc skb");
return -ENOMEM;
}
 
-- 
2.15.1



[PATCH] rt2x00: Delete an error message for a failed memory allocation in rt2x00queue_allocate()

2017-12-29 Thread SF Markus Elfring
From: Markus Elfring 
Date: Fri, 29 Dec 2017 22:11:42 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/ralink/rt2x00/rt2x00queue.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c 
b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c
index a2c1ca5c76d1..6598cefdbe71 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c
@@ -1244,10 +1244,8 @@ int rt2x00queue_allocate(struct rt2x00_dev *rt2x00dev)
rt2x00dev->data_queues = 2 + rt2x00dev->ops->tx_queues + req_atim;
 
queue = kcalloc(rt2x00dev->data_queues, sizeof(*queue), GFP_KERNEL);
-   if (!queue) {
-   rt2x00_err(rt2x00dev, "Queue allocation failed\n");
+   if (!queue)
return -ENOMEM;
-   }
 
/*
 * Initialize pointers
-- 
2.15.1



[PATCH] cw1200: Delete an error message for a failed memory allocation in three functions

2017-12-29 Thread SF Markus Elfring
From: Markus Elfring 
Date: Fri, 29 Dec 2017 21:48:05 +0100

Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/st/cw1200/cw1200_sdio.c | 4 +---
 drivers/net/wireless/st/cw1200/cw1200_spi.c  | 4 +---
 drivers/net/wireless/st/cw1200/fwio.c| 1 -
 3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/st/cw1200/cw1200_sdio.c 
b/drivers/net/wireless/st/cw1200/cw1200_sdio.c
index 1037ec62659d..82c6827579a9 100644
--- a/drivers/net/wireless/st/cw1200/cw1200_sdio.c
+++ b/drivers/net/wireless/st/cw1200/cw1200_sdio.c
@@ -289,10 +289,8 @@ static int cw1200_sdio_probe(struct sdio_func *func,
return -ENODEV;
 
self = kzalloc(sizeof(*self), GFP_KERNEL);
-   if (!self) {
-   pr_err("Can't allocate SDIO hwbus_priv.\n");
+   if (!self)
return -ENOMEM;
-   }
 
func->card->quirks |= MMC_QUIRK_LENIENT_FN0;
 
diff --git a/drivers/net/wireless/st/cw1200/cw1200_spi.c 
b/drivers/net/wireless/st/cw1200/cw1200_spi.c
index 412fb6e49aed..7b6d5e5c5a62 100644
--- a/drivers/net/wireless/st/cw1200/cw1200_spi.c
+++ b/drivers/net/wireless/st/cw1200/cw1200_spi.c
@@ -399,10 +399,8 @@ static int cw1200_spi_probe(struct spi_device *func)
}
 
self = devm_kzalloc(&func->dev, sizeof(*self), GFP_KERNEL);
-   if (!self) {
-   pr_err("Can't allocate SPI hwbus_priv.");
+   if (!self)
return -ENOMEM;
-   }
 
self->pdata = plat_data;
self->func = func;
diff --git a/drivers/net/wireless/st/cw1200/fwio.c 
b/drivers/net/wireless/st/cw1200/fwio.c
index 30e7646d04af..79dd7a8ffb05 100644
--- a/drivers/net/wireless/st/cw1200/fwio.c
+++ b/drivers/net/wireless/st/cw1200/fwio.c
@@ -153,7 +153,6 @@ static int cw1200_load_firmware_cw1200(struct cw1200_common 
*priv)
 
buf = kmalloc(DOWNLOAD_BLOCK_SIZE, GFP_KERNEL | GFP_DMA);
if (!buf) {
-   pr_err("Can't allocate firmware load buffer.\n");
ret = -ENOMEM;
goto firmware_release;
}
-- 
2.15.1



Re: b53 tags on bpi-r1 (bcm53125)

2017-12-29 Thread Florian Fainelli
Hi Jochen,

Le 12/18/17 à 02:44, Jochen Friedrich a écrit :
> Hi Florian,
> 
> unfortunately, this doesn't make any difference.
> 
> Just out of curiosity, BPI-R1 has pull-down resistors on LED6 and 7
> (MII_MODE0/1). However, the public available 53125U sheet doesn't
> document these pins:
> 
> LED[6] IMP_MODE[0] Pull-up Active low (since IMP Mode is not used)
> LED[7] IMP_MODE[1] Pull-up Active low (since IMP Mode is not used)
> 
> Is this MII mode maybe incompatible with Broadcom tags?

AFAICT, it should not, this is largely independent from enabling
Broadcom tags.

I have now reproduced this on my BPI-R1 as well and am looking at what
might be going wrong.

Thanks!
-- 
Florian


[PATCH] wlcore: Delete an error message for a failed memory allocation in three functions

2017-12-29 Thread SF Markus Elfring
From: Markus Elfring 
Date: Fri, 29 Dec 2017 20:40:49 +0100

Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/ti/wlcore/main.c | 8 ++--
 drivers/net/wireless/ti/wlcore/spi.c  | 4 +---
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/main.c 
b/drivers/net/wireless/ti/wlcore/main.c
index c346c021b999..970e59871547 100644
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -1301,10 +1301,8 @@ static struct sk_buff *wl12xx_alloc_dummy_packet(struct 
wl1271 *wl)
sizeof(struct wl1271_tx_hw_descr) - sizeof(*hdr);
 
skb = dev_alloc_skb(TOTAL_TX_DUMMY_PACKET_SIZE);
-   if (!skb) {
-   wl1271_warning("Failed to allocate a dummy packet skb");
+   if (!skb)
return NULL;
-   }
 
skb_reserve(skb, sizeof(struct wl1271_tx_hw_descr));
 
@@ -1421,10 +1419,8 @@ int wl1271_rx_filter_alloc_field(struct wl12xx_rx_filter 
*filter,
field = &filter->fields[filter->num_fields];
 
field->pattern = kzalloc(len, GFP_KERNEL);
-   if (!field->pattern) {
-   wl1271_warning("Failed to allocate RX filter pattern");
+   if (!field->pattern)
return -ENOMEM;
-   }
 
filter->num_fields++;
 
diff --git a/drivers/net/wireless/ti/wlcore/spi.c 
b/drivers/net/wireless/ti/wlcore/spi.c
index 62ce54a949e9..fc94a0d1a01d 100644
--- a/drivers/net/wireless/ti/wlcore/spi.c
+++ b/drivers/net/wireless/ti/wlcore/spi.c
@@ -489,10 +489,8 @@ static int wl1271_probe(struct spi_device *spi)
pdev_data->if_ops = &spi_ops;
 
glue = devm_kzalloc(&spi->dev, sizeof(*glue), GFP_KERNEL);
-   if (!glue) {
-   dev_err(&spi->dev, "can't allocate glue\n");
+   if (!glue)
return -ENOMEM;
-   }
 
glue->dev = &spi->dev;
 
-- 
2.15.1



[PATCH net-next] net: dsa: Fix dsa_legacy_register() return value

2017-12-29 Thread Florian Fainelli
We need to make the dsa_legacy_register() stub return 0 in order for
dsa_init_module() to successfully register and continue registering the
ETH_P_XDSA packet handler.

Fixes: 2a93c1a3651f ("net: dsa: Allow compiling out legacy support")
Reported-by: Egil Hjelmeland 
Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa_priv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index b03665e8fb4e..cefb0c3c6d51 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -103,7 +103,7 @@ void dsa_legacy_unregister(void);
 #else
 static inline int dsa_legacy_register(void)
 {
-   return -ENODEV;
+   return 0;
 }
 
 static inline void dsa_legacy_unregister(void) { }
-- 
2.14.1



Re: [PATCH net-next 2/2] l2tp: add peer_offset parameter

2017-12-29 Thread James Chapman

Sorry for only just seeing this (vacation).

On 28/12/17 19:45, Guillaume Nault wrote:

On Thu, Dec 28, 2017 at 07:23:48PM +0100, Lorenzo Bianconi wrote:

On Dec 28, Guillaume Nault wrote:

After a quick review of L2TPv3 and pseudowires RFCs, I still don't see
how adding some padding between the L2TPv3 header and the payload could
constitute a valid frame. Of course, the base feature is already there,
but after a quick test, it looks like the padding bits aren't
initialised and leak memory.

Do you mean for L2TPv2 or L2TPv3? For L2TPv3 offset/peer_offset are initialized
in l2tp_nl_cmd_session_create()


That's the offsets stored in the l2tp_session_cfg structure. But I was
talking about the xmit path: l2tp_build_l2tpv3_header() doesn't
initialise the padding between the header and the payload. So when
someone activates this option, then every transmitted packet leaks
memory on the wire.


Setting session data offset is already supported in L2TP kernel module
(and could be already used by userspace applications);
for L2TPv2 there is an optional 16-bit value in the header while for L2TPv3
the offset is configured by userspace.
At the moment the kernel (for L2TPv3) uses offset for both tx and rx side.
Userspace (iproute2) allows to distinguish tx offset (offset) from rx one
(peer_offset) but since the rx part is not handled at the moment
(I fixed peer_offset support in iproute2, I have not sent the patch upstream 
yet, attached below)
this leads to a misalignment between tunnel endpoints.
You can easily reproduce the issue using this setup (and the below patch for 
iproute2):

ip l2tp add tunnel local  remote  tunnel_id  peer_tunnel_id  udp_sport 
 udp_dport 
ip l2tp add tunnel local  remote  tunnel_id  peer_tunnel_id  udp_sport 
 udp_dport 

ip l2tp add session name l2tp0 tunnel_id  session_id  peer_session_id 
 offset 8 peer_offset 16
ip l2tp add session name l2tp0 tunnel_id  session_id  peer_session_id 
 offset 16 peer_offset 8


Yes, I'm well aware of that. And thanks for having worked on a full
solution including iproute2. But does one really need to set
asymetrical offset values? It doesn't look wrong to require setting the
same value on both sides. Other options need this, like "l2spec_type".

Here we have an option that:
   * creates invalid packets (AFAIK),
   * is buggy and leaks memory on the network,
   * doesn't seem to have any use case (even the manpage
 says "This is hardly ever used").

So I'm sorry, but I don't see the point in expanding this option to
allow even stranger setups. If there's a use case, then fine.
Otherwise, let's just acknowledge that the "peer_offset" option of
iproute2 is a noop (and maybe remove it from the manpage).

And the kernel "offset" option needs to be fixed. Actually, I wouldn't
mind if it was converted to be a noop, or even rejected. L2TP already
has its share of unused features that complicate the code and hamper
evolution and bug fixing. As I said earlier, if it's buggy, unused and
can't even produce valid packets, then why bothering with it?

But that's just my point of view. James, do you have an opinion on
this?


I agree, Guillaume.

The L2TPv3 protocol RFC dropped the configurable offset of L2TPv2 - 
instead, the Layer-2-Specific-Sublayer is supposed to handle any 
transport-specific data alignment requirements. I think a configurable 
offset has found its way into iproute2 l2tp commands by mistake, perhaps 
because the netlink API defines an attribute for it, but which was only 
intended for use with L2TPv2. For L2TPv2, we only configure the offset 
for transmitted packets. In received packets, the offset (if present) is 
obtained from the L2TPv2 header in each received packet. There is no 
need to add a peer-offset netlink attribute to set the offset expected 
in received packets.


Lorenzo, is this being added to fix interoperability with another L2TPv3 
implementation? If so, can you share more details?






[net 1/1] tipc: fix problems with multipoint-to-point flow control

2017-12-29 Thread Jon Maloy
In commit 04d7b574b245 ("tipc: add multipoint-to-point flow control") we
introduced a protocol for preventing buffer overflow when many group
members try to simultaneously send messages to the same receiving member.

Stress test of this mechanism has revealed a couple of related bugs:

- When the receiving member receives an advertisement REMIT message from
  one of the senders, it will sometimes prematurely activate a pending
  member and send it the remitted advertisement, although the upper
  limit for active senders has been reached. This leads to accumulation
  of illegal advertisements, and eventually to messages being dropped
  because of receive buffer overflow.

- When the receiving member leaves REMITTED state while a received
  message is being read, we miss to look at the pending queue, to
  activate the oldest pending peer. This leads to some pending senders
  being starved out, and never getting the opportunity to profit from
  the remitted advertisement.

We fix the former in the function tipc_group_proto_rcv() by returning
directly from the function once it becomes clear that the remitting
peer cannot leave REMITTED state at that point.

We fix the latter in the function tipc_group_update_rcv_win() by looking
up and activate the longest pending peer when it becomes clear that the
remitting peer now can leave REMITTED state.

Signed-off-by: Jon Maloy 
---
 net/tipc/group.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/tipc/group.c b/net/tipc/group.c
index 8e12ab5..5f4ffae 100644
--- a/net/tipc/group.c
+++ b/net/tipc/group.c
@@ -109,7 +109,8 @@ static void tipc_group_proto_xmit(struct tipc_group *grp, 
struct tipc_member *m,
 static void tipc_group_decr_active(struct tipc_group *grp,
   struct tipc_member *m)
 {
-   if (m->state == MBR_ACTIVE || m->state == MBR_RECLAIMING)
+   if (m->state == MBR_ACTIVE || m->state == MBR_RECLAIMING ||
+   m->state == MBR_REMITTED)
grp->active_cnt--;
 }
 
@@ -562,7 +563,7 @@ void tipc_group_update_rcv_win(struct tipc_group *grp, int 
blks, u32 node,
int max_active = grp->max_active;
int reclaim_limit = max_active * 3 / 4;
int active_cnt = grp->active_cnt;
-   struct tipc_member *m, *rm;
+   struct tipc_member *m, *rm, *pm;
 
m = tipc_group_find_member(grp, node, port);
if (!m)
@@ -605,6 +606,17 @@ void tipc_group_update_rcv_win(struct tipc_group *grp, int 
blks, u32 node,
pr_warn_ratelimited("Rcv unexpected msg after REMIT\n");
tipc_group_proto_xmit(grp, m, GRP_ADV_MSG, xmitq);
}
+   grp->active_cnt--;
+   list_del_init(&m->list);
+   if (list_empty(&grp->pending))
+   return;
+
+   /* Set oldest pending member to active and advertise */
+   pm = list_first_entry(&grp->pending, struct tipc_member, list);
+   pm->state = MBR_ACTIVE;
+   list_move_tail(&pm->list, &grp->active);
+   grp->active_cnt++;
+   tipc_group_proto_xmit(grp, pm, GRP_ADV_MSG, xmitq);
break;
case MBR_RECLAIMING:
case MBR_DISCOVERED:
@@ -742,14 +754,14 @@ void tipc_group_proto_rcv(struct tipc_group *grp, bool 
*usr_wakeup,
if (!m || m->state != MBR_RECLAIMING)
return;
 
-   list_del_init(&m->list);
-   grp->active_cnt--;
remitted = msg_grp_remitted(hdr);
 
/* Messages preceding the REMIT still in receive queue */
if (m->advertised > remitted) {
m->state = MBR_REMITTED;
in_flight = m->advertised - remitted;
+   m->advertised = ADV_IDLE + in_flight;
+   return;
}
/* All messages preceding the REMIT have been read */
if (m->advertised <= remitted) {
@@ -761,6 +773,8 @@ void tipc_group_proto_rcv(struct tipc_group *grp, bool 
*usr_wakeup,
tipc_group_proto_xmit(grp, m, GRP_ADV_MSG, xmitq);
 
m->advertised = ADV_IDLE + in_flight;
+   grp->active_cnt--;
+   list_del_init(&m->list);
 
/* Set oldest pending member to active and advertise */
if (list_empty(&grp->pending))
-- 
2.1.4



[PATCH net] cxgb4: Fix FW flash errors

2017-12-29 Thread Ganesh Goudar
From: Arjun Vynipadath 

Initialize adapter->params.sf_fw_start to fix firmware flash
issues. Use existing macros defined for FW flash addresses.

Fixes: 96ac18f14a5a ("cxgb4: Add support for new flash parts")
Signed-off-by: Arjun Vynipadath 
Signed-off-by: Casey Leedom 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  1 -
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 17 -
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 6f9fa6e..d8424ed 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -344,7 +344,6 @@ struct adapter_params {
 
unsigned int sf_size; /* serial flash size in bytes */
unsigned int sf_nsec; /* # of flash sectors */
-   unsigned int sf_fw_start; /* start of FW image in flash */
 
unsigned int fw_vers; /* firmware version */
unsigned int bs_vers; /* bootstrap version */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c 
b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index f63210f..375ef86 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -2844,8 +2844,6 @@ enum {
SF_RD_DATA_FAST = 0xb,/* read flash */
SF_RD_ID= 0x9f,   /* read ID */
SF_ERASE_SECTOR = 0xd8,   /* erase sector */
-
-   FW_MAX_SIZE = 16 * SF_SEC_SIZE,
 };
 
 /**
@@ -3558,8 +3556,9 @@ int t4_load_fw(struct adapter *adap, const u8 *fw_data, 
unsigned int size)
const __be32 *p = (const __be32 *)fw_data;
const struct fw_hdr *hdr = (const struct fw_hdr *)fw_data;
unsigned int sf_sec_size = adap->params.sf_size / adap->params.sf_nsec;
-   unsigned int fw_img_start = adap->params.sf_fw_start;
-   unsigned int fw_start_sec = fw_img_start / sf_sec_size;
+   unsigned int fw_start_sec = FLASH_FW_START_SEC;
+   unsigned int fw_size = FLASH_FW_MAX_SIZE;
+   unsigned int fw_start = FLASH_FW_START;
 
if (!size) {
dev_err(adap->pdev_dev, "FW image has no data\n");
@@ -3575,9 +3574,9 @@ int t4_load_fw(struct adapter *adap, const u8 *fw_data, 
unsigned int size)
"FW image size differs from size in FW header\n");
return -EINVAL;
}
-   if (size > FW_MAX_SIZE) {
+   if (size > fw_size) {
dev_err(adap->pdev_dev, "FW image too large, max is %u bytes\n",
-   FW_MAX_SIZE);
+   fw_size);
return -EFBIG;
}
if (!t4_fw_matches_chip(adap, hdr))
@@ -3604,11 +3603,11 @@ int t4_load_fw(struct adapter *adap, const u8 *fw_data, 
unsigned int size)
 */
memcpy(first_page, fw_data, SF_PAGE_SIZE);
((struct fw_hdr *)first_page)->fw_ver = cpu_to_be32(0x);
-   ret = t4_write_flash(adap, fw_img_start, SF_PAGE_SIZE, first_page);
+   ret = t4_write_flash(adap, fw_start, SF_PAGE_SIZE, first_page);
if (ret)
goto out;
 
-   addr = fw_img_start;
+   addr = fw_start;
for (size -= SF_PAGE_SIZE; size; size -= SF_PAGE_SIZE) {
addr += SF_PAGE_SIZE;
fw_data += SF_PAGE_SIZE;
@@ -3618,7 +3617,7 @@ int t4_load_fw(struct adapter *adap, const u8 *fw_data, 
unsigned int size)
}
 
ret = t4_write_flash(adap,
-fw_img_start + offsetof(struct fw_hdr, fw_ver),
+fw_start + offsetof(struct fw_hdr, fw_ver),
 sizeof(hdr->fw_ver), (const u8 *)&hdr->fw_ver);
 out:
if (ret)
-- 
2.1.0



Re: [PATCH RESEND 1/3] net: Fix possible race in peernet2id_alloc()

2017-12-29 Thread Eric W. Biederman
Kirill Tkhai  writes:

> peernet2id_alloc() is racy without rtnl_lock() as atomic_read(&peer->count)
> under net->nsid_lock does not guarantee, peer is alive:
>
> rcu_read_lock()
> peernet2id_alloc()..
>   spin_lock_bh(&net->nsid_lock)   ..
>   atomic_read(&peer->count) == 1  ..
>   ..  put_net()
>   ..cleanup_net()
>   ..  for_each_net(tmp)
>   ..
> spin_lock_bh(&tmp->nsid_lock)
>   ..__peernet2id(tmp, net) == 
> -1
>   ....
>   ....
> __peernet2id_alloc(alloc == true)   ..
>   ....
> rcu_read_unlock()   ..
> ..synchronize_rcu()
> ..kmem_cache_free(net)
>
> After the above situation, net::netns_id contains id pointing to freed memory,
> and any other dereferencing by the id will operate with this freed memory.
>
> Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
> ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
> is generic interface, and better we fix it before someone really starts
> use it in wrong context.

Nacked-by: "Eric W. Biederman" 

I have already made a clear objection to the first unnecessary and
confusing hunk.  Simply resending the muddle headed code doesn't make it
better.

Eric


> Signed-off-by: Kirill Tkhai 
> ---
>  net/core/net_namespace.c |   23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 60a71be75aea..6a4eab438221 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int cmd, 
> int id);
>   */
>  int peernet2id_alloc(struct net *net, struct net *peer)
>  {
> - bool alloc;
> + bool alloc = false, alive = false;
>   int id;
>  
> - if (atomic_read(&net->count) == 0)
> - return NETNSA_NSID_NOT_ASSIGNED;
>   spin_lock_bh(&net->nsid_lock);
> - alloc = atomic_read(&peer->count) == 0 ? false : true;
> + /* Spinlock guarantees we never hash a peer to net->netns_ids
> +  * after idr_destroy(&net->netns_ids) occurs in cleanup_net().
> +  */
> + if (atomic_read(&net->count) == 0) {
> + id = NETNSA_NSID_NOT_ASSIGNED;
> + goto unlock;
> + }
> + /*
> +  * When peer is obtained from RCU lists, we may race with
> +  * its cleanup. Check whether it's alive, and this guarantees
> +  * we never hash a peer back to net->netns_ids, after it has
> +  * just been idr_remove()'d from there in cleanup_net().
> +  */
> + if (maybe_get_net(peer))
> + alive = alloc = true;
>   id = __peernet2id_alloc(net, peer, &alloc);
> +unlock:
>   spin_unlock_bh(&net->nsid_lock);
>   if (alloc && id >= 0)
>   rtnl_net_notifyid(net, RTM_NEWNSID, id);
> + if (alive)
> + put_net(peer);
>   return id;
>  }
>  EXPORT_SYMBOL_GPL(peernet2id_alloc);


Re: [RFT net-next v3 0/5] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-29 Thread Jerome Brunet
On Fri, 2017-12-29 at 02:31 +0100, Emiliano Ingrassia wrote:
> Hi Martin, Hi Dave,
> 
> On Thu, Dec 28, 2017 at 11:21:23PM +0100, Martin Blumenstingl wrote:
> > Hi Dave,
> > 
> > please do not apply this series until it got a Tested-by from Emiliano.
> > 
> > 
> > Hi Emiliano,
> > 
> > you reported [0] that you couldn't get dwmac-meson8b to work on your
> > Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
> > I think I was able to find a fix: it consists of two patches (which you
> > find in this series)
> > 
> > Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
> > only partially test this (I could only check if the clocks were
> > calculated correctly when using a dummy 52394Hz input clock instead
> > of MPLL2).
> > 
> > Could you please give this series a try and let me know about the
> > results?
> > You obviously still need your two "ARM: dts: meson8b" patches which
> > - add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi
> > - enable Ethernet on the Odroid-C1
> > 
> > When testing on Meson8b this also needs a fix for the MPLL clock driver:
> > "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
> > https://patchwork.kernel.org/patch/10131677/
> > 
> > 
> > I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
> > and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
> > fine (so let's hope that this also fixes your Meson8b issue :)).
> > 
> > 
> > changes since v1 at [1]:
> > - changed the subject of the cover-letter to indicate that this is all
> >   about the RGMII clock
> > - added PATCH #1 which ensures that we don't unnecessarily change the
> >   parent clocks in RMII mode (and also makes the code easier to
> >   understand)
> > - changed subject of PATCH #2 (formerly PATCH #1) to state that this
> >   is about the RGMII clock
> > - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
> > - replaced PATCH #3 (formerly PATCH #2) with one that sets
> >   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
> >   on Meson8b correctly
> > 
> > changes since v2 at [2]:
> > - added PATCH #2 to make the following patch easier
> > - Emiliano reported that there's currently another bug in the
> >   dwmac-meson8b driver which prevents it from working with RGMII PHYs on
> >   Meson8b: bit 10 of the PRG_ETH0 register is configures a clock gate
> >   (instead of a divide by 5 or divide by 10 clock divider). This has not
> >   been visible on GXBB and later due to the input clock which always led
> >   to a selection of "divide by 10" (which is done internally in the IP
> >   block, but the bit actually means "enable RGMII clock output").
> >   PATCH #3 was added to address this issue.
> > - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
> >   updated and the patch itself rebased because the m25_div clock was
> >   removed with the new PATCH #3 (so some of the statements were not
> >   valid anymore)
> > 
> 
> Here is the clk_summary relative to ethernet on Odroid-C1+
> with this new series applied:
> 
> xtal  112400  0 0
>  sys_pll  00  12  0 0
>   cpu_clk 00  12  0 0
>  vid_pll  00   73200  0 0
>  fixed_pll22  255000  0 0
>   mpll2   11   24701  
> 0 0
>c941.ethernet#m250_sel   11   24701  0 0
> c941.ethernet#m250_div11   24701  
> 0 0
>  c941.ethernet#fixed_div10  112470  0 0
>   c941.ethernet#m25_en112470  
> 0 0
> 
> The ethernet prg0 register is set to 0x74A1 which should be correct with
> respect to the information contained in the S805 SoC manual.
> Actually, the ethernet is not yet fully functional.
> Trying to ping the board, I can see ARP request from host to board using
> tcpdump. However, the host can't see any response.

If the rx path is ok-ish, I suppose the clock setting applied is good.
Maybe you could try to play with the tx delay (BIT 5-6 of the register) ?

> 
> Following the U-Boot value for prg0 register, which is 0x7d21, I also
> tried to set bit 11. As expected, this did not have any influence.
> Another thing that we should check is the "Ethernet Memory PD" (see S805
> manual - sec. 5.4) register which bits 3-2 enable/disable ethernet
> normal operation. However, those bits are already cleared by U-Boot.
> 
> Thank you for the support.
> 
> Best regards,
> 
> Emiliano
> 
> > 
> > [0] 
> > http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005596.html
> > [1] 
> > http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005848.html
> > [2] 
> > http:

[PATCH net] ethtool: do not print warning for applications using legacy API

2017-12-29 Thread Stephen Hemminger
From: Stephen Hemminger 

In kernel log ths message appears on every boot:
 "warning: `NetworkChangeNo' uses legacy ethtool link settings API,
  link modes are only partially reported"

When ethtool link settings API changed, it started complaining about
usages of old API. Ironically, the original patch was from google but
the application using the legacy API is chrome.

Linux ABI is fixed as much as possible. The kernel must not break it
and should not complain about applications using legacy API's.
This patch just removes the warning since using legacy API's
in Linux is perfectly acceptable.

Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Stephen Hemminger 
---
 net/core/ethtool.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f8fcf450a36e..8225416911ae 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -770,15 +770,6 @@ static int ethtool_set_link_ksettings(struct net_device 
*dev,
return dev->ethtool_ops->set_link_ksettings(dev, &link_ksettings);
 }
 
-static void
-warn_incomplete_ethtool_legacy_settings_conversion(const char *details)
-{
-   char name[sizeof(current->comm)];
-
-   pr_info_once("warning: `%s' uses legacy ethtool link settings API, 
%s\n",
-get_task_comm(name, current), details);
-}
-
 /* Query device for its ethtool_cmd settings.
  *
  * Backward compatibility note: for compatibility with legacy ethtool,
@@ -805,10 +796,8 @@ static int ethtool_get_settings(struct net_device *dev, 
void __user *useraddr)
   &link_ksettings);
if (err < 0)
return err;
-   if (!convert_link_ksettings_to_legacy_settings(&cmd,
-  &link_ksettings))
-   warn_incomplete_ethtool_legacy_settings_conversion(
-   "link modes are only partially reported");
+   convert_link_ksettings_to_legacy_settings(&cmd,
+ &link_ksettings);
 
/* send a sensible cmd tag back to user */
cmd.cmd = ETHTOOL_GSET;
-- 
2.11.0



Re: [RFT net-next v3 3/5] net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration

2017-12-29 Thread Jerome Brunet
On Thu, 2017-12-28 at 23:21 +0100, Martin Blumenstingl wrote:
> While testing the dwmac-meson8b with an RGMII PHY on Meson8b we
> discovered that the m25_div is not actually a divider but rather a gate.
> This matches with the datasheet which describes bit 10 as "Generate
> 25MHz clock for PHY". Back when the driver was written it was assumed
> that this was a divider (which could divide by 5 or 10) because other
> clock bits in the datasheet were documented incorrectly.

Maybe this bit 10 is indeed a 5/10 divider, as amlogic claims it is. Maybe, as
Emiliano suggested, the output rate of div250 actually needs to be 250Mhz in
RGMII, before being divided by 10 to produce the 25MHz of div25

IOW, maybe we need this intermediate rate.
It would not be surprising, 1GBps usually requires a 125MHz clock somewhere. 

This is still doable:
* Keep the divider
* drop CLK_SET_RATE_PARENT on div25
* call set_rate on div250 with 250MHz then on div25 with 25Mhz


> 
> Tests have shown that without bit 10 set the RTL8211F RGMII PHY on
> Odroid-C1 (using a Meson8b SoC) does not work.
> On GXBB and newer SoCs (where the driver was initially tested with RGMII
> PHYs) this is not a problem because the input clock is running at 1GHz.
> The m250_div clock's biggest possible divider is 7 (3-bit divider, with
> value 0 being reserved). Thus we end up with a m250_div of 4 and a
> "m25_div" of 10 (= register value 1).
> 
> Instead it turns out that the Ethernet IP block seems to have a fixed
> "divide by 10" clock internally. This means that bit 10 is a gate clock
> which enables the RGMII clock output.
> 
> This replaces the "m25_div" clock with a clock gate called "m25_en"
> which ensures that we can set this bit to 1 whenever we enable RGMII
> mode. This however means that we are now missing a "divide by 10" after
> the m250_div (and before our new m25_en), otherwise the common clock
> framework thinks that the rate of the m25_en clock is 10-times higher
> than it is in the actual hardware. That is solved by adding a
> fixed-factor clock which divides the m250_div output by 10.
> 
> Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 
> 8b / GXBB DWMAC")
> Reported-by: Emiliano Ingrassia 
> Signed-off-by: Martin Blumenstingl 
> ---
>  .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 66 
> +-
>  1 file changed, 38 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c 
> b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> index 1c14210df465..7199e8c08536 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
> @@ -40,9 +40,7 @@
>  #define PRG_ETH0_CLK_M250_DIV_SHIFT7
>  #define PRG_ETH0_CLK_M250_DIV_WIDTH3
>  
> -/* divides the result of m25_sel by either 5 (bit unset) or 10 (bit set) */
> -#define PRG_ETH0_CLK_M25_DIV_SHIFT 10
> -#define PRG_ETH0_CLK_M25_DIV_WIDTH 1
> +#define PRG_ETH0_GENERATE_25M_PHY_CLOCK10
>  
>  #define PRG_ETH0_INVERTED_RMII_CLK BIT(11)
>  #define PRG_ETH0_TX_AND_PHY_REF_CLKBIT(12)
> @@ -63,8 +61,11 @@ struct meson8b_dwmac {
> struct clk_divider  m250_div;
> struct clk  *m250_div_clk;
>  
> -   struct clk_divider  m25_div;
> -   struct clk  *m25_div_clk;
> +   struct clk_fixed_factor fixed_div10;
> +   struct clk  *fixed_div10_clk;
> +
> +   struct clk_gate m25_en;
> +   struct clk  *m25_en_clk;

Maybe it could be the topic of another series but we don't need to keep all
those structures around, thanks to devm

all clk_divider, fixed_factor, gate and mux can go away
You only need to keep  the'struct clk*' you are going to use later on

at the moment it would be m25_en_clk only.

>  
> u32 tx_delay_ns;



Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-29 Thread Arkadi Sharshevsky


On 12/28/2017 06:33 PM, David Ahern wrote:
> On 12/28/17 10:23 AM, Jiri Pirko wrote:
>>> So there are 4 tables exported to userspace:
>>> 
>>> 1. mlxsw_erif table which is not in any of the kvd regions (no
>>> resource path is given) and it has a size of 1000. Does
>>> mlxsw_erif mean a rif as in Router Interfaces? So the switch
>>> supports up to 1000 router interfaces.
>>> 
>>> 2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on
>>> the
>> Size tells you the actual size. It cannot give you max size. The
>> reason is simple. The resources are shared among multiple tables.
>> That is exactly what this resource patch makes visible.
>> 
>> 
> 
> In the erif table, the 1000 is the max not current usage. I do not
> have 1000 interfaces:
> 
> $ ip -br li sh | wc -l 597
> 
> 
> $ devlink dpipe table dump pci/:03:00.0 name mlxsw_erif ... index
> 503 match_value: type field_exact header mlxsw_meta field erif_port
> mapping ifindex mapping_value 601 value 503 action_value: type
> field_modify header mlxsw_meta field l3_forward value 1
> 
> 
> The host4 table it is current size with no maximum.
> 
> The meaning of table size needs to be consistent across tables.
> 

You are right the egress RIF table size is not correct, I will
definitely fix it, but it is not what you think it should be. So in
order to clarify this point, just a reminder:

1. Both dpipe and devlink resource are abstraction models for
hardware entities, and as a result they true to provide generic objects.
Each driver/ASIC should register his own and it absolutely proprietary
implementation. There is absolutely NO industry standard here, the only
thing that resembles a standard is that dpipe looks a bit like P4 only
because its proved to be useful for describing packet forwarding
pipelines. The host4 table is just a hardware process in the mellanox
spectrum ASIC pipeline and it should not be part of ABI, sorry I clearly
don't understand how this even came up.

2. Dpipe table is a single hardware process, most of the time it uses
some resources (for example LPM algorithm uses hash memory).

3. ERIF table is a table that is located in the end of the L3 pipeline.
The current dpipe description is not complete and that why it caused
confusion. The table performs match on rif index and packet type
(UC/MC/BC) and performs forward/drop decision. As you can see, for each
rif the table can have several entries, which provide different
statistics for different traffic types per rif, currently only the UC
is exposed with forward.

4. ASICs use shared resource for many processes, this is exactly the
behavior we want to expose!

Again, the size of the ERIF table should NOT provide the number of
rifs which are in use, simply because dpipe tables do not describe
hardware resources.

In the future the RIF bank will be exported as resource object with size
of 1000, and in order to observe how much are in use you should check
its occupancy. This is the whole reason of this interface.



Re: ACPI issues on cold power on [bisected]

2017-12-29 Thread Jonathan McDowell
On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote:
> On Fri, Dec 08, 2017 at 03:11:59PM +, Jonathan McDowell wrote:
> > I've been sitting on this for a while and should have spent time to
> > investigate sooner, but it's been an odd failure mode that wasn't quite
> > obvious.
> > 
> > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> > don't see anything after grub says its booting. In 4.10 onwards the
> > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> > taken from 4.12 as that's the most recent error dmesg I have saved but
> > also seen back in 4.10. It's always address 0x30 for the dereference.
> > 
> > Rebooting the laptop does not lead to these problems; it's *only* from a
> > complete cold boot that they arise (which didn't help me in terms of
> > being able to reliably bisect). Once I realised that I was able to
> > bisect, but it leads me to an odd commit:
> > 
> > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> > (mm/slab: fix kmemcg cache creation delayed issue)
> > 
> > If I revert this then I can cold boot without problems.
> > 
> > Also I don't see the problem with a stock Debian kernel, I think because
> > the ACPI support is modularised.
> 
> Sorry for late response. I was on a long vacation.

No problem. I've been trying to get around to diagnosing this for a
while now anyway and this isn't a great time of year for fast responses.

> I have tried to solve the problem however I don't find any clue yet.
> 
> >From my analysis, oops report shows that 'struct sock *ssk' passed to
> netlink_broadcast_filtered() is NULL. It means that some of
> netlink_kernel_create() returns NULL. Maybe, it is due to slab
> allocation failure. Could you check it by inserting some log on that
> part? The issue cannot be reproducible in my side so I need your help.

I've added some debug in acpi_bus_generate_netlink_event +
genlmsg_multicast and the problem seems to be that genlmsg_multicast is
getting called when init_net.genl_sock has not yet been initialised,
leading to the NULL deference.

Full dmesg output from a cold 4.14.8 boot at:

https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken

And the same kernel after a reboot ("shutdown -r now"):

https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working

Patch that I've applied is at

https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff

The interesting difference seems to be:

 PCI: Using ACPI for IRQ routing
+ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
+ERROR: init_net.genl_sock is NULL
+BUG: unable to handle kernel NULL pointer dereference at 0030
+IP: netlink_broadcast_filtered+0x20/0x3d0
+PGD 0 P4D 0 
+Oops:  [#1] SMP
+Modules linked in:
+CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
+Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
+Workqueue: kacpi_notify acpi_os_execute_deferred

9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
there's no visible event for it on a reboot, just on a cold power on.
Some sort of ordering issues such that genl_sock is being initialised
later with the slab change?

J.

-- 
  Hail Eris. All hail Discordia.   |  .''`.  Debian GNU/Linux Developer
  Fnord?   | : :' :  Happy to accept PGP signed
   | `. `'   or encrypted mail - RSA
   |   `-key on the keyservers.


KMSAN reports use of uninitialized memory in pfkey_sendmsg()

2017-12-29 Thread Alexander Potapenko
Hi all,

KMSAN reports a use of uninitialized value on the following program:

==
// autogenerated by syzkaller (http://github.com/google/syzkaller)

#include 
#include 
#include 

int main()
{
  int sock = socket(PF_KEY, SOCK_RAW, 2);
  char buf[24];
  memset(buf, 0, 24);
  buf[0] = 2;
  buf[1] = 4;
  buf[4] = 3;
  buf[16] = 1;
  buf[18] = 0x17;  // SADB_X_EXT_NAT_T_OA
  struct msghdr hdr;
  memset(&hdr, 0, sizeof(struct msghdr));
  struct iovec iov;
  hdr.msg_iov = &iov;
  hdr.msg_iovlen = 1;
  iov.iov_base = buf;
  iov.iov_len = 0x18;
  sendmsg(sock, &hdr, 0);
}
==

the report is below:

==
BUG: KMSAN: use of uninitialized memory in pfkey_sendmsg+0x11ad/0x1900
CPU: 0 PID: 2946 Comm: probe Not tainted 4.13.0+ #3626
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 show_stack+0x12f/0x180 arch/x86/kernel/dumpstack.c:177
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x185/0x1d0 lib/dump_stack.c:52
 kmsan_report+0x137/0x1c0 mm/kmsan/kmsan.c:1066
 __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:676
 verify_address_len net/key/af_key.c:404
 parse_exthdrs net/key/af_key.c:532
 pfkey_process net/key/af_key.c:2811
 pfkey_sendmsg+0x11ad/0x1900 net/key/af_key.c:3654
 sock_sendmsg_nosec net/socket.c:633
 sock_sendmsg net/socket.c:643
 ___sys_sendmsg+0xed5/0x1330 net/socket.c:2035
 __sys_sendmsg net/socket.c:2069
 SYSC_sendmsg+0x2a6/0x3d0 net/socket.c:2080
 SyS_sendmsg+0x54/0x80 net/socket.c:2076
 do_syscall_64+0x2f4/0x420 arch/x86/entry/common.c:284
 entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
RIP: 0033:0x401140
RSP: 002b:7ffe52abc9e8 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 004002b0 RCX: 00401140
RDX:  RSI: 7ffe52abca10 RDI: 0003
RBP: 7ffe52abca80 R08: 0002 R09: 0004
R10: 0004 R11: 0246 R12: 
R13: 004063e0 R14: 00406470 R15: 
origin:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:63
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303
 kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213
 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:339
 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:346
 slab_post_alloc_hook mm/slab.h:442
 slab_alloc_node mm/slub.c:2724
 __kmalloc_node_track_caller+0xa46/0x1230 mm/slub.c:4335
 __kmalloc_reserve net/core/skbuff.c:139
 __alloc_skb+0x2c6/0x9f0 net/core/skbuff.c:232
 alloc_skb ./include/linux/skbuff.h:904
 pfkey_sendmsg+0x201/0x1900 net/key/af_key.c:3641
 sock_sendmsg_nosec net/socket.c:633
 sock_sendmsg net/socket.c:643
 ___sys_sendmsg+0xed5/0x1330 net/socket.c:2035
 __sys_sendmsg net/socket.c:2069
 SYSC_sendmsg+0x2a6/0x3d0 net/socket.c:2080
 SyS_sendmsg+0x54/0x80 net/socket.c:2076
 do_syscall_64+0x2f4/0x420 arch/x86/entry/common.c:284
 return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
==

Apparently pfkey_sendmsg reads skb past the end of the buffer copied
from the userspace.
Could someone please take a look?

Thanks,
-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


Re: KMSAN reports use of uninitialized memory in pfkey_sendmsg()

2017-12-29 Thread Dmitry Vyukov
On Fri, Dec 29, 2017 at 5:48 PM, Alexander Potapenko  wrote:
> Hi all,
>
> KMSAN reports a use of uninitialized value on the following program:
>
> ==
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>
> #include 
> #include 
> #include 
>
> int main()
> {
>   int sock = socket(PF_KEY, SOCK_RAW, 2);
>   char buf[24];
>   memset(buf, 0, 24);
>   buf[0] = 2;
>   buf[1] = 4;
>   buf[4] = 3;
>   buf[16] = 1;
>   buf[18] = 0x17;  // SADB_X_EXT_NAT_T_OA
>   struct msghdr hdr;
>   memset(&hdr, 0, sizeof(struct msghdr));
>   struct iovec iov;
>   hdr.msg_iov = &iov;
>   hdr.msg_iovlen = 1;
>   iov.iov_base = buf;
>   iov.iov_len = 0x18;
>   sendmsg(sock, &hdr, 0);
> }
> ==
>
> the report is below:
>
> ==
> BUG: KMSAN: use of uninitialized memory in pfkey_sendmsg+0x11ad/0x1900
> CPU: 0 PID: 2946 Comm: probe Not tainted 4.13.0+ #3626
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  show_stack+0x12f/0x180 arch/x86/kernel/dumpstack.c:177
>  __dump_stack lib/dump_stack.c:16
>  dump_stack+0x185/0x1d0 lib/dump_stack.c:52
>  kmsan_report+0x137/0x1c0 mm/kmsan/kmsan.c:1066
>  __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:676
>  verify_address_len net/key/af_key.c:404
>  parse_exthdrs net/key/af_key.c:532
>  pfkey_process net/key/af_key.c:2811
>  pfkey_sendmsg+0x11ad/0x1900 net/key/af_key.c:3654
>  sock_sendmsg_nosec net/socket.c:633
>  sock_sendmsg net/socket.c:643
>  ___sys_sendmsg+0xed5/0x1330 net/socket.c:2035
>  __sys_sendmsg net/socket.c:2069
>  SYSC_sendmsg+0x2a6/0x3d0 net/socket.c:2080
>  SyS_sendmsg+0x54/0x80 net/socket.c:2076
>  do_syscall_64+0x2f4/0x420 arch/x86/entry/common.c:284
>  entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
> RIP: 0033:0x401140
> RSP: 002b:7ffe52abc9e8 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 004002b0 RCX: 00401140
> RDX:  RSI: 7ffe52abca10 RDI: 0003
> RBP: 7ffe52abca80 R08: 0002 R09: 0004
> R10: 0004 R11: 0246 R12: 
> R13: 004063e0 R14: 00406470 R15: 
> origin:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:63
>  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303
>  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213
>  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:339
>  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:346
>  slab_post_alloc_hook mm/slab.h:442
>  slab_alloc_node mm/slub.c:2724
>  __kmalloc_node_track_caller+0xa46/0x1230 mm/slub.c:4335
>  __kmalloc_reserve net/core/skbuff.c:139
>  __alloc_skb+0x2c6/0x9f0 net/core/skbuff.c:232
>  alloc_skb ./include/linux/skbuff.h:904
>  pfkey_sendmsg+0x201/0x1900 net/key/af_key.c:3641
>  sock_sendmsg_nosec net/socket.c:633
>  sock_sendmsg net/socket.c:643
>  ___sys_sendmsg+0xed5/0x1330 net/socket.c:2035
>  __sys_sendmsg net/socket.c:2069
>  SYSC_sendmsg+0x2a6/0x3d0 net/socket.c:2080
>  SyS_sendmsg+0x54/0x80 net/socket.c:2076
>  do_syscall_64+0x2f4/0x420 arch/x86/entry/common.c:284
>  return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
> ==
>
> Apparently pfkey_sendmsg reads skb past the end of the buffer copied
> from the userspace.
> Could someone please take a look?

+Eric


[PATCH] 3c59x: fix missing dma_mapping_error check

2017-12-29 Thread Neil Horman
A few spots in 3c59x missed calls to dma_mapping_error checks, casuing
WARN_ONS to trigger.  Clean those up.

Signed-off-by: Neil Horman 
CC: Steffen Klassert 
CC: "David S. Miller" 
Reported-by: tedheads...@gmail.com
---
 drivers/net/ethernet/3com/3c59x.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c 
b/drivers/net/ethernet/3com/3c59x.c
index f4e13a7014bd..6be9212f9093 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -1729,6 +1729,7 @@ vortex_open(struct net_device *dev)
struct vortex_private *vp = netdev_priv(dev);
int i;
int retval;
+   dma_addr_t dma;
 
/* Use the now-standard shared IRQ implementation. */
if ((retval = request_irq(dev->irq, vp->full_bus_master_rx ?
@@ -1753,7 +1754,11 @@ vortex_open(struct net_device *dev)
break;  /* Bad news!  */
 
skb_reserve(skb, NET_IP_ALIGN); /* Align IP on 16 byte 
boundaries */
-   vp->rx_ring[i].addr = 
cpu_to_le32(pci_map_single(VORTEX_PCI(vp), skb->data, PKT_BUF_SZ, 
PCI_DMA_FROMDEVICE));
+   dma = pci_map_single(VORTEX_PCI(vp), skb->data,
+PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
+   if (dma_mapping_error(&VORTEX_PCI(vp)->dev, dma))
+   break;
+   vp->rx_ring[i].addr = cpu_to_le32(dma);
}
if (i != RX_RING_SIZE) {
pr_emerg("%s: no memory for rx ring\n", dev->name);
@@ -2067,6 +2072,9 @@ vortex_start_xmit(struct sk_buff *skb, struct net_device 
*dev)
int len = (skb->len + 3) & ~3;
vp->tx_skb_dma = pci_map_single(VORTEX_PCI(vp), skb->data, len,
PCI_DMA_TODEVICE);
+   if (dma_mapping_error(&VORTEX_PCI(vp)->dev, vp->tx_skb_dma))
+   return NETDEV_TX_OK;
+
spin_lock_irq(&vp->window_lock);
window_set(vp, 7);
iowrite32(vp->tx_skb_dma, ioaddr + Wn7_MasterAddr);
@@ -2594,6 +2602,7 @@ boomerang_rx(struct net_device *dev)
void __iomem *ioaddr = vp->ioaddr;
int rx_status;
int rx_work_limit = vp->dirty_rx + RX_RING_SIZE - vp->cur_rx;
+   dma_addr_t dma;
 
if (vortex_debug > 5)
pr_debug("boomerang_rx(): status %4.4x\n", 
ioread16(ioaddr+EL3_STATUS));
@@ -2673,7 +2682,11 @@ boomerang_rx(struct net_device *dev)
break;  /* Bad news!  */
}
 
-   vp->rx_ring[entry].addr = 
cpu_to_le32(pci_map_single(VORTEX_PCI(vp), skb->data, PKT_BUF_SZ, 
PCI_DMA_FROMDEVICE));
+   dma = pci_map_single(VORTEX_PCI(vp), skb->data,
+PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
+   if (dma_mapping_error(&VORTEX_PCI(vp)->dev, dma))
+   break;
+   vp->rx_ring[entry].addr = cpu_to_le32(dma);
vp->rx_skbuff[entry] = skb;
}
vp->rx_ring[entry].status = 0;  /* Clear complete bit. */
-- 
2.14.3



Re: [PATCH net v2] netns, rtnetlink: fix struct net reference leak

2017-12-29 Thread Craig Gallek
On Sat, Dec 23, 2017 at 5:12 PM, Nicolas Dichtel
 wrote:
> Le 22/12/2017 à 21:36, Craig Gallek a écrit :
>> From: Craig Gallek 
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index 60a71be75aea..4b7ea33f5705 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -627,6 +627,8 @@ static int rtnl_net_newid(struct sk_buff *skb, struct 
>> nlmsghdr *nlh,
>>   return -EINVAL;
>>   }
>>   nsid = nla_get_s32(tb[NETNSA_NSID]);
>> + if (nsid < 0)
>> + return -EINVAL;
> No, this breaks the current behavior.
> Look at alloc_netid(). If reqid is < 0, the kernel allocates an nsid with no
> constraint. If reqid is >= 0, it tries to alloc the specified nsid.
Ah, thanks.  alloc_netid does appear to do the right thing.  In fact,
this seems to be another clue to the problem.  The current behavior is
to allocate from [0,max) when the input value is negative and the
problem seems to trigger when 0 is allocated.  Changing this range to
[1, max) fixes the problem, so there must be code elsewhere that does
not handle the case where the id is zero properly...

>>
>>   if (tb[NETNSA_PID]) {
>>   peer = get_net_ns_by_pid(nla_get_u32(tb[NETNSA_PID]));
>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>> index dabba2a91fc8..a928b8f081b8 100644
>> --- a/net/core/rtnetlink.c
>> +++ b/net/core/rtnetlink.c
>> @@ -1733,10 +1733,12 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, 
>> struct netlink_callback *cb)
>>   ifla_policy, NULL) >= 0) {
>>   if (tb[IFLA_IF_NETNSID]) {
>>   netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
>> - tgt_net = get_target_net(skb, netnsid);
>> - if (IS_ERR(tgt_net)) {
>> - tgt_net = net;
>> - netnsid = -1;
>> + if (netnsid >= 0) {
>> + tgt_net = get_target_net(skb, netnsid);
> I would prefer to put this test in get_target_net.
>
>> + if (IS_ERR(tgt_net)) {
>> + tgt_net = net;
>> + netnsid = -1;
> Maybe using NETNSA_NSID_NOT_ASSIGNED is better? Same for the initialization of
> this variable.
>
>> + }
>>   }
>>   }
>>
>> @@ -2792,6 +2794,11 @@ static int rtnl_newlink(struct sk_buff *skb, struct 
>> nlmsghdr *nlh,
>>   if (tb[IFLA_LINK_NETNSID]) {
>>   int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
>>
>> + if (id < 0) {
>> + err =  -EINVAL;
>> + goto out;
>> + }
>> +
> This is not needed. get_net_ns_by_id() returns NULL if id is < 0.
Indeed, and by extension get_target_net does not need this check
either (as it calls get_net_ns_by_id).

I'm having trouble debugging this remotely, so I'll give it a whirl
when I get back to the office next week.

Thanks again for the pointers,
Craig


Re: net-next lan9303 and CONFIG_NET_DSA_LEGACY

2017-12-29 Thread Egil Hjelmeland

On 15. des. 2017 21:06, Florian Fainelli wrote:

On December 15, 2017 6:51:45 AM PST, Egil Hjelmeland 
 wrote:

Hi
I found that our lan9303 board is unable to receive network data unless

CONFIG_NET_DSA_LEGACY is set. Is that a problem with the driver, or our

Any advise would be appreciated.


Your DTS appears sane and using the new binding. Is the switch driver 
successfully probing and it is just packet transmission/reception that is non 
functional?
Hi Egil,



Hi Florian

I found that the problem is that !CONFIG_NET_DSA_LEGACY version of 
dsa_legacy_register() return -ENODEV;


That makes dsa_init_module() drop out before 
"dev_add_pack(&dsa_pack_type)", and we don't get any handler for ETH_P_XDSA.


Changed dsa_legacy_register to return 0, and it works.

Egil



[iptables] extensions: add support for 'srh' match

2017-12-29 Thread Ahmed Abdelsalam
This patch adds a new exetension to iptables to supprt 'srh' match
The implementation considers revision 7 of the SRH draft.
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07

Signed-off-by: Ahmed Abdelsalam 
---
 extensions/libip6t_srh.c| 283 
 include/linux/netfilter_ipv6/ip6t_srh.h |  63 +++
 2 files changed, 346 insertions(+)
 create mode 100644 extensions/libip6t_srh.c
 create mode 100644 include/linux/netfilter_ipv6/ip6t_srh.h

diff --git a/extensions/libip6t_srh.c b/extensions/libip6t_srh.c
new file mode 100644
index 000..fd92ba1
--- /dev/null
+++ b/extensions/libip6t_srh.c
@@ -0,0 +1,283 @@
+/**
+ * Segment Routing Header 'srh' match extension
+ *
+ * Author:
+ *   Ahmed Abdelsalam   
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/* srh command-line options */
+enum {
+   O_SRH_NEXTHDR,
+   O_SRH_LEN_EQ,
+   O_SRH_LEN_GT,
+   O_SRH_LEN_LT,
+   O_SRH_SEGS_EQ,
+   O_SRH_SEGS_GT,
+   O_SRH_SEGS_LT,
+   O_SRH_LAST_EQ,
+   O_SRH_LAST_GT,
+   O_SRH_LAST_LT,
+   O_SRH_TAG,
+};
+
+static void srh_help(void)
+{
+   printf(
+"srh match options:\n"
+"[!] --srh-next-hdr next-hdrNext Header value of SRH\n"
+"[!] --srh-len-eq   hdr_len Hdr Ext Len value of SRH\n"
+"[!] --srh-len-gt   hdr_len Hdr Ext Len value of SRH\n"
+"[!] --srh-len-lt   hdr_len Hdr Ext Len value of SRH\n"
+"[!] --srh-segs-eq  segs_left   Segments Left value of SRH\n"
+"[!] --srh-segs-gt  segs_left   Segments Left value of SRH\n"
+"[!] --srh-segs-lt  segs_left   Segments Left value of SRH\n"
+"[!] --srh-last-eq  last_entry  Last Entry value of SRH\n"
+"[!] --srh-last-gt  last_entry  Last Entry value of SRH\n"
+"[!] --srh-last-lt  last_entry  Last Entry value of SRH\n"
+"[!] --srh-tag  tag Tag value of SRH\n");
+}
+
+#define s struct ip6t_srh
+static const struct xt_option_entry srh_opts[] = {
+   { .name = "srh-next-hdr", .id = O_SRH_NEXTHDR, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, next_hdr)},
+   { .name = "srh-len-eq", .id = O_SRH_LEN_EQ, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, hdr_len)},
+   { .name = "srh-len-gt", .id = O_SRH_LEN_GT, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, hdr_len)},
+   { .name = "srh-len-lt", .id = O_SRH_LEN_LT, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, hdr_len)},
+   { .name = "srh-segs-eq", .id = O_SRH_SEGS_EQ, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, segs_left)},
+   { .name = "srh-segs-gt", .id = O_SRH_SEGS_GT, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, segs_left)},
+   { .name = "srh-segs-lt", .id = O_SRH_SEGS_LT, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, segs_left)},
+   { .name = "srh-last-eq", .id = O_SRH_LAST_EQ, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, last_entry)},
+   { .name = "srh-last-gt", .id = O_SRH_LAST_GT, .type = XTTYPE_UINT8,
+   flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, last_entry)},
+   { .name = "srh-last-lt", .id = O_SRH_LAST_LT, .type = XTTYPE_UINT8,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, last_entry)},
+   { .name = "srh-tag", .id = O_SRH_TAG, .type = XTTYPE_UINT16,
+   .flags = XTOPT_INVERT | XTOPT_PUT, XTOPT_POINTER(s, tag)},
+   { }
+};
+#undef s
+
+static void srh_init(struct xt_entry_match *m)
+{
+   struct ip6t_srh *srhinfo = (void *)m->data;
+
+   srhinfo->mt_flags = 0;
+   srhinfo->mt_invflags = 0;
+}
+
+static void srh_parse(struct xt_option_call *cb)
+{
+   struct ip6t_srh *srhinfo = cb->data;
+
+   xtables_option_parse(cb);
+   switch (cb->entry->id) {
+   case O_SRH_NEXTHDR:
+   srhinfo->mt_flags |= IP6T_SRH_NEXTHDR;
+   if (cb->invert)
+   srhinfo->mt_invflags |= IP6T_SRH_INV_NEXTHDR;
+   break;
+   case O_SRH_LEN_EQ:
+   srhinfo->mt_flags |= IP6T_SRH_LEN_EQ;
+   if (cb->invert)
+   srhinfo->mt_invflags |= IP6T_SRH_INV_LEN_EQ;
+   break;
+   case O_SRH_LEN_GT:
+   srhinfo->mt_flags |= IP6T_SRH_LEN_GT;
+   if (cb->invert)
+   srhinfo->mt_invflags |= IP6T_SRH_INV_LEN_GT;
+   break;
+   case O_SRH_LEN_LT:
+   srhinfo->mt_flags |= IP6T_SRH_LEN_LT;
+   if (cb->invert)
+   srhinfo->mt_invflags |= IP6T_SRH_INV_LEN_LT;
+   break;
+   case O_SRH_SEGS_EQ:
+   srhinfo->mt_flags |= IP6T_SRH_SEGS_EQ;
+   if (cb->invert)
+   srhinfo->

[net-next] netfilter: add segment routing header 'srh' match

2017-12-29 Thread Ahmed Abdelsalam
It allows matching packets based on Segment Routing Header
(SRH) information.
The implementation considers revision 7 of the SRH draft.
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07

Currently supported match options include:
(1) Next Header
(2) Hdr Ext Len
(3) Segments Left
(4) Last Entry
(5) Tag value of SRH

Signed-off-by: Ahmed Abdelsalam 
---
 include/uapi/linux/netfilter_ipv6/ip6t_srh.h |  63 ++
 net/ipv6/netfilter/Kconfig   |   9 ++
 net/ipv6/netfilter/Makefile  |   1 +
 net/ipv6/netfilter/ip6t_srh.c| 165 +++
 4 files changed, 238 insertions(+)
 create mode 100644 include/uapi/linux/netfilter_ipv6/ip6t_srh.h
 create mode 100644 net/ipv6/netfilter/ip6t_srh.c

diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h 
b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
new file mode 100644
index 000..1b5dbd8
--- /dev/null
+++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
@@ -0,0 +1,63 @@
+/**
+ * Definitions for Segment Routing Header 'srh' match
+ *
+ * Author:
+ *   Ahmed Abdelsalam   
+ */
+
+#ifndef _IP6T_SRH_H
+#define _IP6T_SRH_H
+
+#include 
+#include 
+
+/* Values for "mt_flags" field in struct ip6t_srh */
+#define IP6T_SRH_NEXTHDR0x0001
+#define IP6T_SRH_LEN_EQ 0x0002
+#define IP6T_SRH_LEN_GT 0x0004
+#define IP6T_SRH_LEN_LT 0x0008
+#define IP6T_SRH_SEGS_EQ0x0010
+#define IP6T_SRH_SEGS_GT0x0020
+#define IP6T_SRH_SEGS_LT0x0040
+#define IP6T_SRH_LAST_EQ0x0080
+#define IP6T_SRH_LAST_GT0x0100
+#define IP6T_SRH_LAST_LT0x0200
+#define IP6T_SRH_TAG0x0400
+#define IP6T_SRH_MASK   0x07FF
+
+/* Values for "mt_invflags" field in struct ip6t_srh */
+#define IP6T_SRH_INV_NEXTHDR0x0001
+#define IP6T_SRH_INV_LEN_EQ 0x0002
+#define IP6T_SRH_INV_LEN_GT 0x0004
+#define IP6T_SRH_INV_LEN_LT 0x0008
+#define IP6T_SRH_INV_SEGS_EQ0x0010
+#define IP6T_SRH_INV_SEGS_GT0x0020
+#define IP6T_SRH_INV_SEGS_LT0x0040
+#define IP6T_SRH_INV_LAST_EQ0x0080
+#define IP6T_SRH_INV_LAST_GT0x0100
+#define IP6T_SRH_INV_LAST_LT0x0200
+#define IP6T_SRH_INV_TAG0x0400
+#define IP6T_SRH_INV_MASK   0x07FF
+
+/**
+ *  struct ip6t_srh - SRH match options
+ *  @ next_hdr: Next header field of SRH
+ *  @ hdr_len: Extension header length field of SRH
+ *  @ segs_left: Segments left field of SRH
+ *  @ last_entry: Last entry field of SRH
+ *  @ tag: Tag field of SRH
+ *  @ mt_flags: match options
+ *  @ mt_invflags: Invert the sense of match options
+ */
+
+struct ip6t_srh {
+   __u8next_hdr;
+   __u8hdr_len;
+   __u8segs_left;
+   __u8last_entry;
+   __u16   tag;
+   __u16   mt_flags;
+   __u16   mt_invflags;
+};
+
+#endif /*_IP6T_SRH_H*/
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 6acb2ee..e1818eb 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -232,6 +232,15 @@ config IP6_NF_MATCH_RT
 
  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP6_NF_MATCH_SRH
+tristate '"srh" Segment Routing header match support'
+depends on NETFILTER_ADVANCED
+help
+  srh matching allows you to match packets based on the segment
+ routing header of the packet.
+
+  To compile it as a module, choose M here.  If unsure, say N.
+
 # The targets
 config IP6_NF_TARGET_HL
tristate '"HL" hoplimit target support'
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index c6ee0cd..e0d51a9 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_IP6_NF_MATCH_MH) += ip6t_mh.o
 obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_hbh.o
 obj-$(CONFIG_IP6_NF_MATCH_RPFILTER) += ip6t_rpfilter.o
 obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
+obj-$(CONFIG_IP6_NF_MATCH_SRH) += ip6t_srh.o
 
 # targets
 obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
diff --git a/net/ipv6/netfilter/ip6t_srh.c b/net/ipv6/netfilter/ip6t_srh.c
new file mode 100644
index 000..75e41dc9
--- /dev/null
+++ b/net/ipv6/netfilter/ip6t_srh.c
@@ -0,0 +1,165 @@
+/*
+ * Kernel module to match Segment Routing Header (SRH) parameters.
+ *
+ * Author:
+ * Ahmed Abdelsalam 
+ *
+ *
+ *  This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: IPv6 Segment Routing 

Re: [PATCH net-next v5 0/5] Introduce NETIF_F_GRO_HW

2017-12-29 Thread Alexander Duyck
On Fri, Dec 29, 2017 at 4:43 AM, Sabrina Dubroca  wrote:
> 2017-12-22, 10:14:32 -0800, Alexander Duyck wrote:
>> On Fri, Dec 22, 2017 at 6:57 AM, Sabrina Dubroca  
>> wrote:
>> > IIUC, with the patches that were applied, each driver can define
>> > whether GRO_HW depends on GRO? From a user's perspective, this
>> > inconsistent behavior is going to be quite confusing.
>> >
>> > Worse than inconsistent behavior, it looks like a driver deciding that
>> > GRO_HW doesn't depend on GRO is going to introduce a change of
>> > behavior.  Previously, when GRO was disabled, there wouldn't be any
>> > packet over MTU handed to the network stack.  Now, even if GRO is
>> > disabled, GRO_HW might still be enabled, so we might get over-MTU
>> > packets because of hardware GRO.
>>
>> This isn't actually true. LRO was still handling packets larger than
>> MTU over even when GRO was disabled.
>
> Sure, LRO will also cause that, but we're speaking in the context of
> GRO here, which means no LRO.

We are talking about GRO_HW. Which is basically aggregation being
performed in hardware. The choice of name is unfortunate though since
it implies this is GRO when what is actually happening is just
GRO-like. Really the only difference between LRO and GRO_HW is that
the frames are better formed in the final result. It is a matter of
what is performed versus where it is performed.

>> > I don't think drivers should be allowed to say "GRO_HW doesn't depend
>> > on GRO".
>>
>> Why not, it doesn't. In my mind GRO_HW is closer to LRO than it is to
>> GRO.
>
> Why do you say that? It looks like GRO to me. These drivers are
> calling tcp_gro_complete() for example.

The final frame output in the case of frames that are aggregated will
be the same as GRO. However LRO could theoretically do the same thing
if the hardware had been implemented correctly in the first place.

>> The only ugly bit as I see it is that these devices were exposing
>> the feature via the GRO flag in the first place. So for the sake of
>> legacy they might want to carry around the dependency.
>>
>> > I think it's reasonable to be able to disable software GRO even if
>> > hardware GRO is enabled. Thus, I would propose:
>> > - keep the current GRO flag
>> > - add a GRO_HW flag, depending on GRO, enforced by the core as in
>> >   earlier versions of these patches
>> > - add a GRO_SW flag, also depending on GRO
>>
>> This seems like a bunch of extra overhead for not much gain. Do we
>> really need to fork GRO into 3 bits? I would argue that GRO_HW really
>> should have been branded something like FORWARDABLE_LRO, but nobody
>> wanted to touch the name LRO since it apparently has some negative
>> stigma to it. If we had used a name like that we probably wouldn't be
>> going through all these extra hoops. The only real reason why this is
>> even being associated with GRO in the first place is that is how this
>> feature was hidden by the drivers so they got around having to deal
>> with the LRO being disabled for routing/forwarding issue. Those are
>> the parts that want to keep it associated with GRO since that is how
>> they exposed it in their devices originally.
>
> I think it wouldn't have hidden behind GRO if it wasn't GRO. Again,
> why do you think it's not GRO?

The only real piece I see as missing the propagation bits for GRO_HW
to upper devices. The argument was that GRO doesn't have to do that,
but I think we are going to have to get there at some point so that
when we encounter the first piece of hardware that does this wrong we
have a switch ready to allow us to disable it for debugging. If we are
insistent on having a switch that binds together GRO and GRO_HW one
thought I had was to change the meaning of GRO_HW for virtual devices
to indicate that we allow GRO to happen on the lower devices
associated with this device. My thought was that GRO_HW allows GRO to
take place on the physical hardware below the netdev, so why couldn't
we say that GRO_HW also applies to upper devices and them indicating
they don't want GRO, GRO_HW, or LRO for lower devices. Thoughts? The
only reason it occurred to me is that there is currently no way to
propagate a disable of GRO so GRO_HW might be a way to do that for
virtual devices. It would make GRO_HW more about the location of the
GRO feature instead of the feature itself. Basically if you clear it
then nothing below that point should be doing any sort of aggregation.

Thanks.

- Alex


3c59x: pci_unmap_single() oops

2017-12-29 Thread tedheadster
In the 4.15.0-rc5 kernel (and likely earlier) I get the following oops.

3c59x :00:0c.0 enp0s12: renamed from eth0
enp0s12:  setting half-duplex.
[ cut here ]
3c59x :00:0c.0: DMA-API: device driver failed to check map
error[device address=0x09e1b040] [size=1536 bytes] [mapped as
single]
WARNING: CPU: 0 PID: 1 at check_unmap+0x559/0x695
Modules linked in: ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore pcspkr
serio_raw 3c59x mii usb_common ipv6
CPU: 0 PID: 1 Comm: systemd Not tainted 4.15.0-rc5.i486 #10
EIP: check_unmap+0x559/0x695
EFLAGS: 00010096 CPU: 0
EAX: 008c EBX: cb8a8660 ECX: c0881544 EDX: 0001
ESI: cb8e5280 EDI: c06e8b9f EBP: cb821e50 ESP: cb821df8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 004f1700 CR3: 0b2f9000 CR4: 
Call Trace:
 ? mntput_no_expire+0x13/0x105
 debug_dma_unmap_page+0x61/0x69
 pci_unmap_single+0x4c/0x56 [3c59x]
 boomerang_rx+0x250/0x42a [3c59x]
 boomerang_interrupt+0xde/0x3ea [3c59x]
 __handle_irq_event_percpu+0x2a/0xaf
 handle_irq_event_percpu+0x17/0x3d
 handle_irq_event+0x22/0x3b
 handle_level_irq+0x55/0x7a
 handle_irq+0x4f/0x58
 do_IRQ+0x35/0x95
 common_interrupt+0x34/0x40
EIP: 0xb7ae6970
EFLAGS: 0246 CPU: 0
EAX: b7e1c3d8 EBX: b7eef344 ECX:  EDX: 007048c4
ESI: 0013 EDI: 007048c4 EBP: bf8d81c8 ESP: bf8d8094
 DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
Code: 01 00 00 8b 58 08 e9 4a 01 00 00 bb 1b 2f 70 c0 89 d8 57 ff 75
e4 ff 75 e0 ff 75 dc ff 75 d8 53 50 68 bd 16 71 c0 e8 1f 3c e1 ff <0f>
ff 83 c4 20 83 3d 44 d5 7e c0 00 75 0f a1 f0 e6 7b c0 85 c0
---[ end trace 8b519628d8703199 ]---

This may relate to "3c59x: Add dma error checking and recovery"

- Matthew Whitehead


Compliment of the day

2017-12-29 Thread Kabore Zongo


--
Compliment of the day , I am a professional banker with International 
and Qualified Experience. Please i seek your Urgent Attention and 
Cooperation to transfer the sum of $8.5M into your account. Risk free 
and Beneficial. Waiting response so that we can proceed to the next 
level(zongokab2...@yahoo.com

--



[PATCH net-next 3/5] net: phy: marvell10g: clean up interface mode switching

2017-12-29 Thread Russell King
Centralise the PHY interface mode switching, rather than having it in
two places.

Signed-off-by: Russell King 
---
 drivers/net/phy/marvell10g.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
index 632f2ec15e39..c01bb2654d69 100644
--- a/drivers/net/phy/marvell10g.c
+++ b/drivers/net/phy/marvell10g.c
@@ -333,6 +333,24 @@ static int mv3310_aneg_done(struct phy_device *phydev)
return genphy_c45_aneg_done(phydev);
 }
 
+static void mv3310_update_interface(struct phy_device *phydev)
+{
+   if ((phydev->interface == PHY_INTERFACE_MODE_SGMII ||
+phydev->interface == PHY_INTERFACE_MODE_10GKR) && phydev->link) {
+   /* The PHY automatically switches its serdes interface (and
+* active PHYXS instance) between Cisco SGMII and 10GBase-KR
+* modes according to the speed.  Florian suggests setting
+* phydev->interface to communicate this to the MAC. Only do
+* this if we are already in either SGMII or 10GBase-KR mode.
+*/
+   if (phydev->speed == SPEED_1)
+   phydev->interface = PHY_INTERFACE_MODE_10GKR;
+   else if (phydev->speed >= SPEED_10 &&
+phydev->speed < SPEED_1)
+   phydev->interface = PHY_INTERFACE_MODE_SGMII;
+   }
+}
+
 /* 10GBASE-ER,LR,LRM,SR do not support autonegotiation. */
 static int mv3310_read_10gbr_status(struct phy_device *phydev)
 {
@@ -340,8 +358,7 @@ static int mv3310_read_10gbr_status(struct phy_device 
*phydev)
phydev->speed = SPEED_1;
phydev->duplex = DUPLEX_FULL;
 
-   if (phydev->interface == PHY_INTERFACE_MODE_SGMII)
-   phydev->interface = PHY_INTERFACE_MODE_10GKR;
+   mv3310_update_interface(phydev);
 
return 0;
 }
@@ -441,20 +458,7 @@ static int mv3310_read_status(struct phy_device *phydev)
}
}
 
-   if ((phydev->interface == PHY_INTERFACE_MODE_SGMII ||
-phydev->interface == PHY_INTERFACE_MODE_10GKR) && phydev->link) {
-   /* The PHY automatically switches its serdes interface (and
-* active PHYXS instance) between Cisco SGMII and 10GBase-KR
-* modes according to the speed.  Florian suggests setting
-* phydev->interface to communicate this to the MAC. Only do
-* this if we are already in either SGMII or 10GBase-KR mode.
-*/
-   if (phydev->speed == SPEED_1)
-   phydev->interface = PHY_INTERFACE_MODE_10GKR;
-   else if (phydev->speed >= SPEED_10 &&
-phydev->speed < SPEED_1)
-   phydev->interface = PHY_INTERFACE_MODE_SGMII;
-   }
+   mv3310_update_interface(phydev);
 
return 0;
 }
-- 
2.7.4



[PATCH net-next 5/5] net: phy: marvell10g: add support for half duplex 100M and 10M

2017-12-29 Thread Russell King
Add support for half-duplex 100M and 10M copper connections by parsing
the advertisment results rather than trying to decode the negotiated
speed from one of the PHYs "vendor" registers.  This allows us to
decode the duplex as well, which means we can support half-duplex mode
for the slower speeds.

Signed-off-by: Russell King 
---
 drivers/net/phy/marvell10g.c | 37 -
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
index c01bb2654d69..26220593c529 100644
--- a/drivers/net/phy/marvell10g.c
+++ b/drivers/net/phy/marvell10g.c
@@ -42,13 +42,6 @@ enum {
 */
MV_AN_CTRL1000  = 0x8000, /* 1000base-T control register */
MV_AN_STAT1000  = 0x8001, /* 1000base-T status register */
-
-   /* This register appears to reflect the copper status */
-   MV_AN_RESULT= 0xa016,
-   MV_AN_RESULT_SPD_10 = BIT(12),
-   MV_AN_RESULT_SPD_100= BIT(13),
-   MV_AN_RESULT_SPD_1000   = BIT(14),
-   MV_AN_RESULT_SPD_1  = BIT(15),
 };
 
 static int mv3310_modify(struct phy_device *phydev, int devad, u16 reg,
@@ -239,12 +232,18 @@ static int mv3310_config_init(struct phy_device *phydev)
if (val & MDIO_PMA_EXTABLE_1000BKX)
__set_bit(ETHTOOL_LINK_MODE_1000baseKX_Full_BIT,
  supported);
-   if (val & MDIO_PMA_EXTABLE_100BTX)
+   if (val & MDIO_PMA_EXTABLE_100BTX) {
__set_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT,
  supported);
-   if (val & MDIO_PMA_EXTABLE_10BT)
+   __set_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT,
+ supported);
+   }
+   if (val & MDIO_PMA_EXTABLE_10BT) {
__set_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT,
  supported);
+   __set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
+ supported);
+   }
}
 
if (!ethtool_convert_link_mode_to_legacy_u32(&mask, supported))
@@ -412,22 +411,8 @@ static int mv3310_read_status(struct phy_device *phydev)
 
phydev->lp_advertising |= mii_stat1000_to_ethtool_lpa_t(val);
 
-   if (phydev->autoneg == AUTONEG_ENABLE) {
-   val = phy_read_mmd(phydev, MDIO_MMD_AN, MV_AN_RESULT);
-   if (val < 0)
-   return val;
-
-   if (val & MV_AN_RESULT_SPD_1)
-   phydev->speed = SPEED_1;
-   else if (val & MV_AN_RESULT_SPD_1000)
-   phydev->speed = SPEED_1000;
-   else if (val & MV_AN_RESULT_SPD_100)
-   phydev->speed = SPEED_100;
-   else if (val & MV_AN_RESULT_SPD_10)
-   phydev->speed = SPEED_10;
-
-   phydev->duplex = DUPLEX_FULL;
-   }
+   if (phydev->autoneg == AUTONEG_ENABLE)
+   phy_resolve_aneg_linkmode(phydev);
}
 
if (phydev->autoneg != AUTONEG_ENABLE) {
@@ -469,7 +454,9 @@ static struct phy_driver mv3310_drivers[] = {
.phy_id_mask= MARVELL_PHY_ID_MASK,
.name   = "mv88x3310",
.features   = SUPPORTED_10baseT_Full |
+ SUPPORTED_10baseT_Half |
  SUPPORTED_100baseT_Full |
+ SUPPORTED_100baseT_Half |
  SUPPORTED_1000baseT_Full |
  SUPPORTED_Autoneg |
  SUPPORTED_TP |
-- 
2.7.4



[PATCH net-next 4/5] net: phy: add helper to convert negotiation result to phy settings

2017-12-29 Thread Russell King
Add a helper to convert the result of the autonegotiation advertisment
into the PHYs speed and duplex settings.  If the result is full duplex,
also extract the pause mode settings from the link partner advertisment.

Signed-off-by: Russell King 
---
 drivers/net/phy/phy-core.c | 43 +++
 include/linux/phy.h|  2 ++
 2 files changed, 45 insertions(+)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 9c08850eed16..3d67f182b844 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -189,6 +189,49 @@ size_t phy_speeds(unsigned int *speeds, size_t size,
return count;
 }
 
+/**
+ * phy_resolve_aneg_linkmode - resolve the advertisments into phy settings
+ * @phydev: The phy_device struct
+ *
+ * Resolve our and the link partner advertisments into their corresponding
+ * speed and duplex. If full duplex was negotiated, extract the pause mode
+ * from the link partner mask.
+ */
+void phy_resolve_aneg_linkmode(struct phy_device *phydev)
+{
+   u32 common = phydev->lp_advertising & phydev->advertising;
+
+   if (common & ADVERTISED_1baseT_Full) {
+   phydev->speed = SPEED_1;
+   phydev->duplex = DUPLEX_FULL;
+   } else if (common & ADVERTISED_1000baseT_Full) {
+   phydev->speed = SPEED_1000;
+   phydev->duplex = DUPLEX_FULL;
+   } else if (common & ADVERTISED_1000baseT_Half) {
+   phydev->speed = SPEED_1000;
+   phydev->duplex = DUPLEX_HALF;
+   } else if (common & ADVERTISED_100baseT_Full) {
+   phydev->speed = SPEED_100;
+   phydev->duplex = DUPLEX_FULL;
+   } else if (common & ADVERTISED_100baseT_Half) {
+   phydev->speed = SPEED_100;
+   phydev->duplex = DUPLEX_HALF;
+   } else if (common & ADVERTISED_10baseT_Full) {
+   phydev->speed = SPEED_10;
+   phydev->duplex = DUPLEX_FULL;
+   } else if (common & ADVERTISED_10baseT_Half) {
+   phydev->speed = SPEED_10;
+   phydev->duplex = DUPLEX_HALF;
+   }
+
+   if (phydev->duplex == DUPLEX_FULL) {
+   phydev->pause = !!(phydev->lp_advertising & ADVERTISED_Pause);
+   phydev->asym_pause = !!(phydev->lp_advertising &
+   ADVERTISED_Asym_Pause);
+   }
+}
+EXPORT_SYMBOL_GPL(phy_resolve_aneg_linkmode);
+
 static void mmd_phy_indirect(struct mii_bus *bus, int phy_addr, int devad,
 u16 regnum)
 {
diff --git a/include/linux/phy.h b/include/linux/phy.h
index bb89a60716c2..0db52b272fd7 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -693,6 +693,8 @@ phy_lookup_setting(int speed, int duplex, const unsigned 
long *mask,
 size_t phy_speeds(unsigned int *speeds, size_t size,
  unsigned long *mask, size_t maxbit);
 
+void phy_resolve_aneg_linkmode(struct phy_device *phydev);
+
 /**
  * phy_read_mmd - Convenience function for reading a register
  * from an MMD on a given PHY.
-- 
2.7.4



[PATCH net-next 2/5] net: phy: marvell10g: add MDI swap reporting

2017-12-29 Thread Russell King
Add reporting of the MDI swap to the Marvell 10G PHY driver by providing
a generic implementation for the standard 10GBASE-T pair swap register
and polarity register.  We also support reading the MDI swap status for
1G and below from a PCS register.

Signed-off-by: Russell King 
---
 drivers/net/phy/marvell10g.c | 31 +++
 drivers/net/phy/phy-c45.c| 33 +
 include/linux/phy.h  |  1 +
 3 files changed, 65 insertions(+)

diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
index 0d503493ac14..632f2ec15e39 100644
--- a/drivers/net/phy/marvell10g.c
+++ b/drivers/net/phy/marvell10g.c
@@ -31,6 +31,11 @@ enum {
MV_PCS_BASE_R   = 0x1000,
MV_PCS_1000BASEX= 0x2000,
 
+   MV_PCS_PAIRSWAP = 0x8182,
+   MV_PCS_PAIRSWAP_MASK= 0x0003,
+   MV_PCS_PAIRSWAP_AB  = 0x0002,
+   MV_PCS_PAIRSWAP_NONE= 0x0003,
+
/* These registers appear at 0x800X and 0xa00X - the 0xa00X control
 * registers appear to set themselves to the 0x800X when AN is
 * restarted, but status registers appear readable from either.
@@ -267,6 +272,9 @@ static int mv3310_config_aneg(struct phy_device *phydev)
u32 advertising;
int ret;
 
+   /* We don't support manual MDI control */
+   phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
+
if (phydev->autoneg == AUTONEG_DISABLE) {
ret = genphy_c45_pma_setup_forced(phydev);
if (ret < 0)
@@ -356,6 +364,7 @@ static int mv3310_read_status(struct phy_device *phydev)
phydev->link = 0;
phydev->pause = 0;
phydev->asym_pause = 0;
+   phydev->mdix = 0;
 
val = phy_read_mmd(phydev, MDIO_MMD_PCS, MV_PCS_BASE_R + MDIO_STAT1);
if (val < 0)
@@ -410,6 +419,28 @@ static int mv3310_read_status(struct phy_device *phydev)
return val;
}
 
+   if (phydev->speed == SPEED_1) {
+   val = genphy_c45_read_mdix(phydev);
+   if (val < 0)
+   return val;
+   } else {
+   val = phy_read_mmd(phydev, MDIO_MMD_PCS, MV_PCS_PAIRSWAP);
+   if (val < 0)
+   return val;
+
+   switch (val & MV_PCS_PAIRSWAP_MASK) {
+   case MV_PCS_PAIRSWAP_AB:
+   phydev->mdix = ETH_TP_MDI_X;
+   break;
+   case MV_PCS_PAIRSWAP_NONE:
+   phydev->mdix = ETH_TP_MDI;
+   break;
+   default:
+   phydev->mdix = ETH_TP_MDI_INVALID;
+   break;
+   }
+   }
+
if ((phydev->interface == PHY_INTERFACE_MODE_SGMII ||
 phydev->interface == PHY_INTERFACE_MODE_10GKR) && phydev->link) {
/* The PHY automatically switches its serdes interface (and
diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
index dada819c6b78..a4576859afae 100644
--- a/drivers/net/phy/phy-c45.c
+++ b/drivers/net/phy/phy-c45.c
@@ -233,6 +233,39 @@ int genphy_c45_read_pma(struct phy_device *phydev)
 }
 EXPORT_SYMBOL_GPL(genphy_c45_read_pma);
 
+/**
+ * genphy_c45_read_mdix - read mdix status from PMA
+ * @phydev: target phy_device struct
+ */
+int genphy_c45_read_mdix(struct phy_device *phydev)
+{
+   int val;
+
+   if (phydev->speed == SPEED_1) {
+   val = phy_read_mmd(phydev, MDIO_MMD_PMAPMD,
+  MDIO_PMA_10GBT_SWAPPOL);
+   if (val < 0)
+   return val;
+
+   switch (val) {
+   case MDIO_PMA_10GBT_SWAPPOL_ABNX | MDIO_PMA_10GBT_SWAPPOL_CDNX:
+   phydev->mdix = ETH_TP_MDI;
+   break;
+
+   case 0:
+   phydev->mdix = ETH_TP_MDI_X;
+   break;
+
+   default:
+   phydev->mdix = ETH_TP_MDI_INVALID;
+   break;
+   }
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(genphy_c45_read_mdix);
+
 /* The gen10g_* functions are the old Clause 45 stub */
 
 static int gen10g_config_aneg(struct phy_device *phydev)
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 0ee4ece312da..bb89a60716c2 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -943,6 +943,7 @@ int genphy_c45_read_lpa(struct phy_device *phydev);
 int genphy_c45_read_pma(struct phy_device *phydev);
 int genphy_c45_pma_setup_forced(struct phy_device *phydev);
 int genphy_c45_an_disable_aneg(struct phy_device *phydev);
+int genphy_c45_read_mdix(struct phy_device *phydev);
 
 static inline int phy_read_status(struct phy_device *phydev)
 {
-- 
2.7.4



[PATCH net-next 1/5] net: phy: marvell10g: update header comments

2017-12-29 Thread Russell King
Update header comments to indicate the newly found behaviour with XAUI
interfaces.

Signed-off-by: Russell King 
---
 drivers/net/phy/marvell10g.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
index 87f18cee1533..0d503493ac14 100644
--- a/drivers/net/phy/marvell10g.c
+++ b/drivers/net/phy/marvell10g.c
@@ -6,12 +6,18 @@
  *
  * There appears to be several different data paths through the PHY which
  * are automatically managed by the PHY.  The following has been determined
- * via observation and experimentation:
+ * via observation and experimentation for a setup using single-lane Serdes:
  *
  *   SGMII PHYXS -- BASE-T PCS -- 10G PMA -- AN -- Copper (for <= 1G)
  *  10GBASE-KR PHYXS -- BASE-T PCS -- 10G PMA -- AN -- Copper (for 10G)
  *  10GBASE-KR PHYXS -- BASE-R PCS -- Fiber
  *
+ * With XAUI, observation shows:
+ *
+ *XAUI PHYXS -- 
+ *
+ * and no switching of the host interface mode occurs.
+ *
  * If both the fiber and copper ports are connected, the first to gain
  * link takes priority and the other port is completely locked out.
  */
-- 
2.7.4



[PATCH net-next 0/5] marvell10g updates

2017-12-29 Thread Russell King - ARM Linux
Hi,

This series:
- adds MDI/MDIX reporting
- adds support for 10/100Mbps half-duplex link modes
- adds a comment describing the setup on VF610 ZII boards (where
  the phy interface mode doesn't change.)
- cleans up the phy interace mode switching

 drivers/net/phy/marvell10g.c | 110 +++
 drivers/net/phy/phy-c45.c|  33 +
 drivers/net/phy/phy-core.c   |  43 +
 include/linux/phy.h  |   3 ++
 4 files changed, 148 insertions(+), 41 deletions(-)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH net-next v5 0/5] Introduce NETIF_F_GRO_HW

2017-12-29 Thread Sabrina Dubroca
2017-12-22, 10:14:32 -0800, Alexander Duyck wrote:
> On Fri, Dec 22, 2017 at 6:57 AM, Sabrina Dubroca  wrote:
> > IIUC, with the patches that were applied, each driver can define
> > whether GRO_HW depends on GRO? From a user's perspective, this
> > inconsistent behavior is going to be quite confusing.
> >
> > Worse than inconsistent behavior, it looks like a driver deciding that
> > GRO_HW doesn't depend on GRO is going to introduce a change of
> > behavior.  Previously, when GRO was disabled, there wouldn't be any
> > packet over MTU handed to the network stack.  Now, even if GRO is
> > disabled, GRO_HW might still be enabled, so we might get over-MTU
> > packets because of hardware GRO.
> 
> This isn't actually true. LRO was still handling packets larger than
> MTU over even when GRO was disabled.

Sure, LRO will also cause that, but we're speaking in the context of
GRO here, which means no LRO.


> > I don't think drivers should be allowed to say "GRO_HW doesn't depend
> > on GRO".
> 
> Why not, it doesn't. In my mind GRO_HW is closer to LRO than it is to
> GRO.

Why do you say that? It looks like GRO to me. These drivers are
calling tcp_gro_complete() for example.

> The only ugly bit as I see it is that these devices were exposing
> the feature via the GRO flag in the first place. So for the sake of
> legacy they might want to carry around the dependency.
> 
> > I think it's reasonable to be able to disable software GRO even if
> > hardware GRO is enabled. Thus, I would propose:
> > - keep the current GRO flag
> > - add a GRO_HW flag, depending on GRO, enforced by the core as in
> >   earlier versions of these patches
> > - add a GRO_SW flag, also depending on GRO
> 
> This seems like a bunch of extra overhead for not much gain. Do we
> really need to fork GRO into 3 bits? I would argue that GRO_HW really
> should have been branded something like FORWARDABLE_LRO, but nobody
> wanted to touch the name LRO since it apparently has some negative
> stigma to it. If we had used a name like that we probably wouldn't be
> going through all these extra hoops. The only real reason why this is
> even being associated with GRO in the first place is that is how this
> feature was hidden by the drivers so they got around having to deal
> with the LRO being disabled for routing/forwarding issue. Those are
> the parts that want to keep it associated with GRO since that is how
> they exposed it in their devices originally.

I think it wouldn't have hidden behind GRO if it wasn't GRO. Again,
why do you think it's not GRO?

-- 
Sabrina


[PATCH net-next 1/2] net: dsa: lan9303: phy_addr_sel_strap rename and retype

2017-12-29 Thread Egil Hjelmeland
chip->phy_addr_sel_strap is declared as a bool, but is also used as an
integer address base.

Rename 'phy_addr_sel_strap' to 'phy_addr_base', and change type to int.

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 20 ++--
 include/linux/dsa/lan9303.h|  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 944901f03f8b..3088cdc5d205 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -479,7 +479,8 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
 {
int reg;
 
-   /* depending on the 'phy_addr_sel_strap' setting, the three phys are
+   /* Calculate chip->phy_addr_base:
+* Depending on the 'phy_addr_sel_strap' setting, the three phys are
 * using IDs 0-1-2 or IDs 1-2-3. We cannot read back the
 * 'phy_addr_sel_strap' setting directly, so we need a test, which
 * configuration is active:
@@ -495,12 +496,12 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
}
 
if ((reg != 0) && (reg != 0x))
-   chip->phy_addr_sel_strap = 1;
+   chip->phy_addr_base = 1;
else
-   chip->phy_addr_sel_strap = 0;
+   chip->phy_addr_base = 0;
 
dev_dbg(chip->dev, "Phy setup '%s' detected\n",
-   chip->phy_addr_sel_strap ? "1-2-3" : "0-1-2");
+   chip->phy_addr_base ? "1-2-3" : "0-1-2");
 
return 0;
 }
@@ -1019,7 +1020,7 @@ static int lan9303_get_sset_count(struct dsa_switch *ds)
 static int lan9303_phy_read(struct dsa_switch *ds, int phy, int regnum)
 {
struct lan9303 *chip = ds->priv;
-   int phy_base = chip->phy_addr_sel_strap;
+   int phy_base = chip->phy_addr_base;
 
if (phy == phy_base)
return lan9303_virt_phy_reg_read(chip, regnum);
@@ -1033,7 +1034,7 @@ static int lan9303_phy_write(struct dsa_switch *ds, int 
phy, int regnum,
 u16 val)
 {
struct lan9303 *chip = ds->priv;
-   int phy_base = chip->phy_addr_sel_strap;
+   int phy_base = chip->phy_addr_base;
 
if (phy == phy_base)
return lan9303_virt_phy_reg_write(chip, regnum, val);
@@ -1070,7 +1071,7 @@ static void lan9303_adjust_link(struct dsa_switch *ds, 
int port,
 
res =  lan9303_phy_write(ds, port, MII_BMCR, ctl);
 
-   if (port == chip->phy_addr_sel_strap) {
+   if (port == chip->phy_addr_base) {
/* Virtual Phy: Remove Turbo 200Mbit mode */
lan9303_read(chip->regmap, LAN9303_VIRT_SPECIAL_CTRL, &ctl);
 
@@ -1094,8 +1095,7 @@ static void lan9303_port_disable(struct dsa_switch *ds, 
int port,
struct lan9303 *chip = ds->priv;
 
lan9303_disable_processing_port(chip, port);
-   lan9303_phy_write(ds, chip->phy_addr_sel_strap + port,
- MII_BMCR, BMCR_PDOWN);
+   lan9303_phy_write(ds, chip->phy_addr_base + port, MII_BMCR, BMCR_PDOWN);
 }
 
 static int lan9303_port_bridge_join(struct dsa_switch *ds, int port,
@@ -1289,7 +1289,7 @@ static int lan9303_register_switch(struct lan9303 *chip)
 
chip->ds->priv = chip;
chip->ds->ops = &lan9303_switch_ops;
-   chip->ds->phys_mii_mask = chip->phy_addr_sel_strap ? 0xe : 0x7;
+   chip->ds->phys_mii_mask = chip->phy_addr_base ? 0xe : 0x7;
 
return dsa_register_switch(chip->ds);
 }
diff --git a/include/linux/dsa/lan9303.h b/include/linux/dsa/lan9303.h
index b6514c29563f..b4f22112ba75 100644
--- a/include/linux/dsa/lan9303.h
+++ b/include/linux/dsa/lan9303.h
@@ -23,7 +23,7 @@ struct lan9303 {
struct regmap_irq_chip_data *irq_data;
struct gpio_desc *reset_gpio;
u32 reset_duration; /* in [ms] */
-   bool phy_addr_sel_strap;
+   int phy_addr_base;
struct dsa_switch *ds;
struct mutex indirect_mutex; /* protect indexed register access */
struct mutex alr_mutex; /* protect ALR access */
-- 
2.14.1



[PATCH net-next 2/2] net: dsa: lan9303: Adjust phy_addr_base expressions

2017-12-29 Thread Egil Hjelmeland
Simplify calculation of chip->phy_addr_base in lan9303_detect_phy_setup().

Use GENMASK to calculate phys_mii_mask from LAN9303_NUM_PORTS and
phy_addr_base.

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 3088cdc5d205..4efb772dbc7e 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -495,10 +495,7 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
return reg;
}
 
-   if ((reg != 0) && (reg != 0x))
-   chip->phy_addr_base = 1;
-   else
-   chip->phy_addr_base = 0;
+   chip->phy_addr_base = reg != 0 && reg != 0x;
 
dev_dbg(chip->dev, "Phy setup '%s' detected\n",
chip->phy_addr_base ? "1-2-3" : "0-1-2");
@@ -1283,13 +1280,16 @@ static const struct dsa_switch_ops lan9303_switch_ops = 
{
 
 static int lan9303_register_switch(struct lan9303 *chip)
 {
+   int base;
+
chip->ds = dsa_switch_alloc(chip->dev, LAN9303_NUM_PORTS);
if (!chip->ds)
return -ENOMEM;
 
chip->ds->priv = chip;
chip->ds->ops = &lan9303_switch_ops;
-   chip->ds->phys_mii_mask = chip->phy_addr_base ? 0xe : 0x7;
+   base = chip->phy_addr_base;
+   chip->ds->phys_mii_mask = GENMASK(LAN9303_NUM_PORTS - 1 + base, base);
 
return dsa_register_switch(chip->ds);
 }
-- 
2.14.1



[PATCH net-next 0/2] net: dsa: lan9303: phy_addr_sel_strap rename and retype

2017-12-29 Thread Egil Hjelmeland
Non functional cleanups involving chip->phy_addr_sel_strap.
As promised in https://lkml.org/lkml/2017/11/6/273 

Egil Hjelmeland (2):
  net: dsa: lan9303: phy_addr_sel_strap rename and retype
  net: dsa: lan9303: Adjust phy_addr_base expressions

 drivers/net/dsa/lan9303-core.c | 24 
 include/linux/dsa/lan9303.h|  2 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

-- 
2.14.1



Re: [PATCH bpf-next v3 3/3] libbpf: add missing SPDX-License-Identifier

2017-12-29 Thread Philippe Ombredanne
Eric,

On Thu, Dec 28, 2017 at 9:04 AM, Eric Leblond  wrote:
> Signed-off-by: Eric Leblond 
> Acked-by: Alexei Starovoitov 
> ---
>  tools/lib/bpf/bpf.c| 2 ++
>  tools/lib/bpf/bpf.h| 2 ++
>  tools/lib/bpf/libbpf.c | 2 ++
>  tools/lib/bpf/libbpf.h | 2 ++
>  4 files changed, 8 insertions(+)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index cdfabbe118cc..9e53dbbca2bd 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -1,3 +1,5 @@
> +/* SPDX-License-Identifier: LGPL-2.1 */


In a .c file these should using C++-style comments as in //.
As requested by Linus, discussed on list and documented in Thomas doc patches.

> +
>  /*
>   * common eBPF ELF operations.
>   *
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 9f44c196931e..8d18fb73d7fb 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -1,3 +1,5 @@
> +/* SPDX-License-Identifier: LGPL-2.1 */

And this is correct for a .h

> +
>  /*
>   * common eBPF ELF operations.
>   *
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 5fe8aaa2123e..878e240a681b 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -1,3 +1,5 @@
> +/* SPDX-License-Identifier: LGPL-2.1 */

Use  // here.

> +
>  /*
>   * Common eBPF ELF object loading operations.
>   *
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index e42f96900318..f85906533cdd 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1,3 +1,5 @@
> +/* SPDX-License-Identifier: LGPL-2.1 */
> +
>  /*
>   * Common eBPF ELF object loading operations.
>   *
> --
> 2.15.1
>


-- 
Cordially
Philippe Ombredanne


Re: [PATCH net-next 2/2] tuntap: XDP transmission

2017-12-29 Thread Jesper Dangaard Brouer
On Fri, 29 Dec 2017 18:00:04 +0800
Jason Wang  wrote:

> This patch implements XDP transmission for TAP. Since we can't create
> new queues for TAP during XDP set, exist ptr_ring was reused for
> queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG
> (0x1ULL) was encoded into lowest bit of xpd_buff pointer during
> ptr_ring_produce, and was decoded during consuming. XDP metadata was
> stored in the headroom of the packet which should work in most of
> cases since driver usually reserve enough headroom. Very minor changes
> were done for vhost_net: it just need to peek the length depends on
> the type of pointer.
> 
> Tests was done on two Intel E5-2630 2.40GHz machines connected back to
> back through two 82599ES. Traffic were generated through MoonGen and
> testpmd(rxonly) in guest reports 2.97Mpps when xdp_redirect_map is
> doing redirection from ixgbe to TAP.

IMHO a performance measurement without something to compare against is
useless.  What was the performance before?


> Cc: Jesper Dangaard Brouer 
> Signed-off-by: Jason Wang 
> ---
>  drivers/net/tun.c  | 205 
> -
>  drivers/vhost/net.c|  13 +++-
>  include/linux/if_tun.h |  17 
>  3 files changed, 197 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2c89efe..be6d993 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -240,6 +240,24 @@ struct tun_struct {
>   struct tun_steering_prog __rcu *steering_prog;
>  };
>  
> +bool tun_is_xdp_buff(void *ptr)
> +{
> + return (unsigned long)ptr & TUN_XDP_FLAG;
> +}
> +EXPORT_SYMBOL(tun_is_xdp_buff);
> +
> +void *tun_xdp_to_ptr(void *ptr)
> +{
> + return (void *)((unsigned long)ptr | TUN_XDP_FLAG);
> +}
> +EXPORT_SYMBOL(tun_xdp_to_ptr);
> +
> +void *tun_ptr_to_xdp(void *ptr)
> +{
> + return (void *)((unsigned long)ptr & ~TUN_XDP_FLAG);
> +}
> +EXPORT_SYMBOL(tun_ptr_to_xdp);
> +
>  static int tun_napi_receive(struct napi_struct *napi, int budget)
>  {
>   struct tun_file *tfile = container_of(napi, struct tun_file, napi);
> @@ -630,12 +648,25 @@ static struct tun_struct *tun_enable_queue(struct 
> tun_file *tfile)
>   return tun;
>  }
>  
> +static void tun_ptr_free(void *ptr)
> +{
> + if (!ptr)
> + return;
> + if (tun_is_xdp_buff(ptr)) {
> + struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
> +
> + put_page(virt_to_head_page(xdp->data));

(Yet another XDP-free call point, I need to convert to use an accessor
later to transition driver into an xdp_buff return API)

> + } else {
> + __skb_array_destroy_skb(ptr);
> + }
> +}
> +
>  static void tun_queue_purge(struct tun_file *tfile)
>  {
> - struct sk_buff *skb;
> + void *ptr;
>  
> - while ((skb = ptr_ring_consume(&tfile->tx_ring)) != NULL)
> - kfree_skb(skb);
> + while ((ptr = ptr_ring_consume(&tfile->tx_ring)) != NULL)
> + tun_ptr_free(ptr);
>  
>   skb_queue_purge(&tfile->sk.sk_write_queue);
>   skb_queue_purge(&tfile->sk.sk_error_queue);
> @@ -688,8 +719,7 @@ static void __tun_detach(struct tun_file *tfile, bool 
> clean)
>   unregister_netdevice(tun->dev);
>   }
>   if (tun)
> - ptr_ring_cleanup(&tfile->tx_ring,
> -  __skb_array_destroy_skb);
> + ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
>   sock_put(&tfile->sk);
>   }
>  }
> @@ -1201,6 +1231,54 @@ static const struct net_device_ops tun_netdev_ops = {
>   .ndo_get_stats64= tun_net_get_stats64,
>  };
>  
> +static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
> +{
> + struct tun_struct *tun = netdev_priv(dev);
> + struct xdp_buff *buff = xdp->data_hard_start;
> + int headroom = xdp->data - xdp->data_hard_start;
> + struct tun_file *tfile;
> + u32 numqueues;
> + int ret = 0;
> +
> + /* Assure headroom is available and buff is properly aligned */
> + if (unlikely(headroom < sizeof(*xdp) || tun_is_xdp_buff(xdp)))
> + return -ENOSPC;
> +
> + *buff = *xdp;
> +
> + rcu_read_lock();
> +
> + numqueues = READ_ONCE(tun->numqueues);
> + if (!numqueues) {
> + ret = -ENOSPC;
> + goto out;
> + }
> + tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> + numqueues]);

Several concurrent CPUs can get the same 'tfile'.

> + /* Encode the XDP flag into lowest bit for consumer to differ
> +  * XDP buffer from sk_buff.
> +  */
> + if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(buff))) {
> + this_cpu_inc(tun->pcpu_stats->tx_dropped);
> + ret = -ENOSPC;
> + }

The ptr_ring_produce() will take a lock per packet, limiting the
performance.  (Again a case where I would have liked a bulk API for
ndo_xdp_xmit()).


> 

[PATCH net-next 2/7] net: phy: use unlocked accessors for indirect MMD accesses

2017-12-29 Thread Russell King
Use unlocked accessors for indirect MMD accesses to clause 22 PHYs.
This permits tracing of these accesses.

Reviewed-by: Florian Fainelli 
Signed-off-by: Russell King 
---
 drivers/net/phy/phy-core.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 21f75ae244b3..83d32644cb4d 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -193,13 +193,14 @@ static void mmd_phy_indirect(struct mii_bus *bus, int 
phy_addr, int devad,
 u16 regnum)
 {
/* Write the desired MMD Devad */
-   bus->write(bus, phy_addr, MII_MMD_CTRL, devad);
+   __mdiobus_write(bus, phy_addr, MII_MMD_CTRL, devad);
 
/* Write the desired MMD register address */
-   bus->write(bus, phy_addr, MII_MMD_DATA, regnum);
+   __mdiobus_write(bus, phy_addr, MII_MMD_DATA, regnum);
 
/* Select the Function : DATA with no post increment */
-   bus->write(bus, phy_addr, MII_MMD_CTRL, devad | MII_MMD_CTRL_NOINCR);
+   __mdiobus_write(bus, phy_addr, MII_MMD_CTRL,
+   devad | MII_MMD_CTRL_NOINCR);
 }
 
 /**
@@ -232,7 +233,7 @@ int phy_read_mmd(struct phy_device *phydev, int devad, u32 
regnum)
mmd_phy_indirect(bus, phy_addr, devad, regnum);
 
/* Read the content of the MMD's selected register */
-   val = bus->read(bus, phy_addr, MII_MMD_DATA);
+   val = __mdiobus_read(bus, phy_addr, MII_MMD_DATA);
mutex_unlock(&bus->mdio_lock);
}
return val;
@@ -271,7 +272,7 @@ int phy_write_mmd(struct phy_device *phydev, int devad, u32 
regnum, u16 val)
mmd_phy_indirect(bus, phy_addr, devad, regnum);
 
/* Write the data into MMD's selected register */
-   bus->write(bus, phy_addr, MII_MMD_DATA, val);
+   __mdiobus_write(bus, phy_addr, MII_MMD_DATA, val);
mutex_unlock(&bus->mdio_lock);
 
ret = 0;
-- 
2.7.4



[PATCH net-next 6/7] net: phy: add phy_modify() accessor

2017-12-29 Thread Russell King
Add phy_modify() convenience accessor to complement the mdiobus
counterpart.

Signed-off-by: Russell King 
---
 drivers/net/phy/phy-core.c | 23 +++
 include/linux/phy.h|  1 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index df39e6711b76..9c08850eed16 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -306,6 +306,29 @@ int __phy_modify(struct phy_device *phydev, u32 regnum, 
u16 mask, u16 set)
 }
 EXPORT_SYMBOL_GPL(__phy_modify);
 
+/**
+ * phy_modify - Convenience function for modifying a given PHY register
+ * @phydev: the phy_device struct
+ * @regnum: register number to write
+ * @mask: bit mask of bits to clear
+ * @set: new value of bits set in mask to write to @regnum
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int phy_modify(struct phy_device *phydev, u32 regnum, u16 mask, u16 set)
+{
+   int ret;
+
+   mutex_lock(&phydev->mdio.bus->mdio_lock);
+   ret = __phy_modify(phydev, regnum, mask, set);
+   mutex_unlock(&phydev->mdio.bus->mdio_lock);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(phy_modify);
+
 static int __phy_read_page(struct phy_device *phydev)
 {
return phydev->drv->read_page(phydev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index f408220d95ec..0ee4ece312da 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -760,6 +760,7 @@ static inline int __phy_write(struct phy_device *phydev, 
u32 regnum, u16 val)
 }
 
 int __phy_modify(struct phy_device *phydev, u32 regnum, u16 mask, u16 set);
+int phy_modify(struct phy_device *phydev, u32 regnum, u16 mask, u16 set);
 
 /**
  * phy_interrupt_is_valid - Convenience function for testing a given PHY irq
-- 
2.7.4



[PATCH net-next 1/7] net: mdiobus: add unlocked accessors

2017-12-29 Thread Russell King
Add unlocked versions of the bus accessors, which allows access to the
bus with all the tracing. These accessors validate that the bus mutex
is held, which is a basic requirement for all mii bus accesses.

Reviewed-by: Florian Fainelli 
Signed-off-by: Russell King 
---
 drivers/net/phy/mdio_bus.c | 65 +-
 include/linux/mdio.h   |  3 +++
 2 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 54d00a1d2bef..75be8be0d169 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -494,6 +494,55 @@ struct phy_device *mdiobus_scan(struct mii_bus *bus, int 
addr)
 EXPORT_SYMBOL(mdiobus_scan);
 
 /**
+ * __mdiobus_read - Unlocked version of the mdiobus_read function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to read
+ *
+ * Read a MDIO bus register. Caller must hold the mdio bus lock.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum)
+{
+   int retval;
+
+   WARN_ON_ONCE(!mutex_is_locked(&bus->mdio_lock));
+
+   retval = bus->read(bus, addr, regnum);
+
+   trace_mdio_access(bus, 1, addr, regnum, retval, retval);
+
+   return retval;
+}
+EXPORT_SYMBOL(__mdiobus_read);
+
+/**
+ * __mdiobus_write - Unlocked version of the mdiobus_write function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ *
+ * Write a MDIO bus register. Caller must hold the mdio bus lock.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val)
+{
+   int err;
+
+   WARN_ON_ONCE(!mutex_is_locked(&bus->mdio_lock));
+
+   err = bus->write(bus, addr, regnum, val);
+
+   trace_mdio_access(bus, 0, addr, regnum, val, err);
+
+   return err;
+}
+EXPORT_SYMBOL(__mdiobus_write);
+
+/**
  * mdiobus_read_nested - Nested version of the mdiobus_read function
  * @bus: the mii_bus struct
  * @addr: the phy address
@@ -513,11 +562,9 @@ int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 
regnum)
BUG_ON(in_interrupt());
 
mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
-   retval = bus->read(bus, addr, regnum);
+   retval = __mdiobus_read(bus, addr, regnum);
mutex_unlock(&bus->mdio_lock);
 
-   trace_mdio_access(bus, 1, addr, regnum, retval, retval);
-
return retval;
 }
 EXPORT_SYMBOL(mdiobus_read_nested);
@@ -539,11 +586,9 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum)
BUG_ON(in_interrupt());
 
mutex_lock(&bus->mdio_lock);
-   retval = bus->read(bus, addr, regnum);
+   retval = __mdiobus_read(bus, addr, regnum);
mutex_unlock(&bus->mdio_lock);
 
-   trace_mdio_access(bus, 1, addr, regnum, retval, retval);
-
return retval;
 }
 EXPORT_SYMBOL(mdiobus_read);
@@ -569,11 +614,9 @@ int mdiobus_write_nested(struct mii_bus *bus, int addr, 
u32 regnum, u16 val)
BUG_ON(in_interrupt());
 
mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
-   err = bus->write(bus, addr, regnum, val);
+   err = __mdiobus_write(bus, addr, regnum, val);
mutex_unlock(&bus->mdio_lock);
 
-   trace_mdio_access(bus, 0, addr, regnum, val, err);
-
return err;
 }
 EXPORT_SYMBOL(mdiobus_write_nested);
@@ -596,11 +639,9 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 
regnum, u16 val)
BUG_ON(in_interrupt());
 
mutex_lock(&bus->mdio_lock);
-   err = bus->write(bus, addr, regnum, val);
+   err = __mdiobus_write(bus, addr, regnum, val);
mutex_unlock(&bus->mdio_lock);
 
-   trace_mdio_access(bus, 0, addr, regnum, val, err);
-
return err;
 }
 EXPORT_SYMBOL(mdiobus_write);
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index ca08ab16ecdc..34796e29c90c 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -257,6 +257,9 @@ static inline u16 ethtool_adv_to_mmd_eee_adv_t(u32 adv)
return reg;
 }
 
+int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
+int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+
 int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
-- 
2.7.4



[PATCH net-next 4/7] net: phy: add paged phy register accessors

2017-12-29 Thread Russell King
Add a set of paged phy register accessors which are inherently safe in
their design against other accesses interfering with the paged access.

Signed-off-by: Russell King 
---
 drivers/net/phy/phy-core.c | 157 +
 include/linux/phy.h|  11 
 2 files changed, 168 insertions(+)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 37c039da0c16..df39e6711b76 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -305,3 +305,160 @@ int __phy_modify(struct phy_device *phydev, u32 regnum, 
u16 mask, u16 set)
return ret;
 }
 EXPORT_SYMBOL_GPL(__phy_modify);
+
+static int __phy_read_page(struct phy_device *phydev)
+{
+   return phydev->drv->read_page(phydev);
+}
+
+static int __phy_write_page(struct phy_device *phydev, int page)
+{
+   return phydev->drv->write_page(phydev, page);
+}
+
+/**
+ * phy_save_page() - take the bus lock and save the current page
+ * @phydev: a pointer to a &struct phy_device
+ *
+ * Take the MDIO bus lock, and return the current page number. On error,
+ * returns a negative errno. phy_restore_page() must be called after this
+ * to release the lock even on failure.
+ */
+int phy_save_page(struct phy_device *phydev)
+{
+   mutex_lock(&phydev->mdio.bus->mdio_lock);
+   return __phy_read_page(phydev);
+}
+EXPORT_SYMBOL_GPL(phy_save_page);
+
+/**
+ * phy_select_page() - take the bus lock, save the current page, and set a page
+ * @phydev: a pointer to a &struct phy_device
+ * @page: desired page
+ *
+ * Take the MDIO bus lock to protect against concurrent access, save the
+ * current PHY page, and set the current page.  On error, returns a
+ * negative errno, otherwise returns the previous page number.
+ * phy_restore_page() must be called after this to restore the page
+ * number (if this call was successful) and release the lock.
+ */
+int phy_select_page(struct phy_device *phydev, int page)
+{
+   int ret, oldpage;
+
+   oldpage = ret = phy_save_page(phydev);
+   if (ret < 0)
+   return ret;
+
+   if (oldpage != page) {
+   ret = __phy_write_page(phydev, page);
+   if (ret < 0)
+   return ret;
+   }
+
+   return oldpage;
+}
+EXPORT_SYMBOL_GPL(phy_select_page);
+
+/**
+ * phy_restore_page() - restore the page register and release the bus lock
+ * @phydev: a pointer to a &struct phy_device
+ * @oldpage: the old page, return value from phy_save_page() or 
phy_select_page()
+ * @ret: operation's return code
+ *
+ * Release the MDIO bus lock, restoring @oldpage if it is a valid page.
+ * This function propagates the earliest error code from the group of
+ * operations.
+ *
+ * Returns:
+ *   @oldpage if it was a negative value, otherwise
+ *   @ret if it was a negative errno value, otherwise
+ *   phy_write_page()'s negative value if it were in error, otherwise
+ *   @ret.
+ */
+int phy_restore_page(struct phy_device *phydev, int oldpage, int ret)
+{
+   int r;
+
+   if (oldpage >= 0) {
+   r = __phy_write_page(phydev, oldpage);
+
+   /* Propagate the operation return code if the page write
+* was successful.
+*/
+   if (ret >= 0 && r < 0)
+   ret = r;
+   } else {
+   /* Propagate the phy page selection error code */
+   ret = oldpage;
+   }
+
+   mutex_unlock(&phydev->mdio.bus->mdio_lock);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(phy_restore_page);
+
+/**
+ * phy_read_paged() - Convenience function for reading a paged register
+ * @phydev: a pointer to a &struct phy_device
+ * @page: the page for the phy
+ * @regnum: register number
+ *
+ * Same rules as for phy_read();
+ */
+int phy_read_paged(struct phy_device *phydev, int page, u32 regnum)
+{
+   int ret = 0, oldpage;
+
+   oldpage = phy_select_page(phydev, page);
+   if (oldpage >= 0)
+   ret = __phy_read(phydev, regnum);
+
+   return phy_restore_page(phydev, oldpage, ret);
+}
+EXPORT_SYMBOL(phy_read_paged);
+
+/**
+ * phy_write_paged() - Convenience function for writing a paged register
+ * @phydev: a pointer to a &struct phy_device
+ * @page: the page for the phy
+ * @regnum: register number
+ * @val: value to write
+ *
+ * Same rules as for phy_write();
+ */
+int phy_write_paged(struct phy_device *phydev, int page, u32 regnum, u16 val)
+{
+   int ret = 0, oldpage;
+
+   oldpage = phy_select_page(phydev, page);
+   if (oldpage >= 0)
+   ret = __phy_write(phydev, regnum, val);
+
+   return phy_restore_page(phydev, oldpage, ret);
+}
+EXPORT_SYMBOL(phy_write_paged);
+
+/**
+ * phy_modify_paged() - Convenience function for modifying a paged register
+ * @phydev: a pointer to a &struct phy_device
+ * @page: the page for the phy
+ * @regnum: register number
+ * @mask: bit mask of bits to clear
+ * @set: bit mask of bits to set
+ *
+ * Same rules as for phy_

[PATCH net-next 5/7] net: phy: marvell: fix paged access races

2017-12-29 Thread Russell King
For paged accesses to be truely safe, we need to hold the bus lock to
prevent anyone else gaining access to the registers while we modify
them.

The phydev->lock mutex does not do this: userspace via the MII ioctl
can still sneak in and read or write any register while we are on a
different page, and the suspend/resume methods can be called by a
thread different to the thread polling the phy status.

Races have been observed with mvneta on SolidRun Clearfog with phylink,
particularly between the phylib worker reading the PHYs status, and
the thread resuming mvneta, calling phy_start() which then calls
through to m88e1121_config_aneg_rgmii_delays(), which tries to
read-modify-write the MSCR register:

CPU0CPU1
marvell_read_status_page()
marvell_set_page(phydev, MII_MARVELL_FIBER_PAGE)
...
m88e1121_config_aneg_rgmii_delays()
set_page(MII_MARVELL_MSCR_PAGE)
phy_read(phydev, MII_88E1121_PHY_MSCR_REG)
marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
...
phy_write(phydev, MII_88E1121_PHY_MSCR_REG)

The result of this is we end up writing the copper page register 21,
which causes the copper PHY to be disabled, and the link partner sees
the link immediately go down.

Solve this by taking the bus lock instead of the PHY lock, thereby
preventing other accesses to the PHY while we are accessing other PHY
pages.

Signed-off-by: Russell King 
---
 drivers/net/phy/marvell.c | 354 ++
 1 file changed, 137 insertions(+), 217 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 82104edca393..1cb84064d658 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -83,7 +83,7 @@
 #define MII_88E1121_PHY_MSCR_REG   21
 #define MII_88E1121_PHY_MSCR_RX_DELAY  BIT(5)
 #define MII_88E1121_PHY_MSCR_TX_DELAY  BIT(4)
-#define MII_88E1121_PHY_MSCR_DELAY_MASK(~(BIT(5) | BIT(4)))
+#define MII_88E1121_PHY_MSCR_DELAY_MASK(BIT(5) | BIT(4))
 
 #define MII_88E1121_MISC_TEST  0x1a
 #define MII_88E1510_MISC_TEST_TEMP_THRESHOLD_MASK  0x1f00
@@ -177,27 +177,19 @@ struct marvell_priv {
struct device *hwmon_dev;
 };
 
-static int marvell_get_page(struct phy_device *phydev)
+static int marvell_read_page(struct phy_device *phydev)
 {
-   return phy_read(phydev, MII_MARVELL_PHY_PAGE);
+   return __phy_read(phydev, MII_MARVELL_PHY_PAGE);
 }
 
-static int marvell_set_page(struct phy_device *phydev, int page)
+static int marvell_write_page(struct phy_device *phydev, int page)
 {
-   return phy_write(phydev, MII_MARVELL_PHY_PAGE, page);
+   return __phy_write(phydev, MII_MARVELL_PHY_PAGE, page);
 }
 
-static int marvell_get_set_page(struct phy_device *phydev, int page)
+static int marvell_set_page(struct phy_device *phydev, int page)
 {
-   int oldpage = marvell_get_page(phydev);
-
-   if (oldpage < 0)
-   return oldpage;
-
-   if (page != oldpage)
-   return marvell_set_page(phydev, page);
-
-   return 0;
+   return phy_write(phydev, MII_MARVELL_PHY_PAGE, page);
 }
 
 static int marvell_ack_interrupt(struct phy_device *phydev)
@@ -399,7 +391,7 @@ static int m88e_config_aneg(struct phy_device *phydev)
 static int marvell_of_reg_init(struct phy_device *phydev)
 {
const __be32 *paddr;
-   int len, i, saved_page, current_page, ret;
+   int len, i, saved_page, current_page, ret = 0;
 
if (!phydev->mdio.dev.of_node)
return 0;
@@ -409,12 +401,11 @@ static int marvell_of_reg_init(struct phy_device *phydev)
if (!paddr || len < (4 * sizeof(*paddr)))
return 0;
 
-   saved_page = marvell_get_page(phydev);
+   saved_page = phy_save_page(phydev);
if (saved_page < 0)
-   return saved_page;
+   goto err;
current_page = saved_page;
 
-   ret = 0;
len /= sizeof(*paddr);
for (i = 0; i < len - 3; i += 4) {
u16 page = be32_to_cpup(paddr + i);
@@ -425,14 +416,14 @@ static int marvell_of_reg_init(struct phy_device *phydev)
 
if (page != current_page) {
current_page = page;
-   ret = marvell_set_page(phydev, page);
+   ret = marvell_write_page(phydev, page);
if (ret < 0)
goto err;
}
 
val = 0;
if (mask) {
-   val = phy_read(phydev, reg);
+   val = __phy_read(phydev, reg);
if (val < 0) {
ret = val;
goto err;
@@ -441,17 +432,12 @@ static int marvell_of_reg_init(struct phy_device *phydev)
}
val

[PATCH net-next 3/7] net: phy: add unlocked accessors

2017-12-29 Thread Russell King
Add unlocked versions of the bus accessors, which allows access to the
bus with all the tracing. These accessors validate that the bus mutex
is held, which is a basic requirement for all mii bus accesses.

Also added is a read-modify-write unlocked accessor with the same
locking requirements.

Signed-off-by: Russell King 
---
 drivers/net/phy/phy-core.c | 25 +
 include/linux/phy.h| 28 
 2 files changed, 53 insertions(+)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 83d32644cb4d..37c039da0c16 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -280,3 +280,28 @@ int phy_write_mmd(struct phy_device *phydev, int devad, 
u32 regnum, u16 val)
return ret;
 }
 EXPORT_SYMBOL(phy_write_mmd);
+
+/**
+ * __phy_modify() - Convenience function for modifying a PHY register
+ * @phydev: a pointer to a &struct phy_device
+ * @regnum: register number
+ * @mask: bit mask of bits to clear
+ * @set: bit mask of bits to set
+ *
+ * Unlocked helper function which allows a PHY register to be modified as
+ * new register value = (old register value & mask) | set
+ */
+int __phy_modify(struct phy_device *phydev, u32 regnum, u16 mask, u16 set)
+{
+   int ret, res;
+
+   ret = __phy_read(phydev, regnum);
+   if (ret >= 0) {
+   res = __phy_write(phydev, regnum, (ret & ~mask) | set);
+   if (res < 0)
+   ret = res;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(__phy_modify);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 71d777fe6c3d..3dae5b7408b4 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -716,6 +716,18 @@ static inline int phy_read(struct phy_device *phydev, u32 
regnum)
 }
 
 /**
+ * __phy_read - convenience function for reading a given PHY register
+ * @phydev: the phy_device struct
+ * @regnum: register number to read
+ *
+ * The caller must have taken the MDIO bus lock.
+ */
+static inline int __phy_read(struct phy_device *phydev, u32 regnum)
+{
+   return __mdiobus_read(phydev->mdio.bus, phydev->mdio.addr, regnum);
+}
+
+/**
  * phy_write - Convenience function for writing a given PHY register
  * @phydev: the phy_device struct
  * @regnum: register number to write
@@ -731,6 +743,22 @@ static inline int phy_write(struct phy_device *phydev, u32 
regnum, u16 val)
 }
 
 /**
+ * __phy_write - Convenience function for writing a given PHY register
+ * @phydev: the phy_device struct
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ *
+ * The caller must have taken the MDIO bus lock.
+ */
+static inline int __phy_write(struct phy_device *phydev, u32 regnum, u16 val)
+{
+   return __mdiobus_write(phydev->mdio.bus, phydev->mdio.addr, regnum,
+  val);
+}
+
+int __phy_modify(struct phy_device *phydev, u32 regnum, u16 mask, u16 set);
+
+/**
  * phy_interrupt_is_valid - Convenience function for testing a given PHY irq
  * @phydev: the phy_device struct
  *
-- 
2.7.4



[PATCH net-next 7/7] net: phy: convert read-modify-write to phy_modify()

2017-12-29 Thread Russell King
Convert read-modify-write sequences in at803x, Marvell and core phylib
to use phy_modify() to ensure safety.

Signed-off-by: Russell King 
---
 drivers/net/phy/at803x.c | 20 +++
 drivers/net/phy/marvell.c| 82 
 drivers/net/phy/phy_device.c | 50 +--
 3 files changed, 43 insertions(+), 109 deletions(-)

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index e911e4990b20..e86c1b8b1b51 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -216,34 +216,22 @@ static int at803x_suspend(struct phy_device *phydev)
int value;
int wol_enabled;
 
-   mutex_lock(&phydev->lock);
-
value = phy_read(phydev, AT803X_INTR_ENABLE);
wol_enabled = value & AT803X_INTR_ENABLE_WOL;
 
-   value = phy_read(phydev, MII_BMCR);
-
if (wol_enabled)
-   value |= BMCR_ISOLATE;
+   value = BMCR_ISOLATE;
else
-   value |= BMCR_PDOWN;
+   value = BMCR_PDOWN;
 
-   phy_write(phydev, MII_BMCR, value);
-
-   mutex_unlock(&phydev->lock);
+   phy_modify(phydev, MII_BMCR, 0, value);
 
return 0;
 }
 
 static int at803x_resume(struct phy_device *phydev)
 {
-   int value;
-
-   value = phy_read(phydev, MII_BMCR);
-   value &= ~(BMCR_PDOWN | BMCR_ISOLATE);
-   phy_write(phydev, MII_BMCR, value);
-
-   return 0;
+   return phy_modify(phydev, MII_BMCR, ~(BMCR_PDOWN | BMCR_ISOLATE), 0);
 }
 
 static int at803x_probe(struct phy_device *phydev)
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 1cb84064d658..6129ab1000f9 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -664,19 +664,14 @@ static int m88e1116r_config_init(struct phy_device 
*phydev)
 
 static int m88e3016_config_init(struct phy_device *phydev)
 {
-   int reg;
+   int ret;
 
/* Enable Scrambler and Auto-Crossover */
-   reg = phy_read(phydev, MII_88E3016_PHY_SPEC_CTRL);
-   if (reg < 0)
-   return reg;
-
-   reg &= ~MII_88E3016_DISABLE_SCRAMBLER;
-   reg |= MII_88E3016_AUTO_MDIX_CROSSOVER;
-
-   reg = phy_write(phydev, MII_88E3016_PHY_SPEC_CTRL, reg);
-   if (reg < 0)
-   return reg;
+   ret = phy_modify(phydev, MII_88E3016_PHY_SPEC_CTRL,
+~MII_88E3016_DISABLE_SCRAMBLER,
+MII_88E3016_AUTO_MDIX_CROSSOVER);
+   if (ret < 0)
+   return ret;
 
return marvell_config_init(phydev);
 }
@@ -685,42 +680,34 @@ static int m88e_config_init_hwcfg_mode(struct 
phy_device *phydev,
   u16 mode,
   int fibre_copper_auto)
 {
-   int temp;
-
-   temp = phy_read(phydev, MII_M_PHY_EXT_SR);
-   if (temp < 0)
-   return temp;
-
-   temp &= ~(MII_M_HWCFG_MODE_MASK |
- MII_M_HWCFG_FIBER_COPPER_AUTO |
- MII_M_HWCFG_FIBER_COPPER_RES);
-   temp |= mode;
-
if (fibre_copper_auto)
-   temp |= MII_M_HWCFG_FIBER_COPPER_AUTO;
+   mode |= MII_M_HWCFG_FIBER_COPPER_AUTO;
 
-   return phy_write(phydev, MII_M_PHY_EXT_SR, temp);
+   return phy_modify(phydev, MII_M_PHY_EXT_SR,
+ (u16)~(MII_M_HWCFG_MODE_MASK |
+MII_M_HWCFG_FIBER_COPPER_AUTO |
+MII_M_HWCFG_FIBER_COPPER_RES),
+ mode);
 }
 
 static int m88e_config_init_rgmii_delays(struct phy_device *phydev)
 {
-   int temp;
-
-   temp = phy_read(phydev, MII_M_PHY_EXT_CR);
-   if (temp < 0)
-   return temp;
+   int delay;
 
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) {
-   temp |= (MII_M_RGMII_RX_DELAY | MII_M_RGMII_TX_DELAY);
+   delay = MII_M_RGMII_RX_DELAY | MII_M_RGMII_TX_DELAY;
} else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID) {
-   temp &= ~MII_M_RGMII_TX_DELAY;
-   temp |= MII_M_RGMII_RX_DELAY;
+   delay = MII_M_RGMII_RX_DELAY;
} else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID) {
-   temp &= ~MII_M_RGMII_RX_DELAY;
-   temp |= MII_M_RGMII_TX_DELAY;
+   delay = MII_M_RGMII_TX_DELAY;
+   } else {
+   delay = 0;
}
 
-   return phy_write(phydev, MII_M_PHY_EXT_CR, temp);
+   return phy_modify(phydev, MII_M_PHY_EXT_CR,
+ (u16)~(MII_M_RGMII_RX_DELAY |
+MII_M_RGMII_TX_DELAY),
+ delay);
 }
 
 static int m88e_config_init_rgmii(struct phy_device *phydev)
@@ -834,7 +821,6 @@ static int m88e1121_config_init(struct phy_device *phydev)
 static int m88e1510_con

[PATCH net-next 0/7] Resolve races in phy accessors

2017-12-29 Thread Russell King - ARM Linux
Hi,

This series resolves races with various accesses to PHY registers.
The first five patches are necessary before we add phylink support
to mvneta, the remaining three are merely cleanups for unobserved
races, and hence are less critical.

There are two possible classes of races that can occur: where we
write to a page register that changes the meaning of a group of
other registers, and where we read-modify-write a register.

Resolve these races by performing the accesses under the mdio bus
lock, ensuring that no other user can access the bus while the
series of atomic operations are being performed.

These patches have been posted before, and have been modified
along the lines of previous feedback:

- The third patch was originally reviewed by Florian, but as I've
  added __phy_modify() to it, I've removed that attributation.
- Included generic page-based accessors as suggested last time
  around.
- Since we have the unlocked __phy_modify() in this patch series,
  it is sensible to include the changes for this to marvell.c -
  these accessors have to change anyway to avoid deadlocks on the
  mdio bus lock.

I haven't been able to test the at803x.c changes yet beyond compile
testing - although I do have systems with an ar8035 PHY.  However,
they should be straight forward to review.

This is targetted for net-next because the races have not been
found in existing drivers, but have been observed with phylink
integrated into mvneta - that's not to say that the races do not
exist today, they are just unobserved (probably through lack of
rigorous enough testing.)  The race provoking condition is detailed
in patch 5.

 drivers/net/phy/at803x.c |  20 +-
 drivers/net/phy/marvell.c| 436 +--
 drivers/net/phy/mdio_bus.c   |  65 +--
 drivers/net/phy/phy-core.c   | 216 -
 drivers/net/phy/phy_device.c |  50 +
 include/linux/mdio.h |   3 +
 include/linux/phy.h  |  40 
 7 files changed, 487 insertions(+), 343 deletions(-)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


[PATCH net-next A 3/5] sfp: add support for 1000Base-PX and 1000Base-BX10

2017-12-29 Thread Russell King
Add support for decoding the transceiver information for 1000Base-PX and
1000Base-BX10.  These use 1000BASE-X protocol.

Signed-off-by: Russell King 
---
 drivers/net/phy/sfp-bus.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
index a668a67c7eef..6c05acd5b1d4 100644
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -165,10 +165,26 @@ EXPORT_SYMBOL_GPL(sfp_parse_interface);
 void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id,
   unsigned long *support)
 {
+   unsigned int br_min, br_nom, br_max;
+
phylink_set(support, Autoneg);
phylink_set(support, Pause);
phylink_set(support, Asym_Pause);
 
+   /* Decode the bitrate information to MBd */
+   br_min = br_nom = br_max = 0;
+   if (id->base.br_nominal) {
+   if (id->base.br_nominal != 255) {
+   br_nom = id->base.br_nominal * 100;
+   br_min = br_nom + id->base.br_nominal * id->ext.br_min;
+   br_max = br_nom + id->base.br_nominal * id->ext.br_max;
+   } else if (id->ext.br_max) {
+   br_nom = 250 * id->ext.br_max;
+   br_max = br_nom + br_nom * id->ext.br_min / 100;
+   br_min = br_nom - br_nom * id->ext.br_min / 100;
+   }
+   }
+
/* Set ethtool support from the compliance fields. */
if (id->base.e10g_base_sr)
phylink_set(support, 1baseSR_Full);
@@ -187,6 +203,11 @@ void sfp_parse_support(struct sfp_bus *bus, const struct 
sfp_eeprom_id *id,
phylink_set(support, 1000baseT_Full);
}
 
+   /* 1000Base-PX or 1000Base-BX10 */
+   if ((id->base.e_base_px || id->base.e_base_bx10) &&
+   br_min <= 1300 && br_max >= 1200)
+   phylink_set(support, 1000baseX_Full);
+
switch (id->base.extended_cc) {
case 0x00: /* Unspecified */
break;
-- 
2.7.4



[PATCH net-next A 2/5] sfp: don't guess support from connector type

2017-12-29 Thread Russell King
Don't try to guess the support mask from the connector type - this is
mostly irrelevant to the speeds that the transceiver supports.

Signed-off-by: Russell King 
---
 drivers/net/phy/sfp-bus.c | 29 -
 1 file changed, 29 deletions(-)

diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
index bebc9391ac3d..a668a67c7eef 100644
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -220,35 +220,6 @@ void sfp_parse_support(struct sfp_bus *bus, const struct 
sfp_eeprom_id *id,
if (id->base.br_nominal >= 12)
phylink_set(support, 1000baseX_Full);
}
-
-   switch (id->base.connector) {
-   case SFP_CONNECTOR_SC:
-   case SFP_CONNECTOR_FIBERJACK:
-   case SFP_CONNECTOR_LC:
-   case SFP_CONNECTOR_MT_RJ:
-   case SFP_CONNECTOR_MU:
-   case SFP_CONNECTOR_OPTICAL_PIGTAIL:
-   break;
-
-   case SFP_CONNECTOR_UNSPEC:
-   if (id->base.e1000_base_t)
-   break;
-
-   case SFP_CONNECTOR_SG: /* guess */
-   case SFP_CONNECTOR_MPO_1X12:
-   case SFP_CONNECTOR_MPO_2X16:
-   case SFP_CONNECTOR_HSSDC_II:
-   case SFP_CONNECTOR_COPPER_PIGTAIL:
-   case SFP_CONNECTOR_NOSEPARATE:
-   case SFP_CONNECTOR_MXC_2X16:
-   default:
-   /* a guess at the supported link modes */
-   dev_warn(bus->sfp_dev,
-"Guessing link modes, please report...\n");
-   phylink_set(support, 1000baseT_Half);
-   phylink_set(support, 1000baseT_Full);
-   break;
-   }
 }
 EXPORT_SYMBOL_GPL(sfp_parse_support);
 
-- 
2.7.4



[PATCH net-next A 4/5] sfp: improve support for direct-attach copper cables

2017-12-29 Thread Russell King
Improve the support for direct-attach copper so that we avoid kernel
warning messages, and report the appropriate PORT_DA type to userspace.
Direct Attach cables can use a number of protocols depending on their
range of speeds.

Signed-off-by: Russell King 
---
 drivers/net/phy/sfp-bus.c | 51 ---
 include/linux/sfp.h   | 36 -
 2 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
index 6c05acd5b1d4..bdc4bb3c8288 100644
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -57,21 +57,19 @@ int sfp_parse_port(struct sfp_bus *bus, const struct 
sfp_eeprom_id *id,
case SFP_CONNECTOR_MT_RJ:
case SFP_CONNECTOR_MU:
case SFP_CONNECTOR_OPTICAL_PIGTAIL:
-   if (support)
-   phylink_set(support, FIBRE);
port = PORT_FIBRE;
break;
 
case SFP_CONNECTOR_RJ45:
-   if (support)
-   phylink_set(support, TP);
port = PORT_TP;
break;
 
+   case SFP_CONNECTOR_COPPER_PIGTAIL:
+   port = PORT_DA;
+   break;
+
case SFP_CONNECTOR_UNSPEC:
if (id->base.e1000_base_t) {
-   if (support)
-   phylink_set(support, TP);
port = PORT_TP;
break;
}
@@ -80,7 +78,6 @@ int sfp_parse_port(struct sfp_bus *bus, const struct 
sfp_eeprom_id *id,
case SFP_CONNECTOR_MPO_1X12:
case SFP_CONNECTOR_MPO_2X16:
case SFP_CONNECTOR_HSSDC_II:
-   case SFP_CONNECTOR_COPPER_PIGTAIL:
case SFP_CONNECTOR_NOSEPARATE:
case SFP_CONNECTOR_MXC_2X16:
port = PORT_OTHER;
@@ -92,6 +89,18 @@ int sfp_parse_port(struct sfp_bus *bus, const struct 
sfp_eeprom_id *id,
break;
}
 
+   if (support) {
+   switch (port) {
+   case PORT_FIBRE:
+   phylink_set(support, FIBRE);
+   break;
+
+   case PORT_TP:
+   phylink_set(support, TP);
+   break;
+   }
+   }
+
return port;
 }
 EXPORT_SYMBOL_GPL(sfp_parse_port);
@@ -143,6 +152,11 @@ phy_interface_t sfp_parse_interface(struct sfp_bus *bus,
break;
 
default:
+   if (id->base.e1000_base_cx) {
+   iface = PHY_INTERFACE_MODE_1000BASEX;
+   break;
+   }
+
iface = PHY_INTERFACE_MODE_NA;
dev_err(bus->sfp_dev,
"SFP module encoding does not support 8b10b nor 
64b66b\n");
@@ -208,6 +222,29 @@ void sfp_parse_support(struct sfp_bus *bus, const struct 
sfp_eeprom_id *id,
br_min <= 1300 && br_max >= 1200)
phylink_set(support, 1000baseX_Full);
 
+   /* For active or passive cables, select the link modes
+* based on the bit rates and the cable compliance bytes.
+*/
+   if ((id->base.sfp_ct_passive || id->base.sfp_ct_active) && br_nom) {
+   /* This may look odd, but some manufacturers use 12000MBd */
+   if (br_min <= 12000 && br_max >= 10300)
+   phylink_set(support, 1baseCR_Full);
+   if (br_min <= 3200 && br_max >= 3100)
+   phylink_set(support, 2500baseX_Full);
+   if (br_min <= 1300 && br_max >= 1200)
+   phylink_set(support, 1000baseX_Full);
+   }
+   if (id->base.sfp_ct_passive) {
+   if (id->base.passive.sff8431_app_e)
+   phylink_set(support, 1baseCR_Full);
+   }
+   if (id->base.sfp_ct_active) {
+   if (id->base.active.sff8431_app_e ||
+   id->base.active.sff8431_lim) {
+   phylink_set(support, 1baseCR_Full);
+   }
+   }
+
switch (id->base.extended_cc) {
case 0x00: /* Unspecified */
break;
diff --git a/include/linux/sfp.h b/include/linux/sfp.h
index 0c5c5f6ae1ec..e724d5a3dd80 100644
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -165,7 +165,41 @@ struct sfp_eeprom_base {
char vendor_rev[4];
union {
__be16 optical_wavelength;
-   u8 cable_spec;
+   __be16 cable_compliance;
+   struct {
+#if defined __BIG_ENDIAN_BITFIELD
+   u8 reserved60_2:6;
+   u8 fc_pi_4_app_h:1;
+   u8 sff8431_app_e:1;
+   u8 reserved61:8;
+#elif defined __LITTLE_ENDIAN_BITFIELD
+   u8 sff8431_app_e:1;
+   u8 fc_pi_4_app_h:1;
+   u8 reserved60_2:6;
+   u8 reserved61:8;
+#else
+#error Unknown Endian

[PATCH net-next A 5/5] phylink: remove 'mode' variable from phylink_sfp_module_insert()

2017-12-29 Thread Russell King
'mode' is actually constant through phylink_sfp_module_insert(), so
remove it and replace it with the enumerated constant.

Signed-off-by: Russell King 
---
 drivers/net/phy/phylink.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 073530949677..4e8c459bf062 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1578,7 +1578,7 @@ static int phylink_sfp_module_insert(void *upstream,
__ETHTOOL_DECLARE_LINK_MODE_MASK(support) = { 0, };
struct phylink_link_state config;
phy_interface_t iface;
-   int mode, ret = 0;
+   int ret = 0;
bool changed;
u8 port;
 
@@ -1593,7 +1593,6 @@ static int phylink_sfp_module_insert(void *upstream,
case PHY_INTERFACE_MODE_1000BASEX:
case PHY_INTERFACE_MODE_2500BASEX:
case PHY_INTERFACE_MODE_10GKR:
-   mode = MLO_AN_INBAND;
break;
default:
return -EINVAL;
@@ -1611,13 +1610,15 @@ static int phylink_sfp_module_insert(void *upstream,
ret = phylink_validate(pl, support, &config);
if (ret) {
netdev_err(pl->netdev, "validation of %s/%s with support %*pb 
failed: %d\n",
-  phylink_an_mode_str(mode), 
phy_modes(config.interface),
+  phylink_an_mode_str(MLO_AN_INBAND),
+  phy_modes(config.interface),
   __ETHTOOL_LINK_MODE_MASK_NBITS, support, ret);
return ret;
}
 
netdev_dbg(pl->netdev, "requesting link mode %s/%s with support %*pb\n",
-  phylink_an_mode_str(mode), phy_modes(config.interface),
+  phylink_an_mode_str(MLO_AN_INBAND),
+  phy_modes(config.interface),
   __ETHTOOL_LINK_MODE_MASK_NBITS, support);
 
if (phy_interface_mode_is_8023z(iface) && pl->phydev)
@@ -1630,15 +1631,15 @@ static int phylink_sfp_module_insert(void *upstream,
linkmode_copy(pl->link_config.advertising, config.advertising);
}
 
-   if (pl->link_an_mode != mode ||
+   if (pl->link_an_mode != MLO_AN_INBAND ||
pl->link_config.interface != config.interface) {
pl->link_config.interface = config.interface;
-   pl->link_an_mode = mode;
+   pl->link_an_mode = MLO_AN_INBAND;
 
changed = true;
 
netdev_info(pl->netdev, "switched to %s/%s link mode\n",
-   phylink_an_mode_str(mode),
+   phylink_an_mode_str(MLO_AN_INBAND),
phy_modes(config.interface));
}
 
-- 
2.7.4



[PATCH net-next A 0/5] further sfp/phylink updates

2017-12-29 Thread Russell King - ARM Linux
Hi,

This series:
- cleans up printing of module information
- improves the transceiver capability decoding, getting rid of the
  guessing by connector type, improves direct-attach cable support
  and adds support for 1G Base-PX and Base-BX10 modules.
- cleans up phylink_sfp_module_insert()

 drivers/net/phy/phylink.c |  15 +++
 drivers/net/phy/sfp-bus.c | 101 +-
 drivers/net/phy/sfp.c |  24 +++
 include/linux/sfp.h   |  36 -
 4 files changed, 114 insertions(+), 62 deletions(-)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


[PATCH net-next A 1/5] sfp: use precision to print non-null terminated strings

2017-12-29 Thread Russell King
Rather than memcpy()'ing the strings and null terminate them, printf
allows non-NULL terminated strings provided the precision is not more
than the size of the buffer.  Use this form to print the basic module
information rather than copying and terminating the strings.

Signed-off-by: Russell King 
---
 drivers/net/phy/sfp.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index ee6b2e041171..6c7d9289078d 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -466,11 +466,6 @@ static int sfp_sm_mod_probe(struct sfp *sfp)
 {
/* SFP module inserted - read I2C data */
struct sfp_eeprom_id id;
-   char vendor[17];
-   char part[17];
-   char sn[17];
-   char date[9];
-   char rev[5];
u8 check;
int err;
 
@@ -506,19 +501,12 @@ static int sfp_sm_mod_probe(struct sfp *sfp)
 
sfp->id = id;
 
-   memcpy(vendor, sfp->id.base.vendor_name, 16);
-   vendor[16] = '\0';
-   memcpy(part, sfp->id.base.vendor_pn, 16);
-   part[16] = '\0';
-   memcpy(rev, sfp->id.base.vendor_rev, 4);
-   rev[4] = '\0';
-   memcpy(sn, sfp->id.ext.vendor_sn, 16);
-   sn[16] = '\0';
-   memcpy(date, sfp->id.ext.datecode, 8);
-   date[8] = '\0';
-
-   dev_info(sfp->dev, "module %s %s rev %s sn %s dc %s\n",
-vendor, part, rev, sn, date);
+   dev_info(sfp->dev, "module %.*s %.*s rev %.*s sn %.*s dc %.*s\n",
+(int)sizeof(id.base.vendor_name), id.base.vendor_name,
+(int)sizeof(id.base.vendor_pn), id.base.vendor_pn,
+(int)sizeof(id.base.vendor_rev), id.base.vendor_rev,
+(int)sizeof(id.ext.vendor_sn), id.ext.vendor_sn,
+(int)sizeof(id.ext.datecode), id.ext.datecode);
 
/* Check whether we support this module */
if (!sfp->type->module_supported(&sfp->id)) {
-- 
2.7.4



Re: [RFT net-next v3 0/5] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-29 Thread Emiliano Ingrassia
Hi Martin,

On Fri, Dec 29, 2017 at 08:48:54AM +0100, Martin Blumenstingl wrote:
> Hi Emiliano,
> 
> On Fri, Dec 29, 2017 at 2:31 AM, Emiliano Ingrassia
>  wrote:
> > Hi Martin, Hi Dave,
> >
> > On Thu, Dec 28, 2017 at 11:21:23PM +0100, Martin Blumenstingl wrote:
> >> Hi Dave,
> >>
> >> please do not apply this series until it got a Tested-by from Emiliano.
> >>
> >>
> >> Hi Emiliano,
> >>
> >> you reported [0] that you couldn't get dwmac-meson8b to work on your
> >> Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
> >> I think I was able to find a fix: it consists of two patches (which you
> >> find in this series)
> >>
> >> Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
> >> only partially test this (I could only check if the clocks were
> >> calculated correctly when using a dummy 52394Hz input clock instead
> >> of MPLL2).
> >>
> >> Could you please give this series a try and let me know about the
> >> results?
> >> You obviously still need your two "ARM: dts: meson8b" patches which
> >> - add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi
> >> - enable Ethernet on the Odroid-C1
> >>
> >> When testing on Meson8b this also needs a fix for the MPLL clock driver:
> >> "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
> >> https://patchwork.kernel.org/patch/10131677/
> >>
> >>
> >> I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
> >> and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
> >> fine (so let's hope that this also fixes your Meson8b issue :)).
> >>
> >>
> >> changes since v1 at [1]:
> >> - changed the subject of the cover-letter to indicate that this is all
> >>   about the RGMII clock
> >> - added PATCH #1 which ensures that we don't unnecessarily change the
> >>   parent clocks in RMII mode (and also makes the code easier to
> >>   understand)
> >> - changed subject of PATCH #2 (formerly PATCH #1) to state that this
> >>   is about the RGMII clock
> >> - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
> >> - replaced PATCH #3 (formerly PATCH #2) with one that sets
> >>   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
> >>   on Meson8b correctly
> >>
> >> changes since v2 at [2]:
> >> - added PATCH #2 to make the following patch easier
> >> - Emiliano reported that there's currently another bug in the
> >>   dwmac-meson8b driver which prevents it from working with RGMII PHYs on
> >>   Meson8b: bit 10 of the PRG_ETH0 register is configures a clock gate
> >>   (instead of a divide by 5 or divide by 10 clock divider). This has not
> >>   been visible on GXBB and later due to the input clock which always led
> >>   to a selection of "divide by 10" (which is done internally in the IP
> >>   block, but the bit actually means "enable RGMII clock output").
> >>   PATCH #3 was added to address this issue.
> >> - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
> >>   updated and the patch itself rebased because the m25_div clock was
> >>   removed with the new PATCH #3 (so some of the statements were not
> >>   valid anymore)
> >>
> >
> > Here is the clk_summary relative to ethernet on Odroid-C1+
> > with this new series applied:
> >
> > xtal112400  0 0
> >  sys_pll00  12  0 0
> >   cpu_clk   00  12  0 0
> >  vid_pll00   73200  0 0
> >  fixed_pll  22  255000  0 0
> >   mpll2 11   24701  0 0
> >c941.ethernet#m250_sel   11   24701  0 0
> > c941.ethernet#m250_div  11   24701  0 0
> >  c941.ethernet#fixed_div10  112470  0 0
> >   c941.ethernet#m25_en  112470  0 0
> >
> > The ethernet prg0 register is set to 0x74A1 which should be correct with
> > respect to the information contained in the S805 SoC manual.
> > Actually, the ethernet is not yet fully functional.
> > Trying to ping the board, I can see ARP request from host to board using
> > tcpdump. However, the host can't see any response.
> great - we're getting closer!
> 
> > Following the U-Boot value for prg0 register, which is 0x7d21, I also
> > tried to set bit 11. As expected, this did not have any influence.
> it *may* be something outside the PRG_ETH0 register than
> to confirm that: could you temporarily revert the last patch from this
> series ("net: stmmac: dwmac-meson8b: propagate rate changes to the
> parent clock")? this way MPLL2 will stay at ~500MHz and PRG_ETH0
> should be identical to what u-boot sets (apart from bit 11, but that
> is only relevant in RMII mode according to the datasheet)
>

Here is the clk_summ

Re: [PATCH] NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625

2017-12-29 Thread Bjørn Mork
"SZ Lin (林上智)"  writes:

> This patch adds support for PID 0x9625 of YUGA CLM920-NC5.
>
> YUGA CLM920-NC5 needs to enable QMI_WWAN_QUIRK_DTR before QMI operation.
>
> qmicli -d /dev/cdc-wdm0 -p --dms-get-revision
> [/dev/cdc-wdm0] Device revision retrieved:
> Revision: 'CLM920_NC5-V1  1  [Oct 23 2016 19:00:00]'
>
> Signed-off-by: SZ Lin (林上智) 
> ---
>  drivers/net/usb/qmi_wwan.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> index 3000ddd1c7e2..728819feab44 100644
> --- a/drivers/net/usb/qmi_wwan.c
> +++ b/drivers/net/usb/qmi_wwan.c
> @@ -1100,6 +1100,7 @@ static const struct usb_device_id products[] = {
>   {QMI_FIXED_INTF(0x05c6, 0x9084, 4)},
>   {QMI_FIXED_INTF(0x05c6, 0x920d, 0)},
>   {QMI_FIXED_INTF(0x05c6, 0x920d, 5)},
> + {QMI_QUIRK_SET_DTR(0x05c6, 0x9625, 4)}, /* YUGA CLM920-NC5 */
>   {QMI_FIXED_INTF(0x0846, 0x68a2, 8)},
>   {QMI_FIXED_INTF(0x12d1, 0x140c, 1)},/* Huawei E173 */
>   {QMI_FIXED_INTF(0x12d1, 0x14ac, 1)},/* Huawei E1820 */

Perfect.  Thanks

Acked-by: Bjørn Mork 


Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-29 Thread Russell King - ARM Linux
On Fri, Dec 29, 2017 at 12:12:15PM +0100, Marcin Wojtas wrote:
> Hi Russell,
> 
> I see that I misspelled your email address, hence the series remained 
> unnoticed:
> https://lkml.org/lkml/2017/12/18/216
> 
> In terms of the phylink support, I think the most important are:
> * 3/8
> https://lkml.org/lkml/2017/12/18/211
> * 7/8
> https://lkml.org/lkml/2017/12/18/207
> 
> I think the way of obtaining PHY fwnode and connecting it from the
> latter patch could be incorporated to the phylink code. Although I
> didn't get much feedback, the whole ACPI-handling of MDIO bus and the
> PHYs touch ACPI specification and I expect it a slower to get merged.
> Hence my idea is following:
> * Send v2 with ACPI supporting link-irq only in mvpp2.c
> * Extract MDIO bus handling for ACPI and propose PHY handling
> modifications in phylink.
> 
> This way we may push the two things forwards in more efficient way.
> I'm looking forward to your opinion.

Agreed - as we have very few users of phylink at the moment (they're
mostly all in external trees) we can easily change the phylink
interfaces.  The first step is solving the ACPI representation of the
MDIO bus and attached devices, and until that is settled, not much can
be done.

However, it seems to me that the issues of adding ACPI to mvpp2 vs
adding phylink to mvpp2 are two entirely separate problems that don't
really conflict with each other - since the "phy" problem afflicts
both.

However, I'm not sure what this "link-irq" thing is that you talk
about (and I suspect it's one of the things that I've been trying for
months to find out about from Antoine when he says that there's stuff
that mvpp2 supports that phylink doesn't.)  So, I'm left to guess, and
I guess it's the mvpp2-variant of mvneta's in-band autonegotiation.
Continuing to guess from the mvpp2 phylink conversion patch, this mvpp2
variant is selected by not providing a phy handle in DT, whereas
mvneta's variant is selected using the ethernet-standard property
'managed = "in-band-status"'.

If my guessing is correct, I have to wonder why mvpp2 invented a
different way to represent this from mvneta?  This makes it much more
difficult to convert mvpp2 to phylink, and it also makes it difficult
to add SFP support ignoring the phylink issue (since there is no phy
handle there either.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-29 Thread Marcin Wojtas
Hi Russell,

2017-12-28 19:46 GMT+01:00 Russell King - ARM Linux :
> On Thu, Dec 28, 2017 at 07:27:39PM +0100, Antoine Tenart wrote:
>> Hi Florian,
>>
>> On Thu, Dec 28, 2017 at 07:02:09AM -0800, Florian Fainelli wrote:
>> > On 12/28/2017 02:05 AM, Antoine Tenart wrote:
>> > > On Thu, Dec 28, 2017 at 08:46:23AM +0100, Andrew Lunn wrote:
>> > >> On Wed, Dec 27, 2017 at 10:24:01PM +, Russell King - ARM Linux 
>> > >> wrote:
>> > >>> On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote:
>> > 
>> >  +&cps_eth2 {
>> >  +  /* CPS Lane 5 */
>> >  +  status = "okay";
>> >  +  phy-mode = "2500base-x";
>> >  +  /* Generic PHY, providing serdes lanes */
>> >  +  phys = <&cps_comphy5 2>;
>> >  +};
>> >  +
>> > >>>
>> > >>> This is wrong.  This lane is connected to a SFP cage which can support
>> > >>> more than just 2500base-X.  Tying it in this way to 2500base-X means
>> > >>> that this port does not support conenctions at 1000base-X, despite
>> > >>> that's one of the most popular and more standardised speeds.
>> > >>>
>> > >>
>> > >> I agree with Russell here. SFP modules are hot pluggable, and support
>> > >> a range of interface modes. You need to query what the SFP module is
>> > >> in order to know how to configure the SERDES interface. The phylink
>> > >> infrastructure does that for you.
>> > >
>> > > Sure, I understand. We'll be able to support such interfaces only when
>> > > the phylink PPv2 support lands in.
>> >
>> > Should we expect PHYLINK support to make it as the first patch in your
>> > v2 of this patch series, or is someone else doing that?
>>
>> No, the phylink patch conflicts with Marcin's ACPI series and we agreed
>> to let him get his series merged first. And I will probably work on a
>> few other topics before having the chance to work on it. So it'll
>> probably be me doing that, but not right now.
>
> ACPI is going to be a problem with phylink for a while.  There's patches
> queued in net-next which convert phylink and SFP mostly to the fwnode
> and property based systems, but phylib and i2c do not seem to have the
> necessary bits to be able to deal with those.
>
> Specifically, in DT we have "of_find_i2c_adapter_by_node()" but afaics
> there is no equivalent in ACPI - which means in an ACPI based system
> we have no way to determine the I2C bus associated with a SFP socket,
> which is a rather fundamental issue for SFP modules.
>
> For phylib side, there's "of_phy_attach()" but again there is no
> equivalent in ACPI. This should not be that much of a problem, because
> network drivers using the DT phylib calls (eg, "of_phy_connect()") are
> already restricted by this. That may have been solved by Marcin's
> series, but I've not seen it to know.
>

I see that I misspelled your email address, hence the series remained unnoticed:
https://lkml.org/lkml/2017/12/18/216

In terms of the phylink support, I think the most important are:
* 3/8
https://lkml.org/lkml/2017/12/18/211
* 7/8
https://lkml.org/lkml/2017/12/18/207

I think the way of obtaining PHY fwnode and connecting it from the
latter patch could be incorporated to the phylink code. Although I
didn't get much feedback, the whole ACPI-handling of MDIO bus and the
PHYs touch ACPI specification and I expect it a slower to get merged.
Hence my idea is following:
* Send v2 with ACPI supporting link-irq only in mvpp2.c
* Extract MDIO bus handling for ACPI and propose PHY handling
modifications in phylink.

This way we may push the two things forwards in more efficient way.
I'm looking forward to your opinion.

Best regards,
Marcin


Re: [PATCH][next] wcn36xx: remove redundant assignment to msg_body.min_ch_time

2017-12-29 Thread Colin Ian King
On 29/12/17 07:44, Loic Poulain wrote:
> Hi Colin, Bjorn,
> 
> On 26 December 2017 at 21:13, Bjorn Andersson
>  wrote:
>> On Tue 19 Dec 09:04 PST 2017, Colin King wrote:
>>
>>> From: Colin Ian King 
>>>
>>> msg_body.min_ch_time is being assigned twice; remove the redundant
>>> first assignment.
>>>
>>> Detected by CoverityScan, CID#1463042 ("Unused Value")
>>>
>>
>> Happy to see Coverity working for us :)
>>
>>
>> This should have had a:
>>
>> Fixes: 2f3bef4b247e ("wcn36xx: Add hardware scan offload support")
>>
>>> Signed-off-by: Colin Ian King 
>>> ---
>>>  drivers/net/wireless/ath/wcn36xx/smd.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
>>> b/drivers/net/wireless/ath/wcn36xx/smd.c
>>> index 2914618a0335..bab2eca5fcac 100644
>>> --- a/drivers/net/wireless/ath/wcn36xx/smd.c
>>> +++ b/drivers/net/wireless/ath/wcn36xx/smd.c
>>> @@ -625,7 +625,6 @@ int wcn36xx_smd_start_hw_scan(struct wcn36xx *wcn, 
>>> struct ieee80211_vif *vif,
>>>   INIT_HAL_MSG(msg_body, WCN36XX_HAL_START_SCAN_OFFLOAD_REQ);
>>>
>>>   msg_body.scan_type = WCN36XX_HAL_SCAN_TYPE_ACTIVE;
>>> - msg_body.min_ch_time = 30;
>>>   msg_body.min_ch_time = 100;
>>
>> But I strongly suspect the second line is supposed to be max_ch_time.
>>
>> @Loic, do you agree?
> 
> You're absolutely right.
> Colin could you please update your patch accordingly?

Resent as "wcn36xx: fix incorrect assignment to msg_body.min_ch_time"

> 
> Regards,
> Loic
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



[PATCH net-next 0/2] XDP transmission for tuntap

2017-12-29 Thread Jason Wang
Hi all:

This series tries to implement XDP transmission (ndo_xdp_xmit) for
tuntap. Pointer ring was used for queuing both XDP buffers and
sk_buff, this is done by encoding the type into lowest bit of the
pointer and storin XDP metadata in the headroom of XDP buff.

Tests gets 3.05 Mpps when doing xdp_redirect_map from ixgbe to VM
(testpmd + virtio-net in guest).

Please review.

Thanks

Jason Wang (2):
  tun/tap: use ptr_ring instead of skb_array
  tuntap: XDP transmission

 drivers/net/tap.c  |  41 -
 drivers/net/tun.c  | 233 ++---
 drivers/vhost/net.c|  52 ++-
 include/linux/if_tap.h |   6 +-
 include/linux/if_tun.h |  21 -
 5 files changed, 258 insertions(+), 95 deletions(-)

-- 
2.7.4



[PATCH net-next 2/2] tuntap: XDP transmission

2017-12-29 Thread Jason Wang
This patch implements XDP transmission for TAP. Since we can't create
new queues for TAP during XDP set, exist ptr_ring was reused for
queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG
(0x1ULL) was encoded into lowest bit of xpd_buff pointer during
ptr_ring_produce, and was decoded during consuming. XDP metadata was
stored in the headroom of the packet which should work in most of
cases since driver usually reserve enough headroom. Very minor changes
were done for vhost_net: it just need to peek the length depends on
the type of pointer.

Tests was done on two Intel E5-2630 2.40GHz machines connected back to
back through two 82599ES. Traffic were generated through MoonGen and
testpmd(rxonly) in guest reports 2.97Mpps when xdp_redirect_map is
doing redirection from ixgbe to TAP.

Cc: Jesper Dangaard Brouer 
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c  | 205 -
 drivers/vhost/net.c|  13 +++-
 include/linux/if_tun.h |  17 
 3 files changed, 197 insertions(+), 38 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2c89efe..be6d993 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -240,6 +240,24 @@ struct tun_struct {
struct tun_steering_prog __rcu *steering_prog;
 };
 
+bool tun_is_xdp_buff(void *ptr)
+{
+   return (unsigned long)ptr & TUN_XDP_FLAG;
+}
+EXPORT_SYMBOL(tun_is_xdp_buff);
+
+void *tun_xdp_to_ptr(void *ptr)
+{
+   return (void *)((unsigned long)ptr | TUN_XDP_FLAG);
+}
+EXPORT_SYMBOL(tun_xdp_to_ptr);
+
+void *tun_ptr_to_xdp(void *ptr)
+{
+   return (void *)((unsigned long)ptr & ~TUN_XDP_FLAG);
+}
+EXPORT_SYMBOL(tun_ptr_to_xdp);
+
 static int tun_napi_receive(struct napi_struct *napi, int budget)
 {
struct tun_file *tfile = container_of(napi, struct tun_file, napi);
@@ -630,12 +648,25 @@ static struct tun_struct *tun_enable_queue(struct 
tun_file *tfile)
return tun;
 }
 
+static void tun_ptr_free(void *ptr)
+{
+   if (!ptr)
+   return;
+   if (tun_is_xdp_buff(ptr)) {
+   struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+
+   put_page(virt_to_head_page(xdp->data));
+   } else {
+   __skb_array_destroy_skb(ptr);
+   }
+}
+
 static void tun_queue_purge(struct tun_file *tfile)
 {
-   struct sk_buff *skb;
+   void *ptr;
 
-   while ((skb = ptr_ring_consume(&tfile->tx_ring)) != NULL)
-   kfree_skb(skb);
+   while ((ptr = ptr_ring_consume(&tfile->tx_ring)) != NULL)
+   tun_ptr_free(ptr);
 
skb_queue_purge(&tfile->sk.sk_write_queue);
skb_queue_purge(&tfile->sk.sk_error_queue);
@@ -688,8 +719,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
unregister_netdevice(tun->dev);
}
if (tun)
-   ptr_ring_cleanup(&tfile->tx_ring,
-__skb_array_destroy_skb);
+   ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
sock_put(&tfile->sk);
}
 }
@@ -1201,6 +1231,54 @@ static const struct net_device_ops tun_netdev_ops = {
.ndo_get_stats64= tun_net_get_stats64,
 };
 
+static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
+{
+   struct tun_struct *tun = netdev_priv(dev);
+   struct xdp_buff *buff = xdp->data_hard_start;
+   int headroom = xdp->data - xdp->data_hard_start;
+   struct tun_file *tfile;
+   u32 numqueues;
+   int ret = 0;
+
+   /* Assure headroom is available and buff is properly aligned */
+   if (unlikely(headroom < sizeof(*xdp) || tun_is_xdp_buff(xdp)))
+   return -ENOSPC;
+
+   *buff = *xdp;
+
+   rcu_read_lock();
+
+   numqueues = READ_ONCE(tun->numqueues);
+   if (!numqueues) {
+   ret = -ENOSPC;
+   goto out;
+   }
+   tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
+   numqueues]);
+   /* Encode the XDP flag into lowest bit for consumer to differ
+* XDP buffer from sk_buff.
+*/
+   if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(buff))) {
+   this_cpu_inc(tun->pcpu_stats->tx_dropped);
+   ret = -ENOSPC;
+   }
+
+out:
+   rcu_read_unlock();
+   return ret;
+}
+
+static void tun_xdp_flush(struct net_device *dev)
+{
+   struct tun_struct *tun = netdev_priv(dev);
+   struct tun_file *tfile = tun->tfiles[0];
+
+   /* Notify and wake up reader process */
+   if (tfile->flags & TUN_FASYNC)
+   kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
+   tfile->socket.sk->sk_data_ready(tfile->socket.sk);
+}
+
 static const struct net_device_ops tap_netdev_ops = {
.ndo_uninit = tun_net_uninit,
.ndo_open   = tun_net_open,
@@ -1218,6 +1296,8 @@ static const struct net_device_ops tap_netdev_ops 

[PATCH net-next 1/2] tun/tap: use ptr_ring instead of skb_array

2017-12-29 Thread Jason Wang
This patch switches to use ptr_ring instead of skb_array. This will be
used to enqueue different types of pointers by encoding type into
lower bits.

Signed-off-by: Jason Wang 
---
 drivers/net/tap.c  | 41 +
 drivers/net/tun.c  | 42 ++
 drivers/vhost/net.c| 39 ---
 include/linux/if_tap.h |  6 +++---
 include/linux/if_tun.h |  4 ++--
 5 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 0a886fda..7c38659 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -330,7 +330,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
if (!q)
return RX_HANDLER_PASS;
 
-   if (__skb_array_full(&q->skb_array))
+   if (__ptr_ring_full(&q->ring))
goto drop;
 
skb_push(skb, ETH_HLEN);
@@ -348,7 +348,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
goto drop;
 
if (!segs) {
-   if (skb_array_produce(&q->skb_array, skb))
+   if (ptr_ring_produce(&q->ring, skb))
goto drop;
goto wake_up;
}
@@ -358,7 +358,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
struct sk_buff *nskb = segs->next;
 
segs->next = NULL;
-   if (skb_array_produce(&q->skb_array, segs)) {
+   if (ptr_ring_produce(&q->ring, segs)) {
kfree_skb(segs);
kfree_skb_list(nskb);
break;
@@ -375,7 +375,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
!(features & NETIF_F_CSUM_MASK) &&
skb_checksum_help(skb))
goto drop;
-   if (skb_array_produce(&q->skb_array, skb))
+   if (ptr_ring_produce(&q->ring, skb))
goto drop;
}
 
@@ -497,7 +497,7 @@ static void tap_sock_destruct(struct sock *sk)
 {
struct tap_queue *q = container_of(sk, struct tap_queue, sk);
 
-   skb_array_cleanup(&q->skb_array);
+   ptr_ring_cleanup(&q->ring, __skb_array_destroy_skb);
 }
 
 static int tap_open(struct inode *inode, struct file *file)
@@ -517,7 +517,7 @@ static int tap_open(struct inode *inode, struct file *file)
 &tap_proto, 0);
if (!q)
goto err;
-   if (skb_array_init(&q->skb_array, tap->dev->tx_queue_len, GFP_KERNEL)) {
+   if (ptr_ring_init(&q->ring, tap->dev->tx_queue_len, GFP_KERNEL)) {
sk_free(&q->sk);
goto err;
}
@@ -546,7 +546,7 @@ static int tap_open(struct inode *inode, struct file *file)
 
err = tap_set_queue(tap, file, q);
if (err) {
-   /* tap_sock_destruct() will take care of freeing skb_array */
+   /* tap_sock_destruct() will take care of freeing ptr_ring */
goto err_put;
}
 
@@ -583,7 +583,7 @@ static unsigned int tap_poll(struct file *file, poll_table 
*wait)
mask = 0;
poll_wait(file, &q->wq.wait, wait);
 
-   if (!skb_array_empty(&q->skb_array))
+   if (!ptr_ring_empty(&q->ring))
mask |= POLLIN | POLLRDNORM;
 
if (sock_writeable(&q->sk) ||
@@ -844,7 +844,7 @@ static ssize_t tap_do_read(struct tap_queue *q,
TASK_INTERRUPTIBLE);
 
/* Read frames from the queue */
-   skb = skb_array_consume(&q->skb_array);
+   skb = ptr_ring_consume(&q->ring);
if (skb)
break;
if (noblock) {
@@ -1176,7 +1176,7 @@ static int tap_peek_len(struct socket *sock)
 {
struct tap_queue *q = container_of(sock, struct tap_queue,
   sock);
-   return skb_array_peek_len(&q->skb_array);
+   return PTR_RING_PEEK_CALL(&q->ring, __skb_array_len_with_tag);
 }
 
 /* Ops structure to mimic raw sockets with tun */
@@ -1202,7 +1202,7 @@ struct socket *tap_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tap_get_socket);
 
-struct skb_array *tap_get_skb_array(struct file *file)
+struct ptr_ring *tap_get_ptr_ring(struct file *file)
 {
struct tap_queue *q;
 
@@ -1211,29 +1211,30 @@ struct skb_array *tap_get_skb_array(struct file *file)
q = file->private_data;
if (!q)
return ERR_PTR(-EBADFD);
-   return &q->skb_array;
+   return &q->ring;
 }
-EXPORT_SYMBOL_GPL(tap_get_skb_array);
+EXPORT_SYMBOL_GPL(tap_get_ptr_ring);
 
 int tap_queue_resize(struct tap_dev *tap)
 {
struct net_device *dev = tap->dev;
struct tap_queue *q;
-   struct skb_array **arrays;
+   struct ptr_ring **ri

[PATCH][V2] wcn36xx: fix incorrect assignment to msg_body.min_ch_time

2017-12-29 Thread Colin King
From: Colin Ian King 

The second assignment to msg_body.min_ch_time is incorrect, it
should actually be to msg_body.max_ch_time.

Thanks to Bjorn Andersson for identifying the correct way to fix
this as my original fix was incorrect.

Detected by CoverityScan, CID#1463042 ("Unused Value")

Fixes: 2f3bef4b247e ("wcn36xx: Add hardware scan offload support")
Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/ath/wcn36xx/smd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 2914618a0335..2a4871ca9c72 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -626,7 +626,7 @@ int wcn36xx_smd_start_hw_scan(struct wcn36xx *wcn, struct 
ieee80211_vif *vif,
 
msg_body.scan_type = WCN36XX_HAL_SCAN_TYPE_ACTIVE;
msg_body.min_ch_time = 30;
-   msg_body.min_ch_time = 100;
+   msg_body.max_ch_time = 100;
msg_body.scan_hidden = 1;
memcpy(msg_body.mac, vif->addr, ETH_ALEN);
msg_body.p2p_search = vif->p2p;
-- 
2.14.1



[PATCH][V2] wcn36xx: fix incorrect assignment to msg_body.min_ch_time

2017-12-29 Thread Colin King
From: Colin Ian King 

The second assignment to msg_body.min_ch_time is incorrect, it
should actually be to msg_body.max_ch_time.

Thanks to Bjorn Andersson for identifying the correct way to fix
this as my original fix was incorrect.

Detected by CoverityScan, CID#1463042 ("Unused Value")

Fixes: 2f3bef4b247e ("wcn36xx: Add hardware scan offload support")
Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/ath/wcn36xx/smd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 2914618a0335..2a4871ca9c72 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -626,7 +626,7 @@ int wcn36xx_smd_start_hw_scan(struct wcn36xx *wcn, struct 
ieee80211_vif *vif,
 
msg_body.scan_type = WCN36XX_HAL_SCAN_TYPE_ACTIVE;
msg_body.min_ch_time = 30;
-   msg_body.min_ch_time = 100;
+   msg_body.max_ch_time = 100;
msg_body.scan_hidden = 1;
memcpy(msg_body.mac, vif->addr, ETH_ALEN);
msg_body.p2p_search = vif->p2p;
-- 
2.14.1



[PATCH] NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625

2017-12-29 Thread 林上智
This patch adds support for PID 0x9625 of YUGA CLM920-NC5.

YUGA CLM920-NC5 needs to enable QMI_WWAN_QUIRK_DTR before QMI operation.

qmicli -d /dev/cdc-wdm0 -p --dms-get-revision
[/dev/cdc-wdm0] Device revision retrieved:
Revision: 'CLM920_NC5-V1  1  [Oct 23 2016 19:00:00]'

Signed-off-by: SZ Lin (林上智) 
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 3000ddd1c7e2..728819feab44 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1100,6 +1100,7 @@ static const struct usb_device_id products[] = {
{QMI_FIXED_INTF(0x05c6, 0x9084, 4)},
{QMI_FIXED_INTF(0x05c6, 0x920d, 0)},
{QMI_FIXED_INTF(0x05c6, 0x920d, 5)},
+   {QMI_QUIRK_SET_DTR(0x05c6, 0x9625, 4)}, /* YUGA CLM920-NC5 */
{QMI_FIXED_INTF(0x0846, 0x68a2, 8)},
{QMI_FIXED_INTF(0x12d1, 0x140c, 1)},/* Huawei E173 */
{QMI_FIXED_INTF(0x12d1, 0x14ac, 1)},/* Huawei E1820 */
-- 
2.15.1



Re: iproute2 net-next

2017-12-29 Thread Jiri Pirko
Fri, Dec 29, 2017 at 12:46:31AM CET, dan...@iogearbox.net wrote:
>On 12/26/2017 10:35 AM, Leon Romanovsky wrote:
>> On Mon, Dec 25, 2017 at 10:14:26PM -0800, Stephen Hemminger wrote:
>>> On Tue, 26 Dec 2017 06:47:43 +0200
>>> Leon Romanovsky  wrote:
>>>
 On Mon, Dec 25, 2017 at 10:49:19AM -0800, Stephen Hemminger wrote:
> David Ahern has agreed to take over managing the net-next branch of 
> iproute2.
> The new location is:
>  
> https://git.kernel.org/pub/scm/linux/kernel/git/dsahern/iproute2-next.git/
>
> In the past, I have accepted new features into iproute2 master branch, but
> am changing the policy so that outside of the merge window (up until -rc1)
> new features will get put into net-next to get some more review and 
> testing
> time. This means that things like the proposed batch streaming mode will
> go through net-next.

 Did you consider to create one shared repo for the iproute2 to allow
 multiple committers workflow?
>>>
>>> For now having separate trees is best, there is no need for multiple
>>> committers the load is very light.
>>>
 It will be much convenient for the users to have one place for
 master/stable/net-next branches, instead of actually following two
 different repositories.
>>>
>>> If you are doing network development, you already need to deal with
>>> multiple repo's on the kernel side so there is no difference.
>> 
>> I agree with you that one extra "git remote add .." is not so huge and
>> all people who develop for the netdev will do it. My concern is about
>> Documentation and newcomers, who will have a hard time to find a right
>> tree.
>
>I guess it would certainly help to identify the official repo to rebase
>against much quicker if it would be under a common group on korg e.g.
>
>  * iproute2/iproute2.git - for current cycle
>  * iproute2/iproute2-next.git- for net-next bits
>
>and also be in line with other tooling (ethtool and others), even if
>not as high volume, but it would make it unambiguous right away from
>the other, private iproute2 repos on korg, imho. Just a thought.

+1

I was about to suggest this. This is nice opportunity to do such change.


>
 Example, of such shared repo:
 BPF: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/
 Bluetooth: 
 https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/
 RDMA: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/
>>>
>>> Most of these are high volume or vendor silo'd which is not the case here.
>Cheers,
>Daniel


[PATCH net-next] net: hns: add ACPI mode support for ethtool -p

2017-12-29 Thread Peng Li
From: Jian Shen 

The locate operation interface of fiber port can only
work with DT mode. Add a new interface to control the
locate led for ACPI mode.

Signed-off-by: Jian Shen 
Signed-off-by: Peng Li 
Tested-by: Zhou Wang 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 57 +-
 2 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index 8b5cdf4..cac86e9 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -1168,7 +1168,7 @@ void hns_set_led_opt(struct hns_mac_cb *mac_cb)
 int hns_cpld_led_set_id(struct hns_mac_cb *mac_cb,
enum hnae_led_state status)
 {
-   if (!mac_cb || !mac_cb->cpld_ctrl)
+   if (!mac_cb)
return 0;
 
return mac_cb->dsaf_dev->misc_op->cpld_set_led_id(mac_cb, status);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
index 408b63f..ca247c2 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
@@ -18,6 +18,7 @@ enum _dsm_op_index {
HNS_OP_LED_SET_FUNC = 0x3,
HNS_OP_GET_PORT_TYPE_FUNC   = 0x4,
HNS_OP_GET_SFP_STAT_FUNC= 0x5,
+   HNS_OP_LOCATE_LED_SET_FUNC  = 0x6,
 };
 
 enum _dsm_rst_type {
@@ -81,6 +82,33 @@ static void hns_dsaf_acpi_ledctrl_by_port(struct hns_mac_cb 
*mac_cb, u8 op_type,
ACPI_FREE(obj);
 }
 
+static void hns_dsaf_acpi_locate_ledctrl_by_port(struct hns_mac_cb *mac_cb,
+u8 op_type, u32 locate,
+u32 port)
+{
+   union acpi_object obj_args[2], argv4;
+   union acpi_object *obj;
+
+   obj_args[0].integer.type = ACPI_TYPE_INTEGER;
+   obj_args[0].integer.value = locate;
+   obj_args[1].integer.type = ACPI_TYPE_INTEGER;
+   obj_args[1].integer.value = port;
+
+   argv4.type = ACPI_TYPE_PACKAGE;
+   argv4.package.count = 2;
+   argv4.package.elements = obj_args;
+
+   obj = acpi_evaluate_dsm(ACPI_HANDLE(mac_cb->dev),
+   &hns_dsaf_acpi_dsm_guid, 0, op_type, &argv4);
+   if (!obj) {
+   dev_err(mac_cb->dev, "ledctrl fail, locate:%d port:%d!\n",
+   locate, port);
+   return;
+   }
+
+   ACPI_FREE(obj);
+}
+
 static void hns_cpld_set_led(struct hns_mac_cb *mac_cb, int link_status,
 u16 speed, int data)
 {
@@ -160,6 +188,9 @@ static void cpld_led_reset_acpi(struct hns_mac_cb *mac_cb)
 static int cpld_set_led_id(struct hns_mac_cb *mac_cb,
   enum hnae_led_state status)
 {
+   if (!mac_cb->cpld_ctrl)
+   return 0;
+
switch (status) {
case HNAE_LED_ACTIVE:
mac_cb->cpld_led_value =
@@ -184,6 +215,30 @@ static int cpld_set_led_id(struct hns_mac_cb *mac_cb,
return 0;
 }
 
+static int cpld_set_led_id_acpi(struct hns_mac_cb *mac_cb,
+   enum hnae_led_state status)
+{
+   switch (status) {
+   case HNAE_LED_ACTIVE:
+   hns_dsaf_acpi_locate_ledctrl_by_port(mac_cb,
+HNS_OP_LOCATE_LED_SET_FUNC,
+CPLD_LED_ON_VALUE,
+mac_cb->mac_id);
+   break;
+   case HNAE_LED_INACTIVE:
+   hns_dsaf_acpi_locate_ledctrl_by_port(mac_cb,
+HNS_OP_LOCATE_LED_SET_FUNC,
+CPLD_LED_DEFAULT_VALUE,
+mac_cb->mac_id);
+   break;
+   default:
+   dev_err(mac_cb->dev, "invalid led state: %d!", status);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 #define RESET_REQ_OR_DREQ 1
 
 static void hns_dsaf_acpi_srst_by_port(struct dsaf_device *dsaf_dev, u8 
op_type,
@@ -660,7 +715,7 @@ struct dsaf_misc_op *hns_misc_op_get(struct dsaf_device 
*dsaf_dev)
} else if (is_acpi_node(dsaf_dev->dev->fwnode)) {
misc_op->cpld_set_led = hns_cpld_set_led_acpi;
misc_op->cpld_reset_led = cpld_led_reset_acpi;
-   misc_op->cpld_set_led_id = cpld_set_led_id;
+   misc_op->cpld_set_led_id = cpld_set_led_id_acpi;
 
misc_op->dsaf_reset = hns_dsaf_rst_acpi;
misc_op->xge_srst = hns_dsaf_xge_srst_by_port_acpi;
-- 
1.9.1



Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-29 Thread Masami Hiramatsu
On Thu, 28 Dec 2017 17:03:24 -0800
Alexei Starovoitov  wrote:

> On 12/28/17 12:20 AM, Masami Hiramatsu wrote:
> > On Wed, 27 Dec 2017 20:32:07 -0800
> > Alexei Starovoitov  wrote:
> >
> >> On 12/27/17 8:16 PM, Steven Rostedt wrote:
> >>> On Wed, 27 Dec 2017 19:45:42 -0800
> >>> Alexei Starovoitov  wrote:
> >>>
>  I don't think that's the case. My reading of current
>  trace_kprobe_ftrace() -> arch_check_ftrace_location()
>  is that it will not be true for old mcount case.
> >>>
> >>> In the old mcount case, you can't use ftrace to return without calling
> >>> the function. That is, no modification of the return ip, unless you
> >>> created a trampoline that could handle arbitrary stack frames, and
> >>> remove them from the stack before returning back to the function.
> >>
> >> correct. I was saying that trace_kprobe_ftrace() won't let us do
> >> bpf_override_return with old mcount.
> >
> > No, trace_kprobe_ftrace() just checks the given address will be
> > managed by ftrace. you can see arch_check_ftrace_location() in 
> > kernel/kprobes.c.
> >
> > FYI, CONFIG_KPROBES_ON_FTRACE depends on DYNAMIC_FTRACE_WITH_REGS, and
> > DYNAMIC_FTRACE_WITH_REGS doesn't depend on CC_USING_FENTRY.
> > This means if you compile kernel with old gcc and enable DYNAMIC_FTRACE,
> > kprobes uses ftrace on mcount address which is NOT the entry point
> > of target function.
> 
> ok. fair enough. I think we can gate the feature to !mcount only.
> 
> > On the other hand, changing IP feature has been implemented originaly
> > by kprobes with int3 (sw breakpoint). This means you can use kprobes
> > at correct address (the entry address of the function) you can hijack
> > the function, as jprobe did.
> >
>  As far as the rest of your arguments it very much puzzles me that
>  you claim that this patch suppose to work based on historical
>  reasoning whereas you did NOT test it.
> >>>
> >>> I believe that Masami is saying that the modification of the IP from
> >>> kprobes has been very well tested. But I'm guessing that you still want
> >>> a test case for using kprobes in this particular instance. It's not the
> >>> implementation of modifying the IP that you are worried about, but the
> >>> implementation of BPF using it in this case. Right?
> >>
> >> exactly. No doubt that old code works.
> >> But it doesn't mean that bpf_override_return() will continue to
> >> work in kprobes that are not ftrace based.
> >> I suspect Josef's existing test case will cover this situation.
> >> Probably only special .config is needed to disable ftrace, so
> >> "kprobe on entry but not ftrace" check will kick in.
> >
> > Right. If you need to test it, you can run Josef's test case without
> > CONFIG_DYNAMIC_FTRACE.
> 
> It should be obvious that the person who submits the patch
> must run the tests.
> 
> >> But I didn't get an impression that this situation was tested.
> >> Instead I see only logical reasoning that it's _supposed_ to work.
> >> That's not enough.
> >
> > OK, so would you just ask me to run samples/bpf ?
> 
> Please run Josef's test in the !ftrace setup.

Yes, I'll add the result of the test case.

Thank you,


-- 
Masami Hiramatsu 


Re: [PATCH net 3/3] eet: ena: invoke netif_carrier_off() only after netdev registered

2017-12-29 Thread Jakub Kicinski
On Fri, 29 Dec 2017 08:00:33 +, Belgazal, Netanel wrote:
> Yes, I mean in my driver.
> netif_carrier_off() have no effect when netdev is uninitialized.

Please look at the implementation again, test_*and_set*_bit().

> So I must call it after register_netdev().

Is there a user-visible problem you're trying to solve here?


Re: [PATCH net 3/3] eet: ena: invoke netif_carrier_off() only after netdev registered

2017-12-29 Thread Belgazal, Netanel
Yes, I mean in my driver.
netif_carrier_off() have no effect when netdev is uninitialized.
So I must call it after register_netdev().

On 12/29/17, 9:46 AM, "Jakub Kicinski"  wrote:

By "should" you mean in your driver, right?  I think calling
netif_carrier_off() on an unregistered netdev is a pretty standard
thing to do for drivers which manage carrier state.