date:20180510

Re: [PATCH net] macmace: Set platform device coherent_dma_mask

2018-05-10 Thread Finn Thain

On Fri, 11 May 2018, Michael Schmitz wrote:

> 
> I'm afraid using platform_device_register() (which you already use for 
> the SCC devices) is the only option handling this on a per-device basis 
> without touching platform core code, while at the same time keeping the 
> DMA mask setup out of device drivers

I don't think that will fly. If you call platform_device_register() and 
follow that with a dma mask assignment, you could race with the bus 
matching and driver probe, and we are back to the same WARNING message.

If you want to use platform_device_register(), you'd have to implement 
arch_setup_pdev_archdata() and use that to set up the dma mask.

> (I can see Geert's point there - device driver code might be shared 
> across implementations of the device on platforms with different DMA 
> mask requirements, something the driver can't be expected to know 
> about).

As I said, these drivers might be expected to be portable between Macs and 
early PowerMacs, but the same dma mask would apply AFAIK.

If a platform driver isn't expected to be portable, I think either method 
is reasonable: arch_setup_pdev_archdata() or the method in the patch.

Anyway, there is this in arch/powerpc/kernel/setup-common.c:

void arch_setup_pdev_archdata(struct platform_device *pdev)
{
pdev->archdata.dma_mask = DMA_BIT_MASK(32);
pdev->dev.dma_mask = &pdev->archdata.dma_mask;
...

I'm inclined to propose something similar for m68k. That should fix the 
problem, since arch_setup_pdev_archdata() is already in the call chain:

platform_device_register_simple()
platform_device_register_resndata()
platform_device_register_full()
platform_device_alloc()
arch_setup_pdev_archdata()

Thoughts? Will this have nasty side effects for m68k platforms that use 
smaller dma masks?
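
For illustration only, here is a rough userspace model of the ordering
argument above. It is not kernel code: struct pdev_model and the model_*()
functions are made-up names, and the 32-bit mask is merely an assumption for
m68k. The model treats the race as always lost (probe runs during
registration), which is the worst case rather than the certain outcome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct platform_device. */
struct pdev_model {
	uint64_t archdata_dma_mask;  /* storage owned by the device itself */
	uint64_t *dma_mask;          /* NULL until pointed at some storage */
	bool probe_saw_mask;         /* what the driver observed at probe time */
};

/* Models arch_setup_pdev_archdata(): called from the allocation step. */
static void model_arch_setup(struct pdev_model *pdev)
{
	pdev->archdata_dma_mask = 0xffffffffULL;  /* assumed 32-bit mask */
	pdev->dma_mask = &pdev->archdata_dma_mask;
}

/* Models platform_device_alloc(); use_hook says whether the arch hook exists. */
static struct pdev_model *model_alloc(bool use_hook)
{
	struct pdev_model *pdev = calloc(1, sizeof(*pdev));

	assert(pdev != NULL);
	if (use_hook)
		model_arch_setup(pdev);
	return pdev;
}

/* Models registration: bus matching may probe the driver right away. */
static void model_register(struct pdev_model *pdev)
{
	pdev->probe_saw_mask = (pdev->dma_mask != NULL);
}

/* Returns true if the (modelled) probe found a DMA mask in place. */
static bool model_probe_sees_mask(bool use_hook)
{
	struct pdev_model *pdev = model_alloc(use_hook);
	bool seen;

	model_register(pdev);
	if (!use_hook)
		model_arch_setup(pdev);  /* late assignment: probe already ran */
	seen = pdev->probe_saw_mask;
	free(pdev);
	return seen;
}
```

With the hook, the mask is in place before registration; without it, the
assignment only happens after the modelled probe has already looked.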

-- 

> 
> Cheers,
> 
>   Michael
>

KASAN: null-ptr-deref Read in rds_ib_get_mr

2018-05-10 Thread DaeRyong Jeong

We report the crash: KASAN: null-ptr-deref Read in rds_ib_get_mr

Note that this bug was previously reported by syzkaller:
https://syzkaller.appspot.com/bug?id=0bb56a5a48b000b52aa2b0d8dd20b1f545214d91
Nonetheless, the bug has not been fixed yet, and we hope that this report and
our analysis, which was aided by RaceFuzzer's features, will be helpful in
fixing the crash.

This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
version of Syzkaller), which we describe more at the end of this
report. Our analysis shows that the race occurs when invoking two
syscalls concurrently, bind$rds and setsockopt$RDS_GET_MR.


Analysis:
We think the concurrent execution of __rds_rdma_map() and rds_bind()
causes the problem. __rds_rdma_map() checks whether rs->rs_bound_addr is 0
or not, but concurrent execution with rds_bind() can bypass this
check. As a result, __rds_rdma_map() calls rs->rs_transport->get_mr(), and
rds_ib_get_mr() causes the null deref at ib_rdma.c:544 in v4.17-rc1 when
dereferencing rs_conn.


Thread interleaving:
CPU0 (__rds_rdma_map)                               CPU1 (rds_bind)
                                                    // rds_add_bound() sets rs->rs_bound_addr to non-zero
                                                    ret = rds_add_bound(rs, sin->sin_addr.s_addr,
                                                                        &sin->sin_port);
if (rs->rs_bound_addr == 0 || !rs->rs_transport) {
    ret = -ENOTCONN; /* XXX not a great errno */
    goto out;
}
                                                    if (rs->rs_transport) { /* previously bound */
                                                        trans = rs->rs_transport;
                                                        if (trans->laddr_check(sock_net(sock->sk),
                                                                sin->sin_addr.s_addr) != 0) {
                                                            ret = -ENOPROTOOPT;
                                                            // rds_remove_bound() sets rs->rs_bound_addr to 0
                                                            rds_remove_bound(rs);
...
trans_private = rs->rs_transport->get_mr(sg, nents, rs,
                                         &mr->r_key);
(in rds_ib_get_mr())
struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;


Call sequence (v4.17-rc1):
CPU0
rds_setsockopt
rds_get_mr
__rds_rdma_map
rds_ib_get_mr


CPU1
rds_bind
rds_add_bound
...
rds_remove_bound
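
The interleaving above can be replayed deterministically in a small userspace
model. This is only a sketch: struct sock_model, the MODEL_* codes and the
"guarded" variant are made up for illustration, the rs_transport part of the
check is left out, and the guard is one conceivable mitigation, not a claim
about how the kernel is or should be fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RDS socket state involved in the race. */
struct conn_model { int transport_data; };

struct sock_model {
	unsigned int bound_addr;   /* models rs->rs_bound_addr */
	struct conn_model *conn;   /* models rs->rs_conn; NULL if never connected */
};

enum { MODEL_OK = 0, MODEL_ENOTCONN = -1, MODEL_EINVAL = -2, MODEL_OOPS = -3 };

/* CPU0, step 1: the bound-address check in __rds_rdma_map(). */
static int model_check_bound(const struct sock_model *rs)
{
	assert(rs != NULL);
	return rs->bound_addr == 0 ? MODEL_ENOTCONN : MODEL_OK;
}

/*
 * CPU0, step 2: the get_mr() call. The unguarded variant reports
 * MODEL_OOPS where the real code would dereference a NULL rs_conn.
 */
static int model_get_mr(const struct sock_model *rs, int guarded)
{
	if (rs->conn == NULL)
		return guarded ? MODEL_EINVAL : MODEL_OOPS;
	return MODEL_OK;
}

/* Replays the reported interleaving in a fixed order. */
static int model_replay(int guarded)
{
	struct sock_model rs = { 0, NULL };
	int ret;

	rs.bound_addr = 0x0100007f;         /* CPU1: rds_add_bound() */
	ret = model_check_bound(&rs);       /* CPU0: check passes */
	if (ret != MODEL_OK)
		return ret;
	rs.bound_addr = 0;                  /* CPU1: rds_remove_bound() */
	return model_get_mr(&rs, guarded);  /* CPU0: conn was never set */
}
```

The point of the model is that the check and the use are separated by a
window in which the other CPU can undo the state the check relied on.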


Crash log:
==
BUG: KASAN: null-ptr-deref in rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
Read of size 8 at addr 0068 by task syz-executor0/32067

CPU: 0 PID: 32067 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x166/0x21c lib/dump_stack.c:113
 kasan_report_error mm/kasan/report.c:352 [inline]
 kasan_report+0x140/0x360 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 __asan_load8+0x54/0x90 mm/kasan/kasan.c:699
 rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
 __rds_rdma_map+0x521/0x9d0 net/rds/rdma.c:271
 rds_get_mr+0xad/0xf0 net/rds/rdma.c:333
 rds_setsockopt+0x57f/0x720 net/rds/af_rds.c:347
 __sys_setsockopt+0x147/0x230 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0x67/0x80 net/socket.c:1911
 do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4563f9
RSP: 002b:7f6a2b3c2b28 EFLAGS: 0246 ORIG_RAX: 0036
RAX: ffda RBX: 0072bee0 RCX: 004563f9
RDX: 0002 RSI: 0114 RDI: 0015
RBP: 0575 R08: 0020 R09: 
R10: 2140 R11: 0246 R12: 7f6a2b3c36d4
R13:  R14: 006fd398 R15: 
==


= About RaceFuzzer

RaceFuzzer is a customized version of Syzkaller, specifically tailored
to find race condition bugs in the Linux kernel. While we leverage
many different techniques, the notable feature of RaceFuzzer is in
leveraging a custom hypervisor (QEMU/KVM) to interleave the
scheduling. In particular, we modified the hypervisor to intentionally
stall a per-core execution, which is similar to supporting per-core
breakpoint functionality. This allows RaceFuzzer to force the kernel
to deterministically trigger a racy condition (which may rarely happen
in practice due to randomness in scheduling).

RaceFuzzer's C repro always pinpoints two racy syscalls. Since C
repro's scheduling synchronization should be performed at the

Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF

2018-05-10 Thread Martin KaFai Lau

On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> programs to find out if there is a socket listening on this host, and
> returns a socket pointer which the BPF program can then access to
> determine, for instance, whether to forward or drop traffic. sk_lookup()
> takes a reference on the socket, so when a BPF program makes use of this
> function, it must subsequently pass the returned pointer into the newly
> added sk_release() to return the reference.
> 
> By way of example, the following pseudocode would filter inbound
> connections at XDP if there is no corresponding service listening for
> the traffic:
> 
>   struct bpf_sock_tuple tuple;
>   struct bpf_sock_ops *sk;
> 
>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
>   if (!sk) {
> // Couldn't find a socket listening for this traffic. Drop.
> return TC_ACT_SHOT;
>   }
>   bpf_sk_release(sk, 0);
>   return TC_ACT_OK;
> 
> Signed-off-by: Joe Stringer 
> ---
>  include/uapi/linux/bpf.h  |  39 +++-
>  kernel/bpf/verifier.c |   8 ++-
>  net/core/filter.c | 102 
> ++
>  tools/include/uapi/linux/bpf.h|  40 +++-
>  tools/testing/selftests/bpf/bpf_helpers.h |   7 ++
>  5 files changed, 193 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d615c777b573..29f38838dbca 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1828,6 +1828,25 @@ union bpf_attr {
>   *   Return
>   *   0 on success, or a negative error in case of failure.
>   *
> + * struct bpf_sock_ops *bpf_sk_lookup(ctx, tuple, tuple_size, netns, flags)
> + *   Description
> + *   Look for socket matching 'tuple'. The return value must be 
> checked,
> + *   and if non-NULL, released via bpf_sk_release().
> + *   @ctx: pointer to ctx
> + *   @tuple: pointer to struct bpf_sock_tuple
> + *   @tuple_size: size of the tuple
> + *   @flags: flags value
> + *   Return
> + *   pointer to socket ops on success, or
> + *   NULL in case of failure
> + *
> + *  int bpf_sk_release(sock, flags)
> + *   Description
> + *   Release the reference held by 'sock'.
> + *   @sock: Pointer reference to release. Must be found via 
> bpf_sk_lookup().
> + *   @flags: flags value
> + *   Return
> + *   0 on success, or a negative error in case of failure.
>   */
>  #define __BPF_FUNC_MAPPER(FN)\
>   FN(unspec), \
> @@ -1898,7 +1917,9 @@ union bpf_attr {
>   FN(xdp_adjust_tail),\
>   FN(skb_get_xfrm_state), \
>   FN(get_stack),  \
> - FN(skb_load_bytes_relative),
> + FN(skb_load_bytes_relative),\
> + FN(sk_lookup),  \
> + FN(sk_release),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -2060,6 +2081,22 @@ struct bpf_sock {
>*/
>  };
>  
> +struct bpf_sock_tuple {
> + union {
> + __be32 ipv6[4];
> + __be32 ipv4;
> + } saddr;
> + union {
> + __be32 ipv6[4];
> + __be32 ipv4;
> + } daddr;
> + __be16 sport;
> + __be16 dport;
> + __u32 dst_if;
> + __u8 family;
> + __u8 proto;
> +};
> +
>  #define XDP_PACKET_HEADROOM 256
>  
>  /* User return codes for XDP prog type.
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 92b9a5dc465a..579012c483e4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -153,6 +153,12 @@ static const struct bpf_verifier_ops * const 
> bpf_verifier_ops[] = {
>   * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the 
> type
>   * passes through a NULL-check conditional. For the branch wherein the state 
> is
>   * changed to CONST_IMM, the verifier releases the reference.
> + *
> + * For each helper function that allocates a reference, such as 
> bpf_sk_lookup(),
> + * there is a corresponding release function, such as bpf_sk_release(). When
> + * a reference type passes into the release function, the verifier also 
> releases
> + * the reference. If any unchecked or unreleased reference remains at the 
> end of
> + * the program, the verifier rejects it.
>   */
>  
>  /* verifier_state + insn_idx are pushed to stack when branch is encountered 
> */
> @@ -277,7 +283,7 @@ static bool arg_type_is_refcounted(enum bpf_arg_type type)
>   */
>  static bool is_release_function(enum bpf_func_id func_id)
>  {
> - return false;
> + return func_id == BPF_FUNC_sk_release;
>  }
>  
>  /* string representation of 'enum bpf_reg_type' */
> diff --git
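
As a reading aid for the reference-tracking rule quoted above, here is a toy
userspace model of it. It is only a sketch under heavy assumptions: the op
and register names are invented, it follows a single straight-line path and
a single reference, and it has nothing to do with the real verifier data
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-path "instructions"; these are not BPF opcodes. */
enum model_op {
	OP_LOOKUP,               /* like bpf_sk_lookup(): acquires a reference */
	OP_TAKE_NULL_BRANCH,     /* path where the NULL check found NULL */
	OP_TAKE_NONNULL_BRANCH,  /* path where the NULL check found a socket */
	OP_RELEASE,              /* like bpf_sk_release() */
	OP_EXIT
};

enum model_reg { REG_NONE, REG_SOCKET_OR_NULL, REG_SOCKET };

/* Returns true if the path would be accepted under the quoted rule. */
static bool model_verify_path(const enum model_op *ops, size_t n)
{
	enum model_reg reg = REG_NONE;
	bool ref_held = false;
	size_t i;

	assert(ops != NULL);
	for (i = 0; i < n; i++) {
		switch (ops[i]) {
		case OP_LOOKUP:
			if (ref_held)
				return false;  /* the model tracks one reference only */
			reg = REG_SOCKET_OR_NULL;
			ref_held = true;
			break;
		case OP_TAKE_NULL_BRANCH:
			if (reg != REG_SOCKET_OR_NULL)
				return false;
			reg = REG_NONE;    /* pointer is NULL: nothing left to release */
			ref_held = false;
			break;
		case OP_TAKE_NONNULL_BRANCH:
			if (reg != REG_SOCKET_OR_NULL)
				return false;
			reg = REG_SOCKET;  /* checked, reference still held */
			break;
		case OP_RELEASE:
			if (reg != REG_SOCKET)
				return false;  /* must be checked before release */
			reg = REG_NONE;
			ref_held = false;
			break;
		case OP_EXIT:
			return !ref_held;  /* unchecked or unreleased ref: reject */
		}
	}
	return false;  /* ran off the end without an exit */
}
```

The pseudocode in the commit message corresponds to the two accepted paths:
lookup, NULL branch, exit; and lookup, non-NULL branch, release, exit.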

Re: [PATCH net] macmace: Set platform device coherent_dma_mask

2018-05-10 Thread Michael Schmitz

Hi Finn,

Am 11.05.2018 um 15:28 schrieb Finn Thain:
> On Fri, 11 May 2018, Michael Schmitz wrote:
> 
>>>> Which begs the question: why can't you set up all Nubus bus devices' 
>>>> DMA masks in nubus_device_register(), or nubus_add_board()?
>>>
>>> I am expecting to see the same WARNING from the nubus sonic driver but 
>>> it hasn't happened yet, so I don't have a patch for it yet. In 
>>> any case, the nubus fix would be a lot like the zorro bus fix, so I 
>>> don't see a problem.
>>
>> That's odd. But what I meant to say is that by setting up 
>> dma_coherent_mask in nubus_add_board(), and pointing dma_mask to that, 
>> you won't need any patches to Nubus device drivers.
> 
> Right. I think I've already acknowledged that. But it's off-topic, because 
> the patches under review are for platform drivers. Those patches fix an 
> actual bug that I've observed. Whereas, the nubus driver dma mask issue 
> that you raised is purely theoretical at this stage.

I had lost track of the fact that macsonic can be probed as either Nubus
or platform device. Sorry for the noise.

I'm afraid using platform_device_register() (which you already use for
the SCC devices) is the only option handling this on a per-device basis
without touching platform core code, while at the same time keeping the
DMA mask setup out of device drivers (I can see Geert's point there -
device driver code might be shared across implementations of the device
on platforms with different DMA mask requirements, something the driver
can't be expected to know about).

Cheers,

Michael

Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-10 Thread Willem de Bruijn

On Thu, May 10, 2018 at 4:28 AM,   wrote:
> From: Gao Feng 
>
> The skb flow limit is implemented for each CPU independently. In the
> current codes, the function skb_flow_limit gets the softnet_data by
> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
> the current cpu when enable RPS. As the result, the skb_flow_limit checks
> the stats of current CPU, while the skb is going to append the queue of
> another CPU. It isn't the expected behavior.
>
> Now pass the softnet_data as a param to softnet_data to make consistent.

The local cpu softnet_data is used on purpose. The operations in
skb_flow_limit() on sd fields could race if not executed on the local cpu.

Flow limit tries to detect large ("elephant") DoS flows with a fixed four-tuple.
These would always hit the same RPS cpu, so that cpu being backlogged
may be an indication that such a flow is active. But the flow will also always
arrive on the same initial cpu courtesy of RSS. So storing the lookup table
on the initial CPU is also fine. There may be false positives on other CPUs
with the same RPS destination, but that is unlikely with a highly concurrent
traffic server mix ("mice").

Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
for the cpus on which traffic initially lands, not the RPS destination cpus.
See also Documentation/networking/scaling.txt

That said, I had to reread the code, as it does seem sensible that the
same softnet_data is intended to be used both when testing qlen and
flow_limit.
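
For readers who have not looked at the heuristic, a rough userspace model of
the per-CPU table described above may help. It is a sketch, not the kernel
code: the MODEL_* sizes are assumptions chosen to resemble the kernel
defaults, the hash is supplied by the caller, and the qlen precondition is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_HISTORY 128   /* assumed window size; must be a power of two */
#define MODEL_BUCKETS 4096  /* assumed table size; must be a power of two */

/* One such table would live on the CPU where the traffic initially lands. */
struct flow_limit_model {
	unsigned int history_head;
	uint16_t history[MODEL_HISTORY];  /* bucket of each recent packet */
	uint8_t buckets[MODEL_BUCKETS];   /* recent packets per bucket */
};

static void model_flow_limit_init(struct flow_limit_model *fl)
{
	assert(fl != NULL);
	memset(fl, 0, sizeof(*fl));
}

/* Returns true if a packet with this flow hash should be dropped. */
static bool model_flow_limit(struct flow_limit_model *fl, uint32_t hash)
{
	uint16_t new_flow = hash & (MODEL_BUCKETS - 1);
	uint16_t old_flow = fl->history[fl->history_head];

	/* Slide the window: forget the oldest packet, record the new one. */
	fl->history[fl->history_head] = new_flow;
	fl->history_head = (fl->history_head + 1) & (MODEL_HISTORY - 1);
	if (fl->buckets[old_flow])
		fl->buckets[old_flow]--;

	/* A flow owning more than half of the window is treated as an elephant. */
	return ++fl->buckets[new_flow] > (MODEL_HISTORY >> 1);
}

/* Feeds n packets with hashes first_hash + i * stride; returns the drop count. */
static unsigned int model_drops(uint32_t first_hash, uint32_t stride,
				unsigned int n)
{
	struct flow_limit_model fl;
	unsigned int i, drops = 0;

	model_flow_limit_init(&fl);
	for (i = 0; i < n; i++)
		if (model_flow_limit(&fl, first_hash + i * stride))
			drops++;
	return drops;
}
```

Because the window and counters are plain read-modify-write state, updating
them from a CPU other than their owner would need extra synchronization,
which is the racing concern raised above.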

Re: [PATCH net-next] udp: avoid refcount_t saturation in __udp_gso_segment()

2018-05-10 Thread Willem de Bruijn

On Thu, May 10, 2018 at 10:07 PM, Eric Dumazet  wrote:
> For some reason, Willem thought that the issue we fixed for TCP
> in commit 7ec318feeed1 ("tcp: gso: avoid refcount_t warning from
> tcp_gso_segment()") was not relevant for UDP GSO.
>
> But syzbot found its way.

[..]

> Fixes: ad405857b174 ("udp: better wmem accounting on gso")
> Signed-off-by: Eric Dumazet 
> Cc: Willem de Bruijn 
> Cc: Alexander Duyck 
> Reported-by: syzbot 

Acked-by: Willem de Bruijn 

Thanks Eric. Yep, I was naive here. I am quite curious what kind of
gso packet syzkaller was able to cook that exceeds the truesize
of its segments.

Re: [PATCH net] macmace: Set platform device coherent_dma_mask

2018-05-10 Thread Finn Thain

On Fri, 11 May 2018, Michael Schmitz wrote:

> > > Which begs the question: why can't you set up all Nubus bus devices' 
> > > DMA masks in nubus_device_register(), or nubus_add_board()?
> >
> > I am expecting to see the same WARNING from the nubus sonic driver but 
> > it hasn't happened yet, so I don't have a patch for it yet. In 
> > any case, the nubus fix would be a lot like the zorro bus fix, so I 
> > don't see a problem.
> 
> That's odd. But what I meant to say is that by setting up 
> dma_coherent_mask in nubus_add_board(), and pointing dma_mask to that, 
> you won't need any patches to Nubus device drivers.

Right. I think I've already acknowledged that. But it's off-topic, because 
the patches under review are for platform drivers. Those patches fix an 
actual bug that I've observed. Whereas, the nubus driver dma mask issue 
that you raised is purely theoretical at this stage.

--

[PATCH] dt-bindings: net: ravb: Add support for r8a77990 SoC

2018-05-10 Thread Yoshihiro Shimoda

Add documentation for r8a77990 compatible string to renesas ravb device
tree bindings documentation.

Signed-off-by: Yoshihiro Shimoda 
---
 Documentation/devicetree/bindings/net/renesas,ravb.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt 
b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index 890526d..fac897d 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -21,6 +21,7 @@ Required properties:
   - "renesas,etheravb-r8a77965" for the R8A77965 SoC.
   - "renesas,etheravb-r8a77970" for the R8A77970 SoC.
   - "renesas,etheravb-r8a77980" for the R8A77980 SoC.
+  - "renesas,etheravb-r8a77990" for the R8A77990 SoC.
   - "renesas,etheravb-r8a77995" for the R8A77995 SoC.
   - "renesas,etheravb-rcar-gen3" as a fallback for the above
R-Car Gen3 devices.
-- 
1.9.1

[PATCH net V2] tun: fix use after free for ptr_ring

2018-05-10 Thread Jason Wang

We used to initialize ptr_ring during TUNSETIFF because its
size depends on the tx_queue_len of the netdevice, and we tried to clean it
up when the socket was detached from the netdevice. A race was spotted when
trying to do uninit during a read, which leads to a use after free of the
pointer ring. Solve this by always initializing a zero-size ptr_ring
in open() and resizing it during TUNSETIFF, so that we can safely do
cleanup during close(). With this, there's no need for the workaround
that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
for tfile->tx_array").

Reported-by: syzbot+e8b902c3c3fadf0a9...@syzkaller.appspotmail.com
Cc: Eric Dumazet 
Cc: Cong Wang 
Cc: Michael S. Tsirkin 
Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
Signed-off-by: Jason Wang 
---
Changes from v1:
- free ptr_ring during close()
- use tun_ptr_free() during resize for safety
---
 drivers/net/tun.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ef33950..9fbbb32 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -681,15 +681,6 @@ static void tun_queue_purge(struct tun_file *tfile)
skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
-static void tun_cleanup_tx_ring(struct tun_file *tfile)
-{
-   if (tfile->tx_ring.queue) {
-   ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
-   xdp_rxq_info_unreg(&tfile->xdp_rxq);
-   memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-   }
-}
-
 static void __tun_detach(struct tun_file *tfile, bool clean)
 {
struct tun_file *ntfile;
@@ -736,7 +727,8 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
tun->dev->reg_state == NETREG_REGISTERED)
unregister_netdevice(tun->dev);
}
-   tun_cleanup_tx_ring(tfile);
+   if (tun)
+   xdp_rxq_info_unreg(&tfile->xdp_rxq);
sock_put(&tfile->sk);
}
 }
@@ -783,14 +775,14 @@ static void tun_detach_all(struct net_device *dev)
tun_napi_del(tun, tfile);
/* Drop read queue */
tun_queue_purge(tfile);
+   xdp_rxq_info_unreg(&tfile->xdp_rxq);
sock_put(&tfile->sk);
-   tun_cleanup_tx_ring(tfile);
}
list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
tun_enable_queue(tfile);
tun_queue_purge(tfile);
+   xdp_rxq_info_unreg(&tfile->xdp_rxq);
sock_put(&tfile->sk);
-   tun_cleanup_tx_ring(tfile);
}
BUG_ON(tun->numdisabled != 0);
 
@@ -834,7 +826,8 @@ static int tun_attach(struct tun_struct *tun, struct file 
*file,
}
 
if (!tfile->detached &&
-   ptr_ring_init(&tfile->tx_ring, dev->tx_queue_len, GFP_KERNEL)) {
+   ptr_ring_resize(&tfile->tx_ring, dev->tx_queue_len,
+   GFP_KERNEL, tun_ptr_free)) {
err = -ENOMEM;
goto out;
}
@@ -3219,6 +3212,11 @@ static int tun_chr_open(struct inode *inode, struct file 
* file)
&tun_proto, 0);
if (!tfile)
return -ENOMEM;
+   if (ptr_ring_init(&tfile->tx_ring, 0, GFP_KERNEL)) {
+   sk_free(&tfile->sk);
+   return -ENOMEM;
+   }
+
RCU_INIT_POINTER(tfile->tun, NULL);
tfile->flags = 0;
tfile->ifindex = 0;
@@ -3239,8 +3237,6 @@ static int tun_chr_open(struct inode *inode, struct file 
* file)
 
sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
-   memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-
return 0;
 }
 
@@ -3249,6 +3245,7 @@ static int tun_chr_close(struct inode *inode, struct file 
*file)
struct tun_file *tfile = file->private_data;
 
tun_detach(tfile, true);
+   ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
 
return 0;
 }
-- 
2.7.4
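
The lifecycle this patch moves to (init at size zero in open(), resize at
TUNSETIFF, cleanup in close()) can be sketched with a toy userspace ring.
This is an illustration only: struct ring_model and the ring_*() helpers are
made up, there is no locking, and entries that do not fit after a resize are
simply dropped, whereas the patch passes tun_ptr_free to the real helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a pointer ring; not the kernel's ptr_ring. */
struct ring_model {
	void **queue;  /* valid from init until cleanup, even for size 0 */
	int size;
	int head, tail, count;
};

static int ring_init(struct ring_model *r, int size)
{
	assert(r != NULL && size >= 0);
	r->queue = calloc(size > 0 ? size : 1, sizeof(*r->queue));
	if (!r->queue)
		return -1;
	r->size = size;
	r->head = r->tail = r->count = 0;
	return 0;
}

static int ring_produce(struct ring_model *r, void *p)
{
	if (r->count == r->size)
		return -1;  /* full; always the case for a zero-size ring */
	r->queue[r->tail] = p;
	r->tail = (r->tail + 1) % r->size;
	r->count++;
	return 0;
}

static void *ring_consume(struct ring_model *r)
{
	void *p;

	if (r->count == 0)
		return NULL;  /* empty; a zero-size ring is always empty */
	p = r->queue[r->head];
	r->head = (r->head + 1) % r->size;
	r->count--;
	return p;
}

static int ring_resize(struct ring_model *r, int size)
{
	struct ring_model n;
	void *p;

	if (ring_init(&n, size))
		return -1;
	while ((p = ring_consume(r)) != NULL)
		(void)ring_produce(&n, p);  /* entries that no longer fit are dropped */
	free(r->queue);
	*r = n;
	return 0;
}

static void ring_cleanup(struct ring_model *r)
{
	free(r->queue);
	r->queue = NULL;
	r->size = r->count = 0;
}

/* Walks through open, early read, TUNSETIFF, traffic, close. Returns 1 if all OK. */
static int model_lifecycle(void)
{
	static int pkt = 42;
	struct ring_model r;
	int ok = 1;

	if (ring_init(&r, 0))                /* open() */
		return 0;
	ok &= ring_consume(&r) == NULL;      /* read before TUNSETIFF: just empty */
	ok &= ring_produce(&r, &pkt) == -1;
	ok &= ring_resize(&r, 4) == 0;       /* TUNSETIFF / attach */
	ok &= ring_produce(&r, &pkt) == 0;
	ok &= ring_consume(&r) == &pkt;
	ring_cleanup(&r);                    /* close() */
	return ok;
}
```

The property being illustrated is that the ring exists for the whole lifetime
of the file, so a reader can never observe a ring that has been torn down
underneath it by a detach.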

Re: [PATCH] coredump: rename umh_pipe_setup() to coredump_pipe_setup()

2018-05-10 Thread Al Viro

On Thu, May 10, 2018 at 11:32:47PM +, Luis R. Rodriguez wrote:

> I think net-next makes sense if Al Viro is OK with that. This way it could go
> in regardless of the state of your series, but it also lines up with your 
> work.

Fine by me...

Re: [PATCH net] macmace: Set platform device coherent_dma_mask

2018-05-10 Thread Michael Schmitz

Hi Finn,

On Fri, May 11, 2018 at 11:55 AM, Finn Thain  wrote:

>> > What's worse, if you do pass a dma_mask in struct
>> > platform_device_info, you end up with this problem in
>> > platform_device_register_full():
>> >
>> > if (pdevinfo->dma_mask) {
>> > /*
>> >  * This memory isn't freed when the device is put,
>> >  * I don't have a nice idea for that though.  Conceptually
>> >  * dma_mask in struct device should not be a pointer.
>> >  * See http://thread.gmane.org/gmane.linux.kernel.pci/9081
>> >  */
>> > pdev->dev.dma_mask =
>> > kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);
>>
>> Maybe platform_device_register_full() should rather check whether
>> dev.coherent_dma_mask is set, and make dev.dma_mask point to that? This
>> is how we solved the warning issue for the Zorro bus devices...
>> (8614f1b58bd0e920a5859464a500b93152c5f8b1)
>>
>
> The claim in the comment above that a pointer is the wrong solution
> suggests that your proposal won't get far. Also, your proposal doesn't

I read the comment to be mostly concerned about not freeing memory,
and attempted to address that. I won't pretend it's the right thing to
do if the pointer will go away anyway, and I certainly won't submit a
patch. Sorry for muddling the issue.

> address the other issues I raised: a new
> platform_device_register_simple_dma() API would only have two callers, and
> the dma mask setup for device-tree probed platform devices is apparently a
> work-in-progress (which I don't want to churn up).

Yes, and that's why I would prefer your old patch handling this in the
device driver (which Geert didn't like), or in the alternative to set
the mask up when registering a device with its bus where appropriate.

I concede this won't help with pure platform devices but as we can't
test all these, we should leave the fix for platform devices up to
Christoph.

>
>> > > With people setting the mask to kill the WARNING splat, this may
>> > > become more common.
>> >
>> > Since the commit which introduced the WARNING, only commits f61e64310b75
>> > ("m68k: set dma and coherent masks for platform FEC ethernets") and
>> > 7bcfab202ca7 ("powerpc/macio: set a proper dma_coherent_mask") seem to be
>> > aimed at squelching that WARNING.
>> >
>> > (Am I missing any others?)
>>
>> Zorro devices :-)
>
> Right, I should add commit 55496d3fe2ac ("zorro: Set up z->dev.dma_mask
> for the DMA API") to that list.
>
>> Which begs the question: why can't you set up all Nubus bus devices' DMA
>> masks in nubus_device_register(), or nubus_add_board()?
>
> I am expecting to see the same WARNING from the nubus sonic driver but it
> hasn't happened yet, so I don't have a patch for it yet. In any case, the
> nubus fix would be a lot like the zorro bus fix, so I don't see a problem.

That's odd. But what I meant to say is that by setting up
dma_coherent_mask in nubus_add_board(), and pointing dma_mask to that,
you won't need any patches to Nubus device drivers.

I must be missing something else...
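
To make the "point dma_mask at the coherent mask" idea concrete, here is a
small userspace sketch. It is only a model: struct device_model and
model_bus_setup_dma() are hypothetical names, and the mask widths in the
usage are arbitrary examples, not statements about any particular bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two DMA mask fields of struct device. */
struct device_model {
	uint64_t *dma_mask;          /* pointer, as in the real structure */
	uint64_t coherent_dma_mask;  /* embedded storage */
};

/* Bus-level setup, in the spirit of doing this once when a board is added. */
static void model_bus_setup_dma(struct device_model *dev, unsigned int bits)
{
	assert(dev != NULL && bits >= 1 && bits <= 64);
	dev->coherent_dma_mask = (bits == 64) ? ~0ULL : (1ULL << bits) - 1;
	/* Reuse the embedded field as storage: no separate allocation to leak. */
	dev->dma_mask = &dev->coherent_dma_mask;
}

/* Returns the mask a driver would see through the dma_mask pointer. */
static uint64_t model_effective_mask(unsigned int bits)
{
	struct device_model dev = { NULL, 0 };

	model_bus_setup_dma(&dev, bits);
	return *dev.dma_mask;
}
```

Since both masks share storage, a driver sees a valid mask without carrying
any setup code of its own; the cost is that the two masks can no longer
differ.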

Cheers,

  Michael


>
> --

[PATCH net-next] udp: avoid refcount_t saturation in __udp_gso_segment()

2018-05-10 Thread Eric Dumazet

For some reason, Willem thought that the issue we fixed for TCP
in commit 7ec318feeed1 ("tcp: gso: avoid refcount_t warning from
tcp_gso_segment()") was not relevant for UDP GSO.

But syzbot found its way.

refcount_t: saturated; leaking memory.
WARNING: CPU: 0 PID: 10261 at lib/refcount.c:78 
refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 10261 Comm: syz-executor5 Not tainted 4.17.0-rc3+ #38
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 panic+0x22f/0x4de kernel/panic.c:184
 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
RSP: 0018:880196db6b90 EFLAGS: 00010282
RAX: 0026 RBX: ff01 RCX: c900040d9000
RDX: 4a29 RSI: 8160f6f1 RDI: 880196db66f0
RBP: 880196db6c78 R08: 8801b33d6740 R09: 0002
R10: 8801b33d6740 R11:  R12: 
R13:  R14: 880196db6c50 R15: 00020101
 refcount_add+0x1b/0x70 lib/refcount.c:102
 __udp_gso_segment+0xaa5/0xee0 net/ipv4/udp_offload.c:272
 udp4_ufo_fragment+0x592/0x7a0 net/ipv4/udp_offload.c:301
 inet_gso_segment+0x639/0x12b0 net/ipv4/af_inet.c:1342
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
 skb_gso_segment include/linux/netdevice.h:4050 [inline]
 validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3122
 __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3579
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3620
 neigh_direct_output+0x15/0x20 net/core/neighbour.c:1401
 neigh_output include/net/neighbour.h:483 [inline]
 ip_finish_output2+0xa5f/0x1840 net/ipv4/ip_output.c:229
 ip_finish_output+0x828/0xf80 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:277 [inline]
 ip_output+0x21b/0x850 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
 ip_send_skb+0x40/0xe0 net/ipv4/ip_output.c:1434
 udp_send_skb.isra.37+0x5eb/0x1000 net/ipv4/udp.c:825
 udp_push_pending_frames+0x5c/0xf0 net/ipv4/udp.c:853
 udp_v6_push_pending_frames+0x380/0x3e0 net/ipv6/udp.c:1105
 udp_lib_setsockopt+0x59a/0x600 net/ipv4/udp.c:2403
 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1447
 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: ad405857b174 ("udp: better wmem accounting on gso")
Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
Cc: Alexander Duyck 
Reported-by: syzbot 
---
 net/ipv4/udp_offload.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 
ede2a7305b90f789c748d911530453ec2cbbfab7..92dc9e5a7ff3d0a7509bfa2a66e9189c8341a5fa
 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -268,9 +268,17 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
 
/* update refcount for the packet */
-   if (copy_dtor)
-   refcount_add(sum_truesize - gso_skb->truesize,
-   &sk->sk_wmem_alloc);
+   if (copy_dtor) {
+   int delta = sum_truesize - gso_skb->truesize;
+
+   /* In some pathological cases, delta can be negative.
+* We need to either use refcount_add() or 
refcount_sub_and_test()
+*/
+   if (likely(delta >= 0))
+   refcount_add(delta, &sk->sk_wmem_alloc);
+   else
+   WARN_ON_ONCE(refcount_sub_and_test(-delta, 
&sk->sk_wmem_alloc));
+   }
return segs;
 }
 EXPORT_SYMBOL_GPL(__udp_gso_segment);
-- 
2.17.0.441.gb46fe60e1d-goog
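
The arithmetic behind the warning can be reproduced in userspace. The sketch
below is an illustration under assumptions: struct refcount_model is a
simplified saturating counter rather than the kernel's refcount_t, and the
truesize values in the usage are invented.

```c
#include <assert.h>
#include <limits.h>

/* Simplified saturating counter; not the kernel's refcount_t. */
struct refcount_model {
	unsigned int refs;
	int saturated;
};

static void model_refcount_add(unsigned int i, struct refcount_model *r)
{
	if (r->refs > UINT_MAX - i) {  /* would overflow: saturate instead */
		r->refs = UINT_MAX;
		r->saturated = 1;
		return;
	}
	r->refs += i;
}

/* Returns nonzero if the counter dropped to zero. */
static int model_refcount_sub_and_test(unsigned int i, struct refcount_model *r)
{
	assert(i <= r->refs);
	r->refs -= i;
	return r->refs == 0;
}

/* Old behaviour: the unsigned difference goes straight into the add. */
static struct refcount_model model_old(unsigned int wmem,
				       unsigned int sum_truesize,
				       unsigned int gso_truesize)
{
	struct refcount_model r = { wmem, 0 };

	/* Wraps to a huge value when sum_truesize < gso_truesize. */
	model_refcount_add(sum_truesize - gso_truesize, &r);
	return r;
}

/* Patched behaviour: compute a signed delta and pick add or sub. */
static struct refcount_model model_new(unsigned int wmem,
				       unsigned int sum_truesize,
				       unsigned int gso_truesize)
{
	struct refcount_model r = { wmem, 0 };
	/* Relies on the usual two's complement conversion, as the patch does. */
	int delta = (int)(sum_truesize - gso_truesize);

	if (delta >= 0)
		model_refcount_add((unsigned int)delta, &r);
	else
		(void)model_refcount_sub_and_test((unsigned int)-delta, &r);
	return r;
}
```

With a segment total smaller than the original truesize, the old path
saturates the counter while the patched path subtracts the difference.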

[PATCH bpf-next] samples/bpf: xdp_monitor, accept short options

2018-05-10 Thread Prashant Bhole

Update the optstring so that short options are accepted.

Signed-off-by: Prashant Bhole 
---
 samples/bpf/xdp_monitor_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 894bc64c2cac..668511c77aaf 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -594,7 +594,7 @@ int main(int argc, char **argv)
snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]);
 
/* Parse commands line args */
-   while ((opt = getopt_long(argc, argv, "h",
+   while ((opt = getopt_long(argc, argv, "hDSs:",
  long_options, &longindex)) != -1) {
switch (opt) {
case 'D':
-- 
2.13.6
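
To see why the optstring matters, a standalone sketch follows. It is not the
sample's code: struct model_opts, model_parse() and the default interval are
made up, and the long option names in the table are an assumption about what
the sample defines. Passing the optstring as a parameter lets the old and the
new value be compared directly.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result structure for the demo. */
struct model_opts {
	bool help, debug, stats, error;
	int interval;
};

/* Assumed long option table; the names are illustrative. */
static const struct option model_long_options[] = {
	{ "help",  no_argument,       NULL, 'h' },
	{ "debug", no_argument,       NULL, 'D' },
	{ "stats", no_argument,       NULL, 'S' },
	{ "sec",   required_argument, NULL, 's' },
	{ NULL,    0,                 NULL, 0   }
};

static struct model_opts model_parse(int argc, char **argv,
				     const char *optstring)
{
	struct model_opts o = { false, false, false, false, 2 };
	int opt, longindex = 0;

	assert(argc >= 1 && argv != NULL && optstring != NULL);
	optind = 1;  /* restart scanning so the function can be called again */
	opterr = 0;  /* keep the demo quiet on unknown options */
	while ((opt = getopt_long(argc, argv, optstring,
				  model_long_options, &longindex)) != -1) {
		switch (opt) {
		case 'h': o.help = true; break;
		case 'D': o.debug = true; break;
		case 'S': o.stats = true; break;
		case 's': o.interval = atoi(optarg); break;
		default:  o.error = true; break;  /* '?' for unknown options */
		}
	}
	return o;
}
```

Long options are matched through the table regardless of the optstring, but a
short option such as -D is only recognized if its letter appears there, and
the trailing colon in "s:" is what makes -s take an argument.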

Re: [PATCH] mlx4_core: allocate 4KB ICM chunks

2018-05-10 Thread Qing Huang


Thank you for reviewing it!


On 5/10/2018 6:23 PM, Yanjun Zhu wrote:
> On 2018/5/11 9:15, Qing Huang wrote:
>> On 5/10/2018 5:13 PM, Yanjun Zhu wrote:
>>> On 2018/5/11 7:31, Qing Huang wrote:
>>>> When a system is under memory pressure (high usage with fragments),
>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>> memory management to enter slow path doing memory compact/migration
>>>> ops in order to complete high order memory allocations.
>>>>
>>>> When that happens, user processes calling uverb APIs may get stuck
>>>> for more than 120s easily even though there are a lot of free pages
>>>> in smaller chunks available in the system.
>>>>
>>>> Syslog:
>>>> ...
>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>> ...
>>>>
>>>> With 4KB ICM chunk size, the above issue is fixed.
>>>>
>>>> However in order to support 4KB ICM chunk size, we need to fix another
>>>> issue in large size kcalloc allocations.
>>>>
>>>> E.g.
>>>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>>>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
>>>> entry). So we need a 16MB allocation for a table->icm pointer array to
>>>> hold 2M pointers which can easily cause kcalloc to fail.
>>>>
>>>> The solution is to use vzalloc to replace kcalloc. There is no need
>>>> for contiguous memory pages for a driver meta data structure (no need
>>>
>>> Hi,
>>>
>>> Replace continuous memory pages with virtual memory, is there any 
>>> performance loss?
>>
>> Not really. "table->icm" will be accessed as individual pointer 
>> variables randomly. Kcalloc
>
> Sure. Thanks. If "table->icm" will be accessed as individual pointer 
> variables randomly, the performance loss
> caused by discontinuous memory will be very trivial.
>
> Reviewed-by: Zhu Yanjun 
>
>> also returns a virtual address except its mapped pages are guaranteed 
>> to be contiguous
>> which will provide little advantage over vzalloc for individual 
>> pointer variable access.
>>
>> Qing
>>
>>> Zhu Yanjun
>>>
>>>> of DMA ops).

Signed-off-by: Qing Huang 
Acked-by: Daniel Jurgens 
---
  drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c 
b/drivers/net/ethernet/mellanox/mlx4/icm.c

index a822f7a..2b17a4b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,12 @@
  #include "fw.h"
    /*
- * We allocate in as big chunks as we can, up to a maximum of 256 KB
- * per chunk.
+ * We allocate in 4KB page size chunks to avoid high order memory
+ * allocations in fragmented/high usage memory situation.
   */
  enum {
-    MLX4_ICM_ALLOC_SIZE    = 1 << 18,
-    MLX4_TABLE_CHUNK_SIZE    = 1 << 18
+    MLX4_ICM_ALLOC_SIZE    = 1 << 12,
+    MLX4_TABLE_CHUNK_SIZE    = 1 << 12
  };
    static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct 
mlx4_icm_chunk *chunk)
@@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, 
struct mlx4_icm_table *table,

  obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
  num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
  -    table->icm  = kcalloc(num_icm, sizeof(*table->icm), 
GFP_KERNEL);

+    table->icm  = vzalloc(num_icm * sizeof(*table->icm));
  if (!table->icm)
  return -ENOMEM;
  table->virt = virt;
@@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, 
struct mlx4_icm_table *table,

  mlx4_free_icm(dev, table->icm[i], use_coherent);
  }
  -    kfree(table->icm);
+    vfree(table->icm);
    return -ENOMEM;
  }
@@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev 
*dev, struct mlx4_icm_table *table)

  mlx4_free_icm(dev, table->icm[i], table->coherent);
  }
  -    kfree(table->icm);
+    vfree(table->icm);
  }

Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-10 Thread Gao Feng

At 2018-05-11 08:55:47, "Eric Dumazet" eric.duma...@gmail.com wrote:
>
>
>On 05/10/2018 05:18 PM, Gao Feng wrote:
>> At 2018-05-10 21:02:55, "Eric Dumazet" eric.duma...@gmail.com wrote:
>>>
>>>
>>> On 05/10/2018 01:28 AM, gfree.w...@vip.163.com wrote:
>>>> From: Gao Feng gfree.w...@vip.163.com
>>>>
>>>> The skb flow limit is implemented for each CPU independently. In the
>>>> current codes, the function skb_flow_limit gets the softnet_data by
>>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>>> the stats of current CPU, while the skb is going to append the queue of
>>>> another CPU. It isn't the expected behavior.
>>>>
>>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>>>
>>>
>>> Please add a correct Fixes: tag
>>
>> Thanks Eric.
>>
>> I have one question about the "Fixes: tag".
>> Most of patches are bug fixes, but when need to add the "Fixes: tag",
>> and when not ?
>>
>> I'm not clear about it. Could you explain it please?
>>
>
>For this particular patch, since you have not CC Willem (author of the patch),
>I found very useful that you did a search to find out.
>Once you found which commit added the problem, simply add the Fixes: tag
>and CC: the author.
>
>Doing so saves us (stable teams, reviewers, maintainers) a lot of time
>really.

Normally I get the "to" list by get_maintainer.pl script, now I would save
the stable team ASAP.

>In my opinion, Fixes: tags should be mandatory when applicable.

Thanks your explanations, I get it.

Best Regards
Feng

>> Best Regards
>> Feng
>>
>>>
>>> By doing so, you will likely add a CC: tag to make sure the author
>>> of the code
>>> will receive your email and give feed back.
>>>
>>> Thanks !
Re: linux-next: Signed-off-by missing for commit in the net tree

2018-05-10 Thread Hangbin Liu

On Fri, May 11, 2018 at 07:17:16AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Commit
> 
>   0e8411e426e2 ("ipv4: reset fnhe_mtu_locked after cache route flushed")
> 
> is missing a Signed-off-by from its author.

Oops, my bad.

> After route cache is flushed via ipv4_sysctl_rtcache_flush(), we forget
> to reset fnhe_mtu_locked in rt_bind_exception(). When pmtu is updated
> in __ip_rt_update_pmtu(), it will return directly since the pmtu is
> still locked. e.g.
>
> + ip netns exec client ping 10.10.1.1 -c 1 -s 1400 -M do
> PING 10.10.1.1 (10.10.1.1) 1400(1428) bytes of data.
> From 10.10.0.254 icmp_seq=1 Frag needed and DF set (mtu = 0)
>
> --- 10.10.1.1 ping statistics ---
> 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

I shouldn't add comments with the '---' lines. David reminded me before. But
I didn't realise it when I pasted the ping logs. Another lesson learned...

Thanks Stephen.

Regards
Hangbin

Re: [PATCH net] tun: fix use after free for ptr_ring

2018-05-10 Thread Jason Wang




On 2018-05-11 02:08, Cong Wang wrote:
> On Tue, May 8, 2018 at 11:59 PM, Jason Wang wrote:
>> We used to initialize ptr_ring during TUNSETIFF, this is because its
>> size depends on the tx_queue_len of netdevice. And we try to clean it
>> up when socket were detached from netdevice. A race were spotted when
>> trying to do uninit during a read which will lead a use after free for
>> pointer ring. Solving this by always initialize a zero size ptr_ring
>> in open() and do resizing during TUNSETIFF, and then we can safely do
>> cleanup during close(). With this, there's no need for the workaround
>> that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
>> for tfile->tx_array").
>
> Ah, I didn't know ptr_ring_init(0) could work... Nice patch!
> Except one thing below.
>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index ef33950..298cb96 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -681,15 +681,6 @@ static void tun_queue_purge(struct tun_file *tfile)
>>         skb_queue_purge(&tfile->sk.sk_error_queue);
>>   }
>>
>> -static void tun_cleanup_tx_ring(struct tun_file *tfile)
>> -{
>> -       if (tfile->tx_ring.queue) {
>> -               ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
>> -               xdp_rxq_info_unreg(&tfile->xdp_rxq);
>> -               memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
>> -       }
>> -}
>
> I don't think you can totally remove ptr_ring_cleanup(), it should be
> called unconditionally with your ptr_ring_init(0) trick, right?

Right, my bad. Actually I do intend to clean it up at close(), as the
commit log said.

Will send v2.

Thanks

Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-10 Thread Eric Dumazet



On 05/10/2018 05:18 PM, Gao Feng wrote:
> At 2018-05-10 21:02:55, "Eric Dumazet"  wrote:
>>
>>
>> On 05/10/2018 01:28 AM, gfree.w...@vip.163.com wrote:
>>> From: Gao Feng 
>>>
>>> The skb flow limit is implemented for each CPU independently. In the
>>> current codes, the function skb_flow_limit gets the softnet_data by
>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>> the stats of current CPU, while the skb is going to append the queue of
>>> another CPU. It isn't the expected behavior.
>>>
>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>>
>>
>> Please add a correct Fixes: tag
> 
> Thanks Eric.
> 
> I have one question about the "Fixes: tag".
> Most of patches are bug fixes, but when need to add the "Fixes: tag", and 
> when not ?
> 
> I'm not clear about it. Could you explain it please?
> 

For this particular patch, since you have not CC Willem (author of the patch),
I found very useful that you did a search to find out.
Once you found which commit added the problem, simply add the Fixes: tag and 
CC: the author.

Doing so saves us (stable teams, reviewers, maintainers) a lot of time really.

In my opinion, Fixes: tags should be mandatory when applicable.

> Best Regards
> Feng
> 
>>
>> By doing so, you will likely add a CC: tag to make sure the author of the 
>> code
>> will receive your email and give feed back.
>>
>> Thanks !
>>
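
For readers skimming the thread, the mismatch described in the quoted commit message can be modelled in user space. This is only a sketch: struct softnet_model, per_cpu_model and both helpers are hypothetical names, and the real skb_flow_limit() keeps per-flow history that is left out here. It shows a check that reads the running CPU's counters next to one that takes the target CPU's data as a parameter.

```c
#include <stddef.h>

#define NR_CPUS_MODEL 4	/* illustrative CPU count, not from the thread */

/* Hypothetical stand-in for the per-CPU softnet_data flow counters. */
struct softnet_model {
	unsigned int flow_hits;		/* packets seen for the busiest flow */
	unsigned int flow_limit;	/* threshold above which we would drop */
};

static struct softnet_model per_cpu_model[NR_CPUS_MODEL];

/* Shape of the reported problem: always consult the CPU we are running on. */
static int flow_limit_current(size_t this_cpu)
{
	const struct softnet_model *sd = &per_cpu_model[this_cpu];

	return sd->flow_hits >= sd->flow_limit;
}

/* Shape of the proposed change: the caller passes the target CPU's data. */
static int flow_limit_target(const struct softnet_model *sd)
{
	return sd->flow_hits >= sd->flow_limit;
}
```

With RPS steering a packet from CPU 0 to a saturated CPU 1, the first helper looks at CPU 0 and lets the packet through, while the second sees CPU 1's counters and reports the limit as reached.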

Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path

2018-05-10 Thread Eric Dumazet



On 05/10/2018 05:38 PM, Sean Tranchetti wrote:
> Using GSO in the UDP path on a device with
> scatter-gather netdevice feature disabled will result in a kernel
> panic with the following call stack:
>
> This panic is the result of allocating SKBs with small size
> for the newly segmented SKB. If the scatter-gather feature is
> disabled, the code attempts to call skb_put() on the small SKB
> with an argument of nearly the entire unsegmented SKB length.
> 
> After this patch, attempting to use GSO with scatter-gather
> disabled will result in -EINVAL being returned.
> 
> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
> Signed-off-by: Sean Tranchetti 
> Signed-off-by: Subash Abhinov Kasiviswanathan 
> ---
>  net/ipv4/ip_output.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index b5e21eb..0d63690 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1054,8 +1054,16 @@ static int __ip_append_data(struct sock *sk,
>   copy = length;
>  
>   if (!(rt->dst.dev->features&NETIF_F_SG)) {
> + struct sk_buff *tmp;
>   unsigned int off;
>  
> + if (paged) {
> + err = -EINVAL;
> + while ((tmp = __skb_dequeue(queue)) != NULL)
> + kfree(tmp);
> + goto error;
> + }
> +
>   off = skb->len;
>   if (getfrag(from, skb_put(skb, copy),
>   offset, copy, off, skb) < 0) {
> 


Hmm, no, we absolutely need to fix GSO instead.

Think of a bonding device (or any virtual device): your patch won't avoid
the crash.

[PATCH net-next] udp: Fix kernel panic in UDP GSO path

2018-05-10 Thread Sean Tranchetti

Using GSO in the UDP path on a device with
scatter-gather netdevice feature disabled will result in a kernel
panic with the following call stack:

kernel BUG at net/core/skbuff.c:104!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
PC is at skb_panic+0x4c/0x54
LR is at skb_panic+0x4c/0x54
Process udpgso_bench_tx (pid: 4078, stack limit = 0xff8048de8000)
[] skb_panic+0x4c/0x54
[] skb_copy_bits+0x0/0x244
[] __ip_append_data+0x230/0x814
[] ip_make_skb+0xe4/0x178
[] udp_sendmsg+0x828/0x888
[] inet_sendmsg+0xe4/0x130
[] ___sys_sendmsg+0x1d8/0x2c0
[] SyS_sendmsg+0x90/0xe0

This panic is the result of allocating SKBs with small size
for the newly segmented SKB. If the scatter-gather feature is
disabled, the code attempts to call skb_put() on the small SKB
with an argument of nearly the entire unsegmented SKB length.

After this patch, attempting to use GSO with scatter-gather
disabled will result in -EINVAL being returned.

Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
Signed-off-by: Sean Tranchetti 
Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 net/ipv4/ip_output.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b5e21eb..0d63690 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1054,8 +1054,16 @@ static int __ip_append_data(struct sock *sk,
copy = length;
 
if (!(rt->dst.dev->features&NETIF_F_SG)) {
+   struct sk_buff *tmp;
unsigned int off;
 
+   if (paged) {
+   err = -EINVAL;
+   while ((tmp = __skb_dequeue(queue)) != NULL)
+   kfree(tmp);
+   goto error;
+   }
+
off = skb->len;
if (getfrag(from, skb_put(skb, copy),
offset, copy, off, skb) < 0) {
-- 
1.9.1
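
The overflow described in the commit message can be illustrated with a small user-space model. This is a sketch under assumptions, not kernel code: struct buf_model and buf_model_put() are hypothetical, and the real skb_put() triggers skb_over_panic() (hence the skb_panic frame in the trace) rather than returning an error.

```c
#include <stddef.h>

/* Hypothetical stand-in for the linear-buffer bookkeeping of an skb. */
struct buf_model {
	size_t tail;	/* bytes already used in the linear area */
	size_t end;	/* bytes allocated for the linear area */
};

/*
 * Reserve len more bytes. Returns 0 on success, or -1 in the case where
 * the kernel's skb_put() would hit its over-panic path because the
 * request does not fit in the allocated buffer.
 */
static int buf_model_put(struct buf_model *b, size_t len)
{
	if (len > b->end - b->tail)
		return -1;
	b->tail += len;
	return 0;
}
```

A buffer sized for one small segment rejects a put of almost the whole unsegmented payload, which is the situation the patch tries to turn into -EINVAL.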

Re: [PATCH v6 6/6] MIPS: Boston: Adjust DT for pch_gbe PHY support

2018-05-10 Thread Andrew Lunn

>   eg20t_mac@2,0,1 {
>   compatible = "pci8086,8802";
>   reg = <0x00020100 0 0 0 0>;
> - phy-reset-gpios = <&eg20t_gpio 6
> -GPIO_ACTIVE_LOW>;
> +
> + #address-cells = <1>;
> + #size-cells = <0>;

It is generally a good idea to add an 'mdio' container node that holds
the PHYs. You then pass this container node to of_mdiobus_register().

> +
> + ethernet-phy@0 {
> + compatible = "ethernet-phy-id001c.c915";
> + reg = <0>;
> + reset-gpios = <&eg20t_gpio 6 GPIO_ACTIVE_LOW>;
> + reset-assert-us = <25000>;
> + reset-deassert-us = <25000>;
> + };

  Andrew

Re: [PATCH v6 6/6] MIPS: Boston: Adjust DT for pch_gbe PHY support

2018-05-10 Thread Andrew Lunn

> + ethernet-phy@0 {
> + compatible = "ethernet-phy-id001c.c915";

You only need to specify the compatible string like this if the PHY
has its own ID wrong. The AT802x gets this right, so you don't need
this.

Andrew

Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()

2018-05-10 Thread Andrew Lunn

On Thu, May 10, 2018 at 04:16:52PM -0700, Paul Burton wrote:
> From: Andrew Lunn 
> 
> On some boards, this PHY has a problem when it hibernates. Export this
> function to a board can register a PHY fixup to disable hibernation.

What do you know about the problem?

https://patchwork.ozlabs.org/patch/686371/

I don't remember how it was solved, but you should probably do the
same.

Andrew

Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-10 Thread Gao Feng

At 2018-05-10 21:02:55, "Eric Dumazet"  wrote:
>
>
>On 05/10/2018 01:28 AM, gfree.w...@vip.163.com wrote:
>> From: Gao Feng 
>> 
>> The skb flow limit is implemented for each CPU independently. In the
>> current codes, the function skb_flow_limit gets the softnet_data by
>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>> the stats of current CPU, while the skb is going to append the queue of
>> another CPU. It isn't the expected behavior.
>> 
>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>
>
>Please add a correct Fixes: tag

Thanks Eric.

I have one question about the "Fixes: tag".
Most of patches are bug fixes, but when need to add the "Fixes: tag", and when 
not ?

I'm not clear about it. Could you explain it please?

Best Regards
Feng

>
>By doing so, you will likely add a CC: tag to make sure the author of the code
>will receive your email and give feed back.
>
>Thanks !
>

Re: [PATCH] mlx4_core: allocate 4KB ICM chunks

2018-05-10 Thread Yanjun Zhu




On 2018/5/11 7:31, Qing Huang wrote:
> When a system is under memory presure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter slow path doing memory compact/migration
> ops in order to complete high order memory allocations.
>
> When that happens, user processes calling uverb APIs may get stuck
> for more than 120s easily even though there are a lot of free pages
> in smaller chunks available in the system.
>
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
>
> With 4KB ICM chunk size, the above issue is fixed.
>
> However in order to support 4KB ICM chunk size, we need to fix another
> issue in large size kcalloc allocations.
>
> E.g.
> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
> entry). So we need a 16MB allocation for a table->icm pointer array to
> hold 2M pointers which can easily cause kcalloc to fail.
>
> The solution is to use vzalloc to replace kcalloc. There is no need
> for contiguous memory pages for a driver meta data structure (no need

Hi,

Replace continuous memory pages with virtual memory, is there any 
performance loss?

Zhu Yanjun

> of DMA ops).
>
> Signed-off-by: Qing Huang 
> Acked-by: Daniel Jurgens 
> ---
>  drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c 
> b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index a822f7a..2b17a4b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,12 @@
>  #include "fw.h"
>  
>  /*
> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
> - * per chunk.
> + * We allocate in 4KB page size chunks to avoid high order memory
> + * allocations in fragmented/high usage memory situation.
>   */
>  enum {
> -   MLX4_ICM_ALLOC_SIZE = 1 << 18,
> -   MLX4_TABLE_CHUNK_SIZE   = 1 << 18
> +   MLX4_ICM_ALLOC_SIZE = 1 << 12,
> +   MLX4_TABLE_CHUNK_SIZE   = 1 << 12
>  };
>  
>  static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk 
> *chunk)
> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct 
> mlx4_icm_table *table,
> obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
> num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>  
> -   table->icm  = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
> +   table->icm  = vzalloc(num_icm * sizeof(*table->icm));
> if (!table->icm)
> return -ENOMEM;
> table->virt = virt;
> @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct 
> mlx4_icm_table *table,
> mlx4_free_icm(dev, table->icm[i], use_coherent);
> }
>  
> -   kfree(table->icm);
> +   vfree(table->icm);
>  
> return -ENOMEM;
>  }
> @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct 
> mlx4_icm_table *table)
> mlx4_free_icm(dev, table->icm[i], table->coherent);
> }
>  
> -   kfree(table->icm);
> +   vfree(table->icm);
>  }

Re: [PATCH bpf-next 0/7] bpf: add perf event reading loop and move samples closer to libbpf

2018-05-10 Thread Daniel Borkmann

On 05/10/2018 07:24 PM, Jakub Kicinski wrote:
> Hi!
> 
> This series started out as a follow up to the bpftool perf event dumping
> patches.
> 
> As suggested by Daniel patch 1 makes use of PERF_SAMPLE_TIME to simplify
> code and improve accuracy of timestamps.
> 
> Remaining patches are trying to move perf event loop into libbpf as
> suggested by Alexei.  One user for this new function is bpftool which
> links with libbpf nicely, the other, unfortunately, is in samples/bpf.
> Remaining patches make samples/bpf link against full libbpf.a (not just
> a handful of objects).  Once we have full power of libbpf at our disposal
> we can convert some of XDP samples to use libbpf loader instead of
> bpf_load.c.  My understanding is that this is the desired direction,
> at least for networking code.

Looks good, applied to bpf-next, thanks Jakub!

[RFC PATCH] ipv6: sr: lwt_seg6local_verifier_ops can be static

2018-05-10 Thread kbuild test robot


Fixes: e7d82c64d15a ("ipv6: sr: Add seg6local action End.BPF")
Signed-off-by: Fengguang Wu 
---
 filter.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index ce10f20..9e47c86 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6195,13 +6195,13 @@ const struct bpf_prog_ops lwt_xmit_prog_ops = {
.test_run   = bpf_prog_test_run_skb,
 };
 
-const struct bpf_verifier_ops lwt_seg6local_verifier_ops = {
+static const struct bpf_verifier_ops lwt_seg6local_verifier_ops = {
.get_func_proto = lwt_seg6local_func_proto,
.is_valid_access= lwt_is_valid_access,
.convert_ctx_access = bpf_convert_ctx_access,
 };
 
-const struct bpf_prog_ops lwt_seg6local_prog_ops = {
+static const struct bpf_prog_ops lwt_seg6local_prog_ops = {
.test_run   = bpf_prog_test_run_skb,
 };

Re: [PATCH bpf-next v4 5/6] ipv6: sr: Add seg6local action End.BPF

2018-05-10 Thread kbuild test robot

Hi Mathieu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:
https://github.com/0day-ci/linux/commits/Mathieu-Xhonneux/ipv6-sr-introduce-seg6local-End-BPF-action/20180511-032546
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   net/core/filter.c:112:48: sparse: expression using sizeof(void)
   net/core/filter.c:112:48: sparse: expression using sizeof(void)
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:406:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:409:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:412:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:415:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:418:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:481:27: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:484:27: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:487:27: sparse: subtraction of functions? Share your drugs
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   net/core/filter.c:1368:39: sparse: incorrect type in argument 1 (different 
address spaces) @@expected struct sock_filter const *filter @@got 
struct sockstruct sock_filter const *filter @@
   net/core/filter.c:1368:39:expected struct sock_filter const *filter
   net/core/filter.c:1368:39:got struct sock_filter [noderef] *filter
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   net/core/filter.c:1470:39: sparse: incorrect type in argument 1 (different 
address spaces) @@expected struct sock_filter const *filter @@got 
struct sockstruct sock_filter const *filter @@
   net/core/filter.c:1470:39:expected struct sock_filter const *filter
   net/core/filter.c:1470:39:got struct sock_filter [noderef] *filter
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   net/core/filter.c:1772:43: sparse: incorrect type in argument 2 (different 
base types) @@expected restricted __wsum [usertype] diff @@got unsigned 
lonrestricted __wsum [usertype] diff @@
   net/core/filter.c:1772:43:expected restricted __wsum [usertype] diff
   net/core/filter.c:1772:43:got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1775:36: sparse: incorrect type in argument 2 (different 
base types) @@expected restricted __be16 [usertype] old @@got unsigned 
lonrestricted __be16 [usertype] old @@
   net/core/filter.c:1775:36:expected restricted __be16 [usertype] old
   net/core/filter.c:1775:36:got unsigned long long [unsigned] [usertype] 
from
   net/core/filter.c:1775:42: sparse: incorrect type in argument 3 (different 
base types) @@expected restricted __be16 [usertype] new @@got unsigned 
lonrestricted __be16 [usertype] new @@
   net/core/filter.c:1775:42:expected restricted __be16 [usertype] new
   net/core/filter.c:1775:42:got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1778:36: sparse: incorrect type in argument 2 (different 
base types) @@expected restricted __be32 [usertype] from @@got unsigned 
lonrestricted __be32 [usertype] from @@

Re: [PATCH net] macmace: Set platform device coherent_dma_mask

2018-05-10 Thread Finn Thain

On Fri, 11 May 2018, Michael Schmitz wrote:

> > > Perhaps you can add a new helper 
> > > (platform_device_register_simple_dma()?) that takes the DMA mask, 
> > > too?
...
> >
> > So far, it looks like macmace and macsonic would be the only callers 
> > of this new API call.
> >
> > What's worse, if you do pass a dma_mask in struct 
> > platform_device_info, you end up with this problem in 
> > platform_device_register_full():
> >
> > if (pdevinfo->dma_mask) {
> > /*
> >  * This memory isn't freed when the device is put,
> >  * I don't have a nice idea for that though.  Conceptually
> >  * dma_mask in struct device should not be a pointer.
> >  * See http://thread.gmane.org/gmane.linux.kernel.pci/9081
> >  */
> > pdev->dev.dma_mask =
> > kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);
> 
> Maybe platform_device_register_full() should rather check whether 
> dev.coherent_dma_mask is set, and make dev.dma_mask point to that? This 
> is how we solved the warning issue for the Zorro bus devices... 
> (8614f1b58bd0e920a5859464a500b93152c5f8b1)
> 

The claim in the comment above that a pointer is the wrong solution 
suggests that your proposal won't get far. Also, your proposal doesn't 
address the other issues I raised: a new 
platform_device_register_simple_dma() API would only have two callers, and 
the dma mask setup for device-tree probed platform devices is apparently a 
work-in-progress (which I don't want to churn up).

> > > With people setting the mask to kill the WARNING splat, this may 
> > > become more common.
> >
> > Since the commit which introduced the WARNING, only commits f61e64310b75
> > ("m68k: set dma and coherent masks for platform FEC ethernets") and
> > 7bcfab202ca7 ("powerpc/macio: set a proper dma_coherent_mask") seem to be
> > aimed at squelching that WARNING.
> >
> > (Am I missing any others?)
> 
> Zorro devices :-)

Right, I should add commit 55496d3fe2ac ("zorro: Set up z->dev.dma_mask 
for the DMA API") to that list.

> Which begs the question: why can' you set up all Nubus bus devices' DMA 
> masks in nubus_device_register(), or nubus_add_board()?

I am expecting to see the same WARNING from the nubus sonic driver but it 
hasn't happened yet, so I don't have a patch for it yet. In any case, the 
nubus fix would be a lot like the zorro bus fix, so I don't see a problem.
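
To make the pointer arrangement under discussion concrete, here is a user-space sketch of "point dev.dma_mask at dev.coherent_dma_mask when only the latter is set". It is a model under assumptions: struct dev_model and dev_model_default_dma_mask() are hypothetical names, and only the DMA_BIT_MASK() definition is meant to match the kernel macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() helper. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical cut-down view of the two DMA mask fields in struct device. */
struct dev_model {
	uint64_t *dma_mask;		/* a pointer, as in the kernel */
	uint64_t coherent_dma_mask;	/* a plain value */
};

/*
 * If a coherent mask was provided but no streaming mask pointer, reuse
 * the coherent mask's storage instead of allocating separate memory.
 */
static void dev_model_default_dma_mask(struct dev_model *dev)
{
	if (dev->coherent_dma_mask && !dev->dma_mask)
		dev->dma_mask = &dev->coherent_dma_mask;
}
```

Since the pointer aliases a field of the same structure, nothing needs to be freed when the device goes away, which is the drawback the comment in platform_device_register_full() complains about for the kmalloc() approach.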

--

[PATCH v6 6/6] MIPS: Boston: Adjust DT for pch_gbe PHY support

2018-05-10 Thread Paul Burton

The pch_gbe driver support for PHY reset GPIOs is now provided by the
standard phylib infrastructure, using a standard PHY binding. Adjust the
Boston devicetree to make use of the standard PHY binding.

This is possible because we bundle the DT along with the kernel binary
into a Flattened Image Tree, so the DT and kernel are always shipped
together for the Boston platform.

Signed-off-by: Paul Burton 
Cc: Andrew Lunn 
Cc: David S. Miller 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org

---

Changes in v6:
- New patch.

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 arch/mips/boot/dts/img/boston.dts | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/mips/boot/dts/img/boston.dts 
b/arch/mips/boot/dts/img/boston.dts
index 65af3f6ba81c..cb55f7ba20c3 100644
--- a/arch/mips/boot/dts/img/boston.dts
+++ b/arch/mips/boot/dts/img/boston.dts
@@ -144,8 +144,17 @@
eg20t_mac@2,0,1 {
compatible = "pci8086,8802";
reg = <0x00020100 0 0 0 0>;
-   phy-reset-gpios = <&eg20t_gpio 6
-  GPIO_ACTIVE_LOW>;
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ethernet-phy@0 {
+   compatible = "ethernet-phy-id001c.c915";
+   reg = <0>;
+   reset-gpios = <&eg20t_gpio 6 GPIO_ACTIVE_LOW>;
+   reset-assert-us = <25000>;
+   reset-deassert-us = <25000>;
+   };
};
 
eg20t_gpio: eg20t_gpio@2,0,2 {
-- 
2.17.0

Re: [PATCH bpf-next] selftests/bpf: Fix bash reference in Makefile

2018-05-10 Thread Daniel Borkmann

On 05/11/2018 12:26 AM, Joe Stringer wrote:
> '|& ...' is a bash 4.0+ construct which is not guaranteed to be available
> when using '$(shell ...)' in a Makefile. Fall back to the more portable
> '2>&1 | ...'.
> 
> Fixes the following warning during compilation:
> 
>   /bin/sh: 1: Syntax error: "&" unexpected
> 
> Signed-off-by: Joe Stringer 

Applied to bpf-next, thanks Joe!

Re: [PATCH] coredump: rename umh_pipe_setup() to coredump_pipe_setup()

2018-05-10 Thread Luis R. Rodriguez

On Thu, May 10, 2018 at 04:19:09PM -0700, Alexei Starovoitov wrote:
> On Mon, May 07, 2018 at 04:30:02PM -0700, Luis R. Rodriguez wrote:
> > This makes it clearer this code is part of the coredump code, and
> > is not an exported generic helper from kernel/umh.c.
> > 
> > Signed-off-by: Luis R. Rodriguez 
> > ---
> >  fs/coredump.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index 1e2c87acac9b..566504781683 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -508,7 +508,7 @@ static void wait_for_dump_helpers(struct file *file)
> >  }
> >  
> >  /*
> > - * umh_pipe_setup
> > + * coredump_pipe_setup
> >   * helper function to customize the process used
> >   * to collect the core in userspace.  Specifically
> >   * it sets up a pipe and installs it as fd 0 (stdin)
> > @@ -518,7 +518,7 @@ static void wait_for_dump_helpers(struct file *file)
> >   * is a special value that we use to trap recursive
> >   * core dumps
> >   */
> > -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> > +static int coredump_pipe_setup(struct subprocess_info *info, struct cred *new)
> 
> I think this renaming makes sense.
> How do we want to proceed?
> I can take it as part of my series and get the whole thing through net-next
> or folks want to apply this separately?

I think net-next makes sense if Al Viro is OK with that. This way it could go
in regardless of the state of your series, but it also lines up with your work.

  Luis

[PATCH v6 5/6] net: pch_gbe: Allow build on MIPS platforms

2018-05-10 Thread Paul Burton

Allow the pch_gbe driver to be built on MIPS platforms, allowing its use
on the MIPS Boston development board.

Signed-off-by: Paul Burton 
Cc: Andrew Lunn 
Cc: David S. Miller 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org

---

Changes in v6:
- None.

Changes in v5:
- None.

Changes in v4:
- None.

Changes in v3:
- None.

Changes in v2:
- None.

 drivers/net/ethernet/oki-semi/pch_gbe/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig 
b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
index 045256e99586..bf85c44fb7e5 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
@@ -4,7 +4,7 @@
 
 config PCH_GBE
tristate "OKI SEMICONDUCTOR IOH(ML7223/ML7831) GbE"
-   depends on PCI && (X86_32 || COMPILE_TEST)
+   depends on PCI && (X86_32 || MIPS || COMPILE_TEST)
select PTP_1588_CLOCK_PCH
select NET_PTP_CLASSIFY
select AT803X_PHY
-- 
2.17.0

[PATCH] mlx4_core: allocate 4KB ICM chunks

2018-05-10 Thread Qing Huang

When a system is under memory pressure (high usage with fragments),
the original 256KB ICM chunk allocations will likely trigger kernel
memory management to enter slow path doing memory compact/migration
ops in order to complete high order memory allocations.

When that happens, user processes calling uverb APIs may get stuck
for more than 120s easily even though there are a lot of free pages
in smaller chunks available in the system.

Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
oracle_205573_e:205573 blocked for more than 120 seconds.
...

With 4KB ICM chunk size, the above issue is fixed.

However in order to support 4KB ICM chunk size, we need to fix another
issue in large size kcalloc allocations.

E.g.
Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
entry). So we need a 16MB allocation for a table->icm pointer array to
hold 2M pointers which can easily cause kcalloc to fail.

The solution is to use vzalloc to replace kcalloc. There is no need
for contiguous memory pages for a driver meta data structure (no need
of DMA ops).

Signed-off-by: Qing Huang 
Acked-by: Daniel Jurgens 
---
 drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c 
b/drivers/net/ethernet/mellanox/mlx4/icm.c
index a822f7a..2b17a4b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,12 @@
 #include "fw.h"
 
 /*
- * We allocate in as big chunks as we can, up to a maximum of 256 KB
- * per chunk.
+ * We allocate in 4KB page size chunks to avoid high order memory
+ * allocations in fragmented/high usage memory situation.
  */
 enum {
-   MLX4_ICM_ALLOC_SIZE = 1 << 18,
-   MLX4_TABLE_CHUNK_SIZE   = 1 << 18
+   MLX4_ICM_ALLOC_SIZE = 1 << 12,
+   MLX4_TABLE_CHUNK_SIZE   = 1 << 12
 };
 
 static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk 
*chunk)
@@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct 
mlx4_icm_table *table,
obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
 
-   table->icm  = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
+   table->icm  = vzalloc(num_icm * sizeof(*table->icm));
if (!table->icm)
return -ENOMEM;
table->virt = virt;
@@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct 
mlx4_icm_table *table,
mlx4_free_icm(dev, table->icm[i], use_coherent);
}
 
-   kfree(table->icm);
+   vfree(table->icm);
 
return -ENOMEM;
 }
@@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct 
mlx4_icm_table *table)
mlx4_free_icm(dev, table->icm[i], table->coherent);
}
 
-   kfree(table->icm);
+   vfree(table->icm);
 }
-- 
2.9.3

Re: [bpf-next v3 0/9] bpf: Add helper to do FIB lookups

2018-05-10 Thread Daniel Borkmann

On 05/10/2018 05:34 AM, David Ahern wrote:
> Provide a helper for doing a FIB and neighbor lookup in the kernel
> tables from an XDP program. The helper provides a fastpath for forwarding
> packets. If the packet is a local delivery or for any reason is not a
> simple lookup and forward, the packet is expected to continue up the stack
> for full processing.
> 
> The response from a FIB and neighbor lookup is either the egress index
> with the bpf_fib_lookup struct filled in with dmac and gateway or
> 0 meaning the packet should continue up the stack. In time we can
> revisit this to return the FIB lookup result errno if it is one of the
> special RTN_'s such as RTN_BLACKHOLE (-EINVAL) so that the XDP
> programs can do an early drop if desired.
> 
> Patches 1-6 do some more refactoring to IPv6 with the end goal of
> extracting a FIB lookup function that aligns with fib_lookup for IPv4,
> basically returning a fib6_info without creating a dst based entry.
> 
> Patch 7 adds lookup functions to the ipv6 stub. These are needed since
> bpf is built into the kernel and ipv6 may not be built or loaded.
> 
> Patch 8 adds the bpf helper and 9 adds a sample program.
> 
> v3
> - remove ETH_ALEN and in6_addr from uapi header

Applied to bpf-next, thanks David!

[PATCH v6 4/6] ptp: pch: Allow build on MIPS platforms

2018-05-10 Thread Paul Burton

Allow the ptp_pch driver to be built on MIPS platforms in preparation
for use on the MIPS Boston board.

Signed-off-by: Paul Burton 
Acked-by: Richard Cochran 
Cc: Andrew Lunn 
Cc: David S. Miller 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org

---

Changes in v6: None
Changes in v5:
- Newly included in this series to satisfy Kconfig.

Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/ptp/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index a21ad10d613c..8618982ab96a 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -90,7 +90,7 @@ config DP83640_PHY
 
 config PTP_1588_CLOCK_PCH
tristate "Intel PCH EG20T as PTP clock"
-   depends on X86_32 || COMPILE_TEST
+   depends on X86_32 || MIPS || COMPILE_TEST
depends on HAS_IOMEM && NET
imply PTP_1588_CLOCK
help
-- 
2.17.0

[net 2/3] net/mlx5: E-Switch, Include VF RDMA stats in vport statistics

2018-05-10 Thread Saeed Mahameed

From: Adi Nissim 

The host side reporting of VF vport statistics didn't include the VF
RDMA traffic.
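
As a rough illustration of the fix, the sketch below sums Ethernet and IB (RDMA) counters the way the patched code does. The struct and function names are hypothetical stand-ins for the firmware counter layout read via MLX5_GET_CTR(); only the summation pattern is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one direction's packet counters. */
struct vport_pkt_counters {
	uint64_t eth_unicast;
	uint64_t eth_multicast;
	uint64_t eth_broadcast;
	uint64_t ib_unicast;
	uint64_t ib_multicast;
};

/* Total packets: Ethernet plus IB unicast/multicast, plus Ethernet
 * broadcast (the patch adds no IB broadcast counter). */
uint64_t vport_total_packets(const struct vport_pkt_counters *c)
{
	assert(c != NULL);
	return c->eth_unicast + c->ib_unicast +
	       c->eth_multicast + c->ib_multicast +
	       c->eth_broadcast;
}

/* Multicast packets: Ethernet plus IB. */
uint64_t vport_multicast_packets(const struct vport_pkt_counters *c)
{
	assert(c != NULL);
	return c->eth_multicast + c->ib_multicast;
}
```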

Fixes: 3b751a2a418a ("net/mlx5: E-Switch, Introduce get vf statistics")
Signed-off-by: Adi Nissim 
Reported-by: Ariel Almog 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 332bc56306bf..1352d13eedb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -2175,26 +2175,35 @@ int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
memset(vf_stats, 0, sizeof(*vf_stats));
vf_stats->rx_packets =
MLX5_GET_CTR(out, received_eth_unicast.packets) +
+   MLX5_GET_CTR(out, received_ib_unicast.packets) +
MLX5_GET_CTR(out, received_eth_multicast.packets) +
+   MLX5_GET_CTR(out, received_ib_multicast.packets) +
MLX5_GET_CTR(out, received_eth_broadcast.packets);
 
vf_stats->rx_bytes =
MLX5_GET_CTR(out, received_eth_unicast.octets) +
+   MLX5_GET_CTR(out, received_ib_unicast.octets) +
MLX5_GET_CTR(out, received_eth_multicast.octets) +
+   MLX5_GET_CTR(out, received_ib_multicast.octets) +
MLX5_GET_CTR(out, received_eth_broadcast.octets);
 
vf_stats->tx_packets =
MLX5_GET_CTR(out, transmitted_eth_unicast.packets) +
+   MLX5_GET_CTR(out, transmitted_ib_unicast.packets) +
MLX5_GET_CTR(out, transmitted_eth_multicast.packets) +
+   MLX5_GET_CTR(out, transmitted_ib_multicast.packets) +
MLX5_GET_CTR(out, transmitted_eth_broadcast.packets);
 
vf_stats->tx_bytes =
MLX5_GET_CTR(out, transmitted_eth_unicast.octets) +
+   MLX5_GET_CTR(out, transmitted_ib_unicast.octets) +
MLX5_GET_CTR(out, transmitted_eth_multicast.octets) +
+   MLX5_GET_CTR(out, transmitted_ib_multicast.octets) +
MLX5_GET_CTR(out, transmitted_eth_broadcast.octets);
 
vf_stats->multicast =
-   MLX5_GET_CTR(out, received_eth_multicast.packets);
+   MLX5_GET_CTR(out, received_eth_multicast.packets) +
+   MLX5_GET_CTR(out, received_ib_multicast.packets);
 
vf_stats->broadcast =
MLX5_GET_CTR(out, received_eth_broadcast.packets);
-- 
2.14.3

[PATCH v6 3/6] net: pch_gbe: Support DeviceTree for MDIO/PHY description

2018-05-10 Thread Paul Burton

When running on a system which uses device tree, use
of_mdiobus_register() rather than plain mdiobus_register() in order to
support parsing PHY information from the DT.

On systems without CONFIG_OF_MDIO set, of_mdiobus_register() falls back
to mdiobus_register() anyway, but here we check for a non-NULL device
node in order to continue functioning as-is if a system has
CONFIG_OF_MDIO=y but doesn't use the devicetree.
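
The decision described above can be modelled in user space as follows. This is only a sketch: the types and names are hypothetical stand-ins, not the kernel's struct device_node or MDIO API, and it shows nothing beyond the "non-NULL node selects the DT path" rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree node. */
struct demo_device_node {
	const char *name;
};

enum demo_mdio_path {
	DEMO_MDIO_PLAIN,	/* would call mdiobus_register() */
	DEMO_MDIO_OF,		/* would call of_mdiobus_register() */
};

/* Pick the registration path from the presence of a device node. */
enum demo_mdio_path demo_pick_mdio_path(const struct demo_device_node *of_node)
{
	return of_node ? DEMO_MDIO_OF : DEMO_MDIO_PLAIN;
}
```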

Signed-off-by: Paul Burton 
Cc: Andrew Lunn 
Cc: David S. Miller 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org

---

Changes in v6:
- New patch, significantly simplified by Andrew's preceding patches.

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index b20ed110cdef..f491044c2739 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRV_VERSION "1.01"
 const char pch_driver_version[] = DRV_VERSION;
@@ -829,6 +830,9 @@ static int pch_gbe_init_mdio(struct pch_gbe_adapter *adapter)
 
adapter->mdiobus = bus;
 
+   if (dev->of_node)
+   return of_mdiobus_register(bus, dev->of_node);
+
return mdiobus_register(bus);
 }
 
-- 
2.17.0

[net 3/3] net/mlx5e: Err if asked to offload TC match on frag being first

2018-05-10 Thread Saeed Mahameed

From: Roi Dayan 

The HW doesn't support matching on frag first/later; return an error if we
are asked to offload that.
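
A minimal sketch of the check, compilable in user space. The DEMO_ flag values are stand-ins chosen for illustration and are not taken from the kernel headers; the point is only that any mask bit for "first fragment" rejects the offload.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in flag bits (assumed values, for illustration only). */
#define DEMO_FLOW_DIS_IS_FRAGMENT	0x1u
#define DEMO_FLOW_DIS_FIRST_FRAG	0x2u

/* Return 0 if the match can be offloaded, -EOPNOTSUPP otherwise. */
int demo_check_frag_offload(uint32_t mask_flags)
{
	/* Matching on first/later fragment is not supported. */
	if (mask_flags & DEMO_FLOW_DIS_FIRST_FRAG)
		return -EOPNOTSUPP;

	/* Plain "is a fragment" matching remains offloadable. */
	return 0;
}
```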

Fixes: 3f7d0eb42d59 ("net/mlx5e: Offload TC matching on packets being IP fragments")
Signed-off-by: Roi Dayan 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3c534fc43400..b94276db3ce9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1261,6 +1261,10 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
  f->mask);
addr_type = key->addr_type;
 
+   /* the HW doesn't support frag first/later */
+   if (mask->flags & FLOW_DIS_FIRST_FRAG)
+   return -EOPNOTSUPP;
+
if (mask->flags & FLOW_DIS_IS_FRAGMENT) {
MLX5_SET(fte_match_set_lyr_2_4, headers_c, frag, 1);
MLX5_SET(fte_match_set_lyr_2_4, headers_v, frag,
-- 
2.14.3

[PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()

2018-05-10 Thread Paul Burton

From: Andrew Lunn 

On some boards, this PHY has a problem when it hibernates. Export this
function so a board can register a PHY fixup to disable hibernation.
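
For readers unfamiliar with the clear/set convention, here is a user-space sketch of the bit arithmetic, assuming the usual read-modify-write semantics (clear first, then set). The helper name is hypothetical; the two constants are the ones defined in the new header below. A hibernation fixup would clear AT8031_PS_HIB_EN in debug register AT8031_HIBERNATE.

```c
#include <assert.h>
#include <stdint.h>

#define AT8031_HIBERNATE	0x0B	/* debug register number */
#define AT8031_PS_HIB_EN	0x8000	/* Hibernate enable */

/* Apply a clear mask, then a set mask, to a 16-bit register value. */
uint16_t demo_apply_mask(uint16_t val, uint16_t clear, uint16_t set)
{
	val &= (uint16_t)~clear;
	val |= set;
	return val;
}
```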

Signed-off-by: Andrew Lunn 
Signed-off-by: Paul Burton 
Cc: David S. Miller 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org

---

Changes in v6:
- New patch.

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/net/phy/at803x.c   |  5 +++--
 include/linux/at803x_phy.h | 16 
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/at803x_phy.h

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index 411cf1072bae..5aede5708abf 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define AT803X_INTR_ENABLE 0x12
 #define AT803X_INTR_ENABLE_AUTONEG_ERR BIT(15)
@@ -93,8 +94,8 @@ static int at803x_debug_reg_read(struct phy_device *phydev, u16 reg)
return phy_read(phydev, AT803X_DEBUG_DATA);
 }
 
-static int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
-u16 clear, u16 set)
+int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
+ u16 clear, u16 set)
 {
u16 val;
int ret;
diff --git a/include/linux/at803x_phy.h b/include/linux/at803x_phy.h
new file mode 100644
index 000000000000..2460c17d56ec
--- /dev/null
+++ b/include/linux/at803x_phy.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _AT803X_PHY_H
+#define _AT803X_PHY_H
+
+#define ATH8030_PHY_ID 0x004dd076
+#define ATH8031_PHY_ID 0x004dd074
+#define ATH8035_PHY_ID 0x004dd072
+#define AT803X_PHY_ID_MASK 0xffef
+
+#define AT8031_HIBERNATE   0x0B
+#define AT8031_PS_HIB_EN   0x8000 /* Hibernate enable */
+
+int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
+ u16 clear, u16 set);
+
+#endif /* _AT803X_PHY_H */
-- 
2.17.0

[net 1/3] net/mlx5: Free IRQs in shutdown path

2018-05-10 Thread Saeed Mahameed

From: Daniel Jurgens 

Some platforms require IRQs to be freed in the shutdown path. Otherwise
they will fail to be reallocated after a kexec.

Fixes: 8812c24d28f4 ("net/mlx5: Add fast unload support in shutdown flow")
Signed-off-by: Daniel Jurgens 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 28 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  8 +++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  2 ++
 3 files changed, 38 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index c1c94974e16b..1814f803bd2c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -34,6 +34,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_RFS_ACCEL
+#include 
+#endif
 #include "mlx5_core.h"
 #include "fpga/core.h"
 #include "eswitch.h"
@@ -923,3 +926,28 @@ int mlx5_core_eq_query(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
MLX5_SET(query_eq_in, in, eq_number, eq->eqn);
return mlx5_cmd_exec(dev, in, sizeof(in), out, outlen);
 }
+
+/* This function should only be called after mlx5_cmd_force_teardown_hca */
+void mlx5_core_eq_free_irqs(struct mlx5_core_dev *dev)
+{
+   struct mlx5_eq_table *table = &dev->priv.eq_table;
+   struct mlx5_eq *eq;
+
+#ifdef CONFIG_RFS_ACCEL
+   if (dev->rmap) {
+   free_irq_cpu_rmap(dev->rmap);
+   dev->rmap = NULL;
+   }
+#endif
+   list_for_each_entry(eq, &table->comp_eqs_list, list)
+   free_irq(eq->irqn, eq);
+
+   free_irq(table->pages_eq.irqn, &table->pages_eq);
+   free_irq(table->async_eq.irqn, &table->async_eq);
+   free_irq(table->cmd_eq.irqn, &table->cmd_eq);
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+   if (MLX5_CAP_GEN(dev, pg))
+   free_irq(table->pfault_eq.irqn, &table->pfault_eq);
+#endif
+   pci_free_irq_vectors(dev->pdev);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 63a8ea31601c..e2c465b0b3f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1587,6 +1587,14 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
 
mlx5_enter_error_state(dev, true);
 
+   /* Some platforms require freeing the IRQs in the shutdown
+* flow. If they aren't freed they can't be allocated after
+* kexec. There is no need to cleanup the mlx5_core software
+* contexts.
+*/
+   mlx5_irq_clear_affinity_hints(dev);
+   mlx5_core_eq_free_irqs(dev);
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 7d001fe6e631..023882d9a22e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -128,6 +128,8 @@ int mlx5_core_eq_query(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
   u32 *out, int outlen);
 int mlx5_start_eqs(struct mlx5_core_dev *dev);
 void mlx5_stop_eqs(struct mlx5_core_dev *dev);
+/* This function should only be called after mlx5_cmd_force_teardown_hca */
+void mlx5_core_eq_free_irqs(struct mlx5_core_dev *dev);
 struct mlx5_eq *mlx5_eqn2eq(struct mlx5_core_dev *dev, int eqn);
 u32 mlx5_eq_poll_irq_disabled(struct mlx5_eq *eq);
 void mlx5_cq_tasklet_cb(unsigned long data);
-- 
2.14.3

Re: [PATCH] coredump: rename umh_pipe_setup() to coredump_pipe_setup()

2018-05-10 Thread Alexei Starovoitov

On Mon, May 07, 2018 at 04:30:02PM -0700, Luis R. Rodriguez wrote:
> This makes it clearer this code is part of the coredump code, and
> is not an exported generic helper from kernel/umh.c.
> 
> Signed-off-by: Luis R. Rodriguez 
> ---
>  fs/coredump.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 1e2c87acac9b..566504781683 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -508,7 +508,7 @@ static void wait_for_dump_helpers(struct file *file)
>  }
>  
>  /*
> - * umh_pipe_setup
> + * coredump_pipe_setup
>   * helper function to customize the process used
>   * to collect the core in userspace.  Specifically
>   * it sets up a pipe and installs it as fd 0 (stdin)
> @@ -518,7 +518,7 @@ static void wait_for_dump_helpers(struct file *file)
>   * is a special value that we use to trap recursive
>   * core dumps
>   */
> -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> +static int coredump_pipe_setup(struct subprocess_info *info, struct cred *new)

I think this renaming makes sense.
How do we want to proceed?
I can take it as part of my series and get the whole thing through net-next,
or do folks want to apply this separately?

[pull request][net 0/3] Mellanox, mlx5 fixes 2018-05-10

2018-05-10 Thread Saeed Mahameed

Hi Dave,

The following series includes some fixes for the mlx5 core driver.
Please pull and let me know if there's any problem.

For -stable v4.5
("net/mlx5: E-Switch, Include VF RDMA stats in vport statistics")

For -stable v4.10
("net/mlx5e: Err if asked to offload TC match on frag being first")

Thanks,
Saeed.

---

The following changes since commit ca3943c4aaff083bc25419f04e549e293590258e:

  Merge tag 'linux-can-fixes-for-4.17-20180510' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can (2018-05-10 17:57:11 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2018-05-10

for you to fetch changes up to f85900c3e13fdb61f040c9feecbcda601e0cdcfb:

  net/mlx5e: Err if asked to offload TC match on frag being first (2018-05-10 16:10:13 -0700)


mlx5-fixes-2018-05-10


Adi Nissim (1):
  net/mlx5: E-Switch, Include VF RDMA stats in vport statistics

Daniel Jurgens (1):
  net/mlx5: Free IRQs in shutdown path

Roi Dayan (1):
  net/mlx5e: Err if asked to offload TC match on frag being first

 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  4 
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 28 ++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 11 -
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  8 +++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  2 ++
 5 files changed, 52 insertions(+), 1 deletion(-)

[PATCH v6 2/6] net: ethernet: pch_gbe: Convert to mdiobus and phylib

2018-05-10 Thread Paul Burton

From: Andrew Lunn 

Convert this driver to use the mdio bus and phylib infrastructure.  It
will then use the common AT803X PHY driver, rather than use its own
code. Have the shared code also handle the GPIO used to reset the PHY.
To implement disabling PHY hibernation, which appears to cause issues
on the minnow board, add a PHY fixup.

Overall, these changes should make it easier to use other PHYs with
the MAC chip, and reduce the lines of code.

[paul.bur...@mips.com:
  - Select CONFIG_PHYLIB.
  - Drop selection of CONFIG_MII.
  - Restore the define of PCH_GBE_MAC_IFOP_RGMII.
  - Add GPIOF_ACTIVE_LOW to the minnow PHY reset GPIO flags.]

Signed-off-by: Andrew Lunn 
Signed-off-by: Paul Burton 
Cc: David S. Miller 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org

---

Changes in v6:
- New patch.

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/net/ethernet/oki-semi/pch_gbe/Kconfig |   3 +-
 .../net/ethernet/oki-semi/pch_gbe/Makefile|   2 +-
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe.h   |  35 +-
 .../ethernet/oki-semi/pch_gbe/pch_gbe_api.c   | 118 --
 .../ethernet/oki-semi/pch_gbe/pch_gbe_api.h   |   8 +-
 .../oki-semi/pch_gbe/pch_gbe_ethtool.c|  89 +
 .../ethernet/oki-semi/pch_gbe/pch_gbe_main.c  | 378 +-
 .../ethernet/oki-semi/pch_gbe/pch_gbe_param.c | 265 
 .../ethernet/oki-semi/pch_gbe/pch_gbe_phy.c   | 377 -
 .../ethernet/oki-semi/pch_gbe/pch_gbe_phy.h   |  37 --
 10 files changed, 213 insertions(+), 1099 deletions(-)
 delete mode 100644 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_phy.c
 delete mode 100644 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_phy.h

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
index 5f7a35212796..045256e99586 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
@@ -5,9 +5,10 @@
 config PCH_GBE
tristate "OKI SEMICONDUCTOR IOH(ML7223/ML7831) GbE"
depends on PCI && (X86_32 || COMPILE_TEST)
-   select MII
select PTP_1588_CLOCK_PCH
select NET_PTP_CLASSIFY
+   select AT803X_PHY
+   select PHYLIB
---help---
  This is a gigabit ethernet driver for EG20T PCH.
  EG20T PCH is the platform controller hub that is used in Intel's
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Makefile b/drivers/net/ethernet/oki-semi/pch_gbe/Makefile
index 31288d4ad248..163ddda97bd1 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Makefile
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_PCH_GBE) += pch_gbe.o
 
-pch_gbe-y := pch_gbe_phy.o pch_gbe_ethtool.o pch_gbe_param.o
+pch_gbe-y := pch_gbe_ethtool.o pch_gbe_param.o
 pch_gbe-y += pch_gbe_api.o pch_gbe_main.o
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
index 697e29dd4bd3..055cf9a2b418 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
@@ -22,7 +22,8 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -332,23 +333,11 @@ struct pch_gbe_hw;
  * struct  pch_gbe_functions - HAL APi function pointer
  * @get_bus_info:  for pch_gbe_hal_get_bus_info
  * @init_hw:   for pch_gbe_hal_init_hw
- * @read_phy_reg:  for pch_gbe_hal_read_phy_reg
- * @write_phy_reg: for pch_gbe_hal_write_phy_reg
- * @reset_phy: for pch_gbe_hal_phy_hw_reset
- * @sw_reset_phy:  for pch_gbe_hal_phy_sw_reset
- * @power_up_phy:  for pch_gbe_hal_power_up_phy
- * @power_down_phy:for pch_gbe_hal_power_down_phy
  * @read_mac_addr: for pch_gbe_hal_read_mac_addr
  */
 struct pch_gbe_functions {
void (*get_bus_info) (struct pch_gbe_hw *);
s32 (*init_hw) (struct pch_gbe_hw *);
-   s32 (*read_phy_reg) (struct pch_gbe_hw *, u32, u16 *);
-   s32 (*write_phy_reg) (struct pch_gbe_hw *, u32, u16);
-   void (*reset_phy) (struct pch_gbe_hw *);
-   void (*sw_reset_phy) (struct pch_gbe_hw *);
-   void (*power_up_phy) (struct pch_gbe_hw *hw);
-   void (*power_down_phy) (struct pch_gbe_hw *hw);
s32 (*read_mac_addr) (struct pch_gbe_hw *);
 };
 
@@ -378,18 +367,10 @@ struct pch_gbe_mac_info {
 
 /**
  * struct pch_gbe_phy_info - PHY information
- * @addr:  PHY address
- * @id:PHY's identifier
- * @revision:  PHY's revision
  * @reset_delay_us:HW reset delay time[us]
- * @autoneg_advertised:Autoneg advertised
  */
 struct pch_gbe_phy_info {
-   u32 addr;
-   u32 id;
-   u32 revision;
u32 reset_delay_us;
-   u16 autoneg_advertised;
 };
 
 /*!
@@ -578,6 +559,8 @@ struct pch_gbe_hw_stats {
u32 intr_tcpip_err_count;
 };
 
+struct

[PATCH v6 0/6] net: pch_gbe: MIPS support

2018-05-10 Thread Paul Burton

The Intel EG20T Platform Controller Hub is used on the MIPS Boston
development board to provide various peripherals including ethernet.

This series migrates the pch_gbe driver's PHY support to use phylib,
implements support for device tree which we use to provide the PHY reset
GPIO, and allows the driver to be built for MIPS.

Applies atop v4.17-rc4.

Please note that I don't have access to the Intel systems (e.g.
MinnowBoard v1) that make use of this driver, so am unable to test on
those. If anyone with such a system could test the series that would be
much appreciated.

v6 of the series is later than I'd hoped, but we had a product
release[1] that kept me busy. My apologies!

The series is significantly different to earlier versions - Andrew did
the legwork of converting to phylib and that simplified things
significantly. v5 contained further fixes to the driver which I've
removed from v6 such that this series is just enough to get the driver
running on the MIPS Boston platform, despite a few bugs, in the interest
of a simpler & more focused patch series. I'll submit those fixes
separately.

Thanks,
Paul

[1] 
https://www.mips.com/press/new-mips-i7200-processor-core-delivers-unmatched-performance-and-efficiency-for-advanced-lte5g-communications-and-networking-ic-designs/

Andrew Lunn (2):
  net: phy: at803x: Export at803x_debug_reg_mask()
  net: ethernet: pch_gbe: Convert to mdiobus and phylib

Paul Burton (4):
  net: pch_gbe: Support DeviceTree for MDIO/PHY description
  ptp: pch: Allow build on MIPS platforms
  net: pch_gbe: Allow build on MIPS platforms
  MIPS: Boston: Adjust DT for pch_gbe PHY support

 arch/mips/boot/dts/img/boston.dts |  13 +-
 drivers/net/ethernet/oki-semi/pch_gbe/Kconfig |   5 +-
 .../net/ethernet/oki-semi/pch_gbe/Makefile|   2 +-
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe.h   |  35 +-
 .../ethernet/oki-semi/pch_gbe/pch_gbe_api.c   | 118 --
 .../ethernet/oki-semi/pch_gbe/pch_gbe_api.h   |   8 +-
 .../oki-semi/pch_gbe/pch_gbe_ethtool.c|  89 +---
 .../ethernet/oki-semi/pch_gbe/pch_gbe_main.c  | 382 +-
 .../ethernet/oki-semi/pch_gbe/pch_gbe_param.c | 265 
 .../ethernet/oki-semi/pch_gbe/pch_gbe_phy.c   | 377 -
 .../ethernet/oki-semi/pch_gbe/pch_gbe_phy.h   |  37 --
 drivers/net/phy/at803x.c  |   5 +-
 drivers/ptp/Kconfig   |   2 +-
 include/linux/at803x_phy.h|  16 +
 14 files changed, 249 insertions(+), 1105 deletions(-)
 delete mode 100644 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_phy.c
 delete mode 100644 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_phy.h
 create mode 100644 include/linux/at803x_phy.h

-- 
2.17.0

Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper

2018-05-10 Thread Alexei Starovoitov

On Thu, May 10, 2018 at 03:27:24PM -0700, Kees Cook wrote:
> On Fri, May 4, 2018 at 12:56 PM, Luis R. Rodriguez  wrote:
> > What a mighty short list of reviewers. Adding some more. My review below.
> > I'd appreciate a Cc on future versions of these patches.
> 
> Me too, please. And likely linux-security-module@ and Jessica too.
> 
> > On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> >> Introduce helper:
> >> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> >> struct umh_info {
> >>struct file *pipe_to_umh;
> >>struct file *pipe_from_umh;
> >>pid_t pid;
> >> };
> >>
> >> that GPLed kernel modules (signed or unsigned) can use it to execute part
> >> of its own data as swappable user mode process.
> >>
> >> The kernel will do:
> >> - mount "tmpfs"
> >> - allocate a unique file in tmpfs
> >> - populate that file with [data, data + len] bytes
> >> - user-mode-helper code will do_execve that file and, before the process
> >>   starts, the kernel will create two unix pipes for bidirectional
> >>   communication between kernel module and umh
> >> - close tmpfs file, effectively deleting it
> >> - the fork_usermode_blob will return zero on success and populate
> >>   'struct umh_info' with two unix pipes and the pid of the user process
> 
> I'm trying to think how LSMs can successfully reason about the
> resulting exec(). In the past, we've replaced "blob" style interfaces
> with file-based interfaces (e.g. init_module() -> finit_module(),
> kexec_load() -> kexec_file_load()) to better let the kernel understand
> the origin of executable content. Here the intent is fine: we're
> getting the exec from an already-loaded module, etc, etc. I'm trying
> to think specifically about the interface.
> 
> How can the ultimate exec get tied back to the kernel module in a way
> that the LSM can query? Right now the hooks hit during exec are:
> kernel_read_file() and kernel_post_read_file() of tmpfs file,
> bprm_set_creds(), bprm_check(), bprm_commiting_creds(),
> bprm_commited_creds(). It seems silly to me for an LSM to perform
> these checks at all since I would expect the _meaningful_ check to be
> finit_module() of the module itself. Having a way for an LSM to know
> the exec is tied to a kernel module would let them skip the nonsense
> checks.
> 
> Since the process for doing the usermode_blob is defined by the kernel
> module build/link/objcopy process, could we tighten the
> fork_usermode_blob() interface to point to the kernel module itself,
> rather than leaving it an open-ended "blob" interface? Given our
> history of needing to replace blob interfaces with file interfaces,
> I'm cautious to add a new blob interface. Maybe just pull all the
> blob-finding/loading into the interface, and just make it something
> like fork_usermode_kmod(struct module *mod, struct umh_info *info) ?

I don't think it will work, since Andy and others pointed out that
bpfilter needs to work as builtin as well. There is no 'struct module'
in such case, but fork-ing of the user process still needs to happen.

[PATCH net 0/5] rxrpc: Fixes

2018-05-10 Thread David Howells


Here are three fixes for AF_RXRPC and two tracepoints that were useful for
finding them:

 (1) Fix missing start of expect-Rx-by timeout on initial packet
 transmission so that calls will time out if the peer doesn't respond.

 (2) Fix error reception on AF_INET6 sockets by using the correct family of
 sockopts on the UDP transport socket.

 (3) Fix setting the minimum security level on kernel calls so that they
 can be encrypted.

 (4) Add a tracepoint to log ICMP/ICMP6 and other error reports from the
 transport socket.

 (5) Add a tracepoint to log UDP sendmsg failure so that we can find out if
 transmission failure occurred on the UDP socket.

The patches are tagged here:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20180510

and can also be found on the following branch:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

David
---
David Howells (5):
  rxrpc: Fix missing start of call timeout
  rxrpc: Fix error reception on AF_INET6 sockets
  rxrpc: Fix the min security level for kernel calls
  rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
  rxrpc: Trace UDP transmission failure


 include/trace/events/rxrpc.h |   85 ++
 net/rxrpc/af_rxrpc.c |2 -
 net/rxrpc/ar-internal.h  |1 
 net/rxrpc/conn_event.c   |   11 -
 net/rxrpc/input.c|2 -
 net/rxrpc/local_event.c  |3 +
 net/rxrpc/local_object.c |   57 +---
 net/rxrpc/output.c   |   34 -
 net/rxrpc/peer_event.c   |   46 +++
 net/rxrpc/rxkad.c|6 ++-
 net/rxrpc/sendmsg.c  |   10 +
 11 files changed, 209 insertions(+), 48 deletions(-)

[PATCH net 1/5] rxrpc: Fix missing start of call timeout

2018-05-10 Thread David Howells

The expect_rx_by call timeout is supposed to be set when a call is started
to indicate that we need to receive a packet by that point.  This is
currently put back every time we receive a packet, but it isn't started
when we first send a packet.  Without this, the call may wait forever if
the server doesn't deign to reply.

Fix this by setting the timeout upon a successful UDP sendmsg call for the
first DATA packet.  The timeout is initiated only for initial transmission
and not for subsequent retries as we don't want the retry mechanism to
extend the timeout indefinitely.
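
The "arm once, on the first DATA packet only" behaviour can be sketched in user space as below. The struct and function names are hypothetical, times are plain tick counts rather than jiffies, and a plain bool stands in for the atomic test_and_set_bit() on RXRPC_CALL_BEGAN_RX_TIMER, so this model is not safe for concurrent callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the per-call timer state. */
struct demo_call {
	bool began_rx_timer;		/* models RXRPC_CALL_BEGAN_RX_TIMER */
	unsigned long next_rx_timo;	/* timeout length, in ticks */
	unsigned long expect_rx_by;	/* deadline, in ticks; 0 = unset */
};

/* Called after a DATA packet with sequence number seq was sent
 * successfully. Returns true if this call armed the timer. */
bool demo_arm_rx_timer(struct demo_call *call, uint32_t seq, unsigned long now)
{
	assert(call != NULL);

	/* Only the first DATA packet arms the timer, and only once, so
	 * retransmissions cannot push the deadline out. */
	if (seq != 1 || call->began_rx_timer)
		return false;

	call->began_rx_timer = true;
	call->expect_rx_by = now + call->next_rx_timo;
	return true;
}
```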

Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne 
Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/input.c   |2 +-
 net/rxrpc/output.c  |   11 +++
 net/rxrpc/sendmsg.c |   10 ++
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 90d7079e0aa9..19975d2ca9a2 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -476,6 +476,7 @@ enum rxrpc_call_flag {
RXRPC_CALL_SEND_PING,   /* A ping will need to be sent */
RXRPC_CALL_PINGING, /* Ping in process */
RXRPC_CALL_RETRANS_TIMEOUT, /* Retransmission due to timeout occurred */
+   RXRPC_CALL_BEGAN_RX_TIMER,  /* We began the expect_rx_by timer */
 };
 
 /*
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 0410d2277ca2..b5fd6381313d 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -971,7 +971,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
if (timo) {
unsigned long now = jiffies, expect_rx_by;
 
-   expect_rx_by = jiffies + timo;
+   expect_rx_by = now + timo;
WRITE_ONCE(call->expect_rx_by, expect_rx_by);
rxrpc_reduce_call_timer(call, expect_rx_by, now,
rxrpc_timer_set_for_normal);
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 7f1fc04775b3..6b9d27f0d7ec 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -414,6 +414,17 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb,
rxrpc_timer_set_for_lost_ack);
}
}
+
+   if (sp->hdr.seq == 1 &&
+   !test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER,
+ &call->flags)) {
+   unsigned long nowj = jiffies, expect_rx_by;
+
+   expect_rx_by = nowj + call->next_rx_timo;
+   WRITE_ONCE(call->expect_rx_by, expect_rx_by);
+   rxrpc_reduce_call_timer(call, expect_rx_by, nowj,
+   rxrpc_timer_set_for_normal);
+   }
}
 
rxrpc_set_keepalive(call);
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 206e802ccbdc..be01f9c5d963 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -223,6 +223,15 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
 
ret = rxrpc_send_data_packet(call, skb, false);
if (ret < 0) {
+   switch (ret) {
+   case -ENETUNREACH:
+   case -EHOSTUNREACH:
+   case -ECONNREFUSED:
+   rxrpc_set_call_completion(call,
+ RXRPC_CALL_LOCAL_ERROR,
+ 0, ret);
+   goto out;
+   }
_debug("need instant resend %d", ret);
rxrpc_instant_resend(call, ix);
} else {
@@ -241,6 +250,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
rxrpc_timer_set_for_send);
}
 
+out:
rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
_leave("");
 }

[PATCH net 2/5] rxrpc: Fix error reception on AF_INET6 sockets

2018-05-10 Thread David Howells

AF_RXRPC tries to turn on IP_RECVERR and IP_MTU_DISCOVER on the UDP socket
it just opened for communications with the outside world, regardless of the
type of socket.  Unfortunately, this doesn't work with an AF_INET6 socket.

Fix this by turning on IPV6_RECVERR and IPV6_MTU_DISCOVER instead if the
socket is of the AF_INET6 family.

Without this, kAFS server and address rotation doesn't work correctly
because the algorithm doesn't detect received network errors.
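
A user-space analogue of the fix, as a sketch for Linux with glibc headers: the option level and names are selected by address family before being applied with setsockopt(). The struct and function names are hypothetical, and IPPROTO_IP/IPPROTO_IPV6 are used as the option levels in place of the kernel-side SOL_IP/SOL_IPV6.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Socket options needed for error reporting and path MTU discovery. */
struct demo_err_sockopts {
	int level;		/* option level for setsockopt() */
	int recverr_opt;	/* enables queued ICMP error delivery */
	int mtu_discover_opt;	/* selects the PMTU discovery mode */
	int pmtudisc_do;	/* "always set don't-fragment" mode value */
};

/* Fill *out for the given family; returns 0, or -1 if unsupported. */
int demo_pick_err_sockopts(int family, struct demo_err_sockopts *out)
{
	assert(out != NULL);

	switch (family) {
	case AF_INET:
		out->level = IPPROTO_IP;
		out->recverr_opt = IP_RECVERR;
		out->mtu_discover_opt = IP_MTU_DISCOVER;
		out->pmtudisc_do = IP_PMTUDISC_DO;
		return 0;
	case AF_INET6:
		out->level = IPPROTO_IPV6;
		out->recverr_opt = IPV6_RECVERR;
		out->mtu_discover_opt = IPV6_MTU_DISCOVER;
		out->pmtudisc_do = IPV6_PMTUDISC_DO;
		return 0;
	default:
		return -1;
	}
}

/* Apply both options to an already-open UDP socket of that family. */
int demo_enable_err_reporting(int fd, int family)
{
	struct demo_err_sockopts o;
	int opt;

	if (demo_pick_err_sockopts(family, &o) < 0)
		return -1;

	opt = 1;
	if (setsockopt(fd, o.level, o.recverr_opt, &opt, sizeof(opt)) < 0)
		return -1;

	opt = o.pmtudisc_do;
	return setsockopt(fd, o.level, o.mtu_discover_opt, &opt, sizeof(opt));
}
```

Unlike the kernel patch, which calls BUG() for an unexpected family, this sketch simply returns -1.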

Fixes: 75b54cb57ca3 ("rxrpc: Add IPv6 support")
Signed-off-by: David Howells 
---

 net/rxrpc/local_object.c |   57 ++
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 8b54e9531d52..b493e6b62740 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -134,22 +134,49 @@ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net)
}
}
 
-   /* we want to receive ICMP errors */
-   opt = 1;
-   ret = kernel_setsockopt(local->socket, SOL_IP, IP_RECVERR,
-   (char *) &opt, sizeof(opt));
-   if (ret < 0) {
-   _debug("setsockopt failed");
-   goto error;
-   }
+   switch (local->srx.transport.family) {
+   case AF_INET:
+   /* we want to receive ICMP errors */
+   opt = 1;
+   ret = kernel_setsockopt(local->socket, SOL_IP, IP_RECVERR,
+   (char *) &opt, sizeof(opt));
+   if (ret < 0) {
+   _debug("setsockopt failed");
+   goto error;
+   }
 
-   /* we want to set the don't fragment bit */
-   opt = IP_PMTUDISC_DO;
-   ret = kernel_setsockopt(local->socket, SOL_IP, IP_MTU_DISCOVER,
-   (char *) &opt, sizeof(opt));
-   if (ret < 0) {
-   _debug("setsockopt failed");
-   goto error;
+   /* we want to set the don't fragment bit */
+   opt = IP_PMTUDISC_DO;
+   ret = kernel_setsockopt(local->socket, SOL_IP, IP_MTU_DISCOVER,
+   (char *) &opt, sizeof(opt));
+   if (ret < 0) {
+   _debug("setsockopt failed");
+   goto error;
+   }
+   break;
+
+   case AF_INET6:
+   /* we want to receive ICMP errors */
+   opt = 1;
+   ret = kernel_setsockopt(local->socket, SOL_IPV6, IPV6_RECVERR,
+   (char *) &opt, sizeof(opt));
+   if (ret < 0) {
+   _debug("setsockopt failed");
+   goto error;
+   }
+
+   /* we want to set the don't fragment bit */
+   opt = IPV6_PMTUDISC_DO;
+   ret = kernel_setsockopt(local->socket, SOL_IPV6, IPV6_MTU_DISCOVER,
+   (char *) &opt, sizeof(opt));
+   if (ret < 0) {
+   _debug("setsockopt failed");
+   goto error;
+   }
+   break;
+
+   default:
+   BUG();
}
 
/* set the socket up */
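
For readers who want to try the same idea from userspace, here is a rough
analogue of the family switch in this patch. It is only a sketch: the helper
name udp_enable_error_reporting() is made up for illustration, and it uses
setsockopt() with IPPROTO_IP/IPPROTO_IPV6 (which have the same values as
SOL_IP/SOL_IPV6 on Linux) rather than the in-kernel kernel_setsockopt().

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not part of the patch: turn on extended error
 * reporting and strict path-MTU discovery on a UDP socket, picking the
 * option level that matches the socket's address family.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int udp_enable_error_reporting(int fd, int family)
{
	int opt;

	switch (family) {
	case AF_INET:
		/* ask for ICMP errors to be queued on the socket */
		opt = 1;
		if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &opt, sizeof(opt)) < 0)
			return -1;
		/* always set the don't-fragment bit */
		opt = IP_PMTUDISC_DO;
		return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER,
				  &opt, sizeof(opt));

	case AF_INET6:
		/* same two settings, but at the IPv6 level */
		opt = 1;
		if (setsockopt(fd, IPPROTO_IPV6, IPV6_RECVERR, &opt, sizeof(opt)) < 0)
			return -1;
		opt = IPV6_PMTUDISC_DO;
		return setsockopt(fd, IPPROTO_IPV6, IPV6_MTU_DISCOVER,
				  &opt, sizeof(opt));

	default:
		/* the kernel patch uses BUG() here; userspace just reports it */
		errno = EAFNOSUPPORT;
		return -1;
	}
}
```

A caller would create the socket with socket(AF_INET6, SOCK_DGRAM, 0) and pass
AF_INET6 as the family; passing the IPv4 options to such a socket is the
mismatch the patch description refers to.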

[PATCH net 3/5] rxrpc: Fix the min security level for kernel calls

2018-05-10 Thread David Howells

Fix the kernel call initiation to set the minimum security level for kernel
initiated calls (such as from kAFS) from the sockopt value.

Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info")
Signed-off-by: David Howells 
---

 net/rxrpc/af_rxrpc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 9a2c8e7c000e..2b463047dd7b 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -313,7 +313,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock,
memset(&cp, 0, sizeof(cp));
cp.local= rx->local;
cp.key  = key;
-   cp.security_level   = 0;
+   cp.security_level   = rx->min_sec_level;
cp.exclusive= false;
cp.upgrade  = upgrade;
cp.service_id   = srx->srx_service;

[PATCH net 4/5] rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages

2018-05-10 Thread David Howells

Add a tracepoint to log received ICMP/ICMP6 events and other error
messages.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   30 +++
 net/rxrpc/peer_event.c   |   46 +-
 2 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 9e96c2fe2793..497d0b67f421 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -15,6 +15,7 @@
 #define _TRACE_RXRPC_H
 
 #include <linux/tracepoint.h>
+#include <linux/errqueue.h>
 
 /*
  * Define enums for tracing information.
@@ -1374,6 +1375,35 @@ TRACE_EVENT(rxrpc_resend,
  __entry->anno)
);
 
+TRACE_EVENT(rxrpc_rx_icmp,
+   TP_PROTO(struct rxrpc_peer *peer, struct sock_extended_err *ee,
+struct sockaddr_rxrpc *srx),
+
+   TP_ARGS(peer, ee, srx),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   peer)
+   __field_struct(struct sock_extended_err,ee  )
+   __field_struct(struct sockaddr_rxrpc,   srx )
+),
+
+   TP_fast_assign(
+   __entry->peer = peer->debug_id;
+   memcpy(&__entry->ee, ee, sizeof(__entry->ee));
+   memcpy(&__entry->srx, srx, sizeof(__entry->srx));
+  ),
+
+   TP_printk("P=%08x o=%u t=%u c=%u i=%u d=%u e=%d %pISp",
+ __entry->peer,
+ __entry->ee.ee_origin,
+ __entry->ee.ee_type,
+ __entry->ee.ee_code,
+ __entry->ee.ee_info,
+ __entry->ee.ee_data,
+ __entry->ee.ee_errno,
+ &__entry->srx.transport)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 78c2f95d1f22..0ed8b651cec2 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -28,39 +28,39 @@ static void rxrpc_store_error(struct rxrpc_peer *, struct sock_exterr_skb *);
  * Find the peer associated with an ICMP packet.
  */
 static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
-const struct sk_buff *skb)
+const struct sk_buff *skb,
+struct sockaddr_rxrpc *srx)
 {
struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
-   struct sockaddr_rxrpc srx;
 
_enter("");
 
-   memset(&srx, 0, sizeof(srx));
-   srx.transport_type = local->srx.transport_type;
-   srx.transport_len = local->srx.transport_len;
-   srx.transport.family = local->srx.transport.family;
+   memset(srx, 0, sizeof(*srx));
+   srx->transport_type = local->srx.transport_type;
+   srx->transport_len = local->srx.transport_len;
+   srx->transport.family = local->srx.transport.family;
 
/* Can we see an ICMP4 packet on an ICMP6 listening socket?  and vice
 * versa?
 */
-   switch (srx.transport.family) {
+   switch (srx->transport.family) {
case AF_INET:
-   srx.transport.sin.sin_port = serr->port;
+   srx->transport.sin.sin_port = serr->port;
switch (serr->ee.ee_origin) {
case SO_EE_ORIGIN_ICMP:
_net("Rx ICMP");
-   memcpy(&srx.transport.sin.sin_addr,
+   memcpy(&srx->transport.sin.sin_addr,
   skb_network_header(skb) + serr->addr_offset,
   sizeof(struct in_addr));
break;
case SO_EE_ORIGIN_ICMP6:
_net("Rx ICMP6 on v4 sock");
-   memcpy(&srx.transport.sin.sin_addr,
+   memcpy(&srx->transport.sin.sin_addr,
   skb_network_header(skb) + serr->addr_offset + 12,
   sizeof(struct in_addr));
break;
default:
-   memcpy(&srx.transport.sin.sin_addr, &ip_hdr(skb)->saddr,
+   memcpy(&srx->transport.sin.sin_addr, &ip_hdr(skb)->saddr,
   sizeof(struct in_addr));
break;
}
@@ -68,25 +68,25 @@ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
 
 #ifdef CONFIG_AF_RXRPC_IPV6
case AF_INET6:
-   srx.transport.sin6.sin6_port = serr->port;
+   srx->transport.sin6.sin6_port = serr->port;
switch (serr->ee.ee_origin) {
case SO_EE_ORIGIN_ICMP6:
_net("Rx ICMP6");
-   memcpy(&srx.transport.sin6.sin6_addr,
+   memcpy(&srx->transport.sin6.sin6_addr,

[PATCH net 5/5] rxrpc: Trace UDP transmission failure

2018-05-10 Thread David Howells

Add a tracepoint to log transmission failure from the UDP transport socket
being used by AF_RXRPC.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   55 ++
 net/rxrpc/conn_event.c   |   11 ++--
 net/rxrpc/local_event.c  |3 ++
 net/rxrpc/output.c   |   23 --
 net/rxrpc/rxkad.c|6 +++--
 5 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 497d0b67f421..077e664ac9a2 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -211,6 +211,20 @@ enum rxrpc_congest_change {
rxrpc_cong_saw_nack,
 };
 
+enum rxrpc_tx_fail_trace {
+   rxrpc_tx_fail_call_abort,
+   rxrpc_tx_fail_call_ack,
+   rxrpc_tx_fail_call_data_frag,
+   rxrpc_tx_fail_call_data_nofrag,
+   rxrpc_tx_fail_call_final_resend,
+   rxrpc_tx_fail_conn_abort,
+   rxrpc_tx_fail_conn_challenge,
+   rxrpc_tx_fail_conn_response,
+   rxrpc_tx_fail_reject,
+   rxrpc_tx_fail_version_keepalive,
+   rxrpc_tx_fail_version_reply,
+};
+
 #endif /* end __RXRPC_DECLARE_TRACE_ENUMS_ONCE_ONLY */
 
 /*
@@ -438,6 +452,19 @@ enum rxrpc_congest_change {
EM(RXRPC_CALL_LOCAL_ERROR,  "LocalError") \
E_(RXRPC_CALL_NETWORK_ERROR,"NetError")
 
+#define rxrpc_tx_fail_traces \
+   EM(rxrpc_tx_fail_call_abort,"CallAbort") \
+   EM(rxrpc_tx_fail_call_ack,  "CallAck") \
+   EM(rxrpc_tx_fail_call_data_frag,"CallDataFrag") \
+   EM(rxrpc_tx_fail_call_data_nofrag,  "CallDataNofrag") \
+   EM(rxrpc_tx_fail_call_final_resend, "CallFinalResend") \
+   EM(rxrpc_tx_fail_conn_abort,"ConnAbort") \
+   EM(rxrpc_tx_fail_conn_challenge,"ConnChall") \
+   EM(rxrpc_tx_fail_conn_response, "ConnResp") \
+   EM(rxrpc_tx_fail_reject,"Reject") \
+   EM(rxrpc_tx_fail_version_keepalive, "VerKeepalive") \
+   E_(rxrpc_tx_fail_version_reply, "VerReply")
+
 /*
  * Export enum symbols via userspace.
  */
@@ -461,6 +488,7 @@ rxrpc_propose_ack_traces;
 rxrpc_propose_ack_outcomes;
 rxrpc_congest_modes;
 rxrpc_congest_changes;
+rxrpc_tx_fail_traces;
 
 /*
  * Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -1404,6 +1432,33 @@ TRACE_EVENT(rxrpc_rx_icmp,
  &__entry->srx.transport)
);
 
+TRACE_EVENT(rxrpc_tx_fail,
+   TP_PROTO(unsigned int debug_id, rxrpc_serial_t serial, int ret,
+enum rxrpc_tx_fail_trace what),
+
+   TP_ARGS(debug_id, serial, ret, what),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   debug_id)
+   __field(rxrpc_serial_t, serial  )
+   __field(int,ret )
+   __field(enum rxrpc_tx_fail_trace,   what)
+),
+
+   TP_fast_assign(
+   __entry->debug_id = debug_id;
+   __entry->serial = serial;
+   __entry->ret = ret;
+   __entry->what = what;
+  ),
+
+   TP_printk("c=%08x r=%x ret=%d %s",
+ __entry->debug_id,
+ __entry->serial,
+ __entry->ret,
+ __print_symbolic(__entry->what, rxrpc_tx_fail_traces))
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index c717152070df..1350f1be8037 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -40,7 +40,7 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
} __attribute__((packed)) pkt;
struct rxrpc_ackinfo ack_info;
size_t len;
-   int ioc;
+   int ret, ioc;
u32 serial, mtu, call_id, padding;
 
_enter("%d", conn->debug_id);
@@ -135,10 +135,13 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
break;
}
 
-   kernel_sendmsg(conn->params.local->socket, &msg, iov, ioc, len);
+   ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, ioc, len);
conn->params.peer->last_tx_at = ktime_get_real();
+   if (ret < 0)
+   trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
+   rxrpc_tx_fail_call_final_resend);
+
_leave("");
-   return;
 }
 
 /*
@@ -236,6 +239,8 @@ static int rxrpc_abort_connection(struct rxrpc_connection *conn,
 
ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, 2, len);
if (ret < 0) {
+   trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
+   rxrpc_tx_fail_conn_abort);
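
The enum and its EM()/E_() string table in the diff above have to be kept in
step by hand. As a simplified userspace illustration of the general technique
(an "X-macro"), the sketch below generates both the enum and the name table
from a single list. This is not how the rxrpc header is written; the list is
shortened and every identifier here is invented for the example.

```c
/* Hypothetical, shortened list: each entry pairs an enum symbol with a label. */
#define TX_FAIL_TRACES(X)                   \
	X(tx_fail_call_abort, "CallAbort")  \
	X(tx_fail_call_ack,   "CallAck")    \
	X(tx_fail_conn_abort, "ConnAbort")

/* First expansion: emit the enum members. */
#define AS_ENUM(sym, str) sym,
enum tx_fail_trace { TX_FAIL_TRACES(AS_ENUM) tx_fail__count };

/* Second expansion: emit a table indexed by enum value. */
#define AS_STRING(sym, str) [sym] = str,
static const char *const tx_fail_names[] = { TX_FAIL_TRACES(AS_STRING) };

/* Map a value to its label; out-of-range values yield "?". */
static const char *tx_fail_name(enum tx_fail_trace what)
{
	if ((unsigned int)what >= tx_fail__count)
		return "?";
	return tx_fail_names[what];
}
```

Adding a new failure site then means adding one line to the list, and the
label lookup follows automatically.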

Re: [PATCH v6 0/5] PCI: Improve PCIe link status reporting

2018-05-10 Thread Bjorn Helgaas

On Thu, May 03, 2018 at 03:00:07PM -0500, Bjorn Helgaas wrote:
> This is based on Tal's recent work to unify the approach for reporting PCIe
> link speed/width and whether the device is being limited by a slower
> upstream link.
> 
> The new pcie_print_link_status() interface appeared in v4.17-rc1; see
> 9e506a7b5147 ("PCI: Add pcie_print_link_status() to log link speed and
> whether it's limited").
> 
> That's a good way to replace use of pcie_get_minimum_link(), which gives
> misleading results when a path contains both a fast, narrow link and a
> slow, wide link: it reports the equivalent of a slow, narrow link.
> 
> This series removes the remaining uses of pcie_get_minimum_link() and then
> removes the interface itself.  I'd like to merge them all through the PCI
> tree to make the removal easy.
> 
> This does change the dmesg reporting of link speeds, and in the ixgbe case,
> it changes the reporting from KERN_WARN level to KERN_INFO.  If that's an
> issue, let's talk about it.  I'm hoping the reduced code size, improved
> functionality, and consistency across drivers is enough to make this
> worthwhile.
> 
> ---
> 
> Bjorn Helgaas (5):
>   bnx2x: Report PCIe link properties with pcie_print_link_status()
>   bnxt_en: Report PCIe link properties with pcie_print_link_status()
>   cxgb4: Report PCIe link properties with pcie_print_link_status()
>   ixgbe: Report PCIe link properties with pcie_print_link_status()
>   PCI: Remove unused pcie_get_minimum_link()

Jeff has acked the ixgbe patch.

Any comments on the bnx2x, bnxt_en, or cxgb4 patches?

>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   23 ++-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c|   19 --
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |   75 --
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|   47 --
>  drivers/pci/pci.c|   43 -
>  include/linux/pci.h  |2 -
>  6 files changed, 9 insertions(+), 200 deletions(-)
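
To make the "fast, narrow link plus slow, wide link" case from the quoted
cover letter concrete, here is a small worked example. It is a sketch under
simplifying assumptions: the struct and function names are invented, speeds
are stored in units of 0.1 GT/s per lane, and encoding overhead is ignored.
It contrasts taking the minimum speed and minimum width independently with
taking the smallest speed-times-width product of any single link.

```c
#include <stddef.h>

/* One link on the path from the device up to the root. */
struct pcie_link {
	unsigned int speed;	/* per-lane speed, in units of 0.1 GT/s */
	unsigned int width;	/* number of lanes */
};

/* Minimum speed and minimum width picked separately, then multiplied. */
static unsigned int min_speed_min_width(const struct pcie_link *path, size_t n)
{
	unsigned int min_speed, min_width;
	size_t i;

	if (n == 0)
		return 0;
	min_speed = path[0].speed;
	min_width = path[0].width;
	for (i = 1; i < n; i++) {
		if (path[i].speed < min_speed)
			min_speed = path[i].speed;
		if (path[i].width < min_width)
			min_width = path[i].width;
	}
	return min_speed * min_width;
}

/* The real bottleneck: the smallest bandwidth of any one link. */
static unsigned int bottleneck(const struct pcie_link *path, size_t n)
{
	unsigned int best, bw;
	size_t i;

	if (n == 0)
		return 0;
	best = path[0].speed * path[0].width;
	for (i = 1; i < n; i++) {
		bw = path[i].speed * path[i].width;
		if (bw < best)
			best = bw;
	}
	return best;
}
```

For a path made of an 8 GT/s x1 link and a 2.5 GT/s x8 link, the first
function yields the equivalent of 2.5 GT/s x1, although neither link is that
slow; the second yields 8 GT/s x1, which is the actual limiting link.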

Re: [PATCH] bpf, doc: clarification for the meaning of 'id'

2018-05-10 Thread Daniel Borkmann

On 05/10/2018 05:09 AM, Wang YanQing wrote:
> For me, as a reader whose native language isn't English, the
> old wording makes the meaning a little hard to grasp; this
> patch rewords the subsection to make it clearer.
> 
> This patch also adds blank lines as separators in two places
> to improve readability.
> 
> Signed-off-by: Wang YanQing 

Applied to bpf-next, thanks Wang!

Re: [PATCH bpf] tools: bpf: handle NULL return in bpf_prog_load_xattr()

2018-05-10 Thread Daniel Borkmann

On 05/10/2018 07:09 PM, Jakub Kicinski wrote:
> bpf_object__open() can return error pointer as well as NULL.
> Fix error handling in bpf_prog_load_xattr() (and indirectly
> bpf_prog_load()).
> 
> Fixes: 6f6d33f3b3d0 ("bpf: selftests add sockmap tests")
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 

Applied to bpf tree, thanks Jakub!
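
For anyone unfamiliar with the error-pointer convention mentioned in the
quoted commit message, the following userspace sketch shows why a check for
error pointers alone misses a NULL return. The helpers are simplified
re-implementations written for this example (lower-case names to make that
obvious), and check_open_result() is a hypothetical caller, not the libbpf
code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

/* True if the pointer lies in the top MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value back out of an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* Hypothetical caller that handles both failure styles. */
static int check_open_result(const void *obj)
{
	if (!obj)			/* NULL is not an error pointer */
		return -ENOENT;
	if (is_err(obj))		/* encoded errno */
		return (int)ptr_err(obj);
	return 0;			/* valid object */
}
```

Note that is_err(NULL) is false, so code that only tests for error pointers
would go on to dereference NULL.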

Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper

2018-05-10 Thread Kees Cook

On Fri, May 4, 2018 at 12:56 PM, Luis R. Rodriguez  wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.

Me too, please. And likely linux-security-module@ and Jessica too.

> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
>> Introduce helper:
>> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
>> struct umh_info {
>>struct file *pipe_to_umh;
>>struct file *pipe_from_umh;
>>pid_t pid;
>> };
>>
>> that GPLed kernel modules (signed or unsigned) can use it to execute part
>> of its own data as swappable user mode process.
>>
>> The kernel will do:
>> - mount "tmpfs"
>> - allocate a unique file in tmpfs
>> - populate that file with [data, data + len] bytes
>> - user-mode-helper code will do_execve that file and, before the process
>>   starts, the kernel will create two unix pipes for bidirectional
>>   communication between kernel module and umh
>> - close tmpfs file, effectively deleting it
>> - the fork_usermode_blob will return zero on success and populate
>>   'struct umh_info' with two unix pipes and the pid of the user process

I'm trying to think how LSMs can successfully reason about the
resulting exec(). In the past, we've replaced "blob" style interfaces
with file-based interfaces (e.g. init_module() -> finit_module(),
kexec_load() -> kexec_file_load()) to better let the kernel understand
the origin of executable content. Here the intent is fine: we're
getting the exec from an already-loaded module, etc, etc. I'm trying
to think specifically about the interface.

How can the ultimate exec get tied back to the kernel module in a way
that the LSM can query? Right now the hooks hit during exec are:
kernel_read_file() and kernel_post_read_file() of tmpfs file,
bprm_set_creds(), bprm_check(), bprm_committing_creds(),
bprm_committed_creds(). It seems silly to me for an LSM to perform
these checks at all since I would expect the _meaningful_ check to be
finit_module() of the module itself. Having a way for an LSM to know
the exec is tied to a kernel module would let them skip the nonsense
checks.

Since the process for doing the usermode_blob is defined by the kernel
module build/link/objcopy process, could we tighten the
fork_usermode_blob() interface to point to the kernel module itself,
rather than leaving it an open-ended "blob" interface? Given our
history of needing to replace blob interfaces with file interfaces,
I'm cautious to add a new blob interface. Maybe just pull all the
blob-finding/loading into the interface, and just make it something
like fork_usermode_kmod(struct module *mod, struct umh_info *info) ?

-Kees

-- 
Kees Cook
Pixel Security

[PATCH bpf-next] selftests/bpf: Fix bash reference in Makefile

2018-05-10 Thread Joe Stringer

'|& ...' is a bash 4.0+ construct which is not guaranteed to be available
when using '$(shell ...)' in a Makefile. Fall back to the more portable
'2>&1 | ...'.

Fixes the following warning during compilation:

/bin/sh: 1: Syntax error: "&" unexpected

Signed-off-by: Joe Stringer 
---
 tools/testing/selftests/bpf/Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 9d762184b805..79d29d6cc719 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -90,9 +90,9 @@ CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
 $(OUTPUT)/test_l4lb_noinline.o: CLANG_FLAGS += -fno-inline
 $(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline
 
-BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help |& grep dwarfris)
-BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help |& grep BTF)
-BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --version |& grep LLVM)
+BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
+BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
+BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --version 2>&1 | grep LLVM)
 
 ifneq ($(BTF_LLC_PROBE),)
 ifneq ($(BTF_PAHOLE_PROBE),)
-- 
2.14.1

[PATCH net v2] rps: Correct wrong skb_flow_limit check when enable RPS

2018-05-10 Thread gfree . wind

From: Gao Feng 

The skb flow limit is implemented for each CPU independently. In the
current code, skb_flow_limit() gets the softnet_data via this_cpu_ptr().
But when RPS is enabled, the target CPU of enqueue_to_backlog() may not
be the current CPU. As a result, skb_flow_limit() checks the stats of the
current CPU while the skb is about to be appended to the queue of another
CPU. That is not the expected behavior.

Now pass the softnet_data as a parameter to make the two consistent.

Fixes: 99bbc7074190 ("rps: selective flow shedding during softnet overflow")
Signed-off-by: Gao Feng 
---
 v2: Add Fixes tag per Eric, and enhance the commit log
 v1: initial version

 net/core/dev.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index af0558b..0f98eff 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3883,18 +3883,15 @@ static int rps_ipi_queued(struct softnet_data *sd)
 int netdev_flow_limit_table_len __read_mostly = (1 << 12);
 #endif
 
-static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
+static bool skb_flow_limit(struct softnet_data *sd, struct sk_buff *skb, unsigned int qlen)
 {
 #ifdef CONFIG_NET_FLOW_LIMIT
struct sd_flow_limit *fl;
-   struct softnet_data *sd;
unsigned int old_flow, new_flow;
 
if (qlen < (netdev_max_backlog >> 1))
return false;
 
-   sd = this_cpu_ptr(&softnet_data);
-
rcu_read_lock();
fl = rcu_dereference(sd->flow_limit);
if (fl) {
@@ -3938,7 +3935,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
if (!netif_running(skb->dev))
goto drop;
qlen = skb_queue_len(&sd->input_pkt_queue);
-   if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) {
+   if (qlen <= netdev_max_backlog && !skb_flow_limit(sd, skb, qlen)) {
if (qlen) {
 enqueue:
__skb_queue_tail(&sd->input_pkt_queue, skb);
-- 
1.9.1
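
The mismatch described in the commit message can be reduced to a few lines of
plain C. This is only a toy model: the per-CPU data is an ordinary array, the
"current CPU" is a variable, and the limit check is a single comparison
instead of the real per-flow accounting. All names are invented for the
illustration.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Stand-in for the per-CPU state consulted by the limit check. */
struct toy_backlog_stats {
	unsigned int flow_hits;	/* how often this flow was seen recently */
	unsigned int limit;	/* threshold above which it is throttled */
};

static struct toy_backlog_stats toy_stats[TOY_NR_CPUS];
static int toy_current_cpu;	/* stand-in for "the CPU running this code" */

/* Before: always looks at the running CPU's stats. */
static bool toy_flow_limit_this_cpu(void)
{
	const struct toy_backlog_stats *sd = &toy_stats[toy_current_cpu];

	return sd->flow_hits > sd->limit;
}

/* After: the caller passes the stats of the CPU that will queue the packet. */
static bool toy_flow_limit(const struct toy_backlog_stats *sd)
{
	return sd->flow_hits > sd->limit;
}
```

If CPU 0 is idle and CPU 1 is overloaded, a packet steered from CPU 0 to
CPU 1 passes the first check but is correctly limited by the second.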

Re: [PATCH net-next] tcp: switch pacing timer to softirq based hrtimer

2018-05-10 Thread David Miller

From: Eric Dumazet 
Date: Thu, 10 May 2018 13:55:00 -0700

> 
> 
> On 05/10/2018 12:49 PM, Eric Dumazet wrote:
>> linux-4.16 got support for softirq based hrtimers.
>> TCP can switch its pacing hrtimer to this variant, since this
>> avoids going through a tasklet and some atomic operations.
>> 
> 
> I need to send a V2, adding a test of hrtimer_cancel() return value
> in tcp_clear_xmit_timers() to eventually release the socket reference.

Ok.

[PATCH v2 net-next] tcp: switch pacing timer to softirq based hrtimer

2018-05-10 Thread Eric Dumazet

linux-4.16 got support for softirq based hrtimers.
TCP can switch its pacing hrtimer to this variant, since this
avoids going through a tasklet and some atomic operations.

pacing timer logic looks like other (jiffies based) tcp timers.

v2: use hrtimer_try_to_cancel() in tcp_clear_xmit_timers()
to correctly release reference on socket if needed.

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h |  4 ++-
 net/ipv4/tcp_output.c | 69 ---
 net/ipv4/tcp_timer.c  |  2 +-
 3 files changed, 29 insertions(+), 46 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cf803fe0fb86b6a0fb1876a9f775a9c6e6a28ac4..3b1d617b01109b133b4ecafa9ee46173851083f8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -557,7 +557,9 @@ void tcp_fin(struct sock *sk);
 void tcp_init_xmit_timers(struct sock *);
 static inline void tcp_clear_xmit_timers(struct sock *sk)
 {
-   hrtimer_cancel(&tcp_sk(sk)->pacing_timer);
+   if (hrtimer_try_to_cancel(&tcp_sk(sk)->pacing_timer) == 1)
+   sock_put(sk);
+
inet_csk_clear_xmit_timers(sk);
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d07c0dcc99aaa55c4da963599c8286c8baa1f783..0d8f950a9006598c70dbf51e281a3fe32dfaa234 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -772,7 +772,7 @@ struct tsq_tasklet {
 };
 static DEFINE_PER_CPU(struct tsq_tasklet, tsq_tasklet);
 
-static void tcp_tsq_handler(struct sock *sk)
+static void tcp_tsq_write(struct sock *sk)
 {
if ((1 << sk->sk_state) &
(TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
@@ -789,6 +789,16 @@ static void tcp_tsq_handler(struct sock *sk)
   0, GFP_ATOMIC);
}
 }
+
+static void tcp_tsq_handler(struct sock *sk)
+{
+   bh_lock_sock(sk);
+   if (!sock_owned_by_user(sk))
+   tcp_tsq_write(sk);
+   else if (!test_and_set_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags))
+   sock_hold(sk);
+   bh_unlock_sock(sk);
+}
 /*
  * One tasklet per cpu tries to send more skbs.
  * We run in tasklet context but need to disable irqs when
@@ -816,16 +826,7 @@ static void tcp_tasklet_func(unsigned long data)
smp_mb__before_atomic();
clear_bit(TSQ_QUEUED, &sk->sk_tsq_flags);
 
-   if (!sk->sk_lock.owned &&
-   test_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags)) {
-   bh_lock_sock(sk);
-   if (!sock_owned_by_user(sk)) {
-   clear_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags);
-   tcp_tsq_handler(sk);
-   }
-   bh_unlock_sock(sk);
-   }
-
+   tcp_tsq_handler(sk);
sk_free(sk);
}
 }
@@ -853,9 +854,10 @@ void tcp_release_cb(struct sock *sk)
nflags = flags & ~TCP_DEFERRED_ALL;
} while (cmpxchg(&sk->sk_tsq_flags, flags, nflags) != flags);
 
-   if (flags & TCPF_TSQ_DEFERRED)
-   tcp_tsq_handler(sk);
-
+   if (flags & TCPF_TSQ_DEFERRED) {
+   tcp_tsq_write(sk);
+   __sock_put(sk);
+   }
/* Here begins the tricky part :
 * We are called from release_sock() with :
 * 1) BH disabled
@@ -929,7 +931,7 @@ void tcp_wfree(struct sk_buff *skb)
if (!(oval & TSQF_THROTTLED) || (oval & TSQF_QUEUED))
goto out;
 
-   nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED | TCPF_TSQ_DEFERRED;
+   nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED;
nval = cmpxchg(&sk->sk_tsq_flags, oval, nval);
if (nval != oval)
continue;
@@ -948,37 +950,17 @@ void tcp_wfree(struct sk_buff *skb)
sk_free(sk);
 }
 
-/* Note: Called under hard irq.
- * We can not call TCP stack right away.
+/* Note: Called under soft irq.
+ * We can call TCP stack right away, unless socket is owned by user.
  */
 enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
 {
struct tcp_sock *tp = container_of(timer, struct tcp_sock, pacing_timer);
struct sock *sk = (struct sock *)tp;
-   unsigned long nval, oval;
 
-   for (oval = READ_ONCE(sk->sk_tsq_flags);; oval = nval) {
-   struct tsq_tasklet *tsq;
-   bool empty;
+   tcp_tsq_handler(sk);
+   sock_put(sk);
 
-   if (oval & TSQF_QUEUED)
-   break;
-
-   nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED | TCPF_TSQ_DEFERRED;
-   nval = cmpxchg(&sk->sk_tsq_flags, oval, nval);
-   if (nval != oval)
-   continue;
-
-   if (!refcount_inc_not_zero(&sk->sk_wmem_alloc))
-   break;
-   /* queue this socket to tasklet queue */
-   tsq = this_cpu_ptr(&tsq_tasklet);
-   empty = list_empty(&tsq->head);
-

Re: pull-request: can 2018-05-10

2018-05-10 Thread David Miller

From: Marc Kleine-Budde 
Date: Thu, 10 May 2018 18:47:47 +0200

> this is a pull request for net/master consisting of 2 patches.
> 
> Both patches are from Lukas Wunner and fix two problems found in the hi311x
> CAN driver under high load situations.

Applied.

Re: [PATCH] qed: fix spelling mistake: "taskelt" -> "tasklet"

2018-05-10 Thread David Miller

From: Colin King 
Date: Thu, 10 May 2018 15:03:27 +0100

> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in DP_VERBOSE message text
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH net-next] rocker: Postpone filtering of !added_by_user FDB

2018-05-10 Thread David Miller

From: Petr Machata 
Date: Thu, 10 May 2018 15:29:46 +0200

> Breaking out of the switch in rocker_switchdev_event() still ends up
> scheduling work, except an ill-defined one. This leads to an OOPS cited
> below. Fix by postponing the check until rocker_switchdev_event_work().
 ...
> Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications")
> Suggested-by: Vivien Didelot 
> Signed-off-by: Petr Machata 

Applied.

Re: [PATCH net-next] tls: Fix tls_device initialization

2018-05-10 Thread David Miller

From: Boris Pismenny 
Date: Thu, 10 May 2018 16:27:25 +0300

> Add sg table initialization to fix a BUG_ON encountered when enabling
> CONFIG_DEBUG_SG.
> 
> Signed-off-by: Boris Pismenny 

Applied.

Re: [PATCH][next] net: aquantia: fix unsigned numvecs comparison with less than zero

2018-05-10 Thread David Miller

From: Colin King 
Date: Thu, 10 May 2018 13:52:01 +0100

> From: Colin Ian King 
> 
> The comparison of numvecs < 0 is always false because numvecs is a u32
> and hence the error return from a failed call to pci_alloc_irq_vectors
> is never detected.  Fix this by using the signed int ret to handle the
> error return and assign numvecs to err.
> 
> Detected by CoverityScan, CID#1468650 ("Unsigned compared against 0")
> 
> Fixes: a09bd81b5413 ("net: aquantia: Limit number of vectors to actually allocated irqs")
> Signed-off-by: Colin Ian King 

This doesn't apply to net-next.
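
As a standalone illustration of the bug class described in the quoted commit
message, the sketch below shows what happens to a negative return value once
it is stored in an unsigned variable, and the usual remedy of keeping the
result in a signed int until it has been checked. toy_alloc_vectors() is a
made-up stand-in, not the PCI API, and the function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Made-up allocator: returns the vector count, or a negative errno value. */
static int toy_alloc_vectors(int want, int fail)
{
	return fail ? -ENOSPC : want;
}

/* What an unsigned variable holds after being assigned a signed result. */
static uint32_t toy_store_unsigned(int ret)
{
	return (uint32_t)ret;	/* a negative value wraps to a huge positive one */
}

/* Remedy: test the signed value first, convert only on success. */
static int toy_setup_vectors(int want, int fail, uint32_t *numvecs)
{
	int ret = toy_alloc_vectors(want, fail);

	if (ret < 0)
		return ret;
	*numvecs = (uint32_t)ret;
	return 0;
}
```

Because the wrapped value is never below zero, a "numvecs < 0" test on the
unsigned variable can never fire, which is what the static checker flagged.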

Re: [PATCH net-next] cxgb4: fix the wrong conversion of Mbps to Kbps

2018-05-10 Thread David Miller

From: Ganesh Goudar 
Date: Thu, 10 May 2018 16:07:23 +0530

> fix the wrong conversion where 1 Mbps was converted to
> 1024 Kbps.
> 
> Signed-off-by: Ganesh Goudar 

Applied, thanks.
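
For reference, link rates use decimal prefixes, so the conversion the quoted
commit message describes is a factor of 1000 rather than 1024. A minimal
sketch (the helper name is invented here and is not taken from the driver):

```c
/* Convert a rate in Mbps to Kbps; network rates use decimal (SI) prefixes. */
static unsigned long toy_mbps_to_kbps(unsigned long mbps)
{
	return mbps * 1000UL;
}
```

With a factor of 1024, a 100 Mbps limit would be programmed as 102400 Kbps,
about 2.4% too high.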

Re: [PATCH net-next 0/4] mlxsw: Support VLAN devices in mirroring offloads

2018-05-10 Thread David Miller

From: Ido Schimmel 
Date: Thu, 10 May 2018 13:13:02 +0300

> Petr says:
> 
> When offloading "tc action mirred mirror", there are several scenarios
> where VLAN devices can show up, that mlxsw can offload on Spectrum
> machines.
> 
> I) A direct mirror to a VLAN device on top of a front-panel port device
>(commonly referred to as "RSPAN")
> 
> II) VLAN device in egress path of a packet when resolving a mirror to
> gretap or ip6gretap netdevice.
> 
> Specifically in the latter case, the following are the cases that can be
> offloaded:
> 
> IIa) VLAN device directly above a physical device.
> IIb) A VLAN-unaware bridge where the egress device is as in IIa.
> IIc) VLAN device on top of a VLAN-aware bridge where the egress device
>  is a physical device.
> 
> This patch set implements all the above cases.
 ...

Series applied, thanks.

Re: KASAN: use-after-free Read in __dev_queue_xmit

2018-05-10 Thread Willem de Bruijn

On Wed, May 9, 2018 at 5:05 PM, Willem de Bruijn
 wrote:
> On Wed, May 9, 2018 at 3:36 PM, Eric Dumazet  wrote:
>>
>>
>> On 05/09/2018 12:21 PM, Willem de Bruijn wrote:
>>
>>> Indeed. The skb shared info struct is zeroed by dev_validate_header
>>> as a result of dev->hard_header_len exceeding skb->end - skb->data.
>>>
>>> Not exactly sure yet how this can happen. The hard header length space
>>> is accounted for during allocation as reserved memory. But,
>>> packet_alloc_skb does call skb_reserve(), moving skb->data
>>> effectively beyond this reserved region.
>>>
>>> It may be incorrect to pass skb->data to dev_validate_header, as that
>>> does not point to the start of the ll_header anymore. Still figuring out
>>> what the right fix is..

The following resolves the issue.

packet_alloc_skb already calls skb_reserve(skb, reserve), so now
the network header should start at 0, not at reserve.

If SOCK_DGRAM, dev_hard_header() calls skb_push for the link
layer and returns this offset.

If SOCK_RAW, we should do the same and use the reserved space to
write the link layer.

Now behavior is the same as in tpacket_snd.

@@ -2898,19 +2911,26 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
tlen = dev->needed_tailroom;
linear = __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len);
linear = max(linear, min_t(int, len, dev->hard_header_len));
skb = packet_alloc_skb(sk, hlen + tlen, hlen, len, linear,
   msg->msg_flags & MSG_DONTWAIT, &err);
if (skb == NULL)
goto out_unlock;

-   skb_set_network_header(skb, reserve);
+   skb_reset_network_header(skb);

err = -EINVAL;
if (sock->type == SOCK_DGRAM) {
offset = dev_hard_header(skb, dev, ntohs(proto), addr,
NULL, len);
if (unlikely(offset < 0))
goto out_free;
+   } else {
+   skb_push(skb, dev->hard_header_len);
}

/* Returns -EFAULT on error */
err = skb_copy_datagram_from_iter(skb, offset, &msg->msg_iter, len);
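
The reserve/reset/push sequence above may be easier to follow with a toy
model of the offsets involved. The structure below is not the kernel's
sk_buff; it tracks only three offsets, and every name in it is invented for
this sketch.

```c
#include <stddef.h>

/* Toy buffer descriptor: offsets are relative to the start of the buffer. */
struct toy_skb {
	size_t data;		/* offset of the first used byte */
	size_t len;		/* number of used bytes starting at data */
	size_t network_header;	/* offset of the network header */
};

/* Leave n bytes of headroom in a still-empty buffer. */
static void toy_reserve(struct toy_skb *skb, size_t n)
{
	skb->data += n;
}

/* Record that the network header starts at the current data offset. */
static void toy_reset_network_header(struct toy_skb *skb)
{
	skb->network_header = skb->data;
}

/* Claim n bytes of headroom in front of data; returns the new data offset. */
static size_t toy_push(struct toy_skb *skb, size_t n)
{
	skb->data -= n;
	skb->len += n;
	return skb->data;
}
```

Reserving 14 bytes, resetting the network header, and then pushing 14 bytes
leaves the link-layer header in the reserved space at offset 0 while the
network header stays at offset 14.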

Re: [PATCH net] sctp: remove sctp_chunk_put from fail_mark err path in sctp_ulpevent_make_rcvmsg

2018-05-10 Thread David Miller

From: Xin Long 
Date: Thu, 10 May 2018 17:34:13 +0800

> In Commit 1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too"),
> it held the chunk in sctp_ulpevent_make_rcvmsg to access it safely later
> in recvmsg. However, it also added sctp_chunk_put in fail_mark err path,
> which is only triggered before holding the chunk.
> 
> syzbot reported a use-after-free crash happened on this err path, where
> it shouldn't call sctp_chunk_put.
> 
> This patch simply removes this call.
> 
> Fixes: 1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too")
> Reported-by: syzbot+141d898c5f24489db...@syzkaller.appspotmail.com
> Signed-off-by: Xin Long 

Applied and queued up for -stable.

Re: [PATCH v2] net/mlx4_en: Fix an error handling path in 'mlx4_en_init_netdev()'

2018-05-10 Thread David Miller

From: Christophe JAILLET 
Date: Thu, 10 May 2018 09:06:04 +0200

> If an error occurs, 'mlx4_en_destroy_netdev()' is called.
> It then calls 'mlx4_en_free_resources()' which does the needed resources
> cleanup.
> 
> So, doing some explicit kfree in the error handling path would lead to
> some double kfree.
> 
> Simplify code to avoid such a case.
> 
> Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
> Signed-off-by: Christophe JAILLET 

Applied and queued up for -stable, thanks.

Re: [PATCH net-next v2] tcp: Add mark for TIMEWAIT sockets

2018-05-10 Thread David Miller

From: Jon Maxwell 
Date: Thu, 10 May 2018 16:53:51 +1000

> This version has some suggestions by Eric Dumazet:
> 
> - Use a local variable for the mark in IPv6 instead of ctl_sk to avoid SMP
>   races.
> - Use the more elegant "IP4_REPLY_MARK(net, skb->mark) ?: sk->sk_mark"
> statement. 
> - Factorize code as sk_fullsock() check is not necessary.
> 
> Aidan McGurn from Openwave Mobility systems reported the following bug:
> 
> "Marked routing is broken on customer deployment. Its effects are large
> increase in Uplink retransmissions caused by the client never receiving
> the final ACK to their FINACK - this ACK misses the mark and routes out
> of the incorrect route."
> 
> Currently marks are added to sk_buffs for replies when the "fwmark_reflect"
> sysctl is enabled. But not for TW sockets that had sk->sk_mark set via
> setsockopt(SO_MARK..).
> 
> Fix this in IPv4/v6 by adding tw->tw_mark for TIME_WAIT sockets. Copy the
> original sk->sk_mark in __inet_twsk_hashdance() to the new tw->tw_mark
> location. Then propagate this so that the skb gets sent with the correct
> mark. Do the same for resets. Give the "fwmark_reflect" sysctl precedence
> over sk->sk_mark so that netfilter rules are still honored.
> 
> Signed-off-by: Jon Maxwell 

I'm surprised the lack of a mark in timewait sockets wasn't noticed earlier.

Applied, thank you.
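
The precedence described in the patch (reflected mark first, socket mark as
fallback) can be sketched in user-space C. This is only an illustration under
stated assumptions: struct fake_net, reply_mark() and pick_mark() are
hypothetical stand-ins, not the kernel's definitions, and reply_mark() merely
models what IP4_REPLY_MARK() is understood to do when the "fwmark_reflect"
sysctl is set.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-namespace sysctl state. */
struct fake_net {
	int fwmark_reflect;	/* models the "fwmark_reflect" sysctl */
};

/* Models IP4_REPLY_MARK(): the incoming mark, or 0 when reflection is off. */
static uint32_t reply_mark(const struct fake_net *net, uint32_t skb_mark)
{
	return net->fwmark_reflect ? skb_mark : 0;
}

/*
 * Portable spelling of "IP4_REPLY_MARK(net, skb->mark) ?: tw_mark":
 * a non-zero reflected mark wins, otherwise fall back to the mark that
 * was copied from sk->sk_mark into the TIME_WAIT socket.
 */
static uint32_t pick_mark(const struct fake_net *net, uint32_t skb_mark,
			  uint32_t tw_mark)
{
	uint32_t reflected = reply_mark(net, skb_mark);

	return reflected ? reflected : tw_mark;
}
```

With reflection enabled and a non-zero incoming mark, the incoming mark is
used; in every other case the stored TIME_WAIT mark is used, which is the
ordering the commit message asks for.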

Re: [PATCH] net: ipv4: remove define INET_CSK_DEBUG and unnecessary EXPORT_SYMBOL

2018-05-10 Thread David Miller

From: Joe Perches 
Date: Wed,  9 May 2018 23:24:07 -0700

> INET_CSK_DEBUG is always set and only is used for 2 pr_debug calls.
> 
> EXPORT_SYMBOL(inet_csk_timer_bug_msg) is only used by these 2
> pr_debug calls and is also unnecessary as the exported string can
> be used directly by these calls.
> 
> Signed-off-by: Joe Perches 

Applied to net-next.

Re: [PATCH net-next] net/core: correct the variable name in dev_ioctl() comment

2018-05-10 Thread David Miller

From: Sun Lianwen 
Date: Thu, 10 May 2018 11:01:20 +0800

> The variable name is not "arg" but "ifr" in dev_ioctl()
> 
> Signed-off-by: Sun Lianwen 

If you are going to touch this, fix it fully by adding the need_copyout
variable to the comment as well.

Re: [PATCH net] hv_netvsc: set master device

2018-05-10 Thread David Miller

From: Stephen Hemminger 
Date: Wed,  9 May 2018 14:09:04 -0700

> The hyper-v transparent bonding should have used master_dev_link.
> The netvsc device should look like a master bond device not
> like the upper side of a tunnel.
> 
> This makes the semantics the same so that userspace applications
> looking at network devices see the correct master relationship.
> 
> Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
> Signed-off-by: Stephen Hemminger 

Applied and queued up for -stable.

Re: pull-request: mac80211 2018-05-09

2018-05-10 Thread David Miller

From: Johannes Berg 
Date: Wed,  9 May 2018 21:36:12 +0200

> We just have a few fixes this time around.
> 
> Please pull and let me know if there's any problem.

Pulled, thank you!

Re: pull-request: mac80211-next 2018-05-09

2018-05-10 Thread David Miller

From: Johannes Berg 
Date: Wed, 09 May 2018 23:29:37 +0200

> Hi,
> 
> Sorry, scratch that.
> 
> I forgot that this commit:
> 
>> Toke Høiland-Jørgensen (3):
> 
>>   cfg80211: Expose TXQ stats and parameters to userspace
> 
> caused a bunch of "too much stack" warnings - I should put in at least
> the non-driver fix for that first, and then coordinate with Kalle to
> send the driver fixes in too.

Ok, tossed.

Re: [PATCH net-next] liquidio: monitor all of Octeon's cores in watchdog thread

2018-05-10 Thread David Miller

From: Felix Manlunas 
Date: Wed, 9 May 2018 11:31:31 -0700

> The liquidio_watchdog kernel thread is watching over only 12 cores of the
> Octeon CN23XX; it's neglecting the other 4 cores that are present in the
> CN2360.  Fix it by defining LIO_MAX_CORES as 16.
> 
> Signed-off-by: Felix Manlunas 

Applied.

Re: [PATCH net-next] liquidio: bump up driver version to 1.7.2 to match newer NIC firmware

2018-05-10 Thread David Miller

From: Felix Manlunas 
Date: Wed, 9 May 2018 11:49:38 -0700

> Signed-off-by: Felix Manlunas 

Applied.

Re: [net-next v2 0/6][pull request] 100GbE Intel Wired LAN Driver Updates 2018-05-09

2018-05-10 Thread David Miller

From: Jeff Kirsher 
Date: Wed,  9 May 2018 11:10:05 -0700

> This series contains updates to fm10k only.
> 
> Jake provides all the changes in the series, starting with adding
> support for accelerated MACVLAN devices.  Reduced code duplication by
> implementing a macro to be used when setting up the type specific
> macros.  Avoided potential bugs with stats by using a macro to calculate
> the array size when passing to ensure that the size is correct.
> 
> v2: changed macro reference '#' with __stringify() as suggested by
> Joe Perches to patch 2 of the series.  Also made sure the updated
> series of patches is actually pushed to my kernel.org tree
 ...
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Pulled, thanks Jeff.

Re: [PATCH net] net/ipv6: fix lock imbalance in ip6_route_del()

2018-05-10 Thread David Miller

From: Eric Dumazet 
Date: Wed,  9 May 2018 10:05:46 -0700

> WARNING: lock held when returning to user space!
> 4.17.0-rc3+ #37 Not tainted
 ...
> Fixes: 23fb93a4d3f1 ("net/ipv6: Cleanup exception and cache route handling")
> Signed-off-by: Eric Dumazet 
> Cc: David Ahern 
> Reported-by: syzbot 

Applied to net-next.

Re: [PATCH net] tipc: fix one byte leak in tipc_sk_set_orig_addr()

2018-05-10 Thread David Miller

From: Eric Dumazet 
Date: Wed,  9 May 2018 09:50:22 -0700

> syzbot/KMSAN reported an uninit-value in recvmsg() that
> I tracked down to tipc_sk_set_orig_addr(), missing
> srcaddr->member.scope initialization.
> 
> This patch moves srcaddr->sock.scope init to follow
> fields order and ease future verifications.
 ...
> Fixes: 31c82a2d9d51 ("tipc: add second source address to recvmsg()/recvfrom()")
> Signed-off-by: Eric Dumazet 
> Reported-by: syzbot 

Applied and queued up for -stable.

Re: [PATCH net] tc-testing: fix tdc tests for 'bpf' action

2018-05-10 Thread David Miller

From: Davide Caratti 
Date: Wed,  9 May 2018 18:45:42 +0200

> - correct a typo in the value of 'matchPattern' of test 282d, potentially
>  causing false negative
> - allow errors when 'teardown' executes '$TC action flush action bpf' in
>  test 282d, to fix false positive when it is run with act_bpf unloaded
> - correct the value of 'matchPattern' in test e939, causing false positive
>  in case the BPF JIT is enabled
> 
> Fixes: 440ea4ae1828 ("tc-testing: add selftests for 'bpf' action")
> Signed-off-by: Davide Caratti 

Applied.

Re: [PATCH ghak81 RFC V1 5/5] audit: collect audit task parameters

2018-05-10 Thread Richard Guy Briggs

On 2018-05-09 11:46, Paul Moore wrote:
> On Fri, May 4, 2018 at 4:54 PM, Richard Guy Briggs  wrote:
> > The audit-related parameters in struct task_struct should ideally be
> > collected together and accessed through a standard audit API.
> >
> > Collect the existing loginuid, sessionid and audit_context together in a
> > new struct audit_task_info pointer called "audit" in struct task_struct.
> >
> > Use kmem_cache to manage this pool of memory.
> > Un-inline audit_free() to be able to always recover that memory.
> >
> > See: https://github.com/linux-audit/audit-kernel/issues/81
> >
> > Signed-off-by: Richard Guy Briggs 
> > ---
> >  MAINTAINERS|  2 +-
> >  include/linux/audit.h  |  8 
> >  include/linux/audit_task.h | 31 +++
> >  include/linux/sched.h  |  6 ++
> >  init/init_task.c   |  8 ++--
> >  kernel/auditsc.c   |  4 ++--
> >  6 files changed, 46 insertions(+), 13 deletions(-)
> >  create mode 100644 include/linux/audit_task.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 0a1410d..8c7992d 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2510,7 +2510,7 @@ L:linux-au...@redhat.com (moderated for 
> > non-subscribers)
> >  W: https://github.com/linux-audit
> >  T: git git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
> >  S: Supported
> > -F: include/linux/audit.h
> > +F: include/linux/audit*.h
> >  F: include/uapi/linux/audit.h
> >  F: kernel/audit*
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index dba0d45..1324969 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -237,11 +237,11 @@ extern void __audit_inode_child(struct inode *parent,
> >
> >  static inline void audit_set_context(struct task_struct *task, struct 
> > audit_context *ctx)
> >  {
> > -   task->audit_context = ctx;
> > +   task->audit.ctx = ctx;
> >  }
> >  static inline struct audit_context *audit_context(struct task_struct *task)
> >  {
> > -   return task->audit_context;
> > +   return task->audit.ctx;
> >  }
> >  static inline bool audit_dummy_context(void)
> >  {
> > @@ -330,12 +330,12 @@ extern int auditsc_get_stamp(struct audit_context 
> > *ctx,
> >
> >  static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
> >  {
> > -   return tsk->loginuid;
> > +   return tsk->audit.loginuid;
> >  }
> >
> >  static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> >  {
> > -   return tsk->sessionid;
> > +   return tsk->audit.sessionid;
> >  }
> >
> >  extern void __audit_ipc_obj(struct kern_ipc_perm *ipcp);
> > diff --git a/include/linux/audit_task.h b/include/linux/audit_task.h
> > new file mode 100644
> > index 000..d4b3a20
> > --- /dev/null
> > +++ b/include/linux/audit_task.h
> > @@ -0,0 +1,31 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/* audit_task.h -- definition of audit_task_info structure
> > + *
> > + * Copyright 2018 Red Hat Inc., Raleigh, North Carolina.
> > + * All Rights Reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * Written by Richard Guy Briggs 
> > + *
> > + */
> > +
> > +#ifndef _LINUX_AUDIT_TASK_H_
> > +#define _LINUX_AUDIT_TASK_H_
> > +
> > +struct audit_context;
> > +struct audit_task_info {
> > +   kuid_t  loginuid;
> > +   unsigned intsessionid;
> > +   struct audit_context*ctx;
> > +};
> > +
> > +#endif
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index b3d697f..b58eca0 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -27,9 +27,9 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> > -struct audit_context;
> >  struct backing_dev_info;
> >  struct bio_list;
> >  struct blk_plug;
> > @@ -832,10 +832,8 @@ struct task_struct {
> >
> > struct callback_head*task_works;
> >
> > -   struct audit_context*audit_context;
> >  #ifdef CONFIG_AUDITSYSCALL
> > -   kuid_t  loginuid;
> > -   unsigned intsessionid;
> > +   struct audit_task_info  audit;
> >  #endif
> 
> Considering that the audit_context pointer is now in the
> audit_task_info struct, should the audit_task_info struct be placed
> outside the

Re: [PATCH net-next 0/4] Misc bug fixes for HNS3 Ethernet Driver

2018-05-10 Thread David Miller

From: Salil Mehta 
Date: Wed, 9 May 2018 17:24:37 +0100

> Fixes to some of the bugs found during system test, internal review
> and clean-up

Series applied, thank you.

Re: [PATCH net-next] hv_netvsc: typo in NDIS RSS parameters structure

2018-05-10 Thread David Miller

From: Stephen Hemminger 
Date: Wed,  9 May 2018 09:00:07 -0700

> Fix simple misspelling kashkey_offset should be hashkey_offset.
> 
> Signed-off-by: Stephen Hemminger 

Applied.

Re: [PATCH v3 next-next] drivers: net: davinci_mdio: prevent spurious timeout

2018-05-10 Thread David Miller

From: Sekhar Nori 
Date: Wed, 9 May 2018 21:15:15 +0530

> A well timed kernel preemption in the time_after() loop
> in wait_for_idle() can result in a spurious timeout
> error to be returned.
> 
> Fix it by using readl_poll_timeout() which takes care of
> this issue.
> 
> Reviewed-by: Andrew Lunn 
> Signed-off-by: Sekhar Nori 

Applied.
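
The failure mode described in the patch can be sketched with a simulated
clock in user-space C. Everything here (struct sim, read_idle(),
wait_fragile(), wait_robust(), SIM_ETIMEDOUT) is hypothetical and only
models the idea; it is not the davinci_mdio code. The robust variant reads
the status one more time after the deadline has passed, which is the
behaviour the commit message relies on readl_poll_timeout() to provide.

```c
#include <stdbool.h>

#define SIM_ETIMEDOUT 110	/* illustrative error value */

struct sim {
	unsigned long now;	/* simulated clock */
	unsigned long idle_at;	/* time at which the device becomes idle */
	unsigned long stall;	/* clock jump after each read (preemption) */
};

/* Read the simulated status bit, then let "preemption" advance the clock. */
static bool read_idle(struct sim *s)
{
	bool idle = s->now >= s->idle_at;

	s->now += s->stall;
	return idle;
}

/* Fragile pattern: the deadline check alone decides the outcome. */
static int wait_fragile(struct sim *s, unsigned long timeout)
{
	unsigned long deadline = s->now + timeout;

	/* Plain comparison; a real kernel loop would be wrap-safe. */
	while (s->now < deadline) {
		if (read_idle(s))
			return 0;
	}
	return -SIM_ETIMEDOUT;
}

/* Robust pattern: once the deadline passes, read the status a last time. */
static int wait_robust(struct sim *s, unsigned long timeout)
{
	unsigned long deadline = s->now + timeout;

	for (;;) {
		if (read_idle(s))
			return 0;
		if (s->now >= deadline)
			return read_idle(s) ? 0 : -SIM_ETIMEDOUT;
	}
}
```

If the device becomes idle while the waiter is stalled past its deadline,
wait_fragile() reports a timeout without looking again, while wait_robust()
notices the idle state and succeeds.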

Re: [PATCH ghak81 RFC V1 1/5] audit: normalize loginuid read access

2018-05-10 Thread Richard Guy Briggs

On 2018-05-09 11:13, Paul Moore wrote:
> On Fri, May 4, 2018 at 4:54 PM, Richard Guy Briggs  wrote:
> > Recognizing that the loginuid is an internal audit value, use an access
> > function to retrieve the audit loginuid value for the task rather than
> > reaching directly into the task struct to get it.
> >
> > Signed-off-by: Richard Guy Briggs 
> > ---
> >  kernel/auditsc.c | 16 
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 479c031..f3817d0 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -374,7 +374,7 @@ static int audit_field_compare(struct task_struct *tsk,
> > case AUDIT_COMPARE_EGID_TO_OBJ_GID:
> > return audit_compare_gid(cred->egid, name, f, ctx);
> > case AUDIT_COMPARE_AUID_TO_OBJ_UID:
> > -   return audit_compare_uid(tsk->loginuid, name, f, ctx);
> > +   return audit_compare_uid(audit_get_loginuid(tsk), name, f, 
> > ctx);
> > case AUDIT_COMPARE_SUID_TO_OBJ_UID:
> > return audit_compare_uid(cred->suid, name, f, ctx);
> > case AUDIT_COMPARE_SGID_TO_OBJ_GID:
> > @@ -385,7 +385,7 @@ static int audit_field_compare(struct task_struct *tsk,
> > return audit_compare_gid(cred->fsgid, name, f, ctx);
> > /* uid comparisons */
> > case AUDIT_COMPARE_UID_TO_AUID:
> > -   return audit_uid_comparator(cred->uid, f->op, 
> > tsk->loginuid);
> > +   return audit_uid_comparator(cred->uid, f->op, 
> > audit_get_loginuid(tsk));
> > case AUDIT_COMPARE_UID_TO_EUID:
> > return audit_uid_comparator(cred->uid, f->op, cred->euid);
> > case AUDIT_COMPARE_UID_TO_SUID:
> > @@ -394,11 +394,11 @@ static int audit_field_compare(struct task_struct 
> > *tsk,
> > return audit_uid_comparator(cred->uid, f->op, cred->fsuid);
> > /* auid comparisons */
> > case AUDIT_COMPARE_AUID_TO_EUID:
> > -   return audit_uid_comparator(tsk->loginuid, f->op, 
> > cred->euid);
> > +   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, 
> > cred->euid);
> > case AUDIT_COMPARE_AUID_TO_SUID:
> > -   return audit_uid_comparator(tsk->loginuid, f->op, 
> > cred->suid);
> > +   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, 
> > cred->suid);
> > case AUDIT_COMPARE_AUID_TO_FSUID:
> > -   return audit_uid_comparator(tsk->loginuid, f->op, 
> > cred->fsuid);
> > +   return audit_uid_comparator(audit_get_loginuid(tsk), f->op, 
> > cred->fsuid);
> > /* euid comparisons */
> > case AUDIT_COMPARE_EUID_TO_SUID:
> > return audit_uid_comparator(cred->euid, f->op, cred->suid);
> > @@ -611,7 +611,7 @@ static int audit_filter_rules(struct task_struct *tsk,
> > result = match_tree_refs(ctx, rule->tree);
> > break;
> > case AUDIT_LOGINUID:
> > -   result = audit_uid_comparator(tsk->loginuid, f->op, 
> > f->uid);
> > +   result = 
> > audit_uid_comparator(audit_get_loginuid(tsk), f->op, f->uid);
> > break;
> > case AUDIT_LOGINUID_SET:
> > result = audit_comparator(audit_loginuid_set(tsk), 
> > f->op, f->val);
> > @@ -2287,8 +2287,8 @@ int audit_signal_info(int sig, struct task_struct *t)
> > (sig == SIGTERM || sig == SIGHUP ||
> >  sig == SIGUSR1 || sig == SIGUSR2)) {
> > audit_sig_pid = task_tgid_nr(tsk);
> > -   if (uid_valid(tsk->loginuid))
> > -   audit_sig_uid = tsk->loginuid;
> > +   if (uid_valid(audit_get_loginuid(tsk)))
> > +   audit_sig_uid = audit_get_loginuid(tsk);
> 
> I realize this comment is a little silly given the nature of loginuid,
> but if we are going to abstract away loginuid accesses (which I think
> is good), we should probably access it once, store it in a local
> variable, perform the validity check on the local variable, then
> commit the local variable to audit_sig_uid.  I realize a TOCTOU
> problem is unlikely here, but with this new layer of abstraction it
> seems that some additional safety might be a good thing.

Ok, I'll just assign it to where it is going and check it there, holding
the audit_ctl_lock the whole time, since that should have been done
anyway for all of audit_sig_{pid,uid,sid} to get a consistent view from
the AUDIT_SIGNAL_INFO fetch.
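
Paul's suggestion (read the value once, validate the local copy, then
commit that same copy) can be sketched in user-space C as below. The names
fake_task, get_loginuid(), uid_is_valid() and pick_sig_uid() are
hypothetical, the uid is modelled as a plain integer rather than kuid_t,
and locking is left out entirely, so this shows only the single-read part.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_UID ((uint32_t)-1)

struct fake_task {
	uint32_t loginuid;
	int reads;	/* counts accessor calls, for illustration only */
};

/* Stand-in for an accessor such as audit_get_loginuid(). */
static uint32_t get_loginuid(struct fake_task *t)
{
	t->reads++;
	return t->loginuid;
}

static bool uid_is_valid(uint32_t uid)
{
	return uid != INVALID_UID;
}

/* One read: the value that is validated is the value that is stored. */
static uint32_t pick_sig_uid(struct fake_task *t, uint32_t fallback_uid)
{
	uint32_t auid = get_loginuid(t);

	return uid_is_valid(auid) ? auid : fallback_uid;
}
```

Because the accessor is called exactly once, the check and the assignment
cannot observe two different values.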

> > else
> > audit_sig_uid = uid;
> > security_task_getsecid(tsk, &audit_sig_sid);

> paul moore

- RGB

--
Richard Guy Briggs 
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice:

Re: [PATCH ghak81 RFC V1 3/5] audit: use inline function to get audit context

2018-05-10 Thread Richard Guy Briggs

On 2018-05-09 11:28, Paul Moore wrote:
> On Fri, May 4, 2018 at 4:54 PM, Richard Guy Briggs  wrote:
> > Recognizing that the audit context is an internal audit value, use an
> > access function to retrieve the audit context pointer for the task
> > rather than reaching directly into the task struct to get it.
> >
> > Signed-off-by: Richard Guy Briggs 
> > ---
> >  include/linux/audit.h| 16 ---
> >  include/net/xfrm.h   |  2 +-
> >  kernel/audit.c   |  4 +--
> >  kernel/audit_watch.c |  2 +-
> >  kernel/auditsc.c | 52 
> > ++--
> >  net/bridge/netfilter/ebtables.c  |  2 +-
> >  net/core/dev.c   |  2 +-
> >  net/netfilter/x_tables.c |  2 +-
> >  net/netlabel/netlabel_user.c |  2 +-
> >  security/integrity/ima/ima_api.c |  2 +-
> >  security/integrity/integrity_audit.c |  2 +-
> >  security/lsm_audit.c |  2 +-
> >  security/selinux/hooks.c |  4 +--
> >  security/selinux/selinuxfs.c |  6 ++---
> >  security/selinux/ss/services.c   | 12 -
> >  15 files changed, 60 insertions(+), 52 deletions(-)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 5f86f7c..93e4c61 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -235,26 +235,30 @@ extern void __audit_inode_child(struct inode *parent,
> >  extern void __audit_seccomp(unsigned long syscall, long signr, int code);
> >  extern void __audit_ptrace(struct task_struct *t);
> >
> > +static inline struct audit_context *audit_context(struct task_struct *task)
> > +{
> > +   return task->audit_context;
> > +}
> 
> Another case where I think I agree with everything here on principle,
> especially when one considers it in the larger context of the audit
> container ID work.  However, I think we might be able to simplify this a
> bit by eliminating the parameter to the new audit_context() helper and
> making it always reference the current task_struct.  Based on this
> patch it would appear that this change would work for all callers
> except for audit_take_context() and __audit_syscall_entry(), both of
> which are contained within the core audit code and are enough of a
> special case that I think it is acceptable for them to access the
> context directly.  I'm trying to think of reasons why a non-audit
> kernel subsystem would ever need to access the audit context of a
> process other than current and I can't think of any ... removing the
> task_struct pointer might help prevent mistakes/abuse in the future.

As for __audit_syscall_{entry,exit}() and audit_signal_info(), they are
using current.  current is assigned to local variable tsk only to be
used as the LHS in assignments and for locking.

But, audit_take_context() and audit_log_exit() are both called also from
__audit_free() which can have non-current handed to it by copy_process()
cleaning up, while do_exit() appears to still be in current.

So, Ok, ditch the parameter to audit_context() and use local access when
needed.

> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 6e3ceb9..a4bbdcc 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -836,7 +836,7 @@ static inline struct audit_context 
> > *audit_take_context(struct task_struct *tsk,
> >   int return_valid,
> >   long return_code)
> >  {
> > -   struct audit_context *context = tsk->audit_context;
> > +   struct audit_context *context = audit_context(tsk);
> >
> > if (!context)
> > return NULL;
> > @@ -1510,7 +1510,7 @@ void __audit_syscall_entry(int major, unsigned long 
> > a1, unsigned long a2,
> >unsigned long a3, unsigned long a4)
> >  {
> > struct task_struct *tsk = current;
> > -   struct audit_context *context = tsk->audit_context;
> > +   struct audit_context *context = audit_context(tsk);
> > enum audit_state state;
> >
> > if (!audit_enabled || !context)
> 
> -- 
> paul moore
> www.paul-moore.com

- RGB

--
Richard Guy Briggs 
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

linux-next: Signed-off-by missing for commit in the net tree

2018-05-10 Thread Stephen Rothwell

Hi all,

Commit

  0e8411e426e2 ("ipv4: reset fnhe_mtu_locked after cache route flushed")

is missing a Signed-off-by from its author.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH net] sctp: remove sctp_chunk_put from fail_mark err path in sctp_ulpevent_make_rcvmsg

2018-05-10 Thread Marcelo Ricardo Leitner

On Thu, May 10, 2018 at 05:34:13PM +0800, Xin Long wrote:
> In Commit 1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too"),
> it held the chunk in sctp_ulpevent_make_rcvmsg to access it safely later
> in recvmsg. However, it also added sctp_chunk_put in fail_mark err path,
> which is only triggered before holding the chunk.
>
> syzbot reported a use-after-free crash happened on this err path, where
> it shouldn't call sctp_chunk_put.
>
> This patch simply removes this call.
>
> Fixes: 1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too")
> Reported-by: syzbot+141d898c5f24489db...@syzkaller.appspotmail.com
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 
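
The reference-count imbalance described above can be sketched in user-space
C. The names fake_chunk, chunk_hold(), chunk_put() and make_event() are
hypothetical; the point is only that an error path reached before the hold
must not drop a reference.

```c
/* Hypothetical refcounted object standing in for a chunk. */
struct fake_chunk {
	int refcnt;
};

static void chunk_hold(struct fake_chunk *c)
{
	c->refcnt++;
}

static void chunk_put(struct fake_chunk *c)
{
	c->refcnt--;
}

/*
 * Returns 0 on success, -1 on failure. fail_early models an error that
 * happens before the hold, as on the fail_mark path.
 */
static int make_event(struct fake_chunk *c, int fail_early)
{
	if (fail_early)
		goto fail_mark;

	chunk_hold(c);	/* the event now owns one reference */
	return 0;

fail_mark:
	/* No chunk_put() here: nothing was held on this path. */
	return -1;
}
```

A failed call leaves the count untouched, so the caller's own later
chunk_put() remains the only release of its reference.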

> ---
>  net/sctp/ulpevent.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
> index 84207ad..8cb7d98 100644
> --- a/net/sctp/ulpevent.c
> +++ b/net/sctp/ulpevent.c
> @@ -715,7 +715,6 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct 
> sctp_association *asoc,
>   return event;
>
>  fail_mark:
> - sctp_chunk_put(chunk);
>   kfree_skb(skb);
>  fail:
>   return NULL;
> --
> 2.1.0
>

Re: [PATCH net-next] tcp: switch pacing timer to softirq based hrtimer

2018-05-10 Thread Eric Dumazet

On 05/10/2018 12:49 PM, Eric Dumazet wrote:
> linux-4.16 got support for softirq based hrtimers.
> TCP can switch its pacing hrtimer to this variant, since this
> avoids going through a tasklet and some atomic operations.
> 

I need to send a V2, adding a test of hrtimer_cancel() return value
in tcp_clear_xmit_timers() to eventually release the socket reference.

Re: [PATCH v6 4/5] ixgbe: Report PCIe link properties with pcie_print_link_status()

2018-05-10 Thread Jeff Kirsher

On Thu, 2018-05-03 at 15:00 -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> Previously the driver used pcie_get_minimum_link() to warn when the NIC
> is in a slot that can't supply as much bandwidth as the NIC could use.
> 
> pcie_get_minimum_link() can be misleading because it finds the slowest link
> and the narrowest link (which may be different links) without considering
> the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
> 2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
> bandwidth, not the true available bandwidth of about 1969 MB/s for a
> 16 GT/s x1 link.
> 
> Use pcie_print_link_status() to report PCIe link speed and possible
> limitations instead of implementing this in the driver itself.  This finds
> the slowest link in the path to the device by computing the total bandwidth
> of each link and compares that with the capabilities of the device.
> 
> The dmesg change is:
> 
>   - PCI Express bandwidth of %dGT/s available
>   - (Speed:%s, Width: x%d, Encoding Loss:%s)
>   + %u.%03u Gb/s available PCIe bandwidth (%s x%d link)
> 
> or, if the device is capable of better performance than is available in the
> current slot:
> 
>   - This is not sufficient for optimal performance of this card.
>   - For optimal performance, at least %dGT/s of bandwidth is required.
>   - A slot with more lanes and/or higher speed is suggested.
>   + %u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)
> 
> Note that the driver previously used dev_warn() to suggest using a
> different slot, but pcie_print_link_status() uses dev_info() because if the
> platform has no faster slot available, the user can't do anything about the
> warning and may not want to be bothered with it.
> 
> Signed-off-by: Bjorn Helgaas 

Acked-by: Jeff Kirsher 

Since this is a part of a series, I am not planning to pick this up and
push to David Miller in my ixgbe updates.  This should remain in the
series so David can pick up the entire series at once.

> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   47 +
> 
>  1 file changed, 1 insertion(+), 46 deletions(-)



Re: [PATCH v6 5/5] PCI: Remove unused pcie_get_minimum_link()

2018-05-10 Thread Jeff Kirsher

On Thu, 2018-05-10 at 11:33 -0500, Bjorn Helgaas wrote:
> On Thu, May 03, 2018 at 03:00:43PM -0500, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas 
> > 
> > In some cases pcie_get_minimum_link() returned misleading information
> > because it found the slowest link and the narrowest link without
> > considering the total bandwidth of the link.
> > 
> > For example, consider a path with these two links:
> > 
> >    - 16.0 GT/s  x1 link  (16.0 * 10^9 * 128 / 130) *  1 / 8 = 1969 MB/s
> >    -  2.5 GT/s x16 link  ( 2.5 * 10^9 *   8 /  10) * 16 / 8 = 4000 MB/s
> > 
> > The available bandwidth of the path is limited by the 16 GT/s link to about
> > 1969 MB/s, but pcie_get_minimum_link() returned 2.5 GT/s x1, which
> > corresponds to only 250 MB/s.
> > 
> > Callers should use pcie_print_link_status() instead, or
> > pcie_bandwidth_available() if they need more detailed information.
> > 
> > Remove pcie_get_minimum_link() since there are no callers left.
> > 
> > Signed-off-by: Bjorn Helgaas 
> 
> Hi Jeff,
> 
> I got your note that you applied this to dev-queue.  I assume that
> means you also applied the preceding patches that removed all the
> users.  I got a note about ixgbe, but not the others, so I'm just
> double-checking.

I did initially apply it, but realized that I would have to apply the
earlier patches as well, which did not pertain to the Intel wired LAN
drivers.  So I have removed this patch from queue and will only be
testing the ixgbe patch of the series, which Andrew has already tested
and responded to.
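
The bandwidth arithmetic quoted above can be reproduced with a small
user-space C sketch. The names struct encoding, link_mbps() and path_mbps()
are hypothetical and are not the PCI core API; the rate is passed in tenths
of GT/s so that 2.5 GT/s can be expressed with integers.

```c
/* Encoding overhead for a link: payload bits per transferred bits. */
struct encoding {
	unsigned int payload;
	unsigned int total;
};

/*
 * Usable bandwidth in MB/s for a link running at gts_x10 tenths of GT/s
 * with the given lane count: rate * payload / total * width / 8.
 */
static unsigned long link_mbps(unsigned int gts_x10, struct encoding enc,
			       unsigned int width)
{
	unsigned long long bits = (unsigned long long)gts_x10 * 100000000ULL;

	bits = bits * enc.payload / enc.total;	/* strip encoding overhead */
	return (unsigned long)(bits * width / 8 / 1000000);
}

/* The path is limited by whichever link carries less. */
static unsigned long path_mbps(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}
```

Comparing per-link totals picks the 16 GT/s x1 link as the bottleneck,
whereas combining the lowest speed with the narrowest width describes a
2.5 GT/s x1 link that does not exist anywhere in the path.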


Re: [PATCH net-next v2 3/9] net: phy: phylink: Poll link GPIOs

2018-05-10 Thread Florian Fainelli

On 05/10/2018 01:29 PM, Russell King - ARM Linux wrote:
> On Thu, May 10, 2018 at 01:17:31PM -0700, Florian Fainelli wrote:
>> From: Russell King 
>>
>> When using a fixed link with a link GPIO, we need to poll that GPIO to
>> determine link state changes. This is consistent with what fixed_phy.c does.
>>
>> Signed-off-by: Florian Fainelli 
> 
> I'd like this to use the GPIO interrupt where available, only falling back
> to the timer approach when there's no interrupt.  Unfortunately, I don't
> have much time to devote to this at the moment, having recently been away
> on vacation, and now having to work on ARM specific issues for probably
> all of the remainder of this kernel cycle.
> 
> That means I won't have time to test your series on any of the boards
> I have available to me.

No worries, thanks for looking at this. Andrew and I both tested this on
the devel B and C boards where this is primarily useful. Can I still get
your SoB for the portions you authored?

I will follow up with a change that uses GPIO interrupts when they are
available.
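
The interrupt-first, poll-as-fallback policy Russell asks for might look
roughly like the decision below. This is a user-space sketch only:
enum link_watch and choose_link_watch() are hypothetical names, and the
convention that a negative irq means "no interrupt for this GPIO" is an
assumption (in the kernel the number would presumably come from something
like gpiod_to_irq(), which the thread does not spell out).

```c
/* How link state changes on a fixed link would be observed. */
enum link_watch {
	WATCH_NONE,	/* no link GPIO at all */
	WATCH_IRQ,	/* GPIO can raise an interrupt */
	WATCH_POLL,	/* fall back to the periodic timer */
};

/* irq < 0 models "no interrupt available for this GPIO". */
static enum link_watch choose_link_watch(int has_gpio, int irq)
{
	if (!has_gpio)
		return WATCH_NONE;
	return irq >= 0 ? WATCH_IRQ : WATCH_POLL;
}
```

Only the WATCH_POLL case would need the one-second timer from this patch.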

> 
>> ---
>>  drivers/net/phy/phylink.c | 16 
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
>> index 6392b5248cf5..581ce93ecaf9 100644
>> --- a/drivers/net/phy/phylink.c
>> +++ b/drivers/net/phy/phylink.c
>> @@ -19,6 +19,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  
>>  #include "sfp.h"
>> @@ -54,6 +55,7 @@ struct phylink {
>>  /* The link configuration settings */
>>  struct phylink_link_state link_config;
>>  struct gpio_desc *link_gpio;
>> +struct timer_list link_poll;
>>  void (*get_fixed_state)(struct net_device *dev,
>>  struct phylink_link_state *s);
>>  
>> @@ -500,6 +502,15 @@ static void phylink_run_resolve(struct phylink *pl)
>>  queue_work(system_power_efficient_wq, &pl->resolve);
>>  }
>>  
>> +static void phylink_fixed_poll(struct timer_list *t)
>> +{
>> +struct phylink *pl = container_of(t, struct phylink, link_poll);
>> +
>> +mod_timer(t, jiffies + HZ);
>> +
>> +phylink_run_resolve(pl);
>> +}
>> +
>>  static const struct sfp_upstream_ops sfp_phylink_ops;
>>  
>>  static int phylink_register_sfp(struct phylink *pl,
>> @@ -572,6 +583,7 @@ struct phylink *phylink_create(struct net_device *ndev,
>>  pl->link_config.an_enabled = true;
>>  pl->ops = ops;
>>  __set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
>> +timer_setup(&pl->link_poll, phylink_fixed_poll, 0);
>>  
>>  bitmap_fill(pl->supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
>>  linkmode_copy(pl->link_config.advertising, pl->supported);
>> @@ -905,6 +917,8 @@ void phylink_start(struct phylink *pl)
>>  clear_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
>>  phylink_run_resolve(pl);
>>  
>> +if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio))
>> +mod_timer(&pl->link_poll, jiffies + HZ);
>>  if (pl->sfp_bus)
>>  sfp_upstream_start(pl->sfp_bus);
>>  if (pl->phydev)
>> @@ -929,6 +943,8 @@ void phylink_stop(struct phylink *pl)
>>  phy_stop(pl->phydev);
>>  if (pl->sfp_bus)
>>  sfp_upstream_stop(pl->sfp_bus);
>> +if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio))
>> +del_timer_sync(&pl->link_poll);
>>  
>>  set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
>>  queue_work(system_power_efficient_wq, &pl->resolve);
>> -- 
>> 2.14.1
>>
> 


-- 
Florian

Re: [PATCH net-next v2 3/9] net: phy: phylink: Poll link GPIOs

2018-05-10 Thread Russell King - ARM Linux

On Thu, May 10, 2018 at 01:17:31PM -0700, Florian Fainelli wrote:
> From: Russell King 
> 
> When using a fixed link with a link GPIO, we need to poll that GPIO to
> determine link state changes. This is consistent with what fixed_phy.c does.
> 
> Signed-off-by: Florian Fainelli 

I'd like this to use the GPIO interrupt where available, only falling back
to the timer approach when there's no interrupt.  Unfortunately, I don't
have much time to devote to this at the moment, having recently been away
on vacation, and now having to work on ARM specific issues for probably
all of the remainder of this kernel cycle.

That means I won't have time to test your series on any of the boards
I have available to me.

> ---
>  drivers/net/phy/phylink.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 6392b5248cf5..581ce93ecaf9 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "sfp.h"
> @@ -54,6 +55,7 @@ struct phylink {
>   /* The link configuration settings */
>   struct phylink_link_state link_config;
>   struct gpio_desc *link_gpio;
> + struct timer_list link_poll;
>   void (*get_fixed_state)(struct net_device *dev,
>   struct phylink_link_state *s);
>  
> @@ -500,6 +502,15 @@ static void phylink_run_resolve(struct phylink *pl)
>   queue_work(system_power_efficient_wq, &pl->resolve);
>  }
>  
> +static void phylink_fixed_poll(struct timer_list *t)
> +{
> + struct phylink *pl = container_of(t, struct phylink, link_poll);
> +
> + mod_timer(t, jiffies + HZ);
> +
> + phylink_run_resolve(pl);
> +}
> +
>  static const struct sfp_upstream_ops sfp_phylink_ops;
>  
>  static int phylink_register_sfp(struct phylink *pl,
> @@ -572,6 +583,7 @@ struct phylink *phylink_create(struct net_device *ndev,
>   pl->link_config.an_enabled = true;
>   pl->ops = ops;
>   __set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
> + timer_setup(&pl->link_poll, phylink_fixed_poll, 0);
>  
>   bitmap_fill(pl->supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
>   linkmode_copy(pl->link_config.advertising, pl->supported);
> @@ -905,6 +917,8 @@ void phylink_start(struct phylink *pl)
>   clear_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
>   phylink_run_resolve(pl);
>  
> + if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio))
> + mod_timer(&pl->link_poll, jiffies + HZ);
>   if (pl->sfp_bus)
>   sfp_upstream_start(pl->sfp_bus);
>   if (pl->phydev)
> @@ -929,6 +943,8 @@ void phylink_stop(struct phylink *pl)
>   phy_stop(pl->phydev);
>   if (pl->sfp_bus)
>   sfp_upstream_stop(pl->sfp_bus);
> + if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio))
> + del_timer_sync(&pl->link_poll);
>  
>   set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
>   queue_work(system_power_efficient_wq, &pl->resolve);
> -- 
> 2.14.1
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

Re: [PATCH net] macmace: Set platform device coherent_dma_mask

2018-05-10 Thread Michael Schmitz

Hi Finn,

On Thu, May 10, 2018 at 1:25 PM, Finn Thain  wrote:
> On Thu, 3 May 2018, Geert Uytterhoeven wrote:
>
>>
>> Perhaps you can add a new helper
>> (platform_device_register_simple_dma()?) that takes the DMA mask, too?
[...]
> To actually hoist the dma mask setup out of existing platform drivers
> would have implications for every device that matches with those drivers.
>
> That's a bit risky since I can't test those devices -- that's assuming I
> could identify them all; sometimes platform device matching is not well
> defined at build time (see loongson_sysconf.ecname).
>
> So far, it looks like macmace and macsonic would be the only callers of
> this new API call.
>
> What's worse, if you do pass a dma_mask in struct platform_device_info,
> you end up with this problem in platform_device_register_full():
>
> if (pdevinfo->dma_mask) {
> /*
>  * This memory isn't freed when the device is put,
>  * I don't have a nice idea for that though.  Conceptually
>  * dma_mask in struct device should not be a pointer.
>  * See http://thread.gmane.org/gmane.linux.kernel.pci/9081
>  */
> pdev->dev.dma_mask =
> kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);

Maybe platform_device_register_full() should rather check whether
dev.coherent_dma_mask is set, and make dev.dma_mask point to that?
This is how we solved the warning issue for the Zorro bus devices...
(8614f1b58bd0e920a5859464a500b93152c5f8b1)

Not sure what the ramifications of that change would be in the general
case (i.e. platforms where coherent and non-coherent DMA operations
must use different masks). I'd hope all those platforms explicitly set
up their DMA masks anyway.
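
A minimal sketch of that fallback, written as a userspace mock rather than
kernel code. struct mock_device and mock_default_dma_mask() are hypothetical
names; only the two field names mirror struct device. It assumes a coherent
mask of zero means "not set".

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of the two DMA mask fields of the kernel's struct device. */
struct mock_device {
	uint64_t *dma_mask;		/* pointer, NULL when not set up */
	uint64_t coherent_dma_mask;	/* value, 0 when not set up */
};

/*
 * If no streaming mask has been set up but a coherent mask has,
 * make dma_mask point at the coherent mask. An existing dma_mask
 * is left alone. Returns 0 on success, -1 if neither mask is set.
 */
static int mock_default_dma_mask(struct mock_device *dev)
{
	if (dev->dma_mask)
		return 0;
	if (!dev->coherent_dma_mask)
		return -1;
	dev->dma_mask = &dev->coherent_dma_mask;
	return 0;
}
```

Pointing at a field inside the device also avoids the separate kmalloc()
whose lifetime the quoted comment complains about.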

Your other comment regarding the default used by dma_get_mask() is spot on.

> Most of the platform drivers that call dma_coerce_mask_and_coherent() are
> using pdev->of_match_table, not platform_device_register_simple(). Many of
> them have a comment like this:
>
> /*
>  * Right now device-tree probed devices don't get dma_mask set.
>  * Since shared usb code relies on it, set it here for now.
>  * Once we have dma capability bindings this can go away.
>  */
>
>> With people setting the mask to kill the WARNING splat, this may become
>> more common.
>
> Since the commit which introduced the WARNING, only commits f61e64310b75
> ("m68k: set dma and coherent masks for platform FEC ethernets") and
> 7bcfab202ca7 ("powerpc/macio: set a proper dma_coherent_mask") seem to be
> aimed at squelching that WARNING.
>
> (Am I missing any others?)

Zorro devices :-) Which begs the question: why can't you set up all
Nubus bus devices' DMA masks in nubus_device_register(), or
nubus_add_board()?

> So far, this is not looking like a common problem, and I'm having trouble
> finding some way to improve on my original patches.

Putting this in the core platform device code might have too many
unintended side effects. Platform specific bus drivers or device
drivers might be a safer place to put this. Makes it harder for
Christoph to find all instances of such workarounds though.

Cheers,

  Michael

[PATCH net-next v2 5/9] net: dsa: bcm_sf2: Implement phylink_mac_ops

2018-05-10 Thread Florian Fainelli

Make the bcm_sf2 driver implement phylink_mac_ops since it needs to
support a wide variety of network interfaces: internal & external MDIO
PHYs, fixed PHYs, MoCA with MMIO link status.

A large amount of what needs to be done already exists under
bcm_sf2_sw_adjust_link() so we are essentially breaking this down into
the necessary operations for PHYLINK to work: mac_config, mac_link_up,
mac_link_down and validate. We can now entirely get rid of most of what
fixed_link_update() provided because only the link information is actually
necessary. We still have to force DUPLEX_FULL for legacy Device Tree bindings
that did not specify that before.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 214 --
 1 file changed, 206 insertions(+), 8 deletions(-)
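
As an illustration only, the masking done by the validate operation in the
diff below can be modelled in userspace as follows. The MOCK_* names and
mock_validate() are hypothetical stand-ins, not the driver's or PHYLINK's
identifiers, and only the speed/duplex bits are modelled.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PHY interface types. */
enum mock_iface {
	MOCK_IFACE_MII,
	MOCK_IFACE_REVMII,
	MOCK_IFACE_GMII,
	MOCK_IFACE_RGMII,
};

/* Hypothetical stand-ins for the ethtool link-mode bits. */
#define MOCK_10_HALF	(1u << 0)
#define MOCK_10_FULL	(1u << 1)
#define MOCK_100_HALF	(1u << 2)
#define MOCK_100_FULL	(1u << 3)
#define MOCK_1000_HALF	(1u << 4)
#define MOCK_1000_FULL	(1u << 5)

/*
 * Build the set of modes the MAC can do on this interface, then
 * intersect it with what the caller passed in as supported.
 */
static uint32_t mock_validate(enum mock_iface iface, uint32_t supported)
{
	uint32_t mask = MOCK_10_HALF | MOCK_10_FULL |
			MOCK_100_HALF | MOCK_100_FULL;

	/* Everything except MII and Reverse MII can also do Gigabit. */
	if (iface != MOCK_IFACE_MII && iface != MOCK_IFACE_REVMII)
		mask |= MOCK_1000_HALF | MOCK_1000_FULL;

	return supported & mask;
}
```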

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 97236cfcbae4..a20608b0329e 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -306,7 +307,8 @@ static int bcm_sf2_sw_mdio_write(struct mii_bus *bus, int addr, int regnum,
 
 static irqreturn_t bcm_sf2_switch_0_isr(int irq, void *dev_id)
 {
-   struct bcm_sf2_priv *priv = dev_id;
+   struct dsa_switch *ds = dev_id;
+   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
 
priv->irq0_stat = intrl2_0_readl(priv, INTRL2_CPU_STATUS) &
~priv->irq0_mask;
@@ -317,16 +319,21 @@ static irqreturn_t bcm_sf2_switch_0_isr(int irq, void *dev_id)
 
 static irqreturn_t bcm_sf2_switch_1_isr(int irq, void *dev_id)
 {
-   struct bcm_sf2_priv *priv = dev_id;
+   struct dsa_switch *ds = dev_id;
+   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
 
priv->irq1_stat = intrl2_1_readl(priv, INTRL2_CPU_STATUS) &
~priv->irq1_mask;
intrl2_1_writel(priv, priv->irq1_stat, INTRL2_CPU_CLEAR);
 
-   if (priv->irq1_stat & P_LINK_UP_IRQ(P7_IRQ_OFF))
-   priv->port_sts[7].link = 1;
-   if (priv->irq1_stat & P_LINK_DOWN_IRQ(P7_IRQ_OFF))
-   priv->port_sts[7].link = 0;
+   if (priv->irq1_stat & P_LINK_UP_IRQ(P7_IRQ_OFF)) {
+   priv->port_sts[7].link = true;
+   dsa_port_phylink_mac_change(ds, 7, true);
+   }
+   if (priv->irq1_stat & P_LINK_DOWN_IRQ(P7_IRQ_OFF)) {
+   priv->port_sts[7].link = false;
+   dsa_port_phylink_mac_change(ds, 7, false);
+   }
 
return IRQ_HANDLED;
 }
@@ -620,6 +627,192 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
status->pause = 1;
 }
 
+static void bcm_sf2_sw_validate(struct dsa_switch *ds, int port,
+   unsigned long *supported,
+   struct phylink_link_state *state)
+{
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+
+   if (!phy_interface_mode_is_rgmii(state->interface) &&
+   state->interface != PHY_INTERFACE_MODE_MII &&
+   state->interface != PHY_INTERFACE_MODE_REVMII &&
+   state->interface != PHY_INTERFACE_MODE_GMII &&
+   state->interface != PHY_INTERFACE_MODE_INTERNAL &&
+   state->interface != PHY_INTERFACE_MODE_MOCA) {
+   bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
+   dev_err(ds->dev,
+   "Unsupported interface: %d\n", state->interface);
+   return;
+   }
+
+   /* Allow all the expected bits */
+   phylink_set(mask, Autoneg);
+   phylink_set_port_modes(mask);
+   phylink_set(mask, Pause);
+   phylink_set(mask, Asym_Pause);
+
+   /* With the exclusion of MII and Reverse MII, we support Gigabit,
+* including Half duplex
+*/
+   if (state->interface != PHY_INTERFACE_MODE_MII &&
+   state->interface != PHY_INTERFACE_MODE_REVMII) {
+   phylink_set(mask, 1000baseT_Full);
+   phylink_set(mask, 1000baseT_Half);
+   }
+
+   phylink_set(mask, 10baseT_Half);
+   phylink_set(mask, 10baseT_Full);
+   phylink_set(mask, 100baseT_Half);
+   phylink_set(mask, 100baseT_Full);
+
+   bitmap_and(supported, supported, mask,
+  __ETHTOOL_LINK_MODE_MASK_NBITS);
+   bitmap_and(state->advertising, state->advertising, mask,
+  __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static void bcm_sf2_sw_mac_config(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state)
+{
+   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+   u32 id_mode_dis = 0, port_mode;
+   u32 reg, offset;
+
+   if (priv->type == BCM7445_DEVICE_ID)
+   offset = CORE_STS_OVERRIDE_GMIIP_PORT(port);
+   else
+   offset =

[PATCH net-next v2 4/9] net: dsa: Add PHYLINK switch operations

2018-05-10 Thread Florian Fainelli

In preparation for adding support for PHYLINK within DSA, define a number of
operations that we will need and that switch drivers can start implementing.
Proper integration with PHYLINK will follow in subsequent patches.

We start selecting PHYLINK (which implies PHYLIB) in net/dsa/Kconfig
such that drivers can be guaranteed that this dependency is properly
taken care of and can start referencing PHYLINK helper functions without
requiring stubs or anything.

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h | 24 
 net/dsa/Kconfig   |  2 +-
 net/dsa/slave.c   |  5 +
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 462e9741b210..ed64c1f3f117 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -20,12 +20,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 struct tc_action;
 struct phy_device;
 struct fixed_phy_status;
+struct phylink_link_state;
 
 enum dsa_tag_protocol {
DSA_TAG_PROTO_NONE = 0,
@@ -353,6 +355,27 @@ struct dsa_switch_ops {
void(*fixed_link_update)(struct dsa_switch *ds, int port,
struct fixed_phy_status *st);
 
+   /*
+* PHYLINK integration
+*/
+   void(*phylink_validate)(struct dsa_switch *ds, int port,
+   unsigned long *supported,
+   struct phylink_link_state *state);
+   int (*phylink_mac_link_state)(struct dsa_switch *ds, int port,
+ struct phylink_link_state *state);
+   void(*phylink_mac_config)(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state);
+   void(*phylink_mac_an_restart)(struct dsa_switch *ds, int port);
+   void(*phylink_mac_link_down)(struct dsa_switch *ds, int port,
+unsigned int mode,
+phy_interface_t interface);
+   void(*phylink_mac_link_up)(struct dsa_switch *ds, int port,
+  unsigned int mode,
+  phy_interface_t interface,
+  struct phy_device *phydev);
+   void(*phylink_fixed_state)(struct dsa_switch *ds, int port,
+  struct phylink_link_state *state);
/*
 * ethtool hardware statistics.
 */
@@ -595,5 +618,6 @@ static inline int call_dsa_notifiers(unsigned long val, struct net_device *dev,
 int dsa_port_get_phy_strings(struct dsa_port *dp, uint8_t *data);
 int dsa_port_get_ethtool_phy_stats(struct dsa_port *dp, uint64_t *data);
 int dsa_port_get_phy_sset_count(struct dsa_port *dp);
+void dsa_port_phylink_mac_change(struct dsa_switch *ds, int port, bool up);
 
 #endif
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index bbf2c82cf7b2..4183e4ba27a5 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -9,7 +9,7 @@ config NET_DSA
depends on HAVE_NET_DSA && MAY_USE_DEVLINK
depends on BRIDGE || BRIDGE=n
select NET_SWITCHDEV
-   select PHYLIB
+   select PHYLINK
---help---
  Say Y if you want to enable support for the hardware switches supported
  by the Distributed Switch Architecture.
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 746ab428a17a..6c2f042e3c29 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1119,6 +1119,11 @@ static int dsa_slave_phy_connect(struct net_device *slave_dev, int addr)
  dsa_slave_adjust_link, p->phy_interface);
 }
 
+void dsa_port_phylink_mac_change(struct dsa_switch *ds, int port, bool up)
+{
+}
+EXPORT_SYMBOL_GPL(dsa_port_phylink_mac_change);
+
 static int dsa_slave_phy_setup(struct net_device *slave_dev)
 {
struct dsa_port *dp = dsa_slave_to_port(slave_dev);
-- 
2.14.1

[PATCH net-next v2 8/9] net: dsa: Plug in PHYLINK support

2018-05-10 Thread Florian Fainelli

Add support for PHYLINK within the DSA subsystem in order to support more
complex devices such as pluggable (SFP) and non-pluggable (SFF) modules, 10G
PHYs, and traditional PHYs. Using PHYLINK allows us to drop some amount of
complexity we had while probing fixed and non-fixed PHYs using Device Tree.

Because PHYLINK separates the Ethernet MAC/port configuration into different
stages, we let switch drivers implement those, and for now, we maintain
functionality by calling dsa_slave_adjust_link() during
phylink_mac_link_{up,down} which provides semantically equivalent steps.

Drivers willing to take advantage of PHYLINK should implement the phylink_mac_*
operations that DSA wraps.

We cannot quite remove the adjust_link() callback just yet, because a number of
drivers rely on that for configuring their "CPU" and "DSA" ports; this is still
done in dsa_port_setup_phy_of() and dsa_port_fixed_link_register_of().

Drivers that utilize fixed links for user-facing ports (e.g: bcm_sf2) will need
to implement phylink_mac_ops from now on to preserve functionality, since
PHYLINK *does not* create a phy_device instance for fixed links.

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h  |   1 +
 net/dsa/dsa_priv.h |   9 --
 net/dsa/slave.c| 294 +++--
 3 files changed, 172 insertions(+), 132 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index ed64c1f3f117..fdbd6082945d 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -201,6 +201,7 @@ struct dsa_port {
u8  stp_state;
struct net_device   *bridge_dev;
struct devlink_port devlink_port;
+   struct phylink  *pl;
/*
 * Original copy of the master netdev ethtool_ops
 */
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 053731473c99..3964c6f7a7c0 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -75,15 +75,6 @@ struct dsa_slave_priv {
/* DSA port data, such as switch, port index, etc. */
struct dsa_port *dp;
 
-   /*
-* The phylib phy_device pointer for the PHY connected
-* to this port.
-*/
-   phy_interface_t phy_interface;
-   int old_link;
-   int old_pause;
-   int old_duplex;
-
 #ifdef CONFIG_NET_POLL_CONTROLLER
struct netpoll  *netpoll;
 #endif
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 729f18d23bdd..1e3b6a6d8a40 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -97,8 +98,7 @@ static int dsa_slave_open(struct net_device *dev)
if (err)
goto clear_promisc;
 
-   if (dev->phydev)
-   phy_start(dev->phydev);
+   phylink_start(dp->pl);
 
return 0;
 
@@ -120,8 +120,7 @@ static int dsa_slave_close(struct net_device *dev)
struct net_device *master = dsa_slave_to_master(dev);
struct dsa_port *dp = dsa_slave_to_port(dev);
 
-   if (dev->phydev)
-   phy_stop(dev->phydev);
+   phylink_stop(dp->pl);
 
dsa_port_disable(dp, dev->phydev);
 
@@ -272,10 +271,7 @@ static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
break;
}
 
-   if (!dev->phydev)
-   return -ENODEV;
-
-   return phy_mii_ioctl(dev->phydev, ifr, cmd);
+   return phylink_mii_ioctl(p->dp->pl, ifr, cmd);
 }
 
 static int dsa_slave_port_attr_set(struct net_device *dev,
@@ -498,6 +494,13 @@ dsa_slave_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p)
ds->ops->get_regs(ds, dp->index, regs, _p);
 }
 
+static int dsa_slave_nway_reset(struct net_device *dev)
+{
+   struct dsa_port *dp = dsa_slave_to_port(dev);
+
+   return phylink_ethtool_nway_reset(dp->pl);
+}
+
 static int dsa_slave_get_eeprom_len(struct net_device *dev)
 {
struct dsa_port *dp = dsa_slave_to_port(dev);
@@ -609,6 +612,8 @@ static void dsa_slave_get_wol(struct net_device *dev, struct ethtool_wolinfo *w)
struct dsa_port *dp = dsa_slave_to_port(dev);
struct dsa_switch *ds = dp->ds;
 
+   phylink_ethtool_get_wol(dp->pl, w);
+
if (ds->ops->get_wol)
ds->ops->get_wol(ds, dp->index, w);
 }
@@ -619,6 +624,8 @@ static int dsa_slave_set_wol(struct net_device *dev, struct ethtool_wolinfo *w)
struct dsa_switch *ds = dp->ds;
int ret = -EOPNOTSUPP;
 
+   phylink_ethtool_set_wol(dp->pl, w);
+
if (ds->ops->set_wol)
ret = ds->ops->set_wol(ds, dp->index, w);
 
@@ -642,13 +649,7 @@ static int dsa_slave_set_eee(struct net_device *dev, struct ethtool_eee *e)
if (ret)
return ret;
 
-   if (e->eee_enabled) {
-   ret = phy_init_eee(dev->phydev, 0);
-   if (ret)
-

[PATCH net-next v2 2/9] net: phy: phylink: Release link GPIO

2018-05-10 Thread Florian Fainelli

We are not releasing the link GPIO descriptor with gpiod_put(), which causes
subsequent probing to get -EBUSY when calling fwnode_get_named_gpiod(). Fix this
by doing the release in phylink_destroy().

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phylink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 412d1cf4fa66..6392b5248cf5 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -612,6 +612,8 @@ void phylink_destroy(struct phylink *pl)
 {
if (pl->sfp_bus)
sfp_unregister_upstream(pl->sfp_bus);
+   if (!IS_ERR(pl->link_gpio))
+   gpiod_put(pl->link_gpio);
 
cancel_work_sync(&pl->resolve);
kfree(pl);
-- 
2.14.1
