Re: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks

2018-05-16 Thread Tariq Toukan



On 15/05/2018 9:53 PM, Qing Huang wrote:



On 5/15/2018 2:19 AM, Tariq Toukan wrote:



On 14/05/2018 7:41 PM, Qing Huang wrote:



On 5/13/2018 2:00 AM, Tariq Toukan wrote:



On 11/05/2018 10:23 PM, Qing Huang wrote:

When a system is under memory pressure (high usage with fragments),
the original 256KB ICM chunk allocations will likely trigger kernel
memory management to enter slow path doing memory compact/migration
ops in order to complete high order memory allocations.

When that happens, user processes calling uverb APIs may get stuck
for more than 120s easily even though there are a lot of free pages
in smaller chunks available in the system.

Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
oracle_205573_e:205573 blocked for more than 120 seconds.
...

With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.

However in order to support smaller ICM chunk size, we need to fix
another issue in large size kcalloc allocations.

E.g.
Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
entry). So we need a 16MB allocation for a table->icm pointer array to
hold 2M pointers which can easily cause kcalloc to fail.
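
The sizing in the example above can be checked with a small userspace sketch. The helper names are hypothetical and not part of the driver; the rounding expression follows the (nobj + obj_per_chunk - 1) / obj_per_chunk form used in mlx4_init_icm_table, and 8-byte pointers are assumed.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the mlx4 driver. */

/* Number of ICM chunks needed to hold nobj objects of obj_size bytes
 * when each chunk is chunk_size bytes, rounded up. */
uint64_t icm_num_chunks(uint64_t nobj, uint64_t obj_size, uint64_t chunk_size)
{
	uint64_t obj_per_chunk = chunk_size / obj_size;

	return (nobj + obj_per_chunk - 1) / obj_per_chunk;
}

/* Bytes needed for a pointer array with one ptr_size-byte pointer per chunk. */
uint64_t icm_ptr_array_bytes(uint64_t num_chunks, uint64_t ptr_size)
{
	return num_chunks * ptr_size;
}
```

With nobj = 1 << 30, obj_size = 8 and chunk_size = 4096, the first helper gives 2M chunks and the second gives 16MB, matching the figures in the description.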

The solution is to use vzalloc to replace kcalloc. There is no need
for contiguous memory pages for a driver meta data structure (no need
of DMA ops).

Signed-off-by: Qing Huang 
Acked-by: Daniel Jurgens 
Reviewed-by: Zhu Yanjun 
---
v1 -> v2: adjusted chunk size to reflect different architectures.

  drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index a822f7a..ccb62b8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,12 @@
  #include "fw.h"
    /*
- * We allocate in as big chunks as we can, up to a maximum of 256 KB
- * per chunk.
+ * We allocate in page size (default 4KB on many archs) chunks to avoid high
+ * order memory allocations in fragmented/high usage memory situation.
   */
  enum {
-    MLX4_ICM_ALLOC_SIZE    = 1 << 18,
-    MLX4_TABLE_CHUNK_SIZE    = 1 << 18
+    MLX4_ICM_ALLOC_SIZE    = 1 << PAGE_SHIFT,
+    MLX4_TABLE_CHUNK_SIZE    = 1 << PAGE_SHIFT


Which is actually PAGE_SIZE.


Yes, we wanted to avoid high order memory allocations.



Then please use PAGE_SIZE instead.


PAGE_SIZE is usually defined as 1 << PAGE_SHIFT. So I think PAGE_SHIFT 
is actually more appropriate here.




Definition of PAGE_SIZE varies among different archs.
It is not always as simple as 1 << PAGE_SHIFT.
It might be:
PAGE_SIZE (1UL << PAGE_SHIFT)
PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
etc...

Please replace 1 << PAGE_SHIFT with PAGE_SIZE.
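
The type difference behind this request can be seen in a userspace sketch. DEMO_PAGE_SHIFT and DEMO_PAGE_SIZE are stand-ins invented for illustration (the real macros are arch-specific), with the shift assumed to be 12. Note that inside an enum the constant still has to fit an int, so the point is mainly about using the canonical macro rather than re-deriving it.

```c
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12                       /* assumed value; arch-dependent in the kernel */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT) /* mirrors the 1UL form quoted above */

/* Width in bytes of the plain-int spelling of "one page". */
size_t width_of_int_form(void)
{
	return sizeof(1 << DEMO_PAGE_SHIFT);
}

/* Width in bytes of the unsigned long spelling. */
size_t width_of_long_form(void)
{
	return sizeof(DEMO_PAGE_SIZE);
}

/* n pages in bytes, computed in unsigned long arithmetic. */
unsigned long pages_to_bytes(unsigned long n)
{
	return n * DEMO_PAGE_SIZE;
}
```

The first helper returns sizeof(int) and the second sizeof(unsigned long), so expressions built on the two spellings are evaluated in different types.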






Also, please add a comma at the end of the last entry.


Hmm..., followed the existing code style and checkpatch.pl didn't 
complain about the comma.




I am in favor of having a comma also after the last element, so that 
when another enum element is added we do not modify this line again, 
which would falsely affect git blame.


I know it didn't exist before your patch, but once we're here, let's 
do it.


I'm okay either way. If adding an extra comma is preferred by many 
people, someone should update checkpatch.pl to enforce it. :)



I agree.
Until then, please use an extra comma in this patch.






  };
    static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
@@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
  obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
  num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
-    table->icm  = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
+    table->icm  = vzalloc(num_icm * sizeof(*table->icm));


Why not kvzalloc ?


I think table->icm pointer array doesn't really need physically 
contiguous memory. Sometimes high order
memory allocation by kmalloc variants may trigger slow path and cause 
tasks to be blocked.




This is control path so it is less latency-sensitive.
Let's not produce unnecessary degradation here, please call kvzalloc 
so we maintain a similar behavior when contiguous memory is available, 
and a fallback for resiliency.


Not sure exactly what degradation vzalloc causes here. I think it's
better to keep physically contiguous pages for other requests that
really need them. Besides, slow path/memory compaction can be really
expensive.




Degradation is expected when you replace a contig memory with non-contig 
memory, without any perf test.
We agree that when contig memory is not available, we should use 
non-contig instead of simply failing, and for this you can call kvzalloc.
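
The behavior being asked for (try a contiguous allocation first, fall back to a non-contiguous one instead of failing) can be modeled in userspace. Everything below is a hypothetical stand-in, not a kernel API: the "contiguous" allocator simply refuses requests above a caller-supplied limit to imitate a high-order allocation that cannot be satisfied.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are NOT kernel APIs. */
enum alloc_kind { ALLOC_NONE, ALLOC_CONTIG, ALLOC_FALLBACK };

struct demo_buf {
	void *ptr;
	enum alloc_kind kind;
};

/* Pretend contiguous allocator: fails for requests above contig_limit. */
void *demo_alloc_contig(size_t size, size_t contig_limit)
{
	return size <= contig_limit ? calloc(1, size) : NULL;
}

/* Pretend non-contiguous allocator: succeeds whenever the heap allows. */
void *demo_alloc_noncontig(size_t size)
{
	return calloc(1, size);
}

/* Try the contiguous path first, then fall back; memory is zeroed. */
struct demo_buf demo_kvzalloc(size_t size, size_t contig_limit)
{
	struct demo_buf b = { demo_alloc_contig(size, contig_limit), ALLOC_CONTIG };

	if (!b.ptr) {
		b.ptr = demo_alloc_noncontig(size);
		b.kind = b.ptr ? ALLOC_FALLBACK : ALLOC_NONE;
	}
	return b;
}

/* Release a buffer obtained from demo_kvzalloc. */
void demo_kvfree(struct demo_buf *b)
{
	free(b->ptr);
	b->ptr = NULL;
	b->kind = ALLOC_NONE;
}
```

In this model a small request keeps the contiguous path, while a 16MB request with a 1MB limit takes the fallback rather than returning NULL.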





Thanks,
Qing




  if (!table->icm)
  return -ENOMEM;
  table->virt = virt;
@@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev 

Re: linux-next: BUG: KASAN: use-after-free in tun_chr_close

2018-05-16 Thread Andrei Vagin
Hi Jason,

I think the problem is in "tun: hold a tun socket during ptr_ring_cleanup".

Pls take a look at the attached patch.


On Tue, May 15, 2018 at 11:28:25PM -0700, Andrei Vagin wrote:
> We run CRIU tests on linux-next regularly and today we caught this bug:
> 
> https://travis-ci.org/avagin/linux/jobs/379450631
> 
> [   50.264837] 
> ==
> [   50.264986] BUG: KASAN: use-after-free in 
> __lock_acquire.isra.30+0x1ad4/0x1bb0
> [   50.265088] Read of size 8 at addr 88018e1728f8 by task criu/1819
> [   50.265167] 
> [   50.265249] CPU: 0 PID: 1819 Comm: criu Not tainted 
> 4.17.0-rc5-next-20180515+ #1
> [   50.265251] Hardware name: Google Google Compute Engine/Google Compute 
> Engine, BIOS Google 01/01/2011
> [   50.265252] Call Trace:
> [   50.265262]  dump_stack+0x71/0xab
> [   50.265265]  ? __lock_acquire.isra.30+0x1ad4/0x1bb0
> [   50.265271]  print_address_description+0x6a/0x270
> [   50.265273]  ? __lock_acquire.isra.30+0x1ad4/0x1bb0
> [   50.265275]  kasan_report+0x237/0x360
> [   50.265278]  __lock_acquire.isra.30+0x1ad4/0x1bb0
> [   50.265285]  ? register_netdev+0x30/0x30
> [   50.265288]  lock_acquire+0x10b/0x2a0
> [   50.265294]  ? tun_chr_close+0x1d7/0x4c0
> [   50.265298]  ? kfree+0xd6/0x1f0
> [   50.265303]  _raw_spin_lock+0x25/0x30
> [   50.265306]  ? tun_chr_close+0x1d7/0x4c0
> [   50.265308]  tun_chr_close+0x1d7/0x4c0
> [   50.265313]  ? fcntl_setlk+0xaf0/0xaf0
> [   50.265320]  __fput+0x251/0x770
> [   50.265324]  task_work_run+0x10e/0x180
> [   50.265330]  exit_to_usermode_loop+0xcb/0xf0
> [   50.265332]  do_syscall_64+0x21d/0x280
> [   50.265335]  ? prepare_exit_to_usermode+0x88/0x130
> [   50.265338]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   50.265342] RIP: 0033:0x1494fa6f93f0
> [   50.265342] Code: 73 01 c3 48 8b 0d b8 9b 20 00 f7 d8 64 89 01 48 83 c8 ff 
> c3 66 0f 1f 44 00 00 83 3d 59 e0 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 
> 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 0e fa ff ff 48 89 04 24 
> [   50.265388] RSP: 002b:7ffd229fe7f8 EFLAGS: 0246 ORIG_RAX: 
> 0003
> [   50.265391] RAX:  RBX: 0004 RCX: 
> 1494fa6f93f0
> [   50.265393] RDX: 7ffd229fe80c RSI: 400454da RDI: 
> 0004
> [   50.265395] RBP:  R08: 420b R09: 
> 
> [   50.265396] R10:  R11: 0246 R12: 
> 1494fab116a0
> [   50.265398] R13: 0d06 R14:  R15: 
> 
> [   50.265400] 
> [   50.265476] Allocated by task 1819:
> [   50.265554]  kasan_kmalloc+0xa0/0xd0
> [   50.265556]  __kmalloc+0x13a/0x250
> [   50.265561]  sk_prot_alloc+0xd3/0x250
> [   50.265564]  sk_alloc+0x35/0x9d0
> [   50.265566]  tun_chr_open+0x7b/0x5a0
> [   50.265570]  misc_open+0x313/0x480
> [   50.265573]  chrdev_open+0x1d6/0x4b0
> [   50.265575]  do_dentry_open+0x6ae/0xee0
> [   50.265578]  path_openat+0xce6/0x2890
> [   50.265580]  do_filp_open+0x17a/0x270
> [   50.265582]  do_sys_open+0x203/0x340
> [   50.265584]  do_syscall_64+0xa0/0x280
> [   50.265586]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   50.265587] 
> [   50.265667] Freed by task 1819:
> [   50.265745]  __kasan_slab_free+0x130/0x180
> [   50.265747]  kfree+0xd6/0x1f0
> [   50.265750]  __sk_destruct+0x46f/0x580
> [   50.265752]  tun_chr_close+0x330/0x4c0
> [   50.265754]  __fput+0x251/0x770
> [   50.265756]  task_work_run+0x10e/0x180
> [   50.265758]  exit_to_usermode_loop+0xcb/0xf0
> [   50.265760]  do_syscall_64+0x21d/0x280
> [   50.265762]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   50.265762] 
> [   50.265840] The buggy address belongs to the object at 88018e172200
> [   50.265840]  which belongs to the cache kmalloc-2048 of size 2048
> [   50.265927] The buggy address is located 1784 bytes inside of
> [   50.265927]  2048-byte region [88018e172200, 88018e172a00)
> [   50.266011] The buggy address belongs to the page:
> [   50.266089] page:ea0006385c00 count:1 mapcount:0 
> mapping: index:0x0 compound_mapcount: 0
> [   50.266178] flags: 0x17fff808100(slab|head)
> [   50.266257] raw: 017fff808100   
> 0001800f000f
> [   50.266342] raw: dead0100 dead0200 8801d9016800 
> 
> [   50.266425] page dumped because: kasan: bad access detected
> [   50.266501] 
> [   50.266590] Memory state around the buggy address:
> [   50.266693]  88018e172780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb
> [   50.266776]  88018e172800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb
> [   50.266860] >88018e172880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb
> [   50.266943]
>  ^
> [   50.267020]  88018e172900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb
> [   50.267103]  88018e172980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb 

Re: [PATCH 08/14] net: sched: account for temporary action reference

2018-05-16 Thread Jiri Pirko
Mon, May 14, 2018 at 04:27:09PM CEST, vla...@mellanox.com wrote:
>tca_get_fill function has 'bind' and 'ref' arguments that get passed
>down to action dump function. These arguments values are subtracted from
>actual reference and bind counter values before writing them to skb.
>
>In order to prevent concurrent action delete, RTM_GETACTION handler
>acquires a reference to action before 'dumping' it and releases it
>afterwards. This reference is temporary and should not be accounted by
>userspace clients. (both logically and to preserve current API
>behavior)
>
>Use existing infrastructure of tca_get_fill arguments to subtract that
>temporary reference and not expose it to userspace.
>
>Signed-off-by: Vlad Buslov 
>---
> net/sched/act_api.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>index 3f02cd1..2772276e 100644
>--- a/net/sched/act_api.c
>+++ b/net/sched/act_api.c
>@@ -935,7 +935,7 @@ tcf_get_notify(struct net *net, u32 portid, struct 
>nlmsghdr *n,
>   if (!skb)
>   return -ENOBUFS;
>   if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, event,
>-   0, 0) <= 0) {
>+   0, 1) <= 0) {
>   NL_SET_ERR_MSG(extack, "Failed to fill netlink attributes while 
> adding TC action");
>   kfree_skb(skb);
>   return -EINVAL;
>@@ -1125,7 +1125,7 @@ tcf_del_notify(struct net *net, struct nlmsghdr *n, 
>struct list_head *actions,
>   return -ENOBUFS;
> 
>   if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, RTM_DELACTION,
>-   0, 1) <= 0) {
>+   0, 2) <= 0) {

So now you are adjusting dump because of a change in a different patch
right? This also breaks bisect.
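
The subtraction described in the quoted patch description can be sketched as follows. The struct and function names are hypothetical and only model the idea that the 'bind' and 'ref' arguments are subtracted from the real counters before they are reported; they do not reproduce the actual tca_get_fill code.

```c
/* Hypothetical model of the counters; names are illustrative only. */
struct demo_action {
	int refcnt;   /* actual reference count held in the kernel */
	int bindcnt;  /* actual bind count */
};

struct demo_dump {
	int refcnt;   /* value reported to userspace */
	int bindcnt;  /* value reported to userspace */
};

/* Report counters with 'ref' and 'bind' subtracted, so that references
 * held only for the duration of the dump are not exposed. */
struct demo_dump demo_fill(const struct demo_action *a, int bind, int ref)
{
	struct demo_dump d = { a->refcnt - ref, a->bindcnt - bind };

	return d;
}
```

For an action with one long-lived reference plus one temporary reference taken by the handler, passing ref = 1 reports a count of 1, which is what userspace saw before the temporary reference existed.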


Re: [PATCH 00/14] Modify action API for implementing lockless actions

2018-05-16 Thread Vlad Buslov

On Tue 15 May 2018 at 22:07, Lucas Bates  wrote:
> On Tue, May 15, 2018 at 6:03 PM, Lucas Bates  wrote:
>> On Tue, May 15, 2018 at 5:49 PM, Jamal Hadi Salim  wrote:
 Test 7d50: Add skbmod action to set destination mac
 exit: 255 0
 dst MAC address <11:22:33:44:55:66>
 RTNETLINK answers: No such file or directory
 We have an error talking to the kernel

>>>
>>> You may actually have broken something with your patches in this case.
>>>
>>> Lucas - does this test pass on latest net-next?
>>
>> Yes, 7d50 has been passing on our builds for at least the last month.
>
> Also, Vlad, you can look at the JSON to see the test case data, or run
> tdc.py -s | less and search for the ID to see the commands being run.
> I'm here if you need help using tdc.

Hello Lucas,

I'll look into the JSON test definition and try to understand what's wrong.


Re: KMSAN: uninit-value in __sctp_v6_cmp_addr

2018-05-16 Thread Xin Long
On Wed, May 16, 2018 at 12:25 AM, syzbot
 wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:74ee2200b89f kmsan: bump .config.example to v4.17-rc3
> git tree:   https://github.com/google/kmsan.git/master
> console output: https://syzkaller.appspot.com/x/log.txt?x=169efb5b80
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4ca1e57bafa8ab1f
> dashboard link: https://syzkaller.appspot.com/bug?extid=85490c30c260afff22f2
> compiler:   clang version 7.0.0 (trunk 329391)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=157e923780
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10fe5de780
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+85490c30c260afff2...@syzkaller.appspotmail.com
>
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> ==
> BUG: KMSAN: uninit-value in __sctp_v6_cmp_addr+0x49a/0x850
> net/sctp/ipv6.c:580
> CPU: 0 PID: 4453 Comm: syz-executor325 Not tainted 4.17.0-rc3+ #88
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x185/0x1d0 lib/dump_stack.c:113
>  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
>  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
>  __sctp_v6_cmp_addr+0x49a/0x850 net/sctp/ipv6.c:580
Pls check if the testing kernel has this commit:
commit d625329b06e46bd20baf9ee40847d11982569204
Author: Xin Long 
Date:   Thu Apr 26 14:13:57 2018 +0800

sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr

Thanks.

>  sctp_inet6_cmp_addr+0x3dc/0x400 net/sctp/ipv6.c:898
>  sctp_bind_addr_match+0x18b/0x2f0 net/sctp/bind_addr.c:330
>  sctp_addrs_lookup_transport+0x904/0xa20 net/sctp/input.c:942
>  __sctp_lookup_association net/sctp/input.c:985 [inline]
>  __sctp_rcv_lookup net/sctp/input.c:1249 [inline]
>  sctp_rcv+0x15e6/0x4d30 net/sctp/input.c:170
>  ip_local_deliver_finish+0x874/0xec0 net/ipv4/ip_input.c:215
>  NF_HOOK include/linux/netfilter.h:288 [inline]
>  ip_local_deliver+0x43c/0x4e0 net/ipv4/ip_input.c:256
>  dst_input include/net/dst.h:450 [inline]
>  ip_rcv_finish+0xa36/0x1d00 net/ipv4/ip_input.c:396
>  NF_HOOK include/linux/netfilter.h:288 [inline]
>  ip_rcv+0x118f/0x16d0 net/ipv4/ip_input.c:492
>  __netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4592
>  __netif_receive_skb net/core/dev.c:4657 [inline]
>  process_backlog+0x62d/0xe20 net/core/dev.c:5337
>  napi_poll net/core/dev.c:5735 [inline]
>  net_rx_action+0x7c1/0x1a70 net/core/dev.c:5801
>  __do_softirq+0x56d/0x93d kernel/softirq.c:285
>  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1046
>  
>  do_softirq kernel/softirq.c:329 [inline]
>  __local_bh_enable_ip+0x114/0x140 kernel/softirq.c:182
>  local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
>  rcu_read_unlock_bh include/linux/rcupdate.h:728 [inline]
>  ip_finish_output2+0x135a/0x1470 net/ipv4/ip_output.c:231
>  ip_finish_output+0xcb2/0xff0 net/ipv4/ip_output.c:317
>  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
>  ip_output+0x505/0x5d0 net/ipv4/ip_output.c:405
>  dst_output include/net/dst.h:444 [inline]
>  ip_local_out net/ipv4/ip_output.c:124 [inline]
>  ip_queue_xmit+0x1a1e/0x1d10 net/ipv4/ip_output.c:504
>  sctp_v4_xmit+0x188/0x210 net/sctp/protocol.c:983
>  sctp_packet_transmit+0x3eaa/0x4350 net/sctp/output.c:650
>  sctp_outq_flush+0x1a7a/0x6320 net/sctp/outqueue.c:1197
>  sctp_outq_uncork+0xd2/0xf0 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x8707/0x8d20 net/sctp/sm_sideeffect.c:1191
>  sctp_primitive_REQUESTHEARTBEAT+0x175/0x1a0 net/sctp/primitive.c:200
>  sctp_apply_peer_addr_params+0x207/0x1670 net/sctp/socket.c:2487
>  sctp_setsockopt_peer_addr_params net/sctp/socket.c:2683 [inline]
>  sctp_setsockopt+0x10e5f/0x11600 net/sctp/socket.c:4258
>  sock_common_setsockopt+0x136/0x170 net/core/sock.c:3039
>  __sys_setsockopt+0x4af/0x560 net/socket.c:1903
>  __do_sys_setsockopt net/socket.c:1914 [inline]
>  __se_sys_setsockopt net/socket.c:1911 [inline]
>  __x64_sys_setsockopt+0x15c/0x1c0 net/socket.c:1911
>  do_syscall_64+0x154/0x220 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x43fef9
> RSP: 002b:7ffc00d9bfd8 EFLAGS: 0207 ORIG_RAX: 0036
> RAX: ffda RBX: 004002c8 RCX: 0043fef9
> RDX: 0009 RSI: 0084 RDI: 0003
> RBP: 006ca018 R08: 0098 R09: 001c
> R10: 2180 R11: 0207 R12: 00401820
> R13: 004018b0 R14:  R15: 

Re: xdp and fragments with virtio

2018-05-16 Thread Jason Wang



On 2018-05-16 11:51, David Ahern wrote:

Hi Jason:

I am trying to test MTU changes to the BPF fib_lookup helper and seeing
something odd. Hoping you can help.

I have a VM with multiple virtio based NICs and tap backends. I install
the xdp program on eth1 and eth2 to do forwarding. In the host I send a
large packet to eth1:

$ ping -s 1500 9.9.9.9


The tap device in the host sees 2 packets:

$ sudo tcpdump -nv -i vm02-eth1
20:44:33.943160 IP (tos 0x0, ttl 64, id 58746, offset 0, flags [+],
proto ICMP (1), length 1500)
 10.100.1.254 > 9.9.9.9: ICMP echo request, id 17917, seq 1, length 1480
20:44:33.943172 IP (tos 0x0, ttl 64, id 58746, offset 1480, flags
[none], proto ICMP (1), length 48)
 10.100.1.254 > 9.9.9.9: ip-proto-1


In the VM, the XDP program only sees the first packet, not the fragment.
I added a printk to the program (see diff below):

$ cat trace_pipe
   -0 [003] ..s2   254.436467: 0: packet length 1514


Anything come to mind in the virtio xdp implementation that affects
fragment packets? I see this with both IPv4 and v6.


Not yet. But we do turn off tap gso when virtio has XDP set, but it 
shouldn't matter in this case.


Will try to see what's wrong.

Thanks



Thanks,
David

[1] xdp program diff showing printk that dumps packet length:

diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index 4a6be0f87505..f119b506e782 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -52,6 +52,11 @@ static __always_inline int xdp_fwd_flags(struct
xdp_md *ctx, u32 flags)
 u16 h_proto;
 u64 nh_off;

+   {
+   char fmt[] = "packet length %u\n";
+
+   bpf_trace_printk(fmt, sizeof(fmt), ctx->data_end-ctx->data);
+   }
 nh_off = sizeof(*eth);
 if (data + nh_off > data_end)
 return XDP_DROP;
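
The lengths in the tcpdump and trace_pipe output above are consistent with ordinary IPv4 fragmentation at a 1500-byte MTU, which the sketch below works through. The helper names and constants are assumptions made for illustration (20-byte IPv4 header without options, 14-byte Ethernet header without a VLAN tag); this is not code from the kernel or from the sample program.

```c
#include <stdint.h>

enum {
	DEMO_ETH_HLEN  = 14, /* Ethernet header without VLAN tag */
	DEMO_IPV4_HLEN = 20, /* IPv4 header without options */
	DEMO_ICMP_HLEN = 8,  /* ICMP echo header */
};

/* Payload bytes carried by fragment idx (0-based) of an IPv4 datagram
 * whose payload is ip_payload bytes, for the given MTU.
 * Returns 0 when idx is past the last fragment. */
uint32_t demo_frag_payload(uint32_t ip_payload, uint32_t mtu, uint32_t idx)
{
	/* Non-final fragments carry a multiple of 8 bytes. */
	uint32_t per_frag = ((mtu - DEMO_IPV4_HLEN) / 8) * 8;
	uint32_t offset = idx * per_frag;

	if (offset >= ip_payload)
		return 0;
	return ip_payload - offset < per_frag ? ip_payload - offset : per_frag;
}

/* IPv4 total length of a fragment carrying 'payload' bytes. */
uint32_t demo_ip_len(uint32_t payload)
{
	return payload + DEMO_IPV4_HLEN;
}

/* Ethernet frame length (without FCS) for such a fragment. */
uint32_t demo_frame_len(uint32_t payload)
{
	return demo_ip_len(payload) + DEMO_ETH_HLEN;
}
```

For ping -s 1500 the IP payload is 1500 + DEMO_ICMP_HLEN = 1508 bytes, so the first fragment carries 1480 bytes (IP length 1500, frame length 1514, matching the printk) and the second carries 28 bytes (IP length 48). Under these assumptions the missing fragment would appear to the XDP program as a 62-byte frame.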





Re: linux-next: BUG: KASAN: use-after-free in tun_chr_close

2018-05-16 Thread Jason Wang



On 2018-05-16 15:12, Andrei Vagin wrote:

Hi Jason,

I think the problem is in "tun: hold a tun socket during ptr_ring_cleanup".

Pls take a look at the attached patch.


Yes.

It looks to me it's not necessary to take extra refcnt during release, 
we can just do the cleanup at __tun_detach().


Could you help to test the attached patch?

Thanks

From 4b5ad75208e379dcb32abb9ac4790a0446f8558b Mon Sep 17 00:00:00 2001
From: Jason Wang 
Date: Wed, 16 May 2018 15:26:52 +0800
Subject: [PATCH] tuntap: fix use after free during release

After commit b196d88aba8a ("tun: fix use after free for ptr_ring") we
need clean up tx ring during release(). But unfortunately, it tries to
do the cleanup after socket were destroyed which will lead another
use-after-free. Fix this by doing the cleanup before dropping the last
reference of the socket in __tun_detach().

Reported-by: Andrei Vagin 
Fixes: b196d88aba8a ("tun: fix use after free for ptr_ring")
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9fbbb32..d45ac37 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -729,6 +729,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 		}
 		if (tun)
 			xdp_rxq_info_unreg(&tfile->xdp_rxq);
+		ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
 		sock_put(&tfile->sk);
 	}
 }
@@ -3245,7 +3246,6 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 	struct tun_file *tfile = file->private_data;
 
 	tun_detach(tfile, true);
-	ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
 
 	return 0;
 }
-- 
2.7.4
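
The ordering that the patch above relies on (drain the ring while the object is still alive, then drop the last reference) can be modeled in userspace. All names below are hypothetical and none of them are kernel APIs; the out-parameters exist only so a caller can observe what state the object was in when it was freed.

```c
#include <stdlib.h>

/* Hypothetical userspace model; none of these are kernel APIs. */
struct demo_file {
	int refcnt;
	int ring_items;     /* entries still queued in the ring */
	int *leaked_items;  /* receives ring_items at the moment of free */
	int *freed;         /* set to 1 when the object is freed */
};

/* Allocate an object holding one reference. */
struct demo_file *demo_open(int ring_items, int *leaked_items, int *freed)
{
	struct demo_file *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->refcnt = 1;
	f->ring_items = ring_items;
	f->leaked_items = leaked_items;
	f->freed = freed;
	return f;
}

/* Drain the ring; must run while the object is still alive. */
void demo_ring_cleanup(struct demo_file *f)
{
	f->ring_items = 0;
}

/* Drop one reference; the last put frees the object. */
void demo_put(struct demo_file *f)
{
	if (--f->refcnt == 0) {
		*f->leaked_items = f->ring_items;
		*f->freed = 1;
		free(f);
	}
}

/* Ordering used by the fix: clean up first, then drop the reference. */
void demo_detach(struct demo_file *f)
{
	demo_ring_cleanup(f);
	demo_put(f);
}
```

Calling demo_ring_cleanup after demo_put instead would touch freed memory in this model, which is the shape of the bug KASAN reported.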



Re: linux-next: BUG: KASAN: use-after-free in tun_chr_close

2018-05-16 Thread Andrei Vagin
On Wed, May 16, 2018 at 03:32:59PM +0800, Jason Wang wrote:
> 
> 
> On 2018-05-16 15:12, Andrei Vagin wrote:
> > Hi Jason,
> > 
> > I think the problem is in "tun: hold a tun socket during ptr_ring_cleanup".
> > 
> > Pls take a look at the attached patch.
> 
> Yes.
> 
> It looks to me it's not necessary to take extra refcnt during release, we
> can just do the cleanup at __tun_detach().
> 
> Could you help to test the attached patch?

I've run my test on the kernel with this patch. It fixes the problem.
The patch looks correct to me.

Acked-by: Andrei Vagin 

> 
> Thanks
> 

> From 4b5ad75208e379dcb32abb9ac4790a0446f8558b Mon Sep 17 00:00:00 2001
> From: Jason Wang 
> Date: Wed, 16 May 2018 15:26:52 +0800
> Subject: [PATCH] tuntap: fix use after free during release
> 
> After commit b196d88aba8a ("tun: fix use after free for ptr_ring") we
> need clean up tx ring during release(). But unfortunately, it tries to
> do the cleanup after socket were destroyed which will lead another
> use-after-free. Fix this by doing the cleanup before dropping the last
> reference of the socket in __tun_detach().
> 
> Reported-by: Andrei Vagin 
> Fixes: b196d88aba8a ("tun: fix use after free for ptr_ring")
> Signed-off-by: Jason Wang 
> ---
>  drivers/net/tun.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 9fbbb32..d45ac37 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -729,6 +729,7 @@ static void __tun_detach(struct tun_file *tfile, bool 
> clean)
>   }
>   if (tun)
>   xdp_rxq_info_unreg(&tfile->xdp_rxq);
> + ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
>   sock_put(&tfile->sk);
>   }
>  }
> @@ -3245,7 +3246,6 @@ static int tun_chr_close(struct inode *inode, struct 
> file *file)
>   struct tun_file *tfile = file->private_data;
>  
>   tun_detach(tfile, true);
> - ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
>  
>   return 0;
>  }
> -- 
> 2.7.4
> 



Re: [PATCH 09/14] net: sched: don't release reference on action overwrite

2018-05-16 Thread Jiri Pirko
Mon, May 14, 2018 at 04:27:10PM CEST, vla...@mellanox.com wrote:
>Return from action init function with reference to action taken,
>even when overwriting existing action.
>
>Action init API initializes its fourth argument (pointer to pointer to
>tc action) to either existing action with same index or newly created
>action. In case of existing index(and bind argument is zero), init
>function returns without incrementing action reference counter. Caller
>of action init then proceeds working with action without actually
>holding reference to it. This means that action could be deleted
>concurrently. To prevent such scenario this patch changes action init

Be imperative to the codebase in the patch description.


>behavior to always take reference to action before returning
>successfully.

Where's the balance? Who does the release instead? I'm probably missing
something.

>
>Signed-off-by: Vlad Buslov 
>---
> net/sched/act_bpf.c|  8 
> net/sched/act_connmark.c   |  5 +++--
> net/sched/act_csum.c   |  8 
> net/sched/act_gact.c   |  5 +++--
> net/sched/act_ife.c| 12 +---
> net/sched/act_ipt.c|  5 +++--
> net/sched/act_mirred.c |  5 ++---
> net/sched/act_nat.c|  5 +++--
> net/sched/act_pedit.c  |  5 +++--
> net/sched/act_police.c |  8 +++-
> net/sched/act_sample.c |  8 +++-
> net/sched/act_simple.c |  5 +++--
> net/sched/act_skbedit.c|  5 +++--
> net/sched/act_skbmod.c |  8 +++-
> net/sched/act_tunnel_key.c |  8 +++-
> net/sched/act_vlan.c   |  8 +++-
> 16 files changed, 51 insertions(+), 57 deletions(-)
>
>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>index 5d95c43..5554bf7 100644
>--- a/net/sched/act_bpf.c
>+++ b/net/sched/act_bpf.c
>@@ -311,9 +311,10 @@ static int tcf_bpf_init(struct net *net, struct nlattr 
>*nla,
>   if (bind)
>   return 0;
> 
>-  tcf_idr_release(*act, bind);
>-  if (!replace)
>+  if (!replace) {
>+  tcf_idr_release(*act, bind);
>   return -EEXIST;
>+  }
>   }
> 
>   is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];

[...]



Re: [PATCH 09/14] net: sched: don't release reference on action overwrite

2018-05-16 Thread Vlad Buslov

On Wed 16 May 2018 at 07:43, Jiri Pirko  wrote:
> Mon, May 14, 2018 at 04:27:10PM CEST, vla...@mellanox.com wrote:
>>Return from action init function with reference to action taken,
>>even when overwriting existing action.
>>
>>Action init API initializes its fourth argument (pointer to pointer to
>>tc action) to either existing action with same index or newly created
>>action. In case of existing index(and bind argument is zero), init
>>function returns without incrementing action reference counter. Caller
>>of action init then proceeds working with action without actually
>>holding reference to it. This means that action could be deleted
>>concurrently. To prevent such scenario this patch changes action init
>
> Be imperative to the codebase in the patch description.
>
>
>>behavior to always take reference to action before returning
>>successfully.
>
> Where's the balance? Who does the release instead? I'm probably missing
> something.

I've resplit these patches for V2 to always do take/release in same
patch.

>
>>
>>Signed-off-by: Vlad Buslov 
>>---
>> net/sched/act_bpf.c|  8 
>> net/sched/act_connmark.c   |  5 +++--
>> net/sched/act_csum.c   |  8 
>> net/sched/act_gact.c   |  5 +++--
>> net/sched/act_ife.c| 12 +---
>> net/sched/act_ipt.c|  5 +++--
>> net/sched/act_mirred.c |  5 ++---
>> net/sched/act_nat.c|  5 +++--
>> net/sched/act_pedit.c  |  5 +++--
>> net/sched/act_police.c |  8 +++-
>> net/sched/act_sample.c |  8 +++-
>> net/sched/act_simple.c |  5 +++--
>> net/sched/act_skbedit.c|  5 +++--
>> net/sched/act_skbmod.c |  8 +++-
>> net/sched/act_tunnel_key.c |  8 +++-
>> net/sched/act_vlan.c   |  8 +++-
>> 16 files changed, 51 insertions(+), 57 deletions(-)
>>
>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>index 5d95c43..5554bf7 100644
>>--- a/net/sched/act_bpf.c
>>+++ b/net/sched/act_bpf.c
>>@@ -311,9 +311,10 @@ static int tcf_bpf_init(struct net *net, struct nlattr 
>>*nla,
>>  if (bind)
>>  return 0;
>> 
>>- tcf_idr_release(*act, bind);
>>- if (!replace)
>>+ if (!replace) {
>>+ tcf_idr_release(*act, bind);
>>  return -EEXIST;
>>+ }
>>  }
>> 
>>  is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
>
> [...]



RE: i40e - Is i40e_force_link_state doing the right thing ?

2018-05-16 Thread Stachura, Mariusz
> Hi Mariusz, ...
>
> On Tue, May 15, 2018 at 2:24 PM, Stachura, Mariusz 
>  wrote:
>> On Tue, May 15, 2018 at 1:15 PM, Chaitanya Lala  
>> wrote:
>>> Hi,
>>>
>>> I am trying to bring up an Intel XL710 4x10G card using the 
>>> latest mainline top-of-tree.
>>> The problem is that "ifconfig up" and "ifconfig down" do not take 
>>> effect at the link state level.
>>> I tracked the problem down to i40e_force_link_state() when it is 
>>> called from i40e_down().
>>> It calls i40e_force_link_state with "is_up" == false. In-turn it 
>>> calls, i40e_aq_set_link_restart_an(hw, true, NULL).
>>>
>>> Should the second argument of  i40e_aq_set_link_restart_an be "is_up"
>>> vs the current "true"
>>> i.e. i40e_aq_set_link_restart_an(hw, is_up, NULL). ? When I make this 
>>> change, the link state syncs-up with the interface administrative 
>>> state.
>>>
>>> Is this a bug ?
>>>
>>> Thanks,
>>>  Chaitanya
>>
>> Hello Chaitanya,
>>
>> i40e_down() calls i40e_force_link_state with "is_up" == false only if 
>> interface's private flag "link-down-on-close" is set. By default the link is 
>> left up for manageability and VF traffic, user can use this flag to power 
>> down the interface on the link level. Does that work for you?
>> The command is:
>> "ethtool --set-priv-flags IFNAME link-down-on-close on" and then
>
> This flag is _on_ in my setup and hence i40e_force_link_state is being 
> called with is_up == false in my setup. The problem is that irrespective of 
> value of "is_up" flag, i40e_force_link_state invokes 
> i40e_aq_set_link_restart_an with second argument (enable_link) as "true". So 
> i40e_aq_set_link_restart_an is always trying to enable link even if is_up was 
> false. Is that correct behavior ?
>
> I have pasted code with my annotations below marked with "//XXX".
> (...)
> Thanks,
> Chaitanya

Hey,
i40e_aq_set_link_restart_an has second argument set to "true" intentionally, as 
I understand the "link-down-on-close" does not work for you, right? I will 
double check if this feature works for me and get back to you, thank you again.




Re: [PATCH 10/14] net: sched: extend act API for lockless actions

2018-05-16 Thread Jiri Pirko
Mon, May 14, 2018 at 04:27:11PM CEST, vla...@mellanox.com wrote:
>Implement new action API function to atomically delete action with
>specified index and to atomically insert unique action. These functions are
>required to implement init and delete functions for specific actions that
>do not rely on rtnl lock.
>
>Signed-off-by: Vlad Buslov 
>---
> include/net/act_api.h |  2 ++
> net/sched/act_api.c   | 45 +
> 2 files changed, 47 insertions(+)
>
>diff --git a/include/net/act_api.h b/include/net/act_api.h
>index a8c8570..bce0cf1 100644
>--- a/include/net/act_api.h
>+++ b/include/net/act_api.h
>@@ -153,7 +153,9 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
>struct nlattr *est,
>  struct tc_action **a, const struct tc_action_ops *ops,
>  int bind, bool cpustats);
> void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
>+void tcf_idr_insert_unique(struct tc_action_net *tn, struct tc_action *a);
> 
>+int tcf_idr_find_delete(struct tc_action_net *tn, u32 index);
> int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
> 
> static inline int tcf_idr_release(struct tc_action *a, bool bind)
>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>index 2772276e..a5193dc 100644
>--- a/net/sched/act_api.c
>+++ b/net/sched/act_api.c
>@@ -330,6 +330,41 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, 
>struct tc_action **a,
> }
> EXPORT_SYMBOL(tcf_idr_check);
> 
>+int tcf_idr_find_delete(struct tc_action_net *tn, u32 index)
>+{
>+  struct tcf_idrinfo *idrinfo = tn->idrinfo;
>+  struct tc_action *p;
>+  int ret = 0;
>+
>+  spin_lock_bh(&idrinfo->lock);

Why "_bh" is needed here?


>+  p = idr_find(&idrinfo->action_idr, index);
>+  if (!p) {
>+  spin_unlock(&idrinfo->lock);
>+  return -ENOENT;
>+  }
>+
>+  if (!atomic_read(&p->tcfa_bindcnt)) {
>+  if (refcount_dec_and_test(&p->tcfa_refcnt)) {
>+  struct module *owner = p->ops->owner;
>+
>+  WARN_ON(p != idr_remove(&idrinfo->action_idr,
>+  p->tcfa_index));
>+  spin_unlock_bh(&idrinfo->lock);
>+
>+  tcf_action_cleanup(p);
>+  module_put(owner);
>+  return 0;
>+  }
>+  ret = 0;
>+  } else {
>+  ret = -EPERM;

I wonder if "-EPERM" is the best error code for this...


>+  }
>+
>+  spin_unlock_bh(&idrinfo->lock);
>+  return ret;
>+}
>+EXPORT_SYMBOL(tcf_idr_find_delete);
>+
> int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>  struct tc_action **a, const struct tc_action_ops *ops,
>  int bind, bool cpustats)
>@@ -407,6 +442,16 @@ void tcf_idr_insert(struct tc_action_net *tn, struct 
>tc_action *a)
> }
> EXPORT_SYMBOL(tcf_idr_insert);
> 
>+void tcf_idr_insert_unique(struct tc_action_net *tn, struct tc_action *a)
>+{
>+  struct tcf_idrinfo *idrinfo = tn->idrinfo;
>+
>+  spin_lock_bh(&idrinfo->lock);
>+  WARN_ON(idr_replace(&idrinfo->action_idr, a, a->tcfa_index));

Under which condition is this WARN_ON hit?


>+  spin_unlock_bh(&idrinfo->lock);
>+}
>+EXPORT_SYMBOL(tcf_idr_insert_unique);
>+
> void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
>struct tcf_idrinfo *idrinfo)
> {
>-- 
>2.7.5
>
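The return-code contract of tcf_idr_find_delete() discussed above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the table, the pthread mutex and the names `action_table` and `table_find_delete` are hypothetical stand-ins for the idr, idrinfo->lock and the kernel function, and none of it is kernel code.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_ACTIONS 8   /* illustrative table size, not taken from the patch */

struct action {
    int bindcnt;        /* stands in for tcfa_bindcnt */
    int refcnt;         /* stands in for tcfa_refcnt */
    int freed;          /* set by "cleanup" so the outcome can be observed */
};

struct action_table {
    pthread_mutex_t lock;               /* stands in for idrinfo->lock */
    struct action *slots[MAX_ACTIONS];  /* stands in for action_idr */
};

/*
 * Returns 0 on success, -ENOENT if no action has this index and
 * -EPERM if the action is still bound.
 */
static int table_find_delete(struct action_table *t, unsigned int index)
{
    struct action *p, *to_free = NULL;
    int ret = 0;

    pthread_mutex_lock(&t->lock);
    p = index < MAX_ACTIONS ? t->slots[index] : NULL;
    if (!p) {
        ret = -ENOENT;
    } else if (p->bindcnt) {
        ret = -EPERM;
    } else if (--p->refcnt == 0) {
        /* Last reference dropped: unpublish while still holding the lock. */
        t->slots[index] = NULL;
        to_free = p;
    }
    /* Single unlock path, so lock and unlock variants cannot get mismatched. */
    pthread_mutex_unlock(&t->lock);

    if (to_free)
        to_free->freed = 1;   /* cleanup runs outside the lock */
    return ret;
}
```

Funnelling every exit through one unlock call is a design choice of this sketch; the patch itself unlocks on each return path separately.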


Re: linux-next: BUG: KASAN: use-after-free in tun_chr_close

2018-05-16 Thread Jason Wang



On 2018-05-16 15:40, Andrei Vagin wrote:

On Wed, May 16, 2018 at 03:32:59PM +0800, Jason Wang wrote:

On 2018-05-16 15:12, Andrei Vagin wrote:

Hi Jason,

I think the problem is in "tun: hold a tun socket during ptr_ring_cleanup".

Pls take a look at the attached patch.

Yes.

It looks to me like it's not necessary to take an extra refcnt during release;
we can just do the cleanup in __tun_detach().

Could you help test the attached patch?

I've run my test on the kernel with this patch. It fixes the problem.
The patch looks correct to me.

Acked-by: Andrei Vagin



Cool, thanks a lot!

Let me post a formal patch.



Re: [PATCH net-next 2/2] pfifo_fast: drop unneeded additional lock on dequeue

2018-05-16 Thread Paolo Abeni
On Tue, 2018-05-15 at 23:17 +0300, Michael S. Tsirkin wrote:
> On Tue, May 15, 2018 at 04:24:37PM +0200, Paolo Abeni wrote:
> > After the previous patch, for NOLOCK qdiscs, q->seqlock is
> > always held when the dequeue() is invoked, we can drop
> > any additional locking to protect such operation.
> > 
> > Signed-off-by: Paolo Abeni 
> > ---
> >  include/linux/skb_array.h | 5 +
> >  net/sched/sch_generic.c   | 4 ++--
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> Is the seqlock taken during qdisc_change_tx_queue_len?
> We need to prevent that racing with dequeue.

Thanks for the heads-up! I missed that code path.

I'll add the lock in qdisc_change_tx_queue_len() in v2.

Thank you,

Paolo


Re: [PATCH 09/14] net: sched: don't release reference on action overwrite

2018-05-16 Thread Jiri Pirko
Wed, May 16, 2018 at 09:47:32AM CEST, vla...@mellanox.com wrote:
>
>On Wed 16 May 2018 at 07:43, Jiri Pirko  wrote:
>> Mon, May 14, 2018 at 04:27:10PM CEST, vla...@mellanox.com wrote:
>>>Return from action init function with reference to action taken,
>>>even when overwriting existing action.
>>>
>>>Action init API initializes its fourth argument (pointer to pointer to
>>>tc action) to either existing action with same index or newly created
>>>action. In case of existing index (and bind argument is zero), init
>>>function returns without incrementing action reference counter. Caller
>>>of action init then proceeds working with action without actually
>>>holding reference to it. This means that action could be deleted
>>>concurrently. To prevent such scenario this patch changes action init
>>
>> Be imperative to the codebase in the patch description.
>>
>>
>>>behavior to always take reference to action before returning
>>>successfully.
>>
>> Where's the balance? Who does the release instead? I'm probably missing
>> something.
>
>I've resplit these patches for V2 to always do take/release in same
>patch.

Good. Thanks.

>
>>
>>>
>>>Signed-off-by: Vlad Buslov 
>>>---
>>> net/sched/act_bpf.c|  8 
>>> net/sched/act_connmark.c   |  5 +++--
>>> net/sched/act_csum.c   |  8 
>>> net/sched/act_gact.c   |  5 +++--
>>> net/sched/act_ife.c| 12 +---
>>> net/sched/act_ipt.c|  5 +++--
>>> net/sched/act_mirred.c |  5 ++---
>>> net/sched/act_nat.c|  5 +++--
>>> net/sched/act_pedit.c  |  5 +++--
>>> net/sched/act_police.c |  8 +++-
>>> net/sched/act_sample.c |  8 +++-
>>> net/sched/act_simple.c |  5 +++--
>>> net/sched/act_skbedit.c|  5 +++--
>>> net/sched/act_skbmod.c |  8 +++-
>>> net/sched/act_tunnel_key.c |  8 +++-
>>> net/sched/act_vlan.c   |  8 +++-
>>> 16 files changed, 51 insertions(+), 57 deletions(-)
>>>
>>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>>index 5d95c43..5554bf7 100644
>>>--- a/net/sched/act_bpf.c
>>>+++ b/net/sched/act_bpf.c
>>>@@ -311,9 +311,10 @@ static int tcf_bpf_init(struct net *net, struct nlattr 
>>>*nla,
>>> if (bind)
>>> return 0;
>>> 
>>>-tcf_idr_release(*act, bind);
>>>-if (!replace)
>>>+if (!replace) {
>>>+tcf_idr_release(*act, bind);
>>> return -EEXIST;
>>>+}
>>> }
>>> 
>>> is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
>>
>> [...]
>
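The take/release balance asked about above can be illustrated with a toy reference-count model. It is a sketch only: `lookup_get`, `action_put` and `action_init_overwrite` are hypothetical names, the creation path is left out, and the real act_* init functions do considerably more.

```c
#include <errno.h>
#include <stddef.h>

struct action {
    int refcnt;
};

/* Lookup that hands a reference to the caller when the action exists. */
static struct action *lookup_get(struct action *existing)
{
    if (existing)
        existing->refcnt++;
    return existing;
}

static void action_put(struct action *a)
{
    a->refcnt--;
}

/*
 * Overwrite path of an init function in the style the patch describes:
 * on success the reference obtained by the lookup is kept and handed to
 * the caller through *out; on -EEXIST it is dropped before returning.
 */
static int action_init_overwrite(struct action *existing, int replace,
                                 struct action **out)
{
    struct action *a = lookup_get(existing);

    if (!a)
        return -ENOENT;     /* creation path omitted in this sketch */

    if (!replace) {
        action_put(a);
        return -EEXIST;
    }

    *out = a;               /* caller now owns one reference */
    return 0;
}
```

In this model the caller is the one who eventually calls `action_put()` on the pointer it received, which is where the balancing release has to live.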


INFO: rcu detected stall in sctp_packet_transmit

2018-05-16 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:961423f9fcbc Merge branch 'sctp-Introduce-sctp_flush_ctx'
git tree:   net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1366aea780
kernel config:  https://syzkaller.appspot.com/x/.config?x=51fb0a6913f757db
dashboard link: https://syzkaller.appspot.com/bug?extid=ff0b569fb5111dcd1a36
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ff0b569fb5111dcd1...@syzkaller.appspotmail.com

INFO: rcu_sched self-detected stall on CPU
	0-: (1 GPs behind) idle=dae/1/4611686018427387908 softirq=93090/93091  
fqs=30902

 (t=125000 jiffies g=51107 c=51106 q=972)
NMI backtrace for cpu 0
CPU: 0 PID: 24668 Comm: syz-executor6 Not tainted 4.17.0-rc4+ #44
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
 rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
 print_cpu_stall kernel/rcu/tree.c:1525 [inline]
 check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
 __rcu_pending kernel/rcu/tree.c:3356 [inline]
 rcu_pending kernel/rcu/tree.c:3401 [inline]
 rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
 __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
 __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
 hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
RIP: 0010:sctp_v6_xmit+0x259/0x6b0 net/sctp/ipv6.c:219
RSP: 0018:8801dae068e8 EFLAGS: 0246 ORIG_RAX: ff13
RAX: 0007 RBX: 8801bb7ec800 RCX: 86f1b345
RDX:  RSI: 86f1b381 RDI: 8801b73d97c4
RBP: 8801dae06988 R08: 88019505c300 R09: ed003b5c46c2
R10: ed003b5c46c2 R11: 8801dae23613 R12: 88011fd57300
R13: 8801bb7ecec8 R14: 0029 R15: 0002
 sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
 sctp_outq_flush_transports net/sctp/outqueue.c:1164 [inline]
 sctp_outq_flush+0x5f5/0x3430 net/sctp/outqueue.c:1212
 sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
 sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
 sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
 sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
 sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
 invoke_softirq kernel/softirq.c:365 [inline]
 irq_exit+0x1d1/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:525 [inline]
 smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
 
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783  
[inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160  
[inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0  
kernel/locking/spinlock.c:184

RSP: 0018:880196227328 EFLAGS: 0286 ORIG_RAX: ff13
RAX: dc00 RBX: 0286 RCX: 
RDX: 111a316d RSI: 0001 RDI: 0286
RBP: 880196227338 R08: ed003b5c4b81 R09: 
R10:  R11:  R12: 8801dae25c00
R13: 8801dae25c80 R14: 880196227758 R15: 8801dae25c00
 unlock_hrtimer_base kernel/time/hrtimer.c:887 [inline]
 hrtimer_start_range_ns+0x692/0xd10 kernel/time/hrtimer.c:1118
 hrtimer_start_expires include/linux/hrtimer.h:412 [inline]
 futex_wait_queue_me+0x304/0x820 kernel/futex.c:2517
 futex_wait+0x450/0x9f0 kernel/futex.c:2645
 do_futex+0x336/0x27d0 kernel/futex.c:3527
 __do_sys_futex kernel/futex.c:3587 [inline]
 __se_sys_futex kernel/futex.c:3555 [inline]
 __x64_sys_futex+0x46a/0x680 kernel/futex.c:3555
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455a09
RSP: 002b:00a3e938 EFLAGS: 0246 ORIG_RAX: 00ca
RAX: fff

Re: [PATCH 10/14] net: sched: extend act API for lockless actions

2018-05-16 Thread Vlad Buslov

On Wed 16 May 2018 at 07:50, Jiri Pirko  wrote:
> Mon, May 14, 2018 at 04:27:11PM CEST, vla...@mellanox.com wrote:
>>Implement new action API function to atomically delete action with
>>specified index and to atomically insert unique action. These functions are
>>required to implement init and delete functions for specific actions that
>>do not rely on rtnl lock.
>>
>>Signed-off-by: Vlad Buslov 
>>---
>> include/net/act_api.h |  2 ++
>> net/sched/act_api.c   | 45 +
>> 2 files changed, 47 insertions(+)
>>
>>diff --git a/include/net/act_api.h b/include/net/act_api.h
>>index a8c8570..bce0cf1 100644
>>--- a/include/net/act_api.h
>>+++ b/include/net/act_api.h
>>@@ -153,7 +153,9 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
>>struct nlattr *est,
>> struct tc_action **a, const struct tc_action_ops *ops,
>> int bind, bool cpustats);
>> void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
>>+void tcf_idr_insert_unique(struct tc_action_net *tn, struct tc_action *a);
>> 
>>+int tcf_idr_find_delete(struct tc_action_net *tn, u32 index);
>> int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
>> 
>> static inline int tcf_idr_release(struct tc_action *a, bool bind)
>>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>index 2772276e..a5193dc 100644
>>--- a/net/sched/act_api.c
>>+++ b/net/sched/act_api.c
>>@@ -330,6 +330,41 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, 
>>struct tc_action **a,
>> }
>> EXPORT_SYMBOL(tcf_idr_check);
>> 
>>+int tcf_idr_find_delete(struct tc_action_net *tn, u32 index)
>>+{
>>+ struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>+ struct tc_action *p;
>>+ int ret = 0;
>>+
>>+ spin_lock_bh(&idrinfo->lock);
>
> Why is "_bh" needed here?

The original idr remove function used the _bh version, so I used it here
as well. As I already replied to your previous question about idrinfo lock
usage, I don't see any particular reason for locking with _bh at this point.
I've contacted the author (Chris Mi) and he said that he just preserved
the locking the same way as it was before he changed the hash table to idr
for action lookup.

Do you want me to do a standalone patch that cleans up idrinfo locking?

>
>
>>+ p = idr_find(&idrinfo->action_idr, index);
>>+ if (!p) {
>>+ spin_unlock_bh(&idrinfo->lock);
>>+ return -ENOENT;
>>+ }
>>+
>>+ if (!atomic_read(&p->tcfa_bindcnt)) {
>>+ if (refcount_dec_and_test(&p->tcfa_refcnt)) {
>>+ struct module *owner = p->ops->owner;
>>+
>>+ WARN_ON(p != idr_remove(&idrinfo->action_idr,
>>+ p->tcfa_index));
>>+ spin_unlock_bh(&idrinfo->lock);
>>+
>>+ tcf_action_cleanup(p);
>>+ module_put(owner);
>>+ return 0;
>>+ }
>>+ ret = 0;
>>+ } else {
>>+ ret = -EPERM;
>
> I wonder if "-EPERM" is the best error code for this...

This is what the original code returned, so I decided to preserve
compatibility.

>
>
>>+ }
>>+
>>+ spin_unlock_bh(&idrinfo->lock);
>>+ return ret;
>>+}
>>+EXPORT_SYMBOL(tcf_idr_find_delete);
>>+
>> int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>> struct tc_action **a, const struct tc_action_ops *ops,
>> int bind, bool cpustats)
>>@@ -407,6 +442,16 @@ void tcf_idr_insert(struct tc_action_net *tn, struct 
>>tc_action *a)
>> }
>> EXPORT_SYMBOL(tcf_idr_insert);
>> 
>>+void tcf_idr_insert_unique(struct tc_action_net *tn, struct tc_action *a)
>>+{
>>+ struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>+
>>+ spin_lock_bh(&idrinfo->lock);
>>+ WARN_ON(idr_replace(&idrinfo->action_idr, a, a->tcfa_index));
>
> Under which condition is this WARN_ON hit?

When idr_replace() returns a non-NULL pointer, which means that a
concurrent insertion of an action with the same index has somehow happened
and we are leaking memory.

By the way, I'm still not sure whether having this insert-unique function is
warranted or whether I should just add a WARN to the regular idr insert.
What is your opinion on this?

>
>
>>+ spin_unlock_bh(&idrinfo->lock);
>>+}
>>+EXPORT_SYMBOL(tcf_idr_insert_unique);
>>+
>> void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
>>   struct tcf_idrinfo *idrinfo)
>> {
>>-- 
>>2.7.5
>>
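The WARN_ON condition explained above can be shown with a minimal userspace model of "replace and return the previous entry". The array and the names `slot_replace` and `insert_unique` are assumptions made for illustration; the real idr_replace() also has error returns that this sketch ignores.

```c
#include <stddef.h>

#define MAX_SLOTS 8   /* illustrative size, not from the patch */

/*
 * Store new_entry at index and return whatever was stored there before,
 * loosely mirroring the behaviour the reply attributes to idr_replace().
 */
static void *slot_replace(void *slots[], unsigned int index, void *new_entry)
{
    void *old = slots[index];

    slots[index] = new_entry;
    return old;
}

/*
 * Returns 1 when the slot was already populated, which is the case the
 * WARN_ON in the patch would report, and 0 when the insert was unique.
 */
static int insert_unique(void *slots[], unsigned int index, void *entry)
{
    return slot_replace(slots, index, entry) != NULL;
}
```

A second insert at the same index overwrites the first pointer, so the earlier object is no longer reachable through the table; that is the leak the reply refers to.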



Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Oliver Neukum
On Wednesday, 16.05.2018 at 03:37 +, Hayes Wang wrote:
> Oliver Neukum [mailto:oneu...@suse.com]
> > 
> > Hi,
> > 
> > I got reports about hangs with this trace:
> > 
> > May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more
> > than 60 seconds.
> > May 13 01:36:55 neroon kernel:   Tainted: G U
> > 4.17.0-rc4-1.g8257a00-vanilla #1
> > May 13 01:36:55 neroon kernel: "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > May 13 01:36:55 neroon kernel: kworker/0:0 D0 4  2
> > 0x8000
> > May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152]
> > May 13 01:36:55 neroon kernel: Call Trace:
> > May 13 01:36:55 neroon kernel:  ? __schedule+0x289/0x880
> > May 13 01:36:55 neroon kernel:  schedule+0x2f/0x90
> > May 13 01:36:55 neroon kernel:  rpm_resume+0xf9/0x7a0
> > May 13 01:36:55 neroon kernel:  ? wait_woken+0x80/0x80
> > May 13 01:36:55 neroon kernel:  rpm_resume+0x547/0x7a0
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  __pm_runtime_resume+0x3a/0x50
> > May 13 01:36:55 neroon kernel:  usb_autopm_get_interface+0x1d/0x50 [usbcore]
> 
> Would usb_autopm_get_interface() take a long time?
> The driver would wake the device if it has suspended.
> I have no idea about how usb_autopm_get_interface() works, so I don't know 
> how to help.

Hi,

it basically calls r8152_resume() and makes a control request to the
hub. I think we are spinning in rtl8152_runtime_resume(), but where?
It has a lot of NAPI stuff. Any suggestions on how to instrument or
trace this?

Regards
Oliver



[RFC v4 2/5] virtio_ring: support creating packed ring

2018-05-16 Thread Tiwei Bie
This commit introduces support for creating a packed ring.
All split-ring-specific functions get a _split suffix.
Some necessary stubs for the packed ring are also added.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 764 +++
 include/linux/virtio_ring.h  |   8 +-
 2 files changed, 513 insertions(+), 259 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71458f493cf8..62d7c407841a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -64,8 +64,8 @@ struct vring_desc_state {
 struct vring_virtqueue {
struct virtqueue vq;
 
-   /* Actual memory layout for this queue */
-   struct vring vring;
+   /* Is this a packed ring? */
+   bool packed;
 
/* Can we use weak barriers? */
bool weak_barriers;
@@ -79,19 +79,45 @@ struct vring_virtqueue {
/* Host publishes avail event idx */
bool event;
 
-   /* Head of free buffer list. */
-   unsigned int free_head;
/* Number we've added since last sync. */
unsigned int num_added;
 
/* Last used index we've seen. */
u16 last_used_idx;
 
-   /* Last written value to avail->flags */
-   u16 avail_flags_shadow;
+   union {
+   /* Available for split ring */
+   struct {
+   /* Actual memory layout for this queue. */
+   struct vring vring;
 
-   /* Last written value to avail->idx in guest byte order */
-   u16 avail_idx_shadow;
+   /* Head of free buffer list. */
+   unsigned int free_head;
+
+   /* Last written value to avail->flags */
+   u16 avail_flags_shadow;
+
+   /* Last written value to avail->idx in
+* guest byte order. */
+   u16 avail_idx_shadow;
+   };
+
+   /* Available for packed ring */
+   struct {
+   /* Actual memory layout for this queue. */
+   struct vring_packed vring_packed;
+
+   /* Driver ring wrap counter. */
+   u8 wrap_counter;
+
+   /* Index of the next avail descriptor. */
+   u16 next_avail_idx;
+
+   /* Last written value to driver->flags in
+* guest byte order. */
+   u16 event_flags_shadow;
+   };
+   };
 
/* How to notify other side. FIXME: commonalize hcalls! */
bool (*notify)(struct virtqueue *vq);
@@ -201,8 +227,17 @@ static dma_addr_t vring_map_single(const struct 
vring_virtqueue *vq,
  cpu_addr, size, direction);
 }
 
-static void vring_unmap_one(const struct vring_virtqueue *vq,
-   struct vring_desc *desc)
+static int vring_mapping_error(const struct vring_virtqueue *vq,
+  dma_addr_t addr)
+{
+   if (!vring_use_dma_api(vq->vq.vdev))
+   return 0;
+
+   return dma_mapping_error(vring_dma_dev(vq), addr);
+}
+
+static void vring_unmap_one_split(const struct vring_virtqueue *vq,
+ struct vring_desc *desc)
 {
u16 flags;
 
@@ -226,17 +261,9 @@ static void vring_unmap_one(const struct vring_virtqueue 
*vq,
}
 }
 
-static int vring_mapping_error(const struct vring_virtqueue *vq,
-  dma_addr_t addr)
-{
-   if (!vring_use_dma_api(vq->vq.vdev))
-   return 0;
-
-   return dma_mapping_error(vring_dma_dev(vq), addr);
-}
-
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
-unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+  unsigned int total_sg,
+  gfp_t gfp)
 {
struct vring_desc *desc;
unsigned int i;
@@ -257,14 +284,14 @@ static struct vring_desc *alloc_indirect(struct virtqueue 
*_vq,
return desc;
 }
 
-static inline int virtqueue_add(struct virtqueue *_vq,
-   struct scatterlist *sgs[],
-   unsigned int total_sg,
-   unsigned int out_sgs,
-   unsigned int in_sgs,
-   void *data,
-   void *ctx,
-   gfp_t gfp)
+static inline int virtqueue_add_split(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,

[RFC v4 5/5] virtio_ring: enable packed ring

2018-05-16 Thread Tiwei Bie
Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index de3839f3621a..b158692263b0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_IOMMU_PLATFORM:
break;
+   case VIRTIO_F_RING_PACKED:
+   break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
-- 
2.17.0
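The effect of adding VIRTIO_F_RING_PACKED to the switch in vring_transport_features() can be sketched in userspace as a filter over a 64-bit feature mask. This is a model under stated assumptions: `filter_transport_features` and its `accept_packed` parameter are hypothetical, and the values for bits 29 and 32 come from the uapi headers rather than from this thread.

```c
#include <stdint.h>

#define TRANSPORT_F_START 28
#define TRANSPORT_F_END   36  /* value after patch 1/5 of this series */

#define F_INDIRECT_DESC   28
#define F_EVENT_IDX       29  /* from the uapi headers, not quoted in this thread */
#define F_VERSION_1       32  /* likewise */
#define F_IOMMU_PLATFORM  33
#define F_RING_PACKED     34

/*
 * Clear every transport feature bit the "driver" does not understand.
 * accept_packed == 0 models the switch before this patch,
 * accept_packed == 1 models it after the patch.
 */
static uint64_t filter_transport_features(uint64_t features, int accept_packed)
{
    for (int i = TRANSPORT_F_START; i < TRANSPORT_F_END; i++) {
        switch (i) {
        case F_INDIRECT_DESC:
        case F_EVENT_IDX:
        case F_VERSION_1:
        case F_IOMMU_PLATFORM:
            break;
        case F_RING_PACKED:
            if (accept_packed)
                break;
            /* fall through */
        default:
            /* We don't understand this bit. */
            features &= ~(1ULL << i);
        }
    }
    return features;
}
```

Bits outside the transport range, such as per-device feature bits, pass through untouched in this model.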



[RFC v4 4/5] virtio_ring: add event idx support in packed ring

2018-05-16 Thread Tiwei Bie
This commit introduces event idx support for the
packed ring.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 75 +---
 1 file changed, 70 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c6c5deb0e3ae..de3839f3621a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1006,7 +1006,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   u16 flags;
+   u16 new, old, off_wrap, flags, wrap_counter, event_idx;
bool needs_kick;
u32 snapshot;
 
@@ -1015,9 +1015,19 @@ static bool virtqueue_kick_prepare_packed(struct 
virtqueue *_vq)
 * suppressions. */
virtio_mb(vq->weak_barriers);
 
+   old = vq->next_avail_idx - vq->num_added;
+   new = vq->next_avail_idx;
+   vq->num_added = 0;
+
snapshot = *(u32 *)vq->vring_packed.device;
+   off_wrap = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot & 0xffff));
flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
 
+   wrap_counter = off_wrap >> 15;
+   event_idx = off_wrap & ~(1<<15);
+   if (wrap_counter != vq->wrap_counter)
+   event_idx -= vq->vring_packed.num;
+
 #ifdef DEBUG
if (vq->last_add_time_valid) {
WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
@@ -1026,7 +1036,10 @@ static bool virtqueue_kick_prepare_packed(struct 
virtqueue *_vq)
vq->last_add_time_valid = false;
 #endif
 
-   needs_kick = (flags != VRING_EVENT_F_DISABLE);
+   if (flags == VRING_EVENT_F_DESC)
+   needs_kick = vring_need_event(event_idx, new, old);
+   else
+   needs_kick = (flags != VRING_EVENT_F_DISABLE);
END_USE(vq);
return needs_kick;
 }
@@ -1098,7 +,7 @@ static void *virtqueue_get_buf_ctx_packed(struct 
virtqueue *_vq,
  void **ctx)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   u16 last_used, id;
+   u16 wrap_counter, last_used, id;
void *ret;
 
START_USE(vq);
@@ -1138,6 +1151,19 @@ static void *virtqueue_get_buf_ctx_packed(struct 
virtqueue *_vq,
ret = vq->desc_state[id].data;
detach_buf_packed(vq, last_used, id, ctx);
 
+   wrap_counter = vq->wrap_counter;
+   if (vq->last_used_idx > vq->next_avail_idx)
+   wrap_counter ^= 1;
+
+   /* If we expect an interrupt for the next entry, tell host
+* by writing event index and flush out the write before
+* the read in the next get_buf call. */
+   if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
+   virtio_store_mb(vq->weak_barriers,
+   &vq->vring_packed.driver->off_wrap,
+   cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
+   (wrap_counter << 15)));
+
 #ifdef DEBUG
vq->last_add_time_valid = false;
 #endif
@@ -1160,15 +1186,27 @@ static void virtqueue_disable_cb_packed(struct 
virtqueue *_vq)
 static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
+   u16 wrap_counter;
 
START_USE(vq);
 
/* We optimistically turn back on interrupts, then check if there was
 * more to do. */
+   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+* either clear the flags bit or point the event index at the next
+* entry. Always update the event index to keep code simple. */
+
+   wrap_counter = vq->wrap_counter;
+   if (vq->last_used_idx > vq->next_avail_idx)
+   wrap_counter ^= 1;
+
+   vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+   vq->last_used_idx | (wrap_counter << 15));
 
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
virtio_wmb(vq->weak_barriers);
-   vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+   vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+VRING_EVENT_F_ENABLE;
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
}
@@ -1194,15 +1232,40 @@ static bool virtqueue_poll_packed(struct virtqueue 
*_vq, unsigned last_used_idx)
 static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
+   u16 bufs, used_idx, wrap_counter;
 
START_USE(vq);
 
/* We optimistically turn back on interrupts, then check if there was
 * more to do. */
+   /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+* either clear the f
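The wrap-counter arithmetic used in the hunks above can be tried out in a small userspace sketch. The helper names `used_event_off_wrap` and `device_event_idx` are made up for illustration; the expressions follow the patch, but endianness conversion and memory barriers are deliberately left out.

```c
#include <stdint.h>

/*
 * Driver side: build the off_wrap value to publish, as in
 * virtqueue_get_buf_ctx_packed(). The wrap counter is flipped when the
 * last used index is ahead of the next avail index.
 */
static uint16_t used_event_off_wrap(uint16_t last_used_idx,
                                    uint16_t next_avail_idx,
                                    unsigned int avail_wrap_counter)
{
    unsigned int wrap = avail_wrap_counter & 1u;

    if (last_used_idx > next_avail_idx)
        wrap ^= 1u;

    return (uint16_t)(last_used_idx | (wrap << 15));
}

/*
 * Kick side: recover the event index from the device's off_wrap, as in
 * virtqueue_kick_prepare_packed(). When the wrap counters differ, the
 * index is moved back by one ring size (modulo 2^16).
 */
static uint16_t device_event_idx(uint16_t off_wrap, unsigned int drv_wrap,
                                 uint16_t ring_num)
{
    unsigned int wrap = off_wrap >> 15;
    uint16_t event_idx = off_wrap & 0x7fff;

    if (wrap != (drv_wrap & 1u))
        event_idx = (uint16_t)(event_idx - ring_num);

    return event_idx;
}
```

The adjusted index is what the patch then passes to vring_need_event() together with the old and new avail positions.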

[RFC v4 3/5] virtio_ring: add packed ring support

2018-05-16 Thread Tiwei Bie
This commit introduces basic support (without EVENT_IDX)
for the packed ring.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 491 ++-
 1 file changed, 481 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 62d7c407841a..c6c5deb0e3ae 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,7 +58,8 @@
 
 struct vring_desc_state {
void *data; /* Data for callback. */
-   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
+   void *indir_desc;   /* Indirect descriptor, if any. */
+   int num;/* Descriptor list length. */
 };
 
 struct vring_virtqueue {
@@ -116,6 +117,9 @@ struct vring_virtqueue {
/* Last written value to driver->flags in
 * guest byte order. */
u16 event_flags_shadow;
+
+   /* ID allocation. */
+   struct idr buffer_id;
};
};
 
@@ -142,6 +146,16 @@ struct vring_virtqueue {
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
+ unsigned int total_sg)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   /* If the host supports indirect descriptor tables, and we have multiple
+* buffers, then go indirect. FIXME: tune this threshold */
+   return (vq->indirect && total_sg > 1 && vq->vq.num_free);
+}
+
 /*
  * Modern virtio devices have feature bits to specify whether they need a
  * quirk and bypass the IOMMU. If not there, just use the DMA API.
@@ -327,9 +341,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
head = vq->free_head;
 
-   /* If the host supports indirect descriptor tables, and we have multiple
-* buffers, then go indirect. FIXME: tune this threshold */
-   if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+   if (virtqueue_use_indirect(_vq, total_sg))
desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
@@ -741,6 +753,63 @@ static inline unsigned vring_size_packed(unsigned int num, 
unsigned long align)
& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
 }
 
+static void vring_unmap_one_packed(const struct vring_virtqueue *vq,
+  struct vring_packed_desc *desc)
+{
+   u16 flags;
+
+   if (!vring_use_dma_api(vq->vq.vdev))
+   return;
+
+   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+
+   if (flags & VRING_DESC_F_INDIRECT) {
+   dma_unmap_single(vring_dma_dev(vq),
+virtio64_to_cpu(vq->vq.vdev, desc->addr),
+virtio32_to_cpu(vq->vq.vdev, desc->len),
+(flags & VRING_DESC_F_WRITE) ?
+DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   } else {
+   dma_unmap_page(vring_dma_dev(vq),
+  virtio64_to_cpu(vq->vq.vdev, desc->addr),
+  virtio32_to_cpu(vq->vq.vdev, desc->len),
+  (flags & VRING_DESC_F_WRITE) ?
+  DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   }
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+  unsigned int total_sg,
+  gfp_t gfp)
+{
+   struct vring_packed_desc *desc;
+
+   /*
+* We require lowmem mappings for the descriptors because
+* otherwise virt_to_phys will give us bogus addresses in the
+* virtqueue.
+*/
+   gfp &= ~__GFP_HIGHMEM;
+
+   desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+   return desc;
+}
+
+static u16 alloc_id_packed(struct vring_virtqueue *vq)
+{
+   u16 id;
+
+   id = idr_alloc(&vq->buffer_id, NULL, 0, vq->vring_packed.num,
+  GFP_KERNEL);
+   return id;
+}
+
+static void free_id_packed(struct vring_virtqueue *vq, u16 id)
+{
+   idr_remove(&vq->buffer_id, id);
+}
+
 static inline int virtqueue_add_packed(struct virtqueue *_vq,
   struct scatterlist *sgs[],
   unsigned int total_sg,
@@ -750,47 +819,446 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
   void *ctx,
   gfp_t gfp)
 {
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct vring_packed_desc *desc;
+   struct scatterlist *sg;
+   unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
+   __virtio16 uninitialized_var(head_fl

[RFC v4 1/5] virtio: add packed ring definitions

2018-05-16 Thread Tiwei Bie
Signed-off-by: Tiwei Bie 
---
 include/uapi/linux/virtio_config.h | 12 +-
 include/uapi/linux/virtio_ring.h   | 36 ++
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index 308e2096291f..a6e392325e3a 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
  * transport being used (eg. virtio_ring), the rest are per-device feature
  * bits. */
 #define VIRTIO_TRANSPORT_F_START   28
-#define VIRTIO_TRANSPORT_F_END 34
+#define VIRTIO_TRANSPORT_F_END 36
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,14 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM33
+
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED   34
+
+/*
+ * This feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER  35
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..3932cb80c347 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,9 @@
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT  4
 
+#define VRING_DESC_F_AVAIL(b)  ((b) << 7)
+#define VRING_DESC_F_USED(b)   ((b) << 15)
+
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
  * will still kick if it's out of buffers. */
@@ -53,6 +56,10 @@
  * optimization.  */
 #define VRING_AVAIL_F_NO_INTERRUPT 1
 
+#define VRING_EVENT_F_ENABLE   0x0
+#define VRING_EVENT_F_DISABLE  0x1
+#define VRING_EVENT_F_DESC 0x2
+
 /* We support indirect buffer descriptors */
 #define VIRTIO_RING_F_INDIRECT_DESC28
 
@@ -171,4 +178,33 @@ static inline int vring_need_event(__u16 event_idx, __u16 
new_idx, __u16 old)
return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
 }
 
+struct vring_packed_desc_event {
+   /* __virtio16 off  : 15; // Descriptor Event Offset
+* __virtio16 wrap : 1;  // Descriptor Event Wrap Counter */
+   __virtio16 off_wrap;
+   /* __virtio16 flags : 2; // Descriptor Event Flags */
+   __virtio16 flags;
+};
+
+struct vring_packed_desc {
+   /* Buffer Address. */
+   __virtio64 addr;
+   /* Buffer Length. */
+   __virtio32 len;
+   /* Buffer ID. */
+   __virtio16 id;
+   /* The flags depending on descriptor type. */
+   __virtio16 flags;
+};
+
+struct vring_packed {
+   unsigned int num;
+
+   struct vring_packed_desc *desc;
+
+   struct vring_packed_desc_event *driver;
+
+   struct vring_packed_desc_event *device;
+};
+
 #endif /* _UAPI_LINUX_VIRTIO_RING_H */
-- 
2.17.0
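For readers following the event-suppression definitions in this patch, here is a small userspace sketch of two pieces: the vring_need_event() comparison visible in the diff context, and the 15-bit offset plus 1-bit wrap layout described in the vring_packed_desc_event comment. The helper names `need_event`, `pack_off_wrap`, `unpack_off` and `unpack_wrap` are hypothetical, and plain uint16_t is used in place of __virtio16.

```c
#include <stdint.h>

/* Same arithmetic as vring_need_event() in the diff context. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/* Pack a 15-bit descriptor event offset and a 1-bit wrap counter. */
static uint16_t pack_off_wrap(uint16_t off, unsigned int wrap)
{
    return (uint16_t)((off & 0x7fff) | ((wrap & 1u) << 15));
}

static uint16_t unpack_off(uint16_t off_wrap)
{
    return off_wrap & 0x7fff;
}

static unsigned int unpack_wrap(uint16_t off_wrap)
{
    return off_wrap >> 15;
}
```

With old = 4 and new_idx = 6, an event index of 5 falls inside the newly added range, so `need_event(5, 6, 4)` is true, while an event index of 10 lies outside it and yields false. The 16-bit casts make the comparison work across index wraparound as well.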



[RFC v4 0/5] virtio: support packed ring

2018-05-16 Thread Tiwei Bie
Hello everyone,

This RFC implements packed ring support in the virtio driver.

Some simple functional tests have been done with Jason's
packed ring implementation in vhost:

https://lkml.org/lkml/2018/4/23/12

Both ping and netperf worked as expected (with EVENT_IDX
disabled).

TODO:
- Refinements (for code and commit log);
- More tests;
- Bug fixes;

RFC v3 -> RFC v4:
- Make ID allocation support out-of-order (Jason);
- Various fixes for EVENT_IDX support;

RFC v2 -> RFC v3:
- Split into small patches (Jason);
- Add helper virtqueue_use_indirect() (Jason);
- Just set id for the last descriptor of a list (Jason);
- Calculate the prev in virtqueue_add_packed() (Jason);
- Fix/improve desc suppression code (Jason/MST);
- Refine the code layout for XXX_split/packed and wrappers (MST);
- Fix the comments and API in uapi (MST);
- Remove the BUG_ON() for indirect (Jason);
- Some other refinements and bug fixes;

RFC v1 -> RFC v2:
- Add indirect descriptor support - compile test only;
- Add event suppression support - compile test only;
- Move vring_packed_init() out of uapi (Jason, MST);
- Merge two loops into one in virtqueue_add_packed() (Jason);
- Split vring_unmap_one() for packed ring and split ring (Jason);
- Avoid using '%' operator (Jason);
- Rename free_head -> next_avail_idx (Jason);
- Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
- Some other refinements and bug fixes;

Thanks!

Tiwei Bie (5):
  virtio: add packed ring definitions
  virtio_ring: support creating packed ring
  virtio_ring: add packed ring support
  virtio_ring: add event idx support in packed ring
  virtio_ring: enable packed ring

 drivers/virtio/virtio_ring.c   | 1338 ++--
 include/linux/virtio_ring.h|8 +-
 include/uapi/linux/virtio_config.h |   12 +-
 include/uapi/linux/virtio_ring.h   |   36 +
 4 files changed, 1116 insertions(+), 278 deletions(-)

-- 
2.17.0



Re: [PATCH net-next v2 0/2] of: mdio: Fall back to mdiobus_register() with NULL device_node

2018-05-16 Thread Geert Uytterhoeven
Hi Florian,

Thanks for your series!
I like the effect on simplifying drivers.

On Wed, May 16, 2018 at 1:56 AM, Florian Fainelli  wrote:
> This patch series updates of_mdiobus_register() such that when the device_node
> argument is NULL, it calls mdiobus_register() directly. This is consistent
> with the behavior of of_mdiobus_register() when CONFIG_OF=n.

IMHO the CONFIG_OF=n behavior of of_mdiobus_register() (which I wasn't
aware of) is inconsistent with the behavior of other of_*() functions,
which are just empty stubs.
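
For reference, the fallback described in the quoted cover letter amounts to
roughly the following. This is a userspace sketch only: the two structs and
the registered_via_of field are stand-ins made up for illustration, and the
function bodies are stubs rather than the kernel implementations.

```c
#include <stddef.h>

/* Stand-ins for the kernel types; only what the sketch needs. */
struct device_node { const char *name; };
struct mii_bus { int registered_via_of; };

/* Stub for the plain (non-OF) registration path. */
static int mdiobus_register(struct mii_bus *bus)
{
	bus->registered_via_of = 0;
	return 0;
}

/* Behaviour proposed in the series: a NULL node falls back to the plain
 * registration path instead of requiring callers to check for it. */
static int of_mdiobus_register(struct mii_bus *bus, struct device_node *np)
{
	if (!np)
		return mdiobus_register(bus);

	/* The real code would walk np's children and register each PHY. */
	bus->registered_via_of = 1;
	return 0;
}
```

With this shape a driver can call of_mdiobus_register() unconditionally and
drop its own "if (np) ... else ..." branch.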

So I'm wondering if you should do it the other way around, and let
mdiobus_register() call of_mdiobus_register() if dev->of_node exists?

This does mean mdiobus_register() should gain a struct device * parameter,
and thus changes to many more drivers are needed.

> I only converted the most obvious drivers, there are others that have a much
> less obvious behavior and specifically attempt to deal with CONFIG_ACPI.

I haven't looked at the ACPI handling, but perhaps this can be moved
inside mdiobus_register() as well?

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 10/14] net: sched: extend act API for lockless actions

2018-05-16 Thread Jiri Pirko
Wed, May 16, 2018 at 10:16:13AM CEST, vla...@mellanox.com wrote:
>
>On Wed 16 May 2018 at 07:50, Jiri Pirko  wrote:
>> Mon, May 14, 2018 at 04:27:11PM CEST, vla...@mellanox.com wrote:
>>>Implement new action API function to atomically delete action with
>>>specified index and to atomically insert unique action. These functions are
>>>required to implement init and delete functions for specific actions that
>>>do not rely on rtnl lock.
>>>
>>>Signed-off-by: Vlad Buslov 
>>>---
>>> include/net/act_api.h |  2 ++
>>> net/sched/act_api.c   | 45 +
>>> 2 files changed, 47 insertions(+)
>>>
>>>diff --git a/include/net/act_api.h b/include/net/act_api.h
>>>index a8c8570..bce0cf1 100644
>>>--- a/include/net/act_api.h
>>>+++ b/include/net/act_api.h
>>>@@ -153,7 +153,9 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
>>>struct nlattr *est,
>>>struct tc_action **a, const struct tc_action_ops *ops,
>>>int bind, bool cpustats);
>>> void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
>>>+void tcf_idr_insert_unique(struct tc_action_net *tn, struct tc_action *a);
>>> 
>>>+int tcf_idr_find_delete(struct tc_action_net *tn, u32 index);
>>> int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
>>> 
>>> static inline int tcf_idr_release(struct tc_action *a, bool bind)
>>>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>>index 2772276e..a5193dc 100644
>>>--- a/net/sched/act_api.c
>>>+++ b/net/sched/act_api.c
>>>@@ -330,6 +330,41 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, 
>>>struct tc_action **a,
>>> }
>>> EXPORT_SYMBOL(tcf_idr_check);
>>> 
>>>+int tcf_idr_find_delete(struct tc_action_net *tn, u32 index)
>>>+{
>>>+struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>>+struct tc_action *p;
>>>+int ret = 0;
>>>+
>>>+spin_lock_bh(&idrinfo->lock);
>>
>> Why "_bh" is needed here?
>
>Original idr remove function used _bh version so I used it here as well.
>As I already replied to your previous question about idrinfo lock usage,
>I don't see any particular reason for locking with _bh at this point.
>I've contacted the author(Chris Mi) and he said that he just preserved
>locking the same way as it was before he changed hash table to idr for
>action lookup.
>
>You want me to do standalone patch that cleans up idrinfo locking?

Yes please. You can send it separately, not as a part of this patchset.



>
>>
>>
>>>+p = idr_find(&idrinfo->action_idr, index);
>>>+if (!p) {
>>>+spin_unlock(&idrinfo->lock);
>>>+return -ENOENT;
>>>+}
>>>+
>>>+if (!atomic_read(&p->tcfa_bindcnt)) {
>>>+if (refcount_dec_and_test(&p->tcfa_refcnt)) {
>>>+struct module *owner = p->ops->owner;
>>>+
>>>+WARN_ON(p != idr_remove(&idrinfo->action_idr,
>>>+p->tcfa_index));
>>>+spin_unlock_bh(&idrinfo->lock);
>>>+
>>>+tcf_action_cleanup(p);
>>>+module_put(owner);
>>>+return 0;
>>>+}
>>>+ret = 0;
>>>+} else {
>>>+ret = -EPERM;
>>
>> I wonder if "-EPERM" is the best error code for this...
>
>This is what original code returned so I decided to preserve
>compatibility.

Okay.


>
>>
>>
>>>+}
>>>+
>>>+spin_unlock_bh(&idrinfo->lock);
>>>+return ret;
>>>+}
>>>+EXPORT_SYMBOL(tcf_idr_find_delete);
>>>+
>>> int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
>>>struct tc_action **a, const struct tc_action_ops *ops,
>>>int bind, bool cpustats)
>>>@@ -407,6 +442,16 @@ void tcf_idr_insert(struct tc_action_net *tn, struct 
>>>tc_action *a)
>>> }
>>> EXPORT_SYMBOL(tcf_idr_insert);
>>> 
>>>+void tcf_idr_insert_unique(struct tc_action_net *tn, struct tc_action *a)
>>>+{
>>>+struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>>+
>>>+spin_lock_bh(&idrinfo->lock);
>>>+WARN_ON(idr_replace(&idrinfo->action_idr, a, a->tcfa_index));
>>
>> Under which condition this WARN_ON is hit?
>
>When idr replace returns non-NULL pointer, which means that somehow
>concurrent insertion of action with same index has happened and we are
>leaking memory.

Is that possible to happen? Meaning, can I as a user cause this by doing
something in a wrong/unexpected way?


>
>By the way I'm still not sure if having this insert unique function is
>warranted or I should just add WARN to regular idr insert. What is your
>opinion on this?

I have to check where you use this.


>
>>
>>
>>>+spin_unlock_bh(&idrinfo->lock);
>>>+}
>>>+EXPORT_SYMBOL(tcf_idr_insert_unique);
>>>+
>>> void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
>>>  struct tcf_idrinfo *idrinfo)
>>> {
>>>-- 
>>>2.7.5
>>>
>


Re: KMSAN: uninit-value in __sctp_v6_cmp_addr

2018-05-16 Thread Alexander Potapenko
On Wed, May 16, 2018 at 9:17 AM Xin Long  wrote:

> On Wed, May 16, 2018 at 12:25 AM, syzbot
>  wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:74ee2200b89f kmsan: bump .config.example to v4.17-rc3
> > git tree:   https://github.com/google/kmsan.git/master
> > console output: https://syzkaller.appspot.com/x/log.txt?x=169efb5b80
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=4ca1e57bafa8ab1f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=85490c30c260afff22f2
> > compiler:   clang version 7.0.0 (trunk 329391)
> > syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=157e923780
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10fe5de780
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+85490c30c260afff2...@syzkaller.appspotmail.com
> >
> > random: sshd: uninitialized urandom read (32 bytes read)
> > random: sshd: uninitialized urandom read (32 bytes read)
> > random: sshd: uninitialized urandom read (32 bytes read)
> > random: sshd: uninitialized urandom read (32 bytes read)
> > ==
> > BUG: KMSAN: uninit-value in __sctp_v6_cmp_addr+0x49a/0x850
> > net/sctp/ipv6.c:580
> > CPU: 0 PID: 4453 Comm: syz-executor325 Not tainted 4.17.0-rc3+ #88
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Call Trace:
> >  
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x185/0x1d0 lib/dump_stack.c:113
> >  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
> >  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
> >  __sctp_v6_cmp_addr+0x49a/0x850 net/sctp/ipv6.c:580
> Pls check if the testing kernel has this commit:
> commit d625329b06e46bd20baf9ee40847d11982569204
> Author: Xin Long 
> Date:   Thu Apr 26 14:13:57 2018 +0800

>  sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr

It doesn't, because we were testing v4.17-rc3, and the patch is in
v4.17-rc4.
I'll update to -rc5 and test again.
> Thanks.
Thank you!
> >  sctp_inet6_cmp_addr+0x3dc/0x400 net/sctp/ipv6.c:898
> >  sctp_bind_addr_match+0x18b/0x2f0 net/sctp/bind_addr.c:330
> >  sctp_addrs_lookup_transport+0x904/0xa20 net/sctp/input.c:942
> >  __sctp_lookup_association net/sctp/input.c:985 [inline]
> >  __sctp_rcv_lookup net/sctp/input.c:1249 [inline]
> >  sctp_rcv+0x15e6/0x4d30 net/sctp/input.c:170
> >  ip_local_deliver_finish+0x874/0xec0 net/ipv4/ip_input.c:215
> >  NF_HOOK include/linux/netfilter.h:288 [inline]
> >  ip_local_deliver+0x43c/0x4e0 net/ipv4/ip_input.c:256
> >  dst_input include/net/dst.h:450 [inline]
> >  ip_rcv_finish+0xa36/0x1d00 net/ipv4/ip_input.c:396
> >  NF_HOOK include/linux/netfilter.h:288 [inline]
> >  ip_rcv+0x118f/0x16d0 net/ipv4/ip_input.c:492
> >  __netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4592
> >  __netif_receive_skb net/core/dev.c:4657 [inline]
> >  process_backlog+0x62d/0xe20 net/core/dev.c:5337
> >  napi_poll net/core/dev.c:5735 [inline]
> >  net_rx_action+0x7c1/0x1a70 net/core/dev.c:5801
> >  __do_softirq+0x56d/0x93d kernel/softirq.c:285
> >  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1046
> >  
> >  do_softirq kernel/softirq.c:329 [inline]
> >  __local_bh_enable_ip+0x114/0x140 kernel/softirq.c:182
> >  local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
> >  rcu_read_unlock_bh include/linux/rcupdate.h:728 [inline]
> >  ip_finish_output2+0x135a/0x1470 net/ipv4/ip_output.c:231
> >  ip_finish_output+0xcb2/0xff0 net/ipv4/ip_output.c:317
> >  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
> >  ip_output+0x505/0x5d0 net/ipv4/ip_output.c:405
> >  dst_output include/net/dst.h:444 [inline]
> >  ip_local_out net/ipv4/ip_output.c:124 [inline]
> >  ip_queue_xmit+0x1a1e/0x1d10 net/ipv4/ip_output.c:504
> >  sctp_v4_xmit+0x188/0x210 net/sctp/protocol.c:983
> >  sctp_packet_transmit+0x3eaa/0x4350 net/sctp/output.c:650
> >  sctp_outq_flush+0x1a7a/0x6320 net/sctp/outqueue.c:1197
> >  sctp_outq_uncork+0xd2/0xf0 net/sctp/outqueue.c:776
> >  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
> >  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
> >  sctp_do_sm+0x8707/0x8d20 net/sctp/sm_sideeffect.c:1191
> >  sctp_primitive_REQUESTHEARTBEAT+0x175/0x1a0 net/sctp/primitive.c:200
> >  sctp_apply_peer_addr_params+0x207/0x1670 net/sctp/socket.c:2487
> >  sctp_setsockopt_peer_addr_params net/sctp/socket.c:2683 [inline]
> >  sctp_setsockopt+0x10e5f/0x11600 net/sctp/socket.c:4258
> >  sock_common_setsockopt+0x136/0x170 net/core/sock.c:3039
> >  __sys_setsockopt+0x4af/0x560 net/socket.c:1903
> >  __do_sys_setsockopt net/socket.c:1914 [inline]
> >  __se_sys_setsockopt net/socket.c:1911 [inline]
> >  __x64_sys_setsockopt+0x15c/0x1c0 net/socket.c:1911
> >  do_syscall_64+0x154/0x220 arch/x86/entry/common.c:287
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > RIP: 0033:0x43fef9
> > RSP: 002b:7ffc00d9bfd8 EFLAGS: 00

[PATCH] net: 8390: ne: Fix accidentally removed RBTX4927 support

2018-05-16 Thread Geert Uytterhoeven
The configuration settings for RBTX4927 were accidentally removed,
leading to a silently broken network interface.

Re-add the missing settings to fix this.

Fixes: 8eb97ff5a4ec941d ("net: 8390: remove m32r specific bits")
Signed-off-by: Geert Uytterhoeven 
---
Bisected between v4.9-rc2 (doh) and v4.17-rc5.

Note to myself: I should do more boot testing on RBTX4927.
Fortunately I caught it before it ends up in a point release ;-)
---
 drivers/net/ethernet/8390/ne.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/8390/ne.c b/drivers/net/ethernet/8390/ne.c
index ac99d089ac7266c3..1c97e39b478e9f89 100644
--- a/drivers/net/ethernet/8390/ne.c
+++ b/drivers/net/ethernet/8390/ne.c
@@ -164,7 +164,9 @@ bad_clone_list[] __initdata = {
 #define NESM_START_PG  0x40    /* First page of TX buffer */
 #define NESM_STOP_PG   0x80    /* Last page +1 of RX ring */
 
-#if defined(CONFIG_ATARI)  /* 8-bit mode on Atari, normal on Q40 */
+#if defined(CONFIG_MACH_TX49XX)
+#  define DCR_VAL 0x48 /* 8-bit mode */
+#elif defined(CONFIG_ATARI)    /* 8-bit mode on Atari, normal on Q40 */
 #  define DCR_VAL (MACH_IS_ATARI ? 0x48 : 0x49)
 #else
 #  define DCR_VAL 0x49
-- 
2.7.4



Xilinx axienet + DP83620 in fiber mode won't set netif_carrier_on

2018-05-16 Thread Alvaro G. M.
Hi,

I have a custom board with a Xilinx FPGA running Microblaze and fitting a
Xilinx Axi Ethernet IP core.  This core communicates through MII mode with a
DP83620 PHY from Texas that supports both cabled and fiber interfaces, of
which I'm using the latter.

Under these circumstances, I've noticed that the interface is pretty much
dead except for receiving broadcast packets, so I tried to dig into the
driver to find the cause. Please be aware that I'm not very familiar with the
netdev subsystem, so I may be mistaken about lots of things.

It seems that of_phy_connect ends up calling netif_carrier_off:

phy_device.c:1036
/* Initial carrier state is off as the phy is about to be
 * (re)initialized.
 */
netif_carrier_off(phydev->attached_dev);

/* Do initial configuration here, now that
 * we have certain key parameters
 * (dev_flags and interface)
 */
err = phy_init_hw(phydev);
if (err)
goto error;

phy_resume(phydev);

However, neither xilinx_axienet_main.c nor dp83848.c ever runs
netif_carrier_on. As a simple test, I tried this patch, and that was enough
to make the interface work.

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c 
b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index e74e1e897864..d8bbe4c51b8a 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -957,6 +957,8 @@ static int axienet_open(struct net_device *ndev)
if (ret)
goto err_rx_irq;
 
+   netif_carrier_on(ndev);
+
return 0;
 
 err_rx_irq:


I understand, however, that this is just a proof of concept that shows the
underlying issue. I'd like to contribute to making this a proper patch, or
maybe someone familiar with the netdev subsystem can see at first sight
what the solution is.

My understanding is that this code works fine with other PHY chips, as
pretty much the same code has been in the kernel for a long time, but that
probably before ee06b1728b95643668e40fc58ae118aeb7c1753e (which I
instigated) this Xilinx core and driver had never been tested with any
interface other than GMII and RGMII, which were back then written
explicitly, with an unknown PHY chip.

I should also note that axienet_adjust_link is never called in this
configuration, which is the place where I think the call to netif_carrier_on
should be (based on what I've read on other ethernet drivers), but it
seems that the dp83620 doesn't report any autonegotiation (at least while
in fiber mode).

I'm open to reading and testing whatever is needed, and please feel free to
correct me if I've said anything incorrect, which I most probably have.

Best regards

-- 
Alvaro G. M.


Re: [PATCH 10/14] net: sched: extend act API for lockless actions

2018-05-16 Thread Vlad Buslov

On Wed 16 May 2018 at 08:56, Jiri Pirko  wrote:
> Wed, May 16, 2018 at 10:16:13AM CEST, vla...@mellanox.com wrote:
>>
>>On Wed 16 May 2018 at 07:50, Jiri Pirko  wrote:
>>> Mon, May 14, 2018 at 04:27:11PM CEST, vla...@mellanox.com wrote:
Implement new action API function to atomically delete action with
specified index and to atomically insert unique action. These functions are
required to implement init and delete functions for specific actions that
do not rely on rtnl lock.

Signed-off-by: Vlad Buslov 
---
 include/net/act_api.h |  2 ++
 net/sched/act_api.c   | 45 +
 2 files changed, 47 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index a8c8570..bce0cf1 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -153,7 +153,9 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
   struct tc_action **a, const struct tc_action_ops *ops,
   int bind, bool cpustats);
 void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
+void tcf_idr_insert_unique(struct tc_action_net *tn, struct tc_action *a);
 
+int tcf_idr_find_delete(struct tc_action_net *tn, u32 index);
 int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
 
 static inline int tcf_idr_release(struct tc_action *a, bool bind)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 2772276e..a5193dc 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -330,6 +330,41 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 
index, struct tc_action **a,
 }
 EXPORT_SYMBOL(tcf_idr_check);
 
+int tcf_idr_find_delete(struct tc_action_net *tn, u32 index)
+{
+   struct tcf_idrinfo *idrinfo = tn->idrinfo;
+   struct tc_action *p;
+   int ret = 0;
+
+   spin_lock_bh(&idrinfo->lock);
>>>
>>> Why "_bh" is needed here?
>>
>>Original idr remove function used _bh version so I used it here as well.
>>As I already replied to your previous question about idrinfo lock usage,
>>I don't see any particular reason for locking with _bh at this point.
>>I've contacted the author(Chris Mi) and he said that he just preserved
>>locking the same way as it was before he changed hash table to idr for
>>action lookup.
>>
>>You want me to do standalone patch that cleans up idrinfo locking?
>
> Yes please. You can send it separately, not as a part of this
> patchset.

Okay.

>
>
>
>>
>>>
>>>
+   p = idr_find(&idrinfo->action_idr, index);
+   if (!p) {
+   spin_unlock(&idrinfo->lock);
+   return -ENOENT;
+   }
+
+   if (!atomic_read(&p->tcfa_bindcnt)) {
+   if (refcount_dec_and_test(&p->tcfa_refcnt)) {
+   struct module *owner = p->ops->owner;
+
+   WARN_ON(p != idr_remove(&idrinfo->action_idr,
+   p->tcfa_index));
+   spin_unlock_bh(&idrinfo->lock);
+
+   tcf_action_cleanup(p);
+   module_put(owner);
+   return 0;
+   }
+   ret = 0;
+   } else {
+   ret = -EPERM;
>>>
>>> I wonder if "-EPERM" is the best error code for this...
>>
>>This is what original code returned so I decided to preserve
>>compatibility.
>
> Okay.
>
>
>>
>>>
>>>
+   }
+
+   spin_unlock_bh(&idrinfo->lock);
+   return ret;
+}
+EXPORT_SYMBOL(tcf_idr_find_delete);
+
 int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
   struct tc_action **a, const struct tc_action_ops *ops,
   int bind, bool cpustats)
@@ -407,6 +442,16 @@ void tcf_idr_insert(struct tc_action_net *tn, struct 
tc_action *a)
 }
 EXPORT_SYMBOL(tcf_idr_insert);
 
+void tcf_idr_insert_unique(struct tc_action_net *tn, struct tc_action *a)
+{
+   struct tcf_idrinfo *idrinfo = tn->idrinfo;
+
+   spin_lock_bh(&idrinfo->lock);
+   WARN_ON(idr_replace(&idrinfo->action_idr, a, a->tcfa_index));
>>>
>>> Under which condition this WARN_ON is hit?
>>
>>When idr replace returns non-NULL pointer, which means that somehow
>>concurrent insertion of action with same index has happened and we are
>>leaking memory.
>
> Is that possible to happen? Meaning, can I as a user cause this by doing
> something in a wrong/unexpected way?

No, it shouldn't be possible unless there is a race condition.
Otherwise I would put some proper error handling code there.

>
>
>>
>>By the way I'm still not sure if having this insert unique function is
>>warranted or I should just add WARN to regular idr insert. What is your
>>opinion on this?
>
> I have to check where you use this.

Every action init function uses this.

>
>
>>
>>>
>>>
+   spin_unlock_bh(&idrinfo->lock);
>>

Re: KMSAN: uninit-value in __sctp_v6_cmp_addr

2018-05-16 Thread Alexander Potapenko
#syz fix: sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr


[PATCH 06/42] proc: introduce proc_create_seq{,_data}

2018-05-16 Thread Christoph Hellwig
Variants of proc_create{,_data} that directly take a struct seq_operations
argument and drastically reduce the boilerplate code in the callers.

All trivial callers converted over.
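
To illustrate why a single generic helper can replace the per-file open
wrappers, here is a simplified userspace model. It is not the kernel code:
struct seq_ops and seq_dump() are hypothetical stand-ins for
struct seq_operations and the shared seq_file machinery, and the names table
is made-up sample data.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for struct seq_operations. */
struct seq_ops {
	void *(*start)(size_t *pos);
	void *(*next)(void *v, size_t *pos);
	void (*stop)(void *v);
	int (*show)(FILE *out, void *v);
};

/* Hypothetical generic driver: once the iterator is passed in directly,
 * no per-caller open() wrapper is needed. Returns the records shown. */
static int seq_dump(FILE *out, const struct seq_ops *ops)
{
	size_t pos = 0;
	void *v = ops->start(&pos);
	int n = 0;

	while (v) {
		if (ops->show(out, v))
			break;
		n++;
		v = ops->next(v, &pos);
	}
	ops->stop(v);
	return n;
}

/* Sample iterator over a fixed table of strings. */
static const char *const names[] = { "alpha", "beta", "gamma" };
#define N_NAMES (sizeof(names) / sizeof(names[0]))

static void *names_start(size_t *pos)
{
	return *pos < N_NAMES ? (void *)&names[*pos] : NULL;
}

static void *names_next(void *v, size_t *pos)
{
	(void)v;
	++*pos;
	return *pos < N_NAMES ? (void *)&names[*pos] : NULL;
}

static void names_stop(void *v)
{
	(void)v;
}

static int names_show(FILE *out, void *v)
{
	/* Non-zero return signals an output error to the driver. */
	return fprintf(out, "%s\n", *(const char *const *)v) < 0;
}

static const struct seq_ops names_seq_ops = {
	.start = names_start,
	.next  = names_next,
	.stop  = names_stop,
	.show  = names_show,
};
```

A caller then only supplies the iterator, e.g. seq_dump(stdout,
&names_seq_ops), much as the converted callers pass their seq_operations
to proc_create_seq().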

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/hp/common/sba_iommu.c  | 15 +-
 arch/ia64/kernel/perfmon.c   | 16 +--
 arch/s390/kernel/sysinfo.c   | 14 +-
 block/genhd.c| 28 +--
 crypto/proc.c| 14 +-
 drivers/char/misc.c  | 15 +-
 drivers/isdn/capi/kcapi_proc.c   | 80 ++--
 drivers/net/hamradio/bpqether.c  | 16 +--
 drivers/net/hamradio/scc.c   | 17 +--
 drivers/net/hamradio/yam.c   | 16 +--
 drivers/pci/proc.c   | 17 +--
 drivers/s390/block/dasd_proc.c   | 17 +--
 drivers/s390/char/tape_proc.c| 19 +---
 drivers/staging/ipx/ipx_proc.c   | 45 ++
 drivers/tty/tty_ldisc.c  | 15 +-
 drivers/video/fbdev/core/fbmem.c | 15 +-
 drivers/zorro/proc.c | 17 +--
 fs/cachefiles/proc.c | 19 +---
 fs/fscache/histogram.c   | 17 +--
 fs/fscache/internal.h|  3 +-
 fs/fscache/proc.c|  4 +-
 fs/proc/consoles.c   | 14 +-
 fs/proc/devices.c| 14 +-
 fs/proc/generic.c| 30 
 fs/proc/internal.h   |  1 +
 fs/proc/interrupts.c | 14 +-
 fs/proc/nommu.c  | 14 +-
 fs/proc/proc_tty.c   | 16 +--
 include/linux/proc_fs.h  |  9 
 include/linux/tty.h  |  3 +-
 include/net/ax25.h   |  5 +-
 include/net/netrom.h |  5 +-
 include/net/rose.h   |  6 +--
 kernel/locking/lockdep_proc.c| 29 +---
 kernel/sched/debug.c | 28 +--
 kernel/sched/stats.c | 15 +-
 mm/vmalloc.c | 11 +++--
 mm/vmstat.c  | 56 ++
 net/appletalk/atalk_proc.c   | 48 +++
 net/atm/br2684.c | 14 +-
 net/ax25/af_ax25.c   | 21 ++---
 net/ax25/ax25_route.c| 15 +-
 net/ax25/ax25_uid.c  | 15 +-
 net/core/net-procfs.c| 16 +--
 net/decnet/dn_dev.c  | 15 +-
 net/llc/llc_proc.c   | 28 +--
 net/netrom/af_netrom.c   | 18 ++-
 net/netrom/nr_route.c| 29 +---
 net/rose/af_rose.c   | 26 +++
 net/rose/rose_route.c| 44 ++
 net/sctp/objcnt.c| 16 +--
 net/x25/x25_proc.c   | 48 +++
 security/keys/proc.c | 34 +-
 53 files changed, 151 insertions(+), 925 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index aec4a3354abe..cb5cd86a5530 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -1942,19 +1942,6 @@ static const struct seq_operations ioc_seq_ops = {
.show  = ioc_show
 };
 
-static int
-ioc_open(struct inode *inode, struct file *file)
-{
-   return seq_open(file, &ioc_seq_ops);
-}
-
-static const struct file_operations ioc_fops = {
-   .open= ioc_open,
-   .read= seq_read,
-   .llseek  = seq_lseek,
-   .release = seq_release
-};
-
 static void __init
 ioc_proc_init(void)
 {
@@ -1964,7 +1951,7 @@ ioc_proc_init(void)
if (!dir)
return;
 
-   proc_create(ioc_list->name, 0, dir, &ioc_fops);
+   proc_create_seq(ioc_list->name, 0, dir, &ioc_seq_ops);
 }
 #endif
 
diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 8fb280e33114..3b38c717008a 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -5708,13 +5708,6 @@ const struct seq_operations pfm_seq_ops = {
.show = pfm_proc_show
 };
 
-static int
-pfm_proc_open(struct inode *inode, struct file *file)
-{
-   return seq_open(file, &pfm_seq_ops);
-}
-
-
 /*
  * we come here as soon as local_cpu_data->pfm_syst_wide is set. this happens
  * during pfm_enable() hence before pfm_start(). We cannot assume monitoring
@@ -6537,13 +6530,6 @@ pfm_probe_pmu(void)
return 0;
 }
 
-static const struct file_operations pfm_proc_fops = {
-   .open   = pfm_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
 int __init
 pfm_init(void)
 {
@@ -6615,7 +6601,7 @@ pfm_init(void)
/*
 * create /proc/perfmon (mostly for debugging purposes)
 */
-   perfmon_dir = proc_create("perfmon", S_IRUGO, NULL, &pfm_proc_fops);
+   perfmon_dir = proc_create_seq("perfmon", S_IRUGO, NULL, &pfm_seq_ops);
if (perfmon_dir == NULL) {
printk(KERN_ERR "perfmon: cannot create /proc entry, perfmon 
disabled\n");
pm

[PATCH 02/42] proc: introduce a proc_pid_ns helper

2018-05-16 Thread Christoph Hellwig
Factor out retrieving the per-sb pid namespaces from the sb private data
into an easier to understand helper.
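
As a rough illustration, the helper boils down to hiding the
inode -> superblock -> private data chain seen in the removed lines of the
diff. The sketch below uses minimal userspace stand-ins for the kernel
structures (only the fields it touches), so treat the exact signature as an
assumption rather than the one added by this patch.

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel structures involved. */
struct pid_namespace { int level; };
struct super_block { void *s_fs_info; };
struct inode { struct super_block *i_sb; };

/* Sketch of the helper: one named accessor instead of repeating
 * inode->i_sb->s_fs_info at every call site. */
static inline struct pid_namespace *proc_pid_ns(const struct inode *inode)
{
	return inode->i_sb->s_fs_info;
}
```

Call sites then read as "ns = proc_pid_ns(inode);", which is the pattern the
hunks in this patch switch to.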

Suggested-by: Eric W. Biederman 
Signed-off-by: Christoph Hellwig 
---
 fs/proc/array.c |  7 +--
 fs/proc/base.c  | 18 --
 fs/proc/self.c  |  4 ++--
 fs/proc/thread_self.c   |  4 ++--
 include/linux/proc_fs.h |  6 ++
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index ae2c807fd719..911f66924d81 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -677,12 +677,7 @@ get_children_pid(struct inode *inode, struct pid 
*pid_prev, loff_t pos)
 
 static int children_seq_show(struct seq_file *seq, void *v)
 {
-   struct inode *inode = seq->private;
-   pid_t pid;
-
-   pid = pid_nr_ns(v, inode->i_sb->s_fs_info);
-   seq_printf(seq, "%d ", pid);
-
+   seq_printf(seq, "%d ", pid_nr_ns(v, proc_pid_ns(seq->private)));
return 0;
 }
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1b2ede6abcdf..29237cad19fd 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -698,7 +698,7 @@ static bool has_pid_permissions(struct pid_namespace *pid,
 
 static int proc_pid_permission(struct inode *inode, int mask)
 {
-   struct pid_namespace *pid = inode->i_sb->s_fs_info;
+   struct pid_namespace *pid = proc_pid_ns(inode);
struct task_struct *task;
bool has_perms;
 
@@ -733,13 +733,11 @@ static const struct inode_operations 
proc_def_inode_operations = {
 static int proc_single_show(struct seq_file *m, void *v)
 {
struct inode *inode = m->private;
-   struct pid_namespace *ns;
-   struct pid *pid;
+   struct pid_namespace *ns = proc_pid_ns(inode);
+   struct pid *pid = proc_pid(inode);
struct task_struct *task;
int ret;
 
-   ns = inode->i_sb->s_fs_info;
-   pid = proc_pid(inode);
task = get_pid_task(pid, PIDTYPE_PID);
if (!task)
return -ESRCH;
@@ -1410,7 +1408,7 @@ static const struct file_operations 
proc_fail_nth_operations = {
 static int sched_show(struct seq_file *m, void *v)
 {
struct inode *inode = m->private;
-   struct pid_namespace *ns = inode->i_sb->s_fs_info;
+   struct pid_namespace *ns = proc_pid_ns(inode);
struct task_struct *p;
 
p = get_proc_task(inode);
@@ -1782,8 +1780,8 @@ int pid_getattr(const struct path *path, struct kstat 
*stat,
u32 request_mask, unsigned int query_flags)
 {
struct inode *inode = d_inode(path->dentry);
+   struct pid_namespace *pid = proc_pid_ns(inode);
struct task_struct *task;
-   struct pid_namespace *pid = path->dentry->d_sb->s_fs_info;
 
generic_fillattr(inode, stat);
 
@@ -2337,7 +2335,7 @@ static int proc_timers_open(struct inode *inode, struct 
file *file)
return -ENOMEM;
 
tp->pid = proc_pid(inode);
-   tp->ns = inode->i_sb->s_fs_info;
+   tp->ns = proc_pid_ns(inode);
return 0;
 }
 
@@ -3239,7 +3237,7 @@ static struct tgid_iter next_tgid(struct pid_namespace 
*ns, struct tgid_iter ite
 int proc_pid_readdir(struct file *file, struct dir_context *ctx)
 {
struct tgid_iter iter;
-   struct pid_namespace *ns = file_inode(file)->i_sb->s_fs_info;
+   struct pid_namespace *ns = proc_pid_ns(file_inode(file));
loff_t pos = ctx->pos;
 
if (pos >= PID_MAX_LIMIT + TGID_OFFSET)
@@ -3588,7 +3586,7 @@ static int proc_task_readdir(struct file *file, struct 
dir_context *ctx)
/* f_version caches the tgid value that the last readdir call couldn't
 * return. lseek aka telldir automagically resets f_version to 0.
 */
-   ns = inode->i_sb->s_fs_info;
+   ns = proc_pid_ns(inode);
tid = (int)file->f_version;
file->f_version = 0;
for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 4d7d061696b3..127265e5c55f 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -12,7 +12,7 @@ static const char *proc_self_get_link(struct dentry *dentry,
  struct inode *inode,
  struct delayed_call *done)
 {
-   struct pid_namespace *ns = inode->i_sb->s_fs_info;
+   struct pid_namespace *ns = proc_pid_ns(inode);
pid_t tgid = task_tgid_nr_ns(current, ns);
char *name;
 
@@ -36,7 +36,7 @@ static unsigned self_inum __ro_after_init;
 int proc_setup_self(struct super_block *s)
 {
struct inode *root_inode = d_inode(s->s_root);
-   struct pid_namespace *ns = s->s_fs_info;
+   struct pid_namespace *ns = proc_pid_ns(root_inode);
struct dentry *self;

inode_lock(root_inode);
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 9d2efaca499f..b905010ca9eb 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -12,7 +12,7 @@ static const char *proc_thread_self_get_link(struct dentry 

[PATCH 12/42] ipv{4,6}/raw: simplify seq_file code

2018-05-16 Thread Christoph Hellwig
Pass the hashtable to the proc private data instead of copying
it into the per-file private data.

Signed-off-by: Christoph Hellwig 
---
 include/net/raw.h |  4 
 net/ipv4/raw.c| 36 
 net/ipv6/raw.c|  6 --
 3 files changed, 16 insertions(+), 30 deletions(-)

diff --git a/include/net/raw.h b/include/net/raw.h
index 99d26d0c4a19..9c9fa98a91a4 100644
--- a/include/net/raw.h
+++ b/include/net/raw.h
@@ -48,7 +48,6 @@ void raw_proc_exit(void);
 struct raw_iter_state {
struct seq_net_private p;
int bucket;
-   struct raw_hashinfo *h;
 };
 
 static inline struct raw_iter_state *raw_seq_private(struct seq_file *seq)
@@ -58,9 +57,6 @@ static inline struct raw_iter_state *raw_seq_private(struct 
seq_file *seq)
 void *raw_seq_start(struct seq_file *seq, loff_t *pos);
 void *raw_seq_next(struct seq_file *seq, void *v, loff_t *pos);
 void raw_seq_stop(struct seq_file *seq, void *v);
-int raw_seq_open(struct inode *ino, struct file *file,
-struct raw_hashinfo *h, const struct seq_operations *ops);
-
 #endif
 
 int raw_hash_sk(struct sock *sk);
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 1b4d3355624a..ae57962b31e3 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -1003,11 +1003,12 @@ struct proto raw_prot = {
 static struct sock *raw_get_first(struct seq_file *seq)
 {
struct sock *sk;
+   struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file));
struct raw_iter_state *state = raw_seq_private(seq);
 
for (state->bucket = 0; state->bucket < RAW_HTABLE_SIZE;
++state->bucket) {
-   sk_for_each(sk, &state->h->ht[state->bucket])
+   sk_for_each(sk, &h->ht[state->bucket])
if (sock_net(sk) == seq_file_net(seq))
goto found;
}
@@ -1018,6 +1019,7 @@ static struct sock *raw_get_first(struct seq_file *seq)
 
 static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk)
 {
+   struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file));
struct raw_iter_state *state = raw_seq_private(seq);
 
do {
@@ -1027,7 +1029,7 @@ static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk)
} while (sk && sock_net(sk) != seq_file_net(seq));
 
if (!sk && ++state->bucket < RAW_HTABLE_SIZE) {
-   sk = sk_head(&state->h->ht[state->bucket]);
+   sk = sk_head(&h->ht[state->bucket]);
goto try_again;
}
return sk;
@@ -1045,9 +1047,9 @@ static struct sock *raw_get_idx(struct seq_file *seq, loff_t pos)
 
 void *raw_seq_start(struct seq_file *seq, loff_t *pos)
 {
-   struct raw_iter_state *state = raw_seq_private(seq);
+   struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file));
 
-   read_lock(&state->h->lock);
+   read_lock(&h->lock);
return *pos ? raw_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 }
 EXPORT_SYMBOL_GPL(raw_seq_start);
@@ -1067,9 +1069,9 @@ EXPORT_SYMBOL_GPL(raw_seq_next);
 
 void raw_seq_stop(struct seq_file *seq, void *v)
 {
-   struct raw_iter_state *state = raw_seq_private(seq);
+   struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file));
 
-   read_unlock(&state->h->lock);
+   read_unlock(&h->lock);
 }
 EXPORT_SYMBOL_GPL(raw_seq_stop);
 
@@ -1110,25 +1112,10 @@ static const struct seq_operations raw_seq_ops = {
.show  = raw_seq_show,
 };
 
-int raw_seq_open(struct inode *ino, struct file *file,
-struct raw_hashinfo *h, const struct seq_operations *ops)
-{
-   int err;
-   struct raw_iter_state *i;
-
-   err = seq_open_net(ino, file, ops, sizeof(struct raw_iter_state));
-   if (err < 0)
-   return err;
-
-   i = raw_seq_private((struct seq_file *)file->private_data);
-   i->h = h;
-   return 0;
-}
-EXPORT_SYMBOL_GPL(raw_seq_open);
-
 static int raw_v4_seq_open(struct inode *inode, struct file *file)
 {
-   return raw_seq_open(inode, file, &raw_v4_hashinfo, &raw_seq_ops);
+   return seq_open_net(inode, file, &raw_seq_ops,
+   sizeof(struct raw_iter_state));
 }
 
 static const struct file_operations raw_seq_fops = {
@@ -1140,7 +1127,8 @@ static const struct file_operations raw_seq_fops = {
 
 static __net_init int raw_init_net(struct net *net)
 {
-   if (!proc_create("raw", 0444, net->proc_net, &raw_seq_fops))
+   if (!proc_create_data("raw", 0444, net->proc_net, &raw_seq_fops,
+   &raw_v4_hashinfo))
return -ENOMEM;
 
return 0;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 5eb9b08947ed..dade69bf61e6 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1306,7 +1306,8 @@ static const struct seq_operations raw6_seq_ops = {
 
 static int raw6_seq_open(struct inode *inode, struct file *file)
 {
-   return raw_seq_open(inode, file, &raw_v6_hashinfo, &raw6_seq_ops);
+   return seq_open_net(inode, file, &raw6_seq_o

[PATCH 16/42] net: move seq_file_single_net to

2018-05-16 Thread Christoph Hellwig
This helper deals with single_{open,release}_net internals and thus
belongs here.

Signed-off-by: Christoph Hellwig 
---
 include/linux/seq_file_net.h | 13 +
 include/net/ip_vs.h  | 12 
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/seq_file_net.h b/include/linux/seq_file_net.h
index 43ccd84127b6..ed20faa99e05 100644
--- a/include/linux/seq_file_net.h
+++ b/include/linux/seq_file_net.h
@@ -28,4 +28,17 @@ static inline struct net *seq_file_net(struct seq_file *seq)
 #endif
 }
 
+/*
+ * This one is needed for single_open_net since net is stored directly in
+ * private not as a struct i.e. seq_file_net can't be used.
+ */
+static inline struct net *seq_file_single_net(struct seq_file *seq)
+{
+#ifdef CONFIG_NET_NS
+   return (struct net *)seq->private;
+#else
+   return &init_net;
+#endif
+}
+
 #endif
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index eb0bec043c96..aea7a124e66b 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -41,18 +41,6 @@ static inline struct netns_ipvs *net_ipvs(struct net* net)
return net->ipvs;
 }
 
-/* This one needed for single_open_net since net is stored directly in
- * private not as a struct i.e. seq_file_net can't be used.
- */
-static inline struct net *seq_file_single_net(struct seq_file *seq)
-{
-#ifdef CONFIG_NET_NS
-   return (struct net *)seq->private;
-#else
-   return &init_net;
-#endif
-}
-
 /* Connections' size value needed by ip_vs_ctl.c */
 extern int ip_vs_conn_tab_size;
 
-- 
2.17.0



[PATCH 05/42] proc: add a proc_create_reg helper

2018-05-16 Thread Christoph Hellwig
Common code for creating a regular file.  Factor it out of proc_create_data
so that other functions can reuse it soon.

Signed-off-by: Christoph Hellwig 
---
 fs/proc/generic.c  | 44 +---
 fs/proc/internal.h |  2 ++
 2 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index bd8480ff0d35..ab6a321076b8 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -511,33 +511,39 @@ struct proc_dir_entry *proc_create_mount_point(const char *name)
 }
 EXPORT_SYMBOL(proc_create_mount_point);
 
-struct proc_dir_entry *proc_create_data(const char *name, umode_t mode,
-   struct proc_dir_entry *parent,
-   const struct file_operations *proc_fops,
-   void *data)
+struct proc_dir_entry *proc_create_reg(const char *name, umode_t mode,
+   struct proc_dir_entry **parent, void *data)
 {
-   struct proc_dir_entry *pde;
+   struct proc_dir_entry *p;
+
if ((mode & S_IFMT) == 0)
mode |= S_IFREG;
-
-   if (!S_ISREG(mode)) {
-   WARN_ON(1); /* use proc_mkdir() */
+   if ((mode & S_IALLUGO) == 0)
+   mode |= S_IRUGO;
+   if (WARN_ON_ONCE(!S_ISREG(mode)))
return NULL;
+
+   p = __proc_create(parent, name, mode, 1);
+   if (p) {
+   p->proc_iops = &proc_file_inode_operations;
+   p->data = data;
}
+   return p;
+}
+
+struct proc_dir_entry *proc_create_data(const char *name, umode_t mode,
+   struct proc_dir_entry *parent,
+   const struct file_operations *proc_fops, void *data)
+{
+   struct proc_dir_entry *p;
 
BUG_ON(proc_fops == NULL);
 
-   if ((mode & S_IALLUGO) == 0)
-   mode |= S_IRUGO;
-   pde = __proc_create(&parent, name, mode, 1);
-   if (!pde)
-   goto out;
-   pde->proc_fops = proc_fops;
-   pde->data = data;
-   pde->proc_iops = &proc_file_inode_operations;
-   return proc_register(parent, pde);
-out:
-   return NULL;
+   p = proc_create_reg(name, mode, &parent, data);
+   if (!p)
+   return NULL;
+   p->proc_fops = proc_fops;
+   return proc_register(parent, p);
 }
 EXPORT_SYMBOL(proc_create_data);
  
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 488e67490312..dd1e11400b97 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -162,6 +162,8 @@ extern bool proc_fill_cache(struct file *, struct dir_context *, const char *, i
 /*
  * generic.c
  */
+struct proc_dir_entry *proc_create_reg(const char *name, umode_t mode,
+   struct proc_dir_entry **parent, void *data);
 struct proc_dir_entry *proc_register(struct proc_dir_entry *dir,
struct proc_dir_entry *dp);
 extern struct dentry *proc_lookup(struct inode *, struct dentry *, unsigned int);
-- 
2.17.0
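
The mode handling that proc_create_reg centralizes can be illustrated with a minimal userspace sketch. This is not kernel code: normalize_reg_mode and the SKETCH_* constants are hypothetical names introduced here, with values chosen to mirror the usual S_IFMT, S_IFREG, S_IFDIR, S_IALLUGO (07777) and S_IRUGO (0444) definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local copies of the mode bits, so the sketch builds anywhere. */
#define SKETCH_S_IFMT    0170000u
#define SKETCH_S_IFREG   0100000u
#define SKETCH_S_IFDIR   0040000u
#define SKETCH_S_IALLUGO 0007777u
#define SKETCH_S_IRUGO   0000444u

/*
 * Apply the same defaults as the patch: no file type means "regular file",
 * no permission bits means "world readable".  Returns false when the
 * resulting mode is not a regular file (the kernel code warns and returns
 * NULL in that case).
 */
static bool normalize_reg_mode(unsigned int *mode)
{
	assert(mode != NULL);

	if ((*mode & SKETCH_S_IFMT) == 0)
		*mode |= SKETCH_S_IFREG;
	if ((*mode & SKETCH_S_IALLUGO) == 0)
		*mode |= SKETCH_S_IRUGO;

	return (*mode & SKETCH_S_IFMT) == SKETCH_S_IFREG;
}
```

Keeping both defaults in one helper is what lets proc_create_data shrink to "create, set proc_fops, register" in the diff above.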



[PATCH 13/42] ipv6/flowlabel: simplify pid namespace lookup

2018-05-16 Thread Christoph Hellwig
The code should be using the pid namespace from the procfs mount
instead of trying to look it up during open.

Suggested-by: Eric W. Biederman 
Signed-off-by: Christoph Hellwig 
---
 net/ipv6/ip6_flowlabel.c | 29 ++---
 1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index c05c4e82a7ca..2fbd9bed764a 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -754,6 +754,10 @@ static struct ip6_flowlabel *ip6fl_get_idx(struct seq_file *seq, loff_t pos)
 static void *ip6fl_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
 {
+   struct ip6fl_iter_state *state = ip6fl_seq_private(seq);
+
+   state->pid_ns = proc_pid_ns(file_inode(seq->file));
+
rcu_read_lock_bh();
return *pos ? ip6fl_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 }
@@ -810,36 +814,15 @@ static const struct seq_operations ip6fl_seq_ops = {
 
 static int ip6fl_seq_open(struct inode *inode, struct file *file)
 {
-   struct seq_file *seq;
-   struct ip6fl_iter_state *state;
-   int err;
-
-   err = seq_open_net(inode, file, &ip6fl_seq_ops,
+   return seq_open_net(inode, file, &ip6fl_seq_ops,
   sizeof(struct ip6fl_iter_state));
-
-   if (!err) {
-   seq = file->private_data;
-   state = ip6fl_seq_private(seq);
-   rcu_read_lock();
-   state->pid_ns = get_pid_ns(task_active_pid_ns(current));
-   rcu_read_unlock();
-   }
-   return err;
-}
-
-static int ip6fl_seq_release(struct inode *inode, struct file *file)
-{
-   struct seq_file *seq = file->private_data;
-   struct ip6fl_iter_state *state = ip6fl_seq_private(seq);
-   put_pid_ns(state->pid_ns);
-   return seq_release_net(inode, file);
 }
 
 static const struct file_operations ip6fl_seq_fops = {
.open   =   ip6fl_seq_open,
.read   =   seq_read,
.llseek =   seq_lseek,
-   .release=   ip6fl_seq_release,
+   .release=   seq_release_net,
 };
 
 static int __net_init ip6_flowlabel_proc_init(struct net *net)
-- 
2.17.0



[PATCH 04/42] proc: simplify proc_register calling conventions

2018-05-16 Thread Christoph Hellwig
Return the registered entry on success; on failure, free the passed-in
entry and return NULL.  Also expose it in internal.h as we'll start using
it in proc_net.c soon.

Signed-off-by: Christoph Hellwig 
---
 fs/proc/generic.c  | 44 ++--
 fs/proc/internal.h |  2 ++
 2 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 2078e70e1595..bd8480ff0d35 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -346,13 +346,12 @@ static const struct inode_operations proc_dir_inode_operations = {
.setattr= proc_notify_change,
 };
 
-static int proc_register(struct proc_dir_entry * dir, struct proc_dir_entry * dp)
+/* returns the registered entry, or frees dp and returns NULL on failure */
+struct proc_dir_entry *proc_register(struct proc_dir_entry *dir,
+   struct proc_dir_entry *dp)
 {
-   int ret;
-
-   ret = proc_alloc_inum(&dp->low_ino);
-   if (ret)
-   return ret;
+   if (proc_alloc_inum(&dp->low_ino))
+   goto out_free_entry;
 
write_lock(&proc_subdir_lock);
dp->parent = dir;
@@ -360,12 +359,16 @@ static int proc_register(struct proc_dir_entry * dir, struct proc_dir_entry * dp
WARN(1, "proc_dir_entry '%s/%s' already registered\n",
 dir->name, dp->name);
write_unlock(&proc_subdir_lock);
-   proc_free_inum(dp->low_ino);
-   return -EEXIST;
+   goto out_free_inum;
}
write_unlock(&proc_subdir_lock);
 
-   return 0;
+   return dp;
+out_free_inum:
+   proc_free_inum(dp->low_ino);
+out_free_entry:
+   pde_free(dp);
+   return NULL;
 }
 
 static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent,
@@ -443,10 +446,7 @@ struct proc_dir_entry *proc_symlink(const char *name,
if (ent->data) {
strcpy((char*)ent->data,dest);
ent->proc_iops = &proc_link_inode_operations;
-   if (proc_register(parent, ent) < 0) {
-   pde_free(ent);
-   ent = NULL;
-   }
+   ent = proc_register(parent, ent);
} else {
pde_free(ent);
ent = NULL;
@@ -470,11 +470,9 @@ struct proc_dir_entry *proc_mkdir_data(const char *name, umode_t mode,
ent->proc_fops = &proc_dir_operations;
ent->proc_iops = &proc_dir_inode_operations;
parent->nlink++;
-   if (proc_register(parent, ent) < 0) {
-   pde_free(ent);
+   ent = proc_register(parent, ent);
+   if (!ent)
parent->nlink--;
-   ent = NULL;
-   }
}
return ent;
 }
@@ -505,11 +503,9 @@ struct proc_dir_entry *proc_create_mount_point(const char *name)
ent->proc_fops = NULL;
ent->proc_iops = NULL;
parent->nlink++;
-   if (proc_register(parent, ent) < 0) {
-   pde_free(ent);
+   ent = proc_register(parent, ent);
+   if (!ent)
parent->nlink--;
-   ent = NULL;
-   }
}
return ent;
 }
@@ -539,11 +535,7 @@ struct proc_dir_entry *proc_create_data(const char *name, umode_t mode,
pde->proc_fops = proc_fops;
pde->data = data;
pde->proc_iops = &proc_file_inode_operations;
-   if (proc_register(parent, pde) < 0)
-   goto out_free;
-   return pde;
-out_free:
-   pde_free(pde);
+   return proc_register(parent, pde);
 out:
return NULL;
 }
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 0f1692e63cb6..488e67490312 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -162,6 +162,8 @@ extern bool proc_fill_cache(struct file *, struct dir_context *, const char *, i
 /*
  * generic.c
  */
+struct proc_dir_entry *proc_register(struct proc_dir_entry *dir,
+   struct proc_dir_entry *dp);
 extern struct dentry *proc_lookup(struct inode *, struct dentry *, unsigned int);
 struct dentry *proc_lookup_de(struct inode *, struct dentry *, struct proc_dir_entry *);
 extern int proc_readdir(struct file *, struct dir_context *);
-- 
2.17.0
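
The calling convention described in this patch (return the entry on success, consume it and return NULL on failure) can be sketched in userspace as follows. Everything here is hypothetical scaffolding: struct sketch_entry, sketch_alloc, sketch_free and sketch_register are illustration-only names, not the kernel's proc_dir_entry API, and the duplicate-name check is only a stand-in for whatever can make registration fail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct proc_dir_entry. */
struct sketch_entry {
	char name[32];
	struct sketch_entry *parent;
	struct sketch_entry *next;     /* next sibling */
	struct sketch_entry *children; /* first child */
};

static int sketch_freed; /* counts frees so callers can observe cleanup */

static struct sketch_entry *sketch_alloc(const char *name)
{
	struct sketch_entry *e = calloc(1, sizeof(*e));

	if (e)
		strncpy(e->name, name, sizeof(e->name) - 1);
	return e;
}

static void sketch_free(struct sketch_entry *e)
{
	sketch_freed++;
	free(e);
}

/* Returns the registered entry, or frees e and returns NULL on failure. */
static struct sketch_entry *sketch_register(struct sketch_entry *dir,
					    struct sketch_entry *e)
{
	struct sketch_entry *it;

	assert(dir != NULL && e != NULL);

	for (it = dir->children; it; it = it->next) {
		if (strcmp(it->name, e->name) == 0) {
			sketch_free(e); /* ownership ends here on failure */
			return NULL;
		}
	}

	e->parent = dir;
	e->next = dir->children;
	dir->children = e;
	return e;
}
```

Because the callee owns the cleanup, a caller can write `ent = sketch_register(parent, ent);` and only test the result, which is the shape the proc_symlink and proc_mkdir_data hunks take in the diff.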



[PATCH 28/42] drbd: switch to proc_create_single

2018-05-16 Thread Christoph Hellwig
And stop messing with try_module_get on THIS_MODULE, which doesn't make
any sense here.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/drbd/drbd_int.h  |  2 +-
 drivers/block/drbd/drbd_main.c |  3 ++-
 drivers/block/drbd/drbd_proc.c | 34 +-
 3 files changed, 4 insertions(+), 35 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 06ecee1b528e..461ddec04e7c 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -1643,7 +1643,7 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
 
 /* drbd_proc.c */
 extern struct proc_dir_entry *drbd_proc;
-extern const struct file_operations drbd_proc_fops;
+int drbd_seq_show(struct seq_file *seq, void *v);
 
 /* drbd_actlog.c */
 extern bool drbd_al_begin_io_prepare(struct drbd_device *device, struct drbd_interval *i);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 185f1ef00a7c..c2d154faac02 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -3010,7 +3010,8 @@ static int __init drbd_init(void)
goto fail;
 
err = -ENOMEM;
-   drbd_proc = proc_create_data("drbd", S_IFREG | S_IRUGO , NULL, &drbd_proc_fops, NULL);
+   drbd_proc = proc_create_single("drbd", S_IFREG | S_IRUGO , NULL,
+   drbd_seq_show);
if (!drbd_proc) {
pr_err("unable to register proc file\n");
goto fail;
diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index 582caeb0de86..74ef29247bb5 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -33,18 +33,7 @@
 #include 
 #include "drbd_int.h"
 
-static int drbd_proc_open(struct inode *inode, struct file *file);
-static int drbd_proc_release(struct inode *inode, struct file *file);
-
-
 struct proc_dir_entry *drbd_proc;
-const struct file_operations drbd_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = drbd_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= drbd_proc_release,
-};
 
 static void seq_printf_with_thousands_grouping(struct seq_file *seq, long v)
 {
@@ -235,7 +224,7 @@ static void drbd_syncer_progress(struct drbd_device *device, struct seq_file *se
}
 }
 
-static int drbd_seq_show(struct seq_file *seq, void *v)
+int drbd_seq_show(struct seq_file *seq, void *v)
 {
int i, prev_i = -1;
const char *sn;
@@ -345,24 +334,3 @@ static int drbd_seq_show(struct seq_file *seq, void *v)
 
return 0;
 }
-
-static int drbd_proc_open(struct inode *inode, struct file *file)
-{
-   int err;
-
-   if (try_module_get(THIS_MODULE)) {
-   err = single_open(file, drbd_seq_show, NULL);
-   if (err)
-   module_put(THIS_MODULE);
-   return err;
-   }
-   return -ENODEV;
-}
-
-static int drbd_proc_release(struct inode *inode, struct file *file)
-{
-   module_put(THIS_MODULE);
-   return single_release(inode, file);
-}
-
-/* PROC FS stuff end */
-- 
2.17.0
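
The idea behind registering a show callback instead of a per-driver file_operations table can be sketched in userspace like this. It is only an analogy under stated assumptions: struct sketch_seq, struct sketch_single_entry, sketch_puts, sketch_single_read and sketch_version_show are all hypothetical names, and the real proc_create_single plumbing lives in fs/proc and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct seq_file: a bounded text buffer. */
struct sketch_seq {
	char buf[128];
	size_t len;
	void *private;
};

typedef int (*sketch_show_fn)(struct sketch_seq *m, void *v);

/* Hypothetical entry: a name, a show callback and optional private data. */
struct sketch_single_entry {
	const char *name;
	sketch_show_fn show;
	void *data;
};

/* Append a string, truncating silently when the buffer is full. */
static void sketch_puts(struct sketch_seq *m, const char *s)
{
	size_t room = sizeof(m->buf) - 1 - m->len;
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(m->buf + m->len, s, n);
	m->len += n;
	m->buf[m->len] = '\0';
}

/* One generic read path shared by all entries; drivers only supply show(). */
static int sketch_single_read(const struct sketch_single_entry *e,
			      struct sketch_seq *m)
{
	assert(e != NULL && e->show != NULL && m != NULL);

	m->len = 0;
	m->buf[0] = '\0';
	m->private = e->data;
	return e->show(m, NULL);
}

/* Example show callback, analogous in role to drbd_seq_show. */
static int sketch_version_show(struct sketch_seq *m, void *v)
{
	(void)v;
	sketch_puts(m, "version: ");
	sketch_puts(m, (const char *)m->private);
	sketch_puts(m, "\n");
	return 0;
}
```

With the open and release logic in one shared place, a driver no longer needs its own wrappers, which is why drbd_proc_open and drbd_proc_release disappear in the diff above.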



[PATCH 17/42] proc: introduce proc_create_net{,_data}

2018-05-16 Thread Christoph Hellwig
Variants of proc_create{,_data} that directly take a struct seq_operations
and deal with network namespaces in ->open and ->release.  All callers of
proc_create + seq_open_net converted over, and seq_{open,release}_net are
removed entirely.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ppp/pppoe.c | 18 +---
 fs/nfs/client.c | 43 ++---
 fs/proc/proc_net.c  | 61 -
 include/linux/proc_fs.h |  9 
 include/linux/seq_file_net.h|  3 --
 include/net/ip6_fib.h   | 10 +++-
 include/net/phonet/pn_dev.h |  4 +-
 include/net/udp.h   |  4 +-
 net/8021q/vlanproc.c| 18 ++--
 net/atm/clip.c  | 17 +--
 net/core/net-procfs.c   | 49 +++-
 net/core/sock.c | 16 +--
 net/decnet/dn_neigh.c   | 18 +---
 net/ipv4/arp.c  | 17 ++-
 net/ipv4/fib_trie.c | 32 ++---
 net/ipv4/igmp.c | 33 ++---
 net/ipv4/ipmr.c | 32 ++---
 net/ipv4/ping.c | 16 +--
 net/ipv4/raw.c  | 17 +--
 net/ipv4/tcp_ipv4.c | 17 +--
 net/ipv4/udp.c  | 21 ++---
 net/ipv4/udplite.c  |  4 +-
 net/ipv6/addrconf.c | 16 +--
 net/ipv6/anycast.c  | 16 +--
 net/ipv6/ip6_fib.c  | 18 +---
 net/ipv6/ip6_flowlabel.c| 17 +--
 net/ipv6/ip6mr.c| 32 ++---
 net/ipv6/mcast.c| 34 ++
 net/ipv6/ping.c | 16 +--
 net/ipv6/raw.c  | 17 +--
 net/ipv6/route.c| 11 +
 net/ipv6/tcp_ipv6.c | 17 +--
 net/ipv6/udp.c  | 21 ++---
 net/ipv6/udplite.c  |  5 +-
 net/kcm/kcmproc.c   | 16 +--
 net/key/af_key.c| 16 +--
 net/l2tp/l2tp_ppp.c | 22 +
 net/netfilter/ipvs/ip_vs_app.c  | 16 +--
 net/netfilter/ipvs/ip_vs_conn.c | 35 ++
 net/netfilter/ipvs/ip_vs_ctl.c  | 16 +--
 net/netfilter/nf_conntrack_expect.c | 17 +--
 net/netfilter/nf_conntrack_standalone.c | 33 ++---
 net/netfilter/nf_log.c  | 19 +---
 net/netfilter/nf_synproxy_core.c| 17 +--
 net/netfilter/nfnetlink_log.c   | 18 +---
 net/netfilter/nfnetlink_queue.c | 18 +---
 net/netfilter/x_tables.c| 18 ++--
 net/netlink/af_netlink.c| 18 +---
 net/packet/af_packet.c  | 17 +--
 net/phonet/pn_dev.c |  6 ++-
 net/phonet/socket.c | 30 +---
 net/rxrpc/ar-internal.h |  4 +-
 net/rxrpc/net_ns.c  |  7 ++-
 net/rxrpc/proc.c| 31 +
 net/sctp/proc.c | 54 +++---
 net/unix/af_unix.c  | 17 +--
 net/wireless/wext-proc.c| 17 +--
 57 files changed, 202 insertions(+), 939 deletions(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 7df07337d69c..ce61231e96ea 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1096,21 +1096,6 @@ static const struct seq_operations pppoe_seq_ops = {
.stop   = pppoe_seq_stop,
.show   = pppoe_seq_show,
 };
-
-static int pppoe_seq_open(struct inode *inode, struct file *file)
-{
-   return seq_open_net(inode, file, &pppoe_seq_ops,
-   sizeof(struct seq_net_private));
-}
-
-static const struct file_operations pppoe_seq_fops = {
-   .owner  = THIS_MODULE,
-   .open   = pppoe_seq_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release_net,
-};
-
 #endif /* CONFIG_PROC_FS */
 
 static const struct proto_ops pppoe_ops = {
@@ -1146,7 +1131,8 @@ static __net_init int pppoe_init_net(struct net *net)
 
rwlock_init(&pn->hash_lock);
 
-   pde = proc_create("pppoe", 0444, net->proc_net, &pppoe_seq_fops);
+   pde = proc_create_net("pppoe", 0444, net->proc_net,
+   &pppoe_seq_ops, sizeof(struct seq_net_private));
 #ifdef CONFIG_PROC_FS
if (!pde)
return -ENOMEM;
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index b9129e2befea..bbc91d7ca1bd 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1067,7 +1067,6 @@ void nfs_clients_init(struct net *net)
 }
 
 #ifdef CONFIG_PROC_FS
-static int nfs_server_list_open(st

[PATCH 25/42] jfs: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig 
---
 fs/jfs/jfs_debug.c| 43 ++-
 fs/jfs/jfs_debug.h| 10 +-
 fs/jfs/jfs_logmgr.c   | 14 +-
 fs/jfs/jfs_metapage.c | 14 +-
 fs/jfs/jfs_txnmgr.c   | 28 ++--
 fs/jfs/jfs_xtree.c| 14 +-
 6 files changed, 24 insertions(+), 99 deletions(-)

diff --git a/fs/jfs/jfs_debug.c b/fs/jfs/jfs_debug.c
index a70907606025..35a5b2a81ae0 100644
--- a/fs/jfs/jfs_debug.c
+++ b/fs/jfs/jfs_debug.c
@@ -29,7 +29,6 @@
 
 #ifdef PROC_FS_JFS /* see jfs_debug.h */
 
-static struct proc_dir_entry *base;
 #ifdef CONFIG_JFS_DEBUG
 static int jfs_loglevel_proc_show(struct seq_file *m, void *v)
 {
@@ -66,43 +65,29 @@ static const struct file_operations jfs_loglevel_proc_fops = {
 };
 #endif
 
-static struct {
-   const char  *name;
-   const struct file_operations *proc_fops;
-} Entries[] = {
-#ifdef CONFIG_JFS_STATISTICS
-   { "lmstats",&jfs_lmstats_proc_fops, },
-   { "txstats",&jfs_txstats_proc_fops, },
-   { "xtstat", &jfs_xtstat_proc_fops, },
-   { "mpstat", &jfs_mpstat_proc_fops, },
-#endif
-#ifdef CONFIG_JFS_DEBUG
-   { "TxAnchor",   &jfs_txanchor_proc_fops, },
-   { "loglevel",   &jfs_loglevel_proc_fops }
-#endif
-};
-#define NPROCENT   ARRAY_SIZE(Entries)
-
 void jfs_proc_init(void)
 {
-   int i;
+   struct proc_dir_entry *base;
 
-   if (!(base = proc_mkdir("fs/jfs", NULL)))
+   base = proc_mkdir("fs/jfs", NULL);
+   if (!base)
return;
 
-   for (i = 0; i < NPROCENT; i++)
-   proc_create(Entries[i].name, 0, base, Entries[i].proc_fops);
+#ifdef CONFIG_JFS_STATISTICS
+   proc_create_single("lmstats", 0, base, jfs_lmstats_proc_show);
+   proc_create_single("txstats", 0, base, jfs_txstats_proc_show);
+   proc_create_single("xtstat", 0, base, jfs_xtstat_proc_show);
+   proc_create_single("mpstat", 0, base, jfs_mpstat_proc_show);
+#endif
+#ifdef CONFIG_JFS_DEBUG
+   proc_create_single("TxAnchor", 0, base, jfs_txanchor_proc_show);
+   proc_create("loglevel", 0, base, &jfs_loglevel_proc_fops);
+#endif
 }
 
 void jfs_proc_clean(void)
 {
-   int i;
-
-   if (base) {
-   for (i = 0; i < NPROCENT; i++)
-   remove_proc_entry(Entries[i].name, base);
-   remove_proc_entry("fs/jfs", NULL);
-   }
+   remove_proc_subtree("fs/jfs", NULL);
 }
 
 #endif /* PROC_FS_JFS */
diff --git a/fs/jfs/jfs_debug.h b/fs/jfs/jfs_debug.h
index eafd1300a00b..0d9e35da8462 100644
--- a/fs/jfs/jfs_debug.h
+++ b/fs/jfs/jfs_debug.h
@@ -62,7 +62,7 @@ extern void jfs_proc_clean(void);
 
 extern int jfsloglevel;
 
-extern const struct file_operations jfs_txanchor_proc_fops;
+int jfs_txanchor_proc_show(struct seq_file *m, void *v);
 
 /* information message: e.g., configuration, major event */
 #define jfs_info(fmt, arg...) do { \
@@ -105,10 +105,10 @@ extern const struct file_operations jfs_txanchor_proc_fops;
  * --
  */
 #ifdef CONFIG_JFS_STATISTICS
-extern const struct file_operations jfs_lmstats_proc_fops;
-extern const struct file_operations jfs_txstats_proc_fops;
-extern const struct file_operations jfs_mpstat_proc_fops;
-extern const struct file_operations jfs_xtstat_proc_fops;
+int jfs_lmstats_proc_show(struct seq_file *m, void *v);
+int jfs_txstats_proc_show(struct seq_file *m, void *v);
+int jfs_mpstat_proc_show(struct seq_file *m, void *v);
+int jfs_xtstat_proc_show(struct seq_file *m, void *v);
 
 #define INCREMENT(x) ((x)++)
 #define DECREMENT(x) ((x)--)
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 0e5d412c0b01..6b68df395892 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2493,7 +2493,7 @@ int lmLogFormat(struct jfs_log *log, s64 logAddress, int logSize)
 }
 
 #ifdef CONFIG_JFS_STATISTICS
-static int jfs_lmstats_proc_show(struct seq_file *m, void *v)
+int jfs_lmstats_proc_show(struct seq_file *m, void *v)
 {
seq_printf(m,
   "JFS Logmgr stats\n"
@@ -2510,16 +2510,4 @@ static int jfs_lmstats_proc_show(struct seq_file *m, void *v)
   lmStat.partial_page);
return 0;
 }
-
-static int jfs_lmstats_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, jfs_lmstats_proc_show, NULL);
-}
-
-const struct file_operations jfs_lmstats_proc_fops = {
-   .open   = jfs_lmstats_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
 #endif /* CONFIG_JFS_STATISTICS */
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 1a3b0cc22ad3..fa2c6824c7f2 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/j

[PATCH 21/42] megaraid: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_single.

Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/megaraid.c | 140 +++-
 drivers/scsi/megaraid.h |  12 
 2 files changed, 36 insertions(+), 116 deletions(-)

diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 7195cff51d4c..91f5e2c68dbc 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -2731,53 +2731,6 @@ proc_show_rdrv_40(struct seq_file *m, void *v)
return proc_show_rdrv(m, m->private, 30, 39);
 }
 
-
-/*
- * seq_file wrappers for procfile show routines.
- */
-static int mega_proc_open(struct inode *inode, struct file *file)
-{
-   adapter_t *adapter = proc_get_parent_data(inode);
-   int (*show)(struct seq_file *, void *) = PDE_DATA(inode);
-
-   return single_open(file, show, adapter);
-}
-
-static const struct file_operations mega_proc_fops = {
-   .open   = mega_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
-/*
- * Table of proc files we need to create.
- */
-struct mega_proc_file {
-   const char *name;
-   unsigned short ptr_offset;
-   int (*show) (struct seq_file *m, void *v);
-};
-
-static const struct mega_proc_file mega_proc_files[] = {
-   { "config",   offsetof(adapter_t, proc_read), proc_show_config },
-   { "stat", offsetof(adapter_t, proc_stat), proc_show_stat },
-   { "mailbox",  offsetof(adapter_t, proc_mbox), proc_show_mbox },
-#if MEGA_HAVE_ENH_PROC
-   { "rebuild-rate", offsetof(adapter_t, proc_rr), proc_show_rebuild_rate },
-   { "battery-status",   offsetof(adapter_t, proc_battery), proc_show_battery },
-   { "diskdrives-ch0",   offsetof(adapter_t, proc_pdrvstat[0]), proc_show_pdrv_ch0 },
-   { "diskdrives-ch1",   offsetof(adapter_t, proc_pdrvstat[1]), proc_show_pdrv_ch1 },
-   { "diskdrives-ch2",   offsetof(adapter_t, proc_pdrvstat[2]), proc_show_pdrv_ch2 },
-   { "diskdrives-ch3",   offsetof(adapter_t, proc_pdrvstat[3]), proc_show_pdrv_ch3 },
-   { "raiddrives-0-9",   offsetof(adapter_t, proc_rdrvstat[0]), proc_show_rdrv_10 },
-   { "raiddrives-10-19", offsetof(adapter_t, proc_rdrvstat[1]), proc_show_rdrv_20 },
-   { "raiddrives-20-29", offsetof(adapter_t, proc_rdrvstat[2]), proc_show_rdrv_30 },
-   { "raiddrives-30-39", offsetof(adapter_t, proc_rdrvstat[3]), proc_show_rdrv_40 },
-#endif
-   { NULL }
-};
-
 /**
  * mega_create_proc_entry()
  * @index - index in soft state array
@@ -2788,31 +2741,45 @@ static const struct mega_proc_file mega_proc_files[] = {
 static void
 mega_create_proc_entry(int index, struct proc_dir_entry *parent)
 {
-   const struct mega_proc_file *f;
-   adapter_t   *adapter = hba_soft_state[index];
-   struct proc_dir_entry   *dir, *de, **ppde;
-   u8  string[16];
+   adapter_t *adapter = hba_soft_state[index];
+   struct proc_dir_entry *dir;
+   u8 string[16];
 
sprintf(string, "hba%d", adapter->host->host_no);
-
-   dir = adapter->controller_proc_dir_entry =
-   proc_mkdir_data(string, 0, parent, adapter);
-   if(!dir) {
+   dir = proc_mkdir_data(string, 0, parent, adapter);
+   if (!dir) {
dev_warn(&adapter->dev->dev, "proc_mkdir failed\n");
return;
}
 
-   for (f = mega_proc_files; f->name; f++) {
-   de = proc_create_data(f->name, S_IRUSR, dir, &mega_proc_fops,
- f->show);
-   if (!de) {
-   dev_warn(&adapter->dev->dev, "proc_create failed\n");
-   return;
-   }
-
-   ppde = (void *)adapter + f->ptr_offset;
-   *ppde = de;
-   }
+   proc_create_single_data("config", S_IRUSR, dir,
+   proc_show_config, adapter);
+   proc_create_single_data("stat", S_IRUSR, dir,
+   proc_show_stat, adapter);
+   proc_create_single_data("mailbox", S_IRUSR, dir,
+   proc_show_mbox, adapter);
+#if MEGA_HAVE_ENH_PROC
+   proc_create_single_data("rebuild-rate", S_IRUSR, dir,
+   proc_show_rebuild_rate, adapter);
+   proc_create_single_data("battery-status", S_IRUSR, dir,
+   proc_show_battery, adapter);
+   proc_create_single_data("diskdrives-ch0", S_IRUSR, dir,
+   proc_show_pdrv_ch0, adapter);
+   proc_create_single_data("diskdrives-ch1", S_IRUSR, dir,
+   proc_show_pdrv_ch1, adapter);
+   proc_create_single_data("diskdrives-ch2", 

[PATCH 23/42] afs: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig 
---
 fs/afs/proc.c | 134 ++
 1 file changed, 15 insertions(+), 119 deletions(-)

diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 839a22280606..3aad32762989 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -62,7 +62,6 @@ static const struct file_operations afs_proc_rootcell_fops = {
.llseek = no_llseek,
 };
 
-static int afs_proc_cell_volumes_open(struct inode *inode, struct file *file);
 static void *afs_proc_cell_volumes_start(struct seq_file *p, loff_t *pos);
 static void *afs_proc_cell_volumes_next(struct seq_file *p, void *v,
loff_t *pos);
@@ -76,15 +75,6 @@ static const struct seq_operations afs_proc_cell_volumes_ops = {
.show   = afs_proc_cell_volumes_show,
 };
 
-static const struct file_operations afs_proc_cell_volumes_fops = {
-   .open   = afs_proc_cell_volumes_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
-static int afs_proc_cell_vlservers_open(struct inode *inode,
-   struct file *file);
 static void *afs_proc_cell_vlservers_start(struct seq_file *p, loff_t *pos);
 static void *afs_proc_cell_vlservers_next(struct seq_file *p, void *v,
  loff_t *pos);
@@ -98,14 +88,6 @@ static const struct seq_operations afs_proc_cell_vlservers_ops = {
.show   = afs_proc_cell_vlservers_show,
 };
 
-static const struct file_operations afs_proc_cell_vlservers_fops = {
-   .open   = afs_proc_cell_vlservers_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
-static int afs_proc_servers_open(struct inode *inode, struct file *file);
 static void *afs_proc_servers_start(struct seq_file *p, loff_t *pos);
 static void *afs_proc_servers_next(struct seq_file *p, void *v,
loff_t *pos);
@@ -119,13 +101,6 @@ static const struct seq_operations afs_proc_servers_ops = {
.show   = afs_proc_servers_show,
 };
 
-static const struct file_operations afs_proc_servers_fops = {
-   .open   = afs_proc_servers_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
 static int afs_proc_sysname_open(struct inode *inode, struct file *file);
 static int afs_proc_sysname_release(struct inode *inode, struct file *file);
 static void *afs_proc_sysname_start(struct seq_file *p, loff_t *pos);
@@ -152,7 +127,7 @@ static const struct file_operations afs_proc_sysname_fops = {
.write  = afs_proc_sysname_write,
 };
 
-static const struct file_operations afs_proc_stats_fops;
+static int afs_proc_stats_show(struct seq_file *m, void *v);
 
 /*
  * initialise the /proc/fs/afs/ directory
@@ -167,8 +142,8 @@ int afs_proc_init(struct afs_net *net)
 
if (!proc_create("cells", 0644, net->proc_afs, &afs_proc_cells_fops) ||
     !proc_create("rootcell", 0644, net->proc_afs, &afs_proc_rootcell_fops) ||
-   !proc_create("servers", 0644, net->proc_afs, &afs_proc_servers_fops) ||
-   !proc_create("stats", 0644, net->proc_afs, &afs_proc_stats_fops) ||
+   !proc_create_seq("servers", 0644, net->proc_afs, &afs_proc_servers_ops) ||
+   !proc_create_single("stats", 0644, net->proc_afs, afs_proc_stats_show) ||
     !proc_create("sysname", 0644, net->proc_afs, &afs_proc_sysname_fops))
goto error_tree;
 
@@ -196,16 +171,7 @@ void afs_proc_cleanup(struct afs_net *net)
  */
 static int afs_proc_cells_open(struct inode *inode, struct file *file)
 {
-   struct seq_file *m;
-   int ret;
-
-   ret = seq_open(file, &afs_proc_cells_ops);
-   if (ret < 0)
-   return ret;
-
-   m = file->private_data;
-   m->private = PDE_DATA(inode);
-   return 0;
+   return seq_open(file, &afs_proc_cells_ops);
 }
 
 /*
@@ -430,10 +396,11 @@ int afs_proc_cell_setup(struct afs_net *net, struct afs_cell *cell)
if (!dir)
goto error_dir;
 
-   if (!proc_create_data("vlservers", 0, dir,
- &afs_proc_cell_vlservers_fops, cell) ||
-   !proc_create_data("volumes", 0, dir,
- &afs_proc_cell_volumes_fops, cell))
+   if (!proc_create_seq_data("vlservers", 0, dir,
+   &afs_proc_cell_vlservers_ops, cell))
+   goto error_tree;
+   if (!proc_create_seq_data("volumes", 0, dir, &afs_proc_cell_volumes_ops,
+   cell))
goto error_tree;
 
_leave(" = 0");
@@ -458,29 +425,6 @@ void afs_proc_cell_remove(struct afs_net *net,

[PATCH 38/42] isdn: replace ->proc_fops with ->proc_show

2018-05-16 Thread Christoph Hellwig
And switch to proc_create_single_data.

Signed-off-by: Christoph Hellwig 
---
 drivers/isdn/capi/kcapi.c  |  3 ++-
 drivers/isdn/gigaset/capi.c| 16 +---
 drivers/isdn/hardware/avm/avmcard.h|  4 ++--
 drivers/isdn/hardware/avm/b1.c | 17 ++---
 drivers/isdn/hardware/avm/b1dma.c  | 17 ++---
 drivers/isdn/hardware/avm/b1isa.c  |  2 +-
 drivers/isdn/hardware/avm/b1pci.c  |  4 ++--
 drivers/isdn/hardware/avm/b1pcmcia.c   |  2 +-
 drivers/isdn/hardware/avm/c4.c | 15 +--
 drivers/isdn/hardware/avm/t1isa.c  |  2 +-
 drivers/isdn/hardware/avm/t1pci.c  |  2 +-
 drivers/isdn/hardware/eicon/capimain.c | 15 +--
 drivers/isdn/hysdn/hycapi.c| 15 +--
 include/linux/isdn/capilli.h   |  2 +-
 net/bluetooth/cmtp/capi.c  | 14 +-
 15 files changed, 20 insertions(+), 110 deletions(-)

diff --git a/drivers/isdn/capi/kcapi.c b/drivers/isdn/capi/kcapi.c
index 46c189ad8d94..0ff517d3c98f 100644
--- a/drivers/isdn/capi/kcapi.c
+++ b/drivers/isdn/capi/kcapi.c
@@ -534,7 +534,8 @@ int attach_capi_ctr(struct capi_ctr *ctr)
init_waitqueue_head(&ctr->state_wait_queue);
 
sprintf(ctr->procfn, "capi/controllers/%d", ctr->cnr);
-   ctr->procent = proc_create_data(ctr->procfn, 0, NULL, ctr->proc_fops, ctr);
+   ctr->procent = proc_create_single_data(ctr->procfn, 0, NULL,
+   ctr->proc_show, ctr);
 
ncontrollers++;
 
diff --git a/drivers/isdn/gigaset/capi.c b/drivers/isdn/gigaset/capi.c
index ccec7778cad2..dac5cd35e901 100644
--- a/drivers/isdn/gigaset/capi.c
+++ b/drivers/isdn/gigaset/capi.c
@@ -2437,19 +2437,6 @@ static int gigaset_proc_show(struct seq_file *m, void *v)
return 0;
 }
 
-static int gigaset_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, gigaset_proc_show, PDE_DATA(inode));
-}
-
-static const struct file_operations gigaset_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = gigaset_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 /**
  * gigaset_isdn_regdev() - register device to LL
  * @cs:device descriptor structure.
@@ -2478,8 +2465,7 @@ int gigaset_isdn_regdev(struct cardstate *cs, const char *isdnid)
iif->ctr.register_appl = gigaset_register_appl;
iif->ctr.release_appl  = gigaset_release_appl;
iif->ctr.send_message  = gigaset_send_message;
-   iif->ctr.procinfo  = gigaset_procinfo;
-   iif->ctr.proc_fops = &gigaset_proc_fops;
+   iif->ctr.proc_show = gigaset_proc_show;
INIT_LIST_HEAD(&iif->appls);
skb_queue_head_init(&iif->sendqueue);
atomic_set(&iif->sendqlen, 0);
diff --git a/drivers/isdn/hardware/avm/avmcard.h b/drivers/isdn/hardware/avm/avmcard.h
index c95712dbfa9f..cdfa89c71997 100644
--- a/drivers/isdn/hardware/avm/avmcard.h
+++ b/drivers/isdn/hardware/avm/avmcard.h
@@ -556,7 +556,7 @@ u16  b1_send_message(struct capi_ctr *ctrl, struct sk_buff *skb);
 void b1_parse_version(avmctrl_info *card);
 irqreturn_t b1_interrupt(int interrupt, void *devptr);
 
-extern const struct file_operations b1ctl_proc_fops;
+int b1_proc_show(struct seq_file *m, void *v);
 
 avmcard_dmainfo *avmcard_dma_alloc(char *name, struct pci_dev *,
   long rsize, long ssize);
@@ -576,6 +576,6 @@ void b1dma_register_appl(struct capi_ctr *ctrl,
 capi_register_params *rp);
 void b1dma_release_appl(struct capi_ctr *ctrl, u16 appl);
 u16  b1dma_send_message(struct capi_ctr *ctrl, struct sk_buff *skb);
-extern const struct file_operations b1dmactl_proc_fops;
+int b1dma_proc_show(struct seq_file *m, void *v);
 
 #endif /* _AVMCARD_H_ */
diff --git a/drivers/isdn/hardware/avm/b1.c b/drivers/isdn/hardware/avm/b1.c
index b1833d08a5fe..5ee5489d3f15 100644
--- a/drivers/isdn/hardware/avm/b1.c
+++ b/drivers/isdn/hardware/avm/b1.c
@@ -637,7 +637,7 @@ irqreturn_t b1_interrupt(int interrupt, void *devptr)
 }
 
 /* - */
-static int b1ctl_proc_show(struct seq_file *m, void *v)
+int b1_proc_show(struct seq_file *m, void *v)
 {
struct capi_ctr *ctrl = m->private;
avmctrl_info *cinfo = (avmctrl_info *)(ctrl->driverdata);
@@ -699,20 +699,7 @@ static int b1ctl_proc_show(struct seq_file *m, void *v)
 
return 0;
 }
-
-static int b1ctl_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, b1ctl_proc_show, PDE_DATA(inode));
-}
-
-const struct file_operations b1ctl_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = b1ctl_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-EXPORT_SYMBOL(b1ctl_proc_fops);
+EXPORT_SYMBOL(b1_proc_show);
 
 

[PATCH 37/42] atm: switch to proc_create_seq_private

2018-05-16 Thread Christoph Hellwig
And remove proc boilerplate code.

Signed-off-by: Christoph Hellwig 
---
 net/atm/proc.c | 72 +-
 1 file changed, 13 insertions(+), 59 deletions(-)

diff --git a/net/atm/proc.c b/net/atm/proc.c
index f272b0f59d82..0b0495a41bbe 100644
--- a/net/atm/proc.c
+++ b/net/atm/proc.c
@@ -68,7 +68,6 @@ static void atm_dev_info(struct seq_file *seq, const struct atm_dev *dev)
 struct vcc_state {
int bucket;
struct sock *sk;
-   int family;
 };
 
 static inline int compare_family(struct sock *sk, int family)
@@ -106,23 +105,13 @@ static int __vcc_walk(struct sock **sock, int family, int *bucket, loff_t l)
return (l < 0);
 }
 
-static inline void *vcc_walk(struct vcc_state *state, loff_t l)
+static inline void *vcc_walk(struct seq_file *seq, loff_t l)
 {
-   return __vcc_walk(&state->sk, state->family, &state->bucket, l) ?
-  state : NULL;
-}
-
-static int __vcc_seq_open(struct inode *inode, struct file *file,
-   int family, const struct seq_operations *ops)
-{
-   struct vcc_state *state;
-
-   state = __seq_open_private(file, ops, sizeof(*state));
-   if (state == NULL)
-   return -ENOMEM;
+   struct vcc_state *state = seq->private;
+   int family = (uintptr_t)(PDE_DATA(file_inode(seq->file)));
 
-   state->family = family;
-   return 0;
+   return __vcc_walk(&state->sk, family, &state->bucket, l) ?
+  state : NULL;
 }
 
 static void *vcc_seq_start(struct seq_file *seq, loff_t *pos)
@@ -133,7 +122,7 @@ static void *vcc_seq_start(struct seq_file *seq, loff_t *pos)
 
read_lock(&vcc_sklist_lock);
state->sk = SEQ_START_TOKEN;
-   return left ? vcc_walk(state, left) : SEQ_START_TOKEN;
+   return left ? vcc_walk(seq, left) : SEQ_START_TOKEN;
 }
 
 static void vcc_seq_stop(struct seq_file *seq, void *v)
@@ -144,9 +133,7 @@ static void vcc_seq_stop(struct seq_file *seq, void *v)
 
 static void *vcc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-   struct vcc_state *state = seq->private;
-
-   v = vcc_walk(state, 1);
+   v = vcc_walk(seq, 1);
*pos += !!PTR_ERR(v);
return v;
 }
@@ -280,18 +267,6 @@ static const struct seq_operations pvc_seq_ops = {
.show   = pvc_seq_show,
 };
 
-static int pvc_seq_open(struct inode *inode, struct file *file)
-{
-   return __vcc_seq_open(inode, file, PF_ATMPVC, &pvc_seq_ops);
-}
-
-static const struct file_operations pvc_seq_fops = {
-   .open   = pvc_seq_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release_private,
-};
-
 static int vcc_seq_show(struct seq_file *seq, void *v)
 {
if (v == SEQ_START_TOKEN) {
@@ -314,18 +289,6 @@ static const struct seq_operations vcc_seq_ops = {
.show   = vcc_seq_show,
 };
 
-static int vcc_seq_open(struct inode *inode, struct file *file)
-{
-   return __vcc_seq_open(inode, file, 0, &vcc_seq_ops);
-}
-
-static const struct file_operations vcc_seq_fops = {
-   .open   = vcc_seq_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release_private,
-};
-
 static int svc_seq_show(struct seq_file *seq, void *v)
 {
static const char atm_svc_banner[] =
@@ -349,18 +312,6 @@ static const struct seq_operations svc_seq_ops = {
.show   = svc_seq_show,
 };
 
-static int svc_seq_open(struct inode *inode, struct file *file)
-{
-   return __vcc_seq_open(inode, file, PF_ATMSVC, &svc_seq_ops);
-}
-
-static const struct file_operations svc_seq_fops = {
-   .open   = svc_seq_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release_private,
-};
-
 static ssize_t proc_dev_atm_read(struct file *file, char __user *buf,
 size_t count, loff_t *pos)
 {
@@ -434,9 +385,12 @@ int __init atm_proc_init(void)
if (!atm_proc_root)
return -ENOMEM;
proc_create_seq("devices", 0444, atm_proc_root, &atm_dev_seq_ops);
-   proc_create("pvc", 0444, atm_proc_root, &pvc_seq_fops);
-   proc_create("svc", 0444, atm_proc_root, &svc_seq_fops);
-   proc_create("vc", 0444, atm_proc_root, &vcc_seq_fops);
+   proc_create_seq_private("pvc", 0444, atm_proc_root, &pvc_seq_ops,
+   sizeof(struct vcc_state), (void *)(uintptr_t)PF_ATMPVC);
+   proc_create_seq_private("svc", 0444, atm_proc_root, &svc_seq_ops,
+   sizeof(struct vcc_state), (void *)(uintptr_t)PF_ATMSVC);
+   proc_create_seq_private("vc", 0444, atm_proc_root, &vcc_seq_ops,
+   sizeof(struct vcc_state), NULL);
return 0;
 }
 
-- 
2.17.0
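
For anyone following the series: the patch above hands the address family to the proc entry by casting a small integer to `void *` through `uintptr_t`, and `vcc_walk()` casts it back after fetching it with `PDE_DATA()`. Below is a minimal userspace sketch of that round trip only. `struct fake_pde`, the two helpers and the `FAMILY_*` values are hypothetical stand-ins for illustration, not kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins for the address families; values are illustrative. */
enum { FAMILY_ANY = 0, FAMILY_PVC = 8, FAMILY_SVC = 20 };

/* Hypothetical stand-in for a proc entry carrying one opaque pointer. */
struct fake_pde {
	void *data;
};

/* Registration side: store a small non-negative integer in the pointer slot. */
static void fake_pde_set_family(struct fake_pde *pde, int family)
{
	assert(family >= 0);	/* the round trip is only obvious for non-negative values */
	pde->data = (void *)(uintptr_t)family;
}

/* Read side: recover the integer, mirroring the (uintptr_t) cast in vcc_walk(). */
static int fake_pde_family(const struct fake_pde *pde)
{
	return (int)(uintptr_t)pde->data;
}
```

Passing NULL as the data pointer, as the "vc" entry does, reads back as family 0 under this scheme on the usual platforms where a null pointer converts to integer zero.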



[PATCH 30/42] bonding: switch to proc_create_seq_data

2018-05-16 Thread Christoph Hellwig
And use proc private data directly instead of doing a detour
through seq->private.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/bonding/bond_procfs.c | 36 ++-
 1 file changed, 6 insertions(+), 30 deletions(-)

diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index 01059f1a7bca..9f7d83e827c3 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -10,7 +10,7 @@
 static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
 {
-   struct bonding *bond = seq->private;
+   struct bonding *bond = PDE_DATA(file_inode(seq->file));
struct list_head *iter;
struct slave *slave;
loff_t off = 0;
@@ -29,7 +29,7 @@ static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
 
 static void *bond_info_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-   struct bonding *bond = seq->private;
+   struct bonding *bond = PDE_DATA(file_inode(seq->file));
struct list_head *iter;
struct slave *slave;
bool found = false;
@@ -56,7 +56,7 @@ static void bond_info_seq_stop(struct seq_file *seq, void *v)
 
 static void bond_info_show_master(struct seq_file *seq)
 {
-   struct bonding *bond = seq->private;
+   struct bonding *bond = PDE_DATA(file_inode(seq->file));
const struct bond_opt_value *optval;
struct slave *curr, *primary;
int i;
@@ -167,7 +167,7 @@ static void bond_info_show_master(struct seq_file *seq)
 static void bond_info_show_slave(struct seq_file *seq,
 const struct slave *slave)
 {
-   struct bonding *bond = seq->private;
+   struct bonding *bond = PDE_DATA(file_inode(seq->file));
 
seq_printf(seq, "\nSlave Interface: %s\n", slave->dev->name);
seq_printf(seq, "MII Status: %s\n", bond_slave_link_status(slave->link));
@@ -257,38 +257,14 @@ static const struct seq_operations bond_info_seq_ops = {
.show  = bond_info_seq_show,
 };
 
-static int bond_info_open(struct inode *inode, struct file *file)
-{
-   struct seq_file *seq;
-   int res;
-
-   res = seq_open(file, &bond_info_seq_ops);
-   if (!res) {
-   /* recover the pointer buried in proc_dir_entry data */
-   seq = file->private_data;
-   seq->private = PDE_DATA(inode);
-   }
-
-   return res;
-}
-
-static const struct file_operations bond_info_fops = {
-   .owner   = THIS_MODULE,
-   .open= bond_info_open,
-   .read= seq_read,
-   .llseek  = seq_lseek,
-   .release = seq_release,
-};
-
 void bond_create_proc_entry(struct bonding *bond)
 {
struct net_device *bond_dev = bond->dev;
struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
 
if (bn->proc_dir) {
-   bond->proc_entry = proc_create_data(bond_dev->name,
-   0444, bn->proc_dir,
-   &bond_info_fops, bond);
+   bond->proc_entry = proc_create_seq_data(bond_dev->name, 0444,
+   bn->proc_dir, &bond_info_seq_ops, bond);
if (bond->proc_entry == NULL)
netdev_warn(bond_dev, "Cannot create /proc/net/%s/%s\n",
DRV_NAME, bond_dev->name);
-- 
2.17.0
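
To make the "detour through seq->private" concrete, here is a rough userspace model of the two styles. It assumes only what the diff shows: the old open hook copied the entry data into `seq->private`, while the new callbacks fetch it from the entry each time. All `fake_*` types and functions are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proc entry, struct file and struct seq_file. */
struct fake_entry { void *data; };
struct fake_file  { struct fake_entry *entry; };
struct fake_seq {
	struct fake_file *file;
	void *private;
};

/* Old style: a per-file open hook copies the entry data into seq->private. */
static int fake_open_with_detour(struct fake_seq *seq, struct fake_file *file)
{
	seq->file = file;
	seq->private = file->entry->data;
	return 0;
}

/* New style: open copies nothing ... */
static int fake_open_plain(struct fake_seq *seq, struct fake_file *file)
{
	seq->file = file;
	seq->private = NULL;
	return 0;
}

/* ... and each callback walks back from the seq file to the entry data. */
static void *fake_entry_data(const struct fake_seq *seq)
{
	assert(seq->file && seq->file->entry);
	return seq->file->entry->data;
}
```

In this model both styles reach the same pointer; the second simply needs no custom open function, which is what lets the patch delete `bond_info_open()` and `bond_info_fops`.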



[PATCH 35/42] bluetooth: switch to proc_create_seq_data

2018-05-16 Thread Christoph Hellwig
And use proc private data directly instead of doing a detour
through seq->private and private state.

Signed-off-by: Christoph Hellwig 
---
 net/bluetooth/af_bluetooth.c | 40 +---
 1 file changed, 5 insertions(+), 35 deletions(-)

diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 84d92a077834..3264e1873219 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -605,15 +605,10 @@ int bt_sock_wait_ready(struct sock *sk, unsigned long flags)
 EXPORT_SYMBOL(bt_sock_wait_ready);
 
 #ifdef CONFIG_PROC_FS
-struct bt_seq_state {
-   struct bt_sock_list *l;
-};
-
 static void *bt_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(seq->private->l->lock)
 {
-   struct bt_seq_state *s = seq->private;
-   struct bt_sock_list *l = s->l;
+   struct bt_sock_list *l = PDE_DATA(file_inode(seq->file));
 
read_lock(&l->lock);
return seq_hlist_start_head(&l->head, *pos);
@@ -621,8 +616,7 @@ static void *bt_seq_start(struct seq_file *seq, loff_t *pos)
 
 static void *bt_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-   struct bt_seq_state *s = seq->private;
-   struct bt_sock_list *l = s->l;
+   struct bt_sock_list *l = PDE_DATA(file_inode(seq->file));
 
return seq_hlist_next(v, &l->head, pos);
 }
@@ -630,16 +624,14 @@ static void *bt_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 static void bt_seq_stop(struct seq_file *seq, void *v)
__releases(seq->private->l->lock)
 {
-   struct bt_seq_state *s = seq->private;
-   struct bt_sock_list *l = s->l;
+   struct bt_sock_list *l = PDE_DATA(file_inode(seq->file));
 
read_unlock(&l->lock);
 }
 
 static int bt_seq_show(struct seq_file *seq, void *v)
 {
-   struct bt_seq_state *s = seq->private;
-   struct bt_sock_list *l = s->l;
+   struct bt_sock_list *l = PDE_DATA(file_inode(seq->file));
 
if (v == SEQ_START_TOKEN) {
seq_puts(seq ,"sk   RefCnt Rmem   Wmem   User   
Inode  Parent");
@@ -681,35 +673,13 @@ static const struct seq_operations bt_seq_ops = {
.show  = bt_seq_show,
 };
 
-static int bt_seq_open(struct inode *inode, struct file *file)
-{
-   struct bt_sock_list *sk_list;
-   struct bt_seq_state *s;
-
-   sk_list = PDE_DATA(inode);
-   s = __seq_open_private(file, &bt_seq_ops,
-  sizeof(struct bt_seq_state));
-   if (!s)
-   return -ENOMEM;
-
-   s->l = sk_list;
-   return 0;
-}
-
-static const struct file_operations bt_fops = {
-   .open = bt_seq_open,
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = seq_release_private
-};
-
 int bt_procfs_init(struct net *net, const char *name,
   struct bt_sock_list *sk_list,
   int (* seq_show)(struct seq_file *, void *))
 {
sk_list->custom_seq_show = seq_show;
 
-   if (!proc_create_data(name, 0, net->proc_net, &bt_fops, sk_list))
+   if (!proc_create_seq_data(name, 0, net->proc_net, &bt_seq_ops, sk_list))
return -ENOMEM;
return 0;
 }
-- 
2.17.0



[PATCH 36/42] atm: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.

Signed-off-by: Christoph Hellwig 
---
 net/atm/proc.c | 65 ++
 1 file changed, 7 insertions(+), 58 deletions(-)

diff --git a/net/atm/proc.c b/net/atm/proc.c
index 55410c00c7e2..f272b0f59d82 100644
--- a/net/atm/proc.c
+++ b/net/atm/proc.c
@@ -257,18 +257,6 @@ static const struct seq_operations atm_dev_seq_ops = {
.show   = atm_dev_seq_show,
 };
 
-static int atm_dev_seq_open(struct inode *inode, struct file *file)
-{
-   return seq_open(file, &atm_dev_seq_ops);
-}
-
-static const struct file_operations devices_seq_fops = {
-   .open   = atm_dev_seq_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
 static int pvc_seq_show(struct seq_file *seq, void *v)
 {
static char atm_pvc_banner[] =
@@ -440,58 +428,19 @@ void atm_proc_dev_deregister(struct atm_dev *dev)
kfree(dev->proc_name);
 }
 
-static struct atm_proc_entry {
-   char *name;
-   const struct file_operations *proc_fops;
-   struct proc_dir_entry *dirent;
-} atm_proc_ents[] = {
-   { .name = "devices",.proc_fops = &devices_seq_fops },
-   { .name = "pvc",.proc_fops = &pvc_seq_fops },
-   { .name = "svc",.proc_fops = &svc_seq_fops },
-   { .name = "vc", .proc_fops = &vcc_seq_fops },
-   { .name = NULL, .proc_fops = NULL }
-};
-
-static void atm_proc_dirs_remove(void)
-{
-   static struct atm_proc_entry *e;
-
-   for (e = atm_proc_ents; e->name; e++) {
-   if (e->dirent)
-   remove_proc_entry(e->name, atm_proc_root);
-   }
-   remove_proc_entry("atm", init_net.proc_net);
-}
-
 int __init atm_proc_init(void)
 {
-   static struct atm_proc_entry *e;
-   int ret;
-
atm_proc_root = proc_net_mkdir(&init_net, "atm", init_net.proc_net);
if (!atm_proc_root)
-   goto err_out;
-   for (e = atm_proc_ents; e->name; e++) {
-   struct proc_dir_entry *dirent;
-
-   dirent = proc_create(e->name, 0444,
-atm_proc_root, e->proc_fops);
-   if (!dirent)
-   goto err_out_remove;
-   e->dirent = dirent;
-   }
-   ret = 0;
-out:
-   return ret;
-
-err_out_remove:
-   atm_proc_dirs_remove();
-err_out:
-   ret = -ENOMEM;
-   goto out;
+   return -ENOMEM;
+   proc_create_seq("devices", 0444, atm_proc_root, &atm_dev_seq_ops);
+   proc_create("pvc", 0444, atm_proc_root, &pvc_seq_fops);
+   proc_create("svc", 0444, atm_proc_root, &svc_seq_fops);
+   proc_create("vc", 0444, atm_proc_root, &vcc_seq_fops);
+   return 0;
 }
 
 void atm_proc_exit(void)
 {
-   atm_proc_dirs_remove();
+   remove_proc_subtree("atm", init_net.proc_net);
 }
-- 
2.17.0



[PATCH 40/42] ide: replace ->proc_fops with ->proc_show

2018-05-16 Thread Christoph Hellwig
Just set up the show callback in the ide_proc_entry_t tables, and use
proc_create_single_data to create the file without additional
boilerplate code.

Signed-off-by: Christoph Hellwig 
---
 drivers/ide/ide-cd.c  |  15 +---
 drivers/ide/ide-disk_proc.c   |  62 ++--
 drivers/ide/ide-floppy_proc.c |  17 +
 drivers/ide/ide-proc.c| 136 +-
 drivers/ide/ide-tape.c|  17 +
 include/linux/ide.h   |   6 +-
 6 files changed, 31 insertions(+), 222 deletions(-)

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 5a8e8e3c22cd..b52a7bdace52 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1426,21 +1426,8 @@ static int idecd_capacity_proc_show(struct seq_file *m, void *v)
return 0;
 }
 
-static int idecd_capacity_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, idecd_capacity_proc_show, PDE_DATA(inode));
-}
-
-static const struct file_operations idecd_capacity_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = idecd_capacity_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 static ide_proc_entry_t idecd_proc[] = {
-   { "capacity", S_IFREG|S_IRUGO, &idecd_capacity_proc_fops },
+   { "capacity", S_IFREG|S_IRUGO, idecd_capacity_proc_show },
{}
 };
 
diff --git a/drivers/ide/ide-disk_proc.c b/drivers/ide/ide-disk_proc.c
index 82a36ced4e96..95d239b2f646 100644
--- a/drivers/ide/ide-disk_proc.c
+++ b/drivers/ide/ide-disk_proc.c
@@ -52,19 +52,6 @@ static int idedisk_cache_proc_show(struct seq_file *m, void *v)
return 0;
 }
 
-static int idedisk_cache_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, idedisk_cache_proc_show, PDE_DATA(inode));
-}
-
-static const struct file_operations idedisk_cache_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = idedisk_cache_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 static int idedisk_capacity_proc_show(struct seq_file *m, void *v)
 {
ide_drive_t*drive = (ide_drive_t *)m->private;
@@ -73,19 +60,6 @@ static int idedisk_capacity_proc_show(struct seq_file *m, void *v)
return 0;
 }
 
-static int idedisk_capacity_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, idedisk_capacity_proc_show, PDE_DATA(inode));
-}
-
-static const struct file_operations idedisk_capacity_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = idedisk_capacity_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 static int __idedisk_proc_show(struct seq_file *m, ide_drive_t *drive, u8 sub_cmd)
 {
u8 *buf;
@@ -114,43 +88,17 @@ static int idedisk_sv_proc_show(struct seq_file *m, void *v)
return __idedisk_proc_show(m, m->private, ATA_SMART_READ_VALUES);
 }
 
-static int idedisk_sv_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, idedisk_sv_proc_show, PDE_DATA(inode));
-}
-
-static const struct file_operations idedisk_sv_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = idedisk_sv_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 static int idedisk_st_proc_show(struct seq_file *m, void *v)
 {
return __idedisk_proc_show(m, m->private, ATA_SMART_READ_THRESHOLDS);
 }
 
-static int idedisk_st_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, idedisk_st_proc_show, PDE_DATA(inode));
-}
-
-static const struct file_operations idedisk_st_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = idedisk_st_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 ide_proc_entry_t ide_disk_proc[] = {
-   { "cache",S_IFREG|S_IRUGO, &idedisk_cache_proc_fops },
-   { "capacity", S_IFREG|S_IRUGO, &idedisk_capacity_proc_fops  },
-   { "geometry", S_IFREG|S_IRUGO, &ide_geometry_proc_fops  },
-   { "smart_values", S_IFREG|S_IRUSR, &idedisk_sv_proc_fops},
-   { "smart_thresholds", S_IFREG|S_IRUSR, &idedisk_st_proc_fops},
+   { "cache",S_IFREG|S_IRUGO, idedisk_cache_proc_show  },
+   { "capacity", S_IFREG|S_IRUGO, idedisk_capacity_proc_show   },
+   { "geometry", S_IFREG|S_IRUGO, ide_geometry_proc_show   },
+   { "smart_values", S_IFREG|S_IRUSR, idedisk_sv_proc_show },
+   { "smart_thresholds", S_IFREG|S_IRUSR, idedisk_st_proc_show },
{}
 };
 
diff --git a/drivers/ide/ide-floppy_proc.c b/drivers/ide/ide-floppy_proc.c
index 471457ebea67..7f697

[PATCH 42/42] proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields

2018-05-16 Thread Christoph Hellwig
This makes Alexey happy and Al groan.  Based on a patch from
Alexey Dobriyan.

Signed-off-by: Christoph Hellwig 
---
 fs/proc/internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 84c68508a256..a318ae5b36b4 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -62,9 +62,9 @@ struct proc_dir_entry {
umode_t mode;
u8 namelen;
 #ifdef CONFIG_64BIT
-#define SIZEOF_PDE_INLINE_NAME (192-139)
+#define SIZEOF_PDE_INLINE_NAME (192-155)
 #else
-#define SIZEOF_PDE_INLINE_NAME (128-87)
+#define SIZEOF_PDE_INLINE_NAME (128-95)
 #endif
char inline_name[SIZEOF_PDE_INLINE_NAME];
 } __randomize_layout;
-- 
2.17.0
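
The arithmetic behind the new constants, as a small sketch. Reading 192 and 128 as the intended total struct sizes, and the subtracted number as the bytes taken by the fields ahead of `inline_name`, is an assumption drawn from the shape of the macro and is not stated in the patch. Both helper names are hypothetical.

```c
#include <assert.h>

/* Inline-name capacity: total size budget minus bytes used by the other fields. */
static int inline_name_size(int total, int used)
{
	assert(used < total);
	return total - used;
}

/* How many bytes the non-name fields grew between the old and new constants. */
static int fields_growth(int old_used, int new_used)
{
	return new_used - old_used;
}
```

With the numbers from the diff, the 64-bit inline name shrinks from 192 - 139 = 53 to 192 - 155 = 37 bytes (16 bytes given up to the new fields), and the 32-bit one from 128 - 87 = 41 to 128 - 95 = 33 bytes (8 bytes).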



[PATCH 33/42] netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data

2018-05-16 Thread Christoph Hellwig
And use proc private data directly instead of doing a detour
through seq->private.

Signed-off-by: Christoph Hellwig 
---
 net/netfilter/xt_hashlimit.c | 92 +++-
 1 file changed, 18 insertions(+), 74 deletions(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 0cd73567e7ff..9b16402f29af 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -57,9 +57,9 @@ static inline struct hashlimit_net *hashlimit_pernet(struct net *net)
 }
 
 /* need to declare this at the top */
-static const struct file_operations dl_file_ops_v2;
-static const struct file_operations dl_file_ops_v1;
-static const struct file_operations dl_file_ops;
+static const struct seq_operations dl_seq_ops_v2;
+static const struct seq_operations dl_seq_ops_v1;
+static const struct seq_operations dl_seq_ops;
 
 /* hash table crap */
 struct dsthash_dst {
@@ -272,7 +272,7 @@ static int htable_create(struct net *net, struct hashlimit_cfg3 *cfg,
 {
struct hashlimit_net *hashlimit_net = hashlimit_pernet(net);
struct xt_hashlimit_htable *hinfo;
-   const struct file_operations *fops;
+   const struct seq_operations *ops;
unsigned int size, i;
int ret;
 
@@ -321,19 +321,19 @@ static int htable_create(struct net *net, struct hashlimit_cfg3 *cfg,
 
switch (revision) {
case 1:
-   fops = &dl_file_ops_v1;
+   ops = &dl_seq_ops_v1;
break;
case 2:
-   fops = &dl_file_ops_v2;
+   ops = &dl_seq_ops_v2;
break;
default:
-   fops = &dl_file_ops;
+   ops = &dl_seq_ops;
}
 
-   hinfo->pde = proc_create_data(name, 0,
+   hinfo->pde = proc_create_seq_data(name, 0,
(family == NFPROTO_IPV4) ?
hashlimit_net->ipt_hashlimit : hashlimit_net->ip6t_hashlimit,
-   fops, hinfo);
+   ops, hinfo);
if (hinfo->pde == NULL) {
kfree(hinfo->name);
vfree(hinfo);
@@ -1057,7 +1057,7 @@ static struct xt_match hashlimit_mt_reg[] __read_mostly = {
 static void *dl_seq_start(struct seq_file *s, loff_t *pos)
__acquires(htable->lock)
 {
-   struct xt_hashlimit_htable *htable = s->private;
+   struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
unsigned int *bucket;
 
spin_lock_bh(&htable->lock);
@@ -1074,7 +1074,7 @@ static void *dl_seq_start(struct seq_file *s, loff_t *pos)
 
 static void *dl_seq_next(struct seq_file *s, void *v, loff_t *pos)
 {
-   struct xt_hashlimit_htable *htable = s->private;
+   struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
unsigned int *bucket = v;
 
*pos = ++(*bucket);
@@ -1088,7 +1088,7 @@ static void *dl_seq_next(struct seq_file *s, void *v, loff_t *pos)
 static void dl_seq_stop(struct seq_file *s, void *v)
__releases(htable->lock)
 {
-   struct xt_hashlimit_htable *htable = s->private;
+   struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
unsigned int *bucket = v;
 
if (!IS_ERR(bucket))
@@ -1130,7 +1130,7 @@ static void dl_seq_print(struct dsthash_ent *ent, u_int8_t family,
 static int dl_seq_real_show_v2(struct dsthash_ent *ent, u_int8_t family,
   struct seq_file *s)
 {
-   const struct xt_hashlimit_htable *ht = s->private;
+   struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file));
 
spin_lock(&ent->lock);
/* recalculate to show accurate numbers */
@@ -1145,7 +1145,7 @@ static int dl_seq_real_show_v2(struct dsthash_ent *ent, u_int8_t family,
 static int dl_seq_real_show_v1(struct dsthash_ent *ent, u_int8_t family,
   struct seq_file *s)
 {
-   const struct xt_hashlimit_htable *ht = s->private;
+   struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file));
 
spin_lock(&ent->lock);
/* recalculate to show accurate numbers */
@@ -1160,7 +1160,7 @@ static int dl_seq_real_show_v1(struct dsthash_ent *ent, u_int8_t family,
 static int dl_seq_real_show(struct dsthash_ent *ent, u_int8_t family,
struct seq_file *s)
 {
-   const struct xt_hashlimit_htable *ht = s->private;
+   struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file));
 
spin_lock(&ent->lock);
/* recalculate to show accurate numbers */
@@ -1174,7 +1174,7 @@ static int dl_seq_real_show(struct dsthash_ent *ent, u_int8_t family,
 
 static int dl_seq_show_v2(struct seq_file *s, void *v)
 {
-   struct xt_hashlimit_htable *htable = s->private;
+   struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file));
unsigned int *bucket = (unsigned int *)v;
struct dsthash_ent *ent;
 
@@ -1188,7 +1188,7 @@ static int dl_seq_show_v2(struct seq_file *s, void *v)
 
 static int dl_seq_s

[PATCH 41/42] tty: replace ->proc_fops with ->proc_show

2018-05-16 Thread Christoph Hellwig
Just set up the show callback in the tty_operations, and use
proc_create_single_data to create the file without additional
boilerplate code.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/hp/sim/simserial.c| 15 +--
 arch/xtensa/platforms/iss/console.c | 15 +--
 drivers/char/pcmcia/synclink_cs.c   | 15 +--
 drivers/mmc/core/sdio_uart.c| 15 +--
 drivers/staging/fwserial/fwserial.c | 15 +--
 drivers/tty/amiserial.c | 15 +--
 drivers/tty/cyclades.c  | 15 +--
 drivers/tty/serial/serial_core.c| 15 +--
 drivers/tty/synclink.c  | 15 +--
 drivers/tty/synclink_gt.c   | 15 +--
 drivers/tty/synclinkmp.c| 15 +--
 drivers/usb/serial/usb-serial.c | 15 +--
 fs/proc/proc_tty.c  |  6 +++---
 include/linux/tty_driver.h  |  2 +-
 14 files changed, 16 insertions(+), 172 deletions(-)

diff --git a/arch/ia64/hp/sim/simserial.c b/arch/ia64/hp/sim/simserial.c
index a419ccf33cde..663388a73d4e 100644
--- a/arch/ia64/hp/sim/simserial.c
+++ b/arch/ia64/hp/sim/simserial.c
@@ -435,19 +435,6 @@ static int rs_proc_show(struct seq_file *m, void *v)
return 0;
 }
 
-static int rs_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, rs_proc_show, NULL);
-}
-
-static const struct file_operations rs_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = rs_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 static const struct tty_operations hp_ops = {
.open = rs_open,
.close = rs_close,
@@ -462,7 +449,7 @@ static const struct tty_operations hp_ops = {
.unthrottle = rs_unthrottle,
.send_xchar = rs_send_xchar,
.hangup = rs_hangup,
-   .proc_fops = &rs_proc_fops,
+   .proc_show = rs_proc_show,
 };
 
 static const struct tty_port_operations hp_port_ops = {
diff --git a/arch/xtensa/platforms/iss/console.c b/arch/xtensa/platforms/iss/console.c
index 92f567f9a21e..af81a62faba6 100644
--- a/arch/xtensa/platforms/iss/console.c
+++ b/arch/xtensa/platforms/iss/console.c
@@ -153,19 +153,6 @@ static int rs_proc_show(struct seq_file *m, void *v)
return 0;
 }
 
-static int rs_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, rs_proc_show, NULL);
-}
-
-static const struct file_operations rs_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = rs_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 static const struct tty_operations serial_ops = {
.open = rs_open,
.close = rs_close,
@@ -176,7 +163,7 @@ static const struct tty_operations serial_ops = {
.chars_in_buffer = rs_chars_in_buffer,
.hangup = rs_hangup,
.wait_until_sent = rs_wait_until_sent,
-   .proc_fops = &rs_proc_fops,
+   .proc_show = rs_proc_show,
 };
 
 int __init rs_init(void)
diff --git a/drivers/char/pcmcia/synclink_cs.c b/drivers/char/pcmcia/synclink_cs.c
index aa502e9fb7fa..66b04194aa9f 100644
--- a/drivers/char/pcmcia/synclink_cs.c
+++ b/drivers/char/pcmcia/synclink_cs.c
@@ -2616,19 +2616,6 @@ static int mgslpc_proc_show(struct seq_file *m, void *v)
return 0;
 }
 
-static int mgslpc_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, mgslpc_proc_show, NULL);
-}
-
-static const struct file_operations mgslpc_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = mgslpc_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 static int rx_alloc_buffers(MGSLPC_INFO *info)
 {
/* each buffer has header and data */
@@ -2815,7 +2802,7 @@ static const struct tty_operations mgslpc_ops = {
.tiocmget = tiocmget,
.tiocmset = tiocmset,
.get_icount = mgslpc_get_icount,
-   .proc_fops = &mgslpc_proc_fops,
+   .proc_show = mgslpc_proc_show,
 };
 
 static int __init synclink_cs_init(void)
diff --git a/drivers/mmc/core/sdio_uart.c b/drivers/mmc/core/sdio_uart.c
index d3c91f412b69..25e113001a3c 100644
--- a/drivers/mmc/core/sdio_uart.c
+++ b/drivers/mmc/core/sdio_uart.c
@@ -1008,19 +1008,6 @@ static int sdio_uart_proc_show(struct seq_file *m, void *v)
return 0;
 }
 
-static int sdio_uart_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, sdio_uart_proc_show, NULL);
-}
-
-static const struct file_operations sdio_uart_proc_fops = {
-   .owner  = THIS_MODULE,
-   .open   = sdio_uart_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
 static const struct t

Re: [PATCH 14/14] net: sched: implement delete for all actions

2018-05-16 Thread Jiri Pirko
Mon, May 14, 2018 at 04:27:15PM CEST, vla...@mellanox.com wrote:
>Implement delete function that is required to delete actions without
>holding rtnl lock. Use action API function that atomically deletes action
>only if it is still in action idr. This implementation prevents concurrent
>threads from deleting same action twice.
>
>Signed-off-by: Vlad Buslov 
>---
> net/sched/act_bpf.c|  8 
> net/sched/act_connmark.c   |  8 
> net/sched/act_csum.c   |  8 
> net/sched/act_gact.c   |  8 
> net/sched/act_ife.c|  8 
> net/sched/act_ipt.c| 16 
> net/sched/act_mirred.c |  8 
> net/sched/act_nat.c|  8 
> net/sched/act_pedit.c  |  8 
> net/sched/act_police.c |  8 
> net/sched/act_sample.c |  8 
> net/sched/act_simple.c |  8 
> net/sched/act_skbedit.c|  8 
> net/sched/act_skbmod.c |  8 
> net/sched/act_tunnel_key.c |  8 
> net/sched/act_vlan.c   |  8 
> 16 files changed, 136 insertions(+)
>
>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>index 0bf4ecf..36f7f66 100644
>--- a/net/sched/act_bpf.c
>+++ b/net/sched/act_bpf.c
>@@ -394,6 +394,13 @@ static int tcf_bpf_search(struct net *net, struct tc_action **a, u32 index,
>   return tcf_idr_search(tn, a, index);
> }
> 
>+static int tcf_bpf_delete(struct net *net, u32 index)
>+{
>+  struct tc_action_net *tn = net_generic(net, bpf_net_id);
>+
>+  return tcf_idr_find_delete(tn, index);
>+}
>+
> static struct tc_action_ops act_bpf_ops __read_mostly = {
>   .kind   =   "bpf",
>   .type   =   TCA_ACT_BPF,
>@@ -404,6 +411,7 @@ static struct tc_action_ops act_bpf_ops __read_mostly = {
>   .init   =   tcf_bpf_init,
>   .walk   =   tcf_bpf_walker,
>   .lookup =   tcf_bpf_search,
>+  .delete =   tcf_bpf_delete,

I wonder how the idr index got removed right before this patch. The
delete op is NULL and I didn't find anyone else doing it.

Also, after this patch, does it make sense to keep the following check in
tcf_action_del_1()?

	if (ops->delete)
		err = ops->delete(net, index);

Looks like ops->delete is non-NULL for all actions.

Seems to me that you need to introduce this patch, filling in the delete
op in all acts, first, and only after that introduce the code that
actually calls it.
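
To illustrate the point about the NULL check, here is a minimal userspace
sketch of an ops table with an optional delete callback. Everything in it
(struct action_ops, demo_delete, action_del_one, and the -EOPNOTSUPP
fallback) is a hypothetical stand-in, not the kernel's tc_action_ops or
tcf_action_del_1():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tc_action_ops; only the callback
 * discussed in the review is modelled. */
struct action_ops {
	const char *kind;
	int (*delete)(unsigned int index);	/* NULL until the act implements it */
};

/* Hypothetical per-action delete: only index 1 exists in this model. */
static int demo_delete(unsigned int index)
{
	return index == 1 ? 0 : -ENOENT;
}

/* Caller in the style of the quoted check. The error returned for a
 * missing callback is an assumption of this sketch. */
static int action_del_one(const struct action_ops *ops, unsigned int index)
{
	assert(ops != NULL);
	if (ops->delete)
		return ops->delete(index);
	return -EOPNOTSUPP;
}

static const struct action_ops ops_with_delete = { "demo", demo_delete };
static const struct action_ops ops_without_delete = { "legacy", NULL };
```

Once every ops table fills in the callback, the branch for a missing
callback can no longer be taken, which is why the check becomes redundant
if the callback is introduced for all acts before its first caller.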

[...]


[PATCH 32/42] neigh: switch to proc_create_seq_data

2018-05-16 Thread Christoph Hellwig
And use proc private data directly instead of doing a detour
through seq->private.

Signed-off-by: Christoph Hellwig 
---
 net/core/neighbour.c | 31 ++-
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ce519861be59..1fb43bff417d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -59,7 +59,7 @@ static int pneigh_ifdown_and_unlock(struct neigh_table *tbl,
struct net_device *dev);
 
 #ifdef CONFIG_PROC_FS
-static const struct file_operations neigh_stat_seq_fops;
+static const struct seq_operations neigh_stat_seq_ops;
 #endif
 
 /*
@@ -1558,8 +1558,8 @@ void neigh_table_init(int index, struct neigh_table *tbl)
panic("cannot create neighbour cache statistics");
 
 #ifdef CONFIG_PROC_FS
-   if (!proc_create_data(tbl->id, 0, init_net.proc_net_stat,
- &neigh_stat_seq_fops, tbl))
+   if (!proc_create_seq_data(tbl->id, 0, init_net.proc_net_stat,
+ &neigh_stat_seq_ops, tbl))
panic("cannot create neighbour proc dir entry");
 #endif
 
@@ -2786,7 +2786,7 @@ EXPORT_SYMBOL(neigh_seq_stop);
 
 static void *neigh_stat_seq_start(struct seq_file *seq, loff_t *pos)
 {
-   struct neigh_table *tbl = seq->private;
+   struct neigh_table *tbl = PDE_DATA(file_inode(seq->file));
int cpu;
 
if (*pos == 0)
@@ -2803,7 +2803,7 @@ static void *neigh_stat_seq_start(struct seq_file *seq, loff_t *pos)
 
 static void *neigh_stat_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-   struct neigh_table *tbl = seq->private;
+   struct neigh_table *tbl = PDE_DATA(file_inode(seq->file));
int cpu;
 
for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
@@ -2822,7 +2822,7 @@ static void neigh_stat_seq_stop(struct seq_file *seq, void *v)
 
 static int neigh_stat_seq_show(struct seq_file *seq, void *v)
 {
-   struct neigh_table *tbl = seq->private;
+   struct neigh_table *tbl = PDE_DATA(file_inode(seq->file));
struct neigh_statistics *st = v;
 
if (v == SEQ_START_TOKEN) {
@@ -2861,25 +2861,6 @@ static const struct seq_operations neigh_stat_seq_ops = {
.stop   = neigh_stat_seq_stop,
.show   = neigh_stat_seq_show,
 };
-
-static int neigh_stat_seq_open(struct inode *inode, struct file *file)
-{
-   int ret = seq_open(file, &neigh_stat_seq_ops);
-
-   if (!ret) {
-   struct seq_file *sf = file->private_data;
-   sf->private = PDE_DATA(inode);
-   }
-   return ret;
-};
-
-static const struct file_operations neigh_stat_seq_fops = {
-   .open= neigh_stat_seq_open,
-   .read= seq_read,
-   .llseek  = seq_lseek,
-   .release = seq_release,
-};
-
 #endif /* CONFIG_PROC_FS */
 
 static inline size_t neigh_nlmsg_size(void)
-- 
2.17.0
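
The "detour through seq->private" that the neigh patch above removes can be
modelled in userspace as follows. This is only a sketch: the model_* types
and helpers are hypothetical simplifications, and the real seq_file, file
and proc inode structures are considerably richer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; the real kernel structures differ. */
struct model_inode { void *pde_data; };	/* data passed at proc entry creation */
struct model_file { struct model_inode *inode; };
struct model_seq_file { struct model_file *file; void *private; };

/* Old pattern: a custom open() copies the entry data into seq->private. */
static void model_open_copy_private(struct model_seq_file *seq)
{
	seq->private = seq->file->inode->pde_data;
}

/* New pattern: callbacks read the entry data straight from the inode,
 * mirroring PDE_DATA(file_inode(seq->file)) in the patch. */
static void *model_pde_data(const struct model_seq_file *seq)
{
	assert(seq && seq->file && seq->file->inode);
	return seq->file->inode->pde_data;
}
```

Both paths yield the same pointer; the second one needs no per-file open
wrapper, which is what lets the patch delete neigh_stat_seq_open() and the
file_operations that referenced it.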



Re: [PATCH 13/14] net: sched: use unique idr insert function in unlocked actions

2018-05-16 Thread Jiri Pirko
Mon, May 14, 2018 at 04:27:14PM CEST, vla...@mellanox.com wrote:
>Substitute calls to action insert function with calls to action insert
>unique function that warns if insertion overwrites index in idr.
>
>Signed-off-by: Vlad Buslov 
>---
> net/sched/act_bpf.c| 2 +-
> net/sched/act_connmark.c   | 2 +-
> net/sched/act_csum.c   | 2 +-
> net/sched/act_gact.c   | 2 +-
> net/sched/act_ife.c| 2 +-
> net/sched/act_ipt.c| 2 +-
> net/sched/act_mirred.c | 2 +-
> net/sched/act_nat.c| 2 +-
> net/sched/act_pedit.c  | 2 +-
> net/sched/act_police.c | 2 +-
> net/sched/act_sample.c | 2 +-
> net/sched/act_simple.c | 2 +-
> net/sched/act_skbedit.c| 2 +-
> net/sched/act_skbmod.c | 2 +-
> net/sched/act_tunnel_key.c | 2 +-
> net/sched/act_vlan.c   | 2 +-
> 16 files changed, 16 insertions(+), 16 deletions(-)
>
>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>index 7e20fdc..0bf4ecf 100644
>--- a/net/sched/act_bpf.c
>+++ b/net/sched/act_bpf.c
>@@ -354,7 +354,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>   rcu_assign_pointer(prog->filter, cfg.filter);
> 
>   if (res == ACT_P_CREATED) {
>-  tcf_idr_insert(tn, *act);
>+  tcf_idr_insert_unique(tn, *act);

Seems to me that tcf_idr_insert() is unused after this patch. If that is
the case, I think you don't need to introduce tcf_idr_insert_unique();
just do what you need to do in tcf_idr_insert().
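
As a rough userspace illustration of folding the overwrite warning into the
one existing insert helper, consider the sketch below. The model_idr table
and model_idr_insert() are hypothetical and do not reproduce the kernel idr
API or the real tcf_idr_insert():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_SLOTS 8

/* Hypothetical fixed-size table standing in for the action idr. */
struct model_idr { void *slot[MODEL_SLOTS]; };

/* Single insert helper that warns when it overwrites an occupied index,
 * instead of having a separate "_unique" variant. Returns 1 if an
 * existing entry was replaced, 0 otherwise. */
static int model_idr_insert(struct model_idr *idr, unsigned int index,
			    void *item)
{
	int overwrote;

	assert(idr && index < MODEL_SLOTS);
	overwrote = idr->slot[index] != NULL;
	if (overwrote)
		fprintf(stderr, "warning: index %u already in use\n", index);
	idr->slot[index] = item;
	return overwrote;
}
```

With only one helper, every caller gets the warning automatically and no
call site has to choose between two nearly identical functions.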

[...]


[PATCH 39/42] ide: remove ide_driver_proc_write

2018-05-16 Thread Christoph Hellwig
The driver proc file hasn't been writeable for a long time, so this is
just dead code.

Signed-off-by: Christoph Hellwig 
Acked-by: "Eric W. Biederman" 
---
 drivers/ide/ide-proc.c | 46 --
 1 file changed, 46 deletions(-)

diff --git a/drivers/ide/ide-proc.c b/drivers/ide/ide-proc.c
index 863db44c7916..b3b8b8822d6a 100644
--- a/drivers/ide/ide-proc.c
+++ b/drivers/ide/ide-proc.c
@@ -528,58 +528,12 @@ static int ide_driver_proc_open(struct inode *inode, struct file *file)
return single_open(file, ide_driver_proc_show, PDE_DATA(inode));
 }
 
-static int ide_replace_subdriver(ide_drive_t *drive, const char *driver)
-{
-   struct device *dev = &drive->gendev;
-   int ret = 1;
-   int err;
-
-   device_release_driver(dev);
-   /* FIXME: device can still be in use by previous driver */
-   strlcpy(drive->driver_req, driver, sizeof(drive->driver_req));
-   err = device_attach(dev);
-   if (err < 0)
-   printk(KERN_WARNING "IDE: %s: device_attach error: %d\n",
-   __func__, err);
-   drive->driver_req[0] = 0;
-   if (dev->driver == NULL) {
-   err = device_attach(dev);
-   if (err < 0)
-   printk(KERN_WARNING
-   "IDE: %s: device_attach(2) error: %d\n",
-   __func__, err);
-   }
-   if (dev->driver && !strcmp(dev->driver->name, driver))
-   ret = 0;
-
-   return ret;
-}
-
-static ssize_t ide_driver_proc_write(struct file *file, const char __user *buffer,
-size_t count, loff_t *pos)
-{
-   ide_drive_t *drive = PDE_DATA(file_inode(file));
-   char name[32];
-
-   if (!capable(CAP_SYS_ADMIN))
-   return -EACCES;
-   if (count > 31)
-   count = 31;
-   if (copy_from_user(name, buffer, count))
-   return -EFAULT;
-   name[count] = '\0';
-   if (ide_replace_subdriver(drive, name))
-   return -EINVAL;
-   return count;
-}
-
 static const struct file_operations ide_driver_proc_fops = {
.owner  = THIS_MODULE,
.open   = ide_driver_proc_open,
.read   = seq_read,
.llseek = seq_lseek,
.release= single_release,
-   .write  = ide_driver_proc_write,
 };
 
 static int ide_media_proc_show(struct seq_file *m, void *v)
-- 
2.17.0



Re: [PATCH 11/14] net: core: add new/replace rate estimator lock parameter

2018-05-16 Thread Jiri Pirko
Mon, May 14, 2018 at 04:27:12PM CEST, vla...@mellanox.com wrote:
>Extend rate estimator new and replace APIs with additional spinlock
>parameter used by lockless actions to protect rate_est pointer from
>concurrent modification.
>
>Signed-off-by: Vlad Buslov 

[...]


> /**
>  * gen_new_estimator - create a new rate estimator
>  * @bstats: basic statistics
>  * @cpu_bstats: bstats per cpu
>  * @rate_est: rate estimator statistics
>+ * @rate_est_lock: rate_est lock (might be NULL)

I cannot find a place where you actually use this new arg in this patchset.
Did I miss it?


>  * @stats_lock: statistics lock
>  * @running: qdisc running seqcount
>  * @opt: rate estimator configuration TLV

[...]


[PATCH 29/42] rtc/proc: switch to proc_create_single_data

2018-05-16 Thread Christoph Hellwig
And stop trying to get a reference on the submodule; the procfs code
already handles release after the module has been unloaded and the proc
entry thus removed.

Signed-off-by: Christoph Hellwig 
Acked-by: Alexandre Belloni 
---
 drivers/rtc/rtc-proc.c | 33 ++---
 1 file changed, 2 insertions(+), 31 deletions(-)

diff --git a/drivers/rtc/rtc-proc.c b/drivers/rtc/rtc-proc.c
index 31e7e23cc5be..a9dd9218fae2 100644
--- a/drivers/rtc/rtc-proc.c
+++ b/drivers/rtc/rtc-proc.c
@@ -107,40 +107,11 @@ static int rtc_proc_show(struct seq_file *seq, void *offset)
return 0;
 }
 
-static int rtc_proc_open(struct inode *inode, struct file *file)
-{
-   int ret;
-   struct rtc_device *rtc = PDE_DATA(inode);
-
-   if (!try_module_get(rtc->owner))
-   return -ENODEV;
-
-   ret = single_open(file, rtc_proc_show, rtc);
-   if (ret)
-   module_put(rtc->owner);
-   return ret;
-}
-
-static int rtc_proc_release(struct inode *inode, struct file *file)
-{
-   int res = single_release(inode, file);
-   struct rtc_device *rtc = PDE_DATA(inode);
-
-   module_put(rtc->owner);
-   return res;
-}
-
-static const struct file_operations rtc_proc_fops = {
-   .open   = rtc_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= rtc_proc_release,
-};
-
 void rtc_proc_add_device(struct rtc_device *rtc)
 {
if (is_rtc_hctosys(rtc))
-   proc_create_data("driver/rtc", 0, NULL, &rtc_proc_fops, rtc);
+   proc_create_single_data("driver/rtc", 0, NULL, rtc_proc_show,
+   rtc);
 }
 
 void rtc_proc_del_device(struct rtc_device *rtc)
-- 
2.17.0



Re: [PATCH 13/14] net: sched: use unique idr insert function in unlocked actions

2018-05-16 Thread Vlad Buslov

On Wed 16 May 2018 at 09:50, Jiri Pirko  wrote:
> Mon, May 14, 2018 at 04:27:14PM CEST, vla...@mellanox.com wrote:
>>Substitute calls to action insert function with calls to action insert
>>unique function that warns if insertion overwrites index in idr.
>>
>>Signed-off-by: Vlad Buslov 
>>---
>> net/sched/act_bpf.c| 2 +-
>> net/sched/act_connmark.c   | 2 +-
>> net/sched/act_csum.c   | 2 +-
>> net/sched/act_gact.c   | 2 +-
>> net/sched/act_ife.c| 2 +-
>> net/sched/act_ipt.c| 2 +-
>> net/sched/act_mirred.c | 2 +-
>> net/sched/act_nat.c| 2 +-
>> net/sched/act_pedit.c  | 2 +-
>> net/sched/act_police.c | 2 +-
>> net/sched/act_sample.c | 2 +-
>> net/sched/act_simple.c | 2 +-
>> net/sched/act_skbedit.c| 2 +-
>> net/sched/act_skbmod.c | 2 +-
>> net/sched/act_tunnel_key.c | 2 +-
>> net/sched/act_vlan.c   | 2 +-
>> 16 files changed, 16 insertions(+), 16 deletions(-)
>>
>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>index 7e20fdc..0bf4ecf 100644
>>--- a/net/sched/act_bpf.c
>>+++ b/net/sched/act_bpf.c
>>@@ -354,7 +354,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>>  rcu_assign_pointer(prog->filter, cfg.filter);
>> 
>>  if (res == ACT_P_CREATED) {
>>- tcf_idr_insert(tn, *act);
>>+ tcf_idr_insert_unique(tn, *act);
>
> Seems to me that tcf_idr_insert() is unused after this patch. If that is
> the case, I think that you don't need to introduce
> tcf_idr_insert_unique() and just do what you need to do in
> tcf_idr_insert()

Got it.

>
> [...]



[PATCH 34/42] netfilter/x_tables: switch to proc_create_seq_private

2018-05-16 Thread Christoph Hellwig
And remove proc boilerplate code.

Signed-off-by: Christoph Hellwig 
---
 net/netfilter/x_tables.c | 42 ++--
 1 file changed, 6 insertions(+), 36 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 344dd01a5027..0e314f95a4a3 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1648,22 +1648,6 @@ static const struct seq_operations xt_match_seq_ops = {
.show   = xt_match_seq_show,
 };
 
-static int xt_match_open(struct inode *inode, struct file *file)
-{
-   struct nf_mttg_trav *trav;
-   trav = __seq_open_private(file, &xt_match_seq_ops, sizeof(*trav));
-   if (!trav)
-   return -ENOMEM;
-   return 0;
-}
-
-static const struct file_operations xt_match_ops = {
-   .open= xt_match_open,
-   .read= seq_read,
-   .llseek  = seq_lseek,
-   .release = seq_release_private,
-};
-
 static void *xt_target_seq_start(struct seq_file *seq, loff_t *pos)
 {
return xt_mttg_seq_start(seq, pos, true);
@@ -1698,22 +1682,6 @@ static const struct seq_operations xt_target_seq_ops = {
.show   = xt_target_seq_show,
 };
 
-static int xt_target_open(struct inode *inode, struct file *file)
-{
-   struct nf_mttg_trav *trav;
-   trav = __seq_open_private(file, &xt_target_seq_ops, sizeof(*trav));
-   if (!trav)
-   return -ENOMEM;
-   return 0;
-}
-
-static const struct file_operations xt_target_ops = {
-   .open= xt_target_open,
-   .read= seq_read,
-   .llseek  = seq_lseek,
-   .release = seq_release_private,
-};
-
 #define FORMAT_TABLES  "_tables_names"
 #define FORMAT_MATCHES  "_tables_matches"
 #define FORMAT_TARGETS "_tables_targets"
@@ -1787,8 +1755,9 @@ int xt_proto_init(struct net *net, u_int8_t af)
 
strlcpy(buf, xt_prefix[af], sizeof(buf));
strlcat(buf, FORMAT_MATCHES, sizeof(buf));
-   proc = proc_create_data(buf, 0440, net->proc_net, &xt_match_ops,
-   (void *)(unsigned long)af);
+   proc = proc_create_seq_private(buf, 0440, net->proc_net,
+   &xt_match_seq_ops, sizeof(struct nf_mttg_trav),
+   (void *)(unsigned long)af);
if (!proc)
goto out_remove_tables;
if (uid_valid(root_uid) && gid_valid(root_gid))
@@ -1796,8 +1765,9 @@ int xt_proto_init(struct net *net, u_int8_t af)
 
strlcpy(buf, xt_prefix[af], sizeof(buf));
strlcat(buf, FORMAT_TARGETS, sizeof(buf));
-   proc = proc_create_data(buf, 0440, net->proc_net, &xt_target_ops,
-   (void *)(unsigned long)af);
+   proc = proc_create_seq_private(buf, 0440, net->proc_net,
+&xt_target_seq_ops, sizeof(struct nf_mttg_trav),
+(void *)(unsigned long)af);
if (!proc)
goto out_remove_matches;
if (uid_valid(root_uid) && gid_valid(root_gid))
-- 
2.17.0



[PATCH 27/42] resource: switch to proc_create_seq_data

2018-05-16 Thread Christoph Hellwig
And use the root resource directly from the proc private data.

Signed-off-by: Christoph Hellwig 
---
 kernel/resource.c | 43 +--
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 2af6c03858b9..b589dda910b3 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -87,7 +87,7 @@ enum { MAX_IORES_LEVEL = 5 };
 static void *r_start(struct seq_file *m, loff_t *pos)
__acquires(resource_lock)
 {
-   struct resource *p = m->private;
+   struct resource *p = PDE_DATA(file_inode(m->file));
loff_t l = 0;
read_lock(&resource_lock);
for (p = p->child; p && l < *pos; p = r_next(m, p, &l))
@@ -103,7 +103,7 @@ static void r_stop(struct seq_file *m, void *v)
 
 static int r_show(struct seq_file *m, void *v)
 {
-   struct resource *root = m->private;
+   struct resource *root = PDE_DATA(file_inode(m->file));
struct resource *r = v, *p;
unsigned long long start, end;
int width = root->end < 0x10000 ? 4 : 8;
@@ -135,44 +135,11 @@ static const struct seq_operations resource_op = {
.show   = r_show,
 };
 
-static int ioports_open(struct inode *inode, struct file *file)
-{
-   int res = seq_open(file, &resource_op);
-   if (!res) {
-   struct seq_file *m = file->private_data;
-   m->private = &ioport_resource;
-   }
-   return res;
-}
-
-static int iomem_open(struct inode *inode, struct file *file)
-{
-   int res = seq_open(file, &resource_op);
-   if (!res) {
-   struct seq_file *m = file->private_data;
-   m->private = &iomem_resource;
-   }
-   return res;
-}
-
-static const struct file_operations proc_ioports_operations = {
-   .open   = ioports_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
-static const struct file_operations proc_iomem_operations = {
-   .open   = iomem_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
 static int __init ioresources_init(void)
 {
-   proc_create("ioports", 0, NULL, &proc_ioports_operations);
-   proc_create("iomem", 0, NULL, &proc_iomem_operations);
+   proc_create_seq_data("ioports", 0, NULL, &resource_op,
+   &ioport_resource);
+   proc_create_seq_data("iomem", 0, NULL, &resource_op, &iomem_resource);
return 0;
 }
 __initcall(ioresources_init);
-- 
2.17.0



[PATCH 26/42] staging/rtl8192u: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Unwind the registration loop into individual calls.  Switch to use
proc_create_single where applicable.

Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.

Signed-off-by: Christoph Hellwig 
---
 drivers/staging/rtl8192u/r8192U_core.c | 67 ++
 1 file changed, 14 insertions(+), 53 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_core.c b/drivers/staging/rtl8192u/r8192U_core.c
index d607c59761cf..7a0dbc0fa18e 100644
--- a/drivers/staging/rtl8192u/r8192U_core.c
+++ b/drivers/staging/rtl8192u/r8192U_core.c
@@ -646,64 +646,25 @@ static void rtl8192_proc_module_init(void)
rtl8192_proc = proc_mkdir(RTL819xU_MODULE_NAME, init_net.proc_net);
 }
 
-/*
- * seq_file wrappers for procfile show routines.
- */
-static int rtl8192_proc_open(struct inode *inode, struct file *file)
-{
-   struct net_device *dev = proc_get_parent_data(inode);
-   int (*show)(struct seq_file *, void *) = PDE_DATA(inode);
-
-   return single_open(file, show, dev);
-}
-
-static const struct file_operations rtl8192_proc_fops = {
-   .open   = rtl8192_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
-
-/*
- * Table of proc files we need to create.
- */
-struct rtl8192_proc_file {
-   char name[12];
-   int (*show)(struct seq_file *, void *);
-};
-
-static const struct rtl8192_proc_file rtl8192_proc_files[] = {
-   { "stats-rx",   &proc_get_stats_rx },
-   { "stats-tx",   &proc_get_stats_tx },
-   { "stats-ap",   &proc_get_stats_ap },
-   { "registers",  &proc_get_registers },
-   { "" }
-};
-
 static void rtl8192_proc_init_one(struct net_device *dev)
 {
-   const struct rtl8192_proc_file *f;
struct proc_dir_entry *dir;
 
-   if (rtl8192_proc) {
-   dir = proc_mkdir_data(dev->name, 0, rtl8192_proc, dev);
-   if (!dir) {
-   RT_TRACE(COMP_ERR,
-"Unable to initialize /proc/net/rtl8192/%s\n",
-dev->name);
-   return;
-   }
+   if (!rtl8192_proc)
+   return;
 
-   for (f = rtl8192_proc_files; f->name[0]; f++) {
-   if (!proc_create_data(f->name, S_IFREG | S_IRUGO, dir,
- &rtl8192_proc_fops, f->show)) {
-   RT_TRACE(COMP_ERR,
-"Unable to initialize /proc/net/rtl8192/%s/%s\n",
-dev->name, f->name);
-   return;
-   }
-   }
-   }
+   dir = proc_mkdir_data(dev->name, 0, rtl8192_proc, dev);
+   if (!dir)
+   return;
+
+   proc_create_single("stats-rx", S_IFREG | S_IRUGO, dir,
+   proc_get_stats_rx);
+   proc_create_single("stats-tx", S_IFREG | S_IRUGO, dir,
+   proc_get_stats_tx);
+   proc_create_single("stats-ap", S_IFREG | S_IRUGO, dir,
+   proc_get_stats_ap);
+   proc_create_single("registers", S_IFREG | S_IRUGO, dir,
+   proc_get_registers);
 }
 
 static void rtl8192_proc_remove_one(struct net_device *dev)
-- 
2.17.0



Re: [PATCH net-next 2/2] pfifo_fast: drop unneeded additional lock on dequeue

2018-05-16 Thread Paolo Abeni
On Wed, 2018-05-16 at 09:56 +0200, Paolo Abeni wrote:
> On Tue, 2018-05-15 at 23:17 +0300, Michael S. Tsirkin wrote:
> > On Tue, May 15, 2018 at 04:24:37PM +0200, Paolo Abeni wrote:
> > > After the previous patch, for NOLOCK qdiscs, q->seqlock is
> > > always held when dequeue() is invoked, so we can drop
> > > any additional locking to protect that operation.
> > > 
> > > Signed-off-by: Paolo Abeni 
> > > ---
> > >  include/linux/skb_array.h | 5 +
> > >  net/sched/sch_generic.c   | 4 ++--
> > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > Is the seqlock taken during qdisc_change_tx_queue_len?
> > We need to prevent that racing with dequeue.
> 
> Thanks for the heads-up! I missed that code-path.
> 
> I'll add the lock in qdisc_change_tx_queue_len() in v2.

Actually the lock is not needed in qdisc_change_tx_queue_len(): the
device is deactivated before calling ops->change_tx_queue_len, so the
latter can't race with ops->dequeue().

I think the current patch is safe.

Cheers,

Paolo


[PATCH 31/42] hostap: switch to proc_create_{seq,single}_data

2018-05-16 Thread Christoph Hellwig
And use proc private data directly instead of doing a detour
through seq->private.

Signed-off-by: Christoph Hellwig 
---
 .../net/wireless/intersil/hostap/hostap_ap.c  |  70 ++---
 .../net/wireless/intersil/hostap/hostap_hw.c  |  17 +--
 .../wireless/intersil/hostap/hostap_proc.c| 143 +++---
 3 files changed, 39 insertions(+), 191 deletions(-)

diff --git a/drivers/net/wireless/intersil/hostap/hostap_ap.c b/drivers/net/wireless/intersil/hostap/hostap_ap.c
index 4f76f81dd3af..d1884b8913e7 100644
--- a/drivers/net/wireless/intersil/hostap/hostap_ap.c
+++ b/drivers/net/wireless/intersil/hostap/hostap_ap.c
@@ -69,7 +69,7 @@ static void prism2_send_mgmt(struct net_device *dev,
 #ifndef PRISM2_NO_PROCFS_DEBUG
 static int ap_debug_proc_show(struct seq_file *m, void *v)
 {
-   struct ap_data *ap = m->private;
+   struct ap_data *ap = PDE_DATA(file_inode(m->file));
 
seq_printf(m, "BridgedUnicastFrames=%u\n", ap->bridged_unicast);
seq_printf(m, "BridgedMulticastFrames=%u\n", ap->bridged_multicast);
@@ -81,18 +81,6 @@ static int ap_debug_proc_show(struct seq_file *m, void *v)
seq_printf(m, "tx_drop_nonassoc=%u\n", ap->tx_drop_nonassoc);
return 0;
 }
-
-static int ap_debug_proc_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, ap_debug_proc_show, PDE_DATA(inode));
-}
-
-static const struct file_operations ap_debug_proc_fops = {
-   .open   = ap_debug_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
 #endif /* PRISM2_NO_PROCFS_DEBUG */
 
 
@@ -333,7 +321,7 @@ void hostap_deauth_all_stas(struct net_device *dev, struct ap_data *ap,
 
 static int ap_control_proc_show(struct seq_file *m, void *v)
 {
-   struct ap_data *ap = m->private;
+   struct ap_data *ap = PDE_DATA(file_inode(m->file));
char *policy_txt;
struct mac_entry *entry;
 
@@ -365,20 +353,20 @@ static int ap_control_proc_show(struct seq_file *m, void *v)
 
 static void *ap_control_proc_start(struct seq_file *m, loff_t *_pos)
 {
-   struct ap_data *ap = m->private;
+   struct ap_data *ap = PDE_DATA(file_inode(m->file));
spin_lock_bh(&ap->mac_restrictions.lock);
return seq_list_start_head(&ap->mac_restrictions.mac_list, *_pos);
 }
 
 static void *ap_control_proc_next(struct seq_file *m, void *v, loff_t *_pos)
 {
-   struct ap_data *ap = m->private;
+   struct ap_data *ap = PDE_DATA(file_inode(m->file));
return seq_list_next(v, &ap->mac_restrictions.mac_list, _pos);
 }
 
 static void ap_control_proc_stop(struct seq_file *m, void *v)
 {
-   struct ap_data *ap = m->private;
+   struct ap_data *ap = PDE_DATA(file_inode(m->file));
spin_unlock_bh(&ap->mac_restrictions.lock);
 }
 
@@ -389,24 +377,6 @@ static const struct seq_operations ap_control_proc_seqops = {
.show   = ap_control_proc_show,
 };
 
-static int ap_control_proc_open(struct inode *inode, struct file *file)
-{
-   int ret = seq_open(file, &ap_control_proc_seqops);
-   if (ret == 0) {
-   struct seq_file *m = file->private_data;
-   m->private = PDE_DATA(inode);
-   }
-   return ret;
-}
-
-static const struct file_operations ap_control_proc_fops = {
-   .open   = ap_control_proc_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
-
 int ap_control_add_mac(struct mac_restrictions *mac_restrictions, u8 *mac)
 {
struct mac_entry *entry;
@@ -585,20 +555,20 @@ static int prism2_ap_proc_show(struct seq_file *m, void *v)
 
 static void *prism2_ap_proc_start(struct seq_file *m, loff_t *_pos)
 {
-   struct ap_data *ap = m->private;
+   struct ap_data *ap = PDE_DATA(file_inode(m->file));
spin_lock_bh(&ap->sta_table_lock);
return seq_list_start_head(&ap->sta_list, *_pos);
 }
 
 static void *prism2_ap_proc_next(struct seq_file *m, void *v, loff_t *_pos)
 {
-   struct ap_data *ap = m->private;
+   struct ap_data *ap = PDE_DATA(file_inode(m->file));
return seq_list_next(v, &ap->sta_list, _pos);
 }
 
 static void prism2_ap_proc_stop(struct seq_file *m, void *v)
 {
-   struct ap_data *ap = m->private;
+   struct ap_data *ap = PDE_DATA(file_inode(m->file));
spin_unlock_bh(&ap->sta_table_lock);
 }
 
@@ -608,23 +578,6 @@ static const struct seq_operations prism2_ap_proc_seqops = {
.stop   = prism2_ap_proc_stop,
.show   = prism2_ap_proc_show,
 };
-
-static int prism2_ap_proc_open(struct inode *inode, struct file *file)
-{
-   int ret = seq_open(file, &prism2_ap_proc_seqops);
-   if (ret == 0) {
-   struct seq_file *m = file->private_data;
-   m->private = PDE_DATA(inode);
-   }
-   return ret;
-}
-
-static const struct file_operations prism2_ap_proc_fops = {
-   .open   = prism2_ap_proc_open,
-   

[PATCH 24/42] ext4: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig 
---
 fs/ext4/ext4.h|  2 +-
 fs/ext4/mballoc.c | 29 
 fs/ext4/sysfs.c   | 49 +--
 3 files changed, 14 insertions(+), 66 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a42e71203e53..229ea4da6785 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2390,7 +2390,7 @@ extern int ext4_init_inode_table(struct super_block *sb,
 extern void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate);
 
 /* mballoc.c */
-extern const struct file_operations ext4_seq_mb_groups_fops;
+extern const struct seq_operations ext4_mb_seq_groups_ops;
 extern long ext4_mb_stats;
 extern long ext4_mb_max_to_scan;
 extern int ext4_mb_init(struct super_block *);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 769a62708b1c..6884e81c1465 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2254,7 +2254,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 
 static void *ext4_mb_seq_groups_start(struct seq_file *seq, loff_t *pos)
 {
-   struct super_block *sb = seq->private;
+   struct super_block *sb = PDE_DATA(file_inode(seq->file));
ext4_group_t group;
 
if (*pos < 0 || *pos >= ext4_get_groups_count(sb))
@@ -2265,7 +2265,7 @@ static void *ext4_mb_seq_groups_start(struct seq_file *seq, loff_t *pos)
 
 static void *ext4_mb_seq_groups_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-   struct super_block *sb = seq->private;
+   struct super_block *sb = PDE_DATA(file_inode(seq->file));
ext4_group_t group;
 
++*pos;
@@ -2277,7 +2277,7 @@ static void *ext4_mb_seq_groups_next(struct seq_file *seq, void *v, loff_t *pos)
 
 static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v)
 {
-   struct super_block *sb = seq->private;
+   struct super_block *sb = PDE_DATA(file_inode(seq->file));
ext4_group_t group = (ext4_group_t) ((unsigned long) v);
int i;
int err, buddy_loaded = 0;
@@ -2330,34 +2330,13 @@ static void ext4_mb_seq_groups_stop(struct seq_file *seq, void *v)
 {
 }
 
-static const struct seq_operations ext4_mb_seq_groups_ops = {
+const struct seq_operations ext4_mb_seq_groups_ops = {
.start  = ext4_mb_seq_groups_start,
.next   = ext4_mb_seq_groups_next,
.stop   = ext4_mb_seq_groups_stop,
.show   = ext4_mb_seq_groups_show,
 };
 
-static int ext4_mb_seq_groups_open(struct inode *inode, struct file *file)
-{
-   struct super_block *sb = PDE_DATA(inode);
-   int rc;
-
-   rc = seq_open(file, &ext4_mb_seq_groups_ops);
-   if (rc == 0) {
-   struct seq_file *m = file->private_data;
-   m->private = sb;
-   }
-   return rc;
-
-}
-
-const struct file_operations ext4_seq_mb_groups_fops = {
-   .open   = ext4_mb_seq_groups_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
 static struct kmem_cache *get_groupinfo_cache(int blocksize_bits)
 {
int cache_index = blocksize_bits - EXT4_MIN_BLOCK_LOG_SIZE;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 9ebd26c957c2..f34da0bb8f17 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -346,39 +346,9 @@ static struct kobject *ext4_root;
 
 static struct kobject *ext4_feat;
 
-#define PROC_FILE_SHOW_DEFN(name) \
-static int name##_open(struct inode *inode, struct file *file) \
-{ \
-   return single_open(file, ext4_seq_##name##_show, PDE_DATA(inode)); \
-} \
-\
-static const struct file_operations ext4_seq_##name##_fops = { \
-   .open   = name##_open, \
-   .read   = seq_read, \
-   .llseek = seq_lseek, \
-   .release= single_release, \
-}
-
-#define PROC_FILE_LIST(name) \
-   { __stringify(name), &ext4_seq_##name##_fops }
-
-PROC_FILE_SHOW_DEFN(es_shrinker_info);
-PROC_FILE_SHOW_DEFN(options);
-
-static const struct ext4_proc_files {
-   const char *name;
-   const struct file_operations *fops;
-} proc_files[] = {
-   PROC_FILE_LIST(options),
-   PROC_FILE_LIST(es_shrinker_info),
-   PROC_FILE_LIST(mb_groups),
-   { NULL, NULL },
-};
-
 int ext4_register_sysfs(struct super_block *sb)
 {
struct ext4_sb_info *sbi = EXT4_SB(sb);
-   const struct ext4_proc_files *p;
int err;
 
init_completion(&sbi->s_kobj_unregister);
@@ -392,11 +362,14 @@ int ext4_register_sysfs(struct super_block *sb)
 
if (ext4_proc_root)
sbi->s_proc = proc_mkdir(sb->s_id, ext4_proc_root);
-
if (sbi->s_proc) {
-   for (p = proc_files; p->name; p++)
-   proc_create_data(p->name, S_IRUGO, sbi->s_proc,
-p->fops, sb);
+   proc_cre

Re: [PATCH 41/42] tty: replace ->proc_fops with ->proc_show

2018-05-16 Thread Greg Kroah-Hartman
On Wed, May 16, 2018 at 11:43:45AM +0200, Christoph Hellwig wrote:
> Just set up the show callback in the tty_operations, and use
> proc_create_single_data to create the file without additional
> boilerplate code.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Greg Kroah-Hartman 


Re: [PATCH 14/14] net: sched: implement delete for all actions

2018-05-16 Thread Vlad Buslov

On Wed 16 May 2018 at 09:48, Jiri Pirko  wrote:
> Mon, May 14, 2018 at 04:27:15PM CEST, vla...@mellanox.com wrote:
>>Implement delete function that is required to delete actions without
>>holding rtnl lock. Use action API function that atomically deletes action
>>only if it is still in action idr. This implementation prevents concurrent
>>threads from deleting same action twice.
>>
>>Signed-off-by: Vlad Buslov 
>>---
>> net/sched/act_bpf.c|  8 
>> net/sched/act_connmark.c   |  8 
>> net/sched/act_csum.c   |  8 
>> net/sched/act_gact.c   |  8 
>> net/sched/act_ife.c|  8 
>> net/sched/act_ipt.c| 16 
>> net/sched/act_mirred.c |  8 
>> net/sched/act_nat.c|  8 
>> net/sched/act_pedit.c  |  8 
>> net/sched/act_police.c |  8 
>> net/sched/act_sample.c |  8 
>> net/sched/act_simple.c |  8 
>> net/sched/act_skbedit.c|  8 
>> net/sched/act_skbmod.c |  8 
>> net/sched/act_tunnel_key.c |  8 
>> net/sched/act_vlan.c   |  8 
>> 16 files changed, 136 insertions(+)
>>
>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>index 0bf4ecf..36f7f66 100644
>>--- a/net/sched/act_bpf.c
>>+++ b/net/sched/act_bpf.c
>>@@ -394,6 +394,13 @@ static int tcf_bpf_search(struct net *net, struct 
>>tc_action **a, u32 index,
>>  return tcf_idr_search(tn, a, index);
>> }
>> 
>>+static int tcf_bpf_delete(struct net *net, u32 index)
>>+{
>>+ struct tc_action_net *tn = net_generic(net, bpf_net_id);
>>+
>>+ return tcf_idr_find_delete(tn, index);
>>+}
>>+
>> static struct tc_action_ops act_bpf_ops __read_mostly = {
>>  .kind   =   "bpf",
>>  .type   =   TCA_ACT_BPF,
>>@@ -404,6 +411,7 @@ static struct tc_action_ops act_bpf_ops __read_mostly = {
>>  .init   =   tcf_bpf_init,
>>  .walk   =   tcf_bpf_walker,
>>  .lookup =   tcf_bpf_search,
>>+ .delete =   tcf_bpf_delete,
>
> I wonder, right before this patch, how the idr index got removed?
> delete op is NULL and I didn't find anyone else to do it.
>
> Also, after this patch, does it make sense to have following check in
> tcf_action_del_1()?
>
>if (ops->delete)
>  err = ops->delete(net, index);
>
> Looks like ops->delete is non-null for all.
>
> Seems to me that you need to introduce this patch filling up the delete
> op in all acts and only after that introduce a code that actually calls
> it.

Already moved this for V2 patchset to:
  - Add delete callback to ops and implement it for all actions in single
  patch.
  - Move this patch before delete first use.

Will now remove the conditional as well.

>
> [...]
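
The agreed outcome (every action fills in ->delete, so the caller drops its NULL check) can be sketched in userspace as below. This is only an illustration under stated assumptions: mock_present, mock_find_delete, mock_action_ops and mock_action_del are hypothetical stand-ins, not the kernel's tcf_idr_find_delete() or tcf_action_del_1(), and the -ENOENT return for a missing index is assumed. The mock is single-threaded, so it only shows that a second delete of the same index fails. It does not model the locking that makes the kernel version atomic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-action idr: which indices exist. */
#define MOCK_SLOTS 8
static bool mock_present[MOCK_SLOTS];

/* Hypothetical, trimmed-down ops table with only the delete callback. */
struct mock_action_ops {
	const char *kind;
	int (*delete)(unsigned int index);
};

/*
 * Mock find-and-delete: removes the index only if it is still present.
 * Returning -ENOENT for a missing index is an assumption of this sketch.
 */
static int mock_find_delete(unsigned int index)
{
	if (index >= MOCK_SLOTS || !mock_present[index])
		return -ENOENT;
	mock_present[index] = false;
	return 0;
}

static const struct mock_action_ops mock_bpf_ops = {
	.kind   = "bpf",
	.delete = mock_find_delete,
};

/* Every ops table sets ->delete, so no "if (ops->delete)" test is needed. */
static int mock_action_del(const struct mock_action_ops *ops,
			   unsigned int index)
{
	return ops->delete(index);
}
```

With index 5 marked present, the first mock_action_del(&mock_bpf_ops, 5) returns 0 and a second call returns -ENOENT.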



RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Hayes Wang
Oliver Neukum [mailto:oneu...@suse.com]
> Sent: Wednesday, May 16, 2018 4:27 PM
[...]
> >
> > Would usb_autopm_get_interface() take a long time?
> > The driver would wake the device if it has suspended.
> > I have no idea about how usb_autopm_get_interface() works, so I don't know
> how to help.
> 
> Hi,
> 
> it basically calls r8152_resume() and makes a control request to the
> hub. I think we are spinning in rtl8152_runtime_resume(), but where?
> It has a lot of NAPI stuff. Any suggestions on how to instrument or
> trace this?

Is rtl8152_runtime_resume() called? I don't see the name in the trace.

I guess the related APIs in rtl8152_runtime_resume() are
ops->disable= rtl8153_disable;
ops->autosuspend_en = rtl8153_runtime_enable;

And I don't find any possible deadlock in rtl8152_runtime_resume().

Besides, I found a similar issue, as follows:
https://www.spinics.net/lists/netdev/msg493512.html


Best Regards,
Hayes





Re: [PATCH 12/14] net: sched: retry action check-insert on concurrent modification

2018-05-16 Thread Jiri Pirko
Mon, May 14, 2018 at 04:27:13PM CEST, vla...@mellanox.com wrote:
>Retry check-insert sequence in action init functions if action with same
>index was inserted concurrently.
>
>Signed-off-by: Vlad Buslov 
>---
> net/sched/act_bpf.c| 8 +++-
> net/sched/act_connmark.c   | 8 +++-
> net/sched/act_csum.c   | 8 +++-
> net/sched/act_gact.c   | 8 +++-
> net/sched/act_ife.c| 8 +++-
> net/sched/act_ipt.c| 8 +++-
> net/sched/act_mirred.c | 8 +++-
> net/sched/act_nat.c| 8 +++-
> net/sched/act_pedit.c  | 8 +++-
> net/sched/act_police.c | 9 -
> net/sched/act_sample.c | 8 +++-
> net/sched/act_simple.c | 9 -
> net/sched/act_skbedit.c| 8 +++-
> net/sched/act_skbmod.c | 8 +++-
> net/sched/act_tunnel_key.c | 9 -
> net/sched/act_vlan.c   | 9 -
> 16 files changed, 116 insertions(+), 16 deletions(-)
>
>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>index 5554bf7..7e20fdc 100644
>--- a/net/sched/act_bpf.c
>+++ b/net/sched/act_bpf.c
>@@ -299,10 +299,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr 
>*nla,
> 
>   parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
> 
>+replay:
>   if (!tcf_idr_check(tn, parm->index, act, bind)) {
>   ret = tcf_idr_create(tn, parm->index, est, act,
>&act_bpf_ops, bind, true);
>-  if (ret < 0)
>+  /* Action with specified index was created concurrently.
>+   * Check again.
>+   */
>+  if (parm->index && ret == -ENOSPC)
>+  goto replay;
>+  else if (ret)

Hmm, looks like you are doing the same/very similar thing in every act
code. I think it would make sense to introduce a helper function for
this purpose.

[...]
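
A helper of the kind suggested above might look roughly like the following userspace sketch. Everything here is hypothetical: mock_idr_check, mock_idr_create and mock_idr_check_or_create are illustrative names, not functions from the patchset, and the mock table stands in for the real idr. mock_race_index is a test hook that simulates another thread inserting the same index between the check and the create.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the action idr; valid indices are 1..7. */
#define MOCK_SLOTS 8
static bool mock_used[MOCK_SLOTS];

/* Test hook: this index is "inserted concurrently" once during create. */
static int mock_race_index = -1;

/* Returns true if an action with this index already exists. */
static bool mock_idr_check(unsigned int index)
{
	return index > 0 && index < MOCK_SLOTS && mock_used[index];
}

/* Returns 0 on success, -ENOSPC if the index is taken, -EINVAL if invalid. */
static int mock_idr_create(unsigned int index)
{
	if (index == 0 || index >= MOCK_SLOTS)
		return -EINVAL;
	if (mock_race_index == (int)index) {
		/* Simulate the other thread winning the race exactly once. */
		mock_used[index] = true;
		mock_race_index = -1;
		return -ENOSPC;
	}
	if (mock_used[index])
		return -ENOSPC;
	mock_used[index] = true;
	return 0;
}

/*
 * Hypothetical helper wrapping the check-insert sequence.
 * Returns 1 if the action already existed, 0 if it was created,
 * or a negative errno on failure.
 */
static int mock_idr_check_or_create(unsigned int index)
{
	int ret;

	for (;;) {
		if (mock_idr_check(index))
			return 1;
		ret = mock_idr_create(index);
		/* Index was taken concurrently: check again. */
		if (index && ret == -ENOSPC)
			continue;
		return ret;
	}
}
```

Each action's init function would then make a single call instead of repeating the replay label and the -ENOSPC test.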


[PATCH 22/42] sg: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sg.c | 124 +-
 1 file changed, 12 insertions(+), 112 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index c198b96368dd..8ff687158704 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -66,7 +66,6 @@ static int sg_version_num = 30536;/* 2 digits for each 
component */
 static char *sg_version_date = "20140603";
 
 static int sg_proc_init(void);
-static void sg_proc_cleanup(void);
 #endif
 
 #define SG_ALLOW_DIO_DEF 0
@@ -1661,7 +1660,7 @@ static void __exit
 exit_sg(void)
 {
 #ifdef CONFIG_SCSI_PROC_FS
-   sg_proc_cleanup();
+   remove_proc_subtree("scsi/sg", NULL);
 #endif /* CONFIG_SCSI_PROC_FS */
scsi_unregister_interface(&sg_interface);
class_destroy(sg_sysfs_class);
@@ -2274,11 +2273,6 @@ sg_get_dev(int dev)
 }
 
 #ifdef CONFIG_SCSI_PROC_FS
-
-static struct proc_dir_entry *sg_proc_sgp = NULL;
-
-static char sg_proc_sg_dirname[] = "scsi/sg";
-
 static int sg_proc_seq_show_int(struct seq_file *s, void *v);
 
 static int sg_proc_single_open_adio(struct inode *inode, struct file *file);
@@ -2306,37 +2300,11 @@ static const struct file_operations dressz_fops = {
 };
 
 static int sg_proc_seq_show_version(struct seq_file *s, void *v);
-static int sg_proc_single_open_version(struct inode *inode, struct file *file);
-static const struct file_operations version_fops = {
-   .owner = THIS_MODULE,
-   .open = sg_proc_single_open_version,
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = single_release,
-};
-
 static int sg_proc_seq_show_devhdr(struct seq_file *s, void *v);
-static int sg_proc_single_open_devhdr(struct inode *inode, struct file *file);
-static const struct file_operations devhdr_fops = {
-   .owner = THIS_MODULE,
-   .open = sg_proc_single_open_devhdr,
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = single_release,
-};
-
 static int sg_proc_seq_show_dev(struct seq_file *s, void *v);
-static int sg_proc_open_dev(struct inode *inode, struct file *file);
 static void * dev_seq_start(struct seq_file *s, loff_t *pos);
 static void * dev_seq_next(struct seq_file *s, void *v, loff_t *pos);
 static void dev_seq_stop(struct seq_file *s, void *v);
-static const struct file_operations dev_fops = {
-   .owner = THIS_MODULE,
-   .open = sg_proc_open_dev,
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = seq_release,
-};
 static const struct seq_operations dev_seq_ops = {
.start = dev_seq_start,
.next  = dev_seq_next,
@@ -2345,14 +2313,6 @@ static const struct seq_operations dev_seq_ops = {
 };
 
 static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v);
-static int sg_proc_open_devstrs(struct inode *inode, struct file *file);
-static const struct file_operations devstrs_fops = {
-   .owner = THIS_MODULE,
-   .open = sg_proc_open_devstrs,
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = seq_release,
-};
 static const struct seq_operations devstrs_seq_ops = {
.start = dev_seq_start,
.next  = dev_seq_next,
@@ -2361,14 +2321,6 @@ static const struct seq_operations devstrs_seq_ops = {
 };
 
 static int sg_proc_seq_show_debug(struct seq_file *s, void *v);
-static int sg_proc_open_debug(struct inode *inode, struct file *file);
-static const struct file_operations debug_fops = {
-   .owner = THIS_MODULE,
-   .open = sg_proc_open_debug,
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = seq_release,
-};
 static const struct seq_operations debug_seq_ops = {
.start = dev_seq_start,
.next  = dev_seq_next,
@@ -2376,50 +2328,23 @@ static const struct seq_operations debug_seq_ops = {
.show  = sg_proc_seq_show_debug,
 };
 
-
-struct sg_proc_leaf {
-   const char * name;
-   const struct file_operations * fops;
-};
-
-static const struct sg_proc_leaf sg_proc_leaf_arr[] = {
-   {"allow_dio", &adio_fops},
-   {"debug", &debug_fops},
-   {"def_reserved_size", &dressz_fops},
-   {"device_hdr", &devhdr_fops},
-   {"devices", &dev_fops},
-   {"device_strs", &devstrs_fops},
-   {"version", &version_fops}
-};
-
 static int
 sg_proc_init(void)
 {
-   int num_leaves = ARRAY_SIZE(sg_proc_leaf_arr);
-   int k;
+   struct proc_dir_entry *p;
 
-   sg_proc_sgp = proc_mkdir(sg_proc_sg_dirname, NULL);
-   if (!sg_proc_sgp)
+   p = proc_mkdir("scsi/sg", NULL);
+   if (!p)
return 1;
-   for (k = 0; k < num_leaves; ++k) {
-   const str

Re: [PATCH 11/14] net: core: add new/replace rate estimator lock parameter

2018-05-16 Thread Vlad Buslov

On Wed 16 May 2018 at 09:54, Jiri Pirko  wrote:
> Mon, May 14, 2018 at 04:27:12PM CEST, vla...@mellanox.com wrote:
>>Extend rate estimator new and replace APIs with additional spinlock
>>parameter used by lockless actions to protect rate_est pointer from
>>concurrent modification.
>>
>>Signed-off-by: Vlad Buslov 
>
> [...]
>
>
>> /**
>>  * gen_new_estimator - create a new rate estimator
>>  * @bstats: basic statistics
>>  * @cpu_bstats: bstats per cpu
>>  * @rate_est: rate estimator statistics
>>+ * @rate_est_lock: rate_est lock (might be NULL)
>
> I cannot find a place you actually use this new arg in this patchset.
> Did I miss it?

It is used by a specific action init function. However, that code was
moved to the next patchset due to the patchset size limit.

>
>
>>  * @stats_lock: statistics lock
>>  * @running: qdisc running seqcount
>>  * @opt: rate estimator configuration TLV
>
> [...]



[PATCH 18/42] proc: introduce proc_create_net_single

2018-05-16 Thread Christoph Hellwig
Variant of proc_create_data that directly takes a seq_file show
callback and deals with network namespaces in ->open and ->release.
All callers of proc_create + single_open_net are converted over, and
single_{open,release}_net are removed entirely.

Signed-off-by: Christoph Hellwig 
---
 fs/proc/proc_net.c |  49 -
 include/linux/proc_fs.h|   4 ++
 include/linux/seq_file_net.h   |   7 +-
 net/can/bcm.c  |  16 +
 net/can/proc.c | 127 ++---
 net/ipv4/fib_trie.c|  16 +
 net/ipv4/proc.c|  48 ++---
 net/ipv6/proc.c|  31 ++--
 net/ipv6/route.c   |  15 +---
 net/kcm/kcmproc.c  |  16 +
 net/netfilter/ipvs/ip_vs_ctl.c |  31 ++--
 net/sctp/proc.c|  17 +
 net/xfrm/xfrm_proc.c   |  16 +
 13 files changed, 86 insertions(+), 307 deletions(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index c99fd183f034..7d94fa005b0d 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -93,37 +93,50 @@ struct proc_dir_entry *proc_create_net_data(const char 
*name, umode_t mode,
 }
 EXPORT_SYMBOL_GPL(proc_create_net_data);
 
-int single_open_net(struct inode *inode, struct file *file,
-   int (*show)(struct seq_file *, void *))
+static int single_open_net(struct inode *inode, struct file *file)
 {
-   int err;
+   struct proc_dir_entry *de = PDE(inode);
struct net *net;
+   int err;
 
-   err = -ENXIO;
net = get_proc_net(inode);
-   if (net == NULL)
-   goto err_net;
-
-   err = single_open(file, show, net);
-   if (err < 0)
-   goto err_open;
-
-   return 0;
+   if (!net)
+   return -ENXIO;
 
-err_open:
-   put_net(net);
-err_net:
+   err = single_open(file, de->single_show, net);
+   if (err)
+   put_net(net);
return err;
 }
-EXPORT_SYMBOL_GPL(single_open_net);
 
-int single_release_net(struct inode *ino, struct file *f)
+static int single_release_net(struct inode *ino, struct file *f)
 {
struct seq_file *seq = f->private_data;
put_net(seq->private);
return single_release(ino, f);
 }
-EXPORT_SYMBOL_GPL(single_release_net);
+
+static const struct file_operations proc_net_single_fops = {
+   .open   = single_open_net,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release_net,
+};
+
+struct proc_dir_entry *proc_create_net_single(const char *name, umode_t mode,
+   struct proc_dir_entry *parent,
+   int (*show)(struct seq_file *, void *), void *data)
+{
+   struct proc_dir_entry *p;
+
+   p = proc_create_reg(name, mode, &parent, data);
+   if (!p)
+   return NULL;
+   p->proc_fops = &proc_net_single_fops;
+   p->single_show = show;
+   return proc_register(parent, p);
+}
+EXPORT_SYMBOL_GPL(proc_create_net_single);
 
 static struct net *get_proc_task_net(struct inode *dir)
 {
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 9dcde9644253..e518352137e7 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -58,6 +58,9 @@ struct proc_dir_entry *proc_create_net_data(const char *name, 
umode_t mode,
unsigned int state_size, void *data);
 #define proc_create_net(name, mode, parent, state_size, ops) \
proc_create_net_data(name, mode, parent, state_size, ops, NULL)
+struct proc_dir_entry *proc_create_net_single(const char *name, umode_t mode,
+   struct proc_dir_entry *parent,
+   int (*show)(struct seq_file *, void *), void *data);
 
 #else /* CONFIG_PROC_FS */
 
@@ -97,6 +100,7 @@ static inline int remove_proc_subtree(const char *name, 
struct proc_dir_entry *p
 
 #define proc_create_net_data(name, mode, parent, ops, state_size, data) 
({NULL;})
 #define proc_create_net(name, mode, parent, state_size, ops) ({NULL;})
+#define proc_create_net_single(name, mode, parent, show, data) ({NULL;})
 
 #endif /* CONFIG_PROC_FS */
 
diff --git a/include/linux/seq_file_net.h b/include/linux/seq_file_net.h
index 5ea18a16291a..0fdbe1ddd8d1 100644
--- a/include/linux/seq_file_net.h
+++ b/include/linux/seq_file_net.h
@@ -13,9 +13,6 @@ struct seq_net_private {
 #endif
 };
 
-int single_open_net(struct inode *, struct file *file,
-   int (*show)(struct seq_file *, void *));
-int single_release_net(struct inode *, struct file *);
 static inline struct net *seq_file_net(struct seq_file *seq)
 {
 #ifdef CONFIG_NET_NS
@@ -26,8 +23,8 @@ static inline struct net *seq_file_net(struct seq_file *seq)
 }
 
 /*
- * This one is needed for single_open_net since net is stored directly in
- * private not as a struct i.e. seq_file_net can't be used.
+ * This one is needed for proc_create_net_single since net is stored directly
+ * in private not as a struct i.e. seq_file_net can

[PATCH 20/42] sgi-gru: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig 
---
 drivers/misc/sgi-gru/gruprocfs.c | 81 ++--
 1 file changed, 14 insertions(+), 67 deletions(-)

diff --git a/drivers/misc/sgi-gru/gruprocfs.c b/drivers/misc/sgi-gru/gruprocfs.c
index 4f7635922394..42ea2eccaee9 100644
--- a/drivers/misc/sgi-gru/gruprocfs.c
+++ b/drivers/misc/sgi-gru/gruprocfs.c
@@ -270,16 +270,6 @@ static int options_open(struct inode *inode, struct file 
*file)
return single_open(file, options_show, NULL);
 }
 
-static int cch_open(struct inode *inode, struct file *file)
-{
-   return seq_open(file, &cch_seq_ops);
-}
-
-static int gru_open(struct inode *inode, struct file *file)
-{
-   return seq_open(file, &gru_seq_ops);
-}
-
 /* *INDENT-OFF* */
 static const struct file_operations statistics_fops = {
.open   = statistics_open,
@@ -305,73 +295,30 @@ static const struct file_operations options_fops = {
.release= single_release,
 };
 
-static const struct file_operations cch_fops = {
-   .open   = cch_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-static const struct file_operations gru_fops = {
-   .open   = gru_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release,
-};
-
-static struct proc_entry {
-   char *name;
-   umode_t mode;
-   const struct file_operations *fops;
-   struct proc_dir_entry *entry;
-} proc_files[] = {
-   {"statistics", 0644, &statistics_fops},
-   {"mcs_statistics", 0644, &mcs_statistics_fops},
-   {"debug_options", 0644, &options_fops},
-   {"cch_status", 0444, &cch_fops},
-   {"gru_status", 0444, &gru_fops},
-   {NULL}
-};
-/* *INDENT-ON* */
-
 static struct proc_dir_entry *proc_gru __read_mostly;
 
-static int create_proc_file(struct proc_entry *p)
-{
-   p->entry = proc_create(p->name, p->mode, proc_gru, p->fops);
-   if (!p->entry)
-   return -1;
-   return 0;
-}
-
-static void delete_proc_files(void)
-{
-   struct proc_entry *p;
-
-   if (proc_gru) {
-   for (p = proc_files; p->name; p++)
-   if (p->entry)
-   remove_proc_entry(p->name, proc_gru);
-   proc_remove(proc_gru);
-   }
-}
-
 int gru_proc_init(void)
 {
-   struct proc_entry *p;
-
proc_gru = proc_mkdir("sgi_uv/gru", NULL);
-
-   for (p = proc_files; p->name; p++)
-   if (create_proc_file(p))
-   goto err;
+   if (!proc_gru)
+   return -1;
+   if (!proc_create("statistics", 0644, proc_gru, &statistics_fops))
+   goto err;
+   if (!proc_create("mcs_statistics", 0644, proc_gru, 
&mcs_statistics_fops))
+   goto err;
+   if (!proc_create("debug_options", 0644, proc_gru, &options_fops))
+   goto err;
+   if (!proc_create_seq("cch_status", 0444, proc_gru, &cch_seq_ops))
+   goto err;
+   if (!proc_create_seq("gru_status", 0444, proc_gru, &gru_seq_ops))
+   goto err;
return 0;
-
 err:
-   delete_proc_files();
+   remove_proc_subtree("sgi_uv/gru", NULL);
return -1;
 }
 
 void gru_proc_exit(void)
 {
-   delete_proc_files();
+   remove_proc_subtree("sgi_uv/gru", NULL);
 }
-- 
2.17.0
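
The shape of the new gru_proc_init() (individual registration calls, one error label, and a single subtree removal for cleanup) can be tried out in userspace with the sketch below. All names are hypothetical mocks: mock_create and mock_remove_subtree only imitate the roles of proc_create() and remove_proc_subtree(), and mock_fail_name is a test hook that forces one registration to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the directory contents. */
#define MOCK_MAX 8
static const char *mock_files[MOCK_MAX];
static int mock_nfiles;

/* Test hook: creating a file with this name fails. */
static const char *mock_fail_name;

/* Registers one file; returns false on failure. */
static bool mock_create(const char *name)
{
	if (mock_nfiles == MOCK_MAX)
		return false;
	if (mock_fail_name && strcmp(name, mock_fail_name) == 0)
		return false;
	mock_files[mock_nfiles++] = name;
	return true;
}

/* Removes everything at once, however many files were registered. */
static void mock_remove_subtree(void)
{
	mock_nfiles = 0;
}

/* Unrolled registration: no table, no loop, no per-file bookkeeping. */
static int mock_proc_init(void)
{
	if (!mock_create("statistics"))
		goto err;
	if (!mock_create("debug_options"))
		goto err;
	if (!mock_create("cch_status"))
		goto err;
	return 0;
err:
	mock_remove_subtree();
	return -1;
}
```

Because cleanup removes the whole subtree, the error path does not need to know which registrations succeeded. The old delete_proc_files() tracked exactly that in p->entry.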



[PATCH 19/42] acpi/battery: simplify procfs code

2018-05-16 Thread Christoph Hellwig
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Rafael J. Wysocki 
---
 drivers/acpi/battery.c | 121 +
 1 file changed, 26 insertions(+), 95 deletions(-)

diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
index bdb24d636d9a..76550689ce10 100644
--- a/drivers/acpi/battery.c
+++ b/drivers/acpi/battery.c
@@ -81,14 +81,6 @@ MODULE_PARM_DESC(cache_time, "cache time in milliseconds");
 #ifdef CONFIG_ACPI_PROCFS_POWER
 extern struct proc_dir_entry *acpi_lock_battery_dir(void);
 extern void *acpi_unlock_battery_dir(struct proc_dir_entry *acpi_battery_dir);
-
-enum acpi_battery_files {
-   info_tag = 0,
-   state_tag,
-   alarm_tag,
-   ACPI_BATTERY_NUMFILES,
-};
-
 #endif
 
 static const struct acpi_device_id battery_device_ids[] = {
@@ -985,9 +977,10 @@ static const char *acpi_battery_units(const struct 
acpi_battery *battery)
"mA" : "mW";
 }
 
-static int acpi_battery_print_info(struct seq_file *seq, int result)
+static int acpi_battery_info_proc_show(struct seq_file *seq, void *offset)
 {
struct acpi_battery *battery = seq->private;
+   int result = acpi_battery_update(battery, false);
 
if (result)
goto end;
@@ -1041,9 +1034,10 @@ static int acpi_battery_print_info(struct seq_file *seq, 
int result)
return result;
 }
 
-static int acpi_battery_print_state(struct seq_file *seq, int result)
+static int acpi_battery_state_proc_show(struct seq_file *seq, void *offset)
 {
struct acpi_battery *battery = seq->private;
+   int result = acpi_battery_update(battery, false);
 
if (result)
goto end;
@@ -1088,9 +1082,10 @@ static int acpi_battery_print_state(struct seq_file 
*seq, int result)
return result;
 }
 
-static int acpi_battery_print_alarm(struct seq_file *seq, int result)
+static int acpi_battery_alarm_proc_show(struct seq_file *seq, void *offset)
 {
struct acpi_battery *battery = seq->private;
+   int result = acpi_battery_update(battery, false);
 
if (result)
goto end;
@@ -1142,82 +1137,22 @@ static ssize_t acpi_battery_write_alarm(struct file 
*file,
return result;
 }
 
-typedef int(*print_func)(struct seq_file *seq, int result);
-
-static print_func acpi_print_funcs[ACPI_BATTERY_NUMFILES] = {
-   acpi_battery_print_info,
-   acpi_battery_print_state,
-   acpi_battery_print_alarm,
-};
-
-static int acpi_battery_read(int fid, struct seq_file *seq)
+static int acpi_battery_alarm_proc_open(struct inode *inode, struct file *file)
 {
-   struct acpi_battery *battery = seq->private;
-   int result = acpi_battery_update(battery, false);
-   return acpi_print_funcs[fid](seq, result);
+   return single_open(file, acpi_battery_alarm_proc_show, PDE_DATA(inode));
 }
 
-#define DECLARE_FILE_FUNCTIONS(_name) \
-static int acpi_battery_read_##_name(struct seq_file *seq, void *offset) \
-{ \
-   return acpi_battery_read(_name##_tag, seq); \
-} \
-static int acpi_battery_##_name##_open_fs(struct inode *inode, struct file 
*file) \
-{ \
-   return single_open(file, acpi_battery_read_##_name, PDE_DATA(inode)); \
-}
-
-DECLARE_FILE_FUNCTIONS(info);
-DECLARE_FILE_FUNCTIONS(state);
-DECLARE_FILE_FUNCTIONS(alarm);
-
-#undef DECLARE_FILE_FUNCTIONS
-
-#define FILE_DESCRIPTION_RO(_name) \
-   { \
-   .name = __stringify(_name), \
-   .mode = S_IRUGO, \
-   .ops = { \
-   .open = acpi_battery_##_name##_open_fs, \
-   .read = seq_read, \
-   .llseek = seq_lseek, \
-   .release = single_release, \
-   .owner = THIS_MODULE, \
-   }, \
-   }
-
-#define FILE_DESCRIPTION_RW(_name) \
-   { \
-   .name = __stringify(_name), \
-   .mode = S_IFREG | S_IRUGO | S_IWUSR, \
-   .ops = { \
-   .open = acpi_battery_##_name##_open_fs, \
-   .read = seq_read, \
-   .llseek = seq_lseek, \
-   .write = acpi_battery_write_##_name, \
-   .release = single_release, \
-   .owner = THIS_MODULE, \
-   }, \
-   }
-
-static const struct battery_file {
-   struct file_operations ops;
-   umode_t mode;
-   const char *name;
-} acpi_battery_file[] = {
-   FILE_DESCRIPTION_RO(info),
-   FILE_DESCRIPTION_RO(state),
-   FILE_DESCRIPTION_RW(alarm),
+static const struct file_operations acpi_battery_alarm_fops = {
+   .owner  = THIS_MODULE,
+   .open   = acpi_battery_alarm_proc_open,
+   .read   = seq_read,
+   .write  = acpi_battery_write_alarm,
+   .llseek = seq_lseek,
+   .release= single_release,
 };
 
-#undef FILE_DESCRIPTION_RO
-#undef FILE_DESCRIPTION_RW
-
 static int

[PATCH 11/42] ipv{4,6}/ping: simplify proc file creation

2018-05-16 Thread Christoph Hellwig
Remove the pointless ping_seq_afinfo indirection and make the code look
like most other protocols.

Signed-off-by: Christoph Hellwig 
---
 include/net/ping.h | 11 --
 net/ipv4/ping.c| 50 +-
 net/ipv6/ping.c| 35 +---
 3 files changed, 37 insertions(+), 59 deletions(-)

diff --git a/include/net/ping.h b/include/net/ping.h
index 4cd90d6b5c25..fd080e043a6e 100644
--- a/include/net/ping.h
+++ b/include/net/ping.h
@@ -83,20 +83,9 @@ int  ping_queue_rcv_skb(struct sock *sk, struct sk_buff 
*skb);
 bool ping_rcv(struct sk_buff *skb);
 
 #ifdef CONFIG_PROC_FS
-struct ping_seq_afinfo {
-   char*name;
-   sa_family_t family;
-   const struct file_operations*seq_fops;
-   const struct seq_operations seq_ops;
-};
-
-extern const struct file_operations ping_seq_fops;
-
 void *ping_seq_start(struct seq_file *seq, loff_t *pos, sa_family_t family);
 void *ping_seq_next(struct seq_file *seq, void *v, loff_t *pos);
 void ping_seq_stop(struct seq_file *seq, void *v);
-int ping_proc_register(struct net *net, struct ping_seq_afinfo *afinfo);
-void ping_proc_unregister(struct net *net, struct ping_seq_afinfo *afinfo);
 
 int __init ping_proc_init(void);
 void ping_proc_exit(void);
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 56a010622f70..4d21c24dba78 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -1150,58 +1150,36 @@ static int ping_v4_seq_show(struct seq_file *seq, void 
*v)
return 0;
 }
 
-static int ping_seq_open(struct inode *inode, struct file *file)
+static const struct seq_operations ping_v4_seq_ops = {
+   .start  = ping_v4_seq_start,
+   .show   = ping_v4_seq_show,
+   .next   = ping_seq_next,
+   .stop   = ping_seq_stop,
+};
+
+static int ping_v4_seq_open(struct inode *inode, struct file *file)
 {
-   struct ping_seq_afinfo *afinfo = PDE_DATA(inode);
-   return seq_open_net(inode, file, &afinfo->seq_ops,
+   return seq_open_net(inode, file, &ping_v4_seq_ops,
   sizeof(struct ping_iter_state));
 }
 
-const struct file_operations ping_seq_fops = {
-   .open   = ping_seq_open,
+const struct file_operations ping_v4_seq_fops = {
+   .open   = ping_v4_seq_open,
.read   = seq_read,
.llseek = seq_lseek,
.release= seq_release_net,
 };
-EXPORT_SYMBOL_GPL(ping_seq_fops);
-
-static struct ping_seq_afinfo ping_v4_seq_afinfo = {
-   .name   = "icmp",
-   .family = AF_INET,
-   .seq_fops   = &ping_seq_fops,
-   .seq_ops= {
-   .start  = ping_v4_seq_start,
-   .show   = ping_v4_seq_show,
-   .next   = ping_seq_next,
-   .stop   = ping_seq_stop,
-   },
-};
 
-int ping_proc_register(struct net *net, struct ping_seq_afinfo *afinfo)
+static int __net_init ping_v4_proc_init_net(struct net *net)
 {
-   struct proc_dir_entry *p;
-   p = proc_create_data(afinfo->name, 0444, net->proc_net,
-afinfo->seq_fops, afinfo);
-   if (!p)
+   if (!proc_create("icmp", 0444, net->proc_net, &ping_v4_seq_fops))
return -ENOMEM;
return 0;
 }
-EXPORT_SYMBOL_GPL(ping_proc_register);
-
-void ping_proc_unregister(struct net *net, struct ping_seq_afinfo *afinfo)
-{
-   remove_proc_entry(afinfo->name, net->proc_net);
-}
-EXPORT_SYMBOL_GPL(ping_proc_unregister);
-
-static int __net_init ping_v4_proc_init_net(struct net *net)
-{
-   return ping_proc_register(net, &ping_v4_seq_afinfo);
-}
 
 static void __net_exit ping_v4_proc_exit_net(struct net *net)
 {
-   ping_proc_unregister(net, &ping_v4_seq_afinfo);
+   remove_proc_entry("icmp", net->proc_net);
 }
 
 static struct pernet_operations ping_v4_net_ops = {
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index 746eeae7f581..45d5c8e0f2bf 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Compatibility glue so we can support IPv6 when it's compiled as a module */
@@ -215,26 +216,36 @@ static int ping_v6_seq_show(struct seq_file *seq, void *v)
return 0;
 }
 
-static struct ping_seq_afinfo ping_v6_seq_afinfo = {
-   .name   = "icmp6",
-   .family = AF_INET6,
-   .seq_fops   = &ping_seq_fops,
-   .seq_ops= {
-   .start  = ping_v6_seq_start,
-   .show   = ping_v6_seq_show,
-   .next   = ping_seq_next,
-   .stop   = ping_seq_stop,
-   },
+static const struct seq_operations ping_v6_seq_ops = {
+   .start  = ping_v6_seq_start,
+   .show   = ping_v6_seq_show,
+   .next   = ping_seq_next,
+   .stop   = ping_seq_stop,
+};
+
+stati

[PATCH 15/42] netfilter/x_tables: simplify seq_file code

2018-05-16 Thread Christoph Hellwig
Just use the address family from the proc private data instead of copying
it into per-file data.

Signed-off-by: Christoph Hellwig 
---
 net/netfilter/x_tables.c | 39 +++
 1 file changed, 11 insertions(+), 28 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 71325fef647d..3704101af27f 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1489,15 +1489,10 @@ void *xt_unregister_table(struct xt_table *table)
 EXPORT_SYMBOL_GPL(xt_unregister_table);
 
 #ifdef CONFIG_PROC_FS
-struct xt_names_priv {
-   struct seq_net_private p;
-   u_int8_t af;
-};
 static void *xt_table_seq_start(struct seq_file *seq, loff_t *pos)
 {
-   struct xt_names_priv *priv = seq->private;
struct net *net = seq_file_net(seq);
-   u_int8_t af = priv->af;
+   u_int8_t af = (unsigned long)PDE_DATA(file_inode(seq->file));
 
mutex_lock(&xt[af].mutex);
return seq_list_start(&net->xt.tables[af], *pos);
@@ -1505,17 +1500,15 @@ static void *xt_table_seq_start(struct seq_file *seq, 
loff_t *pos)
 
 static void *xt_table_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-   struct xt_names_priv *priv = seq->private;
struct net *net = seq_file_net(seq);
-   u_int8_t af = priv->af;
+   u_int8_t af = (unsigned long)PDE_DATA(file_inode(seq->file));
 
return seq_list_next(v, &net->xt.tables[af], pos);
 }
 
 static void xt_table_seq_stop(struct seq_file *seq, void *v)
 {
-   struct xt_names_priv *priv = seq->private;
-   u_int8_t af = priv->af;
+   u_int8_t af = (unsigned long)PDE_DATA(file_inode(seq->file));
 
mutex_unlock(&xt[af].mutex);
 }
@@ -1538,16 +1531,8 @@ static const struct seq_operations xt_table_seq_ops = {
 
 static int xt_table_open(struct inode *inode, struct file *file)
 {
-   int ret;
-   struct xt_names_priv *priv;
-
-   ret = seq_open_net(inode, file, &xt_table_seq_ops,
-  sizeof(struct xt_names_priv));
-   if (!ret) {
-   priv = ((struct seq_file *)file->private_data)->private;
-   priv->af = (unsigned long)PDE_DATA(inode);
-   }
-   return ret;
+   return seq_open_net(inode, file, &xt_table_seq_ops,
+   sizeof(struct seq_net_private));
 }
 
 static const struct file_operations xt_table_ops = {
@@ -1563,7 +1548,7 @@ static const struct file_operations xt_table_ops = {
  */
 struct nf_mttg_trav {
struct list_head *head, *curr;
-   uint8_t class, nfproto;
+   uint8_t class;
 };
 
 enum {
@@ -1580,6 +1565,7 @@ static void *xt_mttg_seq_next(struct seq_file *seq, void 
*v, loff_t *ppos,
[MTTG_TRAV_NFP_UNSPEC] = MTTG_TRAV_NFP_SPEC,
[MTTG_TRAV_NFP_SPEC]   = MTTG_TRAV_DONE,
};
+   uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file));
struct nf_mttg_trav *trav = seq->private;
 
switch (trav->class) {
@@ -1594,9 +1580,9 @@ static void *xt_mttg_seq_next(struct seq_file *seq, void 
*v, loff_t *ppos,
if (trav->curr != trav->head)
break;
mutex_unlock(&xt[NFPROTO_UNSPEC].mutex);
-   mutex_lock(&xt[trav->nfproto].mutex);
+   mutex_lock(&xt[nfproto].mutex);
trav->head = trav->curr = is_target ?
-   &xt[trav->nfproto].target : &xt[trav->nfproto].match;
+   &xt[nfproto].target : &xt[nfproto].match;
trav->class = next_class[trav->class];
break;
case MTTG_TRAV_NFP_SPEC:
@@ -1628,6 +1614,7 @@ static void *xt_mttg_seq_start(struct seq_file *seq, 
loff_t *pos,
 
 static void xt_mttg_seq_stop(struct seq_file *seq, void *v)
 {
+   uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file));
struct nf_mttg_trav *trav = seq->private;
 
switch (trav->class) {
@@ -1635,7 +1622,7 @@ static void xt_mttg_seq_stop(struct seq_file *seq, void 
*v)
mutex_unlock(&xt[NFPROTO_UNSPEC].mutex);
break;
case MTTG_TRAV_NFP_SPEC:
-   mutex_unlock(&xt[trav->nfproto].mutex);
+   mutex_unlock(&xt[nfproto].mutex);
break;
}
 }
@@ -1680,8 +1667,6 @@ static int xt_match_open(struct inode *inode, struct file 
*file)
trav = __seq_open_private(file, &xt_match_seq_ops, sizeof(*trav));
if (!trav)
return -ENOMEM;
-
-   trav->nfproto = (unsigned long)PDE_DATA(inode);
return 0;
 }
 
@@ -1732,8 +1717,6 @@ static int xt_target_open(struct inode *inode, struct 
file *file)
trav = __seq_open_private(file, &xt_target_seq_ops, sizeof(*trav));
if (!trav)
return -ENOMEM;
-
-   trav->nfproto = (unsigned long)PDE_DATA(inode);
return 0;
 }
 
-- 
2.17.0
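
The patch depends on the address family being stored directly in the entry's data pointer and cast back on each access, so no per-file copy is needed. Below is a minimal userspace sketch of that idiom. struct mock_entry and its accessors are hypothetical, and uintptr_t is used where the kernel code casts through unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a proc entry carrying an opaque data pointer. */
struct mock_entry {
	void *data;
};

/* Store the small integer itself in the pointer; nothing is allocated. */
static void mock_entry_set_af(struct mock_entry *e, uint8_t af)
{
	e->data = (void *)(uintptr_t)af;
}

/* Recover the value by casting the pointer back to an integer. */
static uint8_t mock_entry_get_af(const struct mock_entry *e)
{
	return (uint8_t)(uintptr_t)e->data;
}
```

The pointer is never dereferenced. It is only a container for the value, which is why the seq callbacks can read it back at any time without keeping a private copy.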



[PATCH 14/42] net/kcm: simplify proc registration

2018-05-16 Thread Christoph Hellwig
Remove a couple of indirections to make the code look like most other
protocols.

Signed-off-by: Christoph Hellwig 
---
 net/kcm/kcmproc.c | 71 ---
 1 file changed, 17 insertions(+), 54 deletions(-)

diff --git a/net/kcm/kcmproc.c b/net/kcm/kcmproc.c
index 1fac92543094..6d0667e62baf 100644
--- a/net/kcm/kcmproc.c
+++ b/net/kcm/kcmproc.c
@@ -15,12 +15,6 @@
 #include 
 
 #ifdef CONFIG_PROC_FS
-struct kcm_seq_muxinfo {
-   char*name;
-   const struct file_operations*seq_fops;
-   const struct seq_operations seq_ops;
-};
-
 static struct kcm_mux *kcm_get_first(struct seq_file *seq)
 {
struct net *net = seq_file_net(seq);
@@ -86,14 +80,6 @@ struct kcm_proc_mux_state {
int idx;
 };
 
-static int kcm_seq_open(struct inode *inode, struct file *file)
-{
-   struct kcm_seq_muxinfo *muxinfo = PDE_DATA(inode);
-
-   return seq_open_net(inode, file, &muxinfo->seq_ops,
-  sizeof(struct kcm_proc_mux_state));
-}
-
 static void kcm_format_mux_header(struct seq_file *seq)
 {
struct net *net = seq_file_net(seq);
@@ -246,6 +232,19 @@ static int kcm_seq_show(struct seq_file *seq, void *v)
return 0;
 }
 
+static const struct seq_operations kcm_seq_ops = {
+   .show   = kcm_seq_show,
+   .start  = kcm_seq_start,
+   .next   = kcm_seq_next,
+   .stop   = kcm_seq_stop,
+};
+
+static int kcm_seq_open(struct inode *inode, struct file *file)
+{
+   return seq_open_net(inode, file, &kcm_seq_ops,
+  sizeof(struct kcm_proc_mux_state));
+}
+
 static const struct file_operations kcm_seq_fops = {
.open   = kcm_seq_open,
.read   = seq_read,
@@ -253,37 +252,6 @@ static const struct file_operations kcm_seq_fops = {
.release= seq_release_net,
 };
 
-static struct kcm_seq_muxinfo kcm_seq_muxinfo = {
-   .name   = "kcm",
-   .seq_fops   = &kcm_seq_fops,
-   .seq_ops= {
-   .show   = kcm_seq_show,
-   .start  = kcm_seq_start,
-   .next   = kcm_seq_next,
-   .stop   = kcm_seq_stop,
-   }
-};
-
-static int kcm_proc_register(struct net *net, struct kcm_seq_muxinfo *muxinfo)
-{
-   struct proc_dir_entry *p;
-   int rc = 0;
-
-   p = proc_create_data(muxinfo->name, 0444, net->proc_net,
-muxinfo->seq_fops, muxinfo);
-   if (!p)
-   rc = -ENOMEM;
-   return rc;
-}
-EXPORT_SYMBOL(kcm_proc_register);
-
-static void kcm_proc_unregister(struct net *net,
-   struct kcm_seq_muxinfo *muxinfo)
-{
-   remove_proc_entry(muxinfo->name, net->proc_net);
-}
-EXPORT_SYMBOL(kcm_proc_unregister);
-
 static int kcm_stats_seq_show(struct seq_file *seq, void *v)
 {
struct kcm_psock_stats psock_stats;
@@ -404,16 +372,11 @@ static const struct file_operations kcm_stats_seq_fops = {
 
 static int kcm_proc_init_net(struct net *net)
 {
-   int err;
-
if (!proc_create("kcm_stats", 0444, net->proc_net,
-&kcm_stats_seq_fops)) {
-   err = -ENOMEM;
+&kcm_stats_seq_fops))
goto out_kcm_stats;
-   }
 
-   err = kcm_proc_register(net, &kcm_seq_muxinfo);
-   if (err)
+   if (!proc_create("kcm", 0444, net->proc_net, &kcm_seq_fops))
goto out_kcm;
 
return 0;
@@ -421,12 +384,12 @@ static int kcm_proc_init_net(struct net *net)
 out_kcm:
remove_proc_entry("kcm_stats", net->proc_net);
 out_kcm_stats:
-   return err;
+   return -ENOMEM;
 }
 
 static void kcm_proc_exit_net(struct net *net)
 {
-   kcm_proc_unregister(net, &kcm_seq_muxinfo);
+   remove_proc_entry("kcm", net->proc_net);
remove_proc_entry("kcm_stats", net->proc_net);
 }
 
-- 
2.17.0



[PATCH 08/42] proc: introduce proc_create_single{,_data}

2018-05-16 Thread Christoph Hellwig
Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduce the boilerplate code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/kernel/dma.c | 14 +---
 arch/arm/kernel/swp_emulate.c | 15 +---
 arch/arm/mach-rpc/ecard.c | 16 +---
 arch/ia64/kernel/palinfo.c| 16 +---
 arch/ia64/kernel/salinfo.c| 42 --
 arch/ia64/sn/kernel/sn2/prominfo_proc.c   | 32 +---
 arch/ia64/sn/kernel/sn2/sn_proc_fs.c  | 62 ++
 arch/m68k/kernel/setup_mm.c   | 14 +---
 arch/mips/pci/ops-pmcmsp.c| 28 +--
 arch/mips/sibyte/common/bus_watcher.c | 16 +---
 arch/parisc/kernel/pci-dma.c  | 17 +---
 arch/parisc/kernel/pdc_chassis.c  | 14 +---
 arch/powerpc/kernel/eeh.c | 14 +---
 arch/powerpc/kernel/rtas-proc.c   | 32 +---
 arch/powerpc/platforms/cell/spufs/sched.c | 14 +---
 arch/s390/kernel/sysinfo.c| 14 +---
 arch/sh/drivers/dma/dma-api.c | 14 +---
 arch/sparc/kernel/ioport.c| 19 +
 arch/um/drivers/ubd_kern.c| 16 +---
 arch/x86/kernel/apm_32.c  | 15 +---
 drivers/acpi/ac.c | 21 +
 drivers/acpi/button.c | 19 +
 drivers/block/DAC960.c| 49 ++-
 drivers/block/pktcdvd.c   | 14 +---
 drivers/block/ps3vram.c   | 17 +---
 drivers/char/apm-emulation.c  | 15 +---
 drivers/char/ds1620.c | 14 +---
 drivers/char/efirtc.c | 15 +---
 drivers/char/nvram.c  | 15 +---
 drivers/char/rtc.c| 19 +
 drivers/char/toshiba.c| 15 +---
 drivers/connector/connector.c | 15 +---
 drivers/input/misc/hp_sdc_rtc.c   | 14 +---
 drivers/isdn/capi/capi.c  | 30 +--
 drivers/isdn/capi/capidrv.c   | 15 +---
 drivers/isdn/hardware/eicon/diva_didd.c   | 17 +---
 drivers/isdn/hardware/eicon/divasi.c  | 17 +---
 drivers/macintosh/via-pmu.c   | 57 +++--
 drivers/media/pci/saa7164/saa7164-core.c  | 14 +---
 drivers/media/pci/zoran/videocodec.c  | 16 +---
 drivers/message/fusion/mptbase.c  | 57 +++--
 drivers/mtd/mtdcore.c | 14 +---
 drivers/net/wireless/atmel/atmel.c| 15 +---
 .../net/wireless/intersil/hostap/hostap_ap.c  | 16 +---
 drivers/net/wireless/ray_cs.c | 15 +---
 drivers/nubus/proc.c  | 51 ++--
 drivers/parisc/ccio-dma.c | 34 +---
 drivers/parisc/sba_iommu.c| 32 +---
 drivers/platform/x86/toshiba_acpi.c   | 17 +---
 drivers/pnp/pnpbios/proc.c| 78 ++
 drivers/staging/comedi/proc.c | 18 +---
 drivers/usb/gadget/udc/at91_udc.c | 16 +---
 drivers/usb/gadget/udc/fsl_udc_core.c | 18 +---
 drivers/usb/gadget/udc/goku_udc.c | 18 +---
 drivers/usb/gadget/udc/omap_udc.c | 15 +---
 drivers/video/fbdev/via/viafbdev.c| 17 +---
 fs/cifs/cifs_debug.c  | 15 +---
 fs/f2fs/sysfs.c   | 29 ++-
 fs/filesystems.c  | 14 +---
 fs/fscache/internal.h |  2 +-
 fs/fscache/proc.c |  4 +-
 fs/fscache/stats.c| 17 +---
 fs/proc/cmdline.c | 14 +---
 fs/proc/generic.c | 29 +++
 fs/proc/internal.h|  5 +-
 fs/proc/loadavg.c | 14 +---
 fs/proc/meminfo.c | 14 +---
 fs/proc/softirqs.c| 14 +---
 fs/proc/uptime.c  | 14 +---
 fs/proc/version.c | 14 +---
 fs/reiserfs/procfs.c  | 16 +---
 fs/xfs/xfs_stats.c| 31 +--
 include/linux/proc_fs.h   | 10 ++-
 kernel/cgroup/cgroup-internal.h   |  2 +-
 kernel/cgroup/cgroup-v1.c | 14 +---
 kernel/cgroup/cgroup.c|  2 +-
 kernel/dma.c  | 14 +---
 kernel/exec_domain.c  | 14 +---
 kernel/irq/proc.c | 82 +++
 kernel/locking/lockdep_proc.c | 16 +---
 net/8021q/vlanproc.c  | 21 +
 net/ipv4/ipconfig.c   | 14 +---
 net/ipv4

Re: [PATCH 11/14] net: core: add new/replace rate estimator lock parameter

2018-05-16 Thread Jiri Pirko
Wed, May 16, 2018 at 12:00:57PM CEST, vla...@mellanox.com wrote:
>
>On Wed 16 May 2018 at 09:54, Jiri Pirko  wrote:
>> Mon, May 14, 2018 at 04:27:12PM CEST, vla...@mellanox.com wrote:
>>>Extend rate estimator new and replace APIs with additional spinlock
>>>parameter used by lockless actions to protect rate_est pointer from
>>>concurrent modification.
>>>
>>>Signed-off-by: Vlad Buslov 
>>
>> [...]
>>
>>
>>> /**
>>>  * gen_new_estimator - create a new rate estimator
>>>  * @bstats: basic statistics
>>>  * @cpu_bstats: bstats per cpu
>>>  * @rate_est: rate estimator statistics
>>>+ * @rate_est_lock: rate_est lock (might be NULL)
>>
>> I cannot find a place you actually use this new arg in this patchset.
>> Did I miss it?
>
>It is used by specific action init function. However, that code was
>moved to next patchset due to patchset size limit.

Please move this patch too.

>
>>
>>
>>>  * @stats_lock: statistics lock
>>>  * @running: qdisc running seqcount
>>>  * @opt: rate estimator configuration TLV
>>
>> [...]
>


[PATCH 09/42] ipv{4,6}/udp{,lite}: simplify proc registration

2018-05-16 Thread Christoph Hellwig
Remove a couple indirections to make the code look like most other
protocols.

Signed-off-by: Christoph Hellwig 
---
 include/net/udp.h  | 20 --
 net/ipv4/udp.c | 99 +-
 net/ipv4/udplite.c | 21 +++---
 net/ipv6/udp.c | 30 +-
 net/ipv6/udplite.c | 21 +++---
 5 files changed, 78 insertions(+), 113 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 0676b272f6ac..093cd323f66a 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -408,31 +408,27 @@ do { \
 #define __UDPX_INC_STATS(sk, field) __UDP_INC_STATS(sock_net(sk), field, 0)
 #endif
 
-/* /proc */
-int udp_seq_open(struct inode *inode, struct file *file);
-
+#ifdef CONFIG_PROC_FS
 struct udp_seq_afinfo {
-   char*name;
sa_family_t family;
struct udp_table*udp_table;
-   const struct file_operations*seq_fops;
-   struct seq_operations   seq_ops;
 };
 
 struct udp_iter_state {
struct seq_net_private  p;
-   sa_family_t family;
int bucket;
-   struct udp_table*udp_table;
 };
 
-#ifdef CONFIG_PROC_FS
-int udp_proc_register(struct net *net, struct udp_seq_afinfo *afinfo);
-void udp_proc_unregister(struct net *net, struct udp_seq_afinfo *afinfo);
+void *udp_seq_start(struct seq_file *seq, loff_t *pos);
+void *udp_seq_next(struct seq_file *seq, void *v, loff_t *pos);
+void udp_seq_stop(struct seq_file *seq, void *v);
+
+extern const struct file_operations udp_afinfo_seq_fops;
+extern const struct file_operations udp6_afinfo_seq_fops;
 
 int udp4_proc_init(void);
 void udp4_proc_exit(void);
-#endif
+#endif /* CONFIG_PROC_FS */
 
 int udpv4_offload_init(void);
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b61a770884fa..51559a8c6e57 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2582,12 +2582,13 @@ EXPORT_SYMBOL(udp_prot);
 static struct sock *udp_get_first(struct seq_file *seq, int start)
 {
struct sock *sk;
+   struct udp_seq_afinfo *afinfo = PDE_DATA(file_inode(seq->file));
struct udp_iter_state *state = seq->private;
struct net *net = seq_file_net(seq);
 
-   for (state->bucket = start; state->bucket <= state->udp_table->mask;
+   for (state->bucket = start; state->bucket <= afinfo->udp_table->mask;
 ++state->bucket) {
-   struct udp_hslot *hslot = &state->udp_table->hash[state->bucket];
+   struct udp_hslot *hslot = &afinfo->udp_table->hash[state->bucket];
 
if (hlist_empty(&hslot->head))
continue;
@@ -2596,7 +2597,7 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
sk_for_each(sk, &hslot->head) {
if (!net_eq(sock_net(sk), net))
continue;
-   if (sk->sk_family == state->family)
+   if (sk->sk_family == afinfo->family)
goto found;
}
spin_unlock_bh(&hslot->lock);
@@ -2608,16 +2609,17 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
 
 static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
 {
+   struct udp_seq_afinfo *afinfo = PDE_DATA(file_inode(seq->file));
struct udp_iter_state *state = seq->private;
struct net *net = seq_file_net(seq);
 
do {
sk = sk_next(sk);
-   } while (sk && (!net_eq(sock_net(sk), net) || sk->sk_family != state->family));
+   } while (sk && (!net_eq(sock_net(sk), net) || sk->sk_family != afinfo->family));
 
if (!sk) {
-   if (state->bucket <= state->udp_table->mask)
-   spin_unlock_bh(&state->udp_table->hash[state->bucket].lock);
+   if (state->bucket <= afinfo->udp_table->mask)
+   spin_unlock_bh(&afinfo->udp_table->hash[state->bucket].lock);
return udp_get_first(seq, state->bucket + 1);
}
return sk;
@@ -2633,15 +2635,16 @@ static struct sock *udp_get_idx(struct seq_file *seq, loff_t pos)
return pos ? NULL : sk;
 }
 
-static void *udp_seq_start(struct seq_file *seq, loff_t *pos)
+void *udp_seq_start(struct seq_file *seq, loff_t *pos)
 {
struct udp_iter_state *state = seq->private;
state->bucket = MAX_UDP_PORTS;
 
return *pos ? udp_get_idx(seq, *pos-1) : SEQ_START_TOKEN;
 }
+EXPORT_SYMBOL(udp_seq_start);
 
-static void *udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+void *udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
struct sock *sk;
 
@@ -2653,56 +2656,17 @@ static void *udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
++*pos;
return sk;
 }
+EXPORT_SYMBOL(udp_seq_next);
 
-static void udp_seq_stop(s

Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Oliver Neukum
Am Mittwoch, den 16.05.2018, 10:00 + schrieb Hayes Wang:
> Oliver Neukum [mailto:oneu...@suse.com]
> > Sent: Wednesday, May 16, 2018 4:27 PM
> 
> [...]
> > > 
> > > Would usb_autopm_get_interface() take a long time?
> > > The driver would wake the device if it has suspended.
> > > I have no idea about how usb_autopm_get_interface() works, so I don't know
> > 
> > how to help.
> > 
> > Hi,
> > 
> > it basically calls r8152_resume() and makes a control request to the
> > hub. I think we are spinning in rtl8152_runtime_resume(), but where?
> > It has a lot of NAPI stuff. Any suggestions on how to instrument or
> > trace this?
> 
> Is rtl8152_runtime_resume() called? I don't see the name in the trace.

Good question. I see nothing else that could produce a live lock.
> 
> I guess the relative API in rtl8152_runtime_resume() are
>   ops->disable= rtl8153_disable;
>   ops->autosuspend_en = rtl8153_runtime_enable;
> 
> And I don't find any possible dead lock in rtl8152_runtime_resume().
> 
> Besides, I find a similar issue as following.
> https://www.spinics.net/lists/netdev/msg493512.html

Well, if we have an imbalance in NAPI it should strike wherever
it is used, not just in suspend(). Is there debugging for NAPI
we could activate?

Regards
Oliver



Re: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks

2018-05-16 Thread Gi-Oh Kim
On Wed, May 16, 2018 at 9:04 AM, Tariq Toukan  wrote:
>
>
> On 15/05/2018 9:53 PM, Qing Huang wrote:
>>
>>
>>
>> On 5/15/2018 2:19 AM, Tariq Toukan wrote:
>>>
>>>
>>>
>>> On 14/05/2018 7:41 PM, Qing Huang wrote:



>>>> On 5/13/2018 2:00 AM, Tariq Toukan wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 11/05/2018 10:23 PM, Qing Huang wrote:
>>>>>>
>>>>>> When a system is under memory pressure (high usage with fragments),
>>>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>>>> memory management to enter slow path doing memory compact/migration
>>>>>> ops in order to complete high order memory allocations.
>>>>>>
>>>>>> When that happens, user processes calling uverb APIs may get stuck
>>>>>> for more than 120s easily even though there are a lot of free pages
>>>>>> in smaller chunks available in the system.
>>>>>>
>>>>>> Syslog:
>>>>>> ...
>>>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>>>> ...
>>>>>>
>>>>>> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
>>>>>>
>>>>>> However in order to support smaller ICM chunk size, we need to fix
>>>>>> another issue in large size kcalloc allocations.
>>>>>>
>>>>>> E.g.
>>>>>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>>>>>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each
>>>>>> mtt
>>>>>> entry). So we need a 16MB allocation for a table->icm pointer array to
>>>>>> hold 2M pointers which can easily cause kcalloc to fail.
>>>>>>
>>>>>> The solution is to use vzalloc to replace kcalloc. There is no need
>>>>>> for contiguous memory pages for a driver meta data structure (no need
>>>>>> of DMA ops).
>>>>>>
>>>>>> Signed-off-by: Qing Huang 
>>>>>> Acked-by: Daniel Jurgens 
>>>>>> Reviewed-by: Zhu Yanjun 
>>>>>> ---
>>>>>> v2 -> v1: adjusted chunk size to reflect different architectures.
>>>>>>
>>>>>>   drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++---
>>>>>>   1 file changed, 7 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>>>> b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>>>> index a822f7a..ccb62b8 100644
>>>>>> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>>>> @@ -43,12 +43,12 @@
>>>>>>   #include "fw.h"
>>>>>> /*
>>>>>> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
>>>>>> - * per chunk.
>>>>>> + * We allocate in page size (default 4KB on many archs) chunks to
>>>>>> avoid high
>>>>>> + * order memory allocations in fragmented/high usage memory
>>>>>> situation.
>>>>>>*/
>>>>>>   enum {
>>>>>> -MLX4_ICM_ALLOC_SIZE= 1 << 18,
>>>>>> -MLX4_TABLE_CHUNK_SIZE= 1 << 18
>>>>>> +MLX4_ICM_ALLOC_SIZE= 1 << PAGE_SHIFT,
>>>>>> +MLX4_TABLE_CHUNK_SIZE= 1 << PAGE_SHIFT
>>>>>
>>>>>
>>>>> Which is actually PAGE_SIZE.
>>>>
>>>>
>>>> Yes, we wanted to avoid high order memory allocations.
>>>>
>>>
>>> Then please use PAGE_SIZE instead.
>>
>>
>> PAGE_SIZE is usually defined as 1 << PAGE_SHIFT. So I think PAGE_SHIFT is
>> actually more appropriate here.
>>
>
> Definition of PAGE_SIZE varies among different archs.
> It is not always as simple as 1 << PAGE_SHIFT.
> It might be:
> PAGE_SIZE (1UL << PAGE_SHIFT)
> PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
> etc...
>
> Please replace 1 << PAGE_SHIFT with PAGE_SIZE.
>
>>
>>>
>>>>> Also, please add a comma at the end of the last entry.
>>>>
>>>>
>>>> Hmm..., followed the existing code style and checkpatch.pl didn't
>>>> complain about the comma.
>>>>
>>>
>>> I am in favor of having a comma also after the last element, so that when
>>> another enum element is added we do not modify this line again, which would
>>> falsely affect git blame.
>>>
>>> I know it didn't exist before your patch, but once we're here, let's do
>>> it.
>>
>>
>> I'm okay either way. If adding an extra comma is preferred by many people,
>> someone should update checkpatch.pl to enforce it. :)
>>
> I agree.
> Until then, please use an extra comma in this patch.
>
>>>
>>>>>
>>>>>>   };
>>>>>> static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct
>>>>>> mlx4_icm_chunk *chunk)
>>>>>> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev,
>>>>>> struct mlx4_icm_table *table,
>>>>>>   obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>>>>>>   num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>>>>>>   -table->icm  = kcalloc(num_icm, sizeof(*table->icm),
>>>>>> GFP_KERNEL);
>>>>>> +table->icm  = vzalloc(num_icm * sizeof(*table->icm));
>>>>>
>>>>>
>>>>> Why not kvzalloc ?
>>>>
>>>>
>>>> I think table->icm pointer array doesn't really need physically
>>>> contiguous memory. Sometimes high order
>>>> memory allocation by kmalloc variants may trigger slow path and cause
>>>> tasks to be blocked.
>>>>
>>>
>>> This is control path so it is less latency-sensitive.
>>> Let's not produce unnec
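
The sizing argument in the quoted commit message (log_num_mtt=30, 8-byte MTT entries, 4KB versus 256KB chunks) can be checked with a few lines of arithmetic. The following is a userspace sketch, not driver code; the helper names and the 8-byte pointer size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of chunks needed for nobj objects, rounding up in the same way
 * as the (nobj + obj_per_chunk - 1) / obj_per_chunk expression quoted
 * from mlx4_init_icm_table(). */
static uint64_t icm_num_chunks(uint64_t nobj, uint64_t obj_size,
                               uint64_t chunk_size)
{
    assert(obj_size > 0 && chunk_size >= obj_size);
    uint64_t obj_per_chunk = chunk_size / obj_size;
    return (nobj + obj_per_chunk - 1) / obj_per_chunk;
}

/* Size in bytes of an array holding one pointer per chunk.
 * ptr_size is an assumption: 8 on a 64-bit kernel. */
static uint64_t icm_ptr_array_bytes(uint64_t nobj, uint64_t obj_size,
                                    uint64_t chunk_size, uint64_t ptr_size)
{
    return icm_num_chunks(nobj, obj_size, chunk_size) * ptr_size;
}
```

For 1G entries of 8 bytes each, icm_ptr_array_bytes(1ULL << 30, 8, 4096, 8) gives 16MB (2M pointers), while the original 256KB chunk size needs only 32768 pointers, a 256KB array.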

[PATCH 07/42] proc: introduce proc_create_seq_private

2018-05-16 Thread Christoph Hellwig
Variant of proc_create_data that directly takes a struct seq_operations
argument + a private state size and drastically reduces the boilerplate
code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig 
---
 fs/locks.c | 16 ++--
 fs/proc/generic.c  |  9 ++---
 fs/proc/internal.h |  1 +
 include/linux/atalk.h  |  7 ++-
 include/linux/proc_fs.h|  9 ++---
 kernel/time/timer_list.c   | 16 ++--
 mm/vmalloc.c   | 18 +++---
 net/appletalk/aarp.c   | 20 +---
 net/appletalk/atalk_proc.c |  3 ++-
 net/atm/lec.c  | 15 ++-
 net/decnet/af_decnet.c | 17 +++--
 net/decnet/dn_route.c  | 19 +++
 12 files changed, 37 insertions(+), 113 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 62bbe8b31f26..05e211be8684 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2788,22 +2788,10 @@ static const struct seq_operations locks_seq_operations = {
.show   = locks_show,
 };
 
-static int locks_open(struct inode *inode, struct file *filp)
-{
-   return seq_open_private(filp, &locks_seq_operations,
-   sizeof(struct locks_iterator));
-}
-
-static const struct file_operations proc_locks_operations = {
-   .open   = locks_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= seq_release_private,
-};
-
 static int __init proc_locks_init(void)
 {
-   proc_create("locks", 0, NULL, &proc_locks_operations);
+   proc_create_seq_private("locks", 0, NULL, &locks_seq_operations,
+   sizeof(struct locks_iterator), NULL);
return 0;
 }
 fs_initcall(proc_locks_init);
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index af644caaaf85..f87cb0053387 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -560,6 +560,8 @@ static int proc_seq_open(struct inode *inode, struct file *file)
 {
struct proc_dir_entry *de = PDE(inode);
 
+   if (de->state_size)
+   return seq_open_private(file, de->seq_ops, de->state_size);
return seq_open(file, de->seq_ops);
 }
 
@@ -570,9 +572,9 @@ static const struct file_operations proc_seq_fops = {
.release= seq_release,
 };
 
-struct proc_dir_entry *proc_create_seq_data(const char *name, umode_t mode,
+struct proc_dir_entry *proc_create_seq_private(const char *name, umode_t mode,
struct proc_dir_entry *parent, const struct seq_operations *ops,
-   void *data)
+   unsigned int state_size, void *data)
 {
struct proc_dir_entry *p;
 
@@ -581,9 +583,10 @@ struct proc_dir_entry *proc_create_seq_data(const char *name, umode_t mode,
return NULL;
p->proc_fops = &proc_seq_fops;
p->seq_ops = ops;
+   p->state_size = state_size;
return proc_register(parent, p);
 }
-EXPORT_SYMBOL(proc_create_seq_data);
+EXPORT_SYMBOL(proc_create_seq_private);
 
 void proc_set_size(struct proc_dir_entry *de, loff_t size)
 {
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 4fb01c5f9c1a..bcfe830ffd59 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -46,6 +46,7 @@ struct proc_dir_entry {
const struct file_operations *proc_fops;
const struct seq_operations *seq_ops;
void *data;
+   unsigned int state_size;
unsigned int low_ino;
nlink_t nlink;
kuid_t uid;
diff --git a/include/linux/atalk.h b/include/linux/atalk.h
index 40373920ea58..23f805562f4e 100644
--- a/include/linux/atalk.h
+++ b/include/linux/atalk.h
@@ -145,7 +145,12 @@ extern rwlock_t atalk_interfaces_lock;
 
 extern struct atalk_route atrtr_default;
 
-extern const struct file_operations atalk_seq_arp_fops;
+struct aarp_iter_state {
+   int bucket;
+   struct aarp_entry **table;
+};
+
+extern const struct seq_operations aarp_seq_ops;
 
 extern int sysctl_aarp_expiry_time;
 extern int sysctl_aarp_tick_time;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index f368a896a8cb..314713a48817 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -25,11 +25,13 @@ extern struct proc_dir_entry *proc_mkdir_mode(const char *, umode_t,
  struct proc_dir_entry *);
 struct proc_dir_entry *proc_create_mount_point(const char *name);
 
-struct proc_dir_entry *proc_create_seq_data(const char *name, umode_t mode,
+struct proc_dir_entry *proc_create_seq_private(const char *name, umode_t mode,
struct proc_dir_entry *parent, const struct seq_operations *ops,
-   void *data);
+   unsigned int state_size, void *data);
+#define proc_create_seq_data(name, mode, parent, ops, data) \
+   proc_create_seq_private(name, mode, parent, ops, 0, data)
 #define proc_create_seq(name, mode, parent, ops) \
-   proc_create_seq_data(name,

[PATCH 03/42] proc: don't detour through seq->private to get the inode

2018-05-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/proc/array.c | 20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 911f66924d81..4a8e413bf59b 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -677,20 +677,22 @@ get_children_pid(struct inode *inode, struct pid *pid_prev, loff_t pos)
 
 static int children_seq_show(struct seq_file *seq, void *v)
 {
-   seq_printf(seq, "%d ", pid_nr_ns(v, proc_pid_ns(seq->private)));
+   struct inode *inode = file_inode(seq->file);
+
+   seq_printf(seq, "%d ", pid_nr_ns(v, proc_pid_ns(inode)));
return 0;
 }
 
 static void *children_seq_start(struct seq_file *seq, loff_t *pos)
 {
-   return get_children_pid(seq->private, NULL, *pos);
+   return get_children_pid(file_inode(seq->file), NULL, *pos);
 }
 
 static void *children_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
struct pid *pid;
 
-   pid = get_children_pid(seq->private, v, *pos + 1);
+   pid = get_children_pid(file_inode(seq->file), v, *pos + 1);
put_pid(v);
 
++*pos;
@@ -711,17 +713,7 @@ static const struct seq_operations children_seq_ops = {
 
 static int children_seq_open(struct inode *inode, struct file *file)
 {
-   struct seq_file *m;
-   int ret;
-
-   ret = seq_open(file, &children_seq_ops);
-   if (ret)
-   return ret;
-
-   m = file->private_data;
-   m->private = inode;
-
-   return ret;
+   return seq_open(file, &children_seq_ops);
 }
 
 const struct file_operations proc_tid_children_operations = {
-- 
2.17.0



[PATCH 10/42] ipv{4,6}/tcp: simplify procfs registration

2018-05-16 Thread Christoph Hellwig
Avoid most of the afinfo indirections and just call the proc helpers
directly.

Signed-off-by: Christoph Hellwig 
---
 include/net/tcp.h   | 11 ++
 net/ipv4/tcp_ipv4.c | 85 +
 net/ipv6/tcp_ipv6.c | 27 +-
 3 files changed, 53 insertions(+), 70 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9c9b3768b350..51dc7a26a2fa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1747,27 +1747,22 @@ enum tcp_seq_states {
TCP_SEQ_STATE_ESTABLISHED,
 };
 
-int tcp_seq_open(struct inode *inode, struct file *file);
+void *tcp_seq_start(struct seq_file *seq, loff_t *pos);
+void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos);
+void tcp_seq_stop(struct seq_file *seq, void *v);
 
 struct tcp_seq_afinfo {
-   char*name;
sa_family_t family;
-   const struct file_operations*seq_fops;
-   struct seq_operations   seq_ops;
 };
 
 struct tcp_iter_state {
struct seq_net_private  p;
-   sa_family_t family;
enum tcp_seq_states state;
struct sock *syn_wait_sk;
int bucket, offset, sbucket, num;
loff_t  last_pos;
 };
 
-int tcp_proc_register(struct net *net, struct tcp_seq_afinfo *afinfo);
-void tcp_proc_unregister(struct net *net, struct tcp_seq_afinfo *afinfo);
-
 extern struct request_sock_ops tcp_request_sock_ops;
 extern struct request_sock_ops tcp6_request_sock_ops;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f70586b50838..645f259d0972 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1961,6 +1961,7 @@ EXPORT_SYMBOL(tcp_v4_destroy_sock);
  */
 static void *listening_get_next(struct seq_file *seq, void *cur)
 {
+   struct tcp_seq_afinfo *afinfo = PDE_DATA(file_inode(seq->file));
struct tcp_iter_state *st = seq->private;
struct net *net = seq_file_net(seq);
struct inet_listen_hashbucket *ilb;
@@ -1983,7 +1984,7 @@ static void *listening_get_next(struct seq_file *seq, void *cur)
sk_for_each_from(sk) {
if (!net_eq(sock_net(sk), net))
continue;
-   if (sk->sk_family == st->family)
+   if (sk->sk_family == afinfo->family)
return sk;
}
spin_unlock(&ilb->lock);
@@ -2020,6 +2021,7 @@ static inline bool empty_bucket(const struct tcp_iter_state *st)
  */
 static void *established_get_first(struct seq_file *seq)
 {
+   struct tcp_seq_afinfo *afinfo = PDE_DATA(file_inode(seq->file));
struct tcp_iter_state *st = seq->private;
struct net *net = seq_file_net(seq);
void *rc = NULL;
@@ -2036,7 +2038,7 @@ static void *established_get_first(struct seq_file *seq)
 
spin_lock_bh(lock);
sk_nulls_for_each(sk, node, &tcp_hashinfo.ehash[st->bucket].chain) {
-   if (sk->sk_family != st->family ||
+   if (sk->sk_family != afinfo->family ||
!net_eq(sock_net(sk), net)) {
continue;
}
@@ -2051,6 +2053,7 @@ static void *established_get_first(struct seq_file *seq)
 
 static void *established_get_next(struct seq_file *seq, void *cur)
 {
+   struct tcp_seq_afinfo *afinfo = PDE_DATA(file_inode(seq->file));
struct sock *sk = cur;
struct hlist_nulls_node *node;
struct tcp_iter_state *st = seq->private;
@@ -2062,7 +2065,8 @@ static void *established_get_next(struct seq_file *seq, void *cur)
sk = sk_nulls_next(sk);
 
sk_nulls_for_each_from(sk, node) {
-   if (sk->sk_family == st->family && net_eq(sock_net(sk), net))
+   if (sk->sk_family == afinfo->family &&
+   net_eq(sock_net(sk), net))
return sk;
}
 
@@ -2135,7 +2139,7 @@ static void *tcp_seek_last_pos(struct seq_file *seq)
return rc;
 }
 
-static void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
+void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
 {
struct tcp_iter_state *st = seq->private;
void *rc;
@@ -2156,8 +2160,9 @@ static void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
st->last_pos = *pos;
return rc;
 }
+EXPORT_SYMBOL(tcp_seq_start);
 
-static void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
struct tcp_iter_state *st = seq->private;
void *rc = NULL;
@@ -2186,8 +2191,9 @@ static void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
st->last_pos = *pos;
return rc;
 }
+EXPORT_SYMBOL(tcp_seq_next);
 
-static void tcp_seq_stop(struct seq_file *seq, void *v)
+void tcp_seq_stop(struct seq_file *seq, void *v)
 {
struct tcp_iter_state *st = seq->private;
 
@@ -2202,47 +2208,7 @@ 

Re: [RFC v4 5/5] virtio_ring: enable packed ring

2018-05-16 Thread Tiwei Bie
On Wed, May 16, 2018 at 01:15:48PM +0300, Sergei Shtylyov wrote:
> On 5/16/2018 11:37 AM, Tiwei Bie wrote:
> 
> > Signed-off-by: Tiwei Bie 
> > ---
> >   drivers/virtio/virtio_ring.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index de3839f3621a..b158692263b0 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
> > break;
> > case VIRTIO_F_IOMMU_PLATFORM:
> > break;
> > +   case VIRTIO_F_RING_PACKED:
> > +   break;
> 
>Why not just add this *case* under the previous *case*?

Do you mean fallthrough? Something like:

case VIRTIO_F_IOMMU_PLATFORM:
case VIRTIO_F_RING_PACKED:
break;

Best regards,
Tiwei Bie

> 
> > default:
> > /* We don't understand this bit. */
> > __virtio_clear_bit(vdev, i);
> 
> MBR, Sergei
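
The stacked-label form suggested in the reply above behaves the same as two separate `case ...: break;` arms. A small userspace sketch of that equivalence follows; the `DEMO_F_*` constants and both function names are hypothetical stand-ins, not the real VIRTIO_F_* values or the body of vring_transport_features().

```c
#include <stdbool.h>

/* Hypothetical feature bit numbers used only for this illustration. */
enum demo_feature {
    DEMO_F_IOMMU_PLATFORM = 1,
    DEMO_F_RING_PACKED    = 2,
    DEMO_F_UNKNOWN        = 7,
};

/* Separate arms: every accepted feature has its own case and break. */
static bool keep_bit_separate(int bit)
{
    switch (bit) {
    case DEMO_F_IOMMU_PLATFORM:
        break;
    case DEMO_F_RING_PACKED:
        break;
    default:
        /* Bit not understood: the caller would clear it. */
        return false;
    }
    return true;
}

/* Stacked labels: both accepted features share a single break. */
static bool keep_bit_stacked(int bit)
{
    switch (bit) {
    case DEMO_F_IOMMU_PLATFORM:
    case DEMO_F_RING_PACKED:
        break;
    default:
        return false;
    }
    return true;
}
```

Both functions return the same result for every input, so the choice between the two forms is purely stylistic.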


Re: [RFC v4 5/5] virtio_ring: enable packed ring

2018-05-16 Thread Sergei Shtylyov

On 5/16/2018 11:37 AM, Tiwei Bie wrote:


> Signed-off-by: Tiwei Bie
> ---
>   drivers/virtio/virtio_ring.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index de3839f3621a..b158692263b0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
> break;
> case VIRTIO_F_IOMMU_PLATFORM:
> break;
> +   case VIRTIO_F_RING_PACKED:
> +   break;

   Why not just add this *case* under the previous *case*?

> default:
> /* We don't understand this bit. */
> __virtio_clear_bit(vdev, i);


MBR, Sergei


simplify procfs code for seq_file instances V3

2018-05-16 Thread Christoph Hellwig
We currently have hundreds of proc files that implement plain, read-only
seq_file based interfaces.  This series consolidates them using new
procfs helpers that take the seq_operations or simple show callback
directly.

A git tree is available at:

git://git.infradead.org/users/hch/misc.git proc_create.3

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/proc_create.3

Changes since V2:
 - use unsigned int for state_size everywhere
 - move state_size around in proc_dir_entry to use a struct packing hole
 - update SIZEOF_PDE_INLINE_NAME
 - added a new proc_pid_ns helper
 - improved a few changelogs
 - added back a nubus comment
 - minor typo fix
 - collected various ACKs

Changes since V1:
 - open code proc_create_data to avoid setting not fully initialized
   entries live
 - use unsigned int for state_size
 - dropped the s390/cio/blacklist hunk as it has a write method
 - dropped the IPMI patch given that IPMI proc support is scheduled for
   removal.


[PATCH 01/42] net/can: single_open_net needs to be paired with single_release_net

2018-05-16 Thread Christoph Hellwig
Otherwise we will leak a reference to the network namespace.

Signed-off-by: Christoph Hellwig 
---
 net/can/bcm.c  | 2 +-
 net/can/proc.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/can/bcm.c b/net/can/bcm.c
index ac5e5e34fee3..8073fa14e143 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -249,7 +249,7 @@ static const struct file_operations bcm_proc_fops = {
.open   = bcm_proc_open,
.read   = seq_read,
.llseek = seq_lseek,
-   .release= single_release,
+   .release= single_release_net,
 };
 #endif /* CONFIG_PROC_FS */
 
diff --git a/net/can/proc.c b/net/can/proc.c
index fdf704e9bb8c..fde2fd55b826 100644
--- a/net/can/proc.c
+++ b/net/can/proc.c
@@ -279,7 +279,7 @@ static const struct file_operations can_stats_proc_fops = {
.open   = can_stats_proc_open,
.read   = seq_read,
.llseek = seq_lseek,
-   .release= single_release,
+   .release= single_release_net,
 };
 
 static int can_reset_stats_proc_show(struct seq_file *m, void *v)
@@ -449,7 +449,7 @@ static const struct file_operations can_rcvlist_sff_proc_fops = {
.open   = can_rcvlist_sff_proc_open,
.read   = seq_read,
.llseek = seq_lseek,
-   .release= single_release,
+   .release= single_release_net,
 };
 
 
@@ -492,7 +492,7 @@ static const struct file_operations can_rcvlist_eff_proc_fops = {
.open   = can_rcvlist_eff_proc_open,
.read   = seq_read,
.llseek = seq_lseek,
-   .release= single_release,
+   .release= single_release_net,
 };
 
 /*
-- 
2.17.0



Re: mounting NFS on the same host leads to D state

2018-05-16 Thread maowenan
Hi,

I have tested the recent 4.16-rc7 kernel and found that it has the same issue.
@Eric do you have any comments about this nfs issue?

On 2018/5/15 11:47, maowenan wrote:
> Hi,
> 
> Recently I tested an NFS and exportfs scenario in which the NFS server
> and client are on the same host. I found that mounting an NFS filesystem
> onto the same host can leave rpc.mountd and related tasks in D state.
> My kernel version is based on 3.10, and I find 4.15 shows the same
> behavior.
> 
> My test step as below:
> 1)create dir.
> mkdir -p /home/test1 /home/test2
> 2)share dir /home/test1
> echo '/home/test1 localhost(rw,all_squash,anonuid=0,anongid=0)' > /etc/exports
> 3)exportfs
> exportfs -vr || echo "Failed to export /home/test1"
> 4)mount NFS.
> mount localhost:/home/test1 /home/test2 -o vers=3,soft
> 5)share dir /home/test2
> echo '/home/test2 *(rw,all_squash,anonuid=0,anongid=0)' >> /etc/exports
> 6)exportfs
> exportfs -vr
> 7) list /home/test2
> ls /home/test2
> then we found the ls command hung; ls and rpc.mountd entered "D" state, and
> after 180 seconds the ls command returned.
> 
> Another scenario as below:
> 1)create dir.
> mkdir -p /home/test3 /home/test4
> 2)share dir /home/test3
> echo '/home/test3 localhost(rw,sync,no_wdelay,anonuid=0,anongid=0,no_subtree_check)' > /etc/exports
> 3)exportfs
> exportfs -r
> 4)to see NFS status
> showmount -e localhost
> 5)mount NFS
> mount -t nfs4 -o proto=tcp,nolock,soft,timeo=50 localhost:/home/test3 /home/test4
> 6) stop nfs service, and check ls task state is D.
> service nfs stop
> ls /home/test4
> ls command is hung and became D state.
> 
> Are these test scenarios reasonable, given that the NFS server and
> client are on the same host? Since some tasks went into D state, is there
> a known reason for this, and is there any patch to fix this issue?
> Here is a link to talk about NFS mounting on the same host,  
> https://lwn.net/Articles/595652/
> 



Re: INFO: rcu detected stall in sctp_packet_transmit

2018-05-16 Thread Xin Long
On Wed, May 16, 2018 at 4:11 PM, syzbot
 wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:961423f9fcbc Merge branch 'sctp-Introduce-sctp_flush_ctx'
> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1366aea780
> kernel config:  https://syzkaller.appspot.com/x/.config?x=51fb0a6913f757db
> dashboard link: https://syzkaller.appspot.com/bug?extid=ff0b569fb5111dcd1a36
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+ff0b569fb5111dcd1...@syzkaller.appspotmail.com
>
> INFO: rcu_sched self-detected stall on CPU
> 0-: (1 GPs behind) idle=dae/1/4611686018427387908
> softirq=93090/93091 fqs=30902
>  (t=125000 jiffies g=51107 c=51106 q=972)
> NMI backtrace for cpu 0
> CPU: 0 PID: 24668 Comm: syz-executor6 Not tainted 4.17.0-rc4+ #44
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
> RIP: 0010:sctp_v6_xmit+0x259/0x6b0 net/sctp/ipv6.c:219
> RSP: 0018:8801dae068e8 EFLAGS: 0246 ORIG_RAX: ff13
> RAX: 0007 RBX: 8801bb7ec800 RCX: 86f1b345
> RDX:  RSI: 86f1b381 RDI: 8801b73d97c4
> RBP: 8801dae06988 R08: 88019505c300 R09: ed003b5c46c2
> R10: ed003b5c46c2 R11: 8801dae23613 R12: 88011fd57300
> R13: 8801bb7ecec8 R14: 0029 R15: 0002
>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
>  sctp_outq_flush_transports net/sctp/outqueue.c:1164 [inline]
>  sctp_outq_flush+0x5f5/0x3430 net/sctp/outqueue.c:1212
>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
Shocks, this timer event again. Can we try to minimize the repo.syz and
get a short script? It is not necessary to reproduce the issue 100%; we need
to know what it was doing when this happened.

Thanks.

>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>  
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
> [inline]
> RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
> [inline]
> RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0
> kernel/locking/spinlock.c:184
> RSP: 0018:880196227328 EFLAGS: 0286 ORIG_RAX: ff13
> RAX: dc00 RBX: 0286 RCX: 
> RDX: 111a316d RSI: 0001 RDI: 0286
> RBP: 880196227338 R08: ed003b5c4b81 R09: 
> R10:  R11:  R12: 8801dae25c00
> R13: 8801dae25c80 R14: 880196227758 R15: 8801dae25c00
>  unlock_hrtimer_base kernel/time/hrtimer.c:887 [inline]
>  hrtimer_start_range_ns+0x692/0xd10 kernel/time/hrtimer.c:1118
>  hrtimer_start_expires include/linux/hrtimer.h:412 [inline]
>  futex_wait_queue_me+0x304/0x820 kernel/futex.c:2517
>  futex_wa

Re: [RFC PATCH bpf-next 00/12] AF_XDP, zero-copy support

2018-05-16 Thread Jesper Dangaard Brouer
On Tue, 15 May 2018 21:06:03 +0200
Björn Töpel  wrote:

> We have run some benchmarks on a dual socket system with two Broadwell
> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
> cores which gives a total of 28, but only two cores are used in these
> experiments. One for TX/RX and one for the user space application. The
> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
> memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The
> NIC is Intel I40E 40Gbit/s using the i40e driver.
> 
> Below are the results in Mpps of the I40E NIC benchmark runs for 64
> and 1500 byte packets, generated by a commercial packet generator HW
> outputting packets at full 40 Gbit/s line rate. The results are without
> retpoline so that we can compare against previous numbers. 
> 
> AF_XDP performance 64 byte packets. Results from the AF_XDP V3 patch
> set are also reported for ease of reference.
> 
> Benchmark   XDP_SKB   XDP_DRV   XDP_DRV with zerocopy
> rxdrop      2.9*      9.6*      21.5
> txpush      2.6*      -         21.6
> l2fwd       1.9*      2.5*      15.0

These performance numbers are actually amazing.

When reaching these amazing/crazy speeds, where we are approaching the
speed of light (travel 30 cm in 1 nanosec), we have to view these
numbers differently, because we are actually working on a nanosec scale.

21.5 Mpps is 46.5 nanosec per packet.

If we want to optimize for +1 Mpps, then (1/22.5*10^3=44.44ns) you
actually only have to optimize the code by 2 nanosec, and with this
2.0 GHz CPU that should in theory only be 4 cycles, but we likely retire
more than one instruction per cycle (I see around 2.5 ins per cycle), so
we are looking at (2*2*2.5) needing to find 10 instructions for +1Mpps.
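
The back-of-the-envelope budget above can be reproduced with a short
script. This is only a sketch: the function names are made up for
illustration, and the 2.0 GHz clock and 2.5 instructions per cycle are
the figures quoted in this mail, not measured by the script.

```python
def ns_per_packet(mpps: float) -> float:
    """Time available per packet, in nanoseconds, at a rate given in Mpps."""
    return 1e3 / mpps


def budget_for_gain(base_mpps: float, gain_mpps: float, ghz: float, ipc: float):
    """Return (ns, cycles, instructions) that must be saved per packet
    to move from base_mpps to base_mpps + gain_mpps."""
    saved_ns = ns_per_packet(base_mpps) - ns_per_packet(base_mpps + gain_mpps)
    cycles = saved_ns * ghz          # GHz is cycles per nanosecond
    instructions = cycles * ipc      # assumed instructions retired per cycle
    return saved_ns, cycles, instructions


saved_ns, cycles, instructions = budget_for_gain(21.5, 1.0, ghz=2.0, ipc=2.5)
print(f"{ns_per_packet(21.5):.2f} ns per packet at 21.5 Mpps")
print(f"+1 Mpps needs {saved_ns:.2f} ns = {cycles:.1f} cycles "
      f"= about {instructions:.0f} instructions")
```

The same helper makes it easy to see why the budget shrinks as the base
rate grows: each additional Mpps is worth fewer nanoseconds than the
previous one.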

Comparing to XDP_DROP of 32.3Mpps vs ZC-rxdrop 21.5Mpps, this is
actually only a "slowdown" of 15.55 ns, for having frame travel through
xdp_do_redirect, do map lookup etc, and queue into userspace, and
return frames back to kernel.  That is rather amazingly fast.

  (1/21.5*10^3)-(1/32.3*10^3) = 15.55 ns

Another performance number which is amazing is your l2fwd number of
15Mpps, because it is faster than xdp_redirect_map on i40e NICs on my
system, which runs at 12.2 Mpps (2.8Mpps slower).  Again looking at the
nanosec scale instead, this corresponds to 15.3 ns.
  I expect, this improvement comes from avoiding page_frag_free, and
avoiding the TX dma_map call (as you premap pages DMA for TX). Reverse
calculating based on perf percentage, I find that these should only
cost 7.18 ns.  Maybe the rest is because you are running TX and TX-dma
completion on another CPU.

I notice you are also using the XDP return-API, which still does a
rhashtable_lookup per frame.  I plan to optimize this to do bulking, to
get away from per frame lookup.  Thus, this should get even faster.


> * From AF_XDP V3 patch set and cover letter.
> 
> AF_XDP performance 1500 byte packets:
> Benchmark   XDP_SKB   XDP_DRV   XDP_DRV with zerocopy
> rxdrop      2.1       3.3       3.3
> l2fwd       1.4       1.8       3.1
> 
> So why do we not get higher values for RX similar to the 34 Mpps we
> had in AF_PACKET V4? We made an experiment running the rxdrop
> benchmark without using the xdp_do_redirect/flush infrastructure nor
> using an XDP program (all traffic on a queue goes to one
> socket). Instead the driver acts directly on the AF_XDP socket. With
> this we got 36.9 Mpps, a significant improvement without any change to
> the uapi. So not forcing users to have an XDP program if they do not
> need it, might be a good idea. This measurement is actually higher
> than what we got with AF_PACKET V4.

So, what are you telling me with your number 36.9 Mpps for
direct-socket-rxdrop...

Compared to XDP_DROP at 32.3Mpps, are you saying that it only costs
3.86 nanosec to call the XDP bpf_prog which returns XDP_DROP?  That is
very impressive actually. (1/32.3*10^3)-(1/36.9*10^3)

Compared to ZC-AF_XDP rxdrop 21.5Mpps, are you saying the XDP
redirect infrastructure, map lookups etc (incl. return-API per frame)
cost 19.41 nanosec (1/21.5*10^3)-(1/36.9*10^3)?  That is approx 40
clock-cycles or 100 (speculative) instructions.  That is not too bad,
and we are still optimizing this stuff.
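
For reference, the per-packet cost differences quoted in this mail can
be recomputed from the Mpps figures alone. A minimal sketch, where the
dictionary keys and the helper name are illustrative labels rather than
anything from the patch set, and the 2.0 GHz / 2.5 instructions per
cycle conversion is the assumption stated earlier in this mail:

```python
# Rates in Mpps as quoted in this thread.
RATES_MPPS = {
    "direct_socket_rxdrop": 36.9,
    "xdp_drop": 32.3,
    "zc_af_xdp_rxdrop": 21.5,
    "zc_af_xdp_l2fwd": 15.0,
    "xdp_redirect_map": 12.2,
}


def extra_ns(slower_mpps: float, faster_mpps: float) -> float:
    """Extra nanoseconds spent per packet by the slower path."""
    return 1e3 / slower_mpps - 1e3 / faster_mpps


def describe(slower: str, faster: str, ghz: float = 2.0, ipc: float = 2.5) -> str:
    """Format the per-packet cost of `slower` relative to `faster`."""
    ns = extra_ns(RATES_MPPS[slower], RATES_MPPS[faster])
    cycles = ns * ghz
    return (f"{slower} vs {faster}: {ns:.2f} ns, "
            f"{cycles:.0f} cycles, ~{cycles * ipc:.0f} instructions")


for pair in [("xdp_drop", "direct_socket_rxdrop"),
             ("zc_af_xdp_rxdrop", "direct_socket_rxdrop"),
             ("zc_af_xdp_rxdrop", "xdp_drop"),
             ("xdp_redirect_map", "zc_af_xdp_l2fwd")]:
    print(describe(*pair))
```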


> XDP performance on our system as a base line:
> 
> 64 byte packets:
> XDP stats     CPU   pps     issue-pps
> XDP-RX CPU    16    32.3M   0
> 
> 1500 byte packets:
> XDP stats     CPU   pps     issue-pps
> XDP-RX CPU    16    3.3M    0

Overall I'm *very* impressed by the performance of ZC AF_XDP.
Just remember that measuring improvement in +N Mpps, is actually
misleading, when operating at these (light) speeds.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: INFO: rcu detected stall in sctp_packet_transmit

2018-05-16 Thread Dmitry Vyukov
On Wed, May 16, 2018 at 12:44 PM, Xin Long  wrote:
> On Wed, May 16, 2018 at 4:11 PM, syzbot
>  wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:961423f9fcbc Merge branch 'sctp-Introduce-sctp_flush_ctx'
>> git tree:   net-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1366aea780
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=51fb0a6913f757db
>> dashboard link: https://syzkaller.appspot.com/bug?extid=ff0b569fb5111dcd1a36
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+ff0b569fb5111dcd1...@syzkaller.appspotmail.com
>>
>> INFO: rcu_sched self-detected stall on CPU
>> 0-: (1 GPs behind) idle=dae/1/4611686018427387908
>> softirq=93090/93091 fqs=30902
>>  (t=125000 jiffies g=51107 c=51106 q=972)
>> NMI backtrace for cpu 0
>> CPU: 0 PID: 24668 Comm: syz-executor6 Not tainted 4.17.0-rc4+ #44
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
>>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>> RIP: 0010:sctp_v6_xmit+0x259/0x6b0 net/sctp/ipv6.c:219
>> RSP: 0018:8801dae068e8 EFLAGS: 0246 ORIG_RAX: ff13
>> RAX: 0007 RBX: 8801bb7ec800 RCX: 86f1b345
>> RDX:  RSI: 86f1b381 RDI: 8801b73d97c4
>> RBP: 8801dae06988 R08: 88019505c300 R09: ed003b5c46c2
>> R10: ed003b5c46c2 R11: 8801dae23613 R12: 88011fd57300
>> R13: 8801bb7ecec8 R14: 0029 R15: 0002
>>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
>>  sctp_outq_flush_transports net/sctp/outqueue.c:1164 [inline]
>>  sctp_outq_flush+0x5f5/0x3430 net/sctp/outqueue.c:1212
>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
> Shocks, this timer event again. Can we try to minimize the repo.syz and
> get a short script? It is not necessary to reproduce the issue 100%; we need
> to know what it was doing when this happened.
>
> Thanks.

It's possible to replay the whole log from console output following
these instructions:
https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md


>>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>>  expire_timers kernel/time/timer.c:1363 [inline]
>>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>>  invoke_softirq kernel/softirq.c:365 [inline]
>>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>  
>> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>> [inline]
>> RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
>> [inline]
>> RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0
>> kernel/locking/spinlock.c:184
>> RSP: 0018:880196227328 EFLAGS: 0286 ORIG_RAX: ff13
>> RAX: dc00 RBX: 0286 RCX: 
>> RDX: 111a316d RSI: 0001 RDI: 0286
>> RBP: 880196227338 R08: ed003b5c4b81 R09: 
>> R10:  R11:  R12: 8801dae25c

[PATCH net] net/sched: fix refcnt leak in the error path of tcf_vlan_init()

2018-05-16 Thread Davide Caratti
Similarly to what was done with commit a52956dfc503 ("net sched actions:
fix refcnt leak in skbmod"), fix the error path of tcf_vlan_init() to avoid
refcnt leaks when a wrong value of TCA_VLAN_PUSH_VLAN_PROTOCOL is given.

Fixes: 5026c9b1bafc ("net sched: vlan action fix late binding")
CC: Roman Mashak 
Signed-off-by: Davide Caratti 
---
 net/sched/act_vlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 853604685965..1fb39e1f9d07 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -161,6 +161,8 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
case htons(ETH_P_8021AD):
break;
default:
+   if (exists)
+   tcf_idr_release(*a, bind);
return -EPROTONOSUPPORT;
}
} else {
-- 
2.17.0



[PATCH net-next] ath10k: Remove useless test before clk_disable_unprepare

2018-05-16 Thread YueHaibing
clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.

Signed-off-by: YueHaibing 
---
 drivers/net/wireless/ath/ath10k/ahb.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ahb.c b/drivers/net/wireless/ath/ath10k/ahb.c
index 35d1049..fa39fff 100644
--- a/drivers/net/wireless/ath/ath10k/ahb.c
+++ b/drivers/net/wireless/ath/ath10k/ahb.c
@@ -180,14 +180,11 @@ static void ath10k_ahb_clock_disable(struct ath10k *ar)
 {
struct ath10k_ahb *ar_ahb = ath10k_ahb_priv(ar);
 
-   if (!IS_ERR_OR_NULL(ar_ahb->cmd_clk))
-   clk_disable_unprepare(ar_ahb->cmd_clk);
+   clk_disable_unprepare(ar_ahb->cmd_clk);
 
-   if (!IS_ERR_OR_NULL(ar_ahb->ref_clk))
-   clk_disable_unprepare(ar_ahb->ref_clk);
+   clk_disable_unprepare(ar_ahb->ref_clk);
 
-   if (!IS_ERR_OR_NULL(ar_ahb->rtc_clk))
-   clk_disable_unprepare(ar_ahb->rtc_clk);
+   clk_disable_unprepare(ar_ahb->rtc_clk);
 }
 
 static int ath10k_ahb_rst_ctrl_init(struct ath10k *ar)
-- 
2.7.0




[PATCH net-next] net: stmmac: Remove useless test before clk_disable_unprepare

2018-05-16 Thread YueHaibing
clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.

Signed-off-by: YueHaibing 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 13133b3..f08625a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -1104,30 +1104,20 @@ static int gmac_clk_enable(struct rk_priv_data *bsp_priv, bool enable)
} else {
if (bsp_priv->clk_enabled) {
if (phy_iface == PHY_INTERFACE_MODE_RMII) {
-   if (!IS_ERR(bsp_priv->mac_clk_rx))
-   clk_disable_unprepare(
-   bsp_priv->mac_clk_rx);
+   clk_disable_unprepare(bsp_priv->mac_clk_rx);
 
-   if (!IS_ERR(bsp_priv->clk_mac_ref))
-   clk_disable_unprepare(
-   bsp_priv->clk_mac_ref);
+   clk_disable_unprepare(bsp_priv->clk_mac_ref);
 
-   if (!IS_ERR(bsp_priv->clk_mac_refout))
-   clk_disable_unprepare(
-   bsp_priv->clk_mac_refout);
+   clk_disable_unprepare(bsp_priv->clk_mac_refout);
}
 
-   if (!IS_ERR(bsp_priv->clk_phy))
-   clk_disable_unprepare(bsp_priv->clk_phy);
+   clk_disable_unprepare(bsp_priv->clk_phy);
 
-   if (!IS_ERR(bsp_priv->aclk_mac))
-   clk_disable_unprepare(bsp_priv->aclk_mac);
+   clk_disable_unprepare(bsp_priv->aclk_mac);
 
-   if (!IS_ERR(bsp_priv->pclk_mac))
-   clk_disable_unprepare(bsp_priv->pclk_mac);
+   clk_disable_unprepare(bsp_priv->pclk_mac);
 
-   if (!IS_ERR(bsp_priv->mac_clk_tx))
-   clk_disable_unprepare(bsp_priv->mac_clk_tx);
+   clk_disable_unprepare(bsp_priv->mac_clk_tx);
/**
 * if (!IS_ERR(bsp_priv->clk_mac))
 *  clk_disable_unprepare(bsp_priv->clk_mac);
-- 
2.7.0




Re: [PATCH net-next v2 2/2] drivers: net: Remove device_node checks with of_mdiobus_register()

2018-05-16 Thread Jose Abreu
On 16-05-2018 00:56, Florian Fainelli wrote:
> A number of drivers have the following pattern:
>
> if (np)
>   of_mdiobus_register()
> else
>   mdiobus_register()
>
> which the implementation of of_mdiobus_register() now takes care of.
> Remove that pattern in drivers that strictly adhere to it.
>
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/dsa/bcm_sf2.c |  8 ++--
>  drivers/net/dsa/mv88e6xxx/chip.c  |  5 +
>  drivers/net/ethernet/cadence/macb_main.c  | 12 +++-
>  drivers/net/ethernet/freescale/fec_main.c |  8 ++--
>  drivers/net/ethernet/marvell/mvmdio.c |  5 +
>  drivers/net/ethernet/renesas/sh_eth.c | 11 +++
>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c |  5 +

For stmmac:

Reviewed-by: Jose Abreu 

Thanks and Best Regards,
Jose Miguel Abreu

>  drivers/net/ethernet/ti/davinci_mdio.c|  8 +++-
>  drivers/net/phy/mdio-gpio.c   |  6 +-
>  drivers/net/phy/mdio-mscc-miim.c  |  6 +-
>  drivers/net/usb/lan78xx.c |  7 ++-
>  11 files changed, 20 insertions(+), 61 deletions(-)
>



Re: INFO: rcu detected stall in sctp_packet_transmit

2018-05-16 Thread Xin Long
On Wed, May 16, 2018 at 6:53 PM, Dmitry Vyukov  wrote:
> On Wed, May 16, 2018 at 12:44 PM, Xin Long  wrote:
>> On Wed, May 16, 2018 at 4:11 PM, syzbot
>>  wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:961423f9fcbc Merge branch 'sctp-Introduce-sctp_flush_ctx'
>>> git tree:   net-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1366aea780
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=51fb0a6913f757db
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=ff0b569fb5111dcd1a36
>>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>>>
>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+ff0b569fb5111dcd1...@syzkaller.appspotmail.com
>>>
>>> INFO: rcu_sched self-detected stall on CPU
>>> 0-: (1 GPs behind) idle=dae/1/4611686018427387908
>>> softirq=93090/93091 fqs=30902
>>>  (t=125000 jiffies g=51107 c=51106 q=972)
>>> NMI backtrace for cpu 0
>>> CPU: 0 PID: 24668 Comm: syz-executor6 Not tainted 4.17.0-rc4+ #44
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>>> Google 01/01/2011
>>> Call Trace:
>>>  
>>>  __dump_stack lib/dump_stack.c:77 [inline]
>>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>>>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>>>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>>>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>>>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>>>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>>>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>>>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>>>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>>>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>>>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>>>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
>>>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>> RIP: 0010:sctp_v6_xmit+0x259/0x6b0 net/sctp/ipv6.c:219
>>> RSP: 0018:8801dae068e8 EFLAGS: 0246 ORIG_RAX: ff13
>>> RAX: 0007 RBX: 8801bb7ec800 RCX: 86f1b345
>>> RDX:  RSI: 86f1b381 RDI: 8801b73d97c4
>>> RBP: 8801dae06988 R08: 88019505c300 R09: ed003b5c46c2
>>> R10: ed003b5c46c2 R11: 8801dae23613 R12: 88011fd57300
>>> R13: 8801bb7ecec8 R14: 0029 R15: 0002
>>>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
>>>  sctp_outq_flush_transports net/sctp/outqueue.c:1164 [inline]
>>>  sctp_outq_flush+0x5f5/0x3430 net/sctp/outqueue.c:1212
>>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>> Shocks, this timer event again. Can we try to minimize the repo.syz and
>> get a short script? It is not necessary to reproduce the issue 100%; we need
>> to know what it was doing when this happened.
>>
>> Thanks.
>
> It's possible to replay the whole log from console output following
> these instructions:
> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md
Thanks, it's running now.
Usually how long will it take to finish running this 5000-line log?

>
>
>>>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>>>  expire_timers kernel/time/timer.c:1363 [inline]
>>>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>>>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>>>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>>>  invoke_softirq kernel/softirq.c:365 [inline]
>>>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>>>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>>>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>>  
>>> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>>> [inline]
>>> RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
>>> [inline]
>>> RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0
>>> kernel/locking/spinlock.c:184
>>> RSP: 0018:880196227328 EFLAGS: 0286 ORIG_RAX: ff13
>>> RAX: dc00

[PATCH net-next] net: ethoc: Remove useless test before clk_disable_unprepare

2018-05-16 Thread YueHaibing
clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.

Signed-off-by: YueHaibing 
---
 drivers/net/ethernet/ethoc.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 8bb0db9..00a5727 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -1246,8 +1246,7 @@ static int ethoc_probe(struct platform_device *pdev)
mdiobus_unregister(priv->mdio);
mdiobus_free(priv->mdio);
 free2:
-   if (priv->clk)
-   clk_disable_unprepare(priv->clk);
+   clk_disable_unprepare(priv->clk);
 free:
free_netdev(netdev);
 out:
@@ -1271,8 +1270,7 @@ static int ethoc_remove(struct platform_device *pdev)
mdiobus_unregister(priv->mdio);
mdiobus_free(priv->mdio);
}
-   if (priv->clk)
-   clk_disable_unprepare(priv->clk);
+   clk_disable_unprepare(priv->clk);
unregister_netdev(netdev);
free_netdev(netdev);
}
-- 
2.7.0




Re: [PATCH net-next v3 0/7] Microsemi Ocelot Ethernet switch support

2018-05-16 Thread Alexandre Belloni
On 14/05/2018 22:47:35+0100, James Hogan wrote:
> On Mon, May 14, 2018 at 10:58:44PM +0200, Andrew Lunn wrote:
> > Hi Alexandre
> > > 
> > > The ocelot dts changes are here for reference and should probably go
> > > through the MIPS tree once the bindings are accepted.
> > 
> > For your next version, you probably want to drop those patches, so
> > that David can apply the network patches to net-next.
> 
> Since it sounds like the net patches are ready now, I'll apply the MIPS
> DTS ones for 4.18.
> 

They are in now, tell me if you want me to resend.

Anyway, I'll probably send another patch to enable the driver in
board-ocelot.config


-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

