Re: [PATCH v9 4/4] s390: ap: kvm: Enable PQAP/AQIC facility for the guest
On 26.06.19 23:12, Tony Krowiak wrote:
> On 6/25/19 4:15 PM, Christian Borntraeger wrote:
>>
>>
>> On 25.06.19 22:13, Christian Borntraeger wrote:
>>>
>>>
>>> On 21.05.19 17:34, Pierre Morel wrote:
>>>> AP Queue Interruption Control (AQIC) facility gives
>>>> the guest the possibility to control interruption for
>>>> the Cryptographic Adjunct Processor queues.
>>>>
>>>> Signed-off-by: Pierre Morel
>>>> Reviewed-by: Tony Krowiak
>>>> ---
>>>>  arch/s390/tools/gen_facilities.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
>>>> index 61ce5b5..aed14fc 100644
>>>> --- a/arch/s390/tools/gen_facilities.c
>>>> +++ b/arch/s390/tools/gen_facilities.c
>>>> @@ -114,6 +114,7 @@ static struct facility_def facility_defs[] = {
>>>>  		.bits = (int[]){
>>>>  			12, /* AP Query Configuration Information */
>>>>  			15, /* AP Facilities Test */
>>>> +			65, /* AP Queue Interruption Control */
>>>>  			156, /* etoken facility */
>>>>  			-1  /* END */
>>>>  		}
>>>
>>> I think we should only set stfle.65 if we have the aiv facility (Because we do not
>>> have a GISA otherwise)
>
> My assumption here is that you are taking the line added above
> (STFLE.65) out and replacing with one of the two suggestions
> below.

Yes, I want to replace this hunk.

> I am quite fuzzy on how all of this CPU model stuff works,
> but I am thinking that the above makes STFLE.65 available to be
> set via the CPU model (i.e., aqic=on on the QEMU command line) as
> long as it is supported by the host.

Yes, it makes it available when the host has stfle.65. But at the same time it
does not look if the adapter interruption virtualization facility is available.
For example for vsie the guest2 will enable stfle.65 for its guests, but we do
not support AIV.

> By taking that line out, we
> are relying on one of the suggestions below to make STFLE.65
> available to the guest only if AIV facility is available. Does that
> sound about right?
>
> If that is the case, then wouldn't we also have to add a check to make
> sure that STFLE.65 is available on the host (i.e., test_facility(65))?

I think AIV in level n is enough to provide STFLE.65 in level n+1. On the other
hand also checking for stfle.65 does not hurt.

>
>
>>>
>>> So something like this instead?
>>>
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index 28ebd64..1501cd6 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -2461,6 +2461,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>  		set_kvm_facility(kvm->arch.model.fac_list, 147);
>>>  	}
>>> +	if (css_general_characteristics.aiv)
>>> +		set_kvm_facility(kvm->arch.model.fac_mask, 65);
>>> +
>>>  	kvm->arch.model.cpuid = kvm_s390_get_initial_cpuid();
>>>  	kvm->arch.model.ibc = sclp.ibc & 0x0fff;
>>>
>>
>> Maybe even just piggyback on gisa init (it will bail out early).
>
> It could also go in the kvm_s390_crypto_init() function since it
> is related to crypto.
>
>>
>> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
>> index 9dde4d7..9182a04 100644
>> --- a/arch/s390/kvm/interrupt.c
>> +++ b/arch/s390/kvm/interrupt.c
>> @@ -3100,6 +3100,7 @@ void kvm_s390_gisa_init(struct kvm *kvm)
>>  	gi->timer.function = gisa_vcpu_kicker;
>>  	memset(gi->origin, 0, sizeof(struct kvm_s390_gisa));
>>  	gi->origin->next_alert = (u32)(u64)gi->origin;
>> +	set_kvm_facility(kvm->arch.model.fac_mask, 65);
>>  	VM_EVENT(kvm, 3, "gisa 0x%pK initialized", gi->origin);
>>  }
>>
>
Re: [PATCH v3 6/7] x86/smpboot: introduce per-cpu variable for HT siblings
On Thu, 27 Jun 2019, Thomas Gleixner wrote:
> On Wed, 26 Jun 2019, subhra mazumdar wrote:
>
> > Introduce a per-cpu variable to keep the number of HT siblings of a cpu.
> > This will be used for quick lookup in select_idle_cpu to determine the
> > limits of search.
>
> Why? The number of siblings is constant at least today unless you play
> silly cpu hotplug games. A bit more justification for adding yet another
> random storage would be appreciated.
>
> > This patch does it only for x86.
>
> # grep 'This patch' Documentation/process/submitting-patches.rst
>
> IOW, we all know already that this is a patch and from the subject prefix
> and the diffstat it's pretty obvious that this is x86 only.
>
> So instead of documenting the obvious, please add proper context to justify
> the change.

Aside of that the right ordering is to introduce the default fallback in a
separate patch, which explains the reasoning and then in the next one add
the x86 optimized version.

Thanks,

	tglx
Re: [PATCH v3 6/7] x86/smpboot: introduce per-cpu variable for HT siblings
On Wed, 26 Jun 2019, subhra mazumdar wrote:

> Introduce a per-cpu variable to keep the number of HT siblings of a cpu.
> This will be used for quick lookup in select_idle_cpu to determine the
> limits of search.

Why? The number of siblings is constant at least today unless you play
silly cpu hotplug games. A bit more justification for adding yet another
random storage would be appreciated.

> This patch does it only for x86.

# grep 'This patch' Documentation/process/submitting-patches.rst

IOW, we all know already that this is a patch and from the subject prefix
and the diffstat it's pretty obvious that this is x86 only.

So instead of documenting the obvious, please add proper context to justify
the change.

> +/* representing number of HT siblings of each CPU */
> +DEFINE_PER_CPU_READ_MOSTLY(unsigned int, cpumask_weight_sibling);
> +EXPORT_PER_CPU_SYMBOL(cpumask_weight_sibling);

Why does this need an export? No module has any reason to access this.

>  /* representing HT and core siblings of each logical CPU */
>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
>  EXPORT_PER_CPU_SYMBOL(cpu_core_map);
> @@ -520,6 +524,8 @@ void set_cpu_sibling_map(int cpu)
>
>  	if (!has_mp) {
>  		cpumask_set_cpu(cpu, topology_sibling_cpumask(cpu));
> +		per_cpu(cpumask_weight_sibling, cpu) =
> +			cpumask_weight(topology_sibling_cpumask(cpu));
>  		cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
>  		cpumask_set_cpu(cpu, topology_core_cpumask(cpu));
>  		c->booted_cores = 1;
> @@ -529,8 +535,12 @@ void set_cpu_sibling_map(int cpu)
>  	for_each_cpu(i, cpu_sibling_setup_mask) {
>  		o = &cpu_data(i);
>
> -		if ((i == cpu) || (has_smt && match_smt(c, o)))
> +		if ((i == cpu) || (has_smt && match_smt(c, o))) {
>  			link_mask(topology_sibling_cpumask, cpu, i);
> +			threads = cpumask_weight(topology_sibling_cpumask(cpu));
> +			per_cpu(cpumask_weight_sibling, cpu) = threads;
> +			per_cpu(cpumask_weight_sibling, i) = threads;

This only works for SMT=2, but fails to update the rest for SMT=4.

> @@ -1482,6 +1494,8 @@ static void remove_siblinginfo(int cpu)
>
>  	for_each_cpu(sibling, topology_core_cpumask(cpu)) {
>  		cpumask_clear_cpu(cpu, topology_core_cpumask(sibling));
> +		per_cpu(cpumask_weight_sibling, sibling) =
> +			cpumask_weight(topology_sibling_cpumask(sibling));

While remove does the right thing.

Thanks,

	tglx
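
To illustrate the SMT=4 remark above, here is a minimal user-space sketch, not kernel code. It assumes sibling masks are plain 32-bit bitmaps, and every name in it (sibling_mask, sibling_weight, add_sibling, mask_weight) is hypothetical. The idea it shows is one possible way to keep a cached weight consistent: finish linking the new CPU first, then refresh the cached count of every member of the resulting mask, rather than only the pair being linked.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8

/* Hypothetical stand-ins for the per-cpu sibling mask and its cached weight. */
static uint32_t sibling_mask[NR_CPUS_SKETCH];
static unsigned int sibling_weight[NR_CPUS_SKETCH];

/* Count the set bits in a mask (what cpumask_weight() does for a cpumask). */
static unsigned int mask_weight(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Bring 'cpu' up as a thread of the core whose already-online threads are
 * given in 'online'. Linking is done for all siblings first; only then are
 * the cached weights refreshed, for every CPU in the completed mask.
 */
static void add_sibling(unsigned int cpu, uint32_t online)
{
	unsigned int i;

	assert(cpu < NR_CPUS_SKETCH);

	sibling_mask[cpu] |= 1u << cpu;
	for (i = 0; i < NR_CPUS_SKETCH; i++) {
		if (online & (1u << i)) {
			sibling_mask[cpu] |= 1u << i;
			sibling_mask[i] |= 1u << cpu;
		}
	}

	for (i = 0; i < NR_CPUS_SKETCH; i++) {
		if (sibling_mask[cpu] & (1u << i))
			sibling_weight[i] = mask_weight(sibling_mask[i]);
	}
}
```

With four threads brought up one after another, each earlier thread has its cached count refreshed on every later arrival, so none is left holding a partial value.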
KASAN: use-after-free Read in xlog_alloc_log
Hello,

syzbot found the following crash on:

HEAD commit:    1dd45f17 Add linux-next specific files for 20190626
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=172479e9a0
kernel config:  https://syzkaller.appspot.com/x/.config?x=c1222640552e42a5
dashboard link: https://syzkaller.appspot.com/bug?extid=b75afdbe271a0d7ac4f6
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b75afdbe271a0d7ac...@syzkaller.appspotmail.com

XFS (loop5): Mounting V4 Filesystem
==
BUG: KASAN: use-after-free in xlog_alloc_log+0x1266/0x1380 fs/xfs/xfs_log.c:1478
Read of size 8 at addr 8880693e2990 by task syz-executor.5/12241

CPU: 1 PID: 12241 Comm: syz-executor.5 Not tainted 5.2.0-rc6-next-20190626 #23
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
 __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
 kasan_report+0x12/0x17 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 xlog_alloc_log+0x1266/0x1380 fs/xfs/xfs_log.c:1478
 xfs_log_mount+0xdc/0x780 fs/xfs/xfs_log.c:580
 xfs_mountfs+0xdb9/0x1be0 fs/xfs/xfs_mount.c:815
 xfs_fs_fill_super+0xca6/0x16c0 fs/xfs/xfs_super.c:1740
 mount_bdev+0x304/0x3c0 fs/super.c:1346
 xfs_fs_mount+0x35/0x40 fs/xfs/xfs_super.c:1814
 legacy_get_tree+0x108/0x220 fs/fs_context.c:661
 vfs_get_tree+0x8e/0x390 fs/super.c:1476
 do_new_mount fs/namespace.c:2791 [inline]
 do_mount+0x138c/0x1c00 fs/namespace.c:3111
 ksys_mount+0xdb/0x150 fs/namespace.c:3320
 __do_sys_mount fs/namespace.c:3334 [inline]
 __se_sys_mount fs/namespace.c:3331 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3331
 do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45bf6a
Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 9d 8d fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 7a 8d fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7fac99605a88 EFLAGS: 0206 ORIG_RAX: 00a5
RAX: ffda RBX: 7fac99605b40 RCX: 0045bf6a
RDX: 7fac99605ae0 RSI: 2000 RDI: 7fac99605b00
RBP: 0001 R08: 7fac99605b40 R09: 7fac99605ae0
R10: R11: 0206 R12: 0004
R13: 004c858e R14: 004df0e0 R15:

Allocated by task 12241:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 __do_kmalloc mm/slab.c:3656 [inline]
 __kmalloc+0x163/0x770 mm/slab.c:3665
 kmalloc include/linux/slab.h:556 [inline]
 kmem_alloc+0xd2/0x200 fs/xfs/kmem.c:24
 kmem_zalloc fs/xfs/kmem.h:73 [inline]
 xlog_alloc_log+0xbf4/0x1380 fs/xfs/xfs_log.c:1420
 xfs_log_mount+0xdc/0x780 fs/xfs/xfs_log.c:580
 xfs_mountfs+0xdb9/0x1be0 fs/xfs/xfs_mount.c:815
 xfs_fs_fill_super+0xca6/0x16c0 fs/xfs/xfs_super.c:1740
 mount_bdev+0x304/0x3c0 fs/super.c:1346
 xfs_fs_mount+0x35/0x40 fs/xfs/xfs_super.c:1814
 legacy_get_tree+0x108/0x220 fs/fs_context.c:661
 vfs_get_tree+0x8e/0x390 fs/super.c:1476
 do_new_mount fs/namespace.c:2791 [inline]
 do_mount+0x138c/0x1c00 fs/namespace.c:3111
 ksys_mount+0xdb/0x150 fs/namespace.c:3320
 __do_sys_mount fs/namespace.c:3334 [inline]
 __se_sys_mount fs/namespace.c:3331 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3331
 do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 12241:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3426 [inline]
 kfree+0x10a/0x2c0 mm/slab.c:3757
 kvfree+0x61/0x70 mm/util.c:488
 kmem_free fs/xfs/kmem.h:66 [inline]
 xlog_alloc_log+0xea9/0x1380 fs/xfs/xfs_log.c:1480
 xfs_log_mount+0xdc/0x780 fs/xfs/xfs_log.c:580
 xfs_mountfs+0xdb9/0x1be0 fs/xfs/xfs_mount.c:815
 xfs_fs_fill_super+0xca6/0x16c0 fs/xfs/xfs_super.c:1740
 mount_bdev+0x304/0x3c0 fs/super.c:1346
 xfs_fs_mount+0x35/0x40 fs/xfs/xfs_super.c:1814
 legacy_get_tree+0x108/0x220 fs/fs_context.c:661
 vfs_get_tree+0x8e/0x390 fs/super.c:1476
 do_new_mount fs/namespace.c:2791 [inline]
 do_mount+0x138c/0x1c00 fs/namespace.c:3111
 ksys_mount+0xdb/0x150 fs/namespace.c:3320
 __do_sys_mount fs/namespace.c:3334 [inline]
 __se_sys_mount fs/namespace.c:3331 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3331
 do_syscall_64
KASAN: use-after-free Write in xfrm_policy_flush
Hello,

syzbot found the following crash on:

HEAD commit:    249155c2 Merge branch 'parisc-5.2-4' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10f1198da0
kernel config:  https://syzkaller.appspot.com/x/.config?x=9a31528e58cc12e2
dashboard link: https://syzkaller.appspot.com/bug?extid=2daeb7ae5e8245095f65
compiler:       clang version 9.0.0 (/home/glider/llvm/clang 80fee25776c2fb61e74c1ecb1a523375c2500b69)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2daeb7ae5e8245095...@syzkaller.appspotmail.com

netlink: 168 bytes leftover after parsing attributes in process `syz-executor.2'.
==
BUG: KASAN: use-after-free in __write_once_size include/linux/compiler.h:221 [inline]
BUG: KASAN: use-after-free in __hlist_del include/linux/list.h:748 [inline]
BUG: KASAN: use-after-free in hlist_del_rcu include/linux/rculist.h:455 [inline]
BUG: KASAN: use-after-free in __xfrm_policy_unlink net/xfrm/xfrm_policy.c:2217 [inline]
BUG: KASAN: use-after-free in xfrm_policy_flush+0x3be/0x900 net/xfrm/xfrm_policy.c:1794
Write of size 8 at addr 8880a5cfdd00 by task syz-executor.2/31717

CPU: 0 PID: 31717 Comm: syz-executor.2 Not tainted 5.2.0-rc6+ #7
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
 print_address_description+0x6d/0x310 mm/kasan/report.c:188
 __kasan_report+0x14b/0x1c0 mm/kasan/report.c:317
 kasan_report+0x26/0x50 mm/kasan/common.c:614
 __asan_report_store8_noabort+0x17/0x20 mm/kasan/generic_report.c:137
 __write_once_size include/linux/compiler.h:221 [inline]
 __hlist_del include/linux/list.h:748 [inline]
 hlist_del_rcu include/linux/rculist.h:455 [inline]
 __xfrm_policy_unlink net/xfrm/xfrm_policy.c:2217 [inline]
 xfrm_policy_flush+0x3be/0x900 net/xfrm/xfrm_policy.c:1794
 xfrm_flush_policy+0x132/0x3c0 net/xfrm/xfrm_user.c:2123
 xfrm_user_rcv_msg+0x46b/0x720 net/xfrm/xfrm_user.c:2657
 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2482
 xfrm_netlink_rcv+0x74/0x90 net/xfrm/xfrm_user.c:2665
 netlink_unicast_kernel net/netlink/af_netlink.c:1307 [inline]
 netlink_unicast+0x962/0xaf0 net/netlink/af_netlink.c:1333
 netlink_sendmsg+0xa7a/0xd40 net/netlink/af_netlink.c:1922
 sock_sendmsg_nosec net/socket.c:646 [inline]
 sock_sendmsg net/socket.c:665 [inline]
 ___sys_sendmsg+0x66b/0x9a0 net/socket.c:2286
 __sys_sendmsg net/socket.c:2324 [inline]
 __do_sys_sendmsg net/socket.c:2333 [inline]
 __se_sys_sendmsg net/socket.c:2331 [inline]
 __x64_sys_sendmsg+0x1cf/0x290 net/socket.c:2331
 do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459519
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7f4e7b5f5c78 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 0003 RCX: 00459519
RDX: RSI: 2014f000 RDI: 0003
RBP: 0075bf20 R08: R09:
R10: R11: 0246 R12: 7f4e7b5f66d4
R13: 004c7264 R14: 004dc6c8 R15:

Allocated by task 8433:
 save_stack mm/kasan/common.c:71 [inline]
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:489
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 __do_kmalloc mm/slab.c:3660 [inline]
 __kmalloc+0x23c/0x310 mm/slab.c:3669
 kmalloc include/linux/slab.h:552 [inline]
 kzalloc include/linux/slab.h:742 [inline]
 xfrm_hash_alloc+0x38/0xe0 net/xfrm/xfrm_hash.c:21
 xfrm_policy_init net/xfrm/xfrm_policy.c:4036 [inline]
 xfrm_net_init+0x269/0xd60 net/xfrm/xfrm_policy.c:4120
 ops_init+0x336/0x420 net/core/net_namespace.c:130
 setup_net+0x212/0x690 net/core/net_namespace.c:316
 copy_net_ns+0x224/0x380 net/core/net_namespace.c:439
 create_new_namespaces+0x4ec/0x700 kernel/nsproxy.c:103
 unshare_nsproxy_namespaces+0x12a/0x190 kernel/nsproxy.c:202
 ksys_unshare+0x540/0xac0 kernel/fork.c:2692
 __do_sys_unshare kernel/fork.c:2760 [inline]
 __se_sys_unshare kernel/fork.c:2758 [inline]
 __x64_sys_unshare+0x38/0x40 kernel/fork.c:2758
 do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 21222:
 save_stack mm/kasan/common.c:71 [inline]
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x12a/0x1e0 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kfree+0xae/0x120 mm/slab.c:3755
 xfrm_hash_free+0x38/0xd0 net/xfrm/xfrm_hash.c:35
 xfrm_bydst_resize net/xfrm/xfrm_policy.c:602 [in
Re: [PATCH] soc: imx-scu: Add SoC UID(unique identifier) support
On Thu, Jun 27, 2019 at 3:48 AM Anson Huang wrote:
>
> Hi, Daniel
>
> > -----Original Message-----
> > From: Daniel Baluta
> > Sent: Wednesday, June 26, 2019 8:42 PM
> > To: Anson Huang
> > Cc: Shawn Guo; Sascha Hauer; Pengutronix Kernel Team; Fabio Estevam;
> > Aisheng Dong; Abel Vesa; linux-arm-kernel; Linux Kernel Mailing List;
> > dl-linux-imx; Daniel Baluta
> > Subject: Re: [PATCH] soc: imx-scu: Add SoC UID(unique identifier) support
> >
> > On Wed, Jun 26, 2019 at 10:06 AM wrote:
> > >
> > > From: Anson Huang
> > >
> > > Add i.MX SCU SoC's UID(unique identifier) support, user can read it
> > > from sysfs:
> > >
> > > root@imx8qxpmek:~# cat /sys/devices/soc0/soc_uid
> > > 7B64280B57AC1898
> > >
> > > Signed-off-by: Anson Huang
> > > ---
> > >  drivers/soc/imx/soc-imx-scu.c | 35 +++
> > >  1 file changed, 35 insertions(+)
> > >
> > > diff --git a/drivers/soc/imx/soc-imx-scu.c b/drivers/soc/imx/soc-imx-scu.c
> > > index 676f612..8d322a1 100644
> > > --- a/drivers/soc/imx/soc-imx-scu.c
> > > +++ b/drivers/soc/imx/soc-imx-scu.c
> > > @@ -27,6 +27,36 @@ struct imx_sc_msg_misc_get_soc_id {
> > >  	} data;
> > >  } __packed;
> > >
> > > +struct imx_sc_msg_misc_get_soc_uid {
> > > +	struct imx_sc_rpc_msg hdr;
> > > +	u32 uid_low;
> > > +	u32 uid_high;
> > > +} __packed;
> > > +
> > > +static ssize_t soc_uid_show(struct device *dev,
> > > +			    struct device_attribute *attr, char *buf)
> > > +{
> > > +	struct imx_sc_msg_misc_get_soc_uid msg;
> > > +	struct imx_sc_rpc_msg *hdr = &msg.hdr;
> > > +	u64 soc_uid;
> > > +
> > > +	hdr->ver = IMX_SC_RPC_VERSION;
> > > +	hdr->svc = IMX_SC_RPC_SVC_MISC;
> > > +	hdr->func = IMX_SC_MISC_FUNC_UNIQUE_ID;
> > > +	hdr->size = 1;
> > > +
> > > +	/* the return value of SCU FW is in correct, skip return value check */
> >
> > Why do you mean by "in correct"?
>
> I made a mistake, it should be "incorrect", the existing SCFW of this API returns
> an error value even this API is successfully called, to make it work with current
> SCFW, I have to skip the return value check for this API for now. Will send V2 patch
> to fix this typo.

Thanks Anson! It makes sense now. It is a little bit sad though because we won't
know when there is a "real" error :).

Lets update the comment to be more specific:

/* SCFW FW API always returns an error even the function is successfully
executed, so skip returned value */

> > > +	imx_scu_call_rpc(soc_ipc_handle, &msg, true);
> > > +
> > > +	soc_uid = msg.uid_high;
> > > +	soc_uid <<= 32;
> > > +	soc_uid |= msg.uid_low;
> > > +
> > > +	return sprintf(buf, "%016llX\n", soc_uid);
> >
> > snprintf?
>
> The snprintf is to avoid buffer overflow, which in this case, I don't know the size
> of "buf", and the value(u64) to be printed is with fixed length of 64, so I think
> sprint is just OK.

Ok.
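
For reference, the high/low composition and the bounded-print question discussed above can be tried out in user space. The following is only a sketch: format_uid() is a hypothetical helper, not part of the driver, and it uses the C library snprintf() with an explicit buffer length. In the kernel, a sysfs show() callback is handed a page-sized buffer, which is why a fixed 17-character output fits either way.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Combine the two 32-bit halves into one 64-bit UID and print it as
 * 16 upper-case hex digits plus a newline, never writing more than
 * 'len' bytes. Returns what snprintf() returns.
 */
static int format_uid(uint32_t uid_low, uint32_t uid_high,
		      char *buf, size_t len)
{
	uint64_t uid = uid_high;

	uid <<= 32;		/* high word goes to bits 63..32 */
	uid |= uid_low;		/* low word fills bits 31..0 */

	return snprintf(buf, len, "%016" PRIX64 "\n", uid);
}
```

Called with uid_high 0x7B64280B and uid_low 0x57AC1898, the helper produces the same string as the sysfs example quoted in the patch description.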
Re: [PATCH] staging: rtl8723bs: hal: sdio_halinit: Remove set but unused varilable pHalData
On Wed, Jun 26, 2019 at 11:14:59PM +0530, Hariprasad Kelam wrote:
> @@ -1433,7 +1430,6 @@ static void SetHwReg8723BS(struct adapter *padapter, u8 variable, u8 *val)
>  #endif
>  #endif
>
> -	pHalData = GET_HAL_DATA(padapter);
>
>  	switch (variable) {

We need to delete one of those blank lines or it introduces a new
checkpatch warning.

regards,
dan carpenter
Re: [PATCH v9 11/12] soc: mediatek: cmdq: add cmdq_dev_get_client_reg function
Hi, Bibby:

On Thu, 2019-06-27 at 14:19 +0800, Bibby Hsieh wrote:
> GCE cannot know the register base address, this function
> can help cmdq client to get the cmdq_client_reg structure.
>
> Signed-off-by: Bibby Hsieh
> ---
>  drivers/soc/mediatek/mtk-cmdq-helper.c | 24
>  include/linux/soc/mediatek/mtk-cmdq.h  | 21 +
>  2 files changed, 45 insertions(+)
>
> diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c b/drivers/soc/mediatek/mtk-cmdq-helper.c
> index 70ad4d806fac..ceb1b569891f 100644
> --- a/drivers/soc/mediatek/mtk-cmdq-helper.c
> +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
> @@ -27,6 +27,30 @@ struct cmdq_instruction {
>  	u8 op;
>  };
>
> +int cmdq_dev_get_client_reg(struct device *dev,
> +			    struct cmdq_client_reg *client_reg, int idx)
> +{
> +	struct of_phandle_args spec;
> +
> +	if (!client_reg)
> +		return -ENOENT;
> +
> +	if (of_parse_phandle_with_args(dev->of_node, "mediatek,gce-client-reg",
> +				       "#subsys-cells", idx, &spec)) {
> +		dev_err(dev, "can't parse gce-client-reg property (%d)", idx);
> +
> +		return -ENOENT;

Maybe my expression is not so clear. of_parse_phandle_with_args() may
return -ENOENT, but it also may return -EINVAL. My point is why do you
change the return value of of_parse_phandle_with_args(). What the error
you get from of_parse_phandle_with_args(), you could also return it to
the caller of cmdq_dev_get_client_reg().

Regards,
CK

> +	}
> +
> +	client_reg->subsys = spec.args[0];
> +	client_reg->offset = spec.args[1];
> +	client_reg->size = spec.args[2];
> +	of_node_put(spec.np);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(cmdq_dev_get_client_reg);
> +
>  static void cmdq_client_timeout(struct timer_list *t)
>  {
>  	struct cmdq_client *client = from_timer(client, t, timer);
> diff --git a/include/linux/soc/mediatek/mtk-cmdq.h b/include/linux/soc/mediatek/mtk-cmdq.h
> index a345870a6d10..be402c4c740e 100644
> --- a/include/linux/soc/mediatek/mtk-cmdq.h
> +++ b/include/linux/soc/mediatek/mtk-cmdq.h
> @@ -15,6 +15,12 @@
>
>  struct cmdq_pkt;
>
> +struct cmdq_client_reg {
> +	u8 subsys;
> +	u16 offset;
> +	u16 size;
> +};
> +
>  struct cmdq_client {
>  	spinlock_t lock;
>  	u32 pkt_cnt;
> @@ -142,4 +148,19 @@ int cmdq_pkt_flush_async(struct cmdq_pkt *pkt, cmdq_async_flush_cb cb,
>   */
>  int cmdq_pkt_flush(struct cmdq_pkt *pkt);
>
> +/**
> + * cmdq_dev_get_client_reg() - parse cmdq client reg from the device
> + *			       node of CMDQ client
> + * @dev:	device of CMDQ mailbox clienti
> + * @client_reg: CMDQ client reg pointer
> + * @idx:	the index of desired reg
> + *
> + * Return: 0 for success; else the error code is returned
> + *
> + * Help CMDQ client pasing the cmdq client reg
> + * from the device node of CMDQ client.
> + */
> +int cmdq_dev_get_client_reg(struct device *dev,
> +			    struct cmdq_client_reg *client_reg, int idx);
> +
>  #endif /* __MTK_CMDQ_H__ */
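
The error-propagation point raised in the review can be sketched outside the kernel as follows. Everything here is hypothetical: parse_args_stub() merely stands in for of_parse_phandle_with_args() and fails with whatever code the caller injects, and get_client_reg_sketch() is not the driver function. What the sketch shows is the pattern being asked for: keep the callee's error code in a local variable and return it unchanged instead of collapsing every failure into -ENOENT.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the three fields filled in by the patch. */
struct client_reg_sketch {
	unsigned char subsys;
	unsigned short offset;
	unsigned short size;
};

/* Stand-in for the device-tree parser: fails with the injected code. */
static int parse_args_stub(int injected_err, unsigned int args[3])
{
	if (injected_err)
		return injected_err;

	args[0] = 1;		/* made-up subsys id */
	args[1] = 0x4000;	/* made-up register offset */
	args[2] = 0x1000;	/* made-up register size */
	return 0;
}

static int get_client_reg_sketch(struct client_reg_sketch *reg,
				 int injected_err)
{
	unsigned int args[3];
	int err;

	if (!reg)
		return -ENOENT;	/* same code the patch uses for this case */

	err = parse_args_stub(injected_err, args);
	if (err)
		return err;	/* -ENOENT, -EINVAL, ... reach the caller as-is */

	reg->subsys = (unsigned char)args[0];
	reg->offset = (unsigned short)args[1];
	reg->size = (unsigned short)args[2];
	return 0;
}
```

A caller can then tell a missing property apart from a malformed one, which is not possible when both are reported as -ENOENT.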
Re: [PATCH v7 4/4] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT
> On Jun 25, 2019, at 11:00 PM, Srikar Dronamraju wrote:
>
> * Song Liu [2019-06-25 16:53:25]:
>
>> This patches uses newly added FOLL_SPLIT_PMD in uprobe. This enables easy
>> regroup of huge pmd after the uprobe is disabled (in next patch).
>>
>> Acked-by: Kirill A. Shutemov
>> Signed-off-by: Song Liu
>> ---
>>  kernel/events/uprobes.c | 6 ++
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> Looks good to me.
>
> Reviewed-by: Srikar Dronamraju

Thanks Srikar!

I guess these 4 patches are ready to go?

Hi Andrew,

Could you please route them via the mm tree?

Thanks,
Song
Re: [PATCH] media: staging/imx: Fix NULL deref in find_pipeline_entity()
On Wed, 2019-06-26 at 11:52 -0700, Steve Longerbeam wrote:
> Fix a cut&paste error in find_pipeline_entity(). The start entity must be
> passed to media_entity_to_video_device() in find_pipeline_entity(), not
> pad->entity. The pad is only put to use later, after determining the start
> entity is not the entity being searched for.
>
> Fixes: 3ef46bc97ca2 ("media: staging/imx: Improve pipeline searching")
>
> Reported-by: Colin Ian King
> Signed-off-by: Steve Longerbeam
> ---
>  drivers/staging/media/imx/imx-media-utils.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/media/imx/imx-media-utils.c b/drivers/staging/media/imx/imx-media-utils.c
> index b5b8a3b7730a..6fb88c22ee27 100644
> --- a/drivers/staging/media/imx/imx-media-utils.c
> +++ b/drivers/staging/media/imx/imx-media-utils.c
> @@ -842,7 +842,7 @@ find_pipeline_entity(struct media_entity *start, u32 grp_id,
>  		if (sd->grp_id & grp_id)
>  			return &sd->entity;
>  	} else if (buftype && is_media_entity_v4l2_video_device(start)) {
> -		vfd = media_entity_to_video_device(pad->entity);
> +		vfd = media_entity_to_video_device(start);
>  		if (buftype == vfd->queue->type)
>  			return &vfd->entity;
>  	}

Reviewed-by: Philipp Zabel

regards
Philipp
Re: [PATCH v2 3/5] OPP: Add support for parsing the interconnect bandwidth
Hey Georgi,

In addition to Viresh's comments I found a few more while testing the
series on SDM845.

On 4/24/19 11:22 AM, Viresh Kumar wrote:
> On 23-04-19, 16:28, Georgi Djakov wrote:
>> The OPP bindings now support bandwidth values, so add support to parse it
>> from device tree and store it into the new dev_pm_opp_icc_bw struct, which
>> is part of the dev_pm_opp.
>>
>> Also add and export the dev_pm_opp_set_paths() and dev_pm_opp_put_paths()
>> helpers, to set (and release) an interconnect paths to a device. The
>> bandwidth of these paths will be updated when the OPPs are switched.
>>
>> Signed-off-by: Georgi Djakov
>> ---
>>  drivers/opp/core.c     | 87 ++-
>>  drivers/opp/of.c       | 102 +
>>  drivers/opp/opp.h      | 9
>>  include/linux/pm_opp.h | 14 ++
>>  4 files changed, 210 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
>> index 0420f7e8ad5b..97ee39ecdebd 100644
>> --- a/drivers/opp/core.c
>> +++ b/drivers/opp/core.c
>> @@ -19,6 +19,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>
> Just include this once in opp.h and the other .c files won't need it.
>
>>  #include
>>  #include
>> @@ -876,6 +877,8 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
>>  				ret);
>>  	}
>>
>> +	_of_find_paths(opp_table, dev);
>> +
>>  	BLOCKING_INIT_NOTIFIER_HEAD(&opp_table->head);
>>  	INIT_LIST_HEAD(&opp_table->opp_list);
>>  	kref_init(&opp_table->kref);
>> @@ -1129,11 +1132,12 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_remove_all_dynamic);
>>  struct dev_pm_opp *_opp_allocate(struct opp_table *table)
>>  {
>>  	struct dev_pm_opp *opp;
>> -	int count, supply_size;
>> +	int count, supply_size, icc_size;
>>
>>  	/* Allocate space for at least one supply */
>>  	count = table->regulator_count > 0 ? table->regulator_count : 1;
>>  	supply_size = sizeof(*opp->supplies) * count;
>> +	icc_size = sizeof(*opp->bandwidth) * table->path_count;
>>
>>  	/* allocate new OPP node and supplies structures */
>>  	opp = kzalloc(sizeof(*opp) + supply_size, GFP_KERNEL);
>
> You never updated this to include icc_size :(
>
>> @@ -1141,7 +1145,8 @@ struct dev_pm_opp *_opp_allocate(struct opp_table *table)
>>  		return NULL;
>>
>>  	/* Put the supplies at the end of the OPP structure as an empty array */
>> -	opp->supplies = (struct dev_pm_opp_supply *)(opp + 1);
>> +	opp->bandwidth = (struct dev_pm_opp_icc_bw *)(opp + 1);
>
> Keep the order as supplies and then bandwidth.
>
>> +	opp->supplies = (struct dev_pm_opp_supply *)(opp + icc_size + 1);
>
> 	opp->supplies = (struct dev_pm_opp_supply *)(opp + 1);
> 	opp->bandwidth = (struct dev_pm_opp_icc_bw *)(opp->supplies + 1);
>
> Did you check what address gets assigned here ? I think the pointer
> addition will screw things up for you.
>
>>  	INIT_LIST_HEAD(&opp->node);
>>
>>  	return opp;
>> @@ -1637,6 +1642,84 @@ void dev_pm_opp_put_clkname(struct opp_table *opp_table)
>>  }
>>  EXPORT_SYMBOL_GPL(dev_pm_opp_put_clkname);
>>
>> +/**
>> + * dev_pm_opp_set_paths() - Set interconnect path for a device
>> + * @dev: Device for which interconnect path is being set.
>> + *
>> + * This must be called before any OPPs are initialized for the device.
>> + */
>> +struct opp_table *dev_pm_opp_set_paths(struct device *dev)
>
> I got a bit confused. Why is this routine required exactly as
> _of_find_paths() would have already done something similar ?
>
>> +{
>> +	struct opp_table *opp_table;
>> +	int ret, i;
>> +
>> +	opp_table = dev_pm_opp_get_opp_table(dev);
>> +	if (!opp_table)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	/* This should be called before OPPs are initialized */
>> +	if (WARN_ON(!list_empty(&opp_table->opp_list))) {
>> +		ret = -EBUSY;
>> +		goto err;
>> +	}
>> +
>> +	/* Another CPU that shares the OPP table has set the path */
>> +	if (opp_table->paths)
>> +		return opp_table;
>> +
>> +	opp_table->paths = kmalloc_array(opp_table->path_count,

of_find_paths might have failed so you would want to re-calculate
opp_table->path_count.

>> +					 sizeof(*opp_table->paths), GFP_KERNEL);
>> +
>> +	/* Find interconnect path(s) for the device */
>> +	for (i = 0; i < opp_table->path_count; i++) {
>> +		opp_table->paths[i] = of_icc_get_by_index(dev, i);
>> +		if (IS_ERR(opp_table->paths[i])) {
>> +			ret = PTR_ERR(opp_table->paths[i]);
>> +			if (ret != -EPROBE_DEFER)
>> +				dev_err(dev, "%s: Couldn't find path%d: %d\n",
>> +					__func__, i, ret);

we should clean up by call icc_put on the paths that succeeded and
free/set the opp_table->paths to NULL.

>> +			goto err;
>> +		}
>> +	}
>> +
>> +	return opp_table;
>> +
>> +err:
>> +	dev_pm_opp_put_opp_table(opp_table);
>> +
>> +	return ERR_PTR(ret);
>> +}
>> +EXPORT_SYMBOL_GPL(dev_pm_opp_set_paths);
>> +
>> +/**
>> + * dev_pm_op
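
The pointer-addition concern in this thread comes down to how C scales pointer arithmetic: adding n to a struct pointer advances it by n times the size of the struct, not by n bytes. The sketch below uses hypothetical stand-in types (opp_sketch, supply_sketch, icc_bw_sketch) rather than the real OPP structures. It sizes the allocation for both trailing arrays and places the bandwidth array after all supplies, which is a small variation on the two-line suggestion quoted above that also covers more than one supply.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the OPP node and its two trailing arrays. */
struct supply_sketch { unsigned long u_volt; };
struct icc_bw_sketch { unsigned int avg; unsigned int peak; };
struct opp_sketch {
	struct supply_sketch *supplies;
	struct icc_bw_sketch *bandwidth;
	unsigned long rate;
};

/* Bytes skipped by "opp + n": the offset is scaled by sizeof(*opp). */
static size_t scaled_advance_bytes(size_t n)
{
	return n * sizeof(struct opp_sketch);
}

/* One allocation: the node, then the supplies, then the bandwidth entries. */
static struct opp_sketch *opp_alloc_sketch(size_t nr_supplies, size_t nr_paths)
{
	size_t supply_size = sizeof(struct supply_sketch) * nr_supplies;
	size_t icc_size = sizeof(struct icc_bw_sketch) * nr_paths;
	struct opp_sketch *opp;

	/* The allocation has to account for both trailing arrays. */
	opp = calloc(1, sizeof(*opp) + supply_size + icc_size);
	if (!opp)
		return NULL;

	/* opp + 1 is the first byte after the node itself. */
	opp->supplies = (struct supply_sketch *)(opp + 1);
	/* Typed arithmetic again: skip nr_supplies whole supply entries. */
	opp->bandwidth = (struct icc_bw_sketch *)(opp->supplies + nr_supplies);

	return opp;
}
```

Under these assumptions, an expression of the form opp + icc_size + 1 would skip (icc_size + 1) whole nodes, which lands well beyond the allocated block.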
Re: [RFC PATCH v3 0/4] Deliver vGPU display vblank event to userspace
  Hi,

> Instead of delivering page flip events, we choose to post display vblank
> event. Handling page flip events for both primary plane and cursor plane
> may make user space quite busy, although we have the mask/unmask mechanism
> for mitigation. Besides, there are some cases that guest app only uses
> one framebuffer for both drawing and display. In such case, guest OS won't
> do the plane page flip when the framebuffer is updated, thus the user
> land won't be notified about the updated framebuffer.

What happens when the guest is idle and doesn't draw anything to the
framebuffer?

cheers,
  Gerd
[PATCH v9 09/12] soc: mediatek: cmdq: define the instruction struct
Define an instruction structure for gce driver to append command.
This structure can make the client's code more readable.

Signed-off-by: Bibby Hsieh
Reviewed-by: CK Hu
---
 drivers/soc/mediatek/mtk-cmdq-helper.c   | 103 +++
 include/linux/mailbox/mtk-cmdq-mailbox.h |   2 +
 2 files changed, 72 insertions(+), 33 deletions(-)

diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c b/drivers/soc/mediatek/mtk-cmdq-helper.c
index 7aa0517ff2f3..0886c4967ca4 100644
--- a/drivers/soc/mediatek/mtk-cmdq-helper.c
+++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
@@ -9,12 +9,24 @@
 #include
 #include

-#define CMDQ_ARG_A_WRITE_MASK	0x
 #define CMDQ_WRITE_ENABLE_MASK	BIT(0)
 #define CMDQ_EOC_IRQ_EN		BIT(0)
 #define CMDQ_EOC_CMD		((u64)((CMDQ_CODE_EOC << CMDQ_OP_CODE_SHIFT)) \
 				<< 32 | CMDQ_EOC_IRQ_EN)

+struct cmdq_instruction {
+	union {
+		u32 value;
+		u32 mask;
+	};
+	union {
+		u16 offset;
+		u16 event;
+	};
+	u8 subsys;
+	u8 op;
+};
+
 static void cmdq_client_timeout(struct timer_list *t)
 {
 	struct cmdq_client *client = from_timer(client, t, timer);
@@ -110,10 +122,8 @@ void cmdq_pkt_destroy(struct cmdq_pkt *pkt)
 }
 EXPORT_SYMBOL(cmdq_pkt_destroy);

-static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, enum cmdq_code code,
-				   u32 arg_a, u32 arg_b)
+static struct cmdq_instruction *cmdq_pkt_append_command(struct cmdq_pkt *pkt)
 {
-	u64 *cmd_ptr;

 	if (unlikely(pkt->cmd_buf_size + CMDQ_INST_SIZE > pkt->buf_size)) {
 		/*
@@ -127,81 +137,108 @@ static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, enum cmdq_code code,
 		pkt->cmd_buf_size += CMDQ_INST_SIZE;
 		WARN_ONCE(1, "%s: buffer size %u is too small !\n",
 			__func__, (u32)pkt->buf_size);
-		return -ENOMEM;
+		return NULL;
 	}
-	cmd_ptr = pkt->va_base + pkt->cmd_buf_size;
-	(*cmd_ptr) = (u64)((code << CMDQ_OP_CODE_SHIFT) | arg_a) << 32 | arg_b;
+
 	pkt->cmd_buf_size += CMDQ_INST_SIZE;

-	return 0;
+	return pkt->va_base + pkt->cmd_buf_size - CMDQ_INST_SIZE;
 }

 int cmdq_pkt_write(struct cmdq_pkt *pkt, u8 subsys, u16 offset, u32 value)
 {
-	u32 arg_a = (offset & CMDQ_ARG_A_WRITE_MASK) |
-		    (subsys << CMDQ_SUBSYS_SHIFT);
+	struct cmdq_instruction *inst;
+
+	inst = cmdq_pkt_append_command(pkt);
+	if (!inst)
+		return -ENOMEM;
+
+	inst->op = CMDQ_CODE_WRITE;
+	inst->value = value;
+	inst->offset = offset;
+	inst->subsys = subsys;

-	return cmdq_pkt_append_command(pkt, CMDQ_CODE_WRITE, arg_a, value);
+	return 0;
 }
 EXPORT_SYMBOL(cmdq_pkt_write);

 int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u8 subsys,
 			u16 offset, u32 value, u32 mask)
 {
+	struct cmdq_instruction *inst;
 	u32 offset_mask = offset;
-	int err = 0;

 	if (mask != 0x) {
-		err = cmdq_pkt_append_command(pkt, CMDQ_CODE_MASK, 0, ~mask);
+		inst = cmdq_pkt_append_command(pkt);
+		if (!inst)
+			return -ENOMEM;
+
+		inst->op = CMDQ_CODE_MASK;
+		inst->mask = ~mask;
 		offset_mask |= CMDQ_WRITE_ENABLE_MASK;
 	}
-	err |= cmdq_pkt_write(pkt, value, subsys, offset_mask);

-	return err;
+	return cmdq_pkt_write(pkt, subsys, offset_mask, value);
 }
 EXPORT_SYMBOL(cmdq_pkt_write_mask);

 int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u16 event)
 {
-	u32 arg_b;
+	struct cmdq_instruction *inst;

 	if (event >= CMDQ_MAX_EVENT)
 		return -EINVAL;

-	/*
-	 * WFE arg_b
-	 * bit 0-11: wait value
-	 * bit 15: 1 - wait, 0 - no wait
-	 * bit 16-27: update value
-	 * bit 31: 1 - update, 0 - no update
-	 */
-	arg_b = CMDQ_WFE_UPDATE | CMDQ_WFE_WAIT | CMDQ_WFE_WAIT_VALUE;
+	inst = cmdq_pkt_append_command(pkt);
+	if (!inst)
+		return -ENOMEM;
+
+	inst->op = CMDQ_CODE_WFE;
+	inst->value = CMDQ_WFE_OPTION;
+	inst->event = event;

-	return cmdq_pkt_append_command(pkt, CMDQ_CODE_WFE, event, arg_b);
+	return 0;
 }
 EXPORT_SYMBOL(cmdq_pkt_wfe);

 int cmdq_pkt_clear_event(struct cmdq_pkt *pkt, u16 event)
 {
+	struct cmdq_instruction *inst;
+
 	if (event >= CMDQ_MAX_EVENT)
 		return -EINVAL;

-	return cmdq_pkt_append_command(pkt, CMDQ_CODE_WFE, event,
-				       CMDQ_WFE_UPDATE);
+	inst = cmdq_pkt_append_command(pkt);
+	if (!inst)
+		return -ENOMEM;
+
+	inst->op = CMDQ_CODE_WFE;
+	inst->value = CMDQ_WFE_UPDATE;
+	inst->event = event;
+
+	return 0;
 }
 EXPORT_SYMBOL(cmdq_pkt_clear_e
Re: [RFC PATCH v3 2/4] vfio: Introduce vGPU display irq type
On Thu, Jun 27, 2019 at 11:38:00AM +0800, Tina Zhang wrote:
> Introduce vGPU specific irq type VFIO_IRQ_TYPE_GFX, and
> VFIO_IRQ_SUBTYPE_GFX_DISPLAY_IRQ as the subtype for vGPU display
>
> Signed-off-by: Tina Zhang
> ---
>  include/uapi/linux/vfio.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 600784acc4ac..c3e9c821a5cb 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -465,6 +465,9 @@ struct vfio_irq_info_cap_type {
>  	__u32 subtype; /* type specific */
>  };
>
> +#define VFIO_IRQ_TYPE_GFX (1)
> +#define VFIO_IRQ_SUBTYPE_GFX_DISPLAY_IRQ (1)

VFIO_IRQ_TYPE_GFX_VBLANK ?

cheers,
  Gerd
[PATCH 2/2] net: macb: Fix SUBNS increment and increase resolution
The subns increment register has 24 bits, laid out as follows:
RegBit[15:0] = Subns[23:8]; RegBit[31:24] = Subns[7:0]

Fix this in the driver and increase the sub ns resolution to the maximum
supported, 24 bits. This should be the case on all GEM versions that this
PTP driver supports.

Signed-off-by: Harini Katakam
---
 drivers/net/ethernet/cadence/macb.h     | 6 +-
 drivers/net/ethernet/cadence/macb_ptp.c | 5 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 90bc70b..03983bd 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -496,7 +496,11 @@
 
 /* Bitfields in TISUBN */
 #define GEM_SUBNSINCR_OFFSET 0
-#define GEM_SUBNSINCR_SIZE 16
+#define GEM_SUBNSINCRL_OFFSET 24
+#define GEM_SUBNSINCRL_SIZE 8
+#define GEM_SUBNSINCRH_OFFSET 0
+#define GEM_SUBNSINCRH_SIZE 16
+#define GEM_SUBNSINCR_SIZE 24
 
 /* Bitfields in TI */
 #define GEM_NSINCR_OFFSET 0
diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c
index 6276eac..43a3f0d 100644
--- a/drivers/net/ethernet/cadence/macb_ptp.c
+++ b/drivers/net/ethernet/cadence/macb_ptp.c
@@ -104,7 +104,10 @@ static int gem_tsu_incr_set(struct macb *bp, struct tsu_incr *incr_spec)
 	 * to take effect.
 	 */
 	spin_lock_irqsave(&bp->tsu_clk_lock, flags);
-	gem_writel(bp, TISUBN, GEM_BF(SUBNSINCR, incr_spec->sub_ns));
+	/* RegBit[15:0] = Subns[23:8]; RegBit[31:24] = Subns[7:0] */
+	gem_writel(bp, TISUBN, GEM_BF(SUBNSINCRL, incr_spec->sub_ns) |
+		   GEM_BF(SUBNSINCRH, (incr_spec->sub_ns >>
+		   GEM_SUBNSINCRL_SIZE)));
 	gem_writel(bp, TI, GEM_BF(NSINCR, incr_spec->ns));
 	spin_unlock_irqrestore(&bp->tsu_clk_lock, flags);
 
-- 
2.7.4
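For readers following the bit layout in the commit message, here is a minimal standalone sketch of the split described there: the low 8 bits of the 24-bit sub-ns value go to register bits [31:24] and the high 16 bits go to register bits [15:0]. The function name tisubn_pack() is made up for illustration and is not part of the driver; the driver itself expresses the same thing through GEM_BF().

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the macb driver): pack a 24-bit sub-ns
 * increment into the TISUBN register layout quoted in the patch.
 *   RegBit[15:0]  = Subns[23:8]
 *   RegBit[31:24] = Subns[7:0]
 */
static uint32_t tisubn_pack(uint32_t sub_ns)
{
	uint32_t low  = sub_ns & 0xff;           /* Subns[7:0]  */
	uint32_t high = (sub_ns >> 8) & 0xffff;  /* Subns[23:8] */

	return (low << 24) | high;
}
```

With sub_ns = 0xABCDEF the low byte 0xEF lands in the top byte of the register and 0xABCD in the bottom half, giving 0xEF00ABCD. Register bits [23:16] stay zero.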
[PATCH v5 4/7] perf diff: Use hists to manage basic blocks per symbol
The hist__account_cycles() can account cycles per basic block. The basic block information is saved in cycles_hist structure. This patch processes each symbol, get basic blocks from cycles_hist and add the basic block entries to a new hists (in 'struct block_hist'). Using a hists is because we need to compare, sort and print the basic blocks later. v5: --- Since now we still carry block_info in 'struct hist_entry' we don't need to use our own new/free ops for hist entries. And the block_info is released in hist_entry__delete. v3: --- 1. In v2, we put block stuffs in 'struct hist_entry', but it's not a good design. In v3, we create a new 'struct block_hist' and cast the 'struct hist_entry' to 'struct block_hist' in some places, which can avoid adding new stuffs in 'struct hist_entry'. 2. abs() -> labs(), in block_cycles_diff_cmp(). v2: --- v1 adds the basic block entries to per data-file hists but v2 adds the basic block entries to per symbol hists. That is to keep current perf-diff format. Will show the result in next patches. 
Signed-off-by: Jin Yao --- tools/perf/builtin-diff.c | 190 +- tools/perf/util/hist.c| 3 + tools/perf/util/sort.h| 12 +++ 3 files changed, 202 insertions(+), 3 deletions(-) diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index a7e0420..ff2c076 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -20,6 +20,7 @@ #include "util/data.h" #include "util/config.h" #include "util/time-utils.h" +#include "util/annotate.h" #include #include @@ -87,11 +88,14 @@ static s64 compute_wdiff_w2; static const char *cpu_list; static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); +static struct addr_location dummy_al; + enum { COMPUTE_DELTA, COMPUTE_RATIO, COMPUTE_WEIGHTED_DIFF, COMPUTE_DELTA_ABS, + COMPUTE_CYCLES, COMPUTE_MAX, }; @@ -100,6 +104,7 @@ const char *compute_names[COMPUTE_MAX] = { [COMPUTE_DELTA_ABS] = "delta-abs", [COMPUTE_RATIO] = "ratio", [COMPUTE_WEIGHTED_DIFF] = "wdiff", + [COMPUTE_CYCLES] = "cycles", }; static int compute = COMPUTE_DELTA_ABS; @@ -234,6 +239,8 @@ static int setup_compute(const struct option *opt, const char *str, for (i = 0; i < COMPUTE_MAX; i++) if (!strcmp(cstr, compute_names[i])) { *cp = i; + if (i == COMPUTE_CYCLES) + break; return setup_compute_opt(option); } @@ -336,6 +343,31 @@ static int formula_fprintf(struct hist_entry *he, struct hist_entry *pair, return -1; } +static void *block_hist_zalloc(size_t size) +{ + struct block_hist *bh; + + bh = zalloc(size + sizeof(*bh)); + if (!bh) + return NULL; + + return &bh->he; +} + +static void block_hist_free(void *he) +{ + struct block_hist *bh; + + bh = container_of(he, struct block_hist, he); + hists__delete_entries(&bh->block_hists); + free(bh); +} + +struct hist_entry_ops block_hist_ops = { + .new= block_hist_zalloc, + .free = block_hist_free, +}; + static int diff__process_sample_event(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, @@ -363,9 +395,22 @@ static int diff__process_sample_event(struct perf_tool *tool, goto out_put; } - if 
(!hists__add_entry(hists, &al, NULL, NULL, NULL, sample, true)) { - pr_warning("problem incrementing symbol period, skipping event\n"); - goto out_put; + if (compute != COMPUTE_CYCLES) { + if (!hists__add_entry(hists, &al, NULL, NULL, NULL, sample, + true)) { + pr_warning("problem incrementing symbol period, " + "skipping event\n"); + goto out_put; + } + } else { + if (!hists__add_entry_ops(hists, &block_hist_ops, &al, NULL, + NULL, NULL, sample, true)) { + pr_warning("problem incrementing symbol period, " + "skipping event\n"); + goto out_put; + } + + hist__account_cycles(sample->branch_stack, &al, sample, false); } /* @@ -475,6 +520,127 @@ static void hists__baseline_only(struct hists *hists) } } +static int64_t block_cmp(struct perf_hpp_fmt *fmt __maybe_unused, +struct hist_entry *left, struct hist_entry *right) +{ + struct block_info *bi_l = left->block_info; + struct block_info *bi_r = right->block_info; + int cmp; + + if (!bi_l->sym || !bi_r->sym) { + if (!bi_l->sy
[PATCH v9 04/12] mailbox: mediatek: cmdq: move the CMDQ_IRQ_MASK into cmdq driver data
The interrupt mask and thread number has positive correlation, so we move the CMDQ_IRQ_MASK into cmdq driver data and calculate it by thread number. Signed-off-by: Bibby Hsieh Reviewed-by: CK Hu --- drivers/mailbox/mtk-cmdq-mailbox.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c b/drivers/mailbox/mtk-cmdq-mailbox.c index 00d5219094e5..8fddd26288e8 100644 --- a/drivers/mailbox/mtk-cmdq-mailbox.c +++ b/drivers/mailbox/mtk-cmdq-mailbox.c @@ -18,7 +18,6 @@ #include #define CMDQ_OP_CODE_MASK (0xff << CMDQ_OP_CODE_SHIFT) -#define CMDQ_IRQ_MASK 0x #define CMDQ_NUM_CMD(t)(t->cmd_buf_size / CMDQ_INST_SIZE) #define CMDQ_CURR_IRQ_STATUS 0x10 @@ -72,6 +71,7 @@ struct cmdq { void __iomem*base; u32 irq; u32 thread_nr; + u32 irq_mask; struct cmdq_thread *thread; struct clk *clock; boolsuspended; @@ -285,11 +285,11 @@ static irqreturn_t cmdq_irq_handler(int irq, void *dev) unsigned long irq_status, flags = 0L; int bit; - irq_status = readl(cmdq->base + CMDQ_CURR_IRQ_STATUS) & CMDQ_IRQ_MASK; - if (!(irq_status ^ CMDQ_IRQ_MASK)) + irq_status = readl(cmdq->base + CMDQ_CURR_IRQ_STATUS) & cmdq->irq_mask; + if (!(irq_status ^ cmdq->irq_mask)) return IRQ_NONE; - for_each_clear_bit(bit, &irq_status, fls(CMDQ_IRQ_MASK)) { + for_each_clear_bit(bit, &irq_status, cmdq->thread_nr) { struct cmdq_thread *thread = &cmdq->thread[bit]; spin_lock_irqsave(&thread->chan->lock, flags); @@ -473,6 +473,9 @@ static int cmdq_probe(struct platform_device *pdev) dev_err(dev, "failed to get irq\n"); return -EINVAL; } + + cmdq->thread_nr = (u32)(unsigned long)of_device_get_match_data(dev); + cmdq->irq_mask = GENMASK(cmdq->thread_nr - 1, 0); err = devm_request_irq(dev, cmdq->irq, cmdq_irq_handler, IRQF_SHARED, "mtk_cmdq", cmdq); if (err < 0) { @@ -489,7 +492,6 @@ static int cmdq_probe(struct platform_device *pdev) return PTR_ERR(cmdq->clock); } - cmdq->thread_nr = (u32)(unsigned long)of_device_get_match_data(dev); cmdq->mbox.dev = dev; cmdq->mbox.chans 
= devm_kcalloc(dev, cmdq->thread_nr, sizeof(*cmdq->mbox.chans), GFP_KERNEL); -- 2.18.0
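As a rough illustration of what this patch changes, the sketch below derives the interrupt mask from the thread count and then applies the same "clear bit means pending" check the handler uses. It is plain userspace C under stated assumptions: irq_mask_for() stands in for the kernel's GENMASK(thread_nr - 1, 0), and pending_threads() is a made-up helper that only counts the threads the handler loop would visit.

```c
#include <stdint.h>

/* Stand-in for GENMASK(thread_nr - 1, 0); guards the 0 and >= 32 cases. */
static uint32_t irq_mask_for(uint32_t thread_nr)
{
	if (thread_nr == 0)
		return 0;
	if (thread_nr >= 32)
		return UINT32_MAX;
	return (UINT32_C(1) << thread_nr) - 1;
}

/*
 * Hypothetical helper: count threads whose status bit is clear,
 * mirroring the handler's early return when every masked bit is set.
 */
static unsigned int pending_threads(uint32_t raw_status, uint32_t thread_nr)
{
	uint32_t mask = irq_mask_for(thread_nr);
	uint32_t status = raw_status & mask;
	unsigned int count = 0;

	if (!(status ^ mask))
		return 0; /* all bits set: nothing pending (IRQ_NONE case) */

	for (uint32_t bit = 0; bit < thread_nr && bit < 32; bit++)
		if (!(status & (UINT32_C(1) << bit)))
			count++;

	return count;
}
```

For the two thread counts that appear in this series, 16 threads give a mask of 0xffff and 24 threads give 0xffffff, so a fixed mask no longer fits both SoCs.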
[PATCH 1/2] net: macb: Add separate definition for PPM fraction
The scaled ppm parameter passed to _adjfine() contains a 16 bit fraction.
This just happens to be the same as SUBNSINCR_SIZE now. Hence define this
separately.

Signed-off-by: Harini Katakam
---
 drivers/net/ethernet/cadence/macb.h     | 3 +++
 drivers/net/ethernet/cadence/macb_ptp.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 515bfd2..90bc70b 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -834,6 +834,9 @@ struct gem_tx_ts {
 /* limit RX checksum offload to TCP and UDP packets */
 #define GEM_RX_CSUM_CHECKED_MASK 2
 
+/* Scaled PPM fraction */
+#define PPM_FRACTION 16
+
 /* struct macb_tx_skb - data about an skb which is being transmitted
  * @skb: skb currently being transmitted, only set for the last buffer
  *       of the frame
diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c
index 0a8aca8..6276eac 100644
--- a/drivers/net/ethernet/cadence/macb_ptp.c
+++ b/drivers/net/ethernet/cadence/macb_ptp.c
@@ -135,7 +135,7 @@ static int gem_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 	 * (temp / USEC_PER_SEC) + 0.5
 	 */
 	adj += (USEC_PER_SEC >> 1);
-	adj >>= GEM_SUBNSINCR_SIZE; /* remove fractions */
+	adj >>= PPM_FRACTION; /* remove fractions */
 	adj = div_u64(adj, USEC_PER_SEC);
 	adj = neg_adj ? (word - adj) : (word + adj);
 
-- 
2.7.4
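To make the role of the 16 bit fraction concrete, here is a small userspace sketch of the arithmetic around the changed line. Only the three steps visible in the hunk (add half of USEC_PER_SEC, shift out the fraction, divide by USEC_PER_SEC) come from the patch. The initial multiplication of the increment word by the scaled ppm magnitude is an assumption about the code outside the hunk, and adjust_increment() is an illustrative name, not a driver function.

```c
#include <stdint.h>

#define PPM_FRACTION 16            /* fractional bits in scaled_ppm */
#define USEC_PER_SEC 1000000ULL

/*
 * Sketch: scale a fixed-point increment "word" by scaled_ppm, where
 * scaled_ppm is parts per million with a 16 bit binary fraction.
 */
static uint64_t adjust_increment(uint64_t word, long scaled_ppm)
{
	int neg_adj = scaled_ppm < 0;
	uint64_t mag = neg_adj ? 0 - (uint64_t)scaled_ppm : (uint64_t)scaled_ppm;
	uint64_t adj = word * mag;     /* assumed step, not shown in the hunk */

	adj += (USEC_PER_SEC >> 1);    /* steps below mirror the hunk */
	adj >>= PPM_FRACTION;          /* remove fractions */
	adj /= USEC_PER_SEC;

	return neg_adj ? (word - adj) : (word + adj);
}
```

A scaled_ppm of 65536 is exactly 1 ppm, so an increment word of 8 << 24 (8 ns with 24 fractional bits) moves by 134 units. The shift count depends only on the scaled ppm format, which is why it should not be tied to the sub-ns field width.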
[PATCH v9 12/12] arm64: dts: add gce node for mt8183
add gce device node for mt8183 Signed-off-by: Bibby Hsieh --- arch/arm64/boot/dts/mediatek/mt8183.dtsi | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi b/arch/arm64/boot/dts/mediatek/mt8183.dtsi index 08274bfcebd8..42b7cc9e7304 100644 --- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi @@ -8,6 +8,7 @@ #include #include #include +#include / { compatible = "mediatek,mt8183"; @@ -212,6 +213,16 @@ clock-names = "spi", "wrap"; }; + gce: gce@10238000 { + compatible = "mediatek,mt8183-gce"; + reg = <0 0x10238000 0 0x4000>; + interrupts = ; + #mbox-cells = <3>; + #subsys-cells = <3>; + clocks = <&infracfg CLK_INFRA_GCE>; + clock-names = "gce"; + }; + uart0: serial@11002000 { compatible = "mediatek,mt8183-uart", "mediatek,mt6577-uart"; -- 2.18.0
[PATCH v9 11/12] soc: mediatek: cmdq: add cmdq_dev_get_client_reg function
GCE cannot know the register base address, so this function helps the cmdq client get the cmdq_client_reg structure. Signed-off-by: Bibby Hsieh --- drivers/soc/mediatek/mtk-cmdq-helper.c | 24 include/linux/soc/mediatek/mtk-cmdq.h | 21 + 2 files changed, 45 insertions(+) diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c b/drivers/soc/mediatek/mtk-cmdq-helper.c index 70ad4d806fac..ceb1b569891f 100644 --- a/drivers/soc/mediatek/mtk-cmdq-helper.c +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c @@ -27,6 +27,30 @@ struct cmdq_instruction { u8 op; }; +int cmdq_dev_get_client_reg(struct device *dev, + struct cmdq_client_reg *client_reg, int idx) +{ + struct of_phandle_args spec; + + if (!client_reg) + return -ENOENT; + + if (of_parse_phandle_with_args(dev->of_node, "mediatek,gce-client-reg", + "#subsys-cells", idx, &spec)) { + dev_err(dev, "can't parse gce-client-reg property (%d)", idx); + + return -ENOENT; + } + + client_reg->subsys = spec.args[0]; + client_reg->offset = spec.args[1]; + client_reg->size = spec.args[2]; + of_node_put(spec.np); + + return 0; +} +EXPORT_SYMBOL(cmdq_dev_get_client_reg); + static void cmdq_client_timeout(struct timer_list *t) { struct cmdq_client *client = from_timer(client, t, timer); diff --git a/include/linux/soc/mediatek/mtk-cmdq.h b/include/linux/soc/mediatek/mtk-cmdq.h index a345870a6d10..be402c4c740e 100644 --- a/include/linux/soc/mediatek/mtk-cmdq.h +++ b/include/linux/soc/mediatek/mtk-cmdq.h @@ -15,6 +15,12 @@ struct cmdq_pkt; +struct cmdq_client_reg { + u8 subsys; + u16 offset; + u16 size; +}; + struct cmdq_client { spinlock_t lock; u32 pkt_cnt; @@ -142,4 +148,19 @@ int cmdq_pkt_flush_async(struct cmdq_pkt *pkt, cmdq_async_flush_cb cb, */ int cmdq_pkt_flush(struct cmdq_pkt *pkt); +/** + * cmdq_dev_get_client_reg() - parse cmdq client reg from the device + *node of CMDQ client + * @dev: device of CMDQ mailbox client + * @client_reg: CMDQ client reg pointer + * @idx: the index of desired reg + * + * Return: 0 for success; else the 
error code is returned + * + * Help CMDQ client parsing the cmdq client reg + * from the device node of CMDQ client. + */ +int cmdq_dev_get_client_reg(struct device *dev, + struct cmdq_client_reg *client_reg, int idx); + #endif /* __MTK_CMDQ_H__ */ -- 2.18.0
[PATCH v9 05/12] mailbox: mediatek: cmdq: support mt8183 gce function
add mt8183 compatible name for supporting gce function Signed-off-by: Bibby Hsieh Reviewed-by: CK Hu --- drivers/mailbox/mtk-cmdq-mailbox.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c b/drivers/mailbox/mtk-cmdq-mailbox.c index 8fddd26288e8..69daaadc3a5f 100644 --- a/drivers/mailbox/mtk-cmdq-mailbox.c +++ b/drivers/mailbox/mtk-cmdq-mailbox.c @@ -539,6 +539,7 @@ static const struct dev_pm_ops cmdq_pm_ops = { static const struct of_device_id cmdq_of_ids[] = { {.compatible = "mediatek,mt8173-gce", .data = (void *)16}, + {.compatible = "mediatek,mt8183-gce", .data = (void *)24}, {} }; -- 2.18.0
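The match table above stores the GCE thread count directly in the .data pointer. The following userspace sketch imitates that pattern to show how the count travels through the pointer and back. struct match_entry and gce_thread_count() are illustrative stand-ins for struct of_device_id and of_device_get_match_data(); only the two compatible strings and the values 16 and 24 come from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for struct of_device_id. */
struct match_entry {
	const char *compatible;
	const void *data;          /* small integer smuggled in a pointer */
};

static const struct match_entry gce_ids[] = {
	{ "mediatek,mt8173-gce", (void *)16 },
	{ "mediatek,mt8183-gce", (void *)24 },
	{ NULL, NULL },
};

/* Hypothetical lookup: returns the thread count, or 0 if unknown. */
static uint32_t gce_thread_count(const char *compatible)
{
	for (const struct match_entry *e = gce_ids; e->compatible; e++)
		if (strcmp(e->compatible, compatible) == 0)
			return (uint32_t)(uintptr_t)e->data;

	return 0;
}
```

The cast through uintptr_t plays the same role as the driver's (u32)(unsigned long) cast: it narrows the pointer-sized value back to the integer that was stored.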
[PATCH v9 00/12] support gce on mt8183 platform
Changes since v8:
 - change the error return code in cmdq_dev_get_client_reg()

Changes since v7:
 - remove the memory allocation out of cmdq_dev_get_client_reg()
 - rebase onto 5.2-rc1

Changes since v6:
 - remove cmdq_dev_get_event function and gce event property
 - separate some changes into independent patches
 - change the binding document related to gce-client-reg property

Changes since v5:
 - fix typo
 - remove gce-event-name from the dt-binding
 - add reasons in commit message

Changes since v4:
 - refine the architecture of the packet encoder function
 - refine the gce event property
 - change the patch's title

Changes since v3:
 - fix a typo in dt-binding and dtsi
 - cast the return value to right format

Changes since v2:
 - according to CK's review comment, change the property name and
   refine the parameter
 - change the patch's title
 - remove unused property from dt-binding and dts

Changes since v1:
 - add prefix "cmdq" in the commit subject
 - add dt-binding document for get event and subsys function
 - add fix up tag in fixup patch
 - fix up some coding style (alignment)

MTK will support gce function on mt8183 platform.
  dt-binding: gce: add gce header file for mt8183
  mailbox: mediatek: cmdq: support mt8183 gce function
  arm64: dts: add gce node for mt8183

Besides the above patches, we refine the gce driver in these patches.
  soc: mediatek: cmdq: reorder the parameter
  soc: mediatek: cmdq: change the type of input parameter
  mailbox: mediatek: cmdq: move the CMDQ_IRQ_MASK into cmdq driver data
  soc: mediatek: cmdq: clear the event in cmdq initial flow

In order to make gce more convenient to use, we add new helper
functions and refine the method of instruction combining.
  dt-binding: gce: remove thread-num property
  dt-binding: gce: add binding for gce client reg property
  soc: mediatek: cmdq: define the instruction struct
  soc: mediatek: cmdq: add polling function
  soc: mediatek: cmdq: add cmdq_dev_get_client_reg function

Bibby Hsieh (12):
  dt-binding: gce: remove thread-num property
  dt-binding: gce: add gce header file for mt8183
  dt-binding: gce: add binding for gce client reg property
  mailbox: mediatek: cmdq: move the CMDQ_IRQ_MASK into cmdq driver data
  mailbox: mediatek: cmdq: support mt8183 gce function
  soc: mediatek: cmdq: clear the event in cmdq initial flow
  soc: mediatek: cmdq: reorder the parameter
  soc: mediatek: cmdq: change the type of input parameter
  soc: mediatek: cmdq: define the instruction struct
  soc: mediatek: cmdq: add polling function
  soc: mediatek: cmdq: add cmdq_dev_get_client_reg function
  arm64: dts: add gce node for mt8183

 .../devicetree/bindings/mailbox/mtk-gce.txt | 25 ++-
 arch/arm64/boot/dts/mediatek/mt8183.dtsi    | 11 ++
 drivers/mailbox/mtk-cmdq-mailbox.c          | 18 +-
 drivers/soc/mediatek/mtk-cmdq-helper.c      | 165
 include/dt-bindings/gce/mt8183-gce.h        | 177 ++
 include/linux/mailbox/mtk-cmdq-mailbox.h    | 5 +
 include/linux/soc/mediatek/mtk-cmdq.h       | 53 +-
 7 files changed, 393 insertions(+), 61 deletions(-)
 create mode 100644 include/dt-bindings/gce/mt8183-gce.h

-- 
2.18.0
[PATCH 0/2] Sub ns increment fixes in Macb PTP
The subns increment register fields are not captured correctly in the
driver. Fix the same and also increase the subns incr resolution.

Sub ns resolution was increased to 24 bits in r1p06f2 version. To my
knowledge, this PTP driver, with its current BD time stamp implementation,
is only useful to that version or above. So, I have increased the
resolution unconditionally. Please let me know if there are any IP
versions incompatible with this - there is no register to obtain this
information from.

Changes from RFC: None

Harini Katakam (2):
  net: macb: Add separate definition for PPM fraction
  net: macb: Fix SUBNS increment and increase resolution

 drivers/net/ethernet/cadence/macb.h     | 9 -
 drivers/net/ethernet/cadence/macb_ptp.c | 7 +--
 2 files changed, 13 insertions(+), 3 deletions(-)

-- 
2.7.4
[PATCH v5 6/7] perf diff: Print the basic block cycles diff
$ perf record -b ./div $ perf record -b ./div Following is the default perf diff output $ perf diff # Event 'cycles' # # Baseline Delta Abs Shared Object Symbol # . .. # 48.75% +0.33% div [.] main 8.21% -0.20% div [.] compute_flag 19.02% -0.12% libc-2.23.so [.] __random_r 16.17% -0.09% libc-2.23.so [.] __random 2.27% -0.03% div [.] rand@plt +0.02% [i915][k] gen8_irq_handler 5.52% +0.02% libc-2.23.so [.] rand This patch creates a new computation selection 'cycles'. $ perf diff -c cycles # Event 'cycles' # # Baseline [Program Block Range] Cycles Diff Shared Object Symbol # .. .. # 48.75% [div.c:42 -> div.c:45] 147 div [.] main 48.75% [div.c:31 -> div.c:40] 4 div [.] main 48.75% [div.c:40 -> div.c:40] 0 div [.] main 48.75% [div.c:42 -> div.c:42] 0 div [.] main 48.75% [div.c:42 -> div.c:44] 0 div [.] main 19.02% [random_r.c:357 -> random_r.c:360] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:373] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:376] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:380] 0 libc-2.23.so [.] __random_r 19.02% [random_r.c:357 -> random_r.c:392] 0 libc-2.23.so [.] __random_r 16.17% [random.c:288 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:295] 0 libc-2.23.so [.] __random 16.17% [random.c:288 -> random.c:297] 0 libc-2.23.so [.] __random 16.17% [random.c:291 -> random.c:291] 0 libc-2.23.so [.] __random 16.17% [random.c:293 -> random.c:293] 0 libc-2.23.so [.] __random 8.21% [div.c:22 -> div.c:22] 148 div [.] compute_flag 8.21% [div.c:22 -> div.c:25] 0 div [.] compute_flag 8.21% [div.c:27 -> div.c:28] 0 div [.] compute_flag 5.52% [rand.c:26 -> rand.c:27] 0 libc-2.23.so [.] rand 5.52% [rand.c:26 -> rand.c:28] 0 libc-2.23.so [.] rand 2.27% [rand@plt+0 -> rand@plt+0] 0 div [.] 
rand@plt 0.01% [entry_64.S:694 -> entry_64.S:694] 16 [kernel.vmlinux] [k] native_irq_return_iret 0.00% [fair.c:7676 -> fair.c:7665] 162 [kernel.vmlinux] [k] update_blocked_averages "[Program Block Range]" indicates the range of program basic block (start -> end). If we can find the source line it prints the source line otherwise it prints the symbol+offset instead. v4: --- Use source lines or symbol+offset to indicate the basic block. It should be easier to understand. v3: --- Cast 'struct hist_entry' to 'struct block_hist' in hist_entry__block_fprintf. Use symbol_conf.report_block to check if executing hist_entry__block_fprintf. v2: --- Keep standard perf diff format and display the 'Baseline' and 'Shared Object'. The output is sorted by "Baseline" and the basic blocks in the same function are sorted by cycles diff. Signed-off-by: Jin Yao --- tools/perf/builtin-diff.c | 80 --- tools/perf/ui/stdio/hist.c| 27 +++ tools/perf/util/hist.c| 18 ++ tools/perf/util/hist.h| 3 ++ tools/perf/util/srcline.c | 4 ++- tools/perf/util/symbol_conf.h | 4 ++- 6 files changed, 130 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index 864423a..c888927 100644 --- a/tools/perf/buil
[PATCH v9 08/12] soc: mediatek: cmdq: change the type of input parameter
According to the cmdq hardware design, the subsys is u8, the offset is u16 and the event id is u16. This patch changes the type of subsys, offset and event id to the correct type. Signed-off-by: Bibby Hsieh Reviewed-by: CK Hu --- drivers/soc/mediatek/mtk-cmdq-helper.c | 10 +- include/linux/soc/mediatek/mtk-cmdq.h | 10 +- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c b/drivers/soc/mediatek/mtk-cmdq-helper.c index 082b8978651e..7aa0517ff2f3 100644 --- a/drivers/soc/mediatek/mtk-cmdq-helper.c +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c @@ -136,7 +136,7 @@ static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, enum cmdq_code code, return 0; } -int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 subsys, u32 offset, u32 value) +int cmdq_pkt_write(struct cmdq_pkt *pkt, u8 subsys, u16 offset, u32 value) { u32 arg_a = (offset & CMDQ_ARG_A_WRITE_MASK) | (subsys << CMDQ_SUBSYS_SHIFT); @@ -145,8 +145,8 @@ int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 subsys, u32 offset, u32 value) } EXPORT_SYMBOL(cmdq_pkt_write); -int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 subsys, - u32 offset, u32 value, u32 mask) +int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u8 subsys, + u16 offset, u32 value, u32 mask) { u32 offset_mask = offset; int err = 0; @@ -161,7 +161,7 @@ int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 subsys, } EXPORT_SYMBOL(cmdq_pkt_write_mask); -int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u32 event) +int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u16 event) { u32 arg_b; @@ -181,7 +181,7 @@ int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u32 event) } EXPORT_SYMBOL(cmdq_pkt_wfe); -int cmdq_pkt_clear_event(struct cmdq_pkt *pkt, u32 event) +int cmdq_pkt_clear_event(struct cmdq_pkt *pkt, u16 event) { if (event >= CMDQ_MAX_EVENT) return -EINVAL; diff --git a/include/linux/soc/mediatek/mtk-cmdq.h b/include/linux/soc/mediatek/mtk-cmdq.h index 39d813dde4b4..9618debb9ceb 100644 --- a/include/linux/soc/mediatek/mtk-cmdq.h +++ 
b/include/linux/soc/mediatek/mtk-cmdq.h @@ -66,7 +66,7 @@ void cmdq_pkt_destroy(struct cmdq_pkt *pkt); * * Return: 0 for success; else the error code is returned */ -int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 subsys, u32 offset, u32 value); +int cmdq_pkt_write(struct cmdq_pkt *pkt, u8 subsys, u16 offset, u32 value); /** * cmdq_pkt_write_mask() - append write command with mask to the CMDQ packet @@ -78,8 +78,8 @@ int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 subsys, u32 offset, u32 value); * * Return: 0 for success; else the error code is returned */ -int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 subsys, - u32 offset, u32 value, u32 mask); +int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u8 subsys, + u16 offset, u32 value, u32 mask); /** * cmdq_pkt_wfe() - append wait for event command to the CMDQ packet @@ -88,7 +88,7 @@ int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 subsys, * * Return: 0 for success; else the error code is returned */ -int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u32 event); +int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u16 event); /** * cmdq_pkt_clear_event() - append clear event command to the CMDQ packet @@ -97,7 +97,7 @@ int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u32 event); * * Return: 0 for success; else the error code is returned */ -int cmdq_pkt_clear_event(struct cmdq_pkt *pkt, u32 event); +int cmdq_pkt_clear_event(struct cmdq_pkt *pkt, u16 event); /** * cmdq_pkt_flush_async() - trigger CMDQ to asynchronously execute the CMDQ -- 2.18.0
[PATCH v9 07/12] soc: mediatek: cmdq: reorder the parameter
The order of gce instructions is [subsys offset value] so reorder the parameter of cmdq_pkt_write_mask and cmdq_pkt_write function. Signed-off-by: Bibby Hsieh Reviewed-by: CK Hu --- drivers/soc/mediatek/mtk-cmdq-helper.c | 6 +++--- include/linux/soc/mediatek/mtk-cmdq.h | 10 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c b/drivers/soc/mediatek/mtk-cmdq-helper.c index ff9fef5a032b..082b8978651e 100644 --- a/drivers/soc/mediatek/mtk-cmdq-helper.c +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c @@ -136,7 +136,7 @@ static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, enum cmdq_code code, return 0; } -int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 value, u32 subsys, u32 offset) +int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 subsys, u32 offset, u32 value) { u32 arg_a = (offset & CMDQ_ARG_A_WRITE_MASK) | (subsys << CMDQ_SUBSYS_SHIFT); @@ -145,8 +145,8 @@ int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 value, u32 subsys, u32 offset) } EXPORT_SYMBOL(cmdq_pkt_write); -int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 value, - u32 subsys, u32 offset, u32 mask) +int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 subsys, + u32 offset, u32 value, u32 mask) { u32 offset_mask = offset; int err = 0; diff --git a/include/linux/soc/mediatek/mtk-cmdq.h b/include/linux/soc/mediatek/mtk-cmdq.h index 4e8899972db4..39d813dde4b4 100644 --- a/include/linux/soc/mediatek/mtk-cmdq.h +++ b/include/linux/soc/mediatek/mtk-cmdq.h @@ -60,26 +60,26 @@ void cmdq_pkt_destroy(struct cmdq_pkt *pkt); /** * cmdq_pkt_write() - append write command to the CMDQ packet * @pkt: the CMDQ packet - * @value: the specified target register value * @subsys:the CMDQ sub system code * @offset:register offset from CMDQ sub system + * @value: the specified target register value * * Return: 0 for success; else the error code is returned */ -int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 value, u32 subsys, u32 offset); +int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 subsys, 
u32 offset, u32 value); /** * cmdq_pkt_write_mask() - append write command with mask to the CMDQ packet * @pkt: the CMDQ packet - * @value: the specified target register value * @subsys:the CMDQ sub system code * @offset:register offset from CMDQ sub system + * @value: the specified target register value * @mask: the specified target register mask * * Return: 0 for success; else the error code is returned */ -int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 value, - u32 subsys, u32 offset, u32 mask); +int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 subsys, + u32 offset, u32 value, u32 mask); /** * cmdq_pkt_wfe() - append wait for event command to the CMDQ packet -- 2.18.0
[PATCH v5 5/7] perf diff: Link same basic blocks among different data
The target is to compare the performance difference (cycles diff) for the same basic blocks in different data files. The same basic block means same function, same start address and same end address. This patch finds the same basic blocks from different data files and link them together and resort by the cycles diff. v3: --- The block stuffs are maintained by new structure 'block_hist', so this patch is update accordingly. v2: --- Since now the basic block hists is changed to per symbol, the patch only links the basic block hists for the same symbol in different data files. Signed-off-by: Jin Yao --- tools/perf/builtin-diff.c | 90 +++ 1 file changed, 90 insertions(+) diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index ff2c076..864423a 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -641,6 +641,85 @@ static int process_block_per_sym(struct hist_entry *he) return 0; } +static int block_pair_cmp(struct hist_entry *a, struct hist_entry *b) +{ + struct block_info *bi_a = a->block_info; + struct block_info *bi_b = b->block_info; + int cmp; + + if (!bi_a->sym || !bi_b->sym) + return -1; + + if (bi_a->sym->name && bi_b->sym->name) { + cmp = strcmp(bi_a->sym->name, bi_b->sym->name); + if ((!cmp) && (bi_a->start == bi_b->start) && + (bi_a->end == bi_b->end)) { + return 0; + } + } + + return -1; +} + +static struct hist_entry *get_block_pair(struct hist_entry *he, +struct hists *hists_pair) +{ + struct rb_root_cached *root = hists_pair->entries_in; + struct rb_node *next = rb_first_cached(root); + int cmp; + + while (next != NULL) { + struct hist_entry *he_pair = rb_entry(next, struct hist_entry, + rb_node_in); + + next = rb_next(&he_pair->rb_node_in); + + cmp = block_pair_cmp(he_pair, he); + if (!cmp) + return he_pair; + } + + return NULL; +} + +static void compute_cycles_diff(struct hist_entry *he, + struct hist_entry *pair) +{ + pair->diff.computed = true; + if (pair->block_info->num && he->block_info->num) { + 
pair->diff.cycles = + pair->block_info->cycles_aggr / pair->block_info->num_aggr - + he->block_info->cycles_aggr / he->block_info->num_aggr; + } +} + +static void block_hists_match(struct hists *hists_base, + struct hists *hists_pair) +{ + struct rb_root_cached *root = hists_base->entries_in; + struct rb_node *next = rb_first_cached(root); + + while (next != NULL) { + struct hist_entry *he = rb_entry(next, struct hist_entry, +rb_node_in); + struct hist_entry *pair = get_block_pair(he, hists_pair); + + next = rb_next(&he->rb_node_in); + + if (pair) { + hist_entry__add_pair(pair, he); + compute_cycles_diff(he, pair); + } + } +} + +static int filter_cb(struct hist_entry *he, void *arg __maybe_unused) +{ + /* Skip the calculation of column length in output_resort */ + he->filtered = true; + return 0; +} + static void hists__precompute(struct hists *hists) { struct rb_root_cached *root; @@ -653,6 +732,7 @@ static void hists__precompute(struct hists *hists) next = rb_first_cached(root); while (next != NULL) { + struct block_hist *bh, *pair_bh; struct hist_entry *he, *pair; struct data__file *d; int i; @@ -681,6 +761,16 @@ static void hists__precompute(struct hists *hists) break; case COMPUTE_CYCLES: process_block_per_sym(pair); + bh = container_of(he, struct block_hist, he); + pair_bh = container_of(pair, struct block_hist, + he); + + if (bh->valid && pair_bh->valid) { + block_hists_match(&bh->block_hists, + &pair_bh->block_hists); + hists__output_resort_cb(&pair_bh->block_hists, + NULL, filter_cb); + } break; default: BUG_ON(1); -- 2.7.4
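The matching rule from the commit message (same function, same start address, same end address) and the average cycles diff can be sketched outside of perf as follows. This is a simplified model, not the perf code: struct block only carries the fields the comparison needs, the names block_same() and block_cycles_diff() are invented for the example, and the guard checks num_aggr directly because that is the divisor used here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the block fields used for matching and diffing. */
struct block {
	const char *sym;       /* function name */
	uint64_t start;        /* block start address */
	uint64_t end;          /* block end address */
	uint64_t cycles_aggr;  /* total cycles over all samples */
	uint64_t num_aggr;     /* number of samples */
};

/* Same basic block: same function, same start and same end address. */
static bool block_same(const struct block *a, const struct block *b)
{
	return a->sym && b->sym && strcmp(a->sym, b->sym) == 0 &&
	       a->start == b->start && a->end == b->end;
}

/* Average cycles of pair minus average cycles of base; false if not comparable. */
static bool block_cycles_diff(const struct block *base,
			      const struct block *pair, int64_t *diff)
{
	if (!block_same(base, pair) || !base->num_aggr || !pair->num_aggr)
		return false;

	*diff = (int64_t)(pair->cycles_aggr / pair->num_aggr) -
		(int64_t)(base->cycles_aggr / base->num_aggr);
	return true;
}
```

For example, a block averaging 100 cycles in the baseline and 247 cycles in the second data file yields a diff of 147, which is the kind of per-block value the series prints in the "Cycles Diff" column.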
[PATCH v9 10/12] soc: mediatek: cmdq: add polling function
add polling function in cmdq helper functions

Signed-off-by: Bibby Hsieh
Reviewed-by: CK Hu
---
 drivers/soc/mediatek/mtk-cmdq-helper.c   | 28
 include/linux/mailbox/mtk-cmdq-mailbox.h |  1 +
 include/linux/soc/mediatek/mtk-cmdq.h    | 15 +
 3 files changed, 44 insertions(+)

diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c b/drivers/soc/mediatek/mtk-cmdq-helper.c
index 0886c4967ca4..70ad4d806fac 100644
--- a/drivers/soc/mediatek/mtk-cmdq-helper.c
+++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
@@ -220,6 +220,34 @@ int cmdq_pkt_clear_event(struct cmdq_pkt *pkt, u16 event)
 }
 EXPORT_SYMBOL(cmdq_pkt_clear_event);
 
+int cmdq_pkt_poll(struct cmdq_pkt *pkt, u8 subsys,
+		  u16 offset, u32 value, u32 mask)
+{
+	struct cmdq_instruction *inst;
+
+	if (mask != 0x) {
+		inst = cmdq_pkt_append_command(pkt);
+		if (!inst)
+			return -ENOMEM;
+
+		inst->op = CMDQ_CODE_MASK;
+		inst->value = ~mask;
+		offset = offset | 0x1;
+	}
+
+	inst = cmdq_pkt_append_command(pkt);
+	if (!inst)
+		return -ENOMEM;
+
+	inst->op = CMDQ_CODE_POLL;
+	inst->value = value;
+	inst->offset = offset;
+	inst->subsys = subsys;
+
+	return 0;
+}
+EXPORT_SYMBOL(cmdq_pkt_poll);
+
 static int cmdq_pkt_finalize(struct cmdq_pkt *pkt)
 {
 	struct cmdq_instruction *inst;
diff --git a/include/linux/mailbox/mtk-cmdq-mailbox.h b/include/linux/mailbox/mtk-cmdq-mailbox.h
index c8adedefaf42..9e3502945bc1 100644
--- a/include/linux/mailbox/mtk-cmdq-mailbox.h
+++ b/include/linux/mailbox/mtk-cmdq-mailbox.h
@@ -46,6 +46,7 @@
 enum cmdq_code {
 	CMDQ_CODE_MASK = 0x02,
 	CMDQ_CODE_WRITE = 0x04,
+	CMDQ_CODE_POLL = 0x08,
 	CMDQ_CODE_JUMP = 0x10,
 	CMDQ_CODE_WFE = 0x20,
 	CMDQ_CODE_EOC = 0x40,
diff --git a/include/linux/soc/mediatek/mtk-cmdq.h b/include/linux/soc/mediatek/mtk-cmdq.h
index 9618debb9ceb..a345870a6d10 100644
--- a/include/linux/soc/mediatek/mtk-cmdq.h
+++ b/include/linux/soc/mediatek/mtk-cmdq.h
@@ -99,6 +99,21 @@ int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u16 event);
  */
 int cmdq_pkt_clear_event(struct cmdq_pkt *pkt, u16 event);
 
+/**
+ * cmdq_pkt_poll() - Append polling command to the CMDQ packet, ask GCE to
+ *		     execute an instruction that wait for a specified hardware
+ *		     register to check for the value. All GCE hardware
+ *		     threads will be blocked by this instruction.
+ * @pkt:	the CMDQ packet
+ * @subsys:	the CMDQ sub system code
+ * @offset:	register offset from CMDQ sub system
+ * @value:	the specified target register value
+ * @mask:	the specified target register mask
+ *
+ * Return: 0 for success; else the error code is returned
+ */
+int cmdq_pkt_poll(struct cmdq_pkt *pkt, u8 subsys,
+		  u16 offset, u32 value, u32 mask);
 /**
  * cmdq_pkt_flush_async() - trigger CMDQ to asynchronously execute the CMDQ
  *                          packet and call back at the end of done packet
--
2.18.0
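For readers who want to see what cmdq_pkt_poll() puts into a packet, here is a rough userspace sketch of the same two-step encoding: an optional MASK instruction followed by the POLL instruction. Everything prefixed sketch_ is hypothetical and is not the kernel's data layout. The sketch also assumes that "no mask" means all 32 bits set, which the archived patch text does not show in full.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values as listed in the patch's enum cmdq_code. */
enum { CMDQ_CODE_MASK = 0x02, CMDQ_CODE_POLL = 0x08 };

/* Hypothetical, simplified instruction record (not the kernel layout). */
struct sketch_inst {
	uint8_t  op;
	uint8_t  subsys;
	uint16_t offset;
	uint32_t value;
};

/* Hypothetical fixed-size packet standing in for struct cmdq_pkt. */
struct sketch_pkt {
	struct sketch_inst inst[8];
	size_t count;
};

/* Hand out the next free slot, or NULL when the packet is full. */
static struct sketch_inst *sketch_append(struct sketch_pkt *pkt)
{
	if (pkt->count >= sizeof(pkt->inst) / sizeof(pkt->inst[0]))
		return NULL;
	return &pkt->inst[pkt->count++];
}

/* Returns 0 on success, -1 if the packet has no room left. */
static int sketch_poll(struct sketch_pkt *pkt, uint8_t subsys,
		       uint16_t offset, uint32_t value, uint32_t mask)
{
	struct sketch_inst *inst;

	/* Assumption: a full 32-bit mask means "compare every bit". */
	if (mask != UINT32_MAX) {
		inst = sketch_append(pkt);
		if (!inst)
			return -1;
		inst->op = CMDQ_CODE_MASK;
		inst->subsys = 0;
		inst->offset = 0;
		inst->value = ~mask;
		/* As in the patch: set bit 0 of the offset when a mask precedes the poll. */
		offset |= 0x1;
	}

	inst = sketch_append(pkt);
	if (!inst)
		return -1;
	inst->op = CMDQ_CODE_POLL;
	inst->value = value;
	inst->offset = offset;
	inst->subsys = subsys;
	return 0;
}
```

A partial mask therefore costs two instruction slots, while a full mask costs one.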
[PATCH v5 0/7] perf diff: diff cycles at basic block level
In some cases small changes in hot loops can show big differences.
But it's difficult to identify these differences.

perf diff currently can only diff symbols (functions). We can also expand
it to diff cycles of individual program blocks as reported by timed LBR.
This would allow identifying changes in specific code accurately.

With this patch set, for example,

 $ perf record -b ./div
 $ perf record -b ./div
 $ perf diff -c cycles

 # Event 'cycles'
 #
 # Baseline  [Program Block Range]                Cycles Diff  Shared Object     Symbol
 # ........  ...................................  ...........  ................  ...........................
 #
     48.75%  [div.c:42 -> div.c:45]                       147  div               [.] main
     48.75%  [div.c:31 -> div.c:40]                         4  div               [.] main
     48.75%  [div.c:40 -> div.c:40]                         0  div               [.] main
     48.75%  [div.c:42 -> div.c:42]                         0  div               [.] main
     48.75%  [div.c:42 -> div.c:44]                         0  div               [.] main
     19.02%  [random_r.c:357 -> random_r.c:360]             0  libc-2.23.so      [.] __random_r
     19.02%  [random_r.c:357 -> random_r.c:373]             0  libc-2.23.so      [.] __random_r
     19.02%  [random_r.c:357 -> random_r.c:376]             0  libc-2.23.so      [.] __random_r
     19.02%  [random_r.c:357 -> random_r.c:380]             0  libc-2.23.so      [.] __random_r
     19.02%  [random_r.c:357 -> random_r.c:392]             0  libc-2.23.so      [.] __random_r
     16.17%  [random.c:288 -> random.c:291]                 0  libc-2.23.so      [.] __random
     16.17%  [random.c:288 -> random.c:291]                 0  libc-2.23.so      [.] __random
     16.17%  [random.c:288 -> random.c:295]                 0  libc-2.23.so      [.] __random
     16.17%  [random.c:288 -> random.c:297]                 0  libc-2.23.so      [.] __random
     16.17%  [random.c:291 -> random.c:291]                 0  libc-2.23.so      [.] __random
     16.17%  [random.c:293 -> random.c:293]                 0  libc-2.23.so      [.] __random
      8.21%  [div.c:22 -> div.c:22]                       148  div               [.] compute_flag
      8.21%  [div.c:22 -> div.c:25]                         0  div               [.] compute_flag
      8.21%  [div.c:27 -> div.c:28]                         0  div               [.] compute_flag
      5.52%  [rand.c:26 -> rand.c:27]                       0  libc-2.23.so      [.] rand
      5.52%  [rand.c:26 -> rand.c:28]                       0  libc-2.23.so      [.] rand
      2.27%  [rand@plt+0 -> rand@plt+0]                     0  div               [.] rand@plt
      0.01%  [entry_64.S:694 -> entry_64.S:694]            16  [kernel.vmlinux]  [k] native_irq_return_iret
      0.00%  [fair.c:7676 -> fair.c:7665]                 162  [kernel.vmlinux]  [k] update_blocked_averages

'[Program Block Range]' indicates the range of a program basic block
(start -> end). If the source line can be found, it is printed; otherwise
symbol+offset is printed instead.

 v5:
 ---
 Only the patch 'perf diff: Use hists to manage basic blocks per symbol'
 is changed in v5. Since we still carry block_info in 'struct hist_entry',
 we don't need our own new/free ops for hist entries. The block_info
 is released in hist_entry__delete.

 v4:
 ---
 Use source lines or symbol+offset to indicate the basic block.

 Changed patches:
  perf diff: Print the basic block cycles diff
  perf diff: Documentation -c cycles option

 v3:
 ---
 In v3, the major change is to move most of the block data from
 'struct hist_entry' to a new structure 'struct block_hist' and update
 the code accordingly. But we still have to keep the block_info in
 'struct hist_entry' since we need to compare by block info when
 inserting a new entry into hists.

 Others are minor changes, such as abs() -> labs(), removing
 duplicated ops and so on.

 Changed patches:
  perf diff: Use hists to manage basic blocks per symbol
  perf diff: Link same basic blocks among different data
  perf diff: Print the basic block cycles diff

 v2:
 ---
 Keep standard perf diff format.

 Following is the v1 output.

 # perf diff --basic-block

 # Cycles diff  Basic block (start:end)
 # ...........  .......................
 #
          -208  hrtimer_interrupt (30b9e0:30ba42
[PATCH v9 06/12] soc: mediatek: cmdq: clear the event in cmdq initial flow
GCE hardware stores event information in its own internal sysram. If the
initial value in that sysram is not zero, GCE can see an event as already
set immediately after a client asks it to wait for that event, even though
the corresponding hardware never triggered it.

To make sure that waiting for an event behaves correctly, we need to clear
the sysram values in the cmdq initialization flow.

Fixes: 623a6143a845 ("mailbox: mediatek: Add Mediatek CMDQ driver")

Signed-off-by: Bibby Hsieh
Reviewed-by: CK Hu
---
 drivers/mailbox/mtk-cmdq-mailbox.c       | 5 +
 include/linux/mailbox/mtk-cmdq-mailbox.h | 2 ++
 include/linux/soc/mediatek/mtk-cmdq.h    | 3 ---
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c b/drivers/mailbox/mtk-cmdq-mailbox.c
index 69daaadc3a5f..9a6ce9f5a7db 100644
--- a/drivers/mailbox/mtk-cmdq-mailbox.c
+++ b/drivers/mailbox/mtk-cmdq-mailbox.c
@@ -21,6 +21,7 @@
 #define CMDQ_NUM_CMD(t)			(t->cmd_buf_size / CMDQ_INST_SIZE)
 
 #define CMDQ_CURR_IRQ_STATUS		0x10
+#define CMDQ_SYNC_TOKEN_UPDATE		0x68
 #define CMDQ_THR_SLOT_CYCLES		0x30
 #define CMDQ_THR_BASE			0x100
 #define CMDQ_THR_SIZE			0x80
@@ -104,8 +105,12 @@ static void cmdq_thread_resume(struct cmdq_thread *thread)
 
 static void cmdq_init(struct cmdq *cmdq)
 {
+	int i;
+
 	WARN_ON(clk_enable(cmdq->clock) < 0);
 	writel(CMDQ_THR_ACTIVE_SLOT_CYCLES, cmdq->base + CMDQ_THR_SLOT_CYCLES);
+	for (i = 0; i <= CMDQ_MAX_EVENT; i++)
+		writel(i, cmdq->base + CMDQ_SYNC_TOKEN_UPDATE);
 	clk_disable(cmdq->clock);
 }
 
diff --git a/include/linux/mailbox/mtk-cmdq-mailbox.h b/include/linux/mailbox/mtk-cmdq-mailbox.h
index ccb73422c2fa..911475da7a53 100644
--- a/include/linux/mailbox/mtk-cmdq-mailbox.h
+++ b/include/linux/mailbox/mtk-cmdq-mailbox.h
@@ -19,6 +19,8 @@
 #define CMDQ_WFE_UPDATE			BIT(31)
 #define CMDQ_WFE_WAIT			BIT(15)
 #define CMDQ_WFE_WAIT_VALUE		0x1
+/** cmdq event maximum */
+#define CMDQ_MAX_EVENT			0x3ff
 
 /*
  * CMDQ_CODE_MASK:
diff --git a/include/linux/soc/mediatek/mtk-cmdq.h b/include/linux/soc/mediatek/mtk-cmdq.h
index 54ade13a9b15..4e8899972db4 100644
--- a/include/linux/soc/mediatek/mtk-cmdq.h
+++ b/include/linux/soc/mediatek/mtk-cmdq.h
@@ -13,9 +13,6 @@
 
 #define CMDQ_NO_TIMEOUT		0xu
 
-/** cmdq event maximum */
-#define CMDQ_MAX_EVENT			0x3ff
-
 struct cmdq_pkt;
 
 struct cmdq_client {
--
2.18.0
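The problem this patch describes, a stale token making a wait return at once, can be modelled in a few lines of userspace C. This is only a sketch: the sketch_ names are hypothetical, and it assumes that writing a bare event id to the token-update register clears that event, which is how the loop in cmdq_init() appears to be used.

```c
#include <stdint.h>

#define CMDQ_MAX_EVENT 0x3ff	/* value taken from the patch */

/* Hypothetical stand-in for the per-event token storage inside GCE. */
struct sketch_gce {
	uint8_t token[CMDQ_MAX_EVENT + 1];
};

/*
 * Assumption for this sketch: writing an event id with no other bits set
 * clears the token for that event.
 */
static void sketch_sync_token_update(struct sketch_gce *gce, uint32_t val)
{
	gce->token[val & CMDQ_MAX_EVENT] = 0;
}

/* Mirrors the loop added to cmdq_init(): clear every event id once. */
static void sketch_init(struct sketch_gce *gce)
{
	uint32_t i;

	for (i = 0; i <= CMDQ_MAX_EVENT; i++)
		sketch_sync_token_update(gce, i);
}

/* A wait-for-event passes immediately when the token is already set. */
static int sketch_wait_would_pass(const struct sketch_gce *gce, uint32_t event)
{
	return gce->token[event & CMDQ_MAX_EVENT] != 0;
}
```

The loop bound is inclusive (<= CMDQ_MAX_EVENT), so the last event id, 0x3ff, is cleared as well.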
[PATCH v9 02/12] dt-binding: gce: add gce header file for mt8183
Add documentation for the mt8183 gce. Add gce header file defined the gce hardware event, subsys number and constant for mt8183. Signed-off-by: Bibby Hsieh Reviewed-by: Rob Herring --- .../devicetree/bindings/mailbox/mtk-gce.txt | 6 +- include/dt-bindings/gce/mt8183-gce.h | 177 ++ 2 files changed, 180 insertions(+), 3 deletions(-) create mode 100644 include/dt-bindings/gce/mt8183-gce.h diff --git a/Documentation/devicetree/bindings/mailbox/mtk-gce.txt b/Documentation/devicetree/bindings/mailbox/mtk-gce.txt index cfe40b01d164..1f7f8f2a3f49 100644 --- a/Documentation/devicetree/bindings/mailbox/mtk-gce.txt +++ b/Documentation/devicetree/bindings/mailbox/mtk-gce.txt @@ -9,7 +9,7 @@ CMDQ driver uses mailbox framework for communication. Please refer to mailbox.txt for generic information about mailbox device-tree bindings. Required properties: -- compatible: Must be "mediatek,mt8173-gce" +- compatible: can be "mediatek,mt8173-gce" or "mediatek,mt8183-gce" - reg: Address range of the GCE unit - interrupts: The interrupt signal from the GCE block - clock: Clocks according to the common clock binding @@ -28,8 +28,8 @@ Required properties for a client device: - mediatek,gce-subsys: u32, specify the sub-system id which is corresponding to the register address. -Some vaules of properties are defined in 'dt-bindings/gce/mt8173-gce.h'. Such as -sub-system ids, thread priority, event ids. +Some vaules of properties are defined in 'dt-bindings/gce/mt8173-gce.h' +or 'dt-binding/gce/mt8183-gce.h'. Such as sub-system ids, thread priority, event ids. Example: diff --git a/include/dt-bindings/gce/mt8183-gce.h b/include/dt-bindings/gce/mt8183-gce.h new file mode 100644 index ..aeb95154fac2 --- /dev/null +++ b/include/dt-bindings/gce/mt8183-gce.h @@ -0,0 +1,177 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2019 MediaTek Inc. 
+ * Author: Bibby Hsieh + * + */ + +#ifndef _DT_BINDINGS_GCE_MT8183_H +#define _DT_BINDINGS_GCE_MT8183_H + +#define CMDQ_NO_TIMEOUT0x + +#define CMDQ_THR_MAX_COUNT 24 + +/* GCE HW thread priority */ +#define CMDQ_THR_PRIO_LOWEST 0 +#define CMDQ_THR_PRIO_HIGHEST 1 + +/* GCE SUBSYS */ +#define SUBSYS_13000 +#define SUBSYS_14001 +#define SUBSYS_14012 +#define SUBSYS_14023 +#define SUBSYS_15024 +#define SUBSYS_18805 +#define SUBSYS_18816 +#define SUBSYS_18827 +#define SUBSYS_18838 +#define SUBSYS_18849 +#define SUBSYS_100010 +#define SUBSYS_100111 +#define SUBSYS_100212 +#define SUBSYS_100313 +#define SUBSYS_100414 +#define SUBSYS_100515 +#define SUBSYS_102016 +#define SUBSYS_102817 +#define SUBSYS_170018 +#define SUBSYS_170119 +#define SUBSYS_170220 +#define SUBSYS_170321 +#define SUBSYS_180022 +#define SUBSYS_180123 +#define SUBSYS_180224 +#define SUBSYS_180425 +#define SUBSYS_180526 +#define SUBSYS_180827 +#define SUBSYS_180a28 +#define SUBSYS_180b29 + +#define CMDQ_EVENT_DISP_RDMA0_SOF 0 +#define CMDQ_EVENT_DISP_RDMA1_SOF 1 +#define CMDQ_EVENT_MDP_RDMA0_SOF 2 +#define CMDQ_EVENT_MDP_RSZ0_SOF 4 +#define CMDQ_EVENT_MDP_RSZ1_SOF 5 +#define CMDQ_EVENT_MDP_TDSHP_SOF 6 +#define CMDQ_EVENT_MDP_WROT0_SOF 7 +#define CMDQ_EVENT_MDP_WDMA0_SOF 8 +#define CMDQ_EVENT_DISP_OVL0_SOF 9 +#define CMDQ_EVENT_DISP_OVL0_2L_SOF10 +#define CMDQ_EVENT_DISP_OVL1_2L_SOF11 +#define CMDQ_EVENT_DISP_WDMA0_SOF 12 +#define CMDQ_EVENT_DISP_COLOR0_SOF 13 +#define CMDQ_EVENT_DISP_CCORR0_SOF 14 +#define CMDQ_EVENT_DISP_AAL0_SOF 15 +#define CMDQ_EVENT_DISP_GAMMA0_SOF 16 +#define CMDQ_EVENT_DISP_DITHER0_SOF17 +#define CMDQ_EVENT_DISP_PWM0_SOF 18 +#define CMDQ_EVENT_DISP_DSI0_SOF 19 +#define CMDQ_EVENT_DISP_DPI0_SOF
[PATCH v5 1/7] perf util: Create block_info structure
perf diff currently can only diff symbols (functions). We should expand it
to diff cycles of individual program blocks as reported by timed LBR.
This would allow identifying changes in specific code accurately.

We need a new structure to maintain the basic block information, such as
the symbol (function), the start/end address of the block, and the cycles.
This patch creates this structure along with some ops.

Signed-off-by: Jin Yao
---
 tools/perf/util/symbol.c | 22 ++
 tools/perf/util/symbol.h | 23 +++
 2 files changed, 45 insertions(+)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index f4540f8..4e0a7b3 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2351,3 +2351,25 @@ struct mem_info *mem_info__new(void)
 		refcount_set(&mi->refcnt, 1);
 	return mi;
 }
+
+struct block_info *block_info__get(struct block_info *bi)
+{
+	if (bi)
+		refcount_inc(&bi->refcnt);
+	return bi;
+}
+
+void block_info__put(struct block_info *bi)
+{
+	if (bi && refcount_dec_and_test(&bi->refcnt))
+		free(bi);
+}
+
+struct block_info *block_info__new(void)
+{
+	struct block_info *bi = zalloc(sizeof(*bi));
+
+	if (bi)
+		refcount_set(&bi->refcnt, 1);
+	return bi;
+}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 9a8fe01..12755b4 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -131,6 +131,17 @@ struct mem_info {
 	refcount_t		refcnt;
 };
 
+struct block_info {
+	struct symbol		*sym;
+	u64			start;
+	u64			end;
+	u64			cycles;
+	u64			cycles_aggr;
+	int			num;
+	int			num_aggr;
+	refcount_t		refcnt;
+};
+
 struct addr_location {
 	struct machine *machine;
 	struct thread *thread;
@@ -332,4 +343,16 @@ static inline void __mem_info__zput(struct mem_info **mi)
 
 #define mem_info__zput(mi) __mem_info__zput(&mi)
 
+struct block_info *block_info__new(void);
+struct block_info *block_info__get(struct block_info *bi);
+void   block_info__put(struct block_info *bi);
+
+static inline void __block_info__zput(struct block_info **bi)
+{
+	block_info__put(*bi);
+	*bi = NULL;
+}
+
+#define block_info__zput(bi) __block_info__zput(&bi)
+
 #endif /* __PERF_SYMBOL */
--
2.7.4
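The new/get/put/zput quartet above follows the usual perf reference-counting pattern. A minimal userspace sketch of how the four calls interact is given below. It uses a plain int counter instead of refcount_t, so it is not thread safe, and all sketch_ names (including the sketch_live counter used only for observation) are hypothetical.

```c
#include <stdlib.h>

/* Simplified stand-in for struct block_info; plain int replaces refcount_t. */
struct sketch_block_info {
	unsigned long long start;
	unsigned long long end;
	unsigned long long cycles;
	int refcnt;
};

/* Number of objects currently allocated, kept only to observe frees. */
static int sketch_live;

/* Allocate a zeroed object holding one reference. */
static struct sketch_block_info *sketch_block_info__new(void)
{
	struct sketch_block_info *bi = calloc(1, sizeof(*bi));

	if (bi) {
		bi->refcnt = 1;
		sketch_live++;
	}
	return bi;
}

/* Take an extra reference; NULL is passed through unchanged. */
static struct sketch_block_info *
sketch_block_info__get(struct sketch_block_info *bi)
{
	if (bi)
		bi->refcnt++;
	return bi;
}

/* Drop one reference and free the object when the last one goes away. */
static void sketch_block_info__put(struct sketch_block_info *bi)
{
	if (bi && --bi->refcnt == 0) {
		free(bi);
		sketch_live--;
	}
}

/* "zput": drop the reference and clear the caller's pointer in one step. */
static void sketch_block_info__zput(struct sketch_block_info **bi)
{
	sketch_block_info__put(*bi);
	*bi = NULL;
}
```

Clearing the pointer in zput is what makes a later accidental use fail loudly on NULL rather than touch freed memory.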
[PATCH v5 7/7] perf diff: Documentation -c cycles option
Document the new computation selection 'cycles'.

 v4:
 ---
 Change the column 'Block cycles diff [start:end]' to
 '[Program Block Range] Cycles Diff'

Signed-off-by: Jin Yao
---
 tools/perf/Documentation/perf-diff.txt | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
index facd91e..d5cc15e 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -90,9 +90,10 @@ OPTIONS
 
 -c::
 --compute::
-        Differential computation selection - delta, ratio, wdiff, delta-abs
-        (default is delta-abs). Default can be changed using diff.compute
-        config option. See COMPARISON METHODS section for more info.
+        Differential computation selection - delta, ratio, wdiff, cycles,
+        delta-abs (default is delta-abs). Default can be changed using
+        diff.compute config option. See COMPARISON METHODS section for
+        more info.
 
 -p::
 --period::
@@ -280,6 +281,16 @@ If specified the 'Weighted diff' column is displayed with value 'd' computed as:
   - WEIGHT-A being the weight of the data file
   - WEIGHT-B being the weight of the baseline data file
 
+cycles
+~~~~~~
+If specified the '[Program Block Range] Cycles Diff' column is displayed.
+It displays the cycles difference of same program basic block amongst
+two perf.data. The program basic block is the code between two branches.
+
+'[Program Block Range]' indicates the range of a program basic block.
+Source line is reported if it can be found otherwise uses symbol+offset
+instead.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1]
--
2.7.4
[PATCH v5 2/7] perf util: Add block_info in hist_entry
The block_info contains the program basic block information, i.e, contains the start address and the end address of this basic block and how much cycles it takes. We need to compare, sort and even print out the basic block by some orders, i.e. sort by cycles. For this purpose, we add block_info field to hist_entry. In order not to impact current interface, we creates a new function hists__add_entry_block. Signed-off-by: Jin Yao --- tools/perf/util/hist.c | 22 -- tools/perf/util/hist.h | 6 ++ tools/perf/util/sort.h | 1 + 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index fb3271f..680ad93 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -574,6 +574,8 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists, */ mem_info__zput(entry->mem_info); + block_info__zput(entry->block_info); + /* If the map of an existing hist_entry has * become out-of-date due to an exec() or * similar, update it. Otherwise we will @@ -645,6 +647,7 @@ __hists__add_entry(struct hists *hists, struct symbol *sym_parent, struct branch_info *bi, struct mem_info *mi, + struct block_info *block_info, struct perf_sample *sample, bool sample_self, struct hist_entry_ops *ops) @@ -677,6 +680,7 @@ __hists__add_entry(struct hists *hists, .hists = hists, .branch_info = bi, .mem_info = mi, + .block_info = block_info, .transaction = sample->transaction, .raw_data = sample->raw_data, .raw_size = sample->raw_size, @@ -699,7 +703,7 @@ struct hist_entry *hists__add_entry(struct hists *hists, struct perf_sample *sample, bool sample_self) { - return __hists__add_entry(hists, al, sym_parent, bi, mi, + return __hists__add_entry(hists, al, sym_parent, bi, mi, NULL, sample, sample_self, NULL); } @@ -712,10 +716,24 @@ struct hist_entry *hists__add_entry_ops(struct hists *hists, struct perf_sample *sample, bool sample_self) { - return __hists__add_entry(hists, al, sym_parent, bi, mi, + return __hists__add_entry(hists, al, 
sym_parent, bi, mi, NULL, sample, sample_self, ops); } +struct hist_entry *hists__add_entry_block(struct hists *hists, + struct hist_entry_ops *ops, + struct addr_location *al, + struct block_info *block_info) +{ + struct hist_entry entry = { + .ops = ops, + .block_info = block_info, + .hists = hists, + }, *he = hists__findnew_entry(hists, &entry, al, false); + + return he; +} + static int iter_next_nop_entry(struct hist_entry_iter *iter __maybe_unused, struct addr_location *al __maybe_unused) diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 76ff6c6..c8f7d66 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -16,6 +16,7 @@ struct addr_location; struct map_symbol; struct mem_info; struct branch_info; +struct block_info; struct symbol; enum hist_filter { @@ -149,6 +150,11 @@ struct hist_entry *hists__add_entry_ops(struct hists *hists, struct perf_sample *sample, bool sample_self); +struct hist_entry *hists__add_entry_block(struct hists *hists, + struct hist_entry_ops *ops, + struct addr_location *al, + struct block_info *bi); + int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al, int max_stack_depth, void *arg); diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index ce376a7..43623fa 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -144,6 +144,7 @@ struct hist_entry { longtime; struct hists*hists; struct mem_info *mem_info; + struct block_info *block_info; void*raw_data; u32 raw_size; int num_res; -- 2.7.4
[PATCH v9 03/12] dt-binding: gce: add binding for gce client reg property
cmdq driver provide a function that get the relationship of sub system number from device node for client. add specification for #subsys-cells, mediatek,gce-client-reg. Signed-off-by: Bibby Hsieh --- .../devicetree/bindings/mailbox/mtk-gce.txt| 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/Documentation/devicetree/bindings/mailbox/mtk-gce.txt b/Documentation/devicetree/bindings/mailbox/mtk-gce.txt index 1f7f8f2a3f49..d48282d6b02d 100644 --- a/Documentation/devicetree/bindings/mailbox/mtk-gce.txt +++ b/Documentation/devicetree/bindings/mailbox/mtk-gce.txt @@ -21,12 +21,21 @@ Required properties: priority: Priority of GCE thread. atomic_exec: GCE processing continuous packets of commands in atomic way. +- #subsys-cells: Should be 3. + <&phandle subsys_number start_offset size> + phandle: Label name of a gce node. + subsys_number: specify the sub-system id which is corresponding + to the register address. + start_offset: the start offset of register address that GCE can access. + size: the total size of register address that GCE can access. Required properties for a client device: - mboxes: Client use mailbox to communicate with GCE, it should have this property and list of phandle, mailbox specifiers. -- mediatek,gce-subsys: u32, specify the sub-system id which is corresponding - to the register address. +Optional properties for a client device: +- mediatek,gce-client-reg: Specify the sub-system id which is corresponding + to the register address, it should have this property and list of phandle, + sub-system specifiers. Some vaules of properties are defined in 'dt-bindings/gce/mt8173-gce.h' or 'dt-binding/gce/mt8183-gce.h'. Such as sub-system ids, thread priority, event ids. 
@@ -40,6 +49,7 @@ Example: clocks = <&infracfg CLK_INFRA_GCE>; clock-names = "gce"; #mbox-cells = <3>; + #subsys-cells = <3>; }; Example for a client device: @@ -48,9 +58,9 @@ Example for a client device: compatible = "mediatek,mt8173-mmsys"; mboxes = <&gce 0 CMDQ_THR_PRIO_LOWEST 1>, <&gce 1 CMDQ_THR_PRIO_LOWEST 1>; - mediatek,gce-subsys = ; mutex-event-eof = ; - + mediatek,gce-client-reg = <&gce SUBSYS_1400 0x3000 0x1000>, + <&gce SUBSYS_1401 0x2000 0x100>; ... }; -- 2.18.0
[PATCH v9 01/12] dt-binding: gce: remove thread-num property
"thread-num" is an unused property so we remove it from example.

Signed-off-by: Bibby Hsieh
Reviewed-by: Rob Herring
---
 Documentation/devicetree/bindings/mailbox/mtk-gce.txt | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/mailbox/mtk-gce.txt b/Documentation/devicetree/bindings/mailbox/mtk-gce.txt
index 7d72b21c9e94..cfe40b01d164 100644
--- a/Documentation/devicetree/bindings/mailbox/mtk-gce.txt
+++ b/Documentation/devicetree/bindings/mailbox/mtk-gce.txt
@@ -39,7 +39,6 @@ Example:
 		interrupts = ;
 		clocks = <&infracfg CLK_INFRA_GCE>;
 		clock-names = "gce";
-		thread-num = CMDQ_THR_MAX_COUNT;
 		#mbox-cells = <3>;
 	};
 
--
2.18.0
[PATCH v5 3/7] perf diff: Check if all data files with branch stacks
We will expand perf diff to support diffing cycles of individual program
blocks, which requires all data files to have branch stacks.

This patch checks HEADER_BRANCH_STACK in the header, and only sets the flag
has_br_stack when HEADER_BRANCH_STACK is set in all data files.

 v2:
 ---
 Move check_file_brstack() from __cmd_diff() to cmd_diff(),
 because a later patch will check the flag 'has_br_stack' before
 ui_init().

Signed-off-by: Jin Yao
---
 tools/perf/builtin-diff.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 6e79207..a7e0420 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -32,6 +32,7 @@ struct perf_diff {
 	struct perf_time_interval	*ptime_range;
 	int				 range_size;
 	int				 range_num;
+	bool				 has_br_stack;
 };
 
 /* Diff command specific HPP columns. */
@@ -873,6 +874,31 @@ static int parse_time_str(struct data__file *d, char *abstime_ostr,
 	return ret;
 }
 
+static int check_file_brstack(void)
+{
+	struct data__file *d;
+	bool has_br_stack;
+	int i;
+
+	data__for_each_file(i, d) {
+		d->session = perf_session__new(&d->data, false, &pdiff.tool);
+		if (!d->session) {
+			pr_err("Failed to open %s\n", d->data.path);
+			return -1;
+		}
+
+		has_br_stack = perf_header__has_feat(&d->session->header,
+						     HEADER_BRANCH_STACK);
+		perf_session__delete(d->session);
+		if (!has_br_stack)
+			return 0;
+	}
+
+	/* Set only all files having branch stacks */
+	pdiff.has_br_stack = true;
+	return 0;
+}
+
 static int __cmd_diff(void)
 {
 	struct data__file *d;
@@ -1487,6 +1513,9 @@ int cmd_diff(int argc, const char **argv)
 	if (data_init(argc, argv) < 0)
 		return -1;
 
+	if (check_file_brstack() < 0)
+		return -1;
+
 	if (ui_init() < 0)
 		return -1;
 
--
2.7.4
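The control flow of check_file_brstack() has three outcomes: a hard error when a file cannot be opened, an early "not all files qualify" return that is still a success, and the flag being set only after the loop completes. A small userspace sketch of that shape follows; the sketch_file type and its fields are hypothetical stand-ins for perf sessions and header feature bits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one data file. */
struct sketch_file {
	const char *path;
	bool opened_ok;		/* models perf_session__new() succeeding */
	bool has_branch_stack;	/* models the HEADER_BRANCH_STACK feature bit */
};

/*
 * Returns -1 if any file cannot be opened, 0 otherwise.
 * *all_have is set to true only when every file has a branch stack.
 */
static int sketch_check_brstack(const struct sketch_file *files, size_t n,
				bool *all_have)
{
	size_t i;

	*all_have = false;
	for (i = 0; i < n; i++) {
		if (!files[i].opened_ok)
			return -1;
		/* One file without the feature is enough to leave the flag clear. */
		if (!files[i].has_branch_stack)
			return 0;
	}

	*all_have = true;
	return 0;
}
```

A missing branch stack is not treated as an error here, matching the patch: the caller simply sees the flag left false.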
Re: [RFC PATCH v3 1/4] vfio: Define device specific irq type capability
  Hi,

> +struct vfio_irq_info_cap_type {
> +	struct vfio_info_cap_header header;
> +	__u32 type;     /* global per bus driver */
> +	__u32 subtype;  /* type specific */

Do we really need both type and subtype?

cheers,
  Gerd
[PATCHv1] ARM64: defconfig: Adding LEDS_TRIGGERS_TIMER
This adds CONFIG_LEDS_TRIGGER_TIMER to the arm64 defconfig so that blinking
LED control is available after a simple boot on ARM devices.

Ong, Hean Loong (1):
  ARM64: defconfig: Add LEDS_TRIGGERS_TIMER for blinking leds

 arch/arm64/configs/defconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
[PATCHv1] ARM64: defconfig: Add LEDS_TRIGGERS_TIMER for blinking leds
Adding LED Triggers Timers for LED blinking support on ARM devices

Signed-off-by: Ong, Hean Loong
---
 arch/arm64/configs/defconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 4d58351..6fbd651 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -595,6 +595,7 @@ CONFIG_LEDS_TRIGGER_HEARTBEAT=y
 CONFIG_LEDS_TRIGGER_CPU=y
 CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
 CONFIG_LEDS_TRIGGER_PANIC=y
+CONFIG_LEDS_TRIGGER_TIMER=y
 CONFIG_EDAC=y
 CONFIG_EDAC_GHES=y
 CONFIG_RTC_CLASS=y
--
1.7.1
Re: linux-next: build warning after merge of the mfd tree
On 27/06/19 10:41 AM, Stephen Rothwell wrote:
> Hi Lee,
>
> After merging the mfd tree, today's linux-next build (x86_64 allmodconfig)
> produced this warning:
>
> drivers/regulator/lp87565-regulator.c: In function 'lp87565_regulator_probe':
> drivers/regulator/lp87565-regulator.c:182:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
>    max_idx = LP87565_BUCK_3210;

Missed adding a break here. Can I send a patch on top of linux-next?

>    ^~~
> drivers/regulator/lp87565-regulator.c:183:2: note: here
>   default:
>   ^~~
>
> Introduced by commit
>
>   7ee63bd74750 ("regulator: lp87565: Add 4-phase lp87561 regulator support")
>
> I get these warnings because I am building with -Wimplicit-fallthrough
> in attempt to catch new additions early. The gcc warning can be turned
> off by adding a /* fall through */ comment at the point the fall through
> happens (assuming that the fall through is intentional).
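To make the warning concrete, here is a small standalone sketch of the pattern under discussion. The enum values and index numbers are hypothetical and are not taken from the lp87565 driver; the point is only where the break goes.

```c
/* Hypothetical device variants; not the driver's real identifiers. */
enum sketch_dev_type {
	SKETCH_DEV_TWO_PHASE,
	SKETCH_DEV_FOUR_PHASE,
	SKETCH_DEV_OTHER
};

static int sketch_max_idx(enum sketch_dev_type type)
{
	int max_idx;

	switch (type) {
	case SKETCH_DEV_TWO_PHASE:
		max_idx = 1;
		break;
	case SKETCH_DEV_FOUR_PHASE:
		max_idx = 2;
		/*
		 * Without this break, control continues into "default" and
		 * max_idx is overwritten; that is the case gcc reports with
		 * -Wimplicit-fallthrough.
		 */
		break;
	default:
		max_idx = 4;
		break;
	}

	return max_idx;
}
```

If the fall through had been intended, the break would be replaced by a /* fall through */ comment, as the quoted message says.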
Re: [PATCH v2 2/5] interconnect: Add of_icc_get_by_index() helper function
Hey Georgi, I heard there is a follow up discussion planned to finalize on the which approach to follow. If we do end up with your series, I found some fixes that you might want to use when you re-post. On 2019-05-07 17:29, Sibi Sankar wrote: Hey Georgi, On 4/23/19 6:58 PM, Georgi Djakov wrote: This is the same as the traditional of_icc_get() function, but the difference is that it takes index as an argument, instead of name. Signed-off-by: Georgi Djakov --- drivers/interconnect/core.c | 45 include/linux/interconnect.h | 6 + 2 files changed, 41 insertions(+), 10 deletions(-) diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c index 871eb4bc4efc..a7c3c262c974 100644 --- a/drivers/interconnect/core.c +++ b/drivers/interconnect/core.c @@ -295,9 +295,9 @@ static struct icc_node *of_icc_get_from_provider(struct of_phandle_args *spec) } /** - * of_icc_get() - get a path handle from a DT node based on name + * of_icc_get_by_index() - get a path handle from a DT node based on index * @dev: device pointer for the consumer device - * @name: interconnect path name + * @idx: interconnect path index * * This function will search for a path between two endpoints and return an * icc_path handle on success. Use icc_put() to release constraints when they @@ -309,13 +309,12 @@ static struct icc_node *of_icc_get_from_provider(struct of_phandle_args *spec) * Return: icc_path pointer on success or ERR_PTR() on error. NULL is returned * when the API is disabled or the "interconnects" DT property is missing. 
*/ -struct icc_path *of_icc_get(struct device *dev, const char *name) +struct icc_path *of_icc_get_by_index(struct device *dev, int idx) { struct icc_path *path = ERR_PTR(-EPROBE_DEFER); struct icc_node *src_node, *dst_node; struct device_node *np = NULL; struct of_phandle_args src_args, dst_args; - int idx = 0; int ret; if (!dev || !dev->of_node) @@ -335,12 +334,6 @@ struct icc_path *of_icc_get(struct device *dev, const char *name) * lets support only global ids and extend this in the future if needed * without breaking DT compatibility. */ - if (name) { - idx = of_property_match_string(np, "interconnect-names", name); - if (idx < 0) - return ERR_PTR(idx); - } - ret = of_parse_phandle_with_args(np, "interconnects", "#interconnect-cells", idx * 2, &src_args); @@ -383,6 +376,38 @@ struct icc_path *of_icc_get(struct device *dev, const char *name) return path; } + +/** + * of_icc_get() - get a path handle from a DT node based on name + * @dev: device pointer for the consumer device + * @name: interconnect path name + * + * This function will search for a path between two endpoints and return an + * icc_path handle on success. Use icc_put() to release constraints when they + * are not needed anymore. + * If the interconnect API is disabled, NULL is returned and the consumer + * drivers will still build. Drivers are free to handle this specifically, + * but they don't have to. + * + * Return: icc_path pointer on success or ERR_PTR() on error. NULL is returned + * when the API is disabled or the "interconnects" DT property is missing. + */ please change the description since it does not return NULL when the property is missing. 
+struct icc_path *of_icc_get(struct device *dev, const char *name) +{ + int idx = 0; + + if (!dev || !dev->of_node) + return ERR_PTR(-ENODEV); + + if (name) { + idx = of_property_match_string(dev->of_node, + "interconnect-names", name); + if (idx < 0) + return ERR_PTR(idx); + } + + return of_icc_get_by_index(dev, idx); +} EXPORT_SYMBOL_GPL(of_icc_get); /** diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h index dc25864755ba..0e430b3b6519 100644 --- a/include/linux/interconnect.h +++ b/include/linux/interconnect.h @@ -28,6 +28,7 @@ struct device; struct icc_path *icc_get(struct device *dev, const int src_id, const int dst_id); struct icc_path *of_icc_get(struct device *dev, const char *name); +struct icc_path *of_icc_get_by_index(struct device *dev, int idx); void icc_put(struct icc_path *path); int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw); @@ -45,6 +46,11 @@ static inline struct icc_path *of_icc_get(struct device *dev, return NULL; } +struct icc_path *of_icc_get_by_index(struct device *dev, int idx) This should be static inline instead +{ + return NULL; +} + static inline void icc_put(struct icc_path *path) { } -- -- Sibi Sankar -- Qualcomm Innovation Center, Inc. is a member of Code Auror
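Two points from this review can be shown in a compact userspace sketch: the name-based lookup becoming a thin wrapper around the index-based one, and the disabled-API stub being static inline. All sketch_ names and the two path names are hypothetical. The static inline matters because a plain function defined in a header is emitted in every file that includes it, which leads to duplicate-symbol errors at link time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical path object; the real struct icc_path is opaque to consumers. */
struct sketch_path {
	int idx;
};

/* Hypothetical equivalent of an "interconnect-names" property. */
static const char *const sketch_names[] = { "cpu-mem", "gpu-mem" };
static struct sketch_path sketch_paths[] = { { 0 }, { 1 } };

#define SKETCH_NPATHS ((int)(sizeof(sketch_paths) / sizeof(sketch_paths[0])))

/* Index-based lookup: the primitive the patch introduces. */
static struct sketch_path *sketch_get_by_index(int idx)
{
	if (idx < 0 || idx >= SKETCH_NPATHS)
		return NULL;
	return &sketch_paths[idx];
}

/* Translate a name into an index, or -1 when the name is unknown. */
static int sketch_match_string(const char *name)
{
	int i;

	for (i = 0; i < SKETCH_NPATHS; i++)
		if (strcmp(sketch_names[i], name) == 0)
			return i;
	return -1;
}

/* Name-based lookup reduced to "resolve the index, then delegate". */
static struct sketch_path *sketch_get(const char *name)
{
	int idx = 0;	/* a NULL name selects the first path, as in the patch */

	if (name) {
		idx = sketch_match_string(name);
		if (idx < 0)
			return NULL;
	}
	return sketch_get_by_index(idx);
}

/* Header-style stub for a disabled API: static inline, as the review asks. */
static inline struct sketch_path *sketch_get_by_index_stub(int idx)
{
	(void)idx;
	return NULL;
}
```

The sketch returns NULL on a failed name match for brevity, whereas the patch returns ERR_PTR(idx) in that case.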
Re: [PATCH] recordmcount: Fix spurious mcount entries on powerpc
"Naveen N. Rao" writes:
> The recent change enabling HAVE_C_RECORDMCOUNT on powerpc started
> showing the following issue:
>
>   # modprobe kprobe_example
>    ftrace-powerpc: Not expected bl: opcode is 3c4c0001
>    WARNING: CPU: 0 PID: 227 at kernel/trace/ftrace.c:2001
>  ftrace_bug+0x90/0x318
>    Modules linked in:
>    CPU: 0 PID: 227 Comm: modprobe Not tainted 5.2.0-rc6-00678-g1c329100b942 #2
>    NIP: c0264318 LR: c025d694 CTR: c0f5cd30
>    REGS: c1f2b7b0 TRAP: 0700 Not tainted
>  (5.2.0-rc6-00678-g1c329100b942)
>    MSR: 90010282b033 CR:
>  28228222 XER:
>    CFAR: c02642fc IRQMASK: 0
>
>    NIP [c0264318] ftrace_bug+0x90/0x318
>    LR [c025d694] ftrace_process_locs+0x4f4/0x5e0
>    Call Trace:
>    [c1f2ba40] [0004] 0x4 (unreliable)
>    [c1f2bad0] [c025d694] ftrace_process_locs+0x4f4/0x5e0
>    [c1f2bb90] [c020ff10] load_module+0x25b0/0x30c0
>    [c1f2bd00] [c0210cb0] sys_finit_module+0xc0/0x130
>    [c1f2be20] [c000bda4] system_call+0x5c/0x70
>    Instruction dump:
>    419e0018 2f83 419e00bc 2f83ffea 409e00cc 481c 0fe0 3c62ff96
>    3901 3940 386386d0 48c4 <0fe0> 3ce20003 3901 3c62ff96
>    ---[ end trace 4c438d5cebf78381 ]---
>    ftrace failed to modify
>    [] 0xc008012a0008
>     actual: 01:00:4c:3c
>    Initializing ftrace call sites
>    ftrace record flags: 200
>     (0)
>     expected tramp: c006af4c

Aha, thanks. I saw that on one of my test boxes but hadn't pinned it
down to this commit.

> Fixes: c7d64b560ce80 ("powerpc/ftrace: Enable C Version of recordmcount")

That commit is the tip of my next, so I'll drop it for now and merge
them in the other order so there's no breakage.

Steve are you OK if I merge this via the powerpc tree? I'll reword the
commit message so that it makes sense coming prior to the commit
mentioned above.

cheers

> Signed-off-by: Naveen N. Rao
> ---
>  scripts/recordmcount.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
> index 13c5e6c8829c..47fca2c69a73 100644
> --- a/scripts/recordmcount.h
> +++ b/scripts/recordmcount.h
> @@ -325,7 +325,8 @@ static uint_t *sift_rel_mcount(uint_t *mlocp,
>  		if (!mcountsym)
>  			mcountsym = get_mcountsym(sym0, relp, str0);
>
> -		if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
> +		if (mcountsym && mcountsym == Elf_r_sym(relp) &&
> +				!is_fake_mcount(relp)) {
>  			uint_t const addend =
>  				_w(_w(relp->r_offset) - recval + mcount_adjust);
>  			mrelp->r_offset = _w(offbase
> --
> 2.22.0
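The one-line fix adds a "mcountsym &&" guard in front of the comparison. One plausible reading, offered here as an assumption drawn from the shape of the fix and not from the truncated commit message, is that mcountsym stays 0 while no mcount symbol has been found, so any relocation whose symbol index is also 0 compares equal and is recorded as a call site. The sketch below models only that comparison; the sketch_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical relocation record: only the symbol index matters here. */
struct sketch_rel {
	unsigned int sym;
};

/*
 * Count relocations that would be recorded as mcount call sites.
 * mcount_sym == 0 models "no mcount symbol has been found yet".
 * With guard == false the pre-fix comparison is used; with
 * guard == true the extra non-zero check from the patch applies.
 */
static size_t sketch_count_mcount(const struct sketch_rel *rels, size_t n,
				  unsigned int mcount_sym, bool guard)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++) {
		bool match = (mcount_sym == rels[i].sym);

		if (guard)
			match = match && (mcount_sym != 0);
		if (match)
			hits++;
	}
	return hits;
}
```

When a real mcount symbol index is known, both variants agree; they differ only while the index is still 0.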
Re: [PATCH] scsi: uapi: ufs: Fix SPDX license identifier
A gentle ping.

Thanks,
Avri

From: Avri Altman
Sent: Wednesday, June 12, 2019 4:34:37 PM
To: James E.J. Bottomley; Martin K. Petersen; linux-s...@vger.kernel.org; linux-kernel@vger.kernel.org; Arnd Bergmann; Pedro Sousa; Alim Akhtar
Cc: Avi Shchislowski; Alex Lemberg; Avri Altman
Subject: [PATCH] scsi: uapi: ufs: Fix SPDX license identifier

added 'WITH Linux-syscall-note' exception, which is the officially
assigned exception identifier for the kernel syscall exception.
This exception makes it possible to include GPL headers into non GPL
code, without confusing license compliance tools.

fixes: a851b2bd3632 (scsi: uapi: ufs: Make utp_upiu_req visible to user space)

Signed-off-by: Avri Altman
---
 include/uapi/scsi/scsi_bsg_ufs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/scsi/scsi_bsg_ufs.h b/include/uapi/scsi/scsi_bsg_ufs.h
index 17c7abd..9988db6 100644
--- a/include/uapi/scsi/scsi_bsg_ufs.h
+++ b/include/uapi/scsi/scsi_bsg_ufs.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * UFS Transport SGIO v4 BSG Message Support
  *
--
1.9.1
[PATCH v4 2/2] fpga: dfl: fme: add performance reporting support
This patch adds support for performance reporting private feature for FPGA Management Engine (FME). Now it supports several different performance counters, including 'basic', 'cache', 'fabric', 'vtd' and 'vtd_sip'. It allows user to use standard linux tools to access these performance counters. e.g. List all events by "perf list" perf list | grep fme fme0/cache_read_hit/ [Kernel PMU event] fme0/cache_read_miss/[Kernel PMU event] ... fme0/fab_mmio_read/ [Kernel PMU event] fme0/fab_mmio_write/ [Kernel PMU event] ... fme0/fab_port_mmio_read,portid=?/[Kernel PMU event] fme0/fab_port_mmio_write,portid=?/ [Kernel PMU event] ... fme0/vtd_port_devtlb_1g_fill,portid=?/ [Kernel PMU event] fme0/vtd_port_devtlb_2m_fill,portid=?/ [Kernel PMU event] ... fme0/vtd_sip_iotlb_1g_hit/ [Kernel PMU event] fme0/vtd_sip_iotlb_1g_miss/ [Kernel PMU event] ... fme0/clock [Kernel PMU event] ... e.g. check increased counter value after run one application using "perf stat" command. perf stat -e fme0/fab_mmio_read/,fme0/fab_mmio_write/, ./test Performance counter stats for './test': 1 fme0/fab_mmio_read/ 2 fme0/fab_mmio_write/ 1.009496520 seconds time elapsed Please note that fabric counters support both fab_* and fab_port_*, but actually they are sharing one set of performance counters in hardware. If user wants to monitor overall data events on fab_* then fab_port_* can't be supported at the same time, see example below: perf stat -e fme0/fab_mmio_read/,fme0/fab_port_mmio_write,portid=0/ Performance counter stats for 'system wide': 0 fme0/fab_mmio_read/ fme0/fab_port_mmio_write,portid=0/ 2.141064085 seconds time elapsed Signed-off-by: Luwei Kang Signed-off-by: Xu Yilun Signed-off-by: Wu Hao --- v3: replace scnprintf with sprintf in sysfs interfaces. update sysfs doc kernel version and date. fix sysfs doc issue for fabric counter. refine PERF_OBJ_ATTR_* macro, doesn't count on __ATTR anymore. 
introduce PERF_OBJ_ATTR_F_* macro, as it needs to use different filenames for some of the sysfs attributes. remove kobject_del when destroy kobject, kobject_put is enough. do sysfs_remove_groups first when destroying perf_obj. WARN_ON_ONCE in case internal parms are wrong in read_*_count(). v4: rework this patch to use standard perf API as user interfaces. --- drivers/fpga/Makefile | 1 + drivers/fpga/dfl-fme-main.c | 4 + drivers/fpga/dfl-fme-perf.c | 871 drivers/fpga/dfl-fme.h | 2 + 4 files changed, 878 insertions(+) create mode 100644 drivers/fpga/dfl-fme-perf.c diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile index 4865b74..d8e21df 100644 --- a/drivers/fpga/Makefile +++ b/drivers/fpga/Makefile @@ -40,6 +40,7 @@ obj-$(CONFIG_FPGA_DFL_FME_REGION) += dfl-fme-region.o obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-fme-error.o +dfl-fme-objs += dfl-fme-perf.o dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o dfl-afu-objs += dfl-afu-error.o diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 9225b68..a11c112 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -639,6 +639,10 @@ static void fme_power_mgmt_uinit(struct platform_device *pdev, .ops = &fme_power_mgmt_ops, }, { + .id_table = fme_perf_id_table, + .ops = &fme_perf_ops, + }, + { .ops = NULL, }, }; diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c new file mode 100644 index 000..0d7768a --- /dev/null +++ b/drivers/fpga/dfl-fme-perf.c @@ -0,0 +1,871 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Driver for FPGA Management Engine (FME) Global Performance Reporting + * + * Copyright 2019 Intel Corporation, Inc. 
+ * + * Authors: + * Kang Luwei + * Xiao Guangrong + * Wu Hao + * Xu Yilun + * Joseph Grecco + * Enno Luebbers + * Tim Whisonant + * Ananda Ravuri + * Mitchel, Henry + */ + +#include +#include "dfl.h" +#include "dfl-fme.h" + +/* + * Performance Counter Registers for Cache. + * + * Cache Events are listed below as CACHE_EVNT_*. + */ +#define CACHE_CTRL 0x8 +#define CACHE_RESET_CNTR BIT_ULL(0) +#define CACHE_FREEZE_CNTR BIT_ULL(8) +#define CACHE_CTRL_EVNT GENMASK_ULL(19, 16) +#define CACHE_EVNT_RD_HIT 0x0 +#define CACHE_EVNT_WR_HIT 0x1 +#define CACHE_EVNT_RD_MISS 0x2 +#define CACHE_EVNT_WR_MISS 0x3 +#define CACHE_EVNT_RSVD
Re: [PATCH] ALSA: hda: Use correct start/count for sysfs init
On Wed, 26 Jun 2019 23:59:33 +0200, Evan Green wrote: > > On Wed, Jun 26, 2019 at 2:16 PM Takashi Iwai wrote: > > > > On Wed, 26 Jun 2019 22:34:28 +0200, > > Evan Green wrote: > > > > > > On Wed, Jun 26, 2019 at 1:27 AM Takashi Iwai wrote: > > > > > > > > On Tue, 25 Jun 2019 23:54:18 +0200, > > > > Evan Green wrote: > > > > > > > > > > The normal flow through the widget sysfs codepath is that > > > > > snd_hdac_refresh_widgets() is called once without the sysfs bool set > > > > > to set up codec->num_nodes and friends, then another time with the > > > > > bool set to actually allocate all the sysfs widgets. However, during > > > > > the first time allocation, hda_widget_sysfs_reinit() ignores the new > > > > > num_nodes passed in via parameter and just calls > > > > > hda_widget_sysfs_init(), > > > > > using whatever was in codec->num_nodes before the update. This is not > > > > > correct in cases where num_nodes changes. Here's an example: > > > > > > > > > > Sometime earlier: > > > > > snd_hdac_refresh_widgets(hdac, false) > > > > > sets codec->num_nodes to 2, widgets is still not allocated > > > > > > > > > > Now: > > > > > snd_hdac_refresh_widgets(hdac, true) > > > > > hda_widget_sysfs_reinit(num_nodes=7) > > > > > hda_widget_sysfs_init() > > > > > widget_tree_create() > > > > > alloc(codec->num_nodes) // this is still 2 > > > > > codec->num_nodes = 7 > > > > > > > > > > Pass num_nodes and start_nid down into widget_tree_create() so that > > > > > the right number of nodes are allocated in all cases. > > > > > > > > > > Signed-off-by: Evan Green > > > > > > > > Thanks for the patch. That's indeed a problem, but I guess a simpler > > > > approach is just to return if sysfs didn't exist. If the sysfs > > > > entries aren't present at the second call with sysfs=true, it implies > > > > that the codec object will be exposed anyway later, and the sysfs will > > > > be created there. So, something like below would work instead? 
> > > > > > Hi Takashi, > > > Thanks for taking a look. I'm not sure you'd want to do that because > > > then you end up returning from sysfs_reinit without having allocated > > > any of the sysfs widgets. You'd be relying on the implicit behavior > > > that another call to init is coming later (despite having updated > > > num_nodes and start node), which is difficult to follow and easy to > > > break. In my opinion the slight bit of extra diffs is well worth the > > > clarity of having widget_tree_create always allocate the correct > > > start/count. > > > > Well, skipping is the right behavior, actually. The whole need of the > > refresh function is just to refresh the widget list, and the current > > behavior to create a sysfs is rather superfluous. This action has > > never been used, so better to get removed for avoiding misuse. > > Whoops, I sent out a v2 before seeing this. Sorry to jump the gun like that. > > I don't quite follow what you mean by "current behavior to create a > sysfs is rather superfluous". Do you think we could delete this > conditional in re-init altogether? I wasn't totally sure, but it > seemed like if the conditional could possibly be activated, then the > behavior was also incorrect. > > Actually, couldn't this happen if something goes through > widget_tree_free(), then something else goes through a reinit()? If > the reinit call doesn't have the same number of widgets as before, > then you'd need my patch to avoid initing with the wrong array size. I meant that hda_widget_sysfs_reinit() creates sysfs files if they weren't present. hda_widget_sysfs_reinit() should do nothing if the sysfs wasn't created beforehand -- like my suggested patch does. After that change, "bool sysfs" argument can be de even dropped from snd_hdac_refresh_widgets(). The very first call of this function is with sysfs=false, but at this point, codec->widgets=NULL, so reinit() would just skip. All later calls are with sysfs=true. 
> > > Actually, in looking at the widget lock patch, I don't think it's > > > sufficient either. It adds a lock around sysfs_reinit, but the setting > > > of codec->num_nodes and codec->start_nid is unprotected by the lock. > > > So you could have the two threads politely serialize through > > > sysfs_reinit, but then get reordered before setting codec->num_nodes, > > > landing you with an array whose length doesn't match num_nodes. > > > > The usage of snd_hdac_refresh_widgets() is supposed to be done only at > > the codec probe phase, hence there is no lock done in the core code; > > IOW, any concurrent access has to be protected in the caller side in > > general. > > > > Have you actually seen such concurrent accesses? If yes, that's a > > problem in the usage. > > I got into staring at this code while trying to debug a KASAN > use-after-free in this code. I found the issue in this patch by > inspection, so I'm not 100% sure if it could ever happen. My > use-after-free appears to be fixed by the new widget_lo
[PATCH v4 1/2] Documentation: fpga: dfl: add description for performance reporting support
From: Xu Yilun This patch adds description for performance reporting support for Device Feature List (DFL) based FPGA. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao --- Documentation/fpga/dfl.txt | 83 ++ 1 file changed, 83 insertions(+) diff --git a/Documentation/fpga/dfl.txt b/Documentation/fpga/dfl.txt index 652917f..6acd1b5 100644 --- a/Documentation/fpga/dfl.txt +++ b/Documentation/fpga/dfl.txt @@ -115,6 +115,11 @@ More functions are exposed through sysfs management information (current temperature, thresholds, threshold status, etc.). + Performance reporting + performance counters are exposed through perf PMU APIs. Standard perf tool + can be used to monitor all available perf events. Please see performance + counter section below for more detailed information. + FIU - PORT == @@ -368,6 +373,84 @@ The device nodes used for ioctl() or mmap() can be referenced through: /sys/class/fpga_region///dev +Performance Counters + +Performance reporting is one private feature implemented in FME. It could +supports several independent, system-wide, device counter sets in hardware to +monitor and count for performance events, including "basic", "cache", "fabric", +"vtd" and "vtd_sip" counters. Users could use standard perf tool to monitor +FPGA cache hit/miss rate, transaction number, interface clock counter of AFU +and other FPGA performance events. + +Different FPGA devices may have different counter sets, it depends on hardware +implementation. e.g. some discrete FPGA cards don't have any cache. User could +use "perf list" to check which perf events are supported by target hardware. + +In order to allow user to use standard perf API to access these performance +counters, driver creates a perf PMU, and related sysfs interfaces in +/sys/bus/event_source/devices/fme* to describe available perf events and +configuration options. + +The "format" directory describes the format of the config field of struct +perf_event_attr. 
There are 3 bitfields for config, "evtype" defines which type +the perf event belongs to. "event" is the identity of the event within its +category. "portid" is introduced to decide counters set to monitor on FPGA +overall data or a specific port. + +The "events" directory describes the configuration templates for all available +events which can be used with perf tool directly. For example, fab_mmio_read +has the configuration "event=0x06,evtype=0x02,portid=0xff", which shows this +event belongs to fabric type (0x02), the local event id is 0x06 and it is for +overall monitoring (portid=0xff). + +Example usage of perf can be: + +$# perf list |grep fme + + fme0/fab_mmio_read/ [Kernel PMU event] +<...> + fme0/fab_port_mmio_read,portid=?/[Kernel PMU event] +<...> + +$# perf stat -a -e fme0/fab_mmio_read/ +or +$# perf stat -a -e fme0/event=0x06,evtype=0x02,portid=0xff/ +or +$# perf stat -a -e fme0/config=0xff2006/ + +Another example, fab_port_mmio_read monitors mmio read of a specific port. So +its configuration template is "event=0x06,evtype=0x01,portid=?". The portid +should be explicitly set. + +Its usage of perf can be: + +$# perf stat -a -e fme0/fab_port_mmio_read,portid=0x0/ +or +$# perf stat -a -e fme0/event=0x06,evtype=0x02,portid=0x0/ +or +$# perf stat -a -e fme0/config=0x2006/ + +Please note for fabric counters, overall perf events (fab_*) and port perf +events (fab_port_*) actually share one set of counters in hardware, so it can't +monitor both at the same time. If this set of counters is configured to monitor +overall data, then per port perf data is not supported. See below example. 
+ +$# perf stat -e fme0/fab_mmio_read/,fme0/fab_port_mmio_write,\ +portid=0/ sleep 1 + + Performance counter stats for 'system wide': + + 3 fme0/fab_mmio_read/ + fme0/fab_port_mmio_write,portid=0x0/ + + 1.001750904 seconds time elapsed + +The driver also provides a "cpumask" sysfs attribute, which always shows fixed +value cpu0 as all perf events are from system-wide counters on FPGA device. + +The current driver does not support sampling. So "perf record" is unsupported. + + Add new FIUs support It's possible that developers made some new function blocks (FIUs) under this -- 1.8.3.1
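The relationship between the named fields and the raw `config=` values in the documentation patch above can be sketched as follows. The bit positions are an assumption inferred only from the two examples given (`event=0x06,evtype=0x02,portid=0xff` as `config=0xff2006`, and the same event with `portid=0x0` as `config=0x2006`). The authoritative layout is whatever the PMU's "format" directory reports, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Assumed field offsets, inferred from the config values quoted in the doc. */
#define FME_EVENT_SHIFT		0	/* event id within its category */
#define FME_EVTYPE_SHIFT	12	/* event category, e.g. 0x02 for fabric */
#define FME_PORTID_SHIFT	16	/* 0xff for overall, else the port id */

/* Pack the three fields into a raw perf config value. */
static uint64_t fme_perf_config(uint64_t event, uint64_t evtype,
				uint64_t portid)
{
	return (event << FME_EVENT_SHIFT) |
	       (evtype << FME_EVTYPE_SHIFT) |
	       (portid << FME_PORTID_SHIFT);
}
```

Under this layout, `config=0x2006` decodes to evtype 0x02, which matches the `evtype=0x02` used in the per-port perf stat command rather than the `evtype=0x01` shown in the quoted template.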
[PATCH v4 0/2] add performance reporting support to FPGA DFL drivers
This patchset adds performance reporting support for FPGA DFL drivers. It introduces one PMU to expose userspace interfaces via the standard perf API. Users can use the standard perf tool to access the perf events exposed via the PMU. This patchset was split from patchset[1] for easier review, and the version 3 patch can be found here[2]. Please note that this patchset needs to be applied on top of patchset[3][4]. Main changes from v3: - add more descriptions in doc, including how to use perf tool for these hardware counters. (patch #1) - use standard perf API instead of sysfs entries. (patch #2) [1]https://lkml.org/lkml/2019/5/27/11 [2]https://lkml.org/lkml/2019/5/27/18 [3]https://lkml.org/lkml/2019/6/27/29 [4]https://lkml.org/lkml/2019/6/27/49 Wu Hao (1): fpga: dfl: fme: add performance reporting support Xu Yilun (1): Documentation: fpga: dfl: add description for performance reporting support Documentation/fpga/dfl.txt | 83 + drivers/fpga/Makefile | 1 + drivers/fpga/dfl-fme-main.c | 4 + drivers/fpga/dfl-fme-perf.c | 871 drivers/fpga/dfl-fme.h | 2 + 5 files changed, 961 insertions(+) create mode 100644 drivers/fpga/dfl-fme-perf.c -- 1.8.3.1
Re: BUG: unable to handle kernel paging request in tls_prots
syzbot has bisected this bug to: commit e9db4ef6bf4ca9894bb324c76e01b8f1a16b2650 Author: John Fastabend Date: Sat Jun 30 13:17:47 2018 + bpf: sockhash fix omitted bucket lock in sock_close bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=148e8665a0 start commit: 904d88d7 qmi_wwan: Fix out-of-bounds read git tree: net final crash:https://syzkaller.appspot.com/x/report.txt?x=168e8665a0 console output: https://syzkaller.appspot.com/x/log.txt?x=128e8665a0 kernel config: https://syzkaller.appspot.com/x/.config?x=137ec2016ea3870d dashboard link: https://syzkaller.appspot.com/bug?extid=4207c7f3a443366d8aa2 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15576c71a0 Reported-by: syzbot+4207c7f3a443366d8...@syzkaller.appspotmail.com Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close") For information about bisection process see: https://goo.gl/tpsmEJ#bisection
[PATCHv5] mm/gup: speed up check_and_migrate_cma_pages() on huge page
Both hugetlb and THP pages sit on a pageblock of a single migration type, since they are allocated from a free_list[]. Based on this fact, it is enough to check a single subpage to decide the migration type of the whole huge page. This saves (2M/4K - 1) loop iterations for pmd_huge on x86, and similarly on other archs. Furthermore, when executing isolate_huge_page(), it avoids taking the global hugetlb_lock many times, and avoids meaningless removes/adds on the local linked list cma_page_list. Signed-off-by: Pingfan Liu Cc: Andrew Morton Cc: Ira Weiny Cc: Mike Rapoport Cc: "Kirill A. Shutemov" Cc: Thomas Gleixner Cc: John Hubbard Cc: "Aneesh Kumar K.V" Cc: Christoph Hellwig Cc: Keith Busch Cc: Mike Kravetz Cc: Linux-kernel@vger.kernel.org --- v3 -> v4: fix C language precedence issue v4 -> v5: drop the check PageCompound() and improve notes mm/gup.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index ddde097..1deaad2 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1336,25 +1336,30 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk, struct vm_area_struct **vmas, unsigned int gup_flags) { - long i; + long i, step; bool drain_allow = true; bool migrate_allow = true; LIST_HEAD(cma_page_list); check_again: - for (i = 0; i < nr_pages; i++) { + for (i = 0; i < nr_pages;) { + + struct page *head = compound_head(pages[i]); + + /* +* gup may start from a tail page. Advance step by the left +* part. +*/ + step = (1 << compound_order(head)) - (pages[i] - head); /* * If we get a page from the CMA zone, since we are going to * be pinning these entries, we might as well move them out * of the CMA zone if possible. 
*/ - if (is_migrate_cma_page(pages[i])) { - - struct page *head = compound_head(pages[i]); - - if (PageHuge(head)) { + if (is_migrate_cma_page(head)) { + if (PageHuge(head)) isolate_huge_page(head, &cma_page_list); - } else { + else { if (!PageLRU(head) && drain_allow) { lru_add_drain_all(); drain_allow = false; @@ -1369,6 +1374,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk, } } } + + i += step; } if (!list_empty(&cma_page_list)) { -- 2.7.5
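The step arithmetic in the patch above can be illustrated outside the kernel. This is a sketch under stated assumptions: subpages are modelled by their offset inside a compound page instead of `struct page` pointers, the `pages[]` array is assumed to cover consecutive subpages, and both function names are hypothetical.

```c
/*
 * Number of subpages left in a compound page of the given order, counted
 * from the subpage at offset `off` (0 is the head page). This mirrors
 * (1 << compound_order(head)) - (pages[i] - head) from the diff.
 */
static long gup_step(unsigned int order, long off)
{
	return (1L << order) - off;
}

/*
 * Count how many loop iterations are needed to walk nr_pages consecutive
 * subpages when the first one sits at offset first_off in its compound page.
 */
static long walk_iterations(long nr_pages, unsigned int order, long first_off)
{
	long i = 0, iters = 0, off = first_off;

	while (i < nr_pages) {
		i += gup_step(order, off);
		off = 0;	/* every later chunk starts at a head page */
		iters++;
	}
	return iters;
}
```

For an order-9 huge page (2M with 4K base pages), 512 subpages starting at the head are covered in a single iteration instead of 512, which is the saving the commit message describes. Starting from a tail page only shortens the first step.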
[PATCH] x86/boot: Make gdt 8-byte aligned
Loading a segment descriptor implicitly uses a locked access. Align gdt to 8 bytes to avoid a potential split lock when a descriptor crosses a cache line. Signed-off-by: Xiaoyao Li --- arch/x86/boot/compressed/head_64.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index fafb75c6c592..6233ae35d0d9 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -659,6 +659,7 @@ no_longmode: gdt64: .word gdt_end - gdt .quad 0 + .balign 8 gdt: .word gdt_end - gdt .long gdt -- 2.19.1
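The alignment argument in the patch above can be checked with plain address arithmetic. This sketch assumes a 64-byte cache line and 8-byte descriptors. The helper names are hypothetical, and `balign()` only imitates what the assembler's `.balign` directive does to the location counter.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64u	/* assumed cache line size in bytes */

/* True if an access of `size` bytes starting at `addr` spans two lines. */
static bool crosses_cache_line(uintptr_t addr, unsigned int size)
{
	return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}

/* Round addr up to the next multiple of align (align is a power of two). */
static uintptr_t balign(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

An 8-byte entry at offset 60 straddles the boundary at 64, while the same entry placed at `balign(60, 8)` does not. Since 64 is a multiple of 8, an 8-byte-aligned 8-byte descriptor can never span two lines.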
[Linux-kernel-mentees][PATCH v3] nl80211: Fix undefined behavior in bit shift
Shifting a signed 32-bit value left by 31 bits is undefined behavior. Change the most significant bit's constant to unsigned. Signed-off-by: Jiunn Chang --- Changes included in v3: - remove change log from patch description Changes included in v2: - use subsystem specific subject lines - CC required mailing lists include/uapi/linux/nl80211.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h index 6f09d1500960..fa7ebbc6ff27 100644 --- a/include/uapi/linux/nl80211.h +++ b/include/uapi/linux/nl80211.h @@ -5314,7 +5314,7 @@ enum nl80211_feature_flags { NL80211_FEATURE_TDLS_CHANNEL_SWITCH = 1 << 28, NL80211_FEATURE_SCAN_RANDOM_MAC_ADDR = 1 << 29, NL80211_FEATURE_SCHED_SCAN_RANDOM_MAC_ADDR = 1 << 30, - NL80211_FEATURE_ND_RANDOM_MAC_ADDR = 1 << 31, + NL80211_FEATURE_ND_RANDOM_MAC_ADDR = 1U << 31, }; /** -- 2.22.0
[Linux-kernel-mentees][PATCH v3] packet: Fix undefined behavior in bit shift
Shifting a signed 32-bit value left by 31 bits is undefined behavior. Change the most significant bit's constant to unsigned. Signed-off-by: Jiunn Chang --- Changes included in v3: - remove change log from patch description Changes included in v2: - use subsystem specific subject lines - CC required mailing lists include/uapi/linux/if_packet.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h index 467b654bd4c7..3d884d68eb30 100644 --- a/include/uapi/linux/if_packet.h +++ b/include/uapi/linux/if_packet.h @@ -123,7 +123,7 @@ struct tpacket_auxdata { /* Rx and Tx ring - header status */ #define TP_STATUS_TS_SOFTWARE (1 << 29) #define TP_STATUS_TS_SYS_HARDWARE (1 << 30) /* deprecated, never set */ -#define TP_STATUS_TS_RAW_HARDWARE (1 << 31) +#define TP_STATUS_TS_RAW_HARDWARE (1U << 31) /* Rx ring - feature request bits */ #define TP_FT_REQ_FILL_RXHASH 0x1 -- 2.22.0
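The two uapi fixes above rely on the same C rule: with a 32-bit `int`, `1 << 31` overflows the signed type, whereas `1U << 31` is a well-defined unsigned value. A small sketch, assuming a 32-bit `int` and using illustrative macro names rather than the real uapi identifiers:

```c
#include <stdint.h>

/* Illustrative flag names only. */
#define EXAMPLE_FLAG_BIT30	(1 << 30)	/* still fits in a signed int */
#define EXAMPLE_FLAG_BIT31	(1U << 31)	/* unsigned, so well defined */

/* Test whether `flag` is set in a 32-bit flags word. */
static int has_flag(uint32_t flags, uint32_t flag)
{
	return (flags & flag) != 0;
}
```

Only the top bit needs the `U` suffix for correctness, which is why both patches touch a single line each.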
[PATCH v4 05/15] Documentation: fpga: dfl: add descriptions for virtualization and new interfaces.
This patch adds virtualization support description for DFL based FPGA devices (based on PCIe SRIOV), and introductions to new interfaces added by new dfl private feature drivers. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull --- Documentation/fpga/dfl.txt | 101 + 1 file changed, 101 insertions(+) diff --git a/Documentation/fpga/dfl.txt b/Documentation/fpga/dfl.txt index 6df4621..a22631f 100644 --- a/Documentation/fpga/dfl.txt +++ b/Documentation/fpga/dfl.txt @@ -84,6 +84,8 @@ The following functions are exposed through ioctls: Get driver API version (DFL_FPGA_GET_API_VERSION) Check for extensions (DFL_FPGA_CHECK_EXTENSION) Program bitstream (DFL_FPGA_FME_PORT_PR) + Assign port to PF (DFL_FPGA_FME_PORT_ASSIGN) + Release port from PF (DFL_FPGA_FME_PORT_RELEASE) More functions are exposed through sysfs (/sys/class/fpga_region/regionX/dfl-fme.n/): @@ -99,6 +101,10 @@ More functions are exposed through sysfs one FPGA device may have more than one port, this sysfs interface indicates how many ports the FPGA device has. + Global error reporting management (errors/) + error reporting sysfs interfaces allow user to read errors detected by the + hardware, and clear the logged errors. + FIU - PORT == @@ -139,6 +145,10 @@ More functions are exposed through sysfs: Read Accelerator GUID (afu_id) afu_id indicates which PR bitstream is programmed to this AFU. + Error reporting (errors/) + error reporting sysfs interfaces allow user to read port/afu errors + detected by the hardware, and clear the logged errors. + DFL Framework Overview == @@ -212,6 +222,97 @@ the compat_id exposed by the target FPGA region. This check is usually done by userspace before calling the reconfiguration IOCTL. +FPGA virtualization - PCIe SRIOV + +This section describes the virtualization support on DFL based FPGA device to +enable accessing an accelerator from applications running in a virtual machine +(VM). This section only describes the PCIe based FPGA device with SRIOV support. 
+ +Features supported by the particular FPGA device are exposed through Device +Feature Lists, as illustrated below: + + +---+ +-+ + | PF | | VF | + +---+ +-+ + ^^ ^ ^ + || | | ++-||-|--|---+ +| || | | | +| +-+ +---+ +---+ +---+ | +| | FME | | Port0 | | Port1 | | Port2 | | +| +-+ +---+ +---+ +---+ | +| ^ ^ ^ | +| | | | | +| +---+ +--+ +---+ | +| | AFU | | AFU | | AFU | | +| +---+ +--+ +---+ | +| | +|DFL based FPGA PCIe Device | ++---+ + +FME is always accessed through the physical function (PF). + +Ports (and related AFUs) are accessed via PF by default, but could be exposed +through virtual function (VF) devices via PCIe SRIOV. Each VF only contains +1 Port and 1 AFU for isolation. Users could assign individual VFs (accelerators) +created via PCIe SRIOV interface, to virtual machines. + +The driver organization in virtualization case is illustrated below: + + +---++--++--+ | + | FME || FME || FME | | + | FPGA || FPGA || FPGA | | + |Manager||Bridge||Region| | + +---++--++--+ | + +---+ ++ | ++ + | FME | | AFU | | | AFU | + | Module| | Module | | | Module | + +---+ ++ | ++ ++---+ | +---+ +| FPGA Container Device | | | FPGA Container Device | +| (FPGA Base Region) | | | (FPGA Base Region) | ++---+ | +---+ + +--+ | +--+ + | FPGA PCIE Module | | Virtual | FPGA PCIE Module | + +--+ Host | Machine +--+ + -- | -- + +---+| +---+ + | PCI PF Device || | PCI VF Device | + +---+| +---+ + +FPGA PCIe device dr
[PATCH v4 13/15] fpga: dfl: afu: add STP (SignalTap) support
STP (SignalTap) is one of the private features under the port for debugging. This patch adds private feature driver support for it to allow userspace applications to mmap related mmio region and provide STP service. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Moritz Fischer Acked-by: Alan Tull --- drivers/fpga/dfl-afu-main.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index bcf6e28..8241ace 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -513,6 +513,36 @@ static void port_afu_uinit(struct platform_device *pdev, .uinit = port_afu_uinit, }; +static int port_stp_init(struct platform_device *pdev, +struct dfl_feature *feature) +{ + struct resource *res = &pdev->resource[feature->resource_index]; + + dev_dbg(&pdev->dev, "PORT STP Init.\n"); + + return afu_mmio_region_add(dev_get_platdata(&pdev->dev), + DFL_PORT_REGION_INDEX_STP, + resource_size(res), res->start, + DFL_PORT_REGION_MMAP | DFL_PORT_REGION_READ | + DFL_PORT_REGION_WRITE); +} + +static void port_stp_uinit(struct platform_device *pdev, + struct dfl_feature *feature) +{ + dev_dbg(&pdev->dev, "PORT STP UInit.\n"); +} + +static const struct dfl_feature_id port_stp_id_table[] = { + {.id = PORT_FEATURE_ID_STP,}, + {0,} +}; + +static const struct dfl_feature_ops port_stp_ops = { + .init = port_stp_init, + .uinit = port_stp_uinit, +}; + static struct dfl_feature_driver port_feature_drvs[] = { { .id_table = port_hdr_id_table, @@ -527,6 +557,10 @@ static void port_afu_uinit(struct platform_device *pdev, .ops = &port_err_ops, }, { + .id_table = port_stp_id_table, + .ops = &port_stp_ops, + }, + { .ops = NULL, } }; -- 1.8.3.1
[PATCH v4 08/15] fpga: dfl: afu: add AFU state related sysfs interfaces
This patch introduces more sysfs interfaces for Accelerated Function Unit (AFU). These interfaces allow users to read current AFU Power State (APx), read / clear AFU Power (APx) events which are sticky to identify transient APx state, and manage AFU's LTR (latency tolerance reporting). Signed-off-by: Ananda Ravuri Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull --- v3: replace scnprintf with sprintf in sysfs interfaces. update sysfs doc kernel version and date. v4: update sysfs doc date. --- Documentation/ABI/testing/sysfs-platform-dfl-port | 30 + drivers/fpga/dfl-afu-main.c | 140 ++ drivers/fpga/dfl.h| 11 ++ 3 files changed, 181 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port index 6a92dda..17b37d1 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-port +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port @@ -14,3 +14,33 @@ Description: Read-only. User can program different PR bitstreams to FPGA Accelerator Function Unit (AFU) for different functions. It returns uuid which could be used to identify which PR bitstream is programmed in this AFU. + +What: /sys/bus/platform/devices/dfl-port.0/power_state +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It reports the APx (AFU Power) state, different APx + means different throttling level. When reading this file, it + returns "0" - Normal / "1" - AP1 / "2" - AP2 / "6" - AP6. + +What: /sys/bus/platform/devices/dfl-port.0/ap1_event +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-write. Read or set 1 to clear AP1 (AFU Power State 1) + event. It's used to indicate transient AP1 state. + +What: /sys/bus/platform/devices/dfl-port.0/ap2_event +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-write. Read or set 1 to clear AP2 (AFU Power State 2) + event. It's used to indicate transient AP2 state. 
+ +What: /sys/bus/platform/devices/dfl-port.0/ltr +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-write. Read and set AFU latency tolerance reporting value. + Set ltr to 1 if the AFU can tolerate latency >= 40us or set it + to 0 if it is latency sensitive. diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index 02baa6a..040ed8a 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -21,6 +21,8 @@ #include "dfl-afu.h" +#define DRV_VERSION "0.8" + /** * port_enable - enable a port * @pdev: port platform device. @@ -141,8 +143,145 @@ static int port_get_id(struct platform_device *pdev) } static DEVICE_ATTR_RO(id); +static ssize_t +ltr_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + v = readq(base + PORT_HDR_CTRL); + mutex_unlock(&pdata->lock); + + return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_CTRL_LATENCY, v)); +} + +static ssize_t +ltr_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + void __iomem *base; + u8 ltr; + u64 v; + + if (kstrtou8(buf, 0, &ltr) || ltr > 1) + return -EINVAL; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + v = readq(base + PORT_HDR_CTRL); + v &= ~PORT_CTRL_LATENCY; + v |= FIELD_PREP(PORT_CTRL_LATENCY, ltr); + writeq(v, base + PORT_HDR_CTRL); + mutex_unlock(&pdata->lock); + + return count; +} +static DEVICE_ATTR_RW(ltr); + +static ssize_t +ap1_event_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); 
+ v = readq(base + PORT_HDR_STS); + mutex_unlock(&pdata->lock); + + return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_STS_AP1_EVT, v)); +} + +static ssize_t +ap1_event_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + void __iomem *base; + u8 ap1_event; + + if (kstrtou8(buf, 0, &ap1_event) || ap1_event != 1) + return -EI
[PATCH v4 10/15] fpga: dfl: add id_table for dfl private feature driver
This patch adds id_table for each dfl private feature driver, it allows to reuse same private feature driver to match and support multiple dfl private features. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Moritz Fischer Acked-by: Alan Tull --- drivers/fpga/dfl-afu-main.c | 14 -- drivers/fpga/dfl-fme-main.c | 11 --- drivers/fpga/dfl-fme-pr.c | 7 ++- drivers/fpga/dfl-fme.h | 3 ++- drivers/fpga/dfl.c | 21 +++-- drivers/fpga/dfl.h | 21 +++-- 6 files changed, 62 insertions(+), 15 deletions(-) diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index 8b434a4..65b3e89 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -435,6 +435,11 @@ static void port_hdr_uinit(struct platform_device *pdev, return ret; } +static const struct dfl_feature_id port_hdr_id_table[] = { + {.id = PORT_FEATURE_ID_HEADER,}, + {0,} +}; + static const struct dfl_feature_ops port_hdr_ops = { .init = port_hdr_init, .uinit = port_hdr_uinit, @@ -495,6 +500,11 @@ static void port_afu_uinit(struct platform_device *pdev, sysfs_remove_files(&pdev->dev.kobj, port_afu_attrs); } +static const struct dfl_feature_id port_afu_id_table[] = { + {.id = PORT_FEATURE_ID_AFU,}, + {0,} +}; + static const struct dfl_feature_ops port_afu_ops = { .init = port_afu_init, .uinit = port_afu_uinit, @@ -502,11 +512,11 @@ static void port_afu_uinit(struct platform_device *pdev, static struct dfl_feature_driver port_feature_drvs[] = { { - .id = PORT_FEATURE_ID_HEADER, + .id_table = port_hdr_id_table, .ops = &port_hdr_ops, }, { - .id = PORT_FEATURE_ID_AFU, + .id_table = port_afu_id_table, .ops = &port_afu_ops, }, { diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 8b2a337..38c6342 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -158,6 +158,11 @@ static long fme_hdr_ioctl(struct platform_device *pdev, return -ENODEV; } +static const struct dfl_feature_id fme_hdr_id_table[] = { + {.id = FME_FEATURE_ID_HEADER,}, + {0,} 
+}; + static const struct dfl_feature_ops fme_hdr_ops = { .init = fme_hdr_init, .uinit = fme_hdr_uinit, @@ -166,12 +171,12 @@ static long fme_hdr_ioctl(struct platform_device *pdev, static struct dfl_feature_driver fme_feature_drvs[] = { { - .id = FME_FEATURE_ID_HEADER, + .id_table = fme_hdr_id_table, .ops = &fme_hdr_ops, }, { - .id = FME_FEATURE_ID_PR_MGMT, - .ops = &pr_mgmt_ops, + .id_table = fme_pr_mgmt_id_table, + .ops = &fme_pr_mgmt_ops, }, { .ops = NULL, diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c index cd94ba8..52f1745 100644 --- a/drivers/fpga/dfl-fme-pr.c +++ b/drivers/fpga/dfl-fme-pr.c @@ -483,7 +483,12 @@ static long fme_pr_ioctl(struct platform_device *pdev, return ret; } -const struct dfl_feature_ops pr_mgmt_ops = { +const struct dfl_feature_id fme_pr_mgmt_id_table[] = { + {.id = FME_FEATURE_ID_PR_MGMT,}, + {0} +}; + +const struct dfl_feature_ops fme_pr_mgmt_ops = { .init = pr_mgmt_init, .uinit = pr_mgmt_uinit, .ioctl = fme_pr_ioctl, diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h index de20755..7a021c4 100644 --- a/drivers/fpga/dfl-fme.h +++ b/drivers/fpga/dfl-fme.h @@ -35,6 +35,7 @@ struct dfl_fme { struct dfl_feature_platform_data *pdata; }; -extern const struct dfl_feature_ops pr_mgmt_ops; +extern const struct dfl_feature_ops fme_pr_mgmt_ops; +extern const struct dfl_feature_id fme_pr_mgmt_id_table[]; #endif /* __DFL_FME_H */ diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c index 28d61b6..1bb2b58 100644 --- a/drivers/fpga/dfl.c +++ b/drivers/fpga/dfl.c @@ -14,6 +14,8 @@ #include "dfl.h" +#define DRV_VERSION"0.8" + static DEFINE_MUTEX(dfl_id_mutex); /* @@ -281,6 +283,21 @@ static int dfl_feature_instance_init(struct platform_device *pdev, return ret; } +static bool dfl_feature_drv_match(struct dfl_feature *feature, + struct dfl_feature_driver *driver) +{ + const struct dfl_feature_id *ids = driver->id_table; + + if (ids) { + while (ids->id) { + if (ids->id == feature->id) + return true; + ids++; + } + } + 
return false; +} + /** * dfl_fpga_dev_feature_init - init for sub features of dfl feature device * @pdev: feature device. @@ -301,8 +318,7 @@ int dfl_fpga_dev_feature_init(struct platform_device *pdev, while (drv->ops) { dfl_f
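For readers following the id_table change, here is a minimal userspace sketch of the zero-terminated table walk that dfl_feature_drv_match() performs in the hunk above. The struct names, the demo_* identifiers and the ID values are hypothetical stand-ins chosen for illustration; they are not the kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures used in the patch. */
struct feature_id {
	uint64_t id;		/* an id of 0 terminates the table */
};

struct feature_driver {
	const struct feature_id *id_table;
};

/* Example IDs, chosen only for this sketch. */
enum { DEMO_ID_ERROR = 0x10, DEMO_ID_UMSG = 0x11, DEMO_ID_STP = 0x13 };

/* One driver claiming two features via a single table. */
static const struct feature_id demo_id_table[] = {
	{ .id = DEMO_ID_ERROR },
	{ .id = DEMO_ID_STP },
	{ 0 }
};

/* Walk the table up to the zero sentinel; true if feature_id is listed. */
static bool feature_drv_match(uint64_t feature_id,
			      const struct feature_driver *drv)
{
	const struct feature_id *ids = drv->id_table;

	if (!ids)
		return false;

	for (; ids->id; ids++)
		if (ids->id == feature_id)
			return true;

	return false;
}
```

One consequence of using 0 as the sentinel is that a feature whose id is 0 can never be matched through such a table, so every id listed in a table has to be non-zero.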
[PATCH v4 12/15] fpga: dfl: afu: add error reporting support.
Error reporting is one important private feature, it reports error detected on port and accelerated function unit (AFU). It introduces several sysfs interfaces to allow userspace to check and clear errors detected by hardware. Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull --- v2: add more error code description for error clear sysfs in doc. return -EINVAL instead of -EBUSY when input error code doesn't match in error clear sysfs. v3: replace scnprintf with sprintf in sysfs interfaces. update sysfs doc kernel version and date. v4: update sysfs doc date. --- Documentation/ABI/testing/sysfs-platform-dfl-port | 39 drivers/fpga/Makefile | 1 + drivers/fpga/dfl-afu-error.c | 225 ++ drivers/fpga/dfl-afu-main.c | 4 + drivers/fpga/dfl-afu.h| 4 + 5 files changed, 273 insertions(+) create mode 100644 drivers/fpga/dfl-afu-error.c diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port index 04ea7f2..4aeca94 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-port +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port @@ -79,3 +79,42 @@ KernelVersion: 5.3 Contact: Wu Hao Description: Read-only. Read this file to get the status of issued command to userclck_freqcntrcmd. + +What: /sys/bus/platform/devices/dfl-port.0/errors/revision +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the revision of this error + reporting private feature. + +What: /sys/bus/platform/devices/dfl-port.0/errors/errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get errors detected on port and + Accelerated Function Unit (AFU). + +What: /sys/bus/platform/devices/dfl-port.0/errors/first_error +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the first error detected by + hardware. 
+ +What: /sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the first malformed request + captured by hardware. + +What: /sys/bus/platform/devices/dfl-port.0/errors/clear +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Write-only. Write error code to this file to clear errors. + Write fails with -EINVAL if input parsing fails or input error + code doesn't match. + Write fails with -EBUSY or -ETIMEDOUT if error can't be cleared + as hardware is in low power state (-EBUSY) or not responding + (-ETIMEDOUT). diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile index 312b937..7255891 100644 --- a/drivers/fpga/Makefile +++ b/drivers/fpga/Makefile @@ -41,6 +41,7 @@ obj-$(CONFIG_FPGA_DFL_AFU)+= dfl-afu.o dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o +dfl-afu-objs += dfl-afu-error.o # Drivers for FPGAs which implement DFL obj-$(CONFIG_FPGA_DFL_PCI) += dfl-pci.o diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c new file mode 100644 index 000..f20dbdf --- /dev/null +++ b/drivers/fpga/dfl-afu-error.c @@ -0,0 +1,225 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Driver for FPGA Accelerated Function Unit (AFU) Error Reporting + * + * Copyright 2019 Intel Corporation, Inc. + * + * Authors: + * Wu Hao + * Xiao Guangrong + * Joseph Grecco + * Enno Luebbers + * Tim Whisonant + * Ananda Ravuri + * Mitchel Henry + */ + +#include + +#include "dfl-afu.h" + +#define PORT_ERROR_MASK0x8 +#define PORT_ERROR 0x10 +#define PORT_FIRST_ERROR 0x18 +#define PORT_MALFORMED_REQ00x20 +#define PORT_MALFORMED_REQ10x28 + +#define ERROR_MASK GENMASK_ULL(63, 0) + +/* mask or unmask port errors by the error mask register. 
*/ +static void __port_err_mask(struct device *dev, bool mask) +{ + void __iomem *base; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR); + + writeq(mask ? ERROR_MASK : 0, base + PORT_ERROR_MASK); +} + +/* clear port errors. */ +static int __port_err_clear(struct device *dev, u64 err) +{ + struct platform_device *pdev = to_platform_device(dev); + void __iomem *base_err, *base_hdr; + int ret; + u64 v; + + base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR); + base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + /* +* clear Port Errors +* +* -
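The sysfs documentation in this patch says a write to "clear" fails with -EINVAL when the input error code doesn't match, and with -EBUSY when the hardware is in a low power state. The following is a rough userspace sketch of that contract only. The demo_port_err structure, the order of the two checks and the clearing of first_error are assumptions made for illustration; the driver code itself is cut off above and may differ.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the port error state; not the driver's structures. */
struct demo_port_err {
	uint64_t errors;	/* currently logged error bits */
	uint64_t first_error;	/* first error captured by hardware */
	bool low_power;		/* assumed: errors can't be cleared in this state */
};

/* Clear logged errors only when the caller passes the exact logged value. */
static int demo_port_err_clear(struct demo_port_err *p, uint64_t err)
{
	if (p->low_power)
		return -EBUSY;
	if (err != p->errors)
		return -EINVAL;

	p->errors = 0;
	p->first_error = 0;
	return 0;
}
```

Requiring the caller to echo back the value it read presumably guards against wiping out errors that were logged after userspace last looked at the "errors" file.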
[PATCH v4 09/15] fpga: dfl: afu: add userclock sysfs interfaces.
This patch introduces userclock sysfs interfaces for AFU, user could use these interfaces for clock setting to AFU. Please note that, this is only working for port header feature with revision 0, for later revisions, userclock setting is moved to a separated private feature, so one revision sysfs interface is exposed to userspace application for this purpose too. Signed-off-by: Ananda Ravuri Signed-off-by: Russ Weight Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull --- v3: replace scnprintf with sprintf in sysfs interfaces. update sysfs doc kernel version and date. v4: update sysfs doc date. --- Documentation/ABI/testing/sysfs-platform-dfl-port | 35 +++ drivers/fpga/dfl-afu-main.c | 113 +- drivers/fpga/dfl.h| 4 + 3 files changed, 151 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port index 17b37d1..04ea7f2 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-port +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port @@ -44,3 +44,38 @@ Contact: Wu Hao Description: Read-write. Read and set AFU latency tolerance reporting value. Set ltr to 1 if the AFU can tolerate latency >= 40us or set it to 0 if it is latency sensitive. + +What: /sys/bus/platform/devices/dfl-port.0/revision +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the revision of port header + feature. + +What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcmd +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Write-only. User writes command to this interface to set + userclock to AFU. + +What: /sys/bus/platform/devices/dfl-port.0/userclk_freqsts +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the status of issued command + to userclck_freqcmd. 
+ +What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrcmd +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Write-only. User writes command to this interface to set + userclock counter. + +What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrsts +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the status of issued command + to userclck_freqcntrcmd. diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c index 040ed8a..8b434a4 100644 --- a/drivers/fpga/dfl-afu-main.c +++ b/drivers/fpga/dfl-afu-main.c @@ -144,6 +144,17 @@ static int port_get_id(struct platform_device *pdev) static DEVICE_ATTR_RO(id); static ssize_t +revision_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + void __iomem *base; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + return sprintf(buf, "%x\n", dfl_feature_revision(base)); +} +static DEVICE_ATTR_RO(revision); + +static ssize_t ltr_show(struct device *dev, struct device_attribute *attr, char *buf) { struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); @@ -278,6 +289,7 @@ static int port_get_id(struct platform_device *pdev) static const struct attribute *port_hdr_attrs[] = { &dev_attr_id.attr, + &dev_attr_revision.attr, &dev_attr_ltr.attr, &dev_attr_ap1_event.attr, &dev_attr_ap2_event.attr, @@ -285,14 +297,112 @@ static int port_get_id(struct platform_device *pdev) NULL, }; +static ssize_t +userclk_freqcmd_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + u64 userclk_freq_cmd; + void __iomem *base; + + if (kstrtou64(buf, 0, &userclk_freq_cmd)) + return -EINVAL; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + writeq(userclk_freq_cmd, base + PORT_HDR_USRCLK_CMD0); + mutex_unlock(&pdata->lock); + + return count; +} +static 
DEVICE_ATTR_WO(userclk_freqcmd); + +static ssize_t +userclk_freqcntrcmd_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + u64 userclk_freqcntr_cmd; + void __iomem *base; + + if (kstrtou64(buf, 0, &userclk_freqcntr_cmd)) + return -EINVAL; + + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); + + mutex_lock(&pdata->lock); + writeq(userclk_freqcntr_cmd, base + PORT_HDR_USRCLK_CMD1); + mutex_unlock(&pdata
[PATCH v4 14/15] fpga: dfl: fme: add capability sysfs interfaces
This patch adds 3 read-only sysfs interfaces for FPGA Management Engine (FME) block for capabilities including cache_size, fabric_version and socket_id. Signed-off-by: Luwei Kang Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull --- v3: replace scnprintf with sprintf in sysfs interfaces. update sysfs doc kernel version and date. v4: update sysfs doc date. --- Documentation/ABI/testing/sysfs-platform-dfl-fme | 23 drivers/fpga/dfl-fme-main.c | 48 2 files changed, 71 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme index 8fa4feb..99cd3b2 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme @@ -21,3 +21,26 @@ Contact: Wu Hao Description: Read-only. It returns Bitstream (static FPGA region) meta data, which includes the synthesis date, seed and other information of this static FPGA region. + +What: /sys/bus/platform/devices/dfl-fme.0/cache_size +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns cache size of this FPGA device. + +What: /sys/bus/platform/devices/dfl-fme.0/fabric_version +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns fabric version of this FPGA device. + Userspace applications need this information to select + best data channels per different fabric design. + +What: /sys/bus/platform/devices/dfl-fme.0/socket_id +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns socket_id to indicate which socket + this FPGA belongs to, only valid for integrated solution. + User only needs this information, in case standard numa node + can't provide correct information. 
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 38c6342..2d69b8f 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -75,10 +75,58 @@ static ssize_t bitstream_metadata_show(struct device *dev, } static DEVICE_ATTR_RO(bitstream_metadata); +static ssize_t cache_size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER); + + v = readq(base + FME_HDR_CAP); + + return sprintf(buf, "%u\n", + (unsigned int)FIELD_GET(FME_CAP_CACHE_SIZE, v)); +} +static DEVICE_ATTR_RO(cache_size); + +static ssize_t fabric_version_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER); + + v = readq(base + FME_HDR_CAP); + + return sprintf(buf, "%u\n", + (unsigned int)FIELD_GET(FME_CAP_FABRIC_VERID, v)); +} +static DEVICE_ATTR_RO(fabric_version); + +static ssize_t socket_id_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + void __iomem *base; + u64 v; + + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER); + + v = readq(base + FME_HDR_CAP); + + return sprintf(buf, "%u\n", + (unsigned int)FIELD_GET(FME_CAP_SOCKET_ID, v)); +} +static DEVICE_ATTR_RO(socket_id); + static const struct attribute *fme_hdr_attrs[] = { &dev_attr_ports_num.attr, &dev_attr_bitstream_id.attr, &dev_attr_bitstream_metadata.attr, + &dev_attr_cache_size.attr, + &dev_attr_fabric_version.attr, + &dev_attr_socket_id.attr, NULL, }; -- 1.8.3.1
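All three show functions above read the same FME_HDR_CAP register and pull one field out with FIELD_GET(). A small userspace sketch of that extraction follows. DEMO_GENMASK_ULL and demo_field_get are stand-ins for the kernel's GENMASK_ULL() and FIELD_GET(), and the DEMO_CAP_* bit positions are illustrative assumptions, since the real FME_CAP_* masks are not part of this patch.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l). */
#define DEMO_GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Illustrative field layout; the positions are assumptions for this sketch. */
#define DEMO_CAP_FABRIC_VERID	DEMO_GENMASK_ULL(7, 0)
#define DEMO_CAP_SOCKET_ID	DEMO_GENMASK_ULL(8, 8)
#define DEMO_CAP_CACHE_SIZE	DEMO_GENMASK_ULL(43, 32)

/* Stand-in for FIELD_GET(): mask the field, then shift it down to bit 0. */
static uint64_t demo_field_get(uint64_t mask, uint64_t reg)
{
	/* mask must be non-zero; (mask & (~mask + 1)) is its lowest set bit. */
	return (reg & mask) / (mask & (~mask + 1));
}
```

With a register value of (0x123ULL << 32) | (1ULL << 8) | 0x5A, the three fields come back as 0x123, 1 and 0x5A respectively, which is the value each sysfs file would then print with "%u\n".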
[PATCH v4 11/15] fpga: dfl: afu: export __port_enable/disable function.
Export __port_enable() and __port_disable(), as these two functions are used
by other private features. For example, the error reporting private feature
needs to check port status and reset the port when clearing errors.

Signed-off-by: Xu Yilun
Signed-off-by: Wu Hao
Acked-by: Moritz Fischer
Acked-by: Alan Tull
---
 drivers/fpga/dfl-afu-main.c | 25 ++++++++++++++-----------
 drivers/fpga/dfl-afu.h      |  3 +++
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 65b3e89..c8bc0b5 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -24,14 +24,16 @@
 #define DRV_VERSION	"0.8"
 
 /**
- * port_enable - enable a port
+ * __port_enable - enable a port
  * @pdev: port platform device.
  *
  * Enable Port by clear the port soft reset bit, which is set by default.
  * The AFU is unable to respond to any MMIO access while in reset.
- * port_enable function should only be used after port_disable function.
+ * __port_enable function should only be used after __port_disable function.
+ *
+ * The caller needs to hold lock for protection.
  */
-static void port_enable(struct platform_device *pdev)
+void __port_enable(struct platform_device *pdev)
 {
 	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
 	void __iomem *base;
@@ -54,13 +56,14 @@ static void port_enable(struct platform_device *pdev)
 #define RST_POLL_TIMEOUT 1000 /* us */
 
 /**
- * port_disable - disable a port
+ * __port_disable - disable a port
  * @pdev: port platform device.
  *
- * Disable Port by setting the port soft reset bit, it puts the port into
- * reset.
+ * Disable Port by setting the port soft reset bit, it puts the port into reset.
+ *
+ * The caller needs to hold lock for protection.
  */
-static int port_disable(struct platform_device *pdev)
+int __port_disable(struct platform_device *pdev)
 {
 	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
 	void __iomem *base;
@@ -106,9 +109,9 @@ static int __port_reset(struct platform_device *pdev)
 {
 	int ret;
 
-	ret = port_disable(pdev);
+	ret = __port_disable(pdev);
 	if (!ret)
-		port_enable(pdev);
+		__port_enable(pdev);
 
 	return ret;
 }
@@ -805,9 +808,9 @@ static int port_enable_set(struct platform_device *pdev, bool enable)
 
 	mutex_lock(&pdata->lock);
 	if (enable)
-		port_enable(pdev);
+		__port_enable(pdev);
 	else
-		ret = port_disable(pdev);
+		ret = __port_disable(pdev);
 	mutex_unlock(&pdata->lock);
 
 	return ret;
diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
index 0c7630a..35e60c5 100644
--- a/drivers/fpga/dfl-afu.h
+++ b/drivers/fpga/dfl-afu.h
@@ -79,6 +79,9 @@ struct dfl_afu {
 	struct dfl_feature_platform_data *pdata;
 };
 
+void __port_enable(struct platform_device *pdev);
+int __port_disable(struct platform_device *pdev);
+
 void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
 int afu_mmio_region_add(struct dfl_feature_platform_data *pdata,
 			u32 region_index, u64 region_size, u64 phys, u32 flags);
-- 
1.8.3.1
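The rename to a double-underscore prefix signals that the exported helpers do no locking of their own and rely on the caller holding pdata->lock. A toy userspace model of that convention is sketched below. The demo_port structure, the boolean "lock" and all demo_* names are hypothetical; a plain flag replaces the mutex so the rule can be checked with assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port state; 'locked' stands in for pdata->lock. */
struct demo_port {
	bool locked;
	bool in_reset;
};

static void demo_lock(struct demo_port *p)
{
	assert(!p->locked);
	p->locked = true;
}

static void demo_unlock(struct demo_port *p)
{
	assert(p->locked);
	p->locked = false;
}

/* "Caller holds the lock" helpers, mirroring __port_disable/__port_enable. */
static int demo_port_disable_locked(struct demo_port *p)
{
	assert(p->locked);	/* the convention this patch documents */
	p->in_reset = true;
	return 0;
}

static void demo_port_enable_locked(struct demo_port *p)
{
	assert(p->locked);
	p->in_reset = false;
}

/* Wrapper that takes the lock itself, in the style of port_enable_set(). */
static int demo_port_enable_set(struct demo_port *p, bool enable)
{
	int ret = 0;

	demo_lock(p);
	if (enable)
		demo_port_enable_locked(p);
	else
		ret = demo_port_disable_locked(p);
	demo_unlock(p);

	return ret;
}
```

Another private feature that already holds the lock for its own register accesses can then call the locked helpers directly instead of going through the wrapper, which is what the error reporting feature mentioned in the commit message needs.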
[PATCH v4 02/15] fpga: dfl: fme: remove copy_to_user() in ioctl for PR
This patch removes the copy_to_user() code from the partial reconfiguration
ioctl. It is unnecessary, because userspace never needs to read the data
structure back after the ioctl.

Signed-off-by: Xu Yilun
Signed-off-by: Wu Hao
Acked-by: Moritz Fischer
Acked-by: Alan Tull
---
v2: clean up code split from patch 2 in v1 patchset.
v3: no change.
v4: no change.
---
 drivers/fpga/dfl-fme-pr.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index d9ca955..6ec0f09 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -159,9 +159,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
 	mutex_unlock(&pdata->lock);
 free_exit:
 	vfree(buf);
-	if (copy_to_user((void __user *)arg, &port_pr, minsz))
-		return -EFAULT;
-
 	return ret;
 }
 
-- 
1.8.3.1
[PATCH v4 01/15] fpga: dfl-fme-mgr: fix FME_PR_INTFC_ID register address.
FME_PR_INTFC_ID is used as the compat_id for the FPGA manager and region, but
the high 64 bits and low 64 bits of the compat_id were swapped by mistake.
This patch fixes the problem by correcting the register addresses.

Signed-off-by: Wu Hao
Acked-by: Alan Tull
Acked-by: Moritz Fischer
---
 drivers/fpga/dfl-fme-mgr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
index 76f3770..b3f7eee 100644
--- a/drivers/fpga/dfl-fme-mgr.c
+++ b/drivers/fpga/dfl-fme-mgr.c
@@ -30,8 +30,8 @@
 #define FME_PR_STS		0x10
 #define FME_PR_DATA		0x18
 #define FME_PR_ERR		0x20
-#define FME_PR_INTFC_ID_H	0xA8
-#define FME_PR_INTFC_ID_L	0xB0
+#define FME_PR_INTFC_ID_L	0xA8
+#define FME_PR_INTFC_ID_H	0xB0
 
 /* FME PR Control Register Bitfield */
 #define FME_PR_CTRL_PR_RST	BIT_ULL(0)  /* Reset PR engine */
-- 
1.8.3.1
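To make the effect of the swap concrete, the sketch below assembles a 128-bit ID from two 64-bit registers at the corrected offsets, using a byte array as a simulated register file. The compat_id structure, demo_readq() and demo_get_compat_id() are hypothetical helpers written for this illustration; only the two offsets come from the patch.

```c
#include <stdint.h>
#include <string.h>

#define FME_PR_INTFC_ID_L	0xA8	/* low 64 bits, as fixed by the patch */
#define FME_PR_INTFC_ID_H	0xB0	/* high 64 bits, as fixed by the patch */

/* Hypothetical 128-bit ID split into two 64-bit halves. */
struct compat_id {
	uint64_t id_h;
	uint64_t id_l;
};

/* Stand-in for readq(): fetch 64 bits from a simulated register file. */
static uint64_t demo_readq(const uint8_t *regs, size_t offset)
{
	uint64_t v;

	memcpy(&v, regs + offset, sizeof(v));
	return v;
}

/* Read both halves; with the old defines id_l and id_h came back swapped. */
static void demo_get_compat_id(const uint8_t *regs, struct compat_id *id)
{
	id->id_l = demo_readq(regs, FME_PR_INTFC_ID_L);
	id->id_h = demo_readq(regs, FME_PR_INTFC_ID_H);
}
```

Since the compat_id is what gets compared against a bitstream's interface ID, having the halves reversed would make an otherwise compatible image look like a mismatch.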
[PATCH v4 07/15] fpga: dfl: pci: enable SRIOV support.
This patch enables standard SR-IOV support. It allows the user to enable
SR-IOV (and VFs), and then pass accelerators (VFs) through to a virtual
machine or use the VFs directly in the host.

Signed-off-by: Zhang Yi Z
Signed-off-by: Xu Yilun
Signed-off-by: Wu Hao
Acked-by: Alan Tull
Acked-by: Moritz Fischer
---
 drivers/fpga/dfl-pci.c | 40 ++++++++++++++++++++++++++++++++++++++++
 drivers/fpga/dfl.c     | 41 +++++++++++++++++++++++++++++++++++++++++
 drivers/fpga/dfl.h     |  1 +
 3 files changed, 82 insertions(+)

diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c
index 66b5720..2fa571b 100644
--- a/drivers/fpga/dfl-pci.c
+++ b/drivers/fpga/dfl-pci.c
@@ -223,8 +223,46 @@ int cci_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pcidevid)
 	return ret;
 }
 
+static int cci_pci_sriov_configure(struct pci_dev *pcidev, int num_vfs)
+{
+	struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
+	struct dfl_fpga_cdev *cdev = drvdata->cdev;
+	int ret = 0;
+
+	mutex_lock(&cdev->lock);
+
+	if (!num_vfs) {
+		/*
+		 * disable SRIOV and then put released ports back to default
+		 * PF access mode.
+		 */
+		pci_disable_sriov(pcidev);
+
+		__dfl_fpga_cdev_config_port_vf(cdev, false);
+
+	} else if (cdev->released_port_num == num_vfs) {
+		/*
+		 * only enable SRIOV if cdev has matched released ports, put
+		 * released ports into VF access mode firstly.
+		 */
+		__dfl_fpga_cdev_config_port_vf(cdev, true);
+
+		ret = pci_enable_sriov(pcidev, num_vfs);
+		if (ret)
+			__dfl_fpga_cdev_config_port_vf(cdev, false);
+	} else {
+		ret = -EINVAL;
+	}
+
+	mutex_unlock(&cdev->lock);
+	return ret;
+}
+
 static void cci_pci_remove(struct pci_dev *pcidev)
 {
+	if (dev_is_pf(&pcidev->dev))
+		cci_pci_sriov_configure(pcidev, 0);
+
 	cci_remove_feature_devs(pcidev);
 	pci_disable_pcie_error_reporting(pcidev);
 }
@@ -234,6 +272,7 @@ static void cci_pci_remove(struct pci_dev *pcidev)
 	.id_table = cci_pcie_id_tbl,
 	.probe = cci_pci_probe,
 	.remove = cci_pci_remove,
+	.sriov_configure = cci_pci_sriov_configure,
 };
 
 module_pci_driver(cci_pci_driver);
@@ -241,3 +280,4 @@ static void cci_pci_remove(struct pci_dev *pcidev)
 MODULE_DESCRIPTION("FPGA DFL PCIe Device Driver");
 MODULE_AUTHOR("Intel Corporation");
 MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index 308c808..28d61b6 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -1112,6 +1112,47 @@ int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
 }
 EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);
 
+static void config_port_vf(struct device *fme_dev, int port_id, bool is_vf)
+{
+	void __iomem *base;
+	u64 v;
+
+	base = dfl_get_feature_ioaddr_by_id(fme_dev, FME_FEATURE_ID_HEADER);
+
+	v = readq(base + FME_HDR_PORT_OFST(port_id));
+
+	v &= ~FME_PORT_OFST_ACC_CTRL;
+	v |= FIELD_PREP(FME_PORT_OFST_ACC_CTRL,
+			is_vf ? FME_PORT_OFST_ACC_VF : FME_PORT_OFST_ACC_PF);
+
+	writeq(v, base + FME_HDR_PORT_OFST(port_id));
+}
+
+/**
+ * __dfl_fpga_cdev_config_port_vf - configure port to VF access mode
+ *
+ * @cdev: parent container device.
+ * @is_vf: true for VF access mode, and false for PF access mode
+ *
+ * Return: 0 on success, negative error code otherwise.
+ *
+ * This function is needed in sriov configuration routine. It could be used to
+ * configure the released ports access mode to VF or PF.
+ * The caller needs to hold lock for protection.
+ */
+void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf)
+{
+	struct dfl_feature_platform_data *pdata;
+
+	list_for_each_entry(pdata, &cdev->port_dev_list, node) {
+		if (device_is_registered(&pdata->dev->dev))
+			continue;
+
+		config_port_vf(cdev->fme_dev, pdata->id, is_vf);
+	}
+}
+EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_config_port_vf);
+
 static int __init dfl_fpga_init(void)
 {
 	int ret;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 63f39ab..1350e8e 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -421,5 +421,6 @@ struct platform_device *
 
 int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
 			      u32 port_id, bool release);
+void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf);
 
 #endif /* __FPGA_DFL_H */
-- 
1.8.3.1
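The three-way decision in cci_pci_sriov_configure() (disable, enable with rollback, or reject) can be exercised in isolation with a small userspace model. Everything named demo_* below is hypothetical: the structure replaces struct dfl_fpga_cdev, and enable_result is a knob that fakes the return value of pci_enable_sriov().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical container-device state for this sketch. */
struct demo_cdev {
	int released_port_num;	/* ports released from PF control */
	bool ports_in_vf_mode;	/* access mode of the released ports */
	bool sriov_enabled;
	int enable_result;	/* what the fake pci_enable_sriov() returns */
};

static void demo_config_port_vf(struct demo_cdev *c, bool is_vf)
{
	c->ports_in_vf_mode = is_vf;
}

static int demo_enable_sriov(struct demo_cdev *c)
{
	if (c->enable_result == 0)
		c->sriov_enabled = true;
	return c->enable_result;
}

/* Same control flow as the patch, minus locking and real PCI calls. */
static int demo_sriov_configure(struct demo_cdev *c, int num_vfs)
{
	int ret = 0;

	if (!num_vfs) {
		/* disable SR-IOV, then return the ports to PF access mode */
		c->sriov_enabled = false;
		demo_config_port_vf(c, false);
	} else if (c->released_port_num == num_vfs) {
		/* switch ports to VF mode first, roll back if enabling fails */
		demo_config_port_vf(c, true);
		ret = demo_enable_sriov(c);
		if (ret)
			demo_config_port_vf(c, false);
	} else {
		ret = -EINVAL;
	}

	return ret;
}
```

In this model, a request for a VF count that differs from the number of released ports is rejected without touching any state, and a failed enable leaves the ports in PF mode, which matches the rollback path in the patch.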
[PATCH v4 00/15] add new features for FPGA DFL drivers
This patchset adds support for more features to the FPGA Device Feature List
(DFL) drivers, including PR enhancement, virtualization support based on PCIe
SRIOV, private features of Port, private features of FME, and enhancements to
the DFL framework. Please refer to the list below for details.

Main changes from v3:
  - split performance reporting support into another patchset for
    better review.

Main changes from v2:
  - move thermal/power management private feature support to another
    patchset, including hwmon patches and related documentation update.
  - update sysfs doc for kernel version and date.
  - replace scnprintf with sprintf for sysfs interfaces.
  - fix comments for performance reporting support. (patch #16)

Main changes from v1:
  - split the clean up code into a separate patch (patch #2)
  - add cpu_feature_enabled check for AVX512 code (patch #4)
  - improve sysfs return values and also sysfs doc (patch #12 #17)
  - create a hwmon for thermal management sysfs interfaces (patch #15)
  - create a hwmon for power management sysfs interfaces (patch #16)
  - update documentation according to above changes (patch #5)
  - improve sysfs doc for performance reporting support (patch #18)

Wu Hao (15):
  fpga: dfl-fme-mgr: fix FME_PR_INTFC_ID register address.
  fpga: dfl: fme: remove copy_to_user() in ioctl for PR
  fpga: dfl: fme: align PR buffer size per PR datawidth
  fpga: dfl: fme: support 512bit data width PR
  Documentation: fpga: dfl: add descriptions for virtualization and new
    interfaces.
  fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
  fpga: dfl: pci: enable SRIOV support.
  fpga: dfl: afu: add AFU state related sysfs interfaces
  fpga: dfl: afu: add userclock sysfs interfaces.
  fpga: dfl: add id_table for dfl private feature driver
  fpga: dfl: afu: export __port_enable/disable function.
  fpga: dfl: afu: add error reporting support.
  fpga: dfl: afu: add STP (SignalTap) support
  fpga: dfl: fme: add capability sysfs interfaces
  fpga: dfl: fme: add global error reporting support

 Documentation/ABI/testing/sysfs-platform-dfl-fme  |  98 ++
 Documentation/ABI/testing/sysfs-platform-dfl-port | 104 ++
 Documentation/fpga/dfl.txt                        | 101 ++
 drivers/fpga/Makefile                             |   3 +-
 drivers/fpga/dfl-afu-error.c                      | 225 +
 drivers/fpga/dfl-afu-main.c                       | 330 ++-
 drivers/fpga/dfl-afu.h                            |   7 +
 drivers/fpga/dfl-fme-error.c                      | 385 ++
 drivers/fpga/dfl-fme-main.c                       | 120 ++-
 drivers/fpga/dfl-fme-mgr.c                        | 117 ++-
 drivers/fpga/dfl-fme-pr.c                         |  65 ++--
 drivers/fpga/dfl-fme.h                            |   7 +-
 drivers/fpga/dfl-pci.c                            |  40 +++
 drivers/fpga/dfl.c                                | 169 +-
 drivers/fpga/dfl.h                                |  54 ++-
 include/uapi/linux/fpga-dfl.h                     |  32 ++
 16 files changed, 1777 insertions(+), 80 deletions(-)
 create mode 100644 drivers/fpga/dfl-afu-error.c
 create mode 100644 drivers/fpga/dfl-fme-error.c

-- 
1.8.3.1
[PATCH v4 15/15] fpga: dfl: fme: add global error reporting support
This patch adds support for global error reporting for FPGA Management Engine (FME), it introduces sysfs interfaces to report different error detected by the hardware, and allow user to clear errors or inject error for testing purpose. Signed-off-by: Luwei Kang Signed-off-by: Ananda Ravuri Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull --- v2: fix issues found in sysfs doc. fix returned error code issues for writable sysfs interfaces. (use -EINVAL if input doesn't match error code) reorder the sysfs groups in code. v3: code rebase. replace scnprintf with sprintf in sysfs interfaces. update sysfs doc kernel version and date. v4: update sysfs doc date. --- Documentation/ABI/testing/sysfs-platform-dfl-fme | 75 + drivers/fpga/Makefile| 2 +- drivers/fpga/dfl-fme-error.c | 385 +++ drivers/fpga/dfl-fme-main.c | 4 + drivers/fpga/dfl-fme.h | 2 + drivers/fpga/dfl.h | 2 + 6 files changed, 469 insertions(+), 1 deletion(-) create mode 100644 drivers/fpga/dfl-fme-error.c diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme index 99cd3b2..86eef83 100644 --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme @@ -44,3 +44,78 @@ Description: Read-only. It returns socket_id to indicate which socket this FPGA belongs to, only valid for integrated solution. User only needs this information, in case standard numa node can't provide correct information. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/revision +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the revision of this global + error reporting private feature. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie0_errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-Write. Read this file for errors detected on pcie0 link. + Write this file to clear errors logged in pcie0_errors. 
Write + fails with -EINVAL if input parsing fails or input error code + doesn't match. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-Write. Read this file for errors detected on pcie1 link. + Write this file to clear errors logged in pcie1_errors. Write + fails with -EINVAL if input parsing fails or input error code + doesn't match. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns non-fatal errors detected. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. It returns catastrophic and fatal errors detected. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/inject_error +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-Write. Read this file to check errors injected. Write this + file to inject errors for testing purpose. Write fails with + -EINVAL if input parsing fails or input inject error code isn't + supported. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/errors +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get errors detected by hardware. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/first_error +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the first error detected by + hardware. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/next_error +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Read-only. Read this file to get the second error detected by + hardware. + +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/clear +Date: June 2019 +KernelVersion: 5.3 +Contact: Wu Hao +Description: Write-only. 
Write error code to this file to clear all errors + logged in errors, first_error and next_error. Write fails with + -EINVAL if input parsing fails or input error code doesn't + match. diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile index 7255891..4865b74 100644 --- a/drivers/fpga/Makefile +++ b/drivers/fpga/Makefile @@ -39,7 +39,7 @@ obj-$(CONFIG_FPGA_DFL_FM
[PATCH v4 04/15] fpga: dfl: fme: support 512bit data width PR
In early partial reconfiguration private feature, it only supports 32bit data width when writing data to hardware for PR. 512bit data width PR support is an important optimization for some specific solutions (e.g. XEON with FPGA integrated), it allows driver to use AVX512 instruction to improve the performance of partial reconfiguration. e.g. programming one 100MB bitstream image via this 512bit data width PR hardware only takes ~300ms, but 32bit revision requires ~3s per test result. Please note now this optimization is only done on revision 2 of this PR private feature which is only used in integrated solution that AVX512 is always supported. This revision 2 hardware doesn't support 32bit PR. Signed-off-by: Ananda Ravuri Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull --- v2: check AVX512 support using cpu_feature_enabled() fix other comments from Scott Wood v3: no change. v4: no change. --- drivers/fpga/dfl-fme-main.c | 3 ++ drivers/fpga/dfl-fme-mgr.c | 113 +--- drivers/fpga/dfl-fme-pr.c | 43 +++-- drivers/fpga/dfl-fme.h | 2 + drivers/fpga/dfl.h | 5 ++ 5 files changed, 135 insertions(+), 31 deletions(-) diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 086ad24..076d74f 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -21,6 +21,8 @@ #include "dfl.h" #include "dfl-fme.h" +#define DRV_VERSION"0.8" + static ssize_t ports_num_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -277,3 +279,4 @@ static int fme_remove(struct platform_device *pdev) MODULE_AUTHOR("Intel Corporation"); MODULE_LICENSE("GPL v2"); MODULE_ALIAS("platform:dfl-fme"); +MODULE_VERSION(DRV_VERSION); diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c index b3f7eee..d1a4ba5 100644 --- a/drivers/fpga/dfl-fme-mgr.c +++ b/drivers/fpga/dfl-fme-mgr.c @@ -22,14 +22,18 @@ #include #include +#include "dfl.h" #include "dfl-fme-pr.h" +#define DRV_VERSION"0.8" + /* FME Partial Reconfiguration Sub 
Feature Register Set */ #define FME_PR_DFH 0x0 #define FME_PR_CTRL0x8 #define FME_PR_STS 0x10 #define FME_PR_DATA0x18 #define FME_PR_ERR 0x20 +#define FME_PR_512_DATA0x40 /* Data Register for 512bit datawidth PR */ #define FME_PR_INTFC_ID_L 0xA8 #define FME_PR_INTFC_ID_H 0xB0 @@ -67,8 +71,43 @@ #define PR_WAIT_TIMEOUT 800 #define PR_HOST_STATUS_IDLE0 +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512) + +#include +#include + +static inline int is_cpu_avx512_enabled(void) +{ + return cpu_feature_enabled(X86_FEATURE_AVX512F); +} + +static inline void copy512(const void *src, void __iomem *dst) +{ + kernel_fpu_begin(); + + asm volatile("vmovdqu64 (%0), %%zmm0;" +"vmovntdq %%zmm0, (%1);" +: +: "r"(src), "r"(dst) +: "memory"); + + kernel_fpu_end(); +} +#else +static inline int is_cpu_avx512_enabled(void) +{ + return 0; +} + +static inline void copy512(const void *src, void __iomem *dst) +{ + WARN_ON_ONCE(1); +} +#endif + struct fme_mgr_priv { void __iomem *ioaddr; + unsigned int pr_datawidth; u64 pr_error; }; @@ -169,7 +208,7 @@ static int fme_mgr_write(struct fpga_manager *mgr, struct fme_mgr_priv *priv = mgr->priv; void __iomem *fme_pr = priv->ioaddr; u64 pr_ctrl, pr_status, pr_data; - int delay = 0, pr_credit, i = 0; + int ret = 0, delay = 0, pr_credit; dev_dbg(dev, "start request\n"); @@ -181,9 +220,9 @@ static int fme_mgr_write(struct fpga_manager *mgr, /* * driver can push data to PR hardware using PR_DATA register once HW -* has enough pr_credit (> 1), pr_credit reduces one for every 32bit -* pr data write to PR_DATA register. If pr_credit <= 1, driver needs -* to wait for enough pr_credit from hardware by polling. +* has enough pr_credit (> 1), pr_credit reduces one for every pr data +* width write to PR_DATA register. If pr_credit <= 1, driver needs to +* wait for enough pr_credit from hardware by polling. 
*/ pr_status = readq(fme_pr + FME_PR_STS); pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status); @@ -192,7 +231,8 @@ static int fme_mgr_write(struct fpga_manager *mgr, while (pr_credit <= 1) { if (delay++ > PR_WAIT_TIMEOUT) { dev_err(dev, "PR_CREDIT timeout\n"); - return -ETIMEDOUT; + ret = -ETIMEDOUT; + goto done; } udelay(1); @@ -200,21 +240,27 @@ static int fme_mgr_write(struct fpga_manager *mgr
[PATCH v4 06/15] fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
In order to support virtualization usage via PCIe SRIOV, this patch adds two ioctls under FPGA Management Engine (FME) to release and assign back the port device. In order to safely turn Port from PF into VF and enable PCIe SRIOV, it requires user to invoke this PORT_RELEASE ioctl to release port firstly to remove userspace interfaces, and then configure the PF/VF access register in FME. After disable SRIOV, it requires user to invoke this PORT_ASSIGN ioctl to attach the port back to PF. Ioctl interfaces: * DFL_FPGA_FME_PORT_RELEASE Release platform device of given port, it deletes port platform device to remove related userspace interfaces on PF, then configures PF/VF access mode to VF. * DFL_FPGA_FME_PORT_ASSIGN Assign platform device of given port back to PF, it configures PF/VF access mode to PF, then adds port platform device back to re-enable related userspace interfaces on PF. Signed-off-by: Zhang Yi Z Signed-off-by: Xu Yilun Signed-off-by: Wu Hao Acked-by: Alan Tull Acked-by: Moritz Fischer --- v3: code rebase. v4: no change. 
--- drivers/fpga/dfl-fme-main.c | 54 + drivers/fpga/dfl.c| 107 +- drivers/fpga/dfl.h| 10 include/uapi/linux/fpga-dfl.h | 32 + 4 files changed, 191 insertions(+), 12 deletions(-) diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c index 076d74f..8b2a337 100644 --- a/drivers/fpga/dfl-fme-main.c +++ b/drivers/fpga/dfl-fme-main.c @@ -16,6 +16,7 @@ #include #include +#include #include #include "dfl.h" @@ -105,9 +106,62 @@ static void fme_hdr_uinit(struct platform_device *pdev, sysfs_remove_files(&pdev->dev.kobj, fme_hdr_attrs); } +static long fme_hdr_ioctl_release_port(struct dfl_feature_platform_data *pdata, + void __user *arg) +{ + struct dfl_fpga_cdev *cdev = pdata->dfl_cdev; + struct dfl_fpga_fme_port_release release; + unsigned long minsz; + + minsz = offsetofend(struct dfl_fpga_fme_port_release, port_id); + + if (copy_from_user(&release, arg, minsz)) + return -EFAULT; + + if (release.argsz < minsz || release.flags) + return -EINVAL; + + return dfl_fpga_cdev_config_port(cdev, release.port_id, true); +} + +static long fme_hdr_ioctl_assign_port(struct dfl_feature_platform_data *pdata, + void __user *arg) +{ + struct dfl_fpga_cdev *cdev = pdata->dfl_cdev; + struct dfl_fpga_fme_port_assign assign; + unsigned long minsz; + + minsz = offsetofend(struct dfl_fpga_fme_port_assign, port_id); + + if (copy_from_user(&assign, arg, minsz)) + return -EFAULT; + + if (assign.argsz < minsz || assign.flags) + return -EINVAL; + + return dfl_fpga_cdev_config_port(cdev, assign.port_id, false); +} + +static long fme_hdr_ioctl(struct platform_device *pdev, + struct dfl_feature *feature, + unsigned int cmd, unsigned long arg) +{ + struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev); + + switch (cmd) { + case DFL_FPGA_FME_PORT_RELEASE: + return fme_hdr_ioctl_release_port(pdata, (void __user *)arg); + case DFL_FPGA_FME_PORT_ASSIGN: + return fme_hdr_ioctl_assign_port(pdata, (void __user *)arg); + } + + return -ENODEV; +} + static const struct 
dfl_feature_ops fme_hdr_ops = { .init = fme_hdr_init, .uinit = fme_hdr_uinit, + .ioctl = fme_hdr_ioctl, }; static struct dfl_feature_driver fme_feature_drvs[] = { diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c index 4b66aaa..308c808 100644 --- a/drivers/fpga/dfl.c +++ b/drivers/fpga/dfl.c @@ -231,16 +231,20 @@ void dfl_fpga_port_ops_del(struct dfl_fpga_port_ops *ops) */ int dfl_fpga_check_port_id(struct platform_device *pdev, void *pport_id) { - struct dfl_fpga_port_ops *port_ops = dfl_fpga_port_ops_get(pdev); - int port_id; + struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev); + struct dfl_fpga_port_ops *port_ops; + + if (pdata->id != FEATURE_DEV_ID_UNUSED) + return pdata->id == *(int *)pport_id; + port_ops = dfl_fpga_port_ops_get(pdev); if (!port_ops || !port_ops->get_id) return 0; - port_id = port_ops->get_id(pdev); + pdata->id = port_ops->get_id(pdev); dfl_fpga_port_ops_put(port_ops); - return port_id == *(int *)pport_id; + return pdata->id == *(int *)pport_id; } EXPORT_SYMBOL_GPL(dfl_fpga_check_port_id); @@ -474,6 +478,7 @@ static int build_info_commit_dev(struct build_feature_devs_info *binfo) pdata->dev = fdev; pdata->num = binfo->feature_num; pdata->dfl_cdev = binfo->cdev; + pdata->id = FEATURE_DEV_ID_UNUSED; mutex_init(&pdata->lock)
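The two ioctl helpers in this patch share one validation pattern: copy only the minimum known prefix of the argument, then reject a too-small argsz or any non-zero flags. A minimal userspace sketch of that pattern follows. The struct layout, the OFFSETOFEND macro and parse_port_release() are hypothetical stand-ins for illustration (the real uapi struct is not shown in this excerpt), and memcpy() takes the place of copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same idea as the kernel's offsetofend(): offset of the first byte after MEMBER. */
#define OFFSETOFEND(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical layout, modelled on the argsz/flags/port_id fields the patch reads. */
struct port_release_arg {
	uint32_t argsz;   /* structure size as seen by the caller */
	uint32_t flags;   /* no flags defined yet, so it must be zero */
	uint32_t port_id;
};

/* Returns 0 and stores the port id on success, a negative errno otherwise. */
static int parse_port_release(const void *user_buf, size_t user_len,
			      uint32_t *port_id)
{
	struct port_release_arg arg;
	size_t minsz = OFFSETOFEND(struct port_release_arg, port_id);

	/* Stand-in for copy_from_user(): fail if minsz bytes cannot be read. */
	if (user_buf == NULL || user_len < minsz)
		return -EFAULT;
	memcpy(&arg, user_buf, minsz);

	/* Same check as the patch: short argsz or unknown flags are rejected. */
	if (arg.argsz < minsz || arg.flags)
		return -EINVAL;

	*port_id = arg.port_id;
	return 0;
}
```

Reading only minsz bytes is what lets a later kernel append fields to the structure without breaking callers built against the older, shorter layout.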
[PATCH v4 03/15] fpga: dfl: fme: align PR buffer size per PR datawidth
Current driver checks if input bitstream file size is aligned or
not per PR data width (default 32bits). It requires one additional
step for end user when they generate the bitstream file, padding
extra zeros to bitstream file to align its size per PR data width,
but they don't have to as hardware will drop extra padding bytes
automatically.

In order to simplify the user steps, this patch aligns PR buffer
size per PR data width in driver, to allow user to pass unaligned
size bitstream files to driver.

Signed-off-by: Xu Yilun
Signed-off-by: Wu Hao
Acked-by: Alan Tull
Acked-by: Moritz Fischer
---
 drivers/fpga/dfl-fme-pr.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index 6ec0f09..3c71dc3 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -74,6 +74,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
 	struct dfl_fme *fme;
 	unsigned long minsz;
 	void *buf = NULL;
+	size_t length;
 	int ret = 0;
 	u64 v;
 
@@ -85,9 +86,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
 	if (port_pr.argsz < minsz || port_pr.flags)
 		return -EINVAL;
 
-	if (!IS_ALIGNED(port_pr.buffer_size, 4))
-		return -EINVAL;
-
 	/* get fme header region */
 	fme_hdr = dfl_get_feature_ioaddr_by_id(&pdev->dev,
 					       FME_FEATURE_ID_HEADER);
@@ -103,7 +101,13 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
 		       port_pr.buffer_size))
 		return -EFAULT;
 
-	buf = vmalloc(port_pr.buffer_size);
+	/*
+	 * align PR buffer per PR bandwidth, as HW ignores the extra padding
+	 * data automatically.
+	 */
+	length = ALIGN(port_pr.buffer_size, 4);
+
+	buf = vmalloc(length);
 	if (!buf)
 		return -ENOMEM;
 
@@ -140,7 +144,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
 	fpga_image_info_free(region->info);
 
 	info->buf = buf;
-	info->count = port_pr.buffer_size;
+	info->count = length;
 	info->region_id = port_pr.port_id;
 	region->info = info;
 
-- 
1.8.3.1
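For readers who do not have the kernel's ALIGN() in their head, the rounding this patch relies on can be sketched in plain C. align_up() is a hypothetical userspace helper, not a kernel function; it uses the same mask arithmetic and, like ALIGN(), is only valid for power-of-two alignments.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align; align must be a power of two. */
static size_t align_up(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}
```

With a 4-byte PR data width, a 5-byte bitstream is handed to the hardware as an 8-byte buffer, and an already aligned size is left unchanged.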
Re: [PATCH V2 2/5] cpufreq: Replace few CPUFREQ_CONST_LOOPS checks with has_target()
On 20-06-19, 08:35, Viresh Kumar wrote: > > CPUFREQ_CONST_LOOPS was introduced in a very old commit from pre-2.6 > > kernel release commit 6a4a93f9c0d5 ("[CPUFREQ] Fix 'out of sync' > > issue"). > > > > Probably the initial idea was to just avoid these checks for set_policy > > type drivers and then things got changed over the years. And it is very > > unclear why these checks are there at all. > > > > Replace the CPUFREQ_CONST_LOOPS check with has_target(), which makes > > more sense now. > > > > cpufreq_notify_transition() is only called for has_target() type driver > > and not for set_policy type, and the check is simply redundant. Remove > > it as well. > > > > Also remove () around freq comparison statement as they aren't required > > and checkpatch also warns for them. > > > > Signed-off-by: Viresh Kumar > > --- > > drivers/cpufreq/cpufreq.c | 13 + > > 1 file changed, 5 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > index 54befd775bd6..41ac701e324f 100644 > > --- a/drivers/cpufreq/cpufreq.c > > +++ b/drivers/cpufreq/cpufreq.c > > @@ -359,12 +359,10 @@ static void cpufreq_notify_transition(struct > > cpufreq_policy *policy, > > * which is not equal to what the cpufreq core thinks is > > * "old frequency". 
> > */ > > - if (!(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) { > > - if (policy->cur && (policy->cur != freqs->old)) { > > - pr_debug("Warning: CPU frequency is %u, cpufreq > > assumed %u kHz\n", > > -freqs->old, policy->cur); > > - freqs->old = policy->cur; > > - } > > + if (policy->cur && policy->cur != freqs->old) { > > + pr_debug("Warning: CPU frequency is %u, cpufreq assumed > > %u kHz\n", > > +freqs->old, policy->cur); > > + freqs->old = policy->cur; > > } > > > > srcu_notifier_call_chain(&cpufreq_transition_notifier_list, > > @@ -1618,8 +1616,7 @@ static unsigned int __cpufreq_get(struct > > cpufreq_policy *policy) > > if (policy->fast_switch_enabled) > > return ret_freq; > > > > - if (ret_freq && policy->cur && > > - !(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) { > > + if (has_target() && ret_freq && policy->cur) { > > /* verify no discrepancy between actual and > > saved value exists */ > > if (unlikely(ret_freq != policy->cur)) { @Rafael: Here are your comments from the IRC exchange we had yesterday: > : > > so the problem is that, because of the CPUFREQ_CONST_LOOPS check in > __cpufreq_get(), it almost never does the cpufreq_out_of_sync() thing > now. Because many drivers set CPUFREQ_CONST_LOOPS most of the time, > some of them even unconditionally. This patch changes the code that > runs very rarely into code that runs relatively often. Right, we will do the frequency verification on has_target() platforms with CPUFREQ_CONST_LOOPS set after this patch. But why is it the wrong thing to do ? What we do here is that we verify that the cached value of current frequency is same as the real frequency the hardware is running at. It makes sense to not do this check for setpolicy type drivers as the cpufreq core isn't always aware of what the driver will end up doing with the frequency and so no verification. But for has_target() type drivers, cpufreq core caches the value with it and it should check it to make sure everything is fine. 
I just don't see a correlation with the CPUFREQ_CONST_LOOPS flag here.
Either we do this verification or we don't, but there is no reason (as
per my understanding) to skip it based on this flag.

So if you look at the commit I pointed to in the history git tree [1],
it does two things:

- It adds the verification code (which is still quite similar today).

- And it sets the CPUFREQ_CONST_LOOPS flag only for setpolicy drivers,
  rightly so.

The problem started when we began using CPUFREQ_CONST_LOOPS for the
constant loops-per-jiffy case as well: many has_target() drivers
started setting the same flag and unknowingly skipped the frequency
verification.

So, I think the current code is doing the wrong thing by skipping the
verification based on the CPUFREQ_CONST_LOOPS flag.

-- 
viresh

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=6a4a93f9c0d51b5f4ac1bd3efab53e43584330dd
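To make the behavioural change in __cpufreq_get() concrete, here is a toy model of the two conditions from the hunk quoted above. struct toy_driver and both helpers are hypothetical and exist only for this sketch; they are not the kernel's struct cpufreq_driver or has_target().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy driver description; both fields are stand-ins for illustration only. */
struct toy_driver {
	bool has_target;   /* target-style driver, as opposed to setpolicy */
	bool const_loops;  /* driver sets CPUFREQ_CONST_LOOPS */
};

/* Old condition: the check is skipped whenever CONST_LOOPS is set. */
static bool verifies_before(const struct toy_driver *d,
			    unsigned int ret_freq, unsigned int cur)
{
	return ret_freq && cur && !d->const_loops;
}

/* New condition: the check runs for every target-style driver. */
static bool verifies_after(const struct toy_driver *d,
			   unsigned int ret_freq, unsigned int cur)
{
	return d->has_target && ret_freq && cur;
}
```

The only combination whose outcome changes among the cases discussed here is a target-style driver that also sets CPUFREQ_CONST_LOOPS: it was skipped before and is verified after, which is exactly the case Rafael's comment is about.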
Re: [GIT PULL] fixes for v5.2-rc7
On Wed, Jun 26, 2019 at 04:07:33PM +0200, Christian Brauner wrote:
> Hi Linus,
>
> This pull request removes the validation of the pidfd return argument if
> CLONE_PIDFD is specified:
>
> The following changes since commit 4b972a01a7da614b4796475f933094751a295a2f:
>
>   Linux 5.2-rc6 (2019-06-22 16:01:36 -0700)
>
> are available in the Git repository at:
>
>   g...@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190626
>
> for you to fetch changes up to bee19cd8f241ab3cd1bf79e03884e5371f9ef514:
>
>   samples: make pidfd-metadata fail gracefully on older kernels (2019-06-24 15:55:50 +0200)
>
> Userspace tools and libraries such as strace or glibc need a cheap and
> reliable way to tell whether CLONE_PIDFD is supported.
> The easiest way is to pass an invalid fd value in the return argument,
> perform the syscall and verify the value in the return argument has been
> changed to a valid fd.
>
> However, if CLONE_PIDFD is specified we currently check if pidfd == 0 and
> return EINVAL if not.
>
> The check for pidfd == 0 was originally added to enable us to abuse the
> return argument for passing additional flags along with CLONE_PIDFD in the
> future.
>
> However, extending legacy clone this way would be a terrible idea and with
> clone3 on the horizon and the ability to reuse CLONE_DETACHED with
> CLONE_PIDFD there's no real need for this crutch. So remove the pidfd == 0
> check and help userspace out.
>
> Please consider pulling these changes from the signed for-linus-20190626 tag.

Al has another patch that replaces anon_inode_getfd() with
anon_inode_getfile() + fd_install() to avoid the use of ksys_close().
I'll put it in my fixes branch and send a new PR with all those fixes in
a few hours.

Thanks!
Christian
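The detection scheme described in the pull request (seed the return argument with an invalid fd, call clone, see whether it was overwritten) could look roughly like this from userspace. This is a sketch under assumptions: probe_clone_pidfd() is a hypothetical helper, the raw syscall argument order is only assumed for x86-64 and arm64 (where parent_tid is the third argument), and a kernel that still carries the pidfd == 0 check will simply fail the call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000 /* flag value introduced with Linux 5.2 */
#endif

/*
 * Returns 1 if the kernel wrote a pidfd into the return argument,
 * 0 if the child was created but the argument was left untouched,
 * and -1 if the probe could not be run at all.
 */
static int probe_clone_pidfd(void)
{
#if defined(__linux__) && (defined(__x86_64__) || defined(__aarch64__))
	int pidfd = -1; /* deliberately invalid; a supporting kernel overwrites it */
	long pid;

	/* Raw clone with no new stack behaves like fork(); arg 3 is parent_tid. */
	pid = syscall(SYS_clone, (unsigned long)(CLONE_PIDFD | SIGCHLD),
		      NULL, &pidfd, NULL, 0UL);
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0); /* child: nothing to do */

	waitpid((pid_t)pid, NULL, 0); /* parent: reap the short-lived child */
	if (pidfd < 0)
		return 0;
	close(pidfd);
	return 1;
#else
	return -1; /* argument order not assumed for other architectures */
#endif
}
```

On kernels older than 5.2 the same bit is the ignored CLONE_DETACHED flag, so the call succeeds and the seeded value survives, which is what makes the probe cheap.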
Re: [RFC PATCH 0/5] Add CONFIG symbol as module attribute
On Wed, Jun 26, 2019 at 03:21:08PM -0700, Luis Chamberlain wrote:
> On Tue, Feb 5, 2019 at 2:07 PM Luis Chamberlain wrote:
> > In lieu of no Luke Skywalker, if you will, for a large kconfig revamp
> > on this, I'm inclined to believe *at least* having some kconfig_symb
> > exposed for some modules is better than nothing. Christoph, are you
> > totally opposed to this effort until we get a non-reverse engineered
> > effort in place? It just seems like an extraordinary amount of work
> > and I'm not quite sure who's volunteering to do it.
> >
> > Other stakeholders may benefit from at least having some config -->
> > module mapping for now. Not just backports or building slimmer
> > kernels.
>
> Christoph, *poke*

Yes, I'm still totally opposed to a half-baked hack like this.
Re: [PATCH v4 11/12] drm/virtio: switch from ttm to gem shmem helpers
. On Wed, Jun 19, 2019 at 11:08 PM Gerd Hoffmann wrote: > > virtio-gpu basically needs a sg_table for the bo, to tell the host where > the backing pages for the object are. So the gem shmem helpers are a > perfect fit. Some drm_gem_object_funcs need thin wrappers to update the > host state, but otherwise the helpers handle everything just fine. > > Once the fencing was sorted the switch was surprisingly easy and for the > most part just removing the ttm code. > > v4: fix drm_gem_object_funcs name. > > Signed-off-by: Gerd Hoffmann > Acked-by: Daniel Vetter > --- > drivers/gpu/drm/virtio/virtgpu_drv.h| 52 +--- > drivers/gpu/drm/virtio/virtgpu_drv.c| 20 +- > drivers/gpu/drm/virtio/virtgpu_gem.c| 16 +- > drivers/gpu/drm/virtio/virtgpu_ioctl.c | 19 +- > drivers/gpu/drm/virtio/virtgpu_kms.c| 9 - > drivers/gpu/drm/virtio/virtgpu_object.c | 148 > drivers/gpu/drm/virtio/virtgpu_prime.c | 37 --- > drivers/gpu/drm/virtio/virtgpu_ttm.c| 304 > drivers/gpu/drm/virtio/virtgpu_vq.c | 24 +- > drivers/gpu/drm/virtio/Kconfig | 2 +- > drivers/gpu/drm/virtio/Makefile | 2 +- > 11 files changed, 82 insertions(+), 551 deletions(-) > delete mode 100644 drivers/gpu/drm/virtio/virtgpu_ttm.c > > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h > b/drivers/gpu/drm/virtio/virtgpu_drv.h > index c23f4016df39..1d589de34449 100644 > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h > @@ -33,14 +33,11 @@ > > #include > #include > +#include > #include > #include > #include > #include > -#include > -#include > -#include > -#include > > #define DRIVER_NAME "virtio_gpu" > #define DRIVER_DESC "virtio GPU" > @@ -68,21 +65,16 @@ struct virtio_gpu_object_params { > }; > > struct virtio_gpu_object { > - struct drm_gem_object gem_base; > + struct drm_gem_shmem_object base; > uint32_t hw_res_handle; > > struct sg_table *pages; > uint32_t mapped; > - void *vmap; > bool dumb; > - struct ttm_placeplacement_code; > - struct ttm_placementplacement; > - struct 
ttm_buffer_objecttbo; > - struct ttm_bo_kmap_obj kmap; > bool created; > }; > #define gem_to_virtio_gpu_obj(gobj) \ > - container_of((gobj), struct virtio_gpu_object, gem_base) > + container_of((gobj), struct virtio_gpu_object, base.base) > > struct virtio_gpu_object_array { > u32 nents; > @@ -152,10 +144,6 @@ struct virtio_gpu_framebuffer { > #define to_virtio_gpu_framebuffer(x) \ > container_of(x, struct virtio_gpu_framebuffer, base) > > -struct virtio_gpu_mman { > - struct ttm_bo_devicebdev; > -}; > - > struct virtio_gpu_queue { > struct virtqueue *vq; > spinlock_t qlock; > @@ -184,8 +172,6 @@ struct virtio_gpu_device { > > struct virtio_device *vdev; > > - struct virtio_gpu_mman mman; > - > struct virtio_gpu_output outputs[VIRTIO_GPU_MAX_SCANOUTS]; > uint32_t num_scanouts; > > @@ -349,11 +335,6 @@ struct drm_plane *virtio_gpu_plane_init(struct > virtio_gpu_device *vgdev, > enum drm_plane_type type, > int index); > > -/* virtio_gpu_ttm.c */ > -int virtio_gpu_ttm_init(struct virtio_gpu_device *vgdev); > -void virtio_gpu_ttm_fini(struct virtio_gpu_device *vgdev); > -int virtio_gpu_mmap(struct file *filp, struct vm_area_struct *vma); > - > /* virtio_gpu_fence.c */ > bool virtio_fence_signaled(struct dma_fence *f); > struct virtio_gpu_fence *virtio_gpu_fence_alloc( > @@ -365,58 +346,47 @@ void virtio_gpu_fence_event_process(struct > virtio_gpu_device *vdev, > u64 last_seq); > > /* virtio_gpu_object */ > +struct drm_gem_object *virtio_gpu_create_object(struct drm_device *dev, > + size_t size); > int virtio_gpu_object_create(struct virtio_gpu_device *vgdev, > struct virtio_gpu_object_params *params, > struct virtio_gpu_object **bo_ptr, > struct virtio_gpu_fence *fence); > -void virtio_gpu_object_kunmap(struct virtio_gpu_object *bo); > -int virtio_gpu_object_kmap(struct virtio_gpu_object *bo); > -int virtio_gpu_object_get_sg_table(struct virtio_gpu_device *qdev, > - struct virtio_gpu_object *bo); > -void virtio_gpu_object_free_sg_table(struct virtio_gpu_object *bo); > 
> /* virtgpu_prime.c */ > -struct sg_table *virtgpu_gem_prime_get_sg_table(struct drm_gem_object *obj); > struct drm_gem_object *virtgpu_gem_prime_import_sg_table( > struct drm_device *dev, struct dma_buf_attachment *attach, > struct sg_table *sgt); > -void *virtgpu_gem_prime_vmap(struct drm_gem_object *obj); > -void virtgpu_gem_prime_vunmap(struct dr
Re: [PATCH] powerpc/64s/radix: Define arch_ioremap_p4d_supported()
Anshuman Khandual writes: > Recent core ioremap changes require HAVE_ARCH_HUGE_VMAP subscribing archs > provide arch_ioremap_p4d_supported() failing which will result in a build > failure like the following. > > ld: lib/ioremap.o: in function `.ioremap_huge_init': > ioremap.c:(.init.text+0x3c): undefined reference to > `.arch_ioremap_p4d_supported' > > This defines a stub implementation for arch_ioremap_p4d_supported() keeping > it disabled for now to fix the build problem. The easiest option is for this to be folded into your patch that creates the requirement for arch_ioremap_p4d_supported(). Andrew might do that for you, or you could send a v2. This looks fine from a powerpc POV: Acked-by: Michael Ellerman cheers > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Michael Ellerman > Cc: "Aneesh Kumar K.V" > Cc: Nicholas Piggin > Cc: Andrew Morton > Cc: Stephen Rothwell > Cc: linuxppc-...@lists.ozlabs.org > Cc: linux-kernel@vger.kernel.org > Cc: linux-n...@vger.kernel.org > > Signed-off-by: Anshuman Khandual > --- > This has been just build tested and fixes the problem reported earlier. > > arch/powerpc/mm/book3s64/radix_pgtable.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > b/arch/powerpc/mm/book3s64/radix_pgtable.c > index 8904aa1..c81da88 100644 > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > @@ -1124,6 +1124,11 @@ void radix__ptep_modify_prot_commit(struct > vm_area_struct *vma, > set_pte_at(mm, addr, ptep, pte); > } > > +int __init arch_ioremap_p4d_supported(void) > +{ > + return 0; > +} > + > int __init arch_ioremap_pud_supported(void) > { > /* HPT does not cope with large pages in the vmalloc area */ > -- > 2.7.4
[PATCH] mm/gup: Remove some BUG_ONs from get_gate_page()
If we end up without a PGD or PUD entry backing the gate area, don't
BUG -- just fail gracefully.

It's not entirely implausible that this could happen some day on
x86. It doesn't right now even with an execute-only emulated vsyscall
page because the fixmap shares the PUD, but the core mm code
shouldn't rely on that particular detail to avoid OOPSing.

Signed-off-by: Andy Lutomirski
---
 mm/gup.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index ddde097cf9e4..9883b598fd6f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -585,11 +585,14 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 		pgd = pgd_offset_k(address);
 	else
 		pgd = pgd_offset_gate(mm, address);
-	BUG_ON(pgd_none(*pgd));
+	if (pgd_none(*pgd))
+		return -EFAULT;
 	p4d = p4d_offset(pgd, address);
-	BUG_ON(p4d_none(*p4d));
+	if (p4d_none(*p4d))
+		return -EFAULT;
 	pud = pud_offset(p4d, address);
-	BUG_ON(pud_none(*pud));
+	if (pud_none(*pud))
+		return -EFAULT;
 	pmd = pmd_offset(pud, address);
 	if (!pmd_present(*pmd))
 		return -EFAULT;
-- 
2.21.0
[PATCH] riscv: Remove gate area stubs
Since commit a6c19dfe3994 ("arm64,ia64,ppc,s390,sh,tile,um,x86,mm:
remove default gate area"), which predates riscv's inclusion in Linux
by almost three years, the default behavior wrt the gate area is sane.
Remove riscv's gate area stubs.

Cc: Palmer Dabbelt
Cc: Albert Ou
Cc: linux-ri...@lists.infradead.org
Signed-off-by: Andy Lutomirski
---
 arch/riscv/include/asm/page.h |  4 ----
 arch/riscv/kernel/vdso.c      | 19 -------------------
 2 files changed, 23 deletions(-)

diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 8ddb6c7fedac..d3e5f6c0c21a 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -115,8 +115,4 @@ extern unsigned long min_low_pfn;
 #include
 #include
 
-/* vDSO support */
-/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */
-#define __HAVE_ARCH_GATE_AREA
-
 #endif /* _ASM_RISCV_PAGE_H */
diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
index a0084c36d270..c9c21e0d5641 100644
--- a/arch/riscv/kernel/vdso.c
+++ b/arch/riscv/kernel/vdso.c
@@ -92,22 +92,3 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 		return "[vdso]";
 	return NULL;
 }
-
-/*
- * Function stubs to prevent linker errors when AT_SYSINFO_EHDR is defined
- */
-
-int in_gate_area_no_mm(unsigned long addr)
-{
-	return 0;
-}
-
-int in_gate_area(struct mm_struct *mm, unsigned long addr)
-{
-	return 0;
-}
-
-struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
-{
-	return NULL;
-}
-- 
2.21.0
[PATCH v2 2/8] x86/vsyscall: Add a new vsyscall=xonly mode
With vsyscall emulation on, we still expose a readable vsyscall page
that contains syscall instructions that validly implement the
vsyscalls. We need this because certain dynamic binary
instrumentation tools attempt to read the call targets of call
instructions in the instrumented code. If the instrumented code uses
vsyscalls, then the vsyscall page needs to contain readable code.

Unfortunately, leaving readable memory at a deterministic address can
be used to help various ASLR bypasses, so we gain some hardening
value if we disallow vsyscall reads.

Given how rarely the vsyscall page needs to be readable, add a
mechanism to make the vsyscall page be execute only.

Cc: Kees Cook
Cc: Borislav Petkov
Cc: Kernel Hardening
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Andy Lutomirski
---
 .../admin-guide/kernel-parameters.txt | 7 +++-
 arch/x86/Kconfig                      | 33 ++-
 arch/x86/entry/vsyscall/vsyscall_64.c | 16 +++--
 3 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0082d1e56999..be8c3a680afa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5100,7 +5100,12 @@
 			targets for exploits that can control RIP.
 
 			emulate     [default] Vsyscalls turn into traps and are
-				    emulated reasonably safely.
+				    emulated reasonably safely.  The vsyscall
+				    page is readable.
+
+			xonly       Vsyscalls turn into traps and are
+				    emulated reasonably safely.  The vsyscall
+				    page is not readable.
 
 			none        Vsyscalls don't work at all.  This makes
 				    them quite hard to use for exploits but
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2bbbd4d1ba31..0182d2c67590 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2293,23 +2293,38 @@ choice
 	  it can be used to assist security vulnerability exploitation.
 
 	  This setting can be changed at boot time via the kernel command
-	  line parameter vsyscall=[emulate|none].
+ line parameter vsyscall=[emulate|xonly|none]. On a system with recent enough glibc (2.14 or newer) and no static binaries, you can say None without a performance penalty to improve security. - If unsure, select "Emulate". + If unsure, select "Emulate execution only". config LEGACY_VSYSCALL_EMULATE - bool "Emulate" + bool "Full emulation" help - The kernel traps and emulates calls into the fixed - vsyscall address mapping. This makes the mapping - non-executable, but it still contains known contents, - which could be used in certain rare security vulnerability - exploits. This configuration is recommended when userspace - still uses the vsyscall area. + The kernel traps and emulates calls into the fixed vsyscall + address mapping. This makes the mapping non-executable, but + it still contains readable known contents, which could be + used in certain rare security vulnerability exploits. This + configuration is recommended when using legacy userspace + that still uses vsyscalls along with legacy binary + instrumentation tools that require code to be readable. + + An example of this type of legacy userspace is running + Pin on an old binary that still uses vsyscalls. + + config LEGACY_VSYSCALL_XONLY + bool "Emulate execution only" + help + The kernel traps and emulates calls into the fixed vsyscall + address mapping and does not allow reads. This + configuration is recommended when userspace might use the + legacy vsyscall area but support for legacy binary + instrumentation of legacy code is not needed. It mitigates + certain uses of the vsyscall area as an ASLR-bypassing + buffer. 
config LEGACY_VSYSCALL_NONE bool "None" diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index d9d81ad7a400..fedd7628f3a6 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -42,9 +42,11 @@ #define CREATE_TRACE_POINTS #include "vsyscall_trace.h" -static enum { EMULATE, NONE } vsyscall_mode = +static enum { EMULATE, XONLY, NONE } vsyscall_mode = #ifdef CONFIG
[PATCH v2 3/8] x86/vsyscall: Show something useful on a read fault
Signed-off-by: Andy Lutomirski --- arch/x86/entry/vsyscall/vsyscall_64.c | 19 ++- arch/x86/include/asm/vsyscall.h | 6 -- arch/x86/mm/fault.c | 11 +-- 3 files changed, 27 insertions(+), 9 deletions(-) diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index fedd7628f3a6..9c58ab807aeb 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -117,7 +117,8 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size) } } -bool emulate_vsyscall(struct pt_regs *regs, unsigned long address) +bool emulate_vsyscall(unsigned long error_code, + struct pt_regs *regs, unsigned long address) { struct task_struct *tsk; unsigned long caller; @@ -126,6 +127,22 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address) long ret; unsigned long orig_dx; + /* Write faults or kernel-privilege faults never get fixed up. */ + if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER) + return false; + + if (!(error_code & X86_PF_INSTR)) { + /* Failed vsyscall read */ + if (vsyscall_mode == EMULATE) + return false; + + /* +* User code tried and failed to read the vsyscall page. +*/ + warn_bad_vsyscall(KERN_INFO, regs, "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround"); + return false; + } + /* * No point in checking CS -- the only way to get here is a user mode * trap to a high address, which means that we're in 64-bit user code. diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h index b986b2ca688a..ab60a71a8dcb 100644 --- a/arch/x86/include/asm/vsyscall.h +++ b/arch/x86/include/asm/vsyscall.h @@ -13,10 +13,12 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root); * Called on instruction fetch fault in vsyscall page. * Returns true if handled. 
*/ -extern bool emulate_vsyscall(struct pt_regs *regs, unsigned long address); +extern bool emulate_vsyscall(unsigned long error_code, +struct pt_regs *regs, unsigned long address); #else static inline void map_vsyscall(void) {} -static inline bool emulate_vsyscall(struct pt_regs *regs, unsigned long address) +static inline bool emulate_vsyscall(unsigned long error_code, + struct pt_regs *regs, unsigned long address) { return false; } diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 46df4c6aae46..288a5462076f 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1369,16 +1369,15 @@ void do_user_addr_fault(struct pt_regs *regs, #ifdef CONFIG_X86_64 /* -* Instruction fetch faults in the vsyscall page might need -* emulation. The vsyscall page is at a high address -* (>PAGE_OFFSET), but is considered to be part of the user -* address space. +* Faults in the vsyscall page might need emulation. The +* vsyscall page is at a high address (>PAGE_OFFSET), but is +* considered to be part of the user address space. * * The vsyscall page does not have a "real" VMA, so do this * emulation before we go searching for VMAs. */ - if ((hw_error_code & X86_PF_INSTR) && is_vsyscall_vaddr(address)) { - if (emulate_vsyscall(regs, address)) + if (is_vsyscall_vaddr(address)) { + if (emulate_vsyscall(hw_error_code, regs, address)) return; } #endif -- 2.21.0
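The new entry checks in emulate_vsyscall() boil down to a small decision on the page-fault error code. The sketch below models just that decision in userspace. The PF_* values are the architectural x86 page-fault error code bits; the enums and classify_vsyscall_fault() are hypothetical names for this illustration, and the further checks the real function performs after deciding to emulate are not modelled.

```c
#include <assert.h>

/* x86 page-fault error code bits (architectural values). */
#define PF_PROT  (1UL << 0)
#define PF_WRITE (1UL << 1)
#define PF_USER  (1UL << 2)
#define PF_INSTR (1UL << 4)

enum toy_mode { MODE_EMULATE, MODE_XONLY, MODE_NONE };

enum toy_action {
	ACT_NOT_HANDLED,      /* return false without logging */
	ACT_WARN_READ_DENIED, /* log the denied read, then return false */
	ACT_TRY_EMULATE       /* instruction fetch: continue to emulation */
};

static enum toy_action classify_vsyscall_fault(unsigned long error_code,
					       enum toy_mode mode)
{
	/* Write faults and kernel-privilege faults never get fixed up. */
	if ((error_code & (PF_WRITE | PF_USER)) != PF_USER)
		return ACT_NOT_HANDLED;

	/* A user-mode data read of the vsyscall page. */
	if (!(error_code & PF_INSTR)) {
		if (mode == MODE_EMULATE)
			return ACT_NOT_HANDLED;
		return ACT_WARN_READ_DENIED;
	}

	return ACT_TRY_EMULATE;
}
```

The single masked comparison covers both rejections at once: it is false only when USER is set and WRITE is clear, so a kernel-mode read and a user-mode write take the same early exit.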
[PATCH v2 4/8] x86/vsyscall: Document odd SIGSEGV error code for vsyscalls
Even if vsyscall=none, we report uer page faults on the vsyscall page as though the PROT bit in the error code was set. Add a comment explaining why this is probably okay and display the value in the test case. While we're at it, explain why our behavior is correct with respect to PKRU. This also modifies the selftest to print the odd error code so that you can run the selftest and see that the behavior is odd. If anyone really cares about more accurate emulation, we could change the behavior. Cc: Kees Cook Cc: Borislav Petkov Cc: Kernel Hardening Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Andy Lutomirski --- arch/x86/mm/fault.c | 7 +++ tools/testing/selftests/x86/test_vsyscall.c | 9 - 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 288a5462076f..58e4f1f00bbc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -710,6 +710,10 @@ static void set_signal_archinfo(unsigned long address, * To avoid leaking information about the kernel page * table layout, pretend that user-mode accesses to * kernel addresses are always protection faults. +* +* NB: This means that failed vsyscalls with vsyscall=none +* will have the PROT bit. This doesn't leak any +* information and does not appear to cause any problems. */ if (address >= TASK_SIZE_MAX) error_code |= X86_PF_PROT; @@ -1375,6 +1379,9 @@ void do_user_addr_fault(struct pt_regs *regs, * * The vsyscall page does not have a "real" VMA, so do this * emulation before we go searching for VMAs. +* +* PKRU never rejects instruction fetches, so we don't need +* to consider the PF_PK bit. 
*/ if (is_vsyscall_vaddr(address)) { if (emulate_vsyscall(hw_error_code, regs, address)) diff --git a/tools/testing/selftests/x86/test_vsyscall.c b/tools/testing/selftests/x86/test_vsyscall.c index 0b4f1cc2291c..4c9a8d76dba0 100644 --- a/tools/testing/selftests/x86/test_vsyscall.c +++ b/tools/testing/selftests/x86/test_vsyscall.c @@ -183,9 +183,13 @@ static inline long sys_getcpu(unsigned * cpu, unsigned * node, } static jmp_buf jmpbuf; +static volatile unsigned long segv_err; static void sigsegv(int sig, siginfo_t *info, void *ctx_void) { + ucontext_t *ctx = (ucontext_t *)ctx_void; + + segv_err = ctx->uc_mcontext.gregs[REG_ERR]; siglongjmp(jmpbuf, 1); } @@ -416,8 +420,11 @@ static int test_vsys_r(void) } else if (!can_read && should_read_vsyscall) { printf("[FAIL]\tWe don't have read access, but we should\n"); return 1; + } else if (can_read) { + printf("[OK]\tWe have read access\n"); } else { - printf("[OK]\tgot expected result\n"); + printf("[OK]\tWe do not have read access: #PF(0x%lx)\n", + segv_err); } #endif -- 2.21.0
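For readers who want to interpret the `#PF(0x...)` value that the updated selftest prints, here is a minimal userspace sketch that decodes an x86 page-fault error code into flag names. The bit positions follow the kernel's X86_PF_* definitions; the helper name `pf_describe` and the `PF_*` enum names are made up for this illustration and are not part of the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* x86 page-fault error code bits (same positions as the kernel's X86_PF_*). */
enum {
	PF_PROT  = 1 << 0,	/* protection violation, page was present */
	PF_WRITE = 1 << 1,	/* access was a write */
	PF_USER  = 1 << 2,	/* access came from user mode */
	PF_RSVD  = 1 << 3,	/* reserved bit set in a paging entry */
	PF_INSTR = 1 << 4,	/* access was an instruction fetch */
	PF_PK    = 1 << 5,	/* blocked by protection keys */
};

/*
 * Hypothetical helper: write the set flags as "PROT|USER|INSTR" into buf
 * and return buf. Output is truncated if buf is too small.
 */
const char *pf_describe(unsigned long err, char *buf, size_t len)
{
	static const struct {
		unsigned long bit;
		const char *name;
	} tbl[] = {
		{ PF_PROT, "PROT" }, { PF_WRITE, "WRITE" }, { PF_USER, "USER" },
		{ PF_RSVD, "RSVD" }, { PF_INSTR, "INSTR" }, { PF_PK, "PK" },
	};
	size_t used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		int n;

		if (!(err & tbl[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", tbl[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of space, keep what fits */
		used += (size_t)n;
	}
	return buf;
}
```

With this, a value such as 0x5 decodes to PROT|USER, which makes the "odd" PROT bit described in the commit message easy to spot in the selftest output.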
[PATCH v2 5/8] selftests/x86/vsyscall: Verify that vsyscall=none blocks execution
If vsyscall=none accidentally still allowed vsyscalls, the test wouldn't fail. Fix it. Cc: Kees Cook Cc: Borislav Petkov Cc: Kernel Hardening Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Andy Lutomirski --- tools/testing/selftests/x86/test_vsyscall.c | 76 ++--- 1 file changed, 52 insertions(+), 24 deletions(-) diff --git a/tools/testing/selftests/x86/test_vsyscall.c b/tools/testing/selftests/x86/test_vsyscall.c index 4c9a8d76dba0..34a1d35995ef 100644 --- a/tools/testing/selftests/x86/test_vsyscall.c +++ b/tools/testing/selftests/x86/test_vsyscall.c @@ -49,21 +49,21 @@ static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), } /* vsyscalls and vDSO */ -bool should_read_vsyscall = false; +bool vsyscall_map_r = false, vsyscall_map_x = false; typedef long (*gtod_t)(struct timeval *tv, struct timezone *tz); -gtod_t vgtod = (gtod_t)VSYS(0xff60); +const gtod_t vgtod = (gtod_t)VSYS(0xff60); gtod_t vdso_gtod; typedef int (*vgettime_t)(clockid_t, struct timespec *); vgettime_t vdso_gettime; typedef long (*time_func_t)(time_t *t); -time_func_t vtime = (time_func_t)VSYS(0xff600400); +const time_func_t vtime = (time_func_t)VSYS(0xff600400); time_func_t vdso_time; typedef long (*getcpu_t)(unsigned *, unsigned *, void *); -getcpu_t vgetcpu = (getcpu_t)VSYS(0xff600800); +const getcpu_t vgetcpu = (getcpu_t)VSYS(0xff600800); getcpu_t vdso_getcpu; static void init_vdso(void) @@ -107,7 +107,7 @@ static int init_vsys(void) maps = fopen("/proc/self/maps", "r"); if (!maps) { printf("[WARN]\tCould not open /proc/self/maps -- assuming vsyscall is r-x\n"); - should_read_vsyscall = true; + vsyscall_map_r = true; return 0; } @@ -133,12 +133,8 @@ static int init_vsys(void) } printf("\tvsyscall permissions are %c-%c\n", r, x); - should_read_vsyscall = (r == 'r'); - if (x != 'x') { - vgtod = NULL; - vtime = NULL; - vgetcpu = NULL; - } + vsyscall_map_r = (r == 'r'); + vsyscall_map_x = (x == 'x'); found = true; break; @@ -148,10 +144,8 @@ static int init_vsys(void) if 
(!found) { printf("\tno vsyscall map in /proc/self/maps\n"); - should_read_vsyscall = false; - vgtod = NULL; - vtime = NULL; - vgetcpu = NULL; + vsyscall_map_r = false; + vsyscall_map_x = false; } return nerrs; @@ -242,7 +236,7 @@ static int test_gtod(void) err(1, "syscall gettimeofday"); if (vdso_gtod) ret_vdso = vdso_gtod(&tv_vdso, &tz_vdso); - if (vgtod) + if (vsyscall_map_x) ret_vsys = vgtod(&tv_vsys, &tz_vsys); if (sys_gtod(&tv_sys2, &tz_sys) != 0) err(1, "syscall gettimeofday"); @@ -256,7 +250,7 @@ static int test_gtod(void) } } - if (vgtod) { + if (vsyscall_map_x) { if (ret_vsys == 0) { nerrs += check_gtod(&tv_sys1, &tv_sys2, &tz_sys, "vsyscall", &tv_vsys, &tz_vsys); } else { @@ -277,7 +271,7 @@ static int test_time(void) { t_sys1 = sys_time(&t2_sys1); if (vdso_time) t_vdso = vdso_time(&t2_vdso); - if (vtime) + if (vsyscall_map_x) t_vsys = vtime(&t2_vsys); t_sys2 = sys_time(&t2_sys2); if (t_sys1 < 0 || t_sys1 != t2_sys1 || t_sys2 < 0 || t_sys2 != t2_sys2) { @@ -298,7 +292,7 @@ static int test_time(void) { } } - if (vtime) { + if (vsyscall_map_x) { if (t_vsys < 0 || t_vsys != t2_vsys) { printf("[FAIL]\tvsyscall failed (ret:%ld output:%ld)\n", t_vsys, t2_vsys); nerrs++; @@ -334,7 +328,7 @@ static int test_getcpu(int cpu) ret_sys = sys_getcpu(&cpu_sys, &node_sys, 0); if (vdso_getcpu) ret_vdso = vdso_getcpu(&cpu_vdso, &node_vdso, 0); - if (vgetcpu) + if (vsyscall_map_x) ret_vsys = vgetcpu(&cpu_vsys, &node_vsys, 0); if (ret_sys == 0) { @@ -373,7 +367,7 @@ static int test_getcpu(int cpu) } } - if (vgetcpu) { + if (vsyscall_map_x) { if (ret_vsys) { printf("[FAIL]\tvsyscall getcpu() failed\n"); nerrs++; @@ -414,10 +408,10 @@ static int test_vsys_r(void) can_read = false; } - if (can_read && !should_read_vsyscall) { + if (can_read && !vsyscall_map_r) { printf("[FAIL]\tWe have read access, but we shouldn't\n"); return 1; - } else i
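The patch above replaces the NULL-ed function pointers with two flags, vsyscall_map_r and vsyscall_map_x, taken from the permission column of /proc/self/maps. As a rough sketch of that parsing step in isolation (this is not the selftest's actual code; the function name `parse_vsyscall_line` is hypothetical):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: inspect one line of /proc/self/maps.
 * Returns true only if the line describes the [vsyscall] mapping, and then
 * stores whether the mapping is readable and executable in *r and *x.
 */
bool parse_vsyscall_line(const char *line, bool *r, bool *x)
{
	char perms[5] = "";
	char name[64] = "";

	/* Fields: start-end perms offset dev inode [name] */
	if (sscanf(line, "%*s %4s %*s %*s %*s %63s", perms, name) < 2)
		return false;	/* anonymous mapping or malformed line */
	if (strcmp(name, "[vsyscall]") != 0)
		return false;

	*r = (perms[0] == 'r');
	*x = (perms[2] == 'x');
	return true;
}
```

Keeping the two permissions as separate flags is what lets the test call the vsyscall even when it is expected to fail, instead of silently skipping the call the way the old NULL-pointer scheme did.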
[PATCH v2 1/8] x86/vsyscall: Remove the vsyscall=native documentation
The vsyscall=native feature is gone -- remove the docs. Fixes: 076ca272a14c ("x86/vsyscall/64: Drop "native" vsyscalls") Cc: sta...@vger.kernel.org Cc: Kees Cook Cc: Borislav Petkov Cc: Kernel Hardening Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Andy Lutomirski --- Documentation/admin-guide/kernel-parameters.txt | 6 -- 1 file changed, 6 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 138f6664b2e2..0082d1e56999 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5102,12 +5102,6 @@ emulate [default] Vsyscalls turn into traps and are emulated reasonably safely. - native Vsyscalls are native syscall instructions. - This is a little bit faster than trapping - and makes a few dynamic recompilers work - better than they would in emulation mode. - It also makes exploits much easier to write. - none Vsyscalls don't work at all. This makes them quite hard to use for exploits but might break your system. -- 2.21.0
[PATCH v2 0/8] vsyscall xonly mode
Hi all- This adds a new "xonly" mode for vsyscalls and makes it the default. xonly is a bit more secure -- Kees knows about an exploit that relied on read access to the vsyscall page. It's also nicer from a paging perspective, as it doesn't require user access to any of the kernel address space as far as the CPU is concerned. This would, for example, allow a much simpler implementation of per-process vsyscall disabling. I will follow up with two non-x86 changes that are related but have no dependencies. Changes from v1: - Minor cleanups (Kees) - Add a searchable message when a vsyscall read is denied (Kees) - The test case is vastly improved - Get rid of the extra gate vma object - Add the __ro_after_init patch Andy Lutomirski (8): x86/vsyscall: Remove the vsyscall=native documentation x86/vsyscall: Add a new vsyscall=xonly mode x86/vsyscall: Show something useful on a read fault x86/vsyscall: Document odd SIGSEGV error code for vsyscalls selftests/x86/vsyscall: Verify that vsyscall=none blocks execution x86/vsyscall: Change the default vsyscall mode to xonly x86/vsyscall: Add __ro_after_init to global variables selftests/x86: Add a test for process_vm_readv() on the vsyscall page .../admin-guide/kernel-parameters.txt | 11 +- arch/x86/Kconfig | 35 +++-- arch/x86/entry/vsyscall/vsyscall_64.c | 37 +- arch/x86/include/asm/vsyscall.h | 6 +- arch/x86/mm/fault.c | 18 ++- tools/testing/selftests/x86/test_vsyscall.c | 120 ++ 6 files changed, 174 insertions(+), 53 deletions(-) -- 2.21.0
[PATCH v2 7/8] x86/vsyscall: Add __ro_after_init to global variables
The vDSO is only configurable by command-line options, so make its global variables __ro_after_init. This seems highly unlikely to ever stop an exploit, but I think it's nice anyway. Cc: Kees Cook Cc: Borislav Petkov Cc: Kernel Hardening Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Andy Lutomirski --- arch/x86/entry/vsyscall/vsyscall_64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index 9c58ab807aeb..07003f3f1bfc 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -42,7 +42,7 @@ #define CREATE_TRACE_POINTS #include "vsyscall_trace.h" -static enum { EMULATE, XONLY, NONE } vsyscall_mode = +static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init = #ifdef CONFIG_LEGACY_VSYSCALL_NONE NONE; #elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) @@ -305,7 +305,7 @@ static const char *gate_vma_name(struct vm_area_struct *vma) static const struct vm_operations_struct gate_vma_ops = { .name = gate_vma_name, }; -static struct vm_area_struct gate_vma = { +static struct vm_area_struct gate_vma __ro_after_init = { .vm_start = VSYSCALL_ADDR, .vm_end = VSYSCALL_ADDR + PAGE_SIZE, .vm_page_prot = PAGE_READONLY_EXEC, -- 2.21.0
[PATCH v2 8/8] selftests/x86: Add a test for process_vm_readv() on the vsyscall page
get_gate_page() is a piece of somewhat alarming code to make get_user_pages() work on the vsyscall page. Test it via process_vm_readv(). Cc: Kees Cook Cc: Borislav Petkov Cc: Kernel Hardening Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Andy Lutomirski --- tools/testing/selftests/x86/test_vsyscall.c | 35 + 1 file changed, 35 insertions(+) diff --git a/tools/testing/selftests/x86/test_vsyscall.c b/tools/testing/selftests/x86/test_vsyscall.c index 34a1d35995ef..4602326b8f5b 100644 --- a/tools/testing/selftests/x86/test_vsyscall.c +++ b/tools/testing/selftests/x86/test_vsyscall.c @@ -18,6 +18,7 @@ #include #include #include +#include #ifdef __x86_64__ # define VSYS(x) (x) @@ -459,6 +460,38 @@ static int test_vsys_x(void) return 0; } +static int test_process_vm_readv(void) +{ +#ifdef __x86_64__ + char buf[4096]; + struct iovec local, remote; + int ret; + + printf("[RUN]\tprocess_vm_readv() from vsyscall page\n"); + + local.iov_base = buf; + local.iov_len = 4096; + remote.iov_base = (void *)0xff60; + remote.iov_len = 4096; + ret = process_vm_readv(getpid(), &local, 1, &remote, 1, 0); + if (ret != 4096) { + printf("[OK]\tprocess_vm_readv() failed (ret = %d, errno = %d)\n", ret, errno); + return 0; + } + + if (vsyscall_map_r) { + if (!memcmp(buf, (const void *)0xff60, 4096)) { + printf("[OK]\tIt worked and read correct data\n"); + } else { + printf("[FAIL]\tIt worked but returned incorrect data\n"); + return 1; + } + } +#endif + + return 0; +} + #ifdef __x86_64__ #define X86_EFLAGS_TF (1UL << 8) static volatile sig_atomic_t num_vsyscall_traps; @@ -533,6 +566,8 @@ int main(int argc, char **argv) nerrs += test_vsys_r(); nerrs += test_vsys_x(); + nerrs += test_process_vm_readv(); + #ifdef __x86_64__ nerrs += test_emulation(); #endif -- 2.21.0
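To show the call shape of process_vm_readv() separately from the vsyscall specifics, here is a small sketch that reads from an ordinary buffer in the calling process. The helper name `read_own_memory` is invented for this example; the call may fail with -1 in restricted environments (for instance under a seccomp filter), which is why the return value has to be checked, as the selftest does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* process_vm_readv() is a GNU/Linux extension */
#endif
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes from src to dst, both in the calling
 * process, by going through process_vm_readv() instead of memcpy().
 * Returns the number of bytes read, or -1 with errno set.
 */
ssize_t read_own_memory(void *dst, const void *src, size_t len)
{
	struct iovec local = { .iov_base = dst, .iov_len = len };
	struct iovec remote = { .iov_base = (void *)src, .iov_len = len };

	return process_vm_readv(getpid(), &local, 1, &remote, 1, 0);
}
```

The selftest uses the same pattern with the vsyscall page as the remote address, so the read goes through get_user_pages() rather than a direct user-mode load.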
[PATCH v2 6/8] x86/vsyscall: Change the default vsyscall mode to xonly
The use case for full emulation over xonly is very esoteric. Let's change the default to the safer xonly mode. Cc: Kees Cook Cc: Borislav Petkov Cc: Kernel Hardening Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Andy Lutomirski --- arch/x86/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 0182d2c67590..32028edc1b0e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2285,7 +2285,7 @@ config COMPAT_VDSO choice prompt "vsyscall table for legacy applications" depends on X86_64 - default LEGACY_VSYSCALL_EMULATE + default LEGACY_VSYSCALL_XONLY help Legacy user code that does not know how to find the vDSO expects to be able to issue three syscalls by calling fixed addresses in -- 2.21.0
[RESEND PATCHv4 0/1] coresight: Do not default to CPU0 for missing CPU phandle
In case of missing CPU phandle, the affinity is set default to CPU0 which is not a correct assumption. Fix this in coresight platform to set affinity to invalid and abort the probe in drivers. Also update the dt-bindings accordingly. Resent with Reviewed tag by Suzuki. v4: * Fix return for !CONFIG_ACPI and !CONFIG_OF. v3: * Addressed review comments from Suzuki and updated acpi_coresight_get_cpu. * Removed patch 2 which had invalid check for online cpus. v2: * Addressed review comments from Suzuki and Mathieu. * Allows the probe of etm and cpu-debug to abort earlier in case of unavailability of respective cpus. Sai Prakash Ranjan (1): coresight: Do not default to CPU0 for missing CPU phandle .../bindings/arm/coresight-cpu-debug.txt | 4 ++-- .../devicetree/bindings/arm/coresight.txt | 8 +--- .../hwtracing/coresight/coresight-cpu-debug.c | 3 +++ drivers/hwtracing/coresight/coresight-etm3x.c | 3 +++ drivers/hwtracing/coresight/coresight-etm4x.c | 3 +++ .../hwtracing/coresight/coresight-platform.c | 20 +-- 6 files changed, 26 insertions(+), 15 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[RESEND PATCHv4 1/1] coresight: Do not default to CPU0 for missing CPU phandle
Coresight platform support assumes that a missing "cpu" phandle defaults to CPU0. This could be problematic and unnecessarily binds components to CPU0, where they may not be. Let us make the DT binding rules a bit stricter by not defaulting to CPU0 for missing "cpu" affinity information. Also in coresight etm and cpu-debug drivers, abort the probe for such cases. Signed-off-by: Sai Prakash Ranjan Reviewed-by: Suzuki K Poulose --- .../bindings/arm/coresight-cpu-debug.txt | 4 ++-- .../devicetree/bindings/arm/coresight.txt | 8 +--- .../hwtracing/coresight/coresight-cpu-debug.c | 3 +++ drivers/hwtracing/coresight/coresight-etm3x.c | 3 +++ drivers/hwtracing/coresight/coresight-etm4x.c | 3 +++ .../hwtracing/coresight/coresight-platform.c | 20 +-- 6 files changed, 26 insertions(+), 15 deletions(-) diff --git a/Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt b/Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt index 298291211ea4..f1de3247c1b7 100644 --- a/Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt +++ b/Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt @@ -26,8 +26,8 @@ Required properties: processor core is clocked by the internal CPU clock, so it is enabled with CPU clock by default. -- cpu : the CPU phandle the debug module is affined to. When omitted - the module is considered to belong to CPU0. +- cpu : the CPU phandle the debug module is affined to. Do not assume it +to default to CPU0 if omitted. Optional properties: diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt index 8a88ddebc1a2..fcc3bacfd8bc 100644 --- a/Documentation/devicetree/bindings/arm/coresight.txt +++ b/Documentation/devicetree/bindings/arm/coresight.txt @@ -59,6 +59,11 @@ its hardware characteristcs. * port or ports: see "Graph bindings for Coresight" below. 
+* Additional required property for Embedded Trace Macrocell (version 3.x and + version 4.x): + * cpu: the cpu phandle this ETM/PTM is affined to. Do not + assume it to default to CPU0 if omitted. + * Additional required properties for System Trace Macrocells (STM): * reg: along with the physical base address and length of the register set as described above, another entry is required to describe the @@ -87,9 +92,6 @@ its hardware characteristcs. * arm,cp14: must be present if the system accesses ETM/PTM management registers via co-processor 14. - * cpu: the cpu phandle this ETM/PTM is affined to. When omitted the - source is considered to belong to CPU0. - * Optional property for TMC: * arm,buffer-size: size of contiguous buffer space for TMC ETR diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c index 07a1367c733f..58bfd6319f65 100644 --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c @@ -579,6 +579,9 @@ static int debug_probe(struct amba_device *adev, const struct amba_id *id) return -ENOMEM; drvdata->cpu = coresight_get_cpu(dev); + if (drvdata->cpu < 0) + return drvdata->cpu; + if (per_cpu(debug_drvdata, drvdata->cpu)) { dev_err(dev, "CPU%d drvdata has already been initialized\n", drvdata->cpu); diff --git a/drivers/hwtracing/coresight/coresight-etm3x.c b/drivers/hwtracing/coresight/coresight-etm3x.c index 225c2982e4fe..e2cb6873c3f2 100644 --- a/drivers/hwtracing/coresight/coresight-etm3x.c +++ b/drivers/hwtracing/coresight/coresight-etm3x.c @@ -816,6 +816,9 @@ static int etm_probe(struct amba_device *adev, const struct amba_id *id) } drvdata->cpu = coresight_get_cpu(dev); + if (drvdata->cpu < 0) + return drvdata->cpu; + desc.name = devm_kasprintf(dev, GFP_KERNEL, "etm%d", drvdata->cpu); if (!desc.name) return -ENOMEM; diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 
7fe266194ab5..7bcac8896fc1 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -1101,6 +1101,9 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id) spin_lock_init(&drvdata->spinlock); drvdata->cpu = coresight_get_cpu(dev); + if (drvdata->cpu < 0) + return drvdata->cpu; + desc.name = devm_kasprintf(dev, GFP_KERNEL, "etm%d", drvdata->cpu); if (!desc.name) return -ENOMEM; diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c index 3c5ceda8db24..cf580ffbc27c 100644 --- a/drivers/hwtracing/coresight/coresight-platform.c +++ b/drivers/hwtracing/cores
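The behavioural change in this patch can be summarised with a small userspace mock. Everything here is a stand-in: `struct fake_node`, `get_cpu_old`, `get_cpu_new` and `fake_probe` are hypothetical names, and the choice of -ENODEV as the error value is an assumption, since the coresight-platform.c hunk is cut off above.

```c
#include <errno.h>

/* Hypothetical stand-in for a firmware node; cpu < 0 means "no cpu property". */
struct fake_node {
	int cpu;
};

/* Old behaviour: a missing "cpu" property silently maps to CPU0. */
int get_cpu_old(const struct fake_node *node)
{
	return node->cpu < 0 ? 0 : node->cpu;
}

/* New behaviour: a missing "cpu" property is reported as an error. */
int get_cpu_new(const struct fake_node *node)
{
	return node->cpu < 0 ? -ENODEV : node->cpu;	/* error code assumed */
}

/* Probe sketch: abort early instead of binding the component to CPU0. */
int fake_probe(const struct fake_node *node)
{
	int cpu = get_cpu_new(node);

	if (cpu < 0)
		return cpu;	/* mirrors "if (drvdata->cpu < 0) return drvdata->cpu;" */

	/* ... per-CPU setup for 'cpu' would go here ... */
	return 0;
}
```

The point of the stricter contract is that a negative return value cannot be confused with a valid CPU number, so each driver's probe can bail out with the same two-line check.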
Re: [PATCH] powerpc/64s/radix: Define arch_ioremap_p4d_supported()
On 06/26/2019 01:21 PM, Anshuman Khandual wrote: Recent core ioremap changes require HAVE_ARCH_HUGE_VMAP subscribing archs provide arch_ioremap_p4d_supported() failing which will result in a build failure like the following. ld: lib/ioremap.o: in function `.ioremap_huge_init': ioremap.c:(.init.text+0x3c): undefined reference to `.arch_ioremap_p4d_supported' This defines a stub implementation for arch_ioremap_p4d_supported() keeping it disabled for now to fix the build problem. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Aneesh Kumar K.V" Cc: Nicholas Piggin Cc: Andrew Morton Cc: Stephen Rothwell Cc: linuxppc-...@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Cc: linux-n...@vger.kernel.org Signed-off-by: Anshuman Khandual Add a Fixes: tag ? For instance: Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP") Christophe --- This has been just build tested and fixes the problem reported earlier. arch/powerpc/mm/book3s64/radix_pgtable.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 8904aa1..c81da88 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -1124,6 +1124,11 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma, set_pte_at(mm, addr, ptep, pte); } +int __init arch_ioremap_p4d_supported(void) +{ + return 0; +} + int __init arch_ioremap_pud_supported(void) { /* HPT does not cope with large pages in the vmalloc area */
Re: [PATCH v3 2/4] objtool: Add support for C jump tables
On Wed, Jun 26, 2019 at 8:56 PM Josh Poimboeuf wrote: > > The last patch was based weird, this one's based on upstream. Will test > tomorrow. Great. Once it passes your tests I'll be happy to test it on my side.
Re: [PATCH 08/16] nfsd: escape high characters in binary data
On Wed, Jun 26, 2019 at 12:21:49PM -0400, J. Bruce Fields wrote: > On Mon, Jun 24, 2019 at 05:05:12PM -0400, J. Bruce Fields wrote: > > On Sat, Jun 22, 2019 at 01:22:56PM -0700, Kees Cook wrote: > > > On Sat, Jun 22, 2019 at 03:00:58PM -0400, J. Bruce Fields wrote: > > > > The logic around ESCAPE_NP and the "only" string is really confusing. I > > > > started assuming I could just add an ESCAPE_NONASCII flag and stick " > > > > and \ into the "only" string, but it doesn't work that way. > > > > > > Yeah, if ESCAPE_NP isn't specified, the "only" characters are passed > > > through. It'd be nice to have an "add" or a clearer way to do actual > > > ctype subsets, etc. If there isn't an obviously clear way to refactor > > > it, just skip it for now and I'm happy to ack your original patch. :) > > > > There may well be some simplification possible here There aren't > > really many users of "only", for example. I'll look into it some more. > > The printk users are kind of mysterious to me. I did a grep for > > git grep '%[0-9.*]pE' > > which got 75 hits. All of them for pE. I couldn't find any of the > other pE[achnops] variants. pE is equivalent to ESCAPE_ANY|ESCAPE_NP. I saw pEn and pEhp and pEp: drivers/staging/rtl8192e/rtllib.h: snprintf(escaped, sizeof(escaped), "%*pEn", essid_len, essid); drivers/staging/rtl8192u/ieee80211/ieee80211.h: snprintf(escaped, sizeof(escaped), "%*pEn", essid_len, essid); drivers/staging/wlan-ng/prism2sta.c: netdev_info(wlandev->netdev, "Prism2 card SN: %*pEhp\n", drivers/thunderbolt/xdomain.c: return sprintf(buf, "%*pEp\n", (int)strlen(svc->key), svc->key); However, every use was insufficient, AFAICT. This: git grep -2 '\bescape_essid\b' Shows that all the staging uses end up getting logged as: '%s' so their escaping is insufficient. > Confusingly, ESCAPE_NP doesn't mean "escape non-printable", it means > "don't escape printable". So things like carriage returns aren't > escaped. 
Right -- and they're almost all logged surrounded by ' or " which means those would need to be escaped as well. The prism2 is leaking newlines too, as well as the thunderbolt sysfs printing. So... seems like we should fix this. :P > Of those 57 were in drivers/net/wireless, and from a quick check seemed > mostly to be for SSIDs in debug messages. I *think* SSIDs can be > arbitrary bytes? If they really want them escaped then I suspect they > want more than just nonprintable characters escaped. > > One of the hits outside wireless code was in drm_dp_cec_adap_status, > which was printing some device ID into a debugfs file with "ID: %*pE\n". > If the ID actually needs escaping, then I suspect the meant to escape \n > too to prevent misparsing that output. I think we need to make the default produce "loggable" output. non-ascii, non-printables, \, ', and " need to be escaped. Maybe " " too? -- Kees Cook
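As a rough illustration of what "loggable" output could mean, here is a userspace sketch that escapes backslash, both quote characters, non-printable bytes and non-ASCII bytes. It is not the kernel's string_escape_mem() and does not model the %pE flags; the function name `escape_loggable` and the choice of \xHH for everything non-printable are assumptions made for this example. Space is passed through unchanged, which is the open question at the end of the mail.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: escape src[0..len) into dst so that the result is safe
 * to print inside '...' or "..." in a log line.
 * Returns the length the full result needs (excluding the trailing NUL), like
 * snprintf(); output is truncated and NUL-terminated if dst is too small.
 */
size_t escape_loggable(const unsigned char *src, size_t len,
		       char *dst, size_t dst_size)
{
	size_t out = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = src[i];
		char tmp[5];	/* longest escape is "\xHH" plus NUL */
		int n;

		if (c == '\\' || c == '\'' || c == '"')
			n = snprintf(tmp, sizeof(tmp), "\\%c", c);
		else if (c >= 0x20 && c < 0x7f)
			n = snprintf(tmp, sizeof(tmp), "%c", c);
		else	/* control characters, DEL and non-ASCII bytes */
			n = snprintf(tmp, sizeof(tmp), "\\x%02x", c);

		for (int j = 0; j < n; j++, out++)
			if (out + 1 < dst_size)
				dst[out] = tmp[j];
	}
	if (dst_size)
		dst[out < dst_size ? out : dst_size - 1] = '\0';
	return out;
}
```

With this scheme an SSID containing a quote and a newline can no longer close the surrounding quotes or start a fake log line, which is the failure mode described for the staging and thunderbolt users.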
linux-next: manual merge of the mlx5-next tree with the net-next tree
Hi all, Today's linux-next merge of the mlx5-next tree got a conflict in: drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c between commits: 955858009708 ("net/mlx5e: Fix number of vports for ingress ACL configuration") d4a18e16c570 ("net/mlx5e: Enable setting multiple match criteria for flow group") from the net-next tree and commits: 7445cfb1169c ("net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs") c01cfd0f1115 ("net/mlx5: E-Switch, Add match on vport metadata for rule in fast path") from the mlx5-next tree. I fixed it up (I basically used the latter versions) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
Re: [RFC PATCH v3 1/4] vfio: Define device specific irq type capability
On Thu, 27 Jun 2019 11:37:59 +0800 Tina Zhang wrote: > Cap the number of irqs with fixed indexes and use capability chains > to chain device specific irqs. > > Signed-off-by: Tina Zhang > --- > include/uapi/linux/vfio.h | 19 ++- > 1 file changed, 18 insertions(+), 1 deletion(-) > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 02bb7ad6e986..600784acc4ac 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -444,11 +444,27 @@ struct vfio_irq_info { > #define VFIO_IRQ_INFO_MASKABLE (1 << 1) > #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2) > #define VFIO_IRQ_INFO_NORESIZE (1 << 3) > +#define VFIO_IRQ_INFO_FLAG_CAPS (1 << 4) /* Info supports caps > */ > __u32 index; /* IRQ index */ > + __u32 cap_offset; /* Offset within info struct of first cap */ > __u32 count; /* Number of IRQs within this index */ > }; This cannot be inserted into the middle of the structure, it breaks compatibility with all existing userspace binaries. It must be added to the end of the structure. > #define VFIO_DEVICE_GET_IRQ_INFO _IO(VFIO_TYPE, VFIO_BASE + 9) > > +/* > + * The irq type capability allows irqs unique to a specific device or > + * class of devices to be exposed. > + * > + * The structures below define version 1 of this capability. > + */ > +#define VFIO_IRQ_INFO_CAP_TYPE 3 > + > +struct vfio_irq_info_cap_type { > + struct vfio_info_cap_header header; > + __u32 type; /* global per bus driver */ > + __u32 subtype; /* type specific */ > +}; > + > /** > * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 10, struct > vfio_irq_set) > * > @@ -550,7 +566,8 @@ enum { > VFIO_PCI_MSIX_IRQ_INDEX, > VFIO_PCI_ERR_IRQ_INDEX, > VFIO_PCI_REQ_IRQ_INDEX, > - VFIO_PCI_NUM_IRQS > + VFIO_PCI_NUM_IRQS = 5 /* Fixed user ABI, IRQ indexes >=5 use */ > + /* device specific cap to define content */ > }; > > /*
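The compatibility problem can be made concrete with offsetof(). The sketch below uses stand-in structs rather than the real uapi header: the names `irq_info_v1`, `irq_info_mid` and `irq_info_end` are hypothetical, and the leading argsz and flags fields are filled in from the existing struct vfio_irq_info layout, which the quoted hunk only partly shows.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout that existing userspace binaries were compiled against. */
struct irq_info_v1 {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t count;
};

/* Proposed layout: cap_offset inserted before count. */
struct irq_info_mid {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t cap_offset;
	uint32_t count;
};

/* Suggested layout: cap_offset appended after count. */
struct irq_info_end {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t count;
	uint32_t cap_offset;
};

/* Old binaries read count at byte 12; the mid-struct insert moves it to 16. */
_Static_assert(offsetof(struct irq_info_v1, count) == 12, "v1 layout");
_Static_assert(offsetof(struct irq_info_mid, count) == 16, "count moved");
_Static_assert(offsetof(struct irq_info_end, count) ==
	       offsetof(struct irq_info_v1, count), "count unchanged");

/* Returns nonzero if a layout change moved 'count' relative to v1. */
int count_moved_mid(void)
{
	return offsetof(struct irq_info_mid, count) !=
	       offsetof(struct irq_info_v1, count);
}

int count_moved_end(void)
{
	return offsetof(struct irq_info_end, count) !=
	       offsetof(struct irq_info_v1, count);
}
```

An old binary built against the first layout would read the new cap_offset value where it expects count, whereas with the appended field it keeps working and simply never looks at the extra bytes.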
Re: [PATCH 1/8] dt-bindings: pinctrl: aspeed: Split bindings document in two
On Thu, 27 Jun 2019 at 04:02, Andrew Jeffery wrote: > > > > On Thu, 27 Jun 2019, at 13:02, Joel Stanley wrote: > > On Wed, 26 Jun 2019 at 07:15, Andrew Jeffery wrote: > > > > > > Have one for each of the AST2400 and AST2500. The only thing that was > > > common was the fact that both support ASPEED BMC SoCs. > > > > > > Signed-off-by: Andrew Jeffery > > > --- > > > .../pinctrl/aspeed,ast2400-pinctrl.txt| 80 +++ > > > ...-aspeed.txt => aspeed,ast2500-pinctrl.txt} | 63 ++- > > > 2 files changed, 85 insertions(+), 58 deletions(-) > > > create mode 100644 > > > Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.txt > > > rename Documentation/devicetree/bindings/pinctrl/{pinctrl-aspeed.txt => > > > aspeed,ast2500-pinctrl.txt} (66%) > > > > > > diff --git > > > a/Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.txt > > > b/Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.txt > > > new file mode 100644 > > > index ..67e0325ccf2e > > > --- /dev/null > > > +++ b/Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.txt > > > @@ -0,0 +1,80 @@ > > > += > > > +Aspeed AST2400 Pin Controller > > > += > > > + > > > +Required properties for the AST2400: > > > +- compatible : Should be one of the following: > > > + "aspeed,ast2400-pinctrl" > > > + "aspeed,g4-pinctrl" > > > + > > > +The pin controller node should be the child of a syscon node with the > > > required > > > +property: > > > + > > > +- compatible : Should be one of the following: > > > + "aspeed,ast2400-scu", "syscon", "simple-mfd" > > > + "aspeed,g4-scu", "syscon", "simple-mfd" > > > > I think we can use this as an opportunity to drop the unused g4-scu > > compatible from the bindings. Similarly for the g5. > > I Wonder if we should eradicate that pattern for all the aspeed compatibles? Yes. We've settled on ast2x00,aspeed- for most of them. If you're aware of others we should remove them from the bindings. 
I think we've stopped any new users of the gx style from getting merged.
Re: [PATCH] mm: vmscan: fix not scanning anonymous pages when detecting file refaults
Could we please get some review of this one? Johannes, it supposedly fixes your patch? I added cc:stable to this. Agreeable?

From: Kuo-Hsin Yang
Subject: mm: vmscan: fix not scanning anonymous pages when detecting file refaults

When file refaults are detected and there are many inactive file pages, the system never reclaims anonymous pages. The file pages are dropped aggressively while there are still a lot of cold anonymous pages, and the system thrashes. This issue impacts the performance of applications with large executables, e.g. chrome.

When file refaults are detected, inactive_list_is_low() may return different values depending on the actual_reclaim parameter, so the following 2 conditions could be satisfied at the same time.

1) inactive_list_is_low() returns false in get_scan_count() to trigger scanning file lists only.
2) inactive_list_is_low() returns true in shrink_list() to allow scanning the active file list.

In that case vmscan would only scan file lists, and as the active file list is also scanned, inactive_list_is_low() may keep returning false in get_scan_count() until the file cache is very low.

Before 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition"), inactive_list_is_low() never returned different values in get_scan_count() and shrink_list() in one shrink_node_memcg() run. The original design should be that when inactive_list_is_low() returns false for file lists, vmscan only scans the inactive file list. As only the inactive file list is scanned, inactive_list_is_low() would soon return true.

This patch makes the return value of inactive_list_is_low() independent of actual_reclaim.

The problem can be reproduced by the following test program.
---8<---
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Create the backing file unless one of sufficient size already exists. */
void fallocate_file(const char *filename, off_t size)
{
	struct stat st;
	int fd;

	if (!stat(filename, &st) && st.st_size >= size)
		return;

	fd = open(filename, O_WRONLY | O_CREAT, 0600);
	if (fd < 0) {
		perror("create file");
		exit(1);
	}
	if (posix_fallocate(fd, 0, size)) {
		perror("fallocate");
		exit(1);
	}
	close(fd);
}

/* Allocate and touch anonymous memory so that it is actually populated. */
long *alloc_anon(long size)
{
	long *start = malloc(size);
	memset(start, 1, size);
	return start;
}

long access_file(const char *filename, long size, long rounds)
{
	int fd, i;
	volatile char *start1, *end1, *start2;
	const int page_size = getpagesize();
	long sum = 0;

	fd = open(filename, O_RDONLY);
	if (fd == -1) {
		perror("open");
		exit(1);
	}

	/*
	 * Some applications, e.g. chrome, use a lot of executable file
	 * pages, map some of the pages with PROT_EXEC flag to simulate
	 * the behavior.
	 */
	start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED,
		      fd, 0);
	if (start1 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	end1 = start1 + size / 2;

	start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2);
	if (start2 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	for (i = 0; i < rounds; ++i) {
		struct timeval before, after;
		volatile char *ptr1 = start1, *ptr2 = start2;

		gettimeofday(&before, NULL);
		/* Touch one byte per page in both mappings. */
		for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size)
			sum += *ptr1 + *ptr2;
		gettimeofday(&after, NULL);
		/* tv_usec is in microseconds, so scale by 1e6 to get seconds. */
		printf("File access time, round %d: %f (sec)\n", i,
		       (after.tv_sec - before.tv_sec) +
		       (after.tv_usec - before.tv_usec) / 1000000.0);
	}
	return sum;
}

int main(int argc, char *argv[])
{
	const long MB = 1024 * 1024;
	long anon_mb, file_mb, file_rounds;
	const char filename[] = "large";
	long *ret1;
	long ret2;

	if (argc != 4) {
		printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS\n");
		exit(0);
	}
	anon_mb = atoi(argv[1]);
	file_mb = atoi(argv[2]);
	file_rounds = atoi(argv[3]);

	fallocate_file(filename, file_mb * MB);
	printf("Allocate %ld MB anonymous pages\n", anon_mb);
	ret1 = alloc_anon(anon_mb * MB);
	printf("Access %ld MB file pages\n", file_mb);
	ret2 = access_file(filename, file_mb * MB, file_rounds);
	printf("Print result to prevent optimization: %ld\n",
	       *ret1 + ret2);
	return 0;
}
---8<---

Running the test program on a 2GB RAM VM with kernel 5.2.0-rc5, the program fills RAM with 2048 MB of memory and accesses a 200 MB file 10 times. Without this patch, the file cache is dropped aggressively and every access to the file is from disk.

$ ./thrash 2048 200 10
Allocate 2048 MB anonymous pages
Access 200 MB file pages
File
Re: [PATCH 1/8] dt-bindings: pinctrl: aspeed: Split bindings document in two
On Thu, 27 Jun 2019, at 13:02, Joel Stanley wrote: > On Wed, 26 Jun 2019 at 07:15, Andrew Jeffery wrote: > > > > Have one for each of the AST2400 and AST2500. The only thing that was > > common was the fact that both support ASPEED BMC SoCs. > > > > Signed-off-by: Andrew Jeffery > > --- > > .../pinctrl/aspeed,ast2400-pinctrl.txt| 80 +++ > > ...-aspeed.txt => aspeed,ast2500-pinctrl.txt} | 63 ++- > > 2 files changed, 85 insertions(+), 58 deletions(-) > > create mode 100644 > > Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.txt > > rename Documentation/devicetree/bindings/pinctrl/{pinctrl-aspeed.txt => > > aspeed,ast2500-pinctrl.txt} (66%) > > > > diff --git > > a/Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.txt > > b/Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.txt > > new file mode 100644 > > index ..67e0325ccf2e > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.txt > > @@ -0,0 +1,80 @@ > > += > > +Aspeed AST2400 Pin Controller > > += > > + > > +Required properties for the AST2400: > > +- compatible : Should be one of the following: > > + "aspeed,ast2400-pinctrl" > > + "aspeed,g4-pinctrl" > > + > > +The pin controller node should be the child of a syscon node with the > > required > > +property: > > + > > +- compatible : Should be one of the following: > > + "aspeed,ast2400-scu", "syscon", "simple-mfd" > > + "aspeed,g4-scu", "syscon", "simple-mfd" > > I think we can use this as an opportunity to drop the unused g4-scu > compatible from the bindings. Similarly for the g5. I Wonder if we should eradicate that pattern for all the aspeed compatibles? > > Acked-by: Joel Stanley Cheers, Andrew
Re: KASAN: use-after-free Write in xfrm_hash_rebuild
syzbot has found a reproducer for the following crash on: HEAD commit:249155c2 Merge branch 'parisc-5.2-4' of git://git.kernel.o.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=10f017c3a0 kernel config: https://syzkaller.appspot.com/x/.config?x=9a31528e58cc12e2 dashboard link: https://syzkaller.appspot.com/bug?extid=0165480d4ef07360eeda compiler: clang version 9.0.0 (/home/glider/llvm/clang 80fee25776c2fb61e74c1ecb1a523375c2500b69) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16cf37c3a0 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+0165480d4ef07360e...@syzkaller.appspotmail.com == BUG: KASAN: use-after-free in __write_once_size include/linux/compiler.h:221 [inline] BUG: KASAN: use-after-free in __hlist_del include/linux/list.h:748 [inline] BUG: KASAN: use-after-free in hlist_del_rcu include/linux/rculist.h:455 [inline] BUG: KASAN: use-after-free in xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318 Write of size 8 at addr 888095e79c00 by task kworker/1:3/8066 CPU: 1 PID: 8066 Comm: kworker/1:3 Not tainted 5.2.0-rc6+ #7 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events xfrm_hash_rebuild Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113 print_address_description+0x6d/0x310 mm/kasan/report.c:188 __kasan_report+0x14b/0x1c0 mm/kasan/report.c:317 kasan_report+0x26/0x50 mm/kasan/common.c:614 __asan_report_store8_noabort+0x17/0x20 mm/kasan/generic_report.c:137 __write_once_size include/linux/compiler.h:221 [inline] __hlist_del include/linux/list.h:748 [inline] hlist_del_rcu include/linux/rculist.h:455 [inline] xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318 process_one_work+0x814/0x1130 kernel/workqueue.c:2269 worker_thread+0xc01/0x1640 kernel/workqueue.c:2415 kthread+0x325/0x350 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 
Allocated by task 8064: save_stack mm/kasan/common.c:71 [inline] set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:489 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 __do_kmalloc mm/slab.c:3660 [inline] __kmalloc+0x23c/0x310 mm/slab.c:3669 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:742 [inline] xfrm_hash_alloc+0x38/0xe0 net/xfrm/xfrm_hash.c:21 xfrm_policy_init net/xfrm/xfrm_policy.c:4036 [inline] xfrm_net_init+0x269/0xd60 net/xfrm/xfrm_policy.c:4120 ops_init+0x336/0x420 net/core/net_namespace.c:130 setup_net+0x212/0x690 net/core/net_namespace.c:316 copy_net_ns+0x224/0x380 net/core/net_namespace.c:439 create_new_namespaces+0x4ec/0x700 kernel/nsproxy.c:103 unshare_nsproxy_namespaces+0x12a/0x190 kernel/nsproxy.c:202 ksys_unshare+0x540/0xac0 kernel/fork.c:2692 __do_sys_unshare kernel/fork.c:2760 [inline] __se_sys_unshare kernel/fork.c:2758 [inline] __x64_sys_unshare+0x38/0x40 kernel/fork.c:2758 do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 17: save_stack mm/kasan/common.c:71 [inline] set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x12a/0x1e0 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xae/0x120 mm/slab.c:3755 xfrm_hash_free+0x38/0xd0 net/xfrm/xfrm_hash.c:35 xfrm_bydst_resize net/xfrm/xfrm_policy.c:602 [inline] xfrm_hash_resize+0x13f1/0x1840 net/xfrm/xfrm_policy.c:680 process_one_work+0x814/0x1130 kernel/workqueue.c:2269 worker_thread+0xc01/0x1640 kernel/workqueue.c:2415 kthread+0x325/0x350 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 The buggy address belongs to the object at 888095e79c00 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [888095e79c00, 888095e79c40) The buggy address belongs to the page: page:ea0002579e40 refcount:1 mapcount:0 mapping:8880aa400340 index:0x0 
flags: 0x1fffc000200(slab) raw: 01fffc000200 ea0002540888 ea0002907548 8880aa400340 raw: 888095e79000 00010020 page dumped because: kasan: bad access detected Memory state around the buggy address: 888095e79b00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc 888095e79b80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc 888095e79c00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ 888095e79c80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc 888095e79d00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ==
Reminder: 3 open syzbot bugs in input subsystem
[This email was generated by a script. Let me know if you have any suggestions to make it better.] Of the currently open syzbot reports against the upstream kernel, I've manually marked 3 of them as possibly being bugs in the input subsystem. I've listed these reports below, sorted by an algorithm that tries to list first the reports most likely to be still valid, important, and actionable. Of these 3 bugs, 2 were seen in mainline in the last week. If you believe a bug is no longer valid, please close the syzbot report by sending a '#syz fix', '#syz dup', or '#syz invalid' command in reply to the original thread, as explained at https://goo.gl/tpsmEJ#status If you believe I misattributed a bug to the input subsystem, please let me know, and if possible forward the report to the correct people or mailing list. Here are the bugs: Title: WARNING in aiptek_open/usb_submit_urb Last occurred: 2 days ago Reported: 19 days ago Branches: Mainline (with usb-fuzzer patches) Dashboard link: https://syzkaller.appspot.com/bug?id=0e35393fd821f0570b2a1663a01ac7bdcd15046a Original thread: https://lkml.kernel.org/lkml/1abc1c058ab95...@google.com/T/#u This bug has a C reproducer. No one has replied to the original thread for this bug yet. This looks like a bug in an input USB driver. If you fix this bug, please add the following tag to the commit: Reported-by: syzbot+75cccf2b7da87fb6f...@syzkaller.appspotmail.com If you send any email or patch for this bug, please consider replying to the original thread. 
For the git send-email command to use, or tips on how to reply if the thread isn't in your mailbox, see the "Reply instructions" at https://lkml.kernel.org/r/1abc1c058ab95...@google.com Title: INFO: trying to register non-static key in usbtouch_reset_resume Last occurred: 6 days ago Reported: 30 days ago Branches: Mainline (with usb-fuzzer patches) Dashboard link: https://syzkaller.appspot.com/bug?id=64fd387d8358406dc0037511ee44db159f6f1605 Original thread: https://lkml.kernel.org/lkml/5463aa0589dcf...@google.com/T/#u This bug has a C reproducer. No one has replied to the original thread for this bug yet. This looks like a bug in an input USB driver. If you fix this bug, please add the following tag to the commit: Reported-by: syzbot+933daad9be4e67ba9...@syzkaller.appspotmail.com If you send any email or patch for this bug, please consider replying to the original thread. For the git send-email command to use, or tips on how to reply if the thread isn't in your mailbox, see the "Reply instructions" at https://lkml.kernel.org/r/5463aa0589dcf...@google.com Title: INFO: task hung in evdev_release Last occurred: 246 days ago Reported: 253 days ago Branches: Mainline and others Dashboard link: https://syzkaller.appspot.com/bug?id=ebbbff1dcac574b81f9fd5e07100a4879e5bf53d Original thread: https://lkml.kernel.org/lkml/f1be430578524...@google.com/T/#u This bug has a syzkaller reproducer only. syzbot has bisected this bug, but I think the bisection result is incorrect. The original thread for this bug received 1 reply, 86 days ago. If you fix this bug, please add the following tag to the commit: Reported-by: syzbot+a979743610b4755d4...@syzkaller.appspotmail.com If you send any email or patch for this bug, please consider replying to the original thread. For the git send-email command to use, or tips on how to reply if the thread isn't in your mailbox, see the "Reply instructions" at https://lkml.kernel.org/r/f1be430578524...@google.com
Re: [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask
On Mon, 27 May 2019 21:58:17 +0800 zhong jiang wrote: > On 2019/5/27 20:23, Vlastimil Babka wrote: > > On 5/25/19 8:28 PM, Andrew Morton wrote: > >> (Cc Vlastimil) > > Oh dear, 2 years and I forgot all the details about how this works. > > > >> On Sat, 25 May 2019 15:07:23 +0800 zhong jiang > >> wrote: > >> > >>> We bind a different node to each vma. Unfortunately, > >>> checking /proc/pid/numa_maps shows that different vmas > >>> end up bound to the same node. > >>> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when > >>> updating cpusets") > >>> introduced the issue. When we change the memory policy by setting > >>> cpuset.mems, > >>> a process rebinds the specified policy more than once. > >>> If cpuset_mems_allowed is not equal to the user-specified nodes, > >>> the issue triggers. > >>> This may result in out-of-memory, because memory is allocated from the same node. > > I have a hard time understanding what the problem is. Could you please > > write it as a (pseudo) reproducer? I.e. an example of the process/admin > > mempolicy/cpuset actions that have some wrong observed results vs the > > correct expected result. > Sorry, I don't have a testcase to reproduce the issue. It was first > discovered by > my colleague when configuring the XML to start a VM. To his surprise, the bind > mempolicy > doesn't work. So... what do we do with this patch? > Thanks, > zhong jiang > >>> --- a/mm/mempolicy.c > >>> +++ b/mm/mempolicy.c > >>> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy > >>> *pol, const nodemask_t *nodes) > >>> else { > >>> nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed, > >>> *nodes); > >>> - pol->w.cpuset_mems_allowed = tmp; > >>> + pol->w.cpuset_mems_allowed = *nodes; > > Looks like a mechanical error on my side when removing the code for > > step1+step2 rebinding. Before my commit there was > > > > pol->w.cpuset_mems_allowed = step ? tmp : *nodes; > > > > Since 'step' was removed and thus 0, I should have used *nodes indeed. > > Thanks for catching that. Was that an ack? > >>> } > >>> > >>> if (nodes_empty(tmp)) > >> hm, I'm not surprised the code broke. What the heck is going on in > >> there? It used to have a perfunctory comment, but Vlastimil deleted > >> it. > > Yeah the comment was specific for the case that was being removed. > > > >> Could someone please propose a comment for the above code block > >> explaining why we're doing what we do? > > I'll have to relearn this first... > > > > >
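To make the effect of the one-line fix concrete, here is a minimal user-space sketch, not kernel code. It assumes a toy 64-bit nodemask, a remap() helper that mimics the documented bitmap_remap() behaviour (the n-th set bit of the old mask maps to the n-th set bit of the new mask, modulo the new mask's weight; bits outside the old mask map to themselves), and a hypothetical toy_policy struct standing in for the v.nodes and w.cpuset_mems_allowed fields. All names below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy nodemask: bit n set means NUMA node n is in the mask (at most 64 nodes). */
typedef uint64_t nodemask;

/* Number of set bits in a mask. */
static int weight(nodemask m)
{
    int w = 0;
    for (; m; m &= m - 1)
        w++;
    return w;
}

/* Ordinal of bit 'pos' among the set bits of 'mask' (how many set bits lie below it). */
static int ordinal(nodemask mask, int pos)
{
    return weight(mask & ((1ULL << pos) - 1));
}

/* Position of the n-th (0-based) set bit; the caller guarantees it exists. */
static int nth_set_bit(nodemask mask, int n)
{
    for (int pos = 0; pos < 64; pos++) {
        if ((mask >> pos) & 1) {
            if (n == 0)
                return pos;
            n--;
        }
    }
    assert(0 && "mask has too few set bits");
    return -1;
}

/* Simplified model of nodes_remap(): map src through the old -> new relation. */
static nodemask remap(nodemask src, nodemask old, nodemask new_mask)
{
    nodemask dst = 0;
    int w = weight(new_mask);

    for (int pos = 0; pos < 64; pos++) {
        if (!((src >> pos) & 1))
            continue;
        if (((old >> pos) & 1) && w > 0)
            dst |= 1ULL << nth_set_bit(new_mask, ordinal(old, pos) % w);
        else
            dst |= 1ULL << pos;          /* identity map */
    }
    return dst;
}

/* Hypothetical stand-in for the two mempolicy fields discussed in the thread. */
struct toy_policy {
    nodemask nodes;               /* models pol->v.nodes */
    nodemask cpuset_mems_allowed; /* models pol->w.cpuset_mems_allowed */
};

/* One rebind step; 'buggy' selects the pre-patch assignment (tmp instead of *nodes). */
static void rebind(struct toy_policy *pol, nodemask new_allowed, int buggy)
{
    nodemask tmp = remap(pol->nodes, pol->cpuset_mems_allowed, new_allowed);

    pol->cpuset_mems_allowed = buggy ? tmp : new_allowed;
    if (tmp == 0)
        tmp = new_allowed;
    pol->nodes = tmp;
}

/* Rebind twice and return the resulting policy nodes. */
nodemask rebind_twice(nodemask nodes, nodemask allowed,
                      nodemask first, nodemask second, int buggy)
{
    struct toy_policy pol = { nodes, allowed };

    rebind(&pol, first, buggy);
    rebind(&pol, second, buggy);
    return pol.nodes;
}
```

In this model, a policy on node 1 inside a cpuset of {0,1} that is moved to {2,3} and back, i.e. rebind_twice(0x2, 0x3, 0xC, 0x3, buggy), comes back to node 1 when *nodes is stored but lands on node 0 when tmp is stored. A policy that started on node 0 also ends on node 0, which looks like the "different vma bound to the same node" symptom described above, at least under these assumptions.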
Re: [PATCH 5/8] pinctrl: aspeed: Correct comment that is no longer true
On Thu, 27 Jun 2019, at 13:00, Joel Stanley wrote: > On Wed, 26 Jun 2019 at 07:16, Andrew Jeffery wrote: > > > > We have handled the GFX register case for quite some time now. > > > > Signed-off-by: Andrew Jeffery > > --- > > drivers/pinctrl/aspeed/pinctrl-aspeed.h | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed.h > > b/drivers/pinctrl/aspeed/pinctrl-aspeed.h > > index 4b06ddbc6aec..c5918c4a087c 100644 > > --- a/drivers/pinctrl/aspeed/pinctrl-aspeed.h > > +++ b/drivers/pinctrl/aspeed/pinctrl-aspeed.h > > @@ -240,8 +240,7 @@ > > * opposed to naming them e.g. PINMUX_CTRL_[0-9]). Further, signal > > expressions > > * reference registers beyond those dedicated to pinmux, such as the system > > * reset control and MAC clock configuration registers. The AST2500 goes a > > step > > AST2600 too? No mention of the GFX block in the pinctrl table for the 2600, it appears the pinmux state is entirely determined by SCU registers. > > Acked-by: Joel Stanley Cheers, Andrew > > > - * further and references registers in the graphics IP block, but that > > isn't > > - * handled yet. > > + * further and references registers in the graphics IP block. > > */ > > #define SCU2C 0x2C /* Misc. Control Register */ > > #define SCU3C 0x3C /* System Reset Control/Status Register */ > > -- > > 2.20.1 > > >
Re: [PATCH v3 2/4] objtool: Add support for C jump tables
On Wed, Jun 26, 2019 at 10:44:47PM -0500, Josh Poimboeuf wrote: > > > How about the following approach instead? This is the only other way I > > > can think of to annotate a jump table so that objtool can distinguish > > > it: > > > > > > #define __annotate_jump_table __section(".jump_table.rodata") > > > > > > Then bpf would just need the following: > > > > > > - static const void *jumptable[256] = { > > > + static const void __annotate_jump_table *jumptable[256] = { > > > > > > This would be less magical and fragile than my original approach. > > > > > > I think the jump table would still be placed with all the other rodata, > > > like before, because the vmlinux linker script recognizes the section > > > ".rodata" suffix and bundles them all together. > > > > I like it if that works :) > > Definitely cleaner. > > Maybe a bit more linker script magic would be necessary, > > but hopefully not. > > And no need to rely on gcc style of mangling static vars. The last patch had an odd base; this one is based on upstream. Will test tomorrow.
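For readers unfamiliar with the pattern being annotated, here is a small user-space sketch of a C jump table placed in a named section. It is not the kernel code: it assumes GCC or Clang on an ELF target (labels-as-values and the section attribute are compiler extensions), and the macro annotate_jump_table, the opcode names and run_toy_program() are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the proposed kernel annotation. */
#define annotate_jump_table __attribute__((section(".jump_table.rodata")))

/* Toy opcodes, made up for this sketch. */
enum { OP_INC, OP_DBL, OP_HALT, OP_COUNT };

/* Interpret a byte-code program; 'code' must end with OP_HALT. */
long run_toy_program(const uint8_t *code, long acc)
{
    /* Table of label addresses, emitted into the named section. */
    static const void *jumptable[OP_COUNT] annotate_jump_table = {
        [OP_INC]  = &&do_inc,
        [OP_DBL]  = &&do_dbl,
        [OP_HALT] = &&do_halt,
    };

/* Fetch the next opcode and jump through the table (computed goto). */
#define DISPATCH() do { assert(*code < OP_COUNT); goto *jumptable[*code++]; } while (0)

    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_dbl:
    acc *= 2;
    DISPATCH();
do_halt:
    return acc;
#undef DISPATCH
}
```

The table here is a named symbol holding code addresses, which is why a tool that expects anonymous switch tables needs some other hint, such as the section name, to recognise it.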
diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 8aaf7cd026b0..84212bcb5015 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -116,9 +116,14 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, ".pushsection .discard.unreachable\n\t" \ ".long 999b - .\n\t"\ ".popsection\n\t" + +/* Annotate a C jump table to enable objtool to follow the code flow */ +#define __annotate_jump_table __section(".jump_table.rodata") + #else #define annotate_reachable() #define annotate_unreachable() +#define __annotate_jump_table #endif #ifndef ASM_UNREACHABLE diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 080e2bb644cc..e67977e22967 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1299,7 +1299,7 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) { #define BPF_INSN_2_LBL(x, y)[BPF_##x | BPF_##y] = &&x##_##y #define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z - static const void *jumptable[256] = { + static const void __annotate_jump_table *jumptable[256] = { [0 ... 255] = &&default_label, /* Now overwrite non-defaults ... */ BPF_INSN_MAP(BPF_INSN_2_LBL, BPF_INSN_3_LBL), @@ -1558,7 +1558,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) BUG_ON(1); return 0; } -STACK_FRAME_NON_STANDARD(___bpf_prog_run); /* jump table */ #define PROG_NAME(stack_size) __bpf_prog_run##stack_size #define DEFINE_BPF_PROG_RUN(stack_size) \ diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 172f99195726..6ade2b32f484 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -18,6 +18,8 @@ #define FAKE_JUMP_OFFSET -1 +#define C_JUMP_TABLE_SECTION ".jump_table.rodata" + struct alternative { struct list_head list; struct instruction *insn; @@ -1035,9 +1037,15 @@ static struct rela *find_switch_table(struct objtool_file *file, /* * Make sure the .rodata address isn't associated with a -* symbol. 
gcc jump tables are anonymous data. +* symbol. GCC jump tables are anonymous data. +* +* Also support C jump tables which are in the same format as +* switch jump tables. For objtool to recognize them, they +* need to be placed in the C_JUMP_TABLE_SECTION section. They +* have symbols associated with them. */ - if (find_symbol_containing(rodata_sec, table_offset)) + if (find_symbol_containing(rodata_sec, table_offset) && + strcmp(rodata_sec->name, C_JUMP_TABLE_SECTION)) continue; rodata_rela = find_rela_by_dest(rodata_sec, table_offset); @@ -1277,13 +1285,18 @@ static void mark_rodata(struct objtool_file *file) bool found = false; /* -* This searches for the .rodata section or multiple .rodata.func_name -* sections if -fdata-sections is being used. The .str.1.1 and .str.1.8 -* rodata sections are ignored as they don't contain jump tables. +* Search for the following rodata sections, each of which can +* potentially contain jump tables: +* +* - .rodata: can contain GCC switch tables +* - .rodata.: same, if -fdata-sections is being used +* - .jump_table.rodata: contains C annotated jump tables +* +* .rodata.str1.* sections are ignored; they don't contain jump tables. */ for_each_sec(file, sec) { - if (!strncmp(sec->name, ".rodata", 7) && - !strstr(sec->name, ".str1.")) { + if ((!strncmp(se