> The current code does a full search of the segment list every time for > every page. This is wasteful, since it's almost certain that the next > page will be in the same segment. Instead, check if the previous segment > covers the current page before doing the list search. > > Signed-off-by: Omar Sandoval <osan...@fb.com> > --- > fs/proc/kcore.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c > index e2ca58d49938..25fefdd05ee5 100644 > --- a/fs/proc/kcore.c > +++ b/fs/proc/kcore.c > @@ -428,10 +428,18 @@ read_kcore(struct file *file, char __user *buffer, > size_t buflen, loff_t *fpos) > if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen) > tsz = buflen; > > + m = NULL; > while (buflen) { > - list_for_each_entry(m, &kclist_head, list) { > - if (start >= m->addr && start < (m->addr+m->size)) > - break; > + /* > + * If this is the first iteration or the address is not within > + * the previous entry, search for a matching entry. > + */ > + if (!m || start < m->addr || start >= m->addr + m->size) {
This line apparently triggers a KASAN warning since I rebooted on 4.19-rc1 This is 100% reproductible on my machine when the kdump service starts (fedora28 x86_64 VM), here's the full stack (on 4.19-rc1): [ 38.161102] BUG: KASAN: global-out-of-bounds in read_kcore+0xd5c/0xf20 [ 38.162123] Read of size 8 at addr ffffffffa6c0f770 by task kexec/6201 [ 38.163386] CPU: 16 PID: 6201 Comm: kexec Not tainted 4.19.0-rc1+ #13 [ 38.164374] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [ 38.166211] Call Trace: [ 38.166658] dump_stack+0x71/0xa9 [ 38.167443] print_address_description+0x65/0x22e [ 38.168194] ? read_kcore+0xd5c/0xf20 [ 38.168778] kasan_report.cold.6+0x241/0x306 [ 38.169458] read_kcore+0xd5c/0xf20 [ 38.170018] ? open_kcore+0x1d0/0x1d0 [ 38.170605] ? avc_has_perm_noaudit+0x370/0x370 [ 38.171291] ? kasan_unpoison_shadow+0x30/0x40 [ 38.171973] ? kasan_kmalloc+0xbf/0xe0 [ 38.172562] ? kmem_cache_alloc_trace+0x105/0x200 [ 38.173289] ? open_kcore+0x5f/0x1d0 [ 38.173858] ? open_kcore+0x5f/0x1d0 [ 38.174428] ? deref_stack_reg+0xe0/0xe0 [ 38.175038] proc_reg_read+0x18b/0x220 [ 38.175652] ? proc_reg_unlocked_ioctl+0x210/0x210 [ 38.176399] __vfs_read+0xe1/0x6b0 [ 38.176930] ? __x64_sys_copy_file_range+0x450/0x450 [ 38.177723] ? do_filp_open+0x190/0x250 [ 38.178313] ? may_open_dev+0xc0/0xc0 [ 38.178886] ? __fsnotify_update_child_dentry_flags.part.3+0x330/0x330 [ 38.179883] ? __fsnotify_inode_delete+0x20/0x20 [ 38.180608] ? __inode_security_revalidate+0x8e/0xb0 [ 38.181378] vfs_read+0xde/0x2c0 [ 38.181889] ksys_read+0xb2/0x160 [ 38.182413] ? kernel_write+0x130/0x130 [ 38.183000] ? task_work_run+0x74/0x1c0 [ 38.183621] do_syscall_64+0xa0/0x2e0 [ 38.184183] ? async_page_fault+0x8/0x30 [ 38.184802] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 38.185610] RIP: 0033:0x7fc525d74091 [ 38.186155] Code: fe ff ff 50 48 8d 3d b6 b6 09 00 e8 59 05 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 51 39 2d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48 [ 38.188980] RSP: 002b:00007fffd6802a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 38.190153] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc525d74091 [ 38.191240] RDX: 0000000000010000 RSI: 00000000017f90b0 RDI: 0000000000000004 [ 38.192329] RBP: 0000000000010000 R08: 00007fc526043420 R09: 0000000000000001 [ 38.194593] R10: 00000000017e8010 R11: 0000000000000246 R12: 00000000017f90b0 [ 38.196823] R13: 0000000000000004 R14: 00007fffd6802ac8 R15: 00007fffd6802cb0 [ 38.200526] The buggy address belongs to the variable: [ 38.202539] kclist_head+0x10/0x440 [ 38.205568] Memory state around the buggy address: [ 38.207411] ffffffffa6c0f600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 38.209648] ffffffffa6c0f680: 00 00 00 00 00 00 00 00 04 fa fa fa fa fa fa fa [ 38.211812] >ffffffffa6c0f700: 00 00 00 00 00 fa fa fa fa fa fa fa 00 00 fa fa [ 38.213936] ^ [ 38.216010] ffffffffa6c0f780: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 [ 38.218178] ffffffffa6c0f800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 38.220306] ================================================================== where [ 37.636589] read_kcore+0xd5c/0xf20 symbolizes to [< none >] read_kcore+0xd5c/0xf20 fs/proc/kcore.c:454 , the above check. I haven't checked but I think I am in the first case below: if (&m->list == &kclist_head) { meaning no address matched in the list, so you cannot check m->addr and m->size in this case -- I'm afraid you will have to run through the list just in case if that happens even if there likely won't be any match for the next address either. > + list_for_each_entry(m, &kclist_head, list) { > + if (start >= m->addr && > + start < m->addr + m->size) > + break; > + } > } > > if (&m->list == &kclist_head) { -- Dominique Martinet