On Mon, May 1, 2017 at 4:41 AM, Baoquan He <b...@redhat.com> wrote:
> Jeff Moyer reported that on his system with two memory regions, 0~64G and
> 1T~1T+192G, and the kernel option "memmap=192G!1024G" added, enabling KASLR
> makes the system hang intermittently during boot, while adding 'nokaslr'
> does not.
>
> This is because the loop-increment calculation in sync_global_pgds() is
> not correct. When a mapping area crosses pgd entries, we should
> calculate the starting address of the region covered by the next pgd
> entry and assign it to the loop variable, rather than adding PGDIR_SIZE
> directly. The old code works correctly only if the start of the mapping
> area is aligned to PGDIR_SIZE; otherwise the tail region can be skipped,
> so it is never synchronized from the kernel's init_mm.pgd to the page
> tables of all other processes.
>
> On Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
> PGDIR_SIZE. Booting with 'nokaslr' works because PAGE_OFFSET is
> 1T-aligned, which places this area inside a single pgd entry. With KASLR
> enabled, the area can straddle two pgd entries, and the second entry is
> then not synced to the other processes. That is why we saw an empty PGD.
>
> This patch fixes it.
>
> The backtrace is pasted below:
>
> [    9.988867] IP: memcpy_erms+0x6/0x10
> [    9.988868] PGD 0
> [    9.988868]
> [    9.988870] Oops: 0000 [#1] SMP
> [    9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E) i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> [    9.988886] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: G            E   4.11.0-rc5+ #43
> [    9.988887] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS SE5C600.86B.02.01.SP06.050920141054 05/09/2014
> [    9.988888] task: ffff9267dc2f8000 task.stack: ffffba92c783c000
> [    9.988890] RIP: 0010:memcpy_erms+0x6/0x10
> [    9.988891] RSP: 0018:ffffba92c783f9b8 EFLAGS: 00010286
> [    9.988892] RAX: ffff925f19e27000 RBX: 0000000000000000 RCX: 0000000000001000
> [    9.988893] RDX: 0000000000001000 RSI: ffff9387bfff0000 RDI: ffff925f19e27000
> [    9.988893] RBP: ffffba92c783fa38 R08: 0000000000000000 R09: 0000000017ffff80
> [    9.988894] R10: 0000000000000000 R11: ffff9387bfff0000 R12: ffff925fde811ed8
> [    9.988895] R13: 0000002fffff0000 R14: 0000000000001000 R15: ffff925f19e27000
> [    9.988896] FS:  00007f1ee18e68c0(0000) GS:ffff925fdec00000(0000) knlGS:0000000000000000
> [    9.988896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    9.988897] CR2: ffff9387bfff0000 CR3: 000000081ba28000 CR4: 00000000001406f0
> [    9.988897] Call Trace:
> [    9.988902]  ? pmem_do_bvec+0x93/0x290 [nd_pmem]
> [    9.988904]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
> [    9.988905]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
> [    9.988907]  pmem_rw_page+0x3a/0x60 [nd_pmem]
> [    9.988909]  bdev_read_page+0x81/0xb0
> [    9.988911]  do_mpage_readpage+0x56f/0x770
> [    9.988912]  ? I_BDEV+0x20/0x20
> [    9.988915]  ? lru_cache_add+0xe/0x10
> [    9.988917]  mpage_readpages+0x148/0x1e0
> [    9.988917]  ? I_BDEV+0x20/0x20
> [    9.988918]  ? I_BDEV+0x20/0x20
> [    9.988921]  ? alloc_pages_current+0x88/0x120
> [    9.988923]  blkdev_readpages+0x1d/0x20
> [    9.988924]  __do_page_cache_readahead+0x1ce/0x2c0
> [    9.988926]  force_page_cache_readahead+0xa2/0x100
> [    9.988927]  page_cache_sync_readahead+0x3f/0x50
> [    9.988930]  generic_file_read_iter+0x60d/0x8c0
> [    9.988931]  blkdev_read_iter+0x37/0x40
> [    9.988933]  __vfs_read+0xe0/0x150
> [    9.988934]  vfs_read+0x8c/0x130
> [    9.988936]  SyS_read+0x55/0xc0
> [    9.988939]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> [    9.988940] RIP: 0033:0x7f1ee0822480
> [    9.988941] RSP: 002b:00007ffcf9e741f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [    9.988942] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1ee0822480
> [    9.988943] RDX: 0000000000000040 RSI: 0000561b7e1aabc8 RDI: 0000000000000008
> [    9.988943] RBP: 0000561b7e1a86a0 R08: 0000000000000005 R09: 0000000000000068
> [    9.988944] R10: 00007ffcf9e73f80 R11: 0000000000000246 R12: 0000000000000000
> [    9.988945] R13: 0000000000000001 R14: 0000561b7e1a61b0 R15: 0000561b7e1a55e0
> [    9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> [    9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ffffba92c783f9b8
> [    9.988962] CR2: ffff9387bfff0000
> [    9.989022] ---[ end trace fe34c0fc0fe685ab ]---
> [    9.998690] Kernel panic - not syncing: Fatal exception
> [   10.004708] Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Reported-by: Jeff Moyer <jmo...@redhat.com>
> Signed-off-by: Baoquan He <b...@redhat.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: x...@kernel.org
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Thomas Garnier <thgar...@google.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Yasuaki Ishimatsu <yasu.isim...@gmail.com>
> Cc: Jinbum Park <jinb.pa...@gmail.com>
> Cc: Dave Hansen <dave.han...@linux.intel.com>
> Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
> Cc: Yinghai Lu <ying...@kernel.org>
> Cc: Dan Williams <dan.j.willi...@intel.com>
> Cc: Dave Young <dyo...@redhat.com>
> ---
>  arch/x86/mm/init_64.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 15173d3..dbf4f00 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -94,12 +94,14 @@ __setup("noexec32=", nonx32_setup);
>   */
>  void sync_global_pgds(unsigned long start, unsigned long end)
>  {
> -       unsigned long address;
> +       unsigned long address, address_next;
>
> -       for (address = start; address <= end; address += PGDIR_SIZE) {
> +       for (address = start; address <= end; address = address_next) {
>                 const pgd_t *pgd_ref = pgd_offset_k(address);
>                 struct page *page;
>
> +               address_next = (address & PGDIR_MASK) + PGDIR_SIZE;
> +
>                 if (pgd_none(*pgd_ref))
>                         continue;
>

This one is better than V2.

It would be better if you could rename address to addr, as Ingo suggested.

Acked-by: Yinghai Lu <ying...@kernel.org>
