Re: netconsole deadlock with virtnet
On (20/11/18 11:46), Sergey Senozhatsky wrote:
[..]
> > Because I'm not sure where the xmit_lock is taken while holding the
> > target_list_lock.
>
> I don't see where this happens. It seems to me that the report is not
> about broken locking order, but more about:
> - a soft IRQ can be preempted (while holding _xmit_lock) by a hardware
>   interrupt, which will attempt to acquire the same _xmit_lock.
>
> CPU0
>
> virtnet_poll_tx()
>  __netif_tx_lock()
>   spin_lock(_xmit_lock)
>
>    add_interrupt_randomness()
>     crng_fast_load()
>      printk()
>       call_console_drivers()
>        spin_lock_irqsave(&target_list_lock)
>         spin_lock(_xmit_lock);
>
> Does this make sense?

Hmm, lockdep says something similar, but there are 2 printk() calls
happening - both on the local and the remote CPUs.

[   21.149564]        CPU0                    CPU1
[   21.149565]        ----                    ----
[   21.149566]   lock(_xmit_ETHER#2);
[   21.149569]                          local_irq_disable();
[   21.149570]                          lock(console_owner);
[   21.149572]                          lock(target_list_lock);
[   21.149575]   <Interrupt>
[   21.149576]     lock(console_owner);

This CPU0 lock(_xmit_ETHER#2) -> hard IRQ -> lock(console_owner) is
basically

	soft IRQ -> lock(_xmit_ETHER#2) -> hard IRQ -> printk()

Then CPU1 spins on xmit, which is owned by CPU0, and CPU0 spins on
console_owner, which is owned by CPU1?

A quick-and-dirty idea (it doesn't fix the lockdep report) - can we add
some sort of max_loops variable to console_trylock_spinning(), so that
it will not spin forever in `while (READ_ONCE(console_waiter))` waiting
for a console_owner to pass the lock?

	-ss
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: netconsole deadlock with virtnet
On (20/11/17 09:33), Steven Rostedt wrote:
> > [   21.149601]    IN-HARDIRQ-W at:
> > [   21.149602]      __lock_acquire+0xa78/0x1a94
> > [   21.149603]      lock_acquire.part.0+0x170/0x360
> > [   21.149604]      lock_acquire+0x68/0x8c
> > [   21.149605]      console_unlock+0x1e8/0x6a4
> > [   21.149606]      vprintk_emit+0x1c4/0x3c4
> > [   21.149607]      vprintk_default+0x40/0x4c
> > [   21.149608]      vprintk_func+0x10c/0x220
> > [   21.149610]      printk+0x68/0x90
> > [   21.149611]      crng_fast_load+0x1bc/0x1c0
> > [   21.149612]      add_interrupt_randomness+0x280/0x290
> > [   21.149613]      handle_irq_event+0x80/0x120
> > [   21.149614]      handle_fasteoi_irq+0xac/0x200
> > [   21.149615]      __handle_domain_irq+0x84/0xf0
> > [   21.149616]      gic_handle_irq+0xd4/0x320
> > [   21.149617]      el1_irq+0xd0/0x180
> > [   21.149618]      arch_cpu_idle+0x24/0x44
> > [   21.149619]      default_idle_call+0x48/0xa0
> > [   21.149620]      do_idle+0x260/0x300
> > [   21.149621]      cpu_startup_entry+0x30/0x6c
> > [   21.149622]      rest_init+0x1b4/0x288
> > [   21.149624]      arch_call_rest_init+0x18/0x24
> > [   21.149625]      start_kernel+0x5cc/0x608
> > [   21.149625]    IN-SOFTIRQ-W at:
> > [   21.149627]      __lock_acquire+0x894/0x1a94
> > [   21.149628]      lock_acquire.part.0+0x170/0x360
> > [   21.149629]      lock_acquire+0x68/0x8c
> > [   21.149630]      console_unlock+0x1e8/0x6a4
> > [   21.149631]      vprintk_emit+0x1c4/0x3c4
> > [   21.149632]      vprintk_default+0x40/0x4c
> > [   21.149633]      vprintk_func+0x10c/0x220
> > [   21.149634]      printk+0x68/0x90
> > [   21.149635]      hrtimer_interrupt+0x290/0x294
> > [   21.149636]      arch_timer_handler_virt+0x3c/0x50
> > [   21.149637]      handle_percpu_devid_irq+0x94/0x164
> > [   21.149673]      __handle_domain_irq+0x84/0xf0
> > [   21.149674]      gic_handle_irq+0xd4/0x320
> > [   21.149675]      el1_irq+0xd0/0x180
> > [   21.149676]      __do_softirq+0x108/0x638
> > [   21.149677]      __irq_exit_rcu+0x17c/0x1b0
> > [   21.149678]      irq_exit+0x18/0x44
> > [   21.149679]      __handle_domain_irq+0x88/0xf0
> > [   21.149680]      gic_handle_irq+0xd4/0x320
> > [   21.149681]      el1_irq+0xd0/0x180
> > [   21.149682]      smp_call_function_many_cond+0x3cc/0x3f0
> > [   21.149683]      kick_all_cpus_sync+0x4c/0x80
> > [   21.149684]      load_module+0x1eec/0x2734
> > [   21.149685]      __do_sys_finit_module+0xbc/0x12c
> > [   21.149686]      __arm64_sys_finit_module+0x28/0x34
> > [   21.149687]      el0_svc_common.constprop.0+0x84/0x200
> > [   21.149688]      do_el0_svc+0x2c/0x90
> > [   21.149689]      el0_svc+0x18/0x50
> > [   21.149690]      el0_sync_handler+0xe0/0x350
> > [   21.149691]      el0_sync+0x158/0x180
[..]
> It really sucks that we lose 190 messages that would help to decipher
> this more. :-p

Indeed.

> Because I'm not sure where the xmit_lock is taken while holding the
> target_list_lock.

I don't see where this happens. It seems to me that the report is not
about broken locking order, but more about:
- a soft IRQ can be preempted (while holding _xmit_lock) by a hardware
  interrupt, which will attempt to acquire the same _xmit_lock.

CPU0

virtnet_poll_tx()
 __netif_tx_lock()
  spin_lock(_xmit_lock)

   add_interrupt_randomness()
    crng_fast_load()
     printk()
      call_console_drivers()
       spin_lock_irqsave(&target_list_lock)
        spin_lock(_xmit_lock);

Does this make sense?

	-ss
Re: [PATCH] drm/virtio: use kvmalloc for large allocations
Hi,

On (20/11/05 07:52), Gerd Hoffmann wrote:
> > -	*ents = kmalloc_array(*nents, sizeof(struct virtio_gpu_mem_entry),
> > -			      GFP_KERNEL);
> > +	*ents = kvmalloc_array(*nents,
> > +			       sizeof(struct virtio_gpu_mem_entry),
> > +			       GFP_KERNEL);
>
> Shouldn't that be balanced with a kvfree() elsewhere?

I think it already is. The ents pointer is assigned to vbuf->data_buf,
and free_vbuf() already uses kvfree(vbuf->data_buf) to free it.

	-ss
Re: [PATCH 15/15] block: Add FIXME comment to handle device_add_disk error
On (08/17/16 15:15), Fam Zheng wrote:
[..]
> (
>     rc = device_add_disk(e1, e2, e3);
> |
> +   /* FIXME: handle error. */
>     device_add_disk(e1, e2, e3);

Or use __must_check for the device_add_disk() function?
/* which is __attribute__((warn_unused_result)) */

	-ss
Re: [PATCH 11/15] zram: Pass attribute group to device_add_disk
On (08/18/16 10:59), Sergey Senozhatsky wrote:
[..]
> I like the previous "Error creating sysfs group for device" string
> better than "Error creating disk", because the latter is much less
> informative.
>
> Do you want to do something like below?
>
> int device_add_disk(struct device *parent, struct gendisk *disk,
> ...
> 	if (attr_group) {
> 		retval = sysfs_create_group(&disk_to_dev(disk)->kobj,
> 					    attr_group);
> +		pr_err("Error creating sysfs group for device ...\n", ...);

d'oh... a typo. pr_err() is meant to be in the `if (retval)' branch.

> 		if (retval)
> 			goto fail;
> 	}

	-ss
Re: [PATCH 11/15] zram: Pass attribute group to device_add_disk
Hello,

On (08/17/16 15:15), Fam Zheng wrote:
[..]
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 20920a2..2331788 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1298,13 +1298,10 @@ static int zram_add(void)
> 	zram->disk->queue->limits.discard_zeroes_data = 0;
> 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
>
> -	device_add_disk(NULL, zram->disk, NULL);
> +	ret = device_add_disk(NULL, zram->disk, &zram_disk_attr_group);
>
> -	ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
> -				 &zram_disk_attr_group);
> 	if (ret < 0) {
> -		pr_err("Error creating sysfs group for device %d\n",
> -		       device_id);
> +		pr_err("Error creating disk %d\n", device_id);
> 		goto out_free_disk;
> 	}
> 	strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));

I like the previous "Error creating sysfs group for device" string
better than "Error creating disk", because the latter is much less
informative.

Do you want to do something like below?

int device_add_disk(struct device *parent, struct gendisk *disk,
...
	if (attr_group) {
		retval = sysfs_create_group(&disk_to_dev(disk)->kobj,
					    attr_group);

+		pr_err("Error creating sysfs group for device ...\n", ...);
		if (retval)
			goto fail;
	}

	-ss
Re: [PATCH v7 00/12] Support non-lru page migration
On (06/16/16 15:47), Minchan Kim wrote:
> > [..]
> > > > this is what I'm getting with the [zsmalloc: keep first object
> > > > offset in struct page] applied: "count:0 mapcount:-127". which
> > > > may be not related to zsmalloc at this point.
> > > >
> > > > kernel: BUG: Bad page state in process khugepaged  pfn:101db8
> > > > kernel: page:ea0004076e00 count:0 mapcount:-127 mapping: (null) index:0x1
> > >
> > > Hm, it seems like a double free.
> > >
> > > It doesn't happen if you disable zram? IOW, it seems to be related
> > > to zsmalloc migration?
> >
> > need to test more, can't confidently answer now.
> >
> > > How easily can you reproduce it? Could you bisect it?
> >
> > it takes some (um.. random) time to trigger the bug.
> > I'll try to come up with more details.
>
> Could you revert [1] and retest?
>
> [1] mm/compaction: split freepages without holding the zone lock

ok, so this is not related to zsmalloc. finally managed to reproduce it.
will fork a separate thread.

	-ss
Re: [PATCH v7 00/12] Support non-lru page migration
On (06/16/16 13:47), Minchan Kim wrote:
[..]
> > this is what I'm getting with the [zsmalloc: keep first object offset
> > in struct page] applied: "count:0 mapcount:-127". which may be not
> > related to zsmalloc at this point.
> >
> > kernel: BUG: Bad page state in process khugepaged  pfn:101db8
> > kernel: page:ea0004076e00 count:0 mapcount:-127 mapping: (null) index:0x1
>
> Hm, it seems like a double free.
>
> It doesn't happen if you disable zram? IOW, it seems to be related to
> zsmalloc migration?

need to test more, can't confidently answer now.

> How easily can you reproduce it? Could you bisect it?

it takes some (um.. random) time to trigger the bug.
I'll try to come up with more details.

	-ss

> > kernel: flags: 0x8000()
> > kernel: page dumped because: nonzero mapcount
> > kernel: Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic r8169 mii snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich processor mfd_core sch_fq_codel sd_mod hid_generic usb
> > kernel: CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160615-dbg-5-gfd11984-dirty #491
> > kernel: 8801124c73f8 814d69b0 ea0004076e00
> > kernel: 81e658a0 8801124c7420 811e9b63
> > kernel: ea0004076e00 81e658a0 8801124c7440 811e9ca9
> > kernel: Call Trace:
> > kernel: [] dump_stack+0x68/0x92
> > kernel: [] bad_page+0x158/0x1a2
> > kernel: [] free_pages_check_bad+0xfc/0x101
> > kernel: [] free_hot_cold_page+0x135/0x5de
> > kernel: [] __free_pages+0x67/0x72
> > kernel: [] release_freepages+0x13a/0x191
> > kernel: [] compact_zone+0x845/0x1155
> > kernel: [] ? compaction_suitable+0x76/0x76
> > kernel: [] compact_zone_order+0xe0/0x167
> > kernel: [] ? compact_zone+0x1155/0x1155
> > kernel: [] try_to_compact_pages+0x2f1/0x648
> > kernel: [] ? try_to_compact_pages+0x2f1/0x648
> > kernel: [] ? compaction_zonelist_suitable+0x3a6/0x3a6
> > kernel: [] ? get_page_from_freelist+0x2c0/0x133c
> > kernel: [] __alloc_pages_direct_compact+0xea/0x30d
> > kernel: [] ? get_page_from_freelist+0x133c/0x133c
> > kernel: [] ? drain_all_pages+0x1d6/0x205
> > kernel: [] __alloc_pages_nodemask+0x143d/0x16b6
> > kernel: [] ? debug_show_all_locks+0x226/0x226
> > kernel: [] ? warn_alloc_failed+0x24c/0x24c
> > kernel: [] ? finish_wait+0x1a4/0x1b0
> > kernel: [] ? lock_acquire+0xec/0x147
> > kernel: [] ? _raw_spin_unlock_irqrestore+0x3b/0x5c
> > kernel: [] ? _raw_spin_unlock_irqrestore+0x47/0x5c
> > kernel: [] ? finish_wait+0x1a4/0x1b0
> > kernel: [] khugepaged+0x1d4/0x484f
> > kernel: [] ? hugepage_vma_revalidate+0xef/0xef
> > kernel: [] ? finish_task_switch+0x3de/0x484
> > kernel: [] ? _raw_spin_unlock_irq+0x27/0x45
> > kernel: [] ? trace_hardirqs_on_caller+0x3d2/0x492
> > kernel: [] ? prepare_to_wait_event+0x3f7/0x3f7
> > kernel: [] ? __schedule+0xa4d/0xd16
> > kernel: [] kthread+0x252/0x261
> > kernel: [] ? hugepage_vma_revalidate+0xef/0xef
> > kernel: [] ? kthread_create_on_node+0x377/0x377
> > kernel: [] ret_from_fork+0x1f/0x40
> > kernel: [] ? kthread_create_on_node+0x377/0x377
> > -- Reboot --
Re: [PATCH v7 00/12] Support non-lru page migration
On (06/16/16 11:58), Minchan Kim wrote:
[..]
> RAX: 2065676162726166

so rax is totally garbage, I think.

> It means obj_to_head returns garbage because get_first_obj_offset is
> utter crap because (page_idx / class->pages_per_zspage) was totally
> wrong.
>
> > 		^^
> > 6408: f0 0f ba 28 00        lock btsl $0x0,(%rax)
> >
> > > Could you test with [zsmalloc: keep first object offset in struct page]
> > > in mmotm?
> >
> > sure, I can. will it help, tho? we have a race condition here I think.
>
> I guess the root cause is get_first_obj_offset.

sounds reasonable.

> Please test with it.

this is what I'm getting with the [zsmalloc: keep first object offset in
struct page] applied: "count:0 mapcount:-127". which may be not related
to zsmalloc at this point.

kernel: BUG: Bad page state in process khugepaged  pfn:101db8
kernel: page:ea0004076e00 count:0 mapcount:-127 mapping: (null) index:0x1
kernel: flags: 0x8000()
kernel: page dumped because: nonzero mapcount
kernel: Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic r8169 mii snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich processor mfd_core sch_fq_codel sd_mod hid_generic usb
kernel: CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160615-dbg-5-gfd11984-dirty #491
kernel: 8801124c73f8 814d69b0 ea0004076e00
kernel: 81e658a0 8801124c7420 811e9b63
kernel: ea0004076e00 81e658a0 8801124c7440 811e9ca9
kernel: Call Trace:
kernel: [] dump_stack+0x68/0x92
kernel: [] bad_page+0x158/0x1a2
kernel: [] free_pages_check_bad+0xfc/0x101
kernel: [] free_hot_cold_page+0x135/0x5de
kernel: [] __free_pages+0x67/0x72
kernel: [] release_freepages+0x13a/0x191
kernel: [] compact_zone+0x845/0x1155
kernel: [] ? compaction_suitable+0x76/0x76
kernel: [] compact_zone_order+0xe0/0x167
kernel: [] ? compact_zone+0x1155/0x1155
kernel: [] try_to_compact_pages+0x2f1/0x648
kernel: [] ? try_to_compact_pages+0x2f1/0x648
kernel: [] ? compaction_zonelist_suitable+0x3a6/0x3a6
kernel: [] ? get_page_from_freelist+0x2c0/0x133c
kernel: [] __alloc_pages_direct_compact+0xea/0x30d
kernel: [] ? get_page_from_freelist+0x133c/0x133c
kernel: [] ? drain_all_pages+0x1d6/0x205
kernel: [] __alloc_pages_nodemask+0x143d/0x16b6
kernel: [] ? debug_show_all_locks+0x226/0x226
kernel: [] ? warn_alloc_failed+0x24c/0x24c
kernel: [] ? finish_wait+0x1a4/0x1b0
kernel: [] ? lock_acquire+0xec/0x147
kernel: [] ? _raw_spin_unlock_irqrestore+0x3b/0x5c
kernel: [] ? _raw_spin_unlock_irqrestore+0x47/0x5c
kernel: [] ? finish_wait+0x1a4/0x1b0
kernel: [] khugepaged+0x1d4/0x484f
kernel: [] ? hugepage_vma_revalidate+0xef/0xef
kernel: [] ? finish_task_switch+0x3de/0x484
kernel: [] ? _raw_spin_unlock_irq+0x27/0x45
kernel: [] ? trace_hardirqs_on_caller+0x3d2/0x492
kernel: [] ? prepare_to_wait_event+0x3f7/0x3f7
kernel: [] ? __schedule+0xa4d/0xd16
kernel: [] kthread+0x252/0x261
kernel: [] ? hugepage_vma_revalidate+0xef/0xef
kernel: [] ? kthread_create_on_node+0x377/0x377
kernel: [] ret_from_fork+0x1f/0x40
kernel: [] ? kthread_create_on_node+0x377/0x377
-- Reboot --

	-ss
Re: [PATCH v7 00/12] Support non-lru page migration
Hi,

On (06/16/16 08:12), Minchan Kim wrote:
> > [  315.146533] kasan: CONFIG_KASAN_INLINE enabled
> > [  315.146538] kasan: GPF could be caused by NULL-ptr deref or user memory access
> > [  315.146546] general protection fault: [#1] PREEMPT SMP KASAN
> > [  315.146576] Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel r8169 i2c_i801 mii snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich mfd_core processor sch_fq_codel sd_mod hid_generic usbhid hid ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common
> > [  315.146785] CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160614-dbg-4-ga1c2cbc-dirty #488
> > [  315.146841] task: 8800bfaf2900 ti: 880112468000 task.ti: 880112468000
> > [  315.146859] RIP: 0010:[] [] zs_page_migrate+0x355/0xaa0 [zsmalloc]
>
> Thanks for the report!
>
> zs_page_migrate+0x355? Could you tell me what line it is?
>
> It seems to be related to obj_to_head.

reproduced. a bit different call stack this time. but the problem is
still the same.

zs_compact()
...
6371: e8 00 00 00 00          callq  6376
6376: 0f 0b                   ud2
6378: 48 8b 95 a8 fe ff ff    mov    -0x158(%rbp),%rdx
637f: 4d 8d 74 24 78          lea    0x78(%r12),%r14
6384: 4c 89 ee                mov    %r13,%rsi
6387: 4c 89 e7                mov    %r12,%rdi
638a: e8 86 c7 ff ff          callq  2b15
638f: 41 89 c5                mov    %eax,%r13d
6392: 4c 89 f0                mov    %r14,%rax
6395: 48 c1 e8 03             shr    $0x3,%rax
6399: 8a 04 18                mov    (%rax,%rbx,1),%al
639c: 84 c0                   test   %al,%al
639e: 0f 85 f2 02 00 00       jne    6696
63a4: 41 8b 44 24 78          mov    0x78(%r12),%eax
63a9: 41 0f af c7             imul   %r15d,%eax
63ad: 41 01 c5                add    %eax,%r13d
63b0: 4c 89 f0                mov    %r14,%rax
63b3: 48 c1 e8 03             shr    $0x3,%rax
63b7: 48 01 d8                add    %rbx,%rax
63ba: 48 89 85 88 fe ff ff    mov    %rax,-0x178(%rbp)
63c1: 41 81 fd ff 0f 00 00    cmp    $0xfff,%r13d
63c8: 0f 87 1a 03 00 00       ja     66e8
63ce: 49 63 f5                movslq %r13d,%rsi
63d1: 48 03 b5 98 fe ff ff    add    -0x168(%rbp),%rsi
63d8: 48 8b bd a8 fe ff ff    mov    -0x158(%rbp),%rdi
63df: e8 67 d9 ff ff          callq  3d4b
63e4: a8 01                   test   $0x1,%al
63e6: 0f 84 d9 02 00 00       je     66c5
63ec: 48 83 e0 fe             and    $0xfffe,%rax
63f0: bf 01 00 00 00          mov    $0x1,%edi
63f5: 48 89 85 b0 fe ff ff    mov    %rax,-0x150(%rbp)
63fc: e8 00 00 00 00          callq  6401
6401: 48 8b 85 b0 fe ff ff    mov    -0x150(%rbp),%rax
		^^
6408: f0 0f ba 28 00          lock btsl $0x0,(%rax)
640d: 0f 82 98 02 00 00       jb     66ab
6413: 48 8b 85 10 fe ff ff    mov    -0x1f0(%rbp),%rax
641a: 48 8d b8 48 10 00 00    lea    0x1048(%rax),%rdi
6421: 48 89 f8                mov    %rdi,%rax
6424: 48 c1 e8 03             shr    $0x3,%rax
6428: 8a 04 18                mov    (%rax,%rbx,1),%al
642b: 84 c0                   test   %al,%al
642d: 0f 85 c5 02 00 00       jne    66f8
6433: 48 8b 85 10 fe ff ff    mov    -0x1f0(%rbp),%rax
643a: 65 4c 8b 2c 25 00 00    mov    %gs:0x0,%r13
6441: 00 00
6443: 49 8d bd 48 10 00 00    lea    0x1048(%r13),%rdi
644a: ff 88 48 10 00 00       decl   0x1048(%rax)
6450: 48 89 f8                mov    %rdi,%rax
6453: 48 c1 e8 03             shr    $0x3,%rax
6457: 8a 04 18                mov    (%rax,%rbx,1),%al
645a: 84 c0                   test   %al,%al
645c: 0f 85 a8 02 00 00       jne    670a
6462: 41 83 bd 48 10 00 00    cmpl   $0x0,0x1048(%r13)

which is

_next/./arch/x86/include/asm/bitops.h:206
_next/./arch/x86/include/asm/bitops.h:219
_next/include/linux/bit_spinlock.h:44
_next/mm/zsmalloc.c:950
_next/mm/zsmalloc.c:1774
_next/mm/zsmalloc.c:1809
_next/mm/zsmalloc.c:2306
_next/mm/zsmalloc.c:2346

smells like a race condition.

backtraces:

[  319.363646] kasan: CONFIG_KASAN_INLINE enabled
[  319.363650] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  319.363658] general protection fault: [#1] PREEMPT SMP KASAN
[  319.363688] Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel snd_hda_codec_realtek snd_hda_codec_generic r
Re: [PATCH v7 00/12] Support non-lru page migration
Hello Minchan,

-next 4.7.0-rc3-next-20160614

[  315.146533] kasan: CONFIG_KASAN_INLINE enabled
[  315.146538] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  315.146546] general protection fault: [#1] PREEMPT SMP KASAN
[  315.146576] Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel r8169 i2c_i801 mii snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich mfd_core processor sch_fq_codel sd_mod hid_generic usbhid hid ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common
[  315.146785] CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160614-dbg-4-ga1c2cbc-dirty #488
[  315.146841] task: 8800bfaf2900 ti: 880112468000 task.ti: 880112468000
[  315.146859] RIP: 0010:[] [] zs_page_migrate+0x355/0xaa0 [zsmalloc]
[  315.146892] RSP: :88011246f138 EFLAGS: 00010293
[  315.146906] RAX: 736761742d6f6e2c RBX: 880017ad9a80 RCX:
[  315.146924] RDX: 1064d704 RSI: 88000511469a RDI: 8326ba20
[  315.146942] RBP: 88011246f328 R08: 0001 R09:
[  315.146959] R10: 88011246f0a8 R11: 8800bfc07fff R12: 88011246f300
[  315.146977] R13: ed0015523e6f R14: 8800aa91f378 R15: ea144500
[  315.146995] FS: () GS:88011378() knlGS:
[  315.147015] CS: 0010 DS: ES: CR0: 80050033
[  315.147030] CR2: 7f3f97911000 CR3: 02209000 CR4: 06e0
[  315.147046] Stack:
[  315.147052] 110015523e0f 88011246f240 880005116800 00017f80e000
[  315.147083] 880017ad9aa8 736761742d6f6e2c 11002248de34 880017ad9a90
[  315.147113] 069a1246f660 069a 880005114000 ea0002ff0180
[  315.147143] Call Trace:
[  315.147154] [] ? obj_to_head+0x9d/0x9d [zsmalloc]
[  315.147175] [] ? _raw_spin_unlock_irqrestore+0x47/0x5c
[  315.147195] [] ? isolate_freepages_block+0x2f9/0x5a6
[  315.147213] [] ? kasan_poison_shadow+0x2f/0x31
[  315.147230] [] ? kasan_alloc_pages+0x39/0x3b
[  315.147246] [] ? map_pages+0x1f3/0x3ad
[  315.147262] [] ? update_pageblock_skip+0x18d/0x18d
[  315.147280] [] ? up_read+0x1a/0x30
[  315.147296] [] ? debug_check_no_locks_freed+0x150/0x22b
[  315.147315] [] move_to_new_page+0x4dd/0x615
[  315.147332] [] ? migrate_page+0x75/0x75
[  315.147347] [] ? isolate_freepages_block+0x5a6/0x5a6
[  315.147366] [] migrate_pages+0xadd/0x131a
[  315.147382] [] ? isolate_freepages_block+0x5a6/0x5a6
[  315.147399] [] ? kzfree+0x2b/0x2b
[  315.147414] [] ? buffer_migrate_page+0x2db/0x2db
[  315.147431] [] compact_zone+0xcdb/0x1155
[  315.147448] [] ? compaction_suitable+0x76/0x76
[  315.147465] [] compact_zone_order+0xe0/0x167
[  315.147481] [] ? debug_show_all_locks+0x226/0x226
[  315.147499] [] ? compact_zone+0x1155/0x1155
[  315.147515] [] ? finish_task_switch+0x3de/0x484
[  315.147533] [] try_to_compact_pages+0x2f1/0x648
[  315.147550] [] ? try_to_compact_pages+0x2f1/0x648
[  315.147568] [] ? compaction_zonelist_suitable+0x3a6/0x3a6
[  315.147589] [] ? get_page_from_freelist+0x2c0/0x129a
[  315.147608] [] __alloc_pages_direct_compact+0xea/0x30d
[  315.147626] [] ? get_page_from_freelist+0x129a/0x129a
[  315.147645] [] __alloc_pages_nodemask+0x840/0x16b6
[  315.147663] [] ? try_to_wake_up+0x696/0x6c8
[  315.149147] [] ? warn_alloc_failed+0x226/0x226
[  315.150615] [] ? wake_up_process+0x10/0x12
[  315.152078] [] ? wake_up_q+0x89/0xa7
[  315.153539] [] ? rwsem_wake+0x131/0x15c
[  315.155007] [] ? khugepaged+0x4072/0x484f
[  315.156471] [] khugepaged+0x1d4/0x484f
[  315.157940] [] ? hugepage_vma_revalidate+0xef/0xef
[  315.159402] [] ? finish_task_switch+0x3de/0x484
[  315.160870] [] ? _raw_spin_unlock_irq+0x27/0x45
[  315.162341] [] ? trace_hardirqs_on_caller+0x3d2/0x492
[  315.163814] [] ? prepare_to_wait_event+0x3f7/0x3f7
[  315.165295] [] ? __schedule+0xa4d/0xd16
[  315.166763] [] kthread+0x252/0x261
[  315.168214] [] ? hugepage_vma_revalidate+0xef/0xef
[  315.169646] [] ? kthread_create_on_node+0x377/0x377
[  315.171056] [] ret_from_fork+0x1f/0x40
[  315.172462] [] ? kthread_create_on_node+0x377/0x377
[  315.173869] Code: 03 b5 60 fe ff ff e8 2e fc ff ff a8 01 74 4c 48 83 e0 fe bf 01 00 00 00 48 89 85 38 fe ff ff e8 41 18 e1 e0 48 8b 85 38 fe ff ff 0f ba 28 00 73 29 bf 01 00 00 00 41 bc f5 ff ff ff e8 ea 27
[  315.175573] RIP [] zs_page_migrate+0x355/0xaa0 [zsmalloc]
[  315.177084] RSP
[  315.186572] ---[ end trace 0962b8ee48c98bbc ]---
[  315.186577] BUG: sleeping function called from invalid context at include/linux/sched.h:2960
[  315.186580] in_atomic(): 1, irqs_disabled(): 0, pid: 38, name: khugepaged
[  315.186581] INFO: lockdep is turned off.
[  315.186583] Preemption disabled at:[] zs_page_migrate+0x135/0xaa0 [zsmalloc]
[  315.186594] CPU: 3 PID: 38 Comm: khugepaged
Re: [PATCH v5 02/12] mm: migrate: support non-lru movable page migration
On (05/17/16 10:18), Minchan Kim wrote:
[..]
> > > #ifdef CONFIG_MIGRATION
> > >
> > > +extern int PageMovable(struct page *page);
> > > +extern void __SetPageMovable(struct page *page, struct address_space *mapping);
> > > +extern void __ClearPageMovable(struct page *page);
> > > extern void putback_movable_pages(struct list_head *l);
> > > extern int migrate_page(struct address_space *,
> > > 			struct page *, struct page *, enum migrate_mode);
> > > extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
> > > 		unsigned long private, enum migrate_mode mode, int reason);
> > > +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> > > +extern void putback_movable_page(struct page *page);
> > >
> > > extern int migrate_prep(void);
> > > extern int migrate_prep_local(void);
> >
> > given that some of the Movable users can be built as modules, shouldn't
> > at least some of those symbols be exported via EXPORT_SYMBOL?
>
> Those functions aim for VM compaction, so drivers shouldn't use them.
> The only ones a driver should be aware of are __SetPageMovable and
> __ClearPageMovable. I will export them.

and PageMovable(), probably?

	zsmalloc():  VM_BUG_ON_PAGE(!PageMovable(page), page);

> Thanks for the review, Sergey!

no prob! sorry, I'm a bit slow and late here.

	-ss
Re: [PATCH v5 02/12] mm: migrate: support non-lru movable page migration
On (05/09/16 11:20), Minchan Kim wrote:
[..]
> +++ b/include/linux/migrate.h
> @@ -32,11 +32,16 @@ extern char *migrate_reason_names[MR_TYPES];
>
> #ifdef CONFIG_MIGRATION
>
> +extern int PageMovable(struct page *page);
> +extern void __SetPageMovable(struct page *page, struct address_space *mapping);
> +extern void __ClearPageMovable(struct page *page);
> extern void putback_movable_pages(struct list_head *l);
> extern int migrate_page(struct address_space *,
> 			struct page *, struct page *, enum migrate_mode);
> extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
> 		unsigned long private, enum migrate_mode mode, int reason);
> +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> +extern void putback_movable_page(struct page *page);
>
> extern int migrate_prep(void);
> extern int migrate_prep_local(void);

given that some of the Movable users can be built as modules, shouldn't
at least some of those symbols be exported via EXPORT_SYMBOL?

	-ss
Re: [PATCH v5 02/12] mm: migrate: support non-lru movable page migration
On (05/09/16 11:20), Minchan Kim wrote:
> +++ b/include/linux/migrate.h
> @@ -32,11 +32,16 @@ extern char *migrate_reason_names[MR_TYPES];
>
> #ifdef CONFIG_MIGRATION
>
> +extern int PageMovable(struct page *page);
> +extern void __SetPageMovable(struct page *page, struct address_space *mapping);
> +extern void __ClearPageMovable(struct page *page);
> extern void putback_movable_pages(struct list_head *l);
> extern int migrate_page(struct address_space *,
> 			struct page *, struct page *, enum migrate_mode);
> extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
> 		unsigned long private, enum migrate_mode mode, int reason);
> +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> +extern void putback_movable_page(struct page *page);
>
> extern int migrate_prep(void);
> extern int migrate_prep_local(void);

__ClearPageMovable() is under CONFIG_MIGRATION in include/linux/migrate.h,
but zsmalloc checks for CONFIG_COMPACTION. Can we have stub declarations
of the movable functions for !CONFIG_MIGRATION builds? Otherwise the
users (zsmalloc, for example) have to do things like:

static void reset_page(struct page *page)
{
#ifdef CONFIG_COMPACTION
	__ClearPageMovable(page);
#endif
	clear_bit(PG_private, &page->flags);
	clear_bit(PG_private_2, &page->flags);
	set_page_private(page, 0);
	ClearPageHugeObject(page);
	page->freelist = NULL;
}

	-ss
Re: [PATCH v3 11/16] zsmalloc: separate free_zspage from putback_zspage
Hello Minchan,

On (04/19/16 16:51), Minchan Kim wrote:
[..]
> I guess it is something that remained after I rebased, to catch any
> mistake. But I'm heavily changing this part.
> Please review the next version instead of this one after a few days. :)

ah, got it. thanks!

	-ss
Re: [PATCH v3 08/16] zsmalloc: squeeze freelist into page->mapping
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
[..]
> +static void objidx_to_page_and_offset(struct size_class *class,
> +				struct page *first_page,
> +				unsigned long obj_idx,
> +				struct page **obj_page,
> +				unsigned long *offset_in_page)
> {
> -	unsigned long obj;
> +	int i;
> +	unsigned long offset;
> +	struct page *cursor;
> +	int nr_page;
>
> -	if (!page) {
> -		VM_BUG_ON(obj_idx);
> -		return NULL;
> -	}
> +	offset = obj_idx * class->size;

so we already know the `offset' before we call objidx_to_page_and_offset(),
thus we can drop `struct size_class *class' and `obj_idx', and pass
`long obj_offset' (which is `obj_idx * class->size') instead, right?
we also _may be_ can return `cursor' from the function:

static struct page *objidx_to_page_and_offset(struct page *first_page,
				unsigned long obj_offset,
				unsigned long *offset_in_page);

this can save ~20 instructions, which is not so terrible for a hot path
like obj_malloc(). what do you think?

well, it seems that `unsigned long *offset_in_page' can be calculated
outside of this function too; it's basically

	*offset_in_page = (obj_idx * class->size) & ~PAGE_MASK;

so we don't need to supply it to this function, nor modify it there.
which can save ~40 instructions on my system. does this sound silly?

	-ss

> +	cursor = first_page;
> +	nr_page = offset >> PAGE_SHIFT;
>
> -	obj = page_to_pfn(page) << OBJ_INDEX_BITS;
> -	obj |= ((obj_idx) & OBJ_INDEX_MASK);
> -	obj <<= OBJ_TAG_BITS;
> +	*offset_in_page = offset & ~PAGE_MASK;
> +
> +	for (i = 0; i < nr_page; i++)
> +		cursor = get_next_page(cursor);
>
> -	return (void *)obj;
> +	*obj_page = cursor;
> }

	-ss
Re: [PATCH v3 05/16] zsmalloc: keep max_object in size_class
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
>
> Every zspage in a size_class has same number of max objects so
> we could move it to a size_class.

Reviewed-by: Sergey Senozhatsky

	-ss
Re: [PATCH v3 09/16] zsmalloc: move struct zs_meta from mapping to freelist
Hello, On (03/30/16 16:12), Minchan Kim wrote: > For supporting migration from VM, we need to have address_space > on every page so zsmalloc shouldn't use page->mapping. So, > this patch moves zs_meta from mapping to freelist. > > Signed-off-by: Minchan Kim a small get_zspage_meta() helper would make this patch shorter :) Reviewed-by: Sergey Senozhatsky -ss > --- > mm/zsmalloc.c | 22 +++--- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 807998462539..d4d33a819832 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -29,7 +29,7 @@ > * Look at size_class->huge. > * page->lru: links together first pages of various zspages. > * Basically forming list of zspages in a fullness group. > - * page->mapping: override by struct zs_meta > + * page->freelist: override by struct zs_meta > * > * Usage of struct page flags: > * PG_private: identifies the first component page > @@ -418,7 +418,7 @@ static int get_zspage_inuse(struct page *first_page) > > VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); > > - m = (struct zs_meta *)&first_page->mapping; > + m = (struct zs_meta *)&first_page->freelist; > > return m->inuse; > } > @@ -429,7 +429,7 @@ static void set_zspage_inuse(struct page *first_page, int > val) > > VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); > > - m = (struct zs_meta *)&first_page->mapping; > + m = (struct zs_meta *)&first_page->freelist; > m->inuse = val; > } > > @@ -439,7 +439,7 @@ static void mod_zspage_inuse(struct page *first_page, int > val) > > VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); > > - m = (struct zs_meta *)&first_page->mapping; > + m = (struct zs_meta *)&first_page->freelist; > m->inuse += val; > } > > @@ -449,7 +449,7 @@ static void set_freeobj(struct page *first_page, int idx) > > VM_BUG_ON_PAGE(!is_first_page(first_page), first_page); > > - m = (struct zs_meta *)&first_page->mapping; > + m = (struct zs_meta *)&first_page->freelist; > m->freeobj = idx; > } > 
> @@ -459,7 +459,7 @@ static unsigned long get_freeobj(struct page *first_page)
>
>  	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>
> -	m = (struct zs_meta *)&first_page->mapping;
> +	m = (struct zs_meta *)&first_page->freelist;
>  	return m->freeobj;
>  }
>
> @@ -471,7 +471,7 @@ static void get_zspage_mapping(struct page *first_page,
>
>  	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>
> -	m = (struct zs_meta *)&first_page->mapping;
> +	m = (struct zs_meta *)&first_page->freelist;
>  	*fullness = m->fullness;
>  	*class_idx = m->class;
>  }
>
> @@ -484,7 +484,7 @@ static void set_zspage_mapping(struct page *first_page,
>
>  	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>
> -	m = (struct zs_meta *)&first_page->mapping;
> +	m = (struct zs_meta *)&first_page->freelist;
>  	m->fullness = fullness;
>  	m->class = class_idx;
>  }
>
> @@ -946,7 +946,6 @@ static void reset_page(struct page *page)
>  	clear_bit(PG_private, &page->flags);
>  	clear_bit(PG_private_2, &page->flags);
>  	set_page_private(page, 0);
> -	page->mapping = NULL;
>  	page->freelist = NULL;
>  }
>
> @@ -1056,6 +1055,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
>
>  		INIT_LIST_HEAD(&page->lru);
>  		if (i == 0) {	/* first page */
> +			page->freelist = NULL;
>  			SetPagePrivate(page);
>  			set_page_private(page, 0);
>  			first_page = page;
> @@ -2068,9 +2068,9 @@ static int __init zs_init(void)
>
>  	/*
>  	 * A zspage's a free object index, class index, fullness group,
> -	 * inuse object count are encoded in its (first)page->mapping
> +	 * inuse object count are encoded in its (first)page->freelist
>  	 * so sizeof(struct zs_meta) should be less than
> -	 * sizeof(page->mapping(i.e., unsigned long)).
> +	 * sizeof(page->freelist(i.e., void *)).
>  	 */
>  	BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
>
> --
> 1.9.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 07/16] zsmalloc: remove page_mapcount_reset
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
> We don't use page->_mapcount any more so no need to reset.
>
> Signed-off-by: Minchan Kim

Reviewed-by: Sergey Senozhatsky

	-ss

> ---
>  mm/zsmalloc.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 4dd72a803568..0f6cce9b9119 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -922,7 +922,6 @@ static void reset_page(struct page *page)
>  	set_page_private(page, 0);
>  	page->mapping = NULL;
>  	page->freelist = NULL;
> -	page_mapcount_reset(page);
>  }
>
>  static void free_zspage(struct page *first_page)
> --
> 1.9.1
Re: [PATCH v3 06/16] zsmalloc: squeeze inuse into page->mapping
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
[..]
> +static int get_zspage_inuse(struct page *first_page)
> +{
> +	struct zs_meta *m;
> +
> +	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> +
> +	m = (struct zs_meta *)&first_page->mapping;
..
> +static void set_zspage_inuse(struct page *first_page, int val)
> +{
> +	struct zs_meta *m;
> +
> +	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> +
> +	m = (struct zs_meta *)&first_page->mapping;
..
> +static void mod_zspage_inuse(struct page *first_page, int val)
> +{
> +	struct zs_meta *m;
> +
> +	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> +
> +	m = (struct zs_meta *)&first_page->mapping;
..
>  static void get_zspage_mapping(struct page *first_page,
>  				unsigned int *class_idx,
>  				enum fullness_group *fullness)
>  {
> -	unsigned long m;
> +	struct zs_meta *m;
> +
>  	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> +	m = (struct zs_meta *)&first_page->mapping;
..
>  static void set_zspage_mapping(struct page *first_page,
>  				unsigned int class_idx,
>  				enum fullness_group fullness)
>  {
> +	struct zs_meta *m;
> +
>  	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>
> +	m = (struct zs_meta *)&first_page->mapping;
> +	m->fullness = fullness;
> +	m->class = class_idx;
>  }

a nitpick: this

	struct zs_meta *m;

	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);

	m = (struct zs_meta *)&first_page->mapping;

pattern is common to several places; maybe it makes sense to factor it
out and turn it into a macro or a static inline helper?

other than that, looks good to me

Reviewed-by: Sergey Senozhatsky

	-ss
Re: [PATCH v3 11/16] zsmalloc: separate free_zspage from putback_zspage
Hello Minchan,

On (03/30/16 16:12), Minchan Kim wrote:
[..]
> @@ -1835,23 +1827,31 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
>  		if (!migrate_zspage(pool, class, &cc))
>  			break;
>
> -		putback_zspage(pool, class, dst_page);
> +		VM_BUG_ON_PAGE(putback_zspage(pool, class,
> +			dst_page) == ZS_EMPTY, dst_page);

can this VM_BUG_ON_PAGE() condition ever be true?

>  	}
>
>  	/* Stop if we couldn't find slot */
>  	if (dst_page == NULL)
>  		break;
>
> -	putback_zspage(pool, class, dst_page);
> -	if (putback_zspage(pool, class, src_page) == ZS_EMPTY)
> +	VM_BUG_ON_PAGE(putback_zspage(pool, class,
> +		dst_page) == ZS_EMPTY, dst_page);

hm... this VM_BUG_ON_PAGE(dst_page) is sort of confusing. under what
circumstances can it be true?

a minor nit, it took me some time (need some coffee I guess) to
correctly parse this macro wrapper:

	VM_BUG_ON_PAGE(putback_zspage(pool, class,
		dst_page) == ZS_EMPTY, dst_page);

maybe do it like:

	fullness = putback_zspage(pool, class, dst_page);
	VM_BUG_ON_PAGE(fullness == ZS_EMPTY, dst_page);

well, if we want to VM_BUG_ON_PAGE() at all. there haven't been any
problems with compaction; is there any specific reason these macros
were added?

> +	if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
>  		pool->stats.pages_compacted += class->pages_per_zspage;
> -		spin_unlock(&class->lock);
> +		spin_unlock(&class->lock);
> +		free_zspage(pool, class, src_page);

do we really need to free_zspage() out of class->lock? wouldn't
something like this

	if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
		pool->stats.pages_compacted += class->pages_per_zspage;
		free_zspage(pool, class, src_page);
	}
	spin_unlock(&class->lock);

be simpler? besides, free_zspage() now updates class stats out of
class lock, not critical but still.

	-ss

> +	} else {
> +		spin_unlock(&class->lock);
> +	}
> +
>  		cond_resched();
>  		spin_lock(&class->lock);
>  	}
>
>  	if (src_page)
> -		putback_zspage(pool, class, src_page);
> +		VM_BUG_ON_PAGE(putback_zspage(pool, class,
> +			src_page) == ZS_EMPTY, src_page);
>
>  	spin_unlock(&class->lock);
>  }
Re: [PATCH v3 10/16] zsmalloc: factor page chain functionality out
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
> @@ -1421,7 +1434,6 @@ static unsigned long obj_malloc(struct size_class *class,
>  	unsigned long m_offset;
>  	void *vaddr;
>
> -	handle |= OBJ_ALLOCATED_TAG;

a nitpick: why did you replace this single ALLOCATED_TAG assignment
with two 'handle | OBJ_ALLOCATED_TAG' uses?

	-ss

>  	obj = get_freeobj(first_page);
>  	objidx_to_page_and_offset(class, first_page, obj,
>  				&m_page, &m_offset);
> @@ -1431,10 +1443,10 @@ static unsigned long obj_malloc(struct size_class *class,
>  	set_freeobj(first_page, link->next >> OBJ_ALLOCATED_TAG);
>  	if (!class->huge)
>  		/* record handle in the header of allocated chunk */
> -		link->handle = handle;
> +		link->handle = handle | OBJ_ALLOCATED_TAG;
>  	else
>  		/* record handle in first_page->private */
> -		set_page_private(first_page, handle);
> +		set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
>  	kunmap_atomic(vaddr);
>  	mod_zspage_inuse(first_page, 1);
>  	zs_stat_inc(class, OBJ_USED, 1);
Re: [PATCH v3 00/16] Support non-lru page migration
On (03/30/16 16:11), Andrew Morton wrote:
[..]
> > For details, please read description in
> > "mm/compaction: support non-lru movable page migration".
>
> OK, I grabbed all these.
>
> I wonder about testing coverage during the -next period. How many
> people are likely to exercise these code paths in a serious way before
> it all hits mainline?

I'm hammering the zsmalloc part.

	-ss
Re: [PATCH v1 09/19] zsmalloc: keep max_object in size_class
On (03/11/16 16:30), Minchan Kim wrote:
> Every zspage in a size_class has the same number of max objects, so
> we can move it to the size_class.
>
> Signed-off-by: Minchan Kim
> ---
>  mm/zsmalloc.c | 29 ++++++++++++++---------------
>  1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index b4fb11831acb..ca663c82c1fc 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -32,8 +32,6 @@
>   * page->freelist: points to the first free object in zspage.
>   *	Free objects are linked together using in-place
>   *	metadata.
> - * page->objects: maximum number of objects we can store in this
> - *	zspage (class->zspage_order * PAGE_SIZE / class->size)
>   * page->lru: links together first pages of various zspages.
>   *	Basically forming list of zspages in a fullness group.
>   * page->mapping: class index and fullness group of the zspage
> @@ -211,6 +209,7 @@ struct size_class {
>  	 * of ZS_ALIGN.
>  	 */
>  	int size;
> +	int objs_per_zspage;
>  	unsigned int index;

struct page's ->objects "comes for free". now we don't use it, and
instead every size_class grows by 4 bytes? is there any reason for
this?

	-ss
Re: [PATCH v1 05/19] zsmalloc: use first_page rather than page
On (03/11/16 16:30), Minchan Kim wrote:
> This patch cleans up the "struct page" function parameter.
> Many functions of zsmalloc expect that the page parameter is
> "first_page", so use "first_page" rather than "page" for code
> readability.
>
> Signed-off-by: Minchan Kim

Reviewed-by: Sergey Senozhatsky

	-ss
Re: [PATCH v1 07/19] zsmalloc: reordering function parameter
On (03/11/16 16:30), Minchan Kim wrote:
> This patch cleans up function parameter ordering so that the
> higher-level data structure comes first.
>
> Signed-off-by: Minchan Kim

Reviewed-by: Sergey Senozhatsky

	-ss
Re: [PATCH v1 08/19] zsmalloc: remove unused pool param in obj_free
On (03/11/16 16:30), Minchan Kim wrote:
> Let's remove the unused pool param in obj_free.
>
> Signed-off-by: Minchan Kim

Reviewed-by: Sergey Senozhatsky

	-ss
Re: [PATCH v1 06/19] zsmalloc: clean up many BUG_ON
On (03/11/16 16:30), Minchan Kim wrote:
> There are many BUG_ONs in zsmalloc.c, which is not recommended, so
> change them to alternatives.
>
> Normal rules are as follows:
>
> 1. avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE
> 2. use VM_BUG_ON_PAGE if we need to see struct page's fields
> 3. use those assertions in primitive functions so higher functions
>    can rely on the assertion in the primitive function
> 4. don't use an assertion if the following instruction can trigger
>    an Oops
>
> Signed-off-by: Minchan Kim

Reviewed-by: Sergey Senozhatsky

	-ss
Re: [PATCH v1 19/19] zram: use __GFP_MOVABLE for memory allocation
On (03/11/16 16:30), Minchan Kim wrote:
[..]
> init
> Node 0, zone      DMA    208    120     51     41     11      0      0      0      0      0      0
> Node 0, zone    DMA32  16380  13777   9184   3805    789     54      3      0      0      0      0
>
> compaction
> Node 0, zone      DMA    132     82     40     39     16      2      1      0      0      0      0
> Node 0, zone    DMA32   5219   5526   4969   3455   1831    677    139     15      0      0      0
>
> new:
>
> init
> Node 0, zone      DMA    379    115     97     19      2      0      0      0      0      0      0
> Node 0, zone    DMA32  18891  16774  10862   3947    637     21      0      0      0      0      0
>
> compaction
> Node 0, zone      DMA    214     66     87     29     10      3      0      0      0      0      0
> Node 0, zone    DMA32   1612   3139   3154   2469   1745    990    384     94      7      0      0
>
> As you can see, compaction made so many high-order pages. Yay!
>
> Signed-off-by: Minchan Kim

Reviewed-by: Sergey Senozhatsky

	-ss
Re: [PATCH v1 11/19] zsmalloc: squeeze freelist into page->mapping
On (03/11/16 16:30), Minchan Kim wrote:
> -static void *location_to_obj(struct page *page, unsigned long obj_idx)
> +static void objidx_to_page_and_ofs(struct size_class *class,
> +				struct page *first_page,
> +				unsigned long obj_idx,
> +				struct page **obj_page,
> +				unsigned long *ofs_in_page)

this looks big; 5 params, a function "returning" both a page and an
offset... any chance to split it into two steps, perhaps?

besides, it is more intuitive (at least to me) when 'offset' is
shortened to 'offt', not 'ofs'.

	-ss

>  {
> -	unsigned long obj;
> +	int i;
> +	unsigned long ofs;
> +	struct page *cursor;
> +	int nr_page;
>
> -	if (!page) {
> -		VM_BUG_ON(obj_idx);
> -		return NULL;
> -	}
> +	ofs = obj_idx * class->size;
> +	cursor = first_page;
> +	nr_page = ofs >> PAGE_SHIFT;
>
> -	obj = page_to_pfn(page) << OBJ_INDEX_BITS;
> -	obj |= ((obj_idx) & OBJ_INDEX_MASK);
> -	obj <<= OBJ_TAG_BITS;
> +	*ofs_in_page = ofs & ~PAGE_MASK;
> +
> +	for (i = 0; i < nr_page; i++)
> +		cursor = get_next_page(cursor);
>
> -	return (void *)obj;
> +	*obj_page = cursor;
>  }