Re: netconsole deadlock with virtnet

2020-11-17 Thread Sergey Senozhatsky
On (20/11/18 11:46), Sergey Senozhatsky wrote:
[..]
> > Because I'm not sure where the xmit_lock is taken while holding the
> > target_list_lock.
> 
> I don't see where this happens. It seems to me that the report
> is not about a broken locking order, but more about:
> - a soft IRQ can be preempted (while holding _xmit_lock) by a hardware
>   interrupt, which will then attempt to acquire the same _xmit_lock.
> 
>    CPU0
>    <soft IRQ>
> virtnet_poll_tx()
>  __netif_tx_lock()
>   spin_lock(_xmit_lock)
>    <hard IRQ>
> add_interrupt_randomness()
>  crng_fast_load()
>   printk()
>call_console_drivers()
> spin_lock_irqsave(&target_list_lock)
>spin_lock(_xmit_lock);
> 
> Does this make sense?

Hmm, lockdep says something similar, but there are 2 printk() calls
happening - on both the local and remote CPUs.

[   21.149564]        CPU0                    CPU1
[   21.149565]        ----                    ----
[   21.149566]   lock(_xmit_ETHER#2);
[   21.149569]                            local_irq_disable();
[   21.149570]                            lock(console_owner);
[   21.149572]                            lock(target_list_lock);
[   21.149575]   <Interrupt>
[   21.149576]     lock(console_owner);

This CPU0 lock(_xmit_ETHER#2) -> hard IRQ -> lock(console_owner) is
basically
soft IRQ -> lock(_xmit_ETHER#2) -> hard IRQ -> printk()

Then CPU1 spins on the xmit lock, which is owned by CPU0, while CPU0
spins on console_owner, which is owned by CPU1?

A quick-and-dirty idea (it doesn't fix the lockdep report) - can we
add some sort of max_loops variable to console_trylock_spinning(),
so that it does not spin forever in `while (READ_ONCE(console_waiter))`
waiting for the console_owner to pass the lock?

-ss
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: netconsole deadlock with virtnet

2020-11-17 Thread Sergey Senozhatsky
On (20/11/17 09:33), Steven Rostedt wrote:
> > [   21.149601] IN-HARDIRQ-W at:
> > [   21.149602]  __lock_acquire+0xa78/0x1a94
> > [   21.149603]  lock_acquire.part.0+0x170/0x360
> > [   21.149604]  lock_acquire+0x68/0x8c
> > [   21.149605]  console_unlock+0x1e8/0x6a4
> > [   21.149606]  vprintk_emit+0x1c4/0x3c4
> > [   21.149607]  vprintk_default+0x40/0x4c
> > [   21.149608]  vprintk_func+0x10c/0x220
> > [   21.149610]  printk+0x68/0x90
> > [   21.149611]  crng_fast_load+0x1bc/0x1c0
> > [   21.149612]  add_interrupt_randomness+0x280/0x290
> > [   21.149613]  handle_irq_event+0x80/0x120
> > [   21.149614]  handle_fasteoi_irq+0xac/0x200
> > [   21.149615]  __handle_domain_irq+0x84/0xf0
> > [   21.149616]  gic_handle_irq+0xd4/0x320
> > [   21.149617]  el1_irq+0xd0/0x180
> > [   21.149618]  arch_cpu_idle+0x24/0x44
> > [   21.149619]  default_idle_call+0x48/0xa0
> > [   21.149620]  do_idle+0x260/0x300
> > [   21.149621]  cpu_startup_entry+0x30/0x6c
> > [   21.149622]  rest_init+0x1b4/0x288
> > [   21.149624]  arch_call_rest_init+0x18/0x24
> > [   21.149625]  start_kernel+0x5cc/0x608
> > [   21.149625] IN-SOFTIRQ-W at:
> > [   21.149627]  __lock_acquire+0x894/0x1a94
> > [   21.149628]  lock_acquire.part.0+0x170/0x360
> > [   21.149629]  lock_acquire+0x68/0x8c
> > [   21.149630]  console_unlock+0x1e8/0x6a4
> > [   21.149631]  vprintk_emit+0x1c4/0x3c4
> > [   21.149632]  vprintk_default+0x40/0x4c
> > [   21.149633]  vprintk_func+0x10c/0x220
> > [   21.149634]  printk+0x68/0x90
> > [   21.149635]  hrtimer_interrupt+0x290/0x294
> > [   21.149636]  arch_timer_handler_virt+0x3c/0x50
> > [   21.149637]  handle_percpu_devid_irq+0x94/0x164
> > [   21.149673]  __handle_domain_irq+0x84/0xf0
> > [   21.149674]  gic_handle_irq+0xd4/0x320
> > [   21.149675]  el1_irq+0xd0/0x180
> > [   21.149676]  __do_softirq+0x108/0x638
> > [   21.149677]  __irq_exit_rcu+0x17c/0x1b0
> > [   21.149678]  irq_exit+0x18/0x44
> > [   21.149679]  __handle_domain_irq+0x88/0xf0
> > [   21.149680]  gic_handle_irq+0xd4/0x320
> > [   21.149681]  el1_irq+0xd0/0x180
> > smp_call_function_many_cond+0x3cc/0x3f0
> > [   21.149683]  kick_all_cpus_sync+0x4c/0x80
> > [   21.149684]  load_module+0x1eec/0x2734
> > [   21.149685]  __do_sys_finit_module+0xbc/0x12c
> > [   21.149686]  __arm64_sys_finit_module+0x28/0x34
> > el0_svc_common.constprop.0+0x84/0x200
> > [   21.149688]  do_el0_svc+0x2c/0x90
> > [   21.149689]  el0_svc+0x18/0x50
> > [   21.149690]  el0_sync_handler+0xe0/0x350
> > [   21.149691]  el0_sync+0x158/0x180

[..]

> It really sucks that we lose 190 messages that would help to decipher this
> more. :-p

Indeed.

> Because I'm not sure where the xmit_lock is taken while holding the
> target_list_lock.

I don't see where this happens. It seems to me that the report
is not about a broken locking order, but more about:
- a soft IRQ can be preempted (while holding _xmit_lock) by a hardware
  interrupt, which will then attempt to acquire the same _xmit_lock.

   CPU0
   <soft IRQ>
virtnet_poll_tx()
 __netif_tx_lock()
  spin_lock(_xmit_lock)
   <hard IRQ>
add_interrupt_randomness()
 crng_fast_load()
  printk()
   call_console_drivers()
spin_lock_irqsave(&target_list_lock)
 spin_lock(_xmit_lock);

Does this make sense?

-ss


Re: [PATCH] drm/virtio: use kvmalloc for large allocations

2020-11-04 Thread Sergey Senozhatsky
Hi,

On (20/11/05 07:52), Gerd Hoffmann wrote:
> > -   *ents = kmalloc_array(*nents, sizeof(struct virtio_gpu_mem_entry),
> > - GFP_KERNEL);
> > +   *ents = kvmalloc_array(*nents,
> > +  sizeof(struct virtio_gpu_mem_entry),
> > +  GFP_KERNEL);
> 
> Shouldn't that be balanced with a kvfree() elsewhere?

I think it already is. The ents pointer is assigned to vbuf->data_buf,
and free_vbuf() already uses kvfree(vbuf->data_buf) to free it.

-ss


Re: [PATCH 15/15] block: Add FIXME comment to handle device_add_disk error

2016-08-17 Thread Sergey Senozhatsky
On (08/17/16 15:15), Fam Zheng wrote:
[..]
>   (
> rc = device_add_disk(e1, e2, e3);
>   |
>   + /* FIXME: handle error. */
> device_add_disk(e1, e2, e3);

or use __must_check for the device_add_disk() function?
/* which is __attribute__((warn_unused_result)) */

-ss


Re: [PATCH 11/15] zram: Pass attribute group to device_add_disk

2016-08-17 Thread Sergey Senozhatsky
On (08/18/16 10:59), Sergey Senozhatsky wrote:
[..]
> I like the previous "Error creating sysfs group for device" string better
> than "Error creating disk", because the latter is much less informative.
> 
> do you want to do something like below?
> 
> int device_add_disk(struct device *parent, struct gendisk *disk,
> ...
>if (attr_group) {
>retval = sysfs_create_group(&disk_to_dev(disk)->kobj,
>attr_group);
> 
> + pr_err("Error creating sysfs group for device ...\n", ...);

d'oh... a typo. pr_err() is meant to be in the `if (retval)' branch.

>if (retval)
>goto fail;
>}

-ss


Re: [PATCH 11/15] zram: Pass attribute group to device_add_disk

2016-08-17 Thread Sergey Senozhatsky
Hello,

On (08/17/16 15:15), Fam Zheng wrote:
[..]
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 20920a2..2331788 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1298,13 +1298,10 @@ static int zram_add(void)
>   zram->disk->queue->limits.discard_zeroes_data = 0;
>   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
>  
> - device_add_disk(NULL, zram->disk, NULL);
> + ret = device_add_disk(NULL, zram->disk, &zram_disk_attr_group);
>  
> - ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
> - &zram_disk_attr_group);
>   if (ret < 0) {
> - pr_err("Error creating sysfs group for device %d\n",
> - device_id);
> + pr_err("Error creating disk %d\n", device_id);
>   goto out_free_disk;
>   }
>   strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));

I like the previous "Error creating sysfs group for device" string better
than "Error creating disk", because the latter is much less informative.

do you want to do something like below?

int device_add_disk(struct device *parent, struct gendisk *disk,
...
   if (attr_group) {
   retval = sysfs_create_group(&disk_to_dev(disk)->kobj,
   attr_group);

+   pr_err("Error creating sysfs group for device ...\n", ...);

   if (retval)
   goto fail;
   }

-ss


Re: [PATCH v7 00/12] Support non-lru page migration

2016-06-16 Thread Sergey Senozhatsky
On (06/16/16 15:47), Minchan Kim wrote:
> > [..]
> > > > this is what I'm getting with the [zsmalloc: keep first object offset 
> > > > in struct page]
> > > > applied:  "count:0 mapcount:-127". which may not be related to zsmalloc
> > > > at this point.
> > > > 
> > > > kernel: BUG: Bad page state in process khugepaged  pfn:101db8
> > > > kernel: page:ea0004076e00 count:0 mapcount:-127 mapping:  
> > > > (null) index:0x1
> > > 
> > > Hm, it seems like a double free.
> > > 
> > > It doesn't happen if you disable zram? IOW, it seems to be related
> > > to zsmalloc migration?
> > 
> > need to test more, can't confidently answer now.
> > 
> > > How easy can you reproduce it? Could you bisect it?
> > 
> > it takes some (um.. random) time to trigger the bug.
> > I'll try to come up with more details.
> 
> Could you revert [1] and retest?
> 
> [1] mm/compaction: split freepages without holding the zone lock

ok, so this is not related to zsmalloc. finally managed to reproduce
it. will fork a separate thread.

-ss


Re: [PATCH v7 00/12] Support non-lru page migration

2016-06-15 Thread Sergey Senozhatsky
On (06/16/16 13:47), Minchan Kim wrote:
[..]
> > this is what I'm getting with the [zsmalloc: keep first object offset in 
> > struct page]
> > applied:  "count:0 mapcount:-127". which may not be related to zsmalloc
> > at this point.
> > 
> > kernel: BUG: Bad page state in process khugepaged  pfn:101db8
> > kernel: page:ea0004076e00 count:0 mapcount:-127 mapping:  
> > (null) index:0x1
> 
> Hm, it seems like a double free.
> 
> It doesn't happen if you disable zram? IOW, it seems to be related
> to zsmalloc migration?

need to test more, can't confidently answer now.

> How easy can you reproduce it? Could you bisect it?

it takes some (um.. random) time to trigger the bug.
I'll try to come up with more details.

-ss

> > kernel: flags: 0x8000()
> > kernel: page dumped because: nonzero mapcount
> > kernel: Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon 
> > crc32c_intel snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic r8169 mii 
> > snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd 
> > soundcore lpc_ich processor mfd_core sch_fq_codel sd_mod hid_generic usb
> > kernel: CPU: 3 PID: 38 Comm: khugepaged Not tainted 
> > 4.7.0-rc3-next-20160615-dbg-5-gfd11984-dirty #491
> > kernel:   8801124c73f8 814d69b0 ea0004076e00
> > kernel:  81e658a0 8801124c7420 811e9b63 
> > kernel:  ea0004076e00 81e658a0 8801124c7440 811e9ca9
> > kernel: Call Trace:
> > kernel:  [] dump_stack+0x68/0x92
> > kernel:  [] bad_page+0x158/0x1a2
> > kernel:  [] free_pages_check_bad+0xfc/0x101
> > kernel:  [] free_hot_cold_page+0x135/0x5de
> > kernel:  [] __free_pages+0x67/0x72
> > kernel:  [] release_freepages+0x13a/0x191
> > kernel:  [] compact_zone+0x845/0x1155
> > kernel:  [] ? compaction_suitable+0x76/0x76
> > kernel:  [] compact_zone_order+0xe0/0x167
> > kernel:  [] ? compact_zone+0x1155/0x1155
> > kernel:  [] try_to_compact_pages+0x2f1/0x648
> > kernel:  [] ? try_to_compact_pages+0x2f1/0x648
> > kernel:  [] ? compaction_zonelist_suitable+0x3a6/0x3a6
> > kernel:  [] ? get_page_from_freelist+0x2c0/0x133c
> > kernel:  [] __alloc_pages_direct_compact+0xea/0x30d
> > kernel:  [] ? get_page_from_freelist+0x133c/0x133c
> > kernel:  [] ? drain_all_pages+0x1d6/0x205
> > kernel:  [] __alloc_pages_nodemask+0x143d/0x16b6
> > kernel:  [] ? debug_show_all_locks+0x226/0x226
> > kernel:  [] ? warn_alloc_failed+0x24c/0x24c
> > kernel:  [] ? finish_wait+0x1a4/0x1b0
> > kernel:  [] ? lock_acquire+0xec/0x147
> > kernel:  [] ? _raw_spin_unlock_irqrestore+0x3b/0x5c
> > kernel:  [] ? _raw_spin_unlock_irqrestore+0x47/0x5c
> > kernel:  [] ? finish_wait+0x1a4/0x1b0
> > kernel:  [] khugepaged+0x1d4/0x484f
> > kernel:  [] ? hugepage_vma_revalidate+0xef/0xef
> > kernel:  [] ? finish_task_switch+0x3de/0x484
> > kernel:  [] ? _raw_spin_unlock_irq+0x27/0x45
> > kernel:  [] ? trace_hardirqs_on_caller+0x3d2/0x492
> > kernel:  [] ? prepare_to_wait_event+0x3f7/0x3f7
> > kernel:  [] ? __schedule+0xa4d/0xd16
> > kernel:  [] kthread+0x252/0x261
> > kernel:  [] ? hugepage_vma_revalidate+0xef/0xef
> > kernel:  [] ? kthread_create_on_node+0x377/0x377
> > kernel:  [] ret_from_fork+0x1f/0x40
> > kernel:  [] ? kthread_create_on_node+0x377/0x377
> > -- Reboot --


Re: [PATCH v7 00/12] Support non-lru page migration

2016-06-15 Thread Sergey Senozhatsky
On (06/16/16 11:58), Minchan Kim wrote:
[..]
> RAX: 2065676162726166 so rax is totally garbage, I think.
> It means obj_to_head returns garbage because get_first_obj_offset is
> utter crap because (page_idx / class->pages_per_zspage) was totally
> wrong.
> 
> > ^^
> > 6408:   f0 0f ba 28 00  lock btsl $0x0,(%rax)
>  
> 
> 
> > > Could you test with [zsmalloc: keep first object offset in struct page]
> > > in mmotm?
> > 
> > sure, I can.  will it help, tho? we have a race condition here I think.
> 
> I guess the root cause is get_first_obj_offset.

sounds reasonable.

> Please test with it.


this is what I'm getting with the [zsmalloc: keep first object offset in
struct page] applied:  "count:0 mapcount:-127". which may not be related
to zsmalloc at this point.

kernel: BUG: Bad page state in process khugepaged  pfn:101db8
kernel: page:ea0004076e00 count:0 mapcount:-127 mapping:  (null) 
index:0x1
kernel: flags: 0x8000()
kernel: page dumped because: nonzero mapcount
kernel: Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon 
crc32c_intel snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic r8169 mii 
snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd 
soundcore lpc_ich processor mfd_core sch_fq_codel sd_mod hid_generic usb
kernel: CPU: 3 PID: 38 Comm: khugepaged Not tainted 
4.7.0-rc3-next-20160615-dbg-5-gfd11984-dirty #491
kernel:   8801124c73f8 814d69b0 ea0004076e00
kernel:  81e658a0 8801124c7420 811e9b63 
kernel:  ea0004076e00 81e658a0 8801124c7440 811e9ca9
kernel: Call Trace:
kernel:  [] dump_stack+0x68/0x92
kernel:  [] bad_page+0x158/0x1a2
kernel:  [] free_pages_check_bad+0xfc/0x101
kernel:  [] free_hot_cold_page+0x135/0x5de
kernel:  [] __free_pages+0x67/0x72
kernel:  [] release_freepages+0x13a/0x191
kernel:  [] compact_zone+0x845/0x1155
kernel:  [] ? compaction_suitable+0x76/0x76
kernel:  [] compact_zone_order+0xe0/0x167
kernel:  [] ? compact_zone+0x1155/0x1155
kernel:  [] try_to_compact_pages+0x2f1/0x648
kernel:  [] ? try_to_compact_pages+0x2f1/0x648
kernel:  [] ? compaction_zonelist_suitable+0x3a6/0x3a6
kernel:  [] ? get_page_from_freelist+0x2c0/0x133c
kernel:  [] __alloc_pages_direct_compact+0xea/0x30d
kernel:  [] ? get_page_from_freelist+0x133c/0x133c
kernel:  [] ? drain_all_pages+0x1d6/0x205
kernel:  [] __alloc_pages_nodemask+0x143d/0x16b6
kernel:  [] ? debug_show_all_locks+0x226/0x226
kernel:  [] ? warn_alloc_failed+0x24c/0x24c
kernel:  [] ? finish_wait+0x1a4/0x1b0
kernel:  [] ? lock_acquire+0xec/0x147
kernel:  [] ? _raw_spin_unlock_irqrestore+0x3b/0x5c
kernel:  [] ? _raw_spin_unlock_irqrestore+0x47/0x5c
kernel:  [] ? finish_wait+0x1a4/0x1b0
kernel:  [] khugepaged+0x1d4/0x484f
kernel:  [] ? hugepage_vma_revalidate+0xef/0xef
kernel:  [] ? finish_task_switch+0x3de/0x484
kernel:  [] ? _raw_spin_unlock_irq+0x27/0x45
kernel:  [] ? trace_hardirqs_on_caller+0x3d2/0x492
kernel:  [] ? prepare_to_wait_event+0x3f7/0x3f7
kernel:  [] ? __schedule+0xa4d/0xd16
kernel:  [] kthread+0x252/0x261
kernel:  [] ? hugepage_vma_revalidate+0xef/0xef
kernel:  [] ? kthread_create_on_node+0x377/0x377
kernel:  [] ret_from_fork+0x1f/0x40
kernel:  [] ? kthread_create_on_node+0x377/0x377
-- Reboot --

-ss


Re: [PATCH v7 00/12] Support non-lru page migration

2016-06-15 Thread Sergey Senozhatsky
Hi,

On (06/16/16 08:12), Minchan Kim wrote:
> > [  315.146533] kasan: CONFIG_KASAN_INLINE enabled
> > [  315.146538] kasan: GPF could be caused by NULL-ptr deref or user memory 
> > access
> > [  315.146546] general protection fault:  [#1] PREEMPT SMP KASAN
> > [  315.146576] Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon 
> > crc32c_intel r8169 i2c_i801 mii snd_hda_codec_realtek snd_hda_codec_generic 
> > snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd 
> > soundcore lpc_ich mfd_core processor sch_fq_codel sd_mod hid_generic usbhid 
> > hid ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common
> > [  315.146785] CPU: 3 PID: 38 Comm: khugepaged Not tainted 
> > 4.7.0-rc3-next-20160614-dbg-4-ga1c2cbc-dirty #488
> > [  315.146841] task: 8800bfaf2900 ti: 880112468000 task.ti: 
> > 880112468000
> > [  315.146859] RIP: 0010:[]  [] 
> > zs_page_migrate+0x355/0xaa0 [zsmalloc]
> 
> Thanks for the report!
> 
> zs_page_migrate+0x355? Could you tell me what line is it?
> 
> It seems to be related to obj_to_head.

reproduced. a bit different call stack this time. but the problem is
still the same.

zs_compact()
...
6371:   e8 00 00 00 00  callq  6376 
6376:   0f 0b   ud2
6378:   48 8b 95 a8 fe ff ffmov-0x158(%rbp),%rdx
637f:   4d 8d 74 24 78  lea0x78(%r12),%r14
6384:   4c 89 eemov%r13,%rsi
6387:   4c 89 e7mov%r12,%rdi
638a:   e8 86 c7 ff ff  callq  2b15 
638f:   41 89 c5mov%eax,%r13d
6392:   4c 89 f0mov%r14,%rax
6395:   48 c1 e8 03 shr$0x3,%rax
6399:   8a 04 18mov(%rax,%rbx,1),%al
639c:   84 c0   test   %al,%al
639e:   0f 85 f2 02 00 00   jne6696 
63a4:   41 8b 44 24 78  mov0x78(%r12),%eax
63a9:   41 0f af c7 imul   %r15d,%eax
63ad:   41 01 c5add%eax,%r13d
63b0:   4c 89 f0mov%r14,%rax
63b3:   48 c1 e8 03 shr$0x3,%rax
63b7:   48 01 d8add%rbx,%rax
63ba:   48 89 85 88 fe ff ffmov%rax,-0x178(%rbp)
63c1:   41 81 fd ff 0f 00 00cmp$0xfff,%r13d
63c8:   0f 87 1a 03 00 00   ja 66e8 
63ce:   49 63 f5movslq %r13d,%rsi
63d1:   48 03 b5 98 fe ff ffadd-0x168(%rbp),%rsi
63d8:   48 8b bd a8 fe ff ffmov-0x158(%rbp),%rdi
63df:   e8 67 d9 ff ff  callq  3d4b 
63e4:   a8 01   test   $0x1,%al
63e6:   0f 84 d9 02 00 00   je 66c5 
63ec:   48 83 e0 fe and$0xfffe,%rax
63f0:   bf 01 00 00 00  mov$0x1,%edi
63f5:   48 89 85 b0 fe ff ffmov%rax,-0x150(%rbp)
63fc:   e8 00 00 00 00  callq  6401 
6401:   48 8b 85 b0 fe ff ffmov-0x150(%rbp),%rax
^^
6408:   f0 0f ba 28 00  lock btsl $0x0,(%rax)
640d:   0f 82 98 02 00 00   jb 66ab 
6413:   48 8b 85 10 fe ff ffmov-0x1f0(%rbp),%rax
641a:   48 8d b8 48 10 00 00lea0x1048(%rax),%rdi
6421:   48 89 f8mov%rdi,%rax
6424:   48 c1 e8 03 shr$0x3,%rax
6428:   8a 04 18mov(%rax,%rbx,1),%al
642b:   84 c0   test   %al,%al
642d:   0f 85 c5 02 00 00   jne66f8 
6433:   48 8b 85 10 fe ff ffmov-0x1f0(%rbp),%rax
643a:   65 4c 8b 2c 25 00 00mov%gs:0x0,%r13
6441:   00 00 
6443:   49 8d bd 48 10 00 00lea0x1048(%r13),%rdi
644a:   ff 88 48 10 00 00   decl   0x1048(%rax)
6450:   48 89 f8mov%rdi,%rax
6453:   48 c1 e8 03 shr$0x3,%rax
6457:   8a 04 18mov(%rax,%rbx,1),%al
645a:   84 c0   test   %al,%al
645c:   0f 85 a8 02 00 00   jne670a 
6462:   41 83 bd 48 10 00 00cmpl   $0x0,0x1048(%r13)


which is

_next/./arch/x86/include/asm/bitops.h:206
_next/./arch/x86/include/asm/bitops.h:219
_next/include/linux/bit_spinlock.h:44
_next/mm/zsmalloc.c:950
_next/mm/zsmalloc.c:1774
_next/mm/zsmalloc.c:1809
_next/mm/zsmalloc.c:2306
_next/mm/zsmalloc.c:2346


smells like a race condition.



backtraces:

[  319.363646] kasan: CONFIG_KASAN_INLINE enabled
[  319.363650] kasan: GPF could be caused by NULL-ptr deref or user memory 
access
[  319.363658] general protection fault:  [#1] PREEMPT SMP KASAN
[  319.363688] Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon 
crc32c_intel snd_hda_codec_realtek snd_hda_codec_generic r

Re: [PATCH v7 00/12] Support non-lru page migration

2016-06-15 Thread Sergey Senozhatsky
Hello Minchan,

-next 4.7.0-rc3-next-20160614


[  315.146533] kasan: CONFIG_KASAN_INLINE enabled
[  315.146538] kasan: GPF could be caused by NULL-ptr deref or user memory 
access
[  315.146546] general protection fault:  [#1] PREEMPT SMP KASAN
[  315.146576] Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon 
crc32c_intel r8169 i2c_i801 mii snd_hda_codec_realtek snd_hda_codec_generic 
snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd 
soundcore lpc_ich mfd_core processor sch_fq_codel sd_mod hid_generic usbhid hid 
ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common
[  315.146785] CPU: 3 PID: 38 Comm: khugepaged Not tainted 
4.7.0-rc3-next-20160614-dbg-4-ga1c2cbc-dirty #488
[  315.146841] task: 8800bfaf2900 ti: 880112468000 task.ti: 
880112468000
[  315.146859] RIP: 0010:[]  [] 
zs_page_migrate+0x355/0xaa0 [zsmalloc]
[  315.146892] RSP: :88011246f138  EFLAGS: 00010293
[  315.146906] RAX: 736761742d6f6e2c RBX: 880017ad9a80 RCX: 
[  315.146924] RDX: 1064d704 RSI: 88000511469a RDI: 8326ba20
[  315.146942] RBP: 88011246f328 R08: 0001 R09: 
[  315.146959] R10: 88011246f0a8 R11: 8800bfc07fff R12: 88011246f300
[  315.146977] R13: ed0015523e6f R14: 8800aa91f378 R15: ea144500
[  315.146995] FS:  () GS:88011378() 
knlGS:
[  315.147015] CS:  0010 DS:  ES:  CR0: 80050033
[  315.147030] CR2: 7f3f97911000 CR3: 02209000 CR4: 06e0
[  315.147046] Stack:
[  315.147052]  110015523e0f 88011246f240 880005116800 
00017f80e000
[  315.147083]  880017ad9aa8 736761742d6f6e2c 11002248de34 
880017ad9a90
[  315.147113]  069a1246f660 069a 880005114000 
ea0002ff0180
[  315.147143] Call Trace:
[  315.147154]  [] ? obj_to_head+0x9d/0x9d [zsmalloc]
[  315.147175]  [] ? _raw_spin_unlock_irqrestore+0x47/0x5c
[  315.147195]  [] ? isolate_freepages_block+0x2f9/0x5a6
[  315.147213]  [] ? kasan_poison_shadow+0x2f/0x31
[  315.147230]  [] ? kasan_alloc_pages+0x39/0x3b
[  315.147246]  [] ? map_pages+0x1f3/0x3ad
[  315.147262]  [] ? update_pageblock_skip+0x18d/0x18d
[  315.147280]  [] ? up_read+0x1a/0x30
[  315.147296]  [] ? debug_check_no_locks_freed+0x150/0x22b
[  315.147315]  [] move_to_new_page+0x4dd/0x615
[  315.147332]  [] ? migrate_page+0x75/0x75
[  315.147347]  [] ? isolate_freepages_block+0x5a6/0x5a6
[  315.147366]  [] migrate_pages+0xadd/0x131a
[  315.147382]  [] ? isolate_freepages_block+0x5a6/0x5a6
[  315.147399]  [] ? kzfree+0x2b/0x2b
[  315.147414]  [] ? buffer_migrate_page+0x2db/0x2db
[  315.147431]  [] compact_zone+0xcdb/0x1155
[  315.147448]  [] ? compaction_suitable+0x76/0x76
[  315.147465]  [] compact_zone_order+0xe0/0x167
[  315.147481]  [] ? debug_show_all_locks+0x226/0x226
[  315.147499]  [] ? compact_zone+0x1155/0x1155
[  315.147515]  [] ? finish_task_switch+0x3de/0x484
[  315.147533]  [] try_to_compact_pages+0x2f1/0x648
[  315.147550]  [] ? try_to_compact_pages+0x2f1/0x648
[  315.147568]  [] ? compaction_zonelist_suitable+0x3a6/0x3a6
[  315.147589]  [] ? get_page_from_freelist+0x2c0/0x129a
[  315.147608]  [] __alloc_pages_direct_compact+0xea/0x30d
[  315.147626]  [] ? get_page_from_freelist+0x129a/0x129a
[  315.147645]  [] __alloc_pages_nodemask+0x840/0x16b6
[  315.147663]  [] ? try_to_wake_up+0x696/0x6c8
[  315.149147]  [] ? warn_alloc_failed+0x226/0x226
[  315.150615]  [] ? wake_up_process+0x10/0x12
[  315.152078]  [] ? wake_up_q+0x89/0xa7
[  315.153539]  [] ? rwsem_wake+0x131/0x15c
[  315.155007]  [] ? khugepaged+0x4072/0x484f
[  315.156471]  [] khugepaged+0x1d4/0x484f
[  315.157940]  [] ? hugepage_vma_revalidate+0xef/0xef
[  315.159402]  [] ? finish_task_switch+0x3de/0x484
[  315.160870]  [] ? _raw_spin_unlock_irq+0x27/0x45
[  315.162341]  [] ? trace_hardirqs_on_caller+0x3d2/0x492
[  315.163814]  [] ? prepare_to_wait_event+0x3f7/0x3f7
[  315.165295]  [] ? __schedule+0xa4d/0xd16
[  315.166763]  [] kthread+0x252/0x261
[  315.168214]  [] ? hugepage_vma_revalidate+0xef/0xef
[  315.169646]  [] ? kthread_create_on_node+0x377/0x377
[  315.171056]  [] ret_from_fork+0x1f/0x40
[  315.172462]  [] ? kthread_create_on_node+0x377/0x377
[  315.173869] Code: 03 b5 60 fe ff ff e8 2e fc ff ff a8 01 74 4c 48 83 e0 fe 
bf 01 00 00 00 48 89 85 38 fe ff ff e8 41 18 e1 e0 48 8b 85 38 fe ff ff  0f 
ba 28 00 73 29 bf 01 00 00 00 41 bc f5 ff ff ff e8 ea 27 
[  315.175573] RIP  [] zs_page_migrate+0x355/0xaa0 [zsmalloc]
[  315.177084]  RSP 
[  315.186572] ---[ end trace 0962b8ee48c98bbc ]---




[  315.186577] BUG: sleeping function called from invalid context at 
include/linux/sched.h:2960
[  315.186580] in_atomic(): 1, irqs_disabled(): 0, pid: 38, name: khugepaged
[  315.186581] INFO: lockdep is turned off.
[  315.186583] Preemption disabled at:[] 
zs_page_migrate+0x135/0xaa0 [zsmalloc]

[  315.186594] CPU: 3 PID: 38 Comm: khugepaged 

Re: [PATCH v5 02/12] mm: migrate: support non-lru movable page migration

2016-05-16 Thread Sergey Senozhatsky
On (05/17/16 10:18), Minchan Kim wrote:
[..]
> > >  #ifdef CONFIG_MIGRATION
> > >  
> > > +extern int PageMovable(struct page *page);
> > > +extern void __SetPageMovable(struct page *page, struct address_space 
> > > *mapping);
> > > +extern void __ClearPageMovable(struct page *page);
> > >  extern void putback_movable_pages(struct list_head *l);
> > >  extern int migrate_page(struct address_space *,
> > >   struct page *, struct page *, enum migrate_mode);
> > >  extern int migrate_pages(struct list_head *l, new_page_t new, 
> > > free_page_t free,
> > >   unsigned long private, enum migrate_mode mode, int reason);
> > > +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> > > +extern void putback_movable_page(struct page *page);
> > >  
> > >  extern int migrate_prep(void);
> > >  extern int migrate_prep_local(void);
> > 
> > given that some of the Movable users can be built as modules, shouldn't
> > at least some of those symbols be exported via EXPORT_SYMBOL?
> 
> Those functions aim for VM compaction so a driver shouldn't use them.
> The only ones a driver should be aware of are __SetPageMovable and __ClearPageMovable.
> I will export them.
> 

and PageMovable(), probably?

zsmalloc():

VM_BUG_ON_PAGE(!PageMovable(page), page);


> Thanks for the review, Sergey!

no prob! sorry, I'm a bit slow and late here.

-ss


Re: [PATCH v5 02/12] mm: migrate: support non-lru movable page migration

2016-05-16 Thread Sergey Senozhatsky
On (05/09/16 11:20), Minchan Kim wrote:
[..]
> +++ b/include/linux/migrate.h
> @@ -32,11 +32,16 @@ extern char *migrate_reason_names[MR_TYPES];
>  
>  #ifdef CONFIG_MIGRATION
>  
> +extern int PageMovable(struct page *page);
> +extern void __SetPageMovable(struct page *page, struct address_space 
> *mapping);
> +extern void __ClearPageMovable(struct page *page);
>  extern void putback_movable_pages(struct list_head *l);
>  extern int migrate_page(struct address_space *,
>   struct page *, struct page *, enum migrate_mode);
>  extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t 
> free,
>   unsigned long private, enum migrate_mode mode, int reason);
> +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> +extern void putback_movable_page(struct page *page);
>  
>  extern int migrate_prep(void);
>  extern int migrate_prep_local(void);

given that some of the Movable users can be built as modules, shouldn't
at least some of those symbols be exported via EXPORT_SYMBOL?

-ss


Re: [PATCH v5 02/12] mm: migrate: support non-lru movable page migration

2016-05-16 Thread Sergey Senozhatsky
On (05/09/16 11:20), Minchan Kim wrote:
> +++ b/include/linux/migrate.h
> @@ -32,11 +32,16 @@ extern char *migrate_reason_names[MR_TYPES];
>  
>  #ifdef CONFIG_MIGRATION
>  
> +extern int PageMovable(struct page *page);
> +extern void __SetPageMovable(struct page *page, struct address_space 
> *mapping);
> +extern void __ClearPageMovable(struct page *page);
>  extern void putback_movable_pages(struct list_head *l);
>  extern int migrate_page(struct address_space *,
>   struct page *, struct page *, enum migrate_mode);
>  extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t 
> free,
>   unsigned long private, enum migrate_mode mode, int reason);
> +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> +extern void putback_movable_page(struct page *page);
>  
>  extern int migrate_prep(void);
>  extern int migrate_prep_local(void);

__ClearPageMovable() is under CONFIG_MIGRATION in include/linux/migrate.h,
but zsmalloc checks for CONFIG_COMPACTION.

can we have stub declarations of movable functions for !CONFIG_MIGRATION builds?
otherwise the users (zsmalloc, for example) have to do things like

static void reset_page(struct page *page)
{
#ifdef CONFIG_COMPACTION
__ClearPageMovable(page);
#endif
clear_bit(PG_private, &page->flags);
clear_bit(PG_private_2, &page->flags);
set_page_private(page, 0);
ClearPageHugeObject(page);
page->freelist = NULL;
}

-ss


Re: [PATCH v3 11/16] zsmalloc: separate free_zspage from putback_zspage

2016-04-19 Thread Sergey Senozhatsky
Hello Minchan,

On (04/19/16 16:51), Minchan Kim wrote:
[..]
> 
> I guess it is remained thing after I rebased to catch any mistake.
> But I'm heavily chainging this part.
> Please review next version instead of this after a few days. :)

ah, got it. thanks!

-ss


Re: [PATCH v3 08/16] zsmalloc: squeeze freelist into page->mapping

2016-04-18 Thread Sergey Senozhatsky
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
[..]
> +static void objidx_to_page_and_offset(struct size_class *class,
> + struct page *first_page,
> + unsigned long obj_idx,
> + struct page **obj_page,
> + unsigned long *offset_in_page)
>  {
> - unsigned long obj;
> + int i;
> + unsigned long offset;
> + struct page *cursor;
> + int nr_page;
>  
> - if (!page) {
> - VM_BUG_ON(obj_idx);
> - return NULL;
> - }
> + offset = obj_idx * class->size;

so we already know the `offset' before we call objidx_to_page_and_offset(),
thus we can drop `struct size_class *class' and `obj_idx', and pass
`long obj_offset'  (which is `obj_idx * class->size') instead, right?

we may also be able to return `cursor' from the function.

static struct page *objidx_to_page_and_offset(struct page *first_page,
unsigned long obj_offset,
unsigned long *offset_in_page);

this can save ~20 instructions, which is not so terrible for a hot path
like obj_malloc(). what do you think?

well, it seems that `unsigned long *offset_in_page' can be calculated
outside of this function too; it's basically

*offset_in_page = (obj_idx * class->size) & ~PAGE_MASK;

so we don't need to supply it to this function, nor modify it there.
which can save ~40 instructions on my system. does this sound silly?

-ss

> + cursor = first_page;
> + nr_page = offset >> PAGE_SHIFT;
>  
> - obj = page_to_pfn(page) << OBJ_INDEX_BITS;
> - obj |= ((obj_idx) & OBJ_INDEX_MASK);
> - obj <<= OBJ_TAG_BITS;
> + *offset_in_page = offset & ~PAGE_MASK;
> +
> + for (i = 0; i < nr_page; i++)
> + cursor = get_next_page(cursor);
>  
> - return (void *)obj;
> + *obj_page = cursor;
>  }

-ss


Re: [PATCH v3 05/16] zsmalloc: keep max_object in size_class

2016-04-18 Thread Sergey Senozhatsky
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
> 
> Every zspage in a size_class has same number of max objects so
> we could move it to a size_class.
> 

Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH v3 09/16] zsmalloc: move struct zs_meta from mapping to freelist

2016-04-18 Thread Sergey Senozhatsky
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
> For supporting migration from VM, we need to have address_space
> on every page so zsmalloc shouldn't use page->mapping. So,
> this patch moves zs_meta from mapping to freelist.
> 
> Signed-off-by: Minchan Kim 

a small get_zspage_meta() helper would make this patch shorter :)

Reviewed-by: Sergey Senozhatsky 

-ss

> ---
>  mm/zsmalloc.c | 22 +++---
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 807998462539..d4d33a819832 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -29,7 +29,7 @@
>   *   Look at size_class->huge.
>   *   page->lru: links together first pages of various zspages.
>   *   Basically forming list of zspages in a fullness group.
> - *   page->mapping: override by struct zs_meta
> + *   page->freelist: override by struct zs_meta
>   *
>   * Usage of struct page flags:
>   *   PG_private: identifies the first component page
> @@ -418,7 +418,7 @@ static int get_zspage_inuse(struct page *first_page)
>  
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>  
> - m = (struct zs_meta *)&first_page->mapping;
> + m = (struct zs_meta *)&first_page->freelist;
>  
>   return m->inuse;
>  }
> @@ -429,7 +429,7 @@ static void set_zspage_inuse(struct page *first_page, int 
> val)
>  
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>  
> - m = (struct zs_meta *)&first_page->mapping;
> + m = (struct zs_meta *)&first_page->freelist;
>   m->inuse = val;
>  }
>  
> @@ -439,7 +439,7 @@ static void mod_zspage_inuse(struct page *first_page, int 
> val)
>  
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>  
> - m = (struct zs_meta *)&first_page->mapping;
> + m = (struct zs_meta *)&first_page->freelist;
>   m->inuse += val;
>  }
>  
> @@ -449,7 +449,7 @@ static void set_freeobj(struct page *first_page, int idx)
>  
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>  
> - m = (struct zs_meta *)&first_page->mapping;
> + m = (struct zs_meta *)&first_page->freelist;
>   m->freeobj = idx;
>  }
>  
> @@ -459,7 +459,7 @@ static unsigned long get_freeobj(struct page *first_page)
>  
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>  
> - m = (struct zs_meta *)&first_page->mapping;
> + m = (struct zs_meta *)&first_page->freelist;
>   return m->freeobj;
>  }
>  
> @@ -471,7 +471,7 @@ static void get_zspage_mapping(struct page *first_page,
>  
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>  
> - m = (struct zs_meta *)&first_page->mapping;
> + m = (struct zs_meta *)&first_page->freelist;
>   *fullness = m->fullness;
>   *class_idx = m->class;
>  }
> @@ -484,7 +484,7 @@ static void set_zspage_mapping(struct page *first_page,
>  
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>  
> - m = (struct zs_meta *)&first_page->mapping;
> + m = (struct zs_meta *)&first_page->freelist;
>   m->fullness = fullness;
>   m->class = class_idx;
>  }
> @@ -946,7 +946,6 @@ static void reset_page(struct page *page)
>   clear_bit(PG_private, &page->flags);
>   clear_bit(PG_private_2, &page->flags);
>   set_page_private(page, 0);
> - page->mapping = NULL;
>   page->freelist = NULL;
>  }
>  
> @@ -1056,6 +1055,7 @@ static struct page *alloc_zspage(struct size_class 
> *class, gfp_t flags)
>  
>   INIT_LIST_HEAD(&page->lru);
>   if (i == 0) {   /* first page */
> + page->freelist = NULL;
>   SetPagePrivate(page);
>   set_page_private(page, 0);
>   first_page = page;
> @@ -2068,9 +2068,9 @@ static int __init zs_init(void)
>  
>   /*
>* A zspage's a free object index, class index, fullness group,
> -  * inuse object count are encoded in its (first)page->mapping
> +  * inuse object count are encoded in its (first)page->freelist
>* so sizeof(struct zs_meta) should be less than
> -  * sizeof(page->mapping(i.e., unsigned long)).
> +  * sizeof(page->freelist(i.e., void *)).
>*/
>   BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
>  
> -- 
> 1.9.1
> 


Re: [PATCH v3 07/16] zsmalloc: remove page_mapcount_reset

2016-04-18 Thread Sergey Senozhatsky
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
> We don't use page->_mapcount any more so no need to reset.
> 
> Signed-off-by: Minchan Kim 

Reviewed-by: Sergey Senozhatsky 

-ss

> ---
>  mm/zsmalloc.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 4dd72a803568..0f6cce9b9119 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -922,7 +922,6 @@ static void reset_page(struct page *page)
>   set_page_private(page, 0);
>   page->mapping = NULL;
>   page->freelist = NULL;
> - page_mapcount_reset(page);
>  }
>  
>  static void free_zspage(struct page *first_page)
> -- 
> 1.9.1
> 


Re: [PATCH v3 06/16] zsmalloc: squeeze inuse into page->mapping

2016-04-18 Thread Sergey Senozhatsky
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
[..]
> +static int get_zspage_inuse(struct page *first_page)
> +{
> + struct zs_meta *m;
> +
> + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> +
> + m = (struct zs_meta *)&first_page->mapping;
..
> +static void set_zspage_inuse(struct page *first_page, int val)
> +{
> + struct zs_meta *m;
> +
> + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> +
> + m = (struct zs_meta *)&first_page->mapping;
..
> +static void mod_zspage_inuse(struct page *first_page, int val)
> +{
> + struct zs_meta *m;
> +
> + VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> +
> + m = (struct zs_meta *)&first_page->mapping;
..
>  static void get_zspage_mapping(struct page *first_page,
>   unsigned int *class_idx,
>   enum fullness_group *fullness)
>  {
> - unsigned long m;
> + struct zs_meta *m;
> +
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
> + m = (struct zs_meta *)&first_page->mapping;
..
>  static void set_zspage_mapping(struct page *first_page,
>   unsigned int class_idx,
>   enum fullness_group fullness)
>  {
> + struct zs_meta *m;
> +
>   VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
>  
> + m = (struct zs_meta *)&first_page->mapping;
> + m->fullness = fullness;
> + m->class = class_idx;
>  }


a nitpick: this

struct zs_meta *m;
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
m = (struct zs_meta *)&first_page->mapping;


seems to be common in several places; maybe it makes sense to
factor it out and turn it into a macro or a static inline helper?

other than that, looks good to me

Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH v3 11/16] zsmalloc: separate free_zspage from putback_zspage

2016-04-17 Thread Sergey Senozhatsky
Hello Minchan,

On (03/30/16 16:12), Minchan Kim wrote:
[..]
> @@ -1835,23 +1827,31 @@ static void __zs_compact(struct zs_pool *pool, struct 
> size_class *class)
>   if (!migrate_zspage(pool, class, &cc))
>   break;
>  
> - putback_zspage(pool, class, dst_page);
> + VM_BUG_ON_PAGE(putback_zspage(pool, class,
> + dst_page) == ZS_EMPTY, dst_page);

can this VM_BUG_ON_PAGE() condition ever be true?

>   }
>   /* Stop if we couldn't find slot */
>   if (dst_page == NULL)
>   break;
> - putback_zspage(pool, class, dst_page);
> - if (putback_zspage(pool, class, src_page) == ZS_EMPTY)
> + VM_BUG_ON_PAGE(putback_zspage(pool, class,
> + dst_page) == ZS_EMPTY, dst_page);

hm... this VM_BUG_ON_PAGE(dst_page) is sort of confusing. under what
circumstances can it be true?

a minor nit: it took me some time (need some coffee I guess) to
correctly parse this macro wrapper

VM_BUG_ON_PAGE(putback_zspage(pool, class,
dst_page) == ZS_EMPTY, dst_page);

may be do it like:
fullness = putback_zspage(pool, class, dst_page);
VM_BUG_ON_PAGE(fullness == ZS_EMPTY, dst_page);


well, if we want to VM_BUG_ON_PAGE() at all. there haven't been any
problems with compaction; is there any specific reason these macros
were added?



> + if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
>   pool->stats.pages_compacted += class->pages_per_zspage;
> - spin_unlock(&class->lock);
> + spin_unlock(&class->lock);
> + free_zspage(pool, class, src_page);

do we really need to free_zspage() outside of class->lock?
wouldn't something like this

if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
pool->stats.pages_compacted += class->pages_per_zspage;
free_zspage(pool, class, src_page);
}
spin_unlock(&class->lock);

be simpler?

besides, free_zspage() now updates class stats outside of the class
lock; not critical, but still.

-ss

> + } else {
> + spin_unlock(&class->lock);
> + }
> +
>   cond_resched();
>   spin_lock(&class->lock);
>   }
>  
>   if (src_page)
> - putback_zspage(pool, class, src_page);
> + VM_BUG_ON_PAGE(putback_zspage(pool, class,
> + src_page) == ZS_EMPTY, src_page);
>  
>   spin_unlock(&class->lock);
>  }


Re: [PATCH v3 10/16] zsmalloc: factor page chain functionality out

2016-04-17 Thread Sergey Senozhatsky
Hello,

On (03/30/16 16:12), Minchan Kim wrote:
> @@ -1421,7 +1434,6 @@ static unsigned long obj_malloc(struct size_class 
> *class,
>   unsigned long m_offset;
>   void *vaddr;
>  
> - handle |= OBJ_ALLOCATED_TAG;

a nitpick: why did you replace this OBJ_ALLOCATED_TAG assignment
with two 'handle | OBJ_ALLOCATED_TAG' expressions?

-ss

>   obj = get_freeobj(first_page);
>   objidx_to_page_and_offset(class, first_page, obj,
>   &m_page, &m_offset);
> @@ -1431,10 +1443,10 @@ static unsigned long obj_malloc(struct size_class 
> *class,
>   set_freeobj(first_page, link->next >> OBJ_ALLOCATED_TAG);
>   if (!class->huge)
>   /* record handle in the header of allocated chunk */
> - link->handle = handle;
> + link->handle = handle | OBJ_ALLOCATED_TAG;
>   else
>   /* record handle in first_page->private */
> - set_page_private(first_page, handle);
> + set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
>   kunmap_atomic(vaddr);
>   mod_zspage_inuse(first_page, 1);
>   zs_stat_inc(class, OBJ_USED, 1);


Re: [PATCH v3 00/16] Support non-lru page migration

2016-03-30 Thread Sergey Senozhatsky
On (03/30/16 16:11), Andrew Morton wrote:
[..]
> > For details, please read description in
> > "mm/compaction: support non-lru movable page migration".
> 
> OK, I grabbed all these.
> 
> I wonder about testing coverage during the -next period.  How many
> people are likely to exercise these code paths in a serious way before
> it all hits mainline?

I'm hammering the zsmalloc part.

-ss


Re: [PATCH v1 09/19] zsmalloc: keep max_object in size_class

2016-03-19 Thread Sergey Senozhatsky
On (03/11/16 16:30), Minchan Kim wrote:
> Every zspage in a size_class has same number of max objects so
> we could move it to a size_class.
> 
> Signed-off-by: Minchan Kim 
> ---
>  mm/zsmalloc.c | 29 ++---
>  1 file changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index b4fb11831acb..ca663c82c1fc 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -32,8 +32,6 @@
>   *   page->freelist: points to the first free object in zspage.
>   *   Free objects are linked together using in-place
>   *   metadata.
> - *   page->objects: maximum number of objects we can store in this
> - *   zspage (class->zspage_order * PAGE_SIZE / class->size)
>   *   page->lru: links together first pages of various zspages.
>   *   Basically forming list of zspages in a fullness group.
>   *   page->mapping: class index and fullness group of the zspage
> @@ -211,6 +209,7 @@ struct size_class {
>* of ZS_ALIGN.
>*/
>   int size;
> + int objs_per_zspage;
>   unsigned int index;

struct page's ->objects "comes for free". now we don't use it, and
instead every size_class grows by 4 bytes? is there any reason for this?

-ss


Re: [PATCH v1 05/19] zsmalloc: use first_page rather than page

2016-03-19 Thread Sergey Senozhatsky
On (03/11/16 16:30), Minchan Kim wrote:
> This patch cleans up function parameter "struct page".
> Many functions of zsmalloc expect that the page parameter is "first_page"
> so use "first_page" rather than "page" for code readability.
> 
> Signed-off-by: Minchan Kim 

Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH v1 07/19] zsmalloc: reordering function parameter

2016-03-19 Thread Sergey Senozhatsky
On (03/11/16 16:30), Minchan Kim wrote:
> This patch cleans up function parameter ordering to order
> higher data structure first.
> 
> Signed-off-by: Minchan Kim 

Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH v1 08/19] zsmalloc: remove unused pool param in obj_free

2016-03-19 Thread Sergey Senozhatsky
On (03/11/16 16:30), Minchan Kim wrote:
> Let's remove unused pool param in obj_free
> 
> Signed-off-by: Minchan Kim 

Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH v1 06/19] zsmalloc: clean up many BUG_ON

2016-03-19 Thread Sergey Senozhatsky
On (03/11/16 16:30), Minchan Kim wrote:
> There are many BUG_ON in zsmalloc.c, which is not recommended, so
> change them to alternatives.
> 
> Normal rule is as follows:
> 
> 1. avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE
> 2. use VM_BUG_ON_PAGE if we need to see struct page's fields
> 3. use those assertion in primitive functions so higher functions
> can rely on the assertion in the primitive function.
> 4. Don't use assertion if following instruction can trigger Oops
> 
> Signed-off-by: Minchan Kim 

Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH v1 19/19] zram: use __GFP_MOVABLE for memory allocation

2016-03-19 Thread Sergey Senozhatsky
On (03/11/16 16:30), Minchan Kim wrote:
[..]
> init
> Node 0, zone  DMA208120 51 41 11  0  0  0 
>  0  0  0
> Node 0, zoneDMA32  16380  13777   9184   3805789 54  3  0 
>  0  0  0
> compaction
> Node 0, zone  DMA132 82 40 39 16  2  1  0 
>  0  0  0
> Node 0, zoneDMA32   5219   5526   4969   3455   1831677139 15 
>  0  0  0
> 
> new:
> 
> init
> Node 0, zone  DMA379115 97 19  2  0  0  0 
>  0  0  0
> Node 0, zoneDMA32  18891  16774  10862   3947637 21  0  0 
>  0  0  0
> compaction  1
> Node 0, zone  DMA214 66 87 29 10  3  0  0 
>  0  0  0
> Node 0, zoneDMA32   1612   3139   3154   2469   1745990384 94 
>  7  0  0
> 
> As you can see, compaction made so many high-order pages. Yay!
> 
> Signed-off-by: Minchan Kim 

Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH v1 11/19] zsmalloc: squeeze freelist into page->mapping

2016-03-18 Thread Sergey Senozhatsky
On (03/11/16 16:30), Minchan Kim wrote:
> -static void *location_to_obj(struct page *page, unsigned long obj_idx)
> +static void objidx_to_page_and_ofs(struct size_class *class,
> + struct page *first_page,
> + unsigned long obj_idx,
> + struct page **obj_page,
> + unsigned long *ofs_in_page)

this looks big; 5 params, a function "returning" both page and offset...
any chance to split it into two steps, perhaps?

besides, it is more intuitive (at least to me) when 'offset' is
shortened to 'offt', not 'ofs'.

-ss

>  {
> - unsigned long obj;
> + int i;
> + unsigned long ofs;
> + struct page *cursor;
> + int nr_page;
>  
> - if (!page) {
> - VM_BUG_ON(obj_idx);
> - return NULL;
> - }
> + ofs = obj_idx * class->size;
> + cursor = first_page;
> + nr_page = ofs >> PAGE_SHIFT;
>  
> - obj = page_to_pfn(page) << OBJ_INDEX_BITS;
> - obj |= ((obj_idx) & OBJ_INDEX_MASK);
> - obj <<= OBJ_TAG_BITS;
> + *ofs_in_page = ofs & ~PAGE_MASK;
> +
> + for (i = 0; i < nr_page; i++)
> + cursor = get_next_page(cursor);
>  
> - return (void *)obj;
> + *obj_page = cursor;
>  }