Re: [Intel-gfx] [PATCH 24/29] mm: vmscan: make global slab shrink lockless

2023-06-22 Thread Vlastimil Babka
On 6/22/23 10:53, Qi Zheng wrote:
> The shrinker_rwsem is a global read-write lock in
> shrinkers subsystem, which protects most operations
> such as slab shrink, registration and unregistration
> of shrinkers, etc. This can easily cause problems in
> the following cases.
> 
> 1) When the memory pressure is high and there are many
>filesystems mounted or unmounted at the same time,
>slab shrink will be affected (down_read_trylock()
>failed).
> 
>Such as the real workload mentioned by Kirill Tkhai:
> 
>```
>One of the real workloads from my experience is start
>of an overcommitted node containing many starting
>containers after node crash (or many resuming containers
>after reboot for kernel update). In these cases memory
>pressure is huge, and the node goes round in long reclaim.
>```
> 
> 2) If a shrinker is blocked (such as the case mentioned
>in [1]) and a writer comes in (such as mount a fs),
>then this writer will be blocked and cause all
>subsequent shrinker-related operations to be blocked.
> 
> Even if there is no competitor when shrinking slab, there
> may still be a problem. If we have a long shrinker list
> and we do not reclaim enough memory with each shrinker,
> then the down_read_trylock() may be called with high
> frequency. Because of the poor multicore scalability of
> atomic operations, this can lead to a significant drop
> in IPC (instructions per cycle).
> 
> We used to implement the lockless slab shrink with
> SRCU [1], but then kernel test robot reported -88.8%
> regression in stress-ng.ramfs.ops_per_sec test case [2],
> so we reverted it [3].
> 
> This commit uses the refcount+RCU method [4] proposed by
> Dave Chinner to re-implement the lockless global slab
> shrink. The memcg slab shrink is handled in the subsequent
> patch.
> 
> Currently, the shrinker instances can be divided into
> the following three types:
> 
> a) global shrinker instance statically defined in the kernel,
> such as workingset_shadow_shrinker.
> 
> b) global shrinker instance statically defined in the kernel
> modules, such as mmu_shrinker in x86.
> 
> c) shrinker instance embedded in other structures.
> 
> For case a, the memory of shrinker instance is never freed.
> For case b, the memory of shrinker instance will be freed
> after the module is unloaded. But we will call synchronize_rcu()
> in free_module() to wait for RCU read-side critical section to
> exit. For case c, the memory of shrinker instance will be
> dynamically freed by calling kfree_rcu(). So we can use
> rcu_read_{lock,unlock}() to ensure that the shrinker instance
> is valid.
> 
> The shrinker::refcount mechanism ensures that the shrinker
> instance will not be run again after unregistration. So the
> structure that records the pointer of shrinker instance can be
> safely freed without waiting for the RCU read-side critical
> section.
> 
> In this way, while we implement the lockless slab shrink, we
> don't need to be blocked in unregister_shrinker() to wait
> RCU read-side critical section.
> 
> The following are the test results:
> 
> stress-ng --timeout 60 --times --verify --metrics-brief --ramfs 9 &
> 
> 1) Before applying this patchset:
> 
>  setting to a 60 second run per stressor
>  dispatching hogs: 9 ramfs
>  stressor   bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
>                        (secs)    (secs)    (secs)   (real time) (usr+sys time)
>  ramfs        880623     60.02      7.71    226.93     14671.45      3753.09
>  ramfs:
>   1 System Management Interrupt
>  for a 60.03s run time:
> 5762.40s available CPU time
>7.71s user time   (  0.13%)
>  226.93s system time (  3.94%)
>  234.64s total time  (  4.07%)
>  load average: 8.54 3.06 2.11
>  passed: 9: ramfs (9)
>  failed: 0
>  skipped: 0
>  successful run completed in 60.03s (1 min, 0.03 secs)
> 
> 2) After applying this patchset:
> 
>  setting to a 60 second run per stressor
>  dispatching hogs: 9 ramfs
>  stressor   bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
>                        (secs)    (secs)    (secs)   (real time) (usr+sys time)
>  ramfs        847562     60.02      7.44    230.22     14120.66      3566.23
>  ramfs:
>   4 System Management Interrupts
>  for a 60.12s run time:
> 5771.95s available CPU time
>7.44s user time   (  0.13%)
>  230.22s system time (  3.99%)
>  237.66s total time  (  4.12%)
>  load average: 8.18 2.43 0.84
>  passed: 9: ramfs (9)
>  failed: 0
>  skipped: 0
>  successful run completed in 60.12s (1 min, 0.12 secs)
> 
> We can see that the ops/s has hardly changed.
> 
> [1]. https://lore.kernel.org/lkml/20230313112819.38938-1-zhengqi.a...@bytedance.com/
> [2]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie@intel.com/
> [3]. https://lore.kernel.org/all/20230609081518.3039120-1-qi.zh...@linux.dev/
> [4]. https://lore.kernel.org/
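
[Editor's note: the refcount+RCU pattern described above ends up looking
roughly like the sketch below. The helper names (shrinker_try_get/
shrinker_put) and details are assumptions based on this description, not
the patch itself.]

/* Sketch: lockless iteration of shrinker_list, no shrinker_rwsem. */
static unsigned long shrink_slab_lockless(gfp_t gfp_mask, int nid,
                                          struct mem_cgroup *memcg,
                                          int priority)
{
        struct shrinker *shrinker;
        unsigned long freed = 0;

        rcu_read_lock();
        list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
                struct shrink_control sc = {
                        .gfp_mask = gfp_mask,
                        .nid = nid,
                        .memcg = memcg,
                };

                /* Skip shrinkers that are already being unregistered. */
                if (!shrinker_try_get(shrinker))  /* refcount_inc_not_zero() */
                        continue;
                rcu_read_unlock();

                freed += do_shrink_slab(&sc, shrinker, priority);

                rcu_read_lock();
                /*
                 * The held reference kept the list node valid while we were
                 * outside the RCU read-side section; dropping the last
                 * reference may free the shrinker via kfree_rcu().
                 */
                shrinker_put(shrinker);
        }
        rcu_read_unlock();

        return freed;
}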

Re: [Intel-gfx] [PATCH 29/29] mm: shrinker: move shrinker-related code into a separate file

2023-06-22 Thread Vlastimil Babka
On 6/22/23 10:53, Qi Zheng wrote:
> The mm/vmscan.c file is too large, so separate the shrinker-related
> code from it into a separate file. No functional changes.
> 
> Signed-off-by: Qi Zheng 

Maybe do this move as patch 01 so the further changes are done in the new
file already?



Re: [Intel-gfx] [PATCH 01/29] mm: shrinker: add shrinker::private_data field

2023-06-22 Thread Vlastimil Babka
On 6/22/23 10:53, Qi Zheng wrote:
> To prepare for the dynamic allocation of shrinker instances
> embedded in other structures, add a private_data field to
> struct shrinker, so that we can use shrinker::private_data
> to record and get the original embedded structure.
> 
> Signed-off-by: Qi Zheng 

I would fold this to 02/29, less churn.

> ---
>  include/linux/shrinker.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 224293b2dd06..43e6fcabbf51 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -70,6 +70,8 @@ struct shrinker {
>   int seeks;  /* seeks to recreate an obj */
>   unsigned flags;
>  
> + void *private_data;
> +
>   /* These are for internal use */
>   struct list_head list;
>  #ifdef CONFIG_MEMCG
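
[Editor's note: a hypothetical illustration of what private_data enables once
shrinkers are allocated dynamically -- container_of() on an embedded shrinker
no longer works, so the callback recovers its owner explicitly. The names
my_cache and my_cache_reclaim are made up for the example.]

struct my_cache {
        struct list_head objects;
        struct shrinker *shrinker;      /* dynamically allocated */
};

static unsigned long my_cache_scan(struct shrinker *shrink,
                                   struct shrink_control *sc)
{
        /* Recover the embedding structure recorded at registration time. */
        struct my_cache *cache = shrink->private_data;

        return my_cache_reclaim(cache, sc->nr_to_scan);
}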



Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix oops due to missing stack depot

2022-01-26 Thread Vlastimil Babka
On 1/26/22 09:15, Ville Syrjala wrote:
> From: Ville Syrjälä 
> 
> We call __save_depot_stack() unconditionally so the stack depot

Ah, in __untrack_all_wakerefs()? Looks like I missed it, sorry.

> must always be initialized or else we'll oops on platforms without
> runtime pm support.
> 
> Presumably we've not seen this in CI due to stack_depot_init()
> already getting called via drm_mm_init()+CONFIG_DRM_DEBUG_MM.
> 
> Cc: Vlastimil Babka 
> Cc: Dmitry Vyukov 
> Cc: Marco Elver  # stackdepot
> Cc: Chris Wilson 
> Cc: Imre Deak 
> Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
> Signed-off-by: Ville Syrjälä 

Acked-by: Vlastimil Babka 
Thanks!

> ---
>  drivers/gpu/drm/i915/intel_runtime_pm.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
> index 53f1ccb78849..64c2708efc9e 100644
> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> @@ -68,9 +68,7 @@ static noinline depot_stack_handle_t __save_depot_stack(void)
>  static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
>  {
>   spin_lock_init(&rpm->debug.lock);
> -
> - if (rpm->available)
> - stack_depot_init();
> + stack_depot_init();
>  }
>  
>  static noinline depot_stack_handle_t



Re: [Intel-gfx] [PATCH] lib/stackdepot: always do filter_irq_stacks() in stack_depot_save()

2021-11-30 Thread Vlastimil Babka
On 11/30/21 10:57, Marco Elver wrote:
> The non-interrupt portion of interrupt stack traces before interrupt
> entry is usually arbitrary. Therefore, saving stack traces of interrupts
> (that include entries before interrupt entry) to stack depot leads to
> unbounded stackdepot growth.
> 
> As such, use of filter_irq_stacks() is a requirement to ensure
> stackdepot can efficiently deduplicate interrupt stacks.
> 
> Looking through all current users of stack_depot_save(), none (except
> KASAN) pass the stack trace through filter_irq_stacks() before passing
> it on to stack_depot_save().
> 
> Rather than adding filter_irq_stacks() to all current users of
> stack_depot_save(), it became clear that stack_depot_save() should
> simply do filter_irq_stacks().

Agree.

> Signed-off-by: Marco Elver 

Acked-by: Vlastimil Babka 

Thanks.

> ---
>  lib/stackdepot.c  | 13 +
>  mm/kasan/common.c |  1 -
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index b437ae79aca1..519c7898c7f2 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -305,6 +305,9 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
>   * (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids
>   * any allocations and will fail if no space is left to store the stack trace.
>   *
> + * If the stack trace in @entries is from an interrupt, only the portion up to
> + * interrupt entry is saved.
> + *
>   * Context: Any context, but setting @can_alloc to %false is required if
>   *  alloc_pages() cannot be used from the current context. Currently
>   *  this is the case from contexts where neither %GFP_ATOMIC nor
> @@ -323,6 +326,16 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>   unsigned long flags;
>   u32 hash;
>  
> + /*
> +  * If this stack trace is from an interrupt, including anything before
> +  * interrupt entry usually leads to unbounded stackdepot growth.
> +  *
> +  * Because use of filter_irq_stacks() is a requirement to ensure
> +  * stackdepot can efficiently deduplicate interrupt stacks, always
> +  * filter_irq_stacks() to simplify all callers' use of stackdepot.
> +  */
> + nr_entries = filter_irq_stacks(entries, nr_entries);
> +
>   if (unlikely(nr_entries == 0) || stack_depot_disable)
>   goto fast_exit;
>  
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 8428da2aaf17..efaa836e5132 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -36,7 +36,6 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
>   unsigned int nr_entries;
>  
>   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> - nr_entries = filter_irq_stacks(entries, nr_entries);
>   return __stack_depot_save(entries, nr_entries, flags, can_alloc);
>  }
>  
> 
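
[Editor's note: for reference, the filtering itself amounts to cutting the
trace at the first interrupt-entry frame -- roughly the sketch below,
assuming an in_irqentry_text()-style helper.]

static unsigned int filter_irq_stacks_sketch(unsigned long *entries,
                                             unsigned int nr_entries)
{
        unsigned int i;

        for (i = 0; i < nr_entries; i++) {
                /*
                 * Keep the trace only up to and including the interrupt
                 * entry point, so the arbitrary pre-interrupt frames cannot
                 * defeat stackdepot's deduplication.
                 */
                if (in_irqentry_text(entries[i]))
                        return i + 1;
        }
        return nr_entries;
}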



Re: [Intel-gfx] [PATCH v3] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()

2021-10-15 Thread Vlastimil Babka
On 10/13/21 09:30, Vlastimil Babka wrote:
> Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated
> from memblock, even if stack depot ends up not actually used. The default size
> of stack_table is 4MB on 32-bit, 8MB on 64-bit.
> 
> This is fine for use-cases such as KASAN which is also a config option and
> has overhead on its own. But it's an issue for functionality that has to be
> actually enabled on boot (page_owner) or depends on hardware (GPU drivers)
> and thus the memory might be wasted. This was raised as an issue [1] when
> attempting to add stackdepot support for SLUB's debug object tracking
> functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable
> slub_debug on boot only when needed, or create only specific kmem caches with
> debugging for testing purposes.
> 
> It would thus be more efficient if stackdepot's table was allocated only when
> actually going to be used. This patch thus makes the allocation (and whole
> stack_depot_init() call) optional:
> 
> - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
>   well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
>   select this flag.
> - Other users have to call stack_depot_init() as part of their own init when
>   it's determined that stack depot will actually be used. This may depend on
>   both config and runtime conditions. Convert current users which are
>   page_owner and several in the DRM subsystem. Same will be done for SLUB
>   later.
> - Because the init might now be called after the boot-time memblock allocation
>   has given all memory to the buddy allocator, change stack_depot_init() to
>   allocate stack_table with kvmalloc() when memblock is no longer available.
>   Also handle allocation failure by disabling stackdepot (could have
>   theoretically happened even with memblock allocation previously), and don't
>   unnecessarily align the memblock allocation to its own size anymore.
> 
> [1] https://lore.kernel.org/all/CAMuHMdW=eovzm1re5fvoen87nkfilmm2+ah7enu2kxehcvb...@mail.gmail.com/
...
> ---
> Changes in v3:
> - stack_depot_init_mutex made static and moved inside stack_depot_init()
>   Reported-by: kernel test robot 
> - use !stack_table condition instead of stack_table == NULL
>   reported by checkpatch on freedesktop.org patchwork

The last change above was missing because I forgot git commit --amend before
git format-patch. More importantly there was a bot report for FLATMEM. Please
add this fixup. Thanks.

8<
From a971a1670491f8fbbaab579eef3c756a5263af95 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka 
Date: Thu, 7 Oct 2021 10:49:09 +0200
Subject: [PATCH] lib/stackdepot: allow optional init and stack_table
 allocation by kvmalloc() - fixup

On FLATMEM, we call page_ext_init_flatmem_late() just before kmem_cache_init()
which means stack_depot_init() (called by page owner init) will not recognize
properly it should use kvmalloc() and not memblock_alloc(). memblock_alloc()
will also not issue a warning and return a block memory that can be invalid and
cause kernel page fault when saving stacks, as reported by the kernel test
robot [1].

Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init() so that
slab_is_available() is true during stack_depot_init(). SPARSEMEM doesn't have
this issue, as it doesn't do page_ext_init_flatmem_late(), but a different
page_ext_init() even later in the boot process.

Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue.

While at it, also actually resolve a checkpatch warning in stack_depot_init()
from DRM CI, which was supposed to be in the original patch already.

[1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/

Signed-off-by: Vlastimil Babka 
Reported-by: kernel test robot 
---
 init/main.c  | 7 +--
 lib/stackdepot.c | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/init/main.c b/init/main.c
index ca2765c8e45c..0ab632f681c5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -845,9 +845,12 @@ static void __init mm_init(void)
stack_depot_early_init();
mem_init();
mem_init_print_info();
-   /* page_owner must be initialized after buddy is ready */
-   page_ext_init_flatmem_late();
kmem_cache_init();
+   /*
+* page_owner must be initialized after buddy is ready, and also after
+* slab is ready so that stack_depot_init() works properly
+*/
+   page_ext_init_flatmem_late();
kmemleak_init();
pgtable_init();
debug_objects_mem_init();
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 049d7d025d78..1f8ea6d0899b 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -172,7 +172,7 @@ __ref int stack_depot_init(void)
static DEFINE_MUT

Re: [Intel-gfx] [lib/stackdepot] 1cd8ce52c5: BUG:unable_to_handle_page_fault_for_address

2021-10-15 Thread Vlastimil Babka
On 10/14/21 12:16, Mike Rapoport wrote:
> On Thu, Oct 14, 2021 at 11:33:03AM +0200, Vlastimil Babka wrote:
>> On 10/14/21 10:54, kernel test robot wrote:
>> 
>> In my local testing of the patch, when stackdepot was initialized through
>> page owner init, it was using kvmalloc() so slab_is_available() was true.
>> Looks like the exact order of slab vs page_owner alloc is non-deterministic,
>> could be arch-dependent or just random ordering of init calls. A wrong order
>> will exploit the apparent fact that slab_is_available() is not a good
>> indicator of using memblock vs page allocator, and we would need a better 
>> one.
>> Thoughts?
> 
> The order of slab vs page_owner is deterministic, but it is different for
> FLATMEM and SPARSEMEM. And page_ext_init_flatmem_late() that initializes
> page_ext for FLATMEM is called exactly between buddy and slab setup:

Oh, so it was due to FLATMEM, thanks for figuring that out!

> static void __init mm_init(void)
> {
>   ...
> 
>   mem_init();
>   mem_init_print_info();
>   /* page_owner must be initialized after buddy is ready */
>   page_ext_init_flatmem_late();
>   kmem_cache_init();
> 
>   ...
> }
> 
> I've stared for a while at page_ext init and it seems that the
> page_ext_init_flatmem_late() can be simply dropped because there is anyway
> a call to invoke_init_callbacks() in page_ext_init() that is called much
> later in the boot process.

Yeah, but page_ext_init() only does something for SPARSEMEM, and is empty on
FLATMEM. Otherwise it would be duplicating all the work. So I'll just move
page_ext_init_flatmem_late() below kmem_cache_init() in mm_init(). Thanks
again!




Re: [Intel-gfx] [lib/stackdepot] 1cd8ce52c5: BUG:unable_to_handle_page_fault_for_address

2021-10-14 Thread Vlastimil Babka
On 10/14/21 10:54, kernel test robot wrote:
> 
> 
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 1cd8ce52c520c26c513899fb5aee42b8e5f60d0d ("[PATCH v2] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
> url: https://github.com/0day-ci/linux/commits/Vlastimil-Babka/lib-stackdepot-allow-optional-init-and-stack_table-allocation-by-kvmalloc/20211012-170816
> base: git://anongit.freedesktop.org/drm-intel for-linux-next
> 
> in testcase: rcutorture
> version: 
> with following parameters:
> 
>   runtime: 300s
>   test: cpuhotplug
>   torture_type: srcud
> 
> test-description: rcutorture is rcutorture kernel module load/unload test.
> test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
> 
> 
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> +----------------------------------------------+------------+------------+
> |                                              | a94a6d76c9 | 1cd8ce52c5 |
> +----------------------------------------------+------------+------------+
> | boot_successes                               | 30         | 0          |
> | boot_failures                                | 0          | 7          |
> | BUG:kernel_NULL_pointer_dereference,address  | 0          | 2          |
> | Oops:#[##]                                   | 0          | 7          |
> | EIP:stack_depot_save                         | 0          | 7          |
> | Kernel_panic-not_syncing:Fatal_exception     | 0          | 7          |
> | BUG:unable_to_handle_page_fault_for_address  | 0          | 5          |
> +----------------------------------------------+------------+------------+
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> 
> 
> [  319.147926][  T259] BUG: unable to handle page fault for address: 0ec74110
> [  319.149309][  T259] #PF: supervisor read access in kernel mode
> [  319.150362][  T259] #PF: error_code(0x) - not-present page
> [  319.151372][  T259] *pde = 
> [  319.151964][  T259] Oops:  [#1] SMP
> [  319.152617][  T259] CPU: 0 PID: 259 Comm: systemd-rc-loca Not tainted 5.15.0-rc1-00270-g1cd8ce52c520 #1
> [  319.154514][  T259] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [  319.156200][  T259] EIP: stack_depot_save+0x12a/0x4d0


Cc Mike Rapoport, looks like:
- memblock_alloc() should have failed (I think, because page allocator
  already took over?), but didn't. So apparently we got some area that wasn't
  fully mapped.
- using slab_is_available() is not accurate enough to detect when to use
memblock or page allocator (kvmalloc in case of my patch). I have used it
because memblock_alloc_internal() checks the same condition to issue a warning.

Relevant part of dmesg.xz that was attached:
[1.589075][T0] Dentry cache hash table entries: 524288 (order: 9, 2097152 bytes, linear)
[1.592396][T0] Inode-cache hash table entries: 262144 (order: 8, 1048576 bytes, linear)
[2.916844][T0] allocated 31496920 bytes of page_ext

- this means we were allocating from page allocator by alloc_pages_exact_nid() already

[2.918197][T0] mem auto-init: stack:off, heap alloc:off, heap free:on
[2.919683][T0] mem auto-init: clearing system memory may take some time...
[2.921239][T0] Initializing HighMem for node 0 (000b67fe:000bffe0)
[   23.023619][T0] Initializing Movable for node 0 (:)
[  245.194520][T0] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[  245.196847][T0] Memory: 2914460K/3145208K available (20645K kernel code, 5953K rwdata, 12624K rodata, 760K init, 8112K bss, 230748K reserved, 0K cma-reserved, 155528K highmem)
[  245.200521][T0] Stack Depot allocating hash table with memblock_alloc

- initializing stack depot as part of initializing page_owner, uses memblock_alloc()
  because slab_is_available() is still false

[  245.212005][T0] Node 0, zone   Normal: page owner found early allocated 0 pages
[  245.213867][T0] Node 0, zone  HighMem: page owner found early allocated 0 pages
[  245.216126][T0] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1

- printed by slub's kmem_cache_init() after create_kmalloc_caches() setting
  slab_state to UP, making slab_is_available() true, but too late

In my local testing of the patch, when stackdepot was initialized through
page owner init, it was using kvmalloc() so slab_is_available() was true.
Looks like the exact order of slab vs page_owner alloc is non-deterministic,
could be arch-dependent or just random ordering of init calls. A wrong order
will exploit the apparent fact that slab_is_available() is not a good
indicator of using memblock vs page allocator, and we would need a better one.
Thoughts?


[Intel-gfx] [PATCH v3] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()

2021-10-13 Thread Vlastimil Babka
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated
from memblock, even if stack depot ends up not actually used. The default size
of stack_table is 4MB on 32-bit, 8MB on 64-bit.

This is fine for use-cases such as KASAN which is also a config option and
has overhead on its own. But it's an issue for functionality that has to be
actually enabled on boot (page_owner) or depends on hardware (GPU drivers)
and thus the memory might be wasted. This was raised as an issue [1] when
attempting to add stackdepot support for SLUB's debug object tracking
functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable
slub_debug on boot only when needed, or create only specific kmem caches with
debugging for testing purposes.

It would thus be more efficient if stackdepot's table was allocated only when
actually going to be used. This patch thus makes the allocation (and whole
stack_depot_init() call) optional:

- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
  well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
  select this flag.
- Other users have to call stack_depot_init() as part of their own init when
  it's determined that stack depot will actually be used. This may depend on
  both config and runtime conditions. Convert current users which are
  page_owner and several in the DRM subsystem. Same will be done for SLUB
  later.
- Because the init might now be called after the boot-time memblock allocation
  has given all memory to the buddy allocator, change stack_depot_init() to
  allocate stack_table with kvmalloc() when memblock is no longer available.
  Also handle allocation failure by disabling stackdepot (could have
  theoretically happened even with memblock allocation previously), and don't
  unnecessarily align the memblock allocation to its own size anymore.

[1] https://lore.kernel.org/all/CAMuHMdW=eovzm1re5fvoen87nkfilmm2+ah7enu2kxehcvb...@mail.gmail.com/

Signed-off-by: Vlastimil Babka 
Acked-by: Dmitry Vyukov 
Reviewed-by: Marco Elver  # stackdepot
Cc: Marco Elver 
Cc: Vijayanand Jitta 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Geert Uytterhoeven 
Cc: Oliver Glitta 
Cc: Imran Khan 
---
Changes in v3:
- stack_depot_init_mutex made static and moved inside stack_depot_init()
  Reported-by: kernel test robot 
- use !stack_table condition instead of stack_table == NULL
  reported by checkpatch on freedesktop.org patchwork
 drivers/gpu/drm/drm_dp_mst_topology.c   |  1 +
 drivers/gpu/drm/drm_mm.c|  4 +++
 drivers/gpu/drm/i915/intel_runtime_pm.c |  3 +++
 include/linux/stackdepot.h  | 25 ---
 init/main.c |  2 +-
 lib/Kconfig |  4 +++
 lib/Kconfig.kasan   |  2 +-
 lib/stackdepot.c| 33 +
 mm/page_owner.c |  2 ++
 9 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 86d13d6bc463..b0ebdc843a00 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -5493,6 +5493,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
mutex_init(&mgr->probe_lock);
 #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
mutex_init(&mgr->topology_ref_history_lock);
+   stack_depot_init();
 #endif
INIT_LIST_HEAD(&mgr->tx_msg_downq);
INIT_LIST_HEAD(&mgr->destroy_port_list);
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 93d48a6f04ab..5916228ea0c9 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -983,6 +983,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
add_hole(&mm->head_node);
 
mm->scan_active = 0;
+
+#ifdef CONFIG_DRM_DEBUG_MM
+   stack_depot_init();
+#endif
 }
 EXPORT_SYMBOL(drm_mm_init);
 
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index eaf7688f517d..d083506986e1 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -78,6 +78,9 @@ static void __print_depot_stack(depot_stack_handle_t stack,
 static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
 {
spin_lock_init(&rpm->debug.lock);
+
+   if (rpm->available)
+   stack_depot_init();
 }
 
 static noinline depot_stack_handle_t
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 6bb4bc1a5f54..40fc5e92194f 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -13,6 +13,22 @@
 
 typedef u32 depot_stack_handle_t;
 
+/*
+ * Every 

[Intel-gfx] [PATCH v2] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()

2021-10-12 Thread Vlastimil Babka
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated
from memblock, even if stack depot ends up not actually used. The default size
of stack_table is 4MB on 32-bit, 8MB on 64-bit.

This is fine for use-cases such as KASAN which is also a config option and
has overhead on its own. But it's an issue for functionality that has to be
actually enabled on boot (page_owner) or depends on hardware (GPU drivers)
and thus the memory might be wasted. This was raised as an issue [1] when
attempting to add stackdepot support for SLUB's debug object tracking
functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable
slub_debug on boot only when needed, or create only specific kmem caches with
debugging for testing purposes.

It would thus be more efficient if stackdepot's table was allocated only when
actually going to be used. This patch thus makes the allocation (and whole
stack_depot_init() call) optional:

- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
  well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
  select this flag.
- Other users have to call stack_depot_init() as part of their own init when
  it's determined that stack depot will actually be used. This may depend on
  both config and runtime conditions. Convert current users which are
  page_owner and several in the DRM subsystem. Same will be done for SLUB
  later.
- Because the init might now be called after the boot-time memblock allocation
  has given all memory to the buddy allocator, change stack_depot_init() to
  allocate stack_table with kvmalloc() when memblock is no longer available.
  Also handle allocation failure by disabling stackdepot (could have
  theoretically happened even with memblock allocation previously), and don't
  unnecessarily align the memblock allocation to its own size anymore.

[1] https://lore.kernel.org/all/CAMuHMdW=eovzm1re5fvoen87nkfilmm2+ah7enu2kxehcvb...@mail.gmail.com/

Signed-off-by: Vlastimil Babka 
Acked-by: Dmitry Vyukov 
Cc: Marco Elver 
Cc: Vijayanand Jitta 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Geert Uytterhoeven 
Cc: Oliver Glitta 
Cc: Imran Khan 
---
Changes in v2:
- Rebase to v5.15-rc5.
- Stylistic changes suggested by Marco Elver.
 drivers/gpu/drm/drm_dp_mst_topology.c   |  1 +
 drivers/gpu/drm/drm_mm.c|  4 
 drivers/gpu/drm/i915/intel_runtime_pm.c |  3 +++
 include/linux/stackdepot.h  | 25 ---
 init/main.c |  2 +-
 lib/Kconfig |  4 
 lib/Kconfig.kasan   |  2 +-
 lib/stackdepot.c| 32 +
 mm/page_owner.c |  2 ++
 9 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 86d13d6bc463..b0ebdc843a00 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -5493,6 +5493,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
mutex_init(&mgr->probe_lock);
 #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
mutex_init(&mgr->topology_ref_history_lock);
+   stack_depot_init();
 #endif
INIT_LIST_HEAD(&mgr->tx_msg_downq);
INIT_LIST_HEAD(&mgr->destroy_port_list);
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 93d48a6f04ab..5916228ea0c9 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -983,6 +983,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
add_hole(&mm->head_node);
 
mm->scan_active = 0;
+
+#ifdef CONFIG_DRM_DEBUG_MM
+   stack_depot_init();
+#endif
 }
 EXPORT_SYMBOL(drm_mm_init);
 
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index eaf7688f517d..d083506986e1 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -78,6 +78,9 @@ static void __print_depot_stack(depot_stack_handle_t stack,
 static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
 {
spin_lock_init(&rpm->debug.lock);
+
+   if (rpm->available)
+   stack_depot_init();
 }
 
 static noinline depot_stack_handle_t
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 6bb4bc1a5f54..40fc5e92194f 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -13,6 +13,22 @@
 
 typedef u32 depot_stack_handle_t;
 
+/*
+ * Every user of stack depot has to call this during its own init when it's
+ * decided that it will be calling stack_depot_save() later.
+ *
+ * The alternative is to select STACKDEPOT_ALWAYS_I

Re: [Intel-gfx] [PATCH] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()

2021-10-12 Thread Vlastimil Babka
On 10/11/21 19:08, Marco Elver wrote:
> On Mon, 11 Oct 2021 at 19:02, Vlastimil Babka  wrote:
> [...]
>> > On the other hand, the lazy initialization mode you're introducing
>> > requires an explicit stack_depot_init() call somewhere and isn't as
>> > straightforward as before.
>> >
>> > Not sure what is best. My intuition tells me STACKDEPOT_LAZY_INIT would
>> > be safer as it's a deliberate opt-in to the lazy initialization
>> > behaviour.
>>
>> I think it should be fine with ALWAYS_INIT. There are not many stackdepot
>> users being added, and anyone developing a new one will very quickly find
>> out if they forget to call stack_depot_init()?
> 
> I think that's fine.
> 
>> > Preferences?
>> >
>> > [...]
>> >> --- a/drivers/gpu/drm/drm_mm.c
>> >> +++ b/drivers/gpu/drm/drm_mm.c
>> >> @@ -980,6 +980,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
>> >>  add_hole(&mm->head_node);
>> >>
>> >>  mm->scan_active = 0;
>> >> +
>> >> +#ifdef CONFIG_DRM_DEBUG_MM
>> >> +stack_depot_init();
>> >> +#endif
>> >
>> > DRM_DEBUG_MM implies STACKDEPOT. Not sure what is more readable to drm
>> > maintainers, but perhaps it'd be nicer to avoid the #ifdef here, and
>> > instead just keep the no-op version of stack_depot_init() in
>> > <linux/stackdepot.h>. I don't have a strong preference.
>>
>> Hm, but in case STACKDEPOT is also selected by something else (e.g.
>> CONFIG_PAGE_OWNER) which uses lazy init but isn't enabled on boot, then
>> without #ifdef CONFIG_DRM_DEBUG_MM above, this code would call a
>> stack_depot_init() (that's not a no-op) even in case it's not going to be
>> using it, so not what we want to achieve.
>> But it could be changed to use IS_ENABLED() if that's preferred by DRM folks.
> 
> You're right -- but I'll leave this to DRM folks.

Ah, the file only includes stackdepot.h in a #ifdef CONFIG_DRM_DEBUG_MM
section so I will keep the #ifdef here for a minimal change, unless
requested otherwise.


Re: [Intel-gfx] [PATCH] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()

2021-10-11 Thread Vlastimil Babka
On 10/7/21 13:01, Marco Elver wrote:
> On Thu, Oct 07, 2021 at 11:58AM +0200, Vlastimil Babka wrote:
> [...] 
>> - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
>>   well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
>>   select this flag.
>> - Other users have to call stack_depot_init() as part of their own init when
>>   it's determined that stack depot will actually be used. This may depend on
>>   both config and runtime conditions. Convert current users which are
>>   page_owner and several in the DRM subsystem. Same will be done for SLUB
>>   later.
>> - Because the init might now be called after the boot-time memblock 
>> allocation
>>   has given all memory to the buddy allocator, change stack_depot_init() to
>>   allocate stack_table with kvmalloc() when memblock is no longer available.
>>   Also handle allocation failure by disabling stackdepot (could have
>>   theoretically happened even with memblock allocation previously), and don't
>>   unnecessarily align the memblock allocation to its own size anymore.
> ...
>> Hi, I'd appreciate review of the DRM parts - namely that I've got correctly
>> that stack_depot_init() is called from the proper init functions and iff
>> stack_depot_save() is going to be used later. Thanks!
> 
> For ease of review between stackdepot and DRM changes, I thought it'd be
> nice to split into 2 patches, but not sure it'll work, because you're
> changing the semantics of the normal STACKDEPOT.

Yeah, that's why it's a single patch. As the DRM parts are clearly separated
to their files, I think review should be fine.

> One option would be to flip it around, and instead have
> STACKDEPOT_LAZY_INIT, but that seems counter-intuitive if the majority
> of STACKDEPOT users are LAZY_INIT users.

Agree.

> On the other hand, the lazy initialization mode you're introducing
> requires an explicit stack_depot_init() call somewhere and isn't as
> straightforward as before.
> 
> Not sure what is best. My intuition tells me STACKDEPOT_LAZY_INIT would
> be safer as it's a deliberate opt-in to the lazy initialization
> behaviour.

I think it should be fine with ALWAYS_INIT. There are not many stackdepot
users being added, and anyone developing a new one will very quickly find
out if they forget to call stack_depot_init()?

> Preferences?
> 
> [...]
>> --- a/drivers/gpu/drm/drm_mm.c
>> +++ b/drivers/gpu/drm/drm_mm.c
>> @@ -980,6 +980,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
>>  add_hole(&mm->head_node);
>>  
>>  mm->scan_active = 0;
>> +
>> +#ifdef CONFIG_DRM_DEBUG_MM
>> +stack_depot_init();
>> +#endif
> 
> DRM_DEBUG_MM implies STACKDEPOT. Not sure what is more readable to drm
> maintainers, but perhaps it'd be nicer to avoid the #ifdef here, and
> instead just keep the no-op version of stack_depot_init() in
> <linux/stackdepot.h>. I don't have a strong preference.

Hm, but in case STACKDEPOT is also selected by something else (e.g.
CONFIG_PAGE_OWNER) which uses lazy init but isn't enabled on boot, then
without #ifdef CONFIG_DRM_DEBUG_MM above, this code would call a
stack_depot_init() (that's not a no-op) even in case it's not going to be
using it, so not what we want to achieve.
But it could be changed to use IS_ENABLED() if that's preferred by DRM folks.
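
[Editor's note: the IS_ENABLED() variant mentioned above would be roughly the
following tail of drm_mm_init() -- same compile-time behavior as the #ifdef,
i.e. the call is dropped entirely when DRM_DEBUG_MM is off, so a STACKDEPOT
selected by some other user is not initialized needlessly.]

        mm->scan_active = 0;

        /* Constant-folded; relies on the no-op stack_depot_init() stub
         * staying visible when CONFIG_STACKDEPOT is off. */
        if (IS_ENABLED(CONFIG_DRM_DEBUG_MM))
                stack_depot_init();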

BTW it's possible that there won't be any DRM review because this failed to
apply:
https://patchwork.freedesktop.org/series/95549/
DRM folks, any hint how to indicate that the base was next-20211001?

>> @@ -30,13 +40,4 @@ int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
>>  
>>  void stack_depot_print(depot_stack_handle_t stack);
>>  
>> -#ifdef CONFIG_STACKDEPOT
>> -int stack_depot_init(void);
>> -#else
>> -static inline int stack_depot_init(void)
>> -{
>> -return 0;
>> -}
>> -#endif  /* CONFIG_STACKDEPOT */
>> -
> 
> Could we avoid the IS_ENABLED() in init/main.c by adding a wrapper here:
> 
> +#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> +static inline int stack_depot_early_init(void)   { return stack_depot_init(); }
> +#else
> +static inline int stack_depot_early_init(void)   { return 0; }
> +#endif   /* CONFIG_STACKDEPOT_ALWAYS_INIT */

We could, but it's a wrapper made for only a single caller...

>>  #endif
>> diff --git a/init/main.c b/init/main.c
>> index ee4d3e1b3eb9..b6a5833d98f5 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -844,7 +844,8 @@ static void __init mm_init(void)
>>  init_mem_debug

[Intel-gfx] [PATCH] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()

2021-10-07 Thread Vlastimil Babka
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated
from memblock, even if stack depot ends up not actually used. The default size
of stack_table is 4MB on 32-bit, 8MB on 64-bit.

This is fine for use-cases such as KASAN which is also a config option and
has overhead on its own. But it's an issue for functionality that has to be
actually enabled on boot (page_owner) or depends on hardware (GPU drivers)
and thus the memory might be wasted. This was raised as an issue when trying
to add stackdepot support for SLUB's debug object tracking functionality. It's
common to build kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot
only when needed, or create specific kmem caches with debugging for testing
purposes.

It would thus be more efficient if stackdepot's table was allocated only when
actually going to be used. This patch thus makes the allocation (and whole
stack_depot_init() call) optional:

- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
  well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
  select this flag.
- Other users have to call stack_depot_init() as part of their own init when
  it's determined that stack depot will actually be used. This may depend on
  both config and runtime conditions. Convert current users which are
  page_owner and several in the DRM subsystem. Same will be done for SLUB
  later.
- Because the init might now be called after the boot-time memblock allocation
  has given all memory to the buddy allocator, change stack_depot_init() to
  allocate stack_table with kvmalloc() when memblock is no longer available.
  Also handle allocation failure by disabling stackdepot (could have
  theoretically happened even with memblock allocation previously), and don't
  unnecessarily align the memblock allocation to its own size anymore.

[1] https://lore.kernel.org/all/CAMuHMdW=eovzm1re5fvoen87nkfilmm2+ah7enu2kxehcvb...@mail.gmail.com/

Signed-off-by: Vlastimil Babka 
Cc: Marco Elver 
Cc: Vijayanand Jitta 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Geert Uytterhoeven 
Cc: Oliver Glitta 
Cc: Imran Khan 
---
Hi, I'd appreciate review of the DRM parts - namely that I've got correctly
that stack_depot_init() is called from the proper init functions and iff
stack_depot_save() is going to be used later. Thanks!

 drivers/gpu/drm/drm_dp_mst_topology.c   |  1 +
 drivers/gpu/drm/drm_mm.c|  4 
 drivers/gpu/drm/i915/intel_runtime_pm.c |  3 +++
 include/linux/stackdepot.h  | 19 ---
 init/main.c |  3 ++-
 lib/Kconfig |  3 +++
 lib/Kconfig.kasan   |  1 +
 lib/stackdepot.c| 32 +
 mm/page_owner.c |  2 ++
 9 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 2d1adab9e360..bbe972d59dae 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -5490,6 +5490,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
mutex_init(&mgr->probe_lock);
 #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
mutex_init(&mgr->topology_ref_history_lock);
+   stack_depot_init();
 #endif
INIT_LIST_HEAD(&mgr->tx_msg_downq);
INIT_LIST_HEAD(&mgr->destroy_port_list);
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 7d1c578388d3..8257f9d4f619 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -980,6 +980,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
add_hole(&mm->head_node);
 
mm->scan_active = 0;
+
+#ifdef CONFIG_DRM_DEBUG_MM
+   stack_depot_init();
+#endif
 }
 EXPORT_SYMBOL(drm_mm_init);
 
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 0d85f3c5c526..806c32ab410b 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -68,6 +68,9 @@ static noinline depot_stack_handle_t __save_depot_stack(void)
 static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
 {
spin_lock_init(&rpm->debug.lock);
+
+   if (rpm->available)
+   stack_depot_init();
 }
 
 static noinline depot_stack_handle_t
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index c34b55a6e554..60ba99a43745 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -15,6 +15,16 @@
 
 typedef u32 depot_stack_handle_t;
 
+/*
+ * Every user of stack depot has to call this during its own init when it's
+ * decided that it will be cal

Re: [Intel-gfx] [PATCH 0/1] lib, stackdepot: Add helper to print stack entries into buffer.

2021-09-14 Thread Vlastimil Babka
On 9/10/21 16:10, Imran Khan wrote:
> This change is in response to discussion at [1].
> The patch has been created on top of my earlier changes [2] and [3].
> If needed I can resend all of these patches together, though my
> earlier patches have been Acked.

I think you sent those at the beginning of merge window, so it would be best
to gather everything in a self-contained series now and resend. I suggested
another change for one of those anyway.

You can of course resend including the Acks you already got, as you did
already with "[PATCH v2 1/1] lib, stackdepot: Add helper to print stack
entries into buffer."

> [1] https://lore.kernel.org/lkml/e6f6fb85-1d83-425b-9e36-b5784cc9e...@suse.cz/
> [2] https://lore.kernel.org/lkml/fe94ffd8-d235-87d8-9c3d-80f7f73e0...@suse.cz/
> [3] https://lore.kernel.org/lkml/85f4f073-0b5a-9052-0ba9-74d450608...@suse.cz/
> 
> Imran Khan (1):
>   lib, stackdepot: Add helper to print stack entries into buffer.
> 
>  drivers/gpu/drm/drm_dp_mst_topology.c   |  5 +
>  drivers/gpu/drm/drm_mm.c|  5 +
>  drivers/gpu/drm/i915/i915_vma.c |  5 +
>  drivers/gpu/drm/i915/intel_runtime_pm.c | 20 +---
>  include/linux/stackdepot.h  |  3 +++
>  lib/stackdepot.c| 23 +++
>  mm/page_owner.c |  5 +
>  7 files changed, 35 insertions(+), 31 deletions(-)
> 



Re: [Intel-gfx] [PATCH 1/1] lib, stackdepot: Add helper to print stack entries into buffer.

2021-09-13 Thread Vlastimil Babka
On 9/10/21 16:10, Imran Khan wrote:
> To print stack entries into a buffer, users of stackdepot,
> first get a list of stack entries using stack_depot_fetch
> and then print this list into a buffer using stack_trace_snprint.
> Provide a helper in stackdepot for this purpose.
> Also change above mentioned users to use this helper.
> 
> Signed-off-by: Imran Khan 
> Suggested-by: Vlastimil Babka 

Acked-by: Vlastimil Babka 

A comment below:

> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -214,6 +214,29 @@ static inline struct stack_record *find_stack(struct stack_record *bucket,
>   return NULL;
>  }
>  
> +/**
> + * stack_depot_snprint - print stack entries from a depot into a buffer
> + *
> + * @handle:  Stack depot handle which was returned from
> + *   stack_depot_save().
> + * @buf: Pointer to the print buffer
> + *
> + * @size:Size of the print buffer
> + *
> + * @spaces:  Number of leading spaces to print
> + *
> + * Return:   Number of bytes printed.
> + */
> +int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
> +int spaces)
> +{
> + unsigned long *entries;
> + unsigned int nr_entries;
> +
> + nr_entries = stack_depot_fetch(handle, &entries);
> + return stack_trace_snprint(buf, size, entries, nr_entries, spaces);

stack_trace_snprint() has a WARN_ON(!entries).
So maybe we should not call it if nr_entries is 0 (because e.g. handle was
0) as the warnings are not useful in that case.
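
[Editor's note: an illustration of the suggested guard, not the actual
follow-up patch -- return early instead of handing an empty trace to
stack_trace_snprint() and tripping its WARN_ON(!entries).]

int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
                        int spaces)
{
        unsigned long *entries;
        unsigned int nr_entries;

        nr_entries = stack_depot_fetch(handle, &entries);
        /* A zero handle fetches nothing; skip the print and its WARN_ON. */
        return nr_entries ? stack_trace_snprint(buf, size, entries,
                                                nr_entries, spaces) : 0;
}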


Re: [Intel-gfx] [PATCH 04/65] mm: Extract might_alloc() debug check

2020-10-23 Thread Vlastimil Babka

On 10/23/20 2:21 PM, Daniel Vetter wrote:

Extracted from slab.h, which seems to have the most complete version
including the correct might_sleep() check. Roll it out to slob.c.

Motivated by a discussion with Paul about possibly changing call_rcu
behaviour to allocate memory, but only roughly every 500th call.

There are a lot fewer places in the kernel that care about whether
allocating memory is allowed or not (due to deadlocks with reclaim
code) than places that care whether sleeping is allowed. But debugging
these also tends to be a lot harder, so nice descriptive checks could
come in handy. I might have some use eventually for annotations in
drivers/gpu.

Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking
does not consult the PF_MEMALLOC flags. But there is no flag
equivalent for GFP_NOWAIT, hence this check can't go wrong due to
memalloc_no*_save/restore contexts.

Cc: Paul E. McKenney 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Vlastimil Babka 
Cc: Mathieu Desnoyers 
Cc: Sebastian Andrzej Siewior 
Cc: Michel Lespinasse 
Cc: Daniel Vetter 
Cc: Waiman Long 
Cc: Thomas Gleixner 
Cc: Randy Dunlap 
Cc: linux...@kvack.org
Cc: linux-fsde...@vger.kernel.org
Cc: Dave Chinner 
Cc: Qian Cai 
Cc: linux-...@vger.kernel.org
Signed-off-by: Daniel Vetter 


Looks useful.
Acked-by: Vlastimil Babka 
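
[Editor's note: the extracted check described above is essentially the
following, reproduced here for context.]

static inline void might_alloc(gfp_t gfp_mask)
{
        /* Lockdep tracking of the reclaim context, as in slab.h. */
        fs_reclaim_acquire(gfp_mask);
        fs_reclaim_release(gfp_mask);

        /* GFP_NOWAIT-style masks never allow blocking. */
        might_sleep_if(gfpflags_allow_blocking(gfp_mask));
}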


Re: [Intel-gfx] [PATCH 02/12] blk: use for_each_if

2018-07-13 Thread Vlastimil Babka
On 07/12/2018 08:45 AM, Joe Perches wrote:
> On Wed, 2018-07-11 at 20:50 +0200, Daniel Vetter wrote:
>> On Wed, Jul 11, 2018 at 8:30 PM, Jens Axboe  wrote:
>>> On 7/11/18 10:45 AM, Tejun Heo wrote:
 On Wed, Jul 11, 2018 at 09:40:58AM -0700, Tejun Heo wrote:
> On Mon, Jul 09, 2018 at 10:36:40AM +0200, Daniel Vetter wrote:
>> Makes the macros resilient against if {} else {} blocks right
>> afterwards.
>>
>> Signed-off-by: Daniel Vetter 
>> Cc: Tejun Heo 
>> Cc: Jens Axboe 
>> Cc: Shaohua Li 
>> Cc: Kate Stewart 
>> Cc: Greg Kroah-Hartman 
>> Cc: Joseph Qi 
>> Cc: Daniel Vetter 
>> Cc: Arnd Bergmann 
>
> Acked-by: Tejun Heo 
>
> Jens, it'd probably be best to route this through block tree.

 Oops, this requires an earlier patch to move the for_each_if def to a
 common header and should be routed together.
>>>
>>> Yeah, this is a problem with the submission.
>>>
>>> Always (ALWAYS) CC folks on at least the cover letter and generic
>>> earlier patches. Getting just one patch sent like this is mostly
>>> useless, and causes more harm than good.
>>
>> Ime sending a patch with more than 20 or so recipients means it gets
>> stuck everywhere in moderation queues. Or outright spam filters. I
>> thought the correct way to do this is to cc: mailing lists (lkml has
>> them all), but apparently that's not how it's done. Despite that all
>> the patch series I get never have the cover letter addressed to me
>> either.
>>
>> So what's the magic way to make this possible?
> 
> Jens' advice is crap.

This statement was rather offensive and totally uncalled for, AFAICS.
Why did you write it like that?

> There is no generic way to make this possible.
> 
> BCC's don't work, series that touch multiple subsystems
> get rejected when the recipient list is too large.

I don't know what's the usual limit for recipient list, probably never
hit it myself, but for series that are not so large, I use this approach
to make sure the cover letter is CC'd to everyone that's CC'd in any
patch in the series:

- add per-patch Cc:'s to the git commit logs
- clear out *.patch from the working dir
- git format-patch --cover-letter ...
- edit cover letter
- git send-email ... --cc-cmd=./cc.sh ...

where cc.sh contains this:

#!/bin/bash
# Emit the union of all tagged addresses for the cover letter; otherwise
# emit the addresses tagged in the single patch given as $1.
if [[ $1 == *cover-letter* ]]; then
    grep '<.*@.*>' -h *.patch | sed 's/^.*: //' | sort | uniq
else
    grep '<.*@.*>' -h "$1" | sed 's/^.*: //' | sort | uniq
fi

That processes all tags besides Cc (Acked-by, Reported-by, etc.) and turns
them into Cc's for each patch (or does git now do that by itself as well?),
and for the cover letter it accumulates them from all the patches.

Vlastimil

> I think you did it correctly.
> 

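[Editor's note: the for_each_if() macro this thread's patch rolls out is a
one-liner guarding against dangling-else bugs; a sketch of the hazard, with
a made-up for_each_active() iterator:]

#define for_each_if(condition) if (!(condition)) {} else

/* An iterator macro that ends in a bare `if` is a trap: */
#define for_each_active(i, n) \
        for ((i) = 0; (i) < (n); (i)++) \
                if (active[i])          /* bare if -- dangerous */

/*
 * Here the `else` silently binds to the macro's hidden `if`, so
 * fallback() runs once per inactive element instead of when cond
 * is false:
 *
 *      if (cond)
 *              for_each_active(i, n)
 *                      use(i);
 *      else
 *              fallback();
 *
 * Ending the macro in for_each_if(active[i]) instead gives the hidden
 * `if` its own `else`, turning the misbinding into a compile error.
 */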


Re: [Intel-gfx] [PATCH 1/2] mm: Track actual nr_scanned during shrink_slab()

2017-08-24 Thread Vlastimil Babka
On 08/24/2017 07:11 AM, Minchan Kim wrote:
> Hello Chris,
> 
> On Tue, Aug 22, 2017 at 02:53:24PM +0100, Chris Wilson wrote:
>> Some shrinkers may only be able to free a bunch of objects at a time, and
>> so free more than the requested nr_to_scan in one pass.

Can such shrinkers reflect that in their shrinker->batch value? Or is it
unpredictable for each scan?

>> Whilst other
>> shrinkers may find themselves even unable to scan as many objects as
>> they counted, and so underreport. Account for the extra freed/scanned
>> objects against the total number of objects we intend to scan, otherwise
>> we may end up penalising the slab far more than intended. Similarly,
>> we want to add the underperforming scan to the deferred pass so that we
>> try harder and harder in future passes.
>>
>> v2: Andrew's shrinkctl->nr_scanned
>>
>> Signed-off-by: Chris Wilson 
>> Cc: Andrew Morton 
>> Cc: Michal Hocko 
>> Cc: Johannes Weiner 
>> Cc: Hillf Danton 
>> Cc: Minchan Kim 
>> Cc: Vlastimil Babka 
>> Cc: Mel Gorman 
>> Cc: Shaohua Li 
>> Cc: linux...@kvack.org
>> ---
>>  include/linux/shrinker.h | 7 +++
>>  mm/vmscan.c  | 7 ---
>>  2 files changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>> index 4fcacd915d45..51d189615bda 100644
>> --- a/include/linux/shrinker.h
>> +++ b/include/linux/shrinker.h
>> @@ -18,6 +18,13 @@ struct shrink_control {
>>   */
>>  unsigned long nr_to_scan;
>>  
>> +/*
>> + * How many objects did scan_objects process?
>> + * This defaults to nr_to_scan before every call, but the callee
>> + * should track its actual progress.
> 
> So, if a shrinker scans more objects than requested, it should add them
> up to nr_scanned?

That sounds fair.

> In the opposite case, if a shrinker scans less than requested, it should
> reduce nr_scanned to the value really scanned?

Unsure. If they can't scan more, the following attempt in the next
iteration should fail and thus result in SHRINK_STOP?

> To track the progress is burden for the shrinker users.

You mean shrinker authors, not users? AFAICS this nr_scanned is opt-in,
if they don't want to touch it, the default remains nr_to_scan.

> Even if a
> shrinker makes a mistake, the VM will be in big trouble, e.g. an infinite loop.

We could fake 0 as 1 or something, at least.

> IMHO, we need concrete reason to do it but fail to see it at this moment.
> 
> Could we just add up more freed object than requested to total_scan
> like you did in first version[1]?

That's a bit different metric, but maybe it doesn't matter. Different
shrinkers are essentially apples and oranges anyway, so improving the
arithmetics can only help to some extent, IMHO.

> [1] lkml.kernel.org/r/<20170812113437.7397-1-ch...@chris-wilson.co.uk>
> 
>> + */
>> +unsigned long nr_scanned;
>> +
>>  /* current node being shrunk (for NUMA aware shrinkers) */
>>  int nid;
>>  
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a1af041930a6..339b8fc95fc9 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -393,14 +393,15 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>>  unsigned long nr_to_scan = min(batch_size, total_scan);
>>  
>>  shrinkctl->nr_to_scan = nr_to_scan;
>> +shrinkctl->nr_scanned = nr_to_scan;
>>  ret = shrinker->scan_objects(shrinker, shrinkctl);
>>  if (ret == SHRINK_STOP)
>>  break;
>>  freed += ret;
>>  
>> -count_vm_events(SLABS_SCANNED, nr_to_scan);
>> -total_scan -= nr_to_scan;
>> -scanned += nr_to_scan;
>> +count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
>> +total_scan -= shrinkctl->nr_scanned;
>> +scanned += shrinkctl->nr_scanned;
> 
> If we really want to go this way, at least, We need some defense code
> to prevent infinite loop when shrinker doesn't have object any more.
> However, I really want to go with your first version.
> 
> Andrew?
> 
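
[Editor's note: with the proposed shrink_control->nr_scanned, a shrinker
that frees in chunks would report its real progress roughly like this; the
names and chunking logic are illustrative only.]

static unsigned long my_scan_objects(struct shrinker *shrinker,
                                     struct shrink_control *sc)
{
        unsigned long freed = 0, scanned = 0;

        while (scanned < sc->nr_to_scan) {
                unsigned long n = free_one_chunk();     /* assumed helper */

                if (!n)
                        break;          /* nothing left to reclaim */
                freed += n;
                scanned += n;           /* a chunk may overshoot the budget */
        }

        if (!scanned)
                return SHRINK_STOP;

        sc->nr_scanned = scanned;       /* default was sc->nr_to_scan */
        return freed;
}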



Re: [Intel-gfx] [RFC] mm, drm/i915: Mark pinned shmemfs pages as unevictable

2017-06-06 Thread Vlastimil Babka
On 06/06/2017 02:14 PM, Michal Hocko wrote:
> On Tue 06-06-17 13:04:36, Chris Wilson wrote:
>> Similar in principle to the treatment of get_user_pages, pages that
>> i915.ko acquires from shmemfs are not immediately reclaimable and so
>> should be excluded from the mm accounting and vmscan until they have
>> been returned to the system via shrink_slab/i915_gem_shrink. By moving
>> the unreclaimable pages off the inactive anon lru, not only should
>> vmscan be improved by avoiding walking unreclaimable pages, but the
>> system should also have a better idea of how much memory it can reclaim
>> at that moment in time.
> 
> That is certainly desirable. Peter has proposed a generic pin_page (or
> similar) API. What happened with it? I think it would be a better
> approach than (ab)using mlock API. I am also not familiar with the i915
> code to be sure that using lock_page is really safe here. I think that
> all we need is to simply move those pages in/out to/from unevictable LRU
> list on pin/unpinning.

Hmm even when on unevictable list, the pages were still allocated as
MOVABLE, while pinning prevents them from being migrated, so it doesn't
play well with compaction/grouping by mobility/CMA etc. Addressing that
would be more useful IMHO, and e.g. one of the features envisioned for
the pinning API was to first migrate the pinned pages out of movable
zones and CMA/MOVABLE pageblocks.

>> Note, however, the interaction with shrink_slab which will move some
>> mlocked pages back to the inactive anon lru.
>>
>> Suggested-by: Dave Hansen 
>> Signed-off-by: Chris Wilson 
>> Cc: Joonas Lahtinen 
>> Cc: Matthew Auld 
>> Cc: Dave Hansen 
>> Cc: "Kirill A . Shutemov" 
>> Cc: Andrew Morton 
>> Cc: Michal Hocko 
>> ---
>>  drivers/gpu/drm/i915/i915_gem.c | 17 -
>>  mm/mlock.c  |  2 ++
>>  2 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 8cb811519db1..37a98fbc6a12 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2193,6 +2193,9 @@ void __i915_gem_object_truncate(struct drm_i915_gem_object *obj)
>>  obj->mm.pages = ERR_PTR(-EFAULT);
>>  }
>>  
>> +extern void mlock_vma_page(struct page *page);
>> +extern unsigned int munlock_vma_page(struct page *page);
>> +
>>  static void
>>  i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj,
>>struct sg_table *pages)
>> @@ -2214,6 +2217,10 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj,
>>  if (obj->mm.madv == I915_MADV_WILLNEED)
>>  mark_page_accessed(page);
>>  
>> +lock_page(page);
>> +munlock_vma_page(page);
>> +unlock_page(page);
>> +
>>  put_page(page);
>>  }
>>  obj->mm.dirty = false;
>> @@ -2412,6 +2419,10 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
>>  }
>>  last_pfn = page_to_pfn(page);
>>  
>> +lock_page(page);
>> +mlock_vma_page(page);
>> +unlock_page(page);
>> +
>>  /* Check that the i965g/gm workaround works. */
>>  WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x0010UL));
>>  }
>> @@ -2450,8 +2461,12 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
>>  err_sg:
>>  sg_mark_end(sg);
>>  err_pages:
>> -for_each_sgt_page(page, sgt_iter, st)
>> +for_each_sgt_page(page, sgt_iter, st) {
>> +lock_page(page);
>> +munlock_vma_page(page);
>> +unlock_page(page);
>>  put_page(page);
>> +}
>>  sg_free_table(st);
>>  kfree(st);
>>  
>> diff --git a/mm/mlock.c b/mm/mlock.c
>> index b562b5523a65..531d9f8fd033 100644
>> --- a/mm/mlock.c
>> +++ b/mm/mlock.c
>> @@ -94,6 +94,7 @@ void mlock_vma_page(struct page *page)
>>  putback_lru_page(page);
>>  }
>>  }
>> +EXPORT_SYMBOL_GPL(mlock_vma_page);
>>  
>>  /*
>>   * Isolate a page from LRU with optional get_page() pin.
>> @@ -211,6 +212,7 @@ unsigned int munlock_vma_page(struct page *page)
>>  out:
>>  return nr_pages - 1;
>>  }
>> +EXPORT_SYMBOL_GPL(munlock_vma_page);
>>  
>>  /*
>>   * convert get_user_pages() return value to posix mlock() error
>> -- 
>> 2.11.0
>>
> 
