Re: [LTP] [f2fs] 02eb84b96b: ltp.swapon03.fail

2021-03-08 Thread Richard Palethorpe
Hello,

> kern  :err   : [  187.461914] F2FS-fs (sda1): Swapfile does not align to 
> section

> commit 02eb84b96bc1b382dd138bf60724edbefe77b025
> Author: huangjia...@oppo.com 
> Date:   Mon Mar 1 12:58:44 2021 +0800

> f2fs: check if swapfile is section-alligned

> If the swapfile isn't created by pin and fallocate, it can't be
> guaranteed section-aligned, so it may be selected by f2fs gc. When
> gc_pin_file_threshold is reached, the address of swapfile may change,
> but won't be synchronized to swap_extent, so swap will write to wrong
> address, which will cause data corruption.

> Signed-off-by: Huang Jianan 
> Signed-off-by: Guo Weichao 
> Reviewed-by: Chao Yu 
> Signed-off-by: Jaegeuk Kim 

The test uses fallocate to preallocate the swap file and writes zeros to
it. I'm not sure what "pin" refers to here?

-- 
Thank you,
Richard.


Re: [PATCH v2] nvdimm: Avoid race between probe and reading device attributes

2021-02-02 Thread Richard Palethorpe
Hello Dan,

Dan Williams  writes:
>
> I see why this works, but I think the bug is in
> available_slots_show(). It is a bug for a sysfs attribute to reference
> driver-data without synchronizing against bind. So it should be
> possible for probe to set that pointer whenever it wants. In other words
> this fix (forgive the whitespace damage from pasting).

Ah, that makes sense! I see the new patch and all is good AFAICT.

-- 
Thank you,
Richard.


[BUG] Lockdep splat during kvfree_call_rcu and stack_depot_save

2020-10-26 Thread Richard Palethorpe
Hello,

The kmem memcg selftest causes the following lockdep splat on 5.9+

[   67.534319] =
[   67.534410] [ BUG: Invalid wait context ]
[   67.534522] 5.9.1-22-default #125 Not tainted
[   67.534647] -
[   67.534732] ksoftirqd/5/36 is trying to lock:
[   67.534833] a7802d58 (depot_lock){..-.}-{3:3}, at: stack_depot_save 
(lib/stackdepot.c:286)
[   67.534993] other info that might help us debug this:
[   67.535089] context-{3:3}
[   67.535139] 3 locks held by ksoftirqd/5/36:
[   67.535216] #0: a769d3e0 (rcu_callback){}-{0:0}, at: 
rcu_do_batch (./include/linux/rcupdate.h:241 kernel/rcu/tree.c:2425)
[   67.535362] #1: a769d4a0 (rcu_read_lock){}-{1:3}, at: 
percpu_ref_switch_to_atomic_rcu (./arch/x86/include/asm/preempt.h:79 
./include/linux/rcupdate.h:60 ./include/linux/rcupdate.h:632 
./include/linux/percpu-refcount.h:304 ./include/linux/percpu-refcount.h:325 
lib/percpu-refcount.c:131 lib/percpu-refcount.c:166)
[   67.535556] #2: 96ca3b55e910 (krc.lock){..-.}-{2:2}, at: kvfree_call_rcu 
(kernel/rcu/tree.c:3301 kernel/rcu/tree.c:3404)
[   67.535709] stack backtrace:
[   67.535780] CPU: 5 PID: 36 Comm: ksoftirqd/5 Not tainted 5.9.1-22-default 
#125
[   67.535907] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
[   67.536108] Call Trace:
[   67.536151] dump_stack (lib/dump_stack.c:120)
[   67.536221] print_lock_invalid_wait_context.cold 
(kernel/locking/lockdep.c:4093)
[   67.536311] __lock_acquire (kernel/locking/lockdep.c:4391)
[   67.536377] ? validate_chain (kernel/locking/lockdep.c:2603 
kernel/locking/lockdep.c:3218)
[   67.536453] lock_acquire (kernel/locking/lockdep.c:398 
kernel/locking/lockdep.c:5031)
[   67.536521] ? stack_depot_save (lib/stackdepot.c:286)
[   67.536590] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:24)
[   67.536662] _raw_spin_lock_irqsave (./include/linux/spinlock_api_smp.h:117 
kernel/locking/spinlock.c:159)
[   67.536873] ? stack_depot_save (lib/stackdepot.c:286)
[   67.537108] stack_depot_save (lib/stackdepot.c:286)
[   67.537284] save_stack (mm/page_owner.c:137)
[   67.537382] ? prep_new_page (./include/linux/page_owner.h:31 
mm/page_alloc.c:2220 mm/page_alloc.c:2226)
[   67.537479] ? get_page_from_freelist (mm/page_alloc.c:3851)
[   67.537664] ? __alloc_pages_nodemask (mm/page_alloc.c:4896)
[   67.537843] ? __get_free_pages (mm/page_alloc.c:4940)
[   67.537971] ? kvfree_call_rcu (kernel/rcu/tree.c:3336 kernel/rcu/tree.c:3404)
[   67.538066] ? percpu_ref_switch_to_atomic_rcu 
(./include/linux/percpu-refcount.h:309 ./include/linux/percpu-refcount.h:325 
lib/percpu-refcount.c:131 lib/percpu-refcount.c:166)
[   67.538218] ? rcu_do_batch (./include/linux/rcupdate.h:246 
kernel/rcu/tree.c:2432)
[   67.538348] ? rcu_core (kernel/rcu/tree.c:2658)
[   67.538409] ? __do_softirq (./arch/x86/include/asm/jump_label.h:25 
./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 
kernel/softirq.c:299)
[   67.538509] ? run_ksoftirqd (kernel/softirq.c:653 kernel/softirq.c:644)
[   67.538640] ? smpboot_thread_fn (kernel/smpboot.c:165)
[   67.538814] ? kthread (kernel/kthread.c:292)
[   67.539004] ? ret_from_fork (arch/x86/entry/entry_64.S:300)
[   67.539172] __set_page_owner (mm/page_owner.c:169 mm/page_owner.c:192)
[   67.539281] prep_new_page (./include/linux/page_owner.h:31 
mm/page_alloc.c:2220 mm/page_alloc.c:2226)
[   67.539445] get_page_from_freelist (mm/page_alloc.c:3851)
[   67.539653] ? kvfree_call_rcu (kernel/rcu/tree.c:3301 kernel/rcu/tree.c:3404)
[   67.539823] __alloc_pages_nodemask (mm/page_alloc.c:4896)
[   67.540020] __get_free_pages (mm/page_alloc.c:4940)
[   67.540171] kvfree_call_rcu (kernel/rcu/tree.c:3336 kernel/rcu/tree.c:3404)
[   67.540324] ? rcu_do_batch (./include/linux/rcupdate.h:241 
kernel/rcu/tree.c:2425)
[   67.540524] percpu_ref_switch_to_atomic_rcu 
(./include/linux/percpu-refcount.h:309 ./include/linux/percpu-refcount.h:325 
lib/percpu-refcount.c:131 lib/percpu-refcount.c:166)
[   67.540736] rcu_do_batch (./include/linux/rcupdate.h:246 
kernel/rcu/tree.c:2432)
[   67.540941] rcu_core (kernel/rcu/tree.c:2658)
[   67.541169] __do_softirq (./arch/x86/include/asm/jump_label.h:25 
./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 
kernel/softirq.c:299)
[   67.541375] run_ksoftirqd (kernel/softirq.c:653 kernel/softirq.c:644)
[   67.541574] smpboot_thread_fn (kernel/smpboot.c:165)
[   67.541771] ? smpboot_register_percpu_thread (kernel/smpboot.c:108)
[   67.542025] kthread (kernel/kthread.c:292)
[   67.542219] ? kthread_create_worker_on_cpu (kernel/kthread.c:245)

This appears to be caused by
8ac88f7177c7 ("rcu/tree: Keep kfree_rcu() awake during lock contention")

which switched krc.lock from a spinlock to a raw_spinlock. Indeed, if I
switch it back to a spinlock then the splat is no longer reproducible.

-- 
Thank you,
Richard.


Re: [PATCH v4] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-26 Thread Richard Palethorpe
Hello Roman,

Roman Gushchin  writes:

> On Thu, Oct 22, 2020 at 04:59:56PM -0700, Shakeel Butt wrote:
>> On Thu, Oct 22, 2020 at 10:25 AM Roman Gushchin  wrote:
>> >
>> [snip]
>> > >
>> > > Since bf4f059954dc ("mm: memcg/slab: obj_cgroup API") is in 5.9, I
>> > > think we can take this patch for 5.9 and 5.10 but keep Roman's cleanup
>> > > for 5.11.
>> > >
>> > > What does everyone think?
>> >
>> > I think we should use the link to the root approach both for stable 
>> > backports
>> > and for 5.11+, to keep them in sync. The cleanup (always charging the root 
>> > cgroup)
>> > is not directly related to this problem, and we can keep it for 5.11+ only.
>> >
>> > Thanks!
>> 
>> Roman, can you send the signed-off patch for the root linking for
>> use_hierarchy=0?
>
> Sure, here we are.
>
> Thanks!
>
> --
>
> From 19d66695f0ef1bf1ef7c51073ab91d67daa91362 Mon Sep 17 00:00:00 2001
> From: Roman Gushchin 
> Date: Thu, 22 Oct 2020 17:12:32 -0700
> Subject: [PATCH] mm: memcg: link page counters to root if use_hierarchy is 
> false
>
> Richard reported a warning which can be reproduced by running the LTP
> madvise6 test (cgroup v1 in the non-hierarchical mode should be used):
>
> [9.841552] [ cut here ]
> [9.841788] WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 
> page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 
> mm/page_counter.c:156)
> [9.841982] Modules linked in:
> [9.842072] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 
> 5.9.0-rc7-22-default #77
> [9.842266] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
> [9.842571] Workqueue: events drain_local_stock
> [9.842750] RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 
> mm/page_counter.c:50 mm/page_counter.c:156)
> [ 9.842894] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff 
> ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb 
> ec 90 e8 4b f9 88 2a 48 8b 17 48 39 d6 72 41 41 54 49 89
> [9.843438] RSP: 0018:b1c18006be28 EFLAGS: 00010086
> [9.843585] RAX:  RBX:  RCX: 
> 94803bc2cae0
> [9.843806] RDX: 0001 RSI:  RDI: 
> 948007d2b248
> [9.844026] RBP: 948007d2b248 R08: 948007c58eb0 R09: 
> 948007da05ac
> [9.844248] R10: 0018 R11: 0018 R12: 
> 0001
> [9.844477] R13:  R14:  R15: 
> 94803bc2cac0
> [9.844696] FS:  () GS:94803bc0() 
> knlGS:
> [9.844915] CS:  0010 DS:  ES:  CR0: 80050033
> [9.845096] CR2: 7f0579ee0384 CR3: 2cc0a000 CR4: 
> 06f0
> [9.845319] Call Trace:
> [9.845429] __memcg_kmem_uncharge (mm/memcontrol.c:3022)
> [9.845582] drain_obj_stock (./include/linux/rcupdate.h:689 
> mm/memcontrol.c:3114)
> [9.845684] drain_local_stock (mm/memcontrol.c:2255)
> [9.845789] process_one_work (./arch/x86/include/asm/jump_label.h:25 
> ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 
> kernel/workqueue.c:2274)
> [9.845898] worker_thread (./include/linux/list.h:282 
> kernel/workqueue.c:2416)
> [9.846034] ? process_one_work (kernel/workqueue.c:2358)
> [9.846162] kthread (kernel/kthread.c:292)
> [9.846271] ? __kthread_bind_mask (kernel/kthread.c:245)
> [9.846420] ret_from_fork (arch/x86/entry/entry_64.S:300)
> [9.846531] ---[ end trace 8b5647c1eba9d18a ]---
>
> The problem occurs because in the non-hierarchical mode non-root page
> counters are not linked to root page counters, so the charge is not
> propagated to the root memory cgroup.
>
> After the removal of the original memory cgroup and reparenting of the
> object cgroup, the root cgroup might be uncharged by draining a objcg

I think it is worth mentioning that reparenting will always be to root,
to avoid any confusion about what may happen with deeper, broken
hierarchies.

> stock, for example. It leads to an eventual underflow of the charge
> and triggers a warning.
>
> Fix it by linking all page counters to corresponding root page
> counters in the non-hierarchical mode.
>
> The patch doesn't affect how the hierarchical mode is working,
> which is the only sane and truly supported mode now.
>
> Thanks to Richard for reporting, debugging and providing an
> alternative version of the fix!
>
> Reported-by: l...@lists.linux.it
> Debugged-by: Richard Palethorpe 

Much appreciated, thanks!

-- 
Thank you,
Richard.


[PATCH v4] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-22 Thread Richard Palethorpe
.844248] R10: 0018 R11: 0018 R12: 0001
[9.844477] R13:  R14:  R15: 94803bc2cac0
[9.844696] FS:  () GS:94803bc0() 
knlGS:
[9.844915] CS:  0010 DS:  ES:  CR0: 80050033
[9.845096] CR2: 7f0579ee0384 CR3: 2cc0a000 CR4: 06f0
[9.845319] Call Trace:
[9.845429] __memcg_kmem_uncharge (mm/memcontrol.c:3022)
[9.845582] drain_obj_stock (./include/linux/rcupdate.h:689 
mm/memcontrol.c:3114)
[9.845684] drain_local_stock (mm/memcontrol.c:2255)
[9.845789] process_one_work (./arch/x86/include/asm/jump_label.h:25 
./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 
kernel/workqueue.c:2274)
[9.845898] worker_thread (./include/linux/list.h:282 
kernel/workqueue.c:2416)
[9.846034] ? process_one_work (kernel/workqueue.c:2358)
[9.846162] kthread (kernel/kthread.c:292)
[9.846271] ? __kthread_bind_mask (kernel/kthread.c:245)
[9.846420] ret_from_fork (arch/x86/entry/entry_64.S:300)
[9.846531] ---[ end trace 8b5647c1eba9d18a ]---

Reported-by: l...@lists.linux.it
Signed-off-by: Richard Palethorpe 
Acked-by: Roman Gushchin 
Cc: Johannes Weiner 
Cc: Andrew Morton 
Cc: Shakeel Butt 
Cc: Christoph Lameter 
Cc: Michal Hocko 
Cc: Tejun Heo 
Cc: Vlastimil Babka 
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
---

V4: Same as V3, but with hopefully better/correct commit message.

 mm/memcontrol.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6877c765b8d0..34b8c4a66853 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -291,7 +291,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 
 	spin_lock_irqsave(&css_set_lock, flags);
 	memcg = obj_cgroup_memcg(objcg);
-	if (nr_pages)
+	if (nr_pages && (!mem_cgroup_is_root(memcg) || memcg->use_hierarchy))
 		__memcg_kmem_uncharge(memcg, nr_pages);
 	list_del(&objcg->list);
 	mem_cgroup_put(memcg);
@@ -3100,6 +3100,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = stock->cached_objcg;
+	struct mem_cgroup *memcg;
 
 	if (!old)
 		return;
@@ -3110,7 +3111,9 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 
 	if (nr_pages) {
 		rcu_read_lock();
-		__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
+		memcg = obj_cgroup_memcg(old);
+		if (!mem_cgroup_is_root(memcg) || memcg->use_hierarchy)
+			__memcg_kmem_uncharge(memcg, nr_pages);
 		rcu_read_unlock();
 	}
 
-- 
2.28.0



Re: [PATCH v3] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-22 Thread Richard Palethorpe
Hello,

Michal Koutný  writes:

> Hi.
>
> On Tue, Oct 20, 2020 at 06:52:08AM +0100, Richard Palethorpe 
>  wrote:
>> I don't think that is relevant as we get the memcg from objcg->memcg
>> which is set during reparenting. I suppose however, we can determine if
>> the objcg was reparented by inspecting memcg->objcg.
> +1
>
>> If we just check use_hierarchy then objects directly charged to the
>> memcg where use_hierarchy=0 will not be uncharged. However, maybe it is
>> better to check if it was reparented and if use_hierarchy=0.
> I think (I had to make a table) the yielded condition would be:
>
> if ((memcg->use_hierarchy && reparented) || (!mem_cgroup_is_root(memcg) && 
> !reparented))

This looks correct, but I don't think we need to check for reparenting
after all. We have the following unique scenarios:

use_hierarchy=1, memcg=?, reparented=?:
Because use_hierarchy=1 any descendants will have charged the current
memcg, including root, so we can simply uncharge regardless of the memcg
or objcg.

use_hierarchy=0, memcg!=root, reparented=?:
When use_hierarchy=0, objcgs are *only* reparented to root, so if we are
not root the objcg must belong to us.

use_hierarchy=0, memcg=root, reparented=?:
We must have been reparented because root is not initialised with an
objcg, but use_hierarchy=0 so we cannot uncharge.

So I believe that the following is sufficient.

if (memcg->use_hierarchy || !mem_cgroup_is_root(memcg))
	__memcg_kmem_uncharge(memcg, nr_pages);
>
> (I admit it's not very readable.)
>
>
> Michal

For the record, I did create the following patch which checks for
reparenting, but it appears to be unnecessary.



diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6877c765b8d0..0285f760f003 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -257,6 +257,14 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
 #ifdef CONFIG_MEMCG_KMEM
 extern spinlock_t css_set_lock;
 
+/* Assumes objcg originated from a descendant of memcg or is memcg's */
+static bool obj_cgroup_did_charge(const struct obj_cgroup *objcg,
+				  const struct mem_cgroup *memcg)
+{
+	return memcg->use_hierarchy ||
+		rcu_access_pointer(memcg->objcg) == objcg;
+}
+
 static void obj_cgroup_release(struct percpu_ref *ref)
 {
 	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
@@ -291,7 +299,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 
 	spin_lock_irqsave(&css_set_lock, flags);
 	memcg = obj_cgroup_memcg(objcg);
-	if (nr_pages)
+	if (nr_pages && obj_cgroup_did_charge(objcg, memcg))
 		__memcg_kmem_uncharge(memcg, nr_pages);
 	list_del(&objcg->list);
 	mem_cgroup_put(memcg);
@@ -3100,6 +3108,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = stock->cached_objcg;
+	struct mem_cgroup *memcg;
 
 	if (!old)
 		return;
@@ -3110,7 +3119,9 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 
 	if (nr_pages) {
 		rcu_read_lock();
-		__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
+		memcg = obj_cgroup_memcg(old);
+		if (obj_cgroup_did_charge(old, memcg))
+			__memcg_kmem_uncharge(memcg, nr_pages);
 		rcu_read_unlock();
 	}

-- 
Thank you,
Richard.


Re: [RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-20 Thread Richard Palethorpe
counter *counter, 
> unsigned long nr_pages)
>  {
>   struct page_counter *c;
>  
> - for (c = counter; c; c = c->parent) {
> + for_each_nonroot_ancestor(c, counter) {
>   long new;
>  
>   new = atomic_long_add_return(nr_pages, &c->usage);
> @@ -97,7 +97,7 @@ bool page_counter_try_charge(struct page_counter *counter,
>  {
>   struct page_counter *c;
>  
> - for (c = counter; c; c = c->parent) {
> + for_each_nonroot_ancestor(c, counter) {
>   long new;
>   /*
>* Charge speculatively to avoid an expensive CAS.  If
> @@ -137,8 +137,11 @@ bool page_counter_try_charge(struct page_counter 
> *counter,
>   return true;
>  
>  failed:
> - for (c = counter; c != *fail; c = c->parent)
> + for_each_nonroot_ancestor(c, counter) {
> + if (c == *fail)
> + break;
>   page_counter_cancel(c, nr_pages);
> + }
>  
>   return false;
>  }
> @@ -152,7 +155,7 @@ void page_counter_uncharge(struct page_counter *counter, 
> unsigned long nr_pages)
>  {
>   struct page_counter *c;
>  
> - for (c = counter; c; c = c->parent)
> + for_each_nonroot_ancestor(c, counter)
>   page_counter_cancel(c, nr_pages);
>  }
>  
> @@ -211,7 +214,7 @@ void page_counter_set_min(struct page_counter *counter, 
> unsigned long nr_pages)
>  
>   WRITE_ONCE(counter->min, nr_pages);
>  
> - for (c = counter; c; c = c->parent)
> + for_each_nonroot_ancestor(c, counter)
>       propagate_protected_usage(c, atomic_long_read(&c->usage));
>  }
>  
> @@ -228,7 +231,7 @@ void page_counter_set_low(struct page_counter *counter, 
> unsigned long nr_pages)
>  
>   WRITE_ONCE(counter->low, nr_pages);
>  
> - for (c = counter; c; c = c->parent)
> + for_each_nonroot_ancestor(c, counter)
>   propagate_protected_usage(c, atomic_long_read(&c->usage));
>  }

This certainly prevents the counter underflow reported by madvise06 and
makes my patch redundant, although it is significantly more intrusive,
so perhaps not suitable for the 5.9 stable branch?

Tested-by: Richard Palethorpe 

-- 
Thank you,
Richard.


Re: [PATCH v3] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-20 Thread Richard Palethorpe
Hello,

Richard Palethorpe  writes:

> Hello Shakeel,
>
> Shakeel Butt  writes:
>>>
>>> V3: Handle common case where use_hierarchy=1 and update description.
>>>
>>>  mm/memcontrol.c | 7 +--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index 6877c765b8d0..34b8c4a66853 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -291,7 +291,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
>>>
>>> 	spin_lock_irqsave(&css_set_lock, flags);
>>> memcg = obj_cgroup_memcg(objcg);
>>> -   if (nr_pages)
>>> +	if (nr_pages && (!mem_cgroup_is_root(memcg) || memcg->use_hierarchy))
>>
>> If we have non-root memcg with use_hierarchy as 0 and this objcg was
>> reparented then this __memcg_kmem_uncharge() can potentially underflow
>> the page counter and give the same warning.
>
> Yes, although the kernel considers such a config to be broken, and
> prints a warning to the log, it does allow it.

Actually this can not happen because if use_hierarchy=0 then the objcg
will be reparented to root.

>
>>
>> We never set root_mem_cgroup->objcg, so, no need to check for root
>
> I don't think that is relevant as we get the memcg from objcg->memcg
> which is set during reparenting. I suppose however, we can determine if
> the objcg was reparented by inspecting memcg->objcg.
>
>> here. I think checking just memcg->use_hierarchy should be sufficient.
>
> If we just check use_hierarchy then objects directly charged to the
> memcg where use_hierarchy=0 will not be uncharged. However, maybe it is
> better to check if it was reparented and if use_hierarchy=0.

-- 
Thank you,
Richard.


Re: [RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-20 Thread Richard Palethorpe
Hello,

Richard Palethorpe  writes:

> Hello Roman,
>
> Roman Gushchin  writes:
>
>> -	page_counter_init(&memcg->memory, NULL);
>> -	page_counter_init(&memcg->swap, NULL);
>> -	page_counter_init(&memcg->kmem, NULL);
>> -	page_counter_init(&memcg->tcpmem, NULL);
>> +	/*
>> +	 * If use_hierarchy == false, consider all page counters direct
>> +	 * descendants of the corresponding root level counters.
>> +	 */
>> +	page_counter_init(&memcg->memory, &root_mem_cgroup->memory);
>> +	page_counter_init(&memcg->swap, &root_mem_cgroup->swap);
>> +	page_counter_init(&memcg->kmem, &root_mem_cgroup->kmem);
>> +	page_counter_init(&memcg->tcpmem, &root_mem_cgroup->tcpmem);
>> +
>>  /*
>>   * Deeper hierachy with use_hierarchy == false doesn't make
>>   * much sense so let cgroup subsystem know about this
>
> Perhaps in this case, where the hierarchy is broken, objcgs should also
> be reparented directly to root? Otherwise it will still be possible to
> underflow the counter in a descendant of root which has use_hierarchy=0,
> but also has children.

Sorry, ignore me: parent_mem_cgroup already selects root. So in the case
of a broken hierarchy objcgs are reparented directly to root.

-- 
Thank you,
Richard.


Re: [RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-20 Thread Richard Palethorpe
Hello Roman,

Roman Gushchin  writes:

> On Fri, Oct 16, 2020 at 07:15:02PM +0200, Michal Koutny wrote:
>> On Fri, Oct 16, 2020 at 10:53:08AM -0400, Johannes Weiner 
>>  wrote:
>> > The central try_charge() function charges recursively all the way up
>> > to and including the root.
>> Except for use_hiearchy=0 (which is the case here as Richard
>> wrote). The reparenting is hence somewhat incompatible with
>> new_parent.use_hiearchy=0 :-/
>> 
>> > We should clean this up one way or another: either charge the root or
>> > don't, but do it consistently.
>> I agree this'd be good to unify. One upside of excluding root memcg from
>> charging is that users are spared from the charging overhead when memcg
>> tree is not created.  (Actually, I thought that was the reason for this
>> exception.)
>
> Yeah, I'm completely on the same page. Moving a process to the root memory
> cgroup is currently a good way to estimate the memory cgroup overhead.
>
> How about the patch below, which consistently avoids charging the root
> memory cgroup? It seems like it doesn't add too many checks.
>
> Thanks!
>
> --
>
> From f50ea74d8f118b9121da3754acdde630ddc060a7 Mon Sep 17 00:00:00 2001
> From: Roman Gushchin 
> Date: Mon, 19 Oct 2020 14:37:35 -0700
> Subject: [PATCH RFC] mm: memcontrol: do not charge the root memory cgroup
>
> Currently the root memory cgroup is never charged directly, but
> if an ancestor cgroup is charged, the charge is propagated up to the
> root memory cgroup. The root memory cgroup doesn't show the charge
> to a user, neither it does allow to set any limits/protections.
> So the information about the current charge is completely useless.
>
> Avoiding to charge the root memory cgroup allows to:
> 1) simplify the model and the code, so, hopefully, fewer bugs will
>be introduced in the future;
> 2) avoid unnecessary atomic operations, which are used to (un)charge
>corresponding root page counters.
>
> In the default hierarchy case or if use_hiearchy == true, it's very
> straightforward: when the page counters tree is traversed to the root,
> the root page counter (the one with parent == NULL), should be
> skipped. To avoid multiple identical checks over the page counters
> code, for_each_nonroot_ancestor() macro is introduced.
>
> To handle the use_hierarchy == false case without adding custom
> checks, let's make page counters of all non-root memory cgroup
> direct ascendants of the corresponding root memory cgroup's page
> counters. In this case for_each_nonroot_ancestor() will work correctly
> as well.
>
> Please, note, that cgroup v1 provides root level memory.usage_in_bytes.
> However, it's not based on page counters (refer to mem_cgroup_usage()).
>
> Signed-off-by: Roman Gushchin 
> ---
>  mm/memcontrol.c   | 21 -
>  mm/page_counter.c | 21 -
>  2 files changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2636f8bad908..34cac7522e74 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5339,17 +5339,28 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> 		memcg->swappiness = mem_cgroup_swappiness(parent);
> 		memcg->oom_kill_disable = parent->oom_kill_disable;
> 	}
> -	if (parent && parent->use_hierarchy) {
> +	if (!parent) {
> +		/* root memory cgroup */
> +		page_counter_init(&memcg->memory, NULL);
> +		page_counter_init(&memcg->swap, NULL);
> +		page_counter_init(&memcg->kmem, NULL);
> +		page_counter_init(&memcg->tcpmem, NULL);
> +	} else if (parent->use_hierarchy) {
> 		memcg->use_hierarchy = true;
> 		page_counter_init(&memcg->memory, &parent->memory);
> 		page_counter_init(&memcg->swap, &parent->swap);
> 		page_counter_init(&memcg->kmem, &parent->kmem);
> 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
> 	} else {
> -		page_counter_init(&memcg->memory, NULL);
> -		page_counter_init(&memcg->swap, NULL);
> -		page_counter_init(&memcg->kmem, NULL);
> -		page_counter_init(&memcg->tcpmem, NULL);
> +		/*
> +		 * If use_hierarchy == false, consider all page counters direct
> +		 * descendants of the corresponding root level counters.
> +		 */
> +		page_counter_init(&memcg->memory, &root_mem_cgroup->memory);
> +		page_counter_init(&memcg->swap, &root_mem_cgroup->swap);
> +		page_counter_init(&memcg->kmem, &root_mem_cgroup->kmem);
> +		page_counter_init(&memcg->tcpmem, &root_mem_cgroup->tcpmem);
> +
> 		/*
> 		 * Deeper hierachy with use_hierarchy == false doesn't make
> 		 * much sense so let cgroup subsystem know about this

Perhaps in this case, where the hierarchy is broken, objcgs should also
be reparented directly to root? Otherwise it will still be possible to
underflow the counter in a descendant of root which has use_hierarchy=0,
but also has children.

> diff --git a/mm/page_counter.c b/mm/page_counter.c
> 

Re: [PATCH v3] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-19 Thread Richard Palethorpe
Hello Shakeel,

Shakeel Butt  writes:
>>
>> V3: Handle common case where use_hierarchy=1 and update description.
>>
>>  mm/memcontrol.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 6877c765b8d0..34b8c4a66853 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -291,7 +291,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
>>
>> 	spin_lock_irqsave(&css_set_lock, flags);
>> memcg = obj_cgroup_memcg(objcg);
>> -   if (nr_pages)
>> +   if (nr_pages && (!mem_cgroup_is_root(memcg) || memcg->use_hierarchy))
>
> If we have non-root memcg with use_hierarchy as 0 and this objcg was
> reparented then this __memcg_kmem_uncharge() can potentially underflow
> the page counter and give the same warning.

Yes, although the kernel considers such a config to be broken, and
prints a warning to the log, it does allow it.

>
> We never set root_mem_cgroup->objcg, so, no need to check for root

I don't think that is relevant as we get the memcg from objcg->memcg
which is set during reparenting. I suppose however, we can determine if
the objcg was reparented by inspecting memcg->objcg.

> here. I think checking just memcg->use_hierarchy should be sufficient.

If we just check use_hierarchy then objects directly charged to the
memcg where use_hierarchy=0 will not be uncharged. However, maybe it is
better to check if it was reparented and if use_hierarchy=0.

>
>> __memcg_kmem_uncharge(memcg, nr_pages);
>> 	list_del(&objcg->list);
>> mem_cgroup_put(memcg);
>> @@ -3100,6 +3100,7 @@ static bool consume_obj_stock(struct obj_cgroup 
>> *objcg, unsigned int nr_bytes)
>>  static void drain_obj_stock(struct memcg_stock_pcp *stock)
>>  {
>> struct obj_cgroup *old = stock->cached_objcg;
>> +   struct mem_cgroup *memcg;
>>
>> if (!old)
>> return;
>> @@ -3110,7 +3111,9 @@ static void drain_obj_stock(struct memcg_stock_pcp 
>> *stock)
>>
>> if (nr_pages) {
>> rcu_read_lock();
>> -		__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
>> +		memcg = obj_cgroup_memcg(old);
>> +		if (!mem_cgroup_is_root(memcg) || memcg->use_hierarchy)
>> +			__memcg_kmem_uncharge(memcg, nr_pages);
>> rcu_read_unlock();
>> }
>>
>> --
>> 2.28.0
>>


-- 
Thank you,
Richard.


[PATCH v3] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-19 Thread Richard Palethorpe
948007d2b248 R08: 948007c58eb0 R09: 948007da05ac
[9.844248] R10: 0018 R11: 0018 R12: 0001
[9.844477] R13:  R14:  R15: 94803bc2cac0
[9.844696] FS:  () GS:94803bc0() 
knlGS:
[9.844915] CS:  0010 DS:  ES:  CR0: 80050033
[9.845096] CR2: 7f0579ee0384 CR3: 2cc0a000 CR4: 06f0
[9.845319] Call Trace:
[9.845429] __memcg_kmem_uncharge (mm/memcontrol.c:3022)
[9.845582] drain_obj_stock (./include/linux/rcupdate.h:689 
mm/memcontrol.c:3114)
[9.845684] drain_local_stock (mm/memcontrol.c:2255)
[9.845789] process_one_work (./arch/x86/include/asm/jump_label.h:25 
./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 
kernel/workqueue.c:2274)
[9.845898] worker_thread (./include/linux/list.h:282 
kernel/workqueue.c:2416)
[9.846034] ? process_one_work (kernel/workqueue.c:2358)
[9.846162] kthread (kernel/kthread.c:292)
[9.846271] ? __kthread_bind_mask (kernel/kthread.c:245)
[9.846420] ret_from_fork (arch/x86/entry/entry_64.S:300)
[9.846531] ---[ end trace 8b5647c1eba9d18a ]---

Reported-by: l...@lists.linux.it
Signed-off-by: Richard Palethorpe 
Acked-by: Roman Gushchin 
Cc: Johannes Weiner 
Cc: Andrew Morton 
Cc: Shakeel Butt 
Cc: Christoph Lameter 
Cc: Michal Hocko 
Cc: Tejun Heo 
Cc: Vlastimil Babka 
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
---

V3: Handle common case where use_hierarchy=1 and update description.

 mm/memcontrol.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6877c765b8d0..34b8c4a66853 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -291,7 +291,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 
 	spin_lock_irqsave(&css_set_lock, flags);
 	memcg = obj_cgroup_memcg(objcg);
-	if (nr_pages)
+	if (nr_pages && (!mem_cgroup_is_root(memcg) || memcg->use_hierarchy))
 		__memcg_kmem_uncharge(memcg, nr_pages);
 	list_del(&objcg->list);
 	mem_cgroup_put(memcg);
@@ -3100,6 +3100,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = stock->cached_objcg;
+	struct mem_cgroup *memcg;
 
 	if (!old)
 		return;
@@ -3110,7 +3111,9 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 
 	if (nr_pages) {
 		rcu_read_lock();
-		__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
+		memcg = obj_cgroup_memcg(old);
+		if (!mem_cgroup_is_root(memcg) || memcg->use_hierarchy)
+			__memcg_kmem_uncharge(memcg, nr_pages);
 		rcu_read_unlock();
 	}
 
-- 
2.28.0



Re: [RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-19 Thread Richard Palethorpe
Hello,

Michal Koutný  writes:

> On Fri, Oct 16, 2020 at 10:53:08AM -0400, Johannes Weiner 
>  wrote:
>> The central try_charge() function charges recursively all the way up
>> to and including the root.
> Except for use_hiearchy=0 (which is the case here as Richard
> wrote). The reparenting is hence somewhat incompatible with
> new_parent.use_hiearchy=0 :-/
>

Yes, and it also seems

new_parent.use_hierarchy=0 -> new_child.use_hierarchy=0

and

new_parent.use_hierarchy=0 -> new_child.use_hierarchy=1

are considered valid on cgroups v1. The kernel will also allow more
descendants on new_child.use_hierarchy=0, but sets
broken_hierarchy=1. However this will not stop the stack trace occurring
(AFAICT) when the reparenting happens between two descendants.

>> We should clean this up one way or another: either charge the root or
>> don't, but do it consistently.
> I agree this'd be good to unify. One upside of excluding root memcg from
> charging is that users are spared from the charging overhead when memcg
> tree is not created.  (Actually, I thought that was the reason for this
> exception.)
>
> Michal


-- 
Thank you,
Richard.


Re: [RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-16 Thread Richard Palethorpe
Hello,

Richard Palethorpe  writes:

> Hello Michal,
>
> Michal Koutný  writes:
>
>> Hello.
>>
>> On Wed, Oct 14, 2020 at 08:07:49PM +0100, Richard Palethorpe 
>>  wrote:
>>> SLAB objects which outlive their memcg are moved to their parent
>>> memcg where they may be uncharged. However if they are moved to the
>>> root memcg, uncharging will result in negative page counter values as
>>> root has no page counters.
>> Why do you think those are reparented objects? If those are originally
>> charged in a non-root cgroup, then the charge value should be propagated up 
>> the
>> hierarchy, including root memcg, so if they're later uncharged in root
>> after reparenting, it should still break even. (Or did I miss some stock
>> imbalance?)
>
> I traced it and can see they are reparented objects and that the root
> group's counters are zero (or negative if I run madvise06 multiple times)
> before a drain takes place. I'm guessing this is because the root group
> has 'use_hierarchy' set to false so that the child's page_counter parents
> are set to NULL. However I will check, because I'm not sure about
> either.

Yes, it appears that use_hierarchy=0, which is probably because the test
mounts cgroup v1, creates a child group within that and does not set
use_hierarchy on the root. On v2 root always has use_hierarchy enabled.

>
>>
>> (But the patch seems justifiable to me as objects (not)charged directly to
>> root memcg may be incorrectly uncharged.)
>>
>> Thanks,
>> Michal

I don't know if that could happen without reparenting. I suppose if
use_hierarchy=1 then actually this patch will result in root being
overcharged, so perhaps it should also check for use_hierarchy?

-- 
Thank you,
Richard.


Re: [RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-16 Thread Richard Palethorpe
Hello Michal,

Michal Koutný  writes:

> Hello.
>
> On Wed, Oct 14, 2020 at 08:07:49PM +0100, Richard Palethorpe 
>  wrote:
>> SLAB objects which outlive their memcg are moved to their parent
>> memcg where they may be uncharged. However if they are moved to the
>> root memcg, uncharging will result in negative page counter values as
>> root has no page counters.
> Why do you think those are reparented objects? If those are originally
> charged in a non-root cgroup, then the charge value should be propagated up 
> the
> hierarchy, including root memcg, so if they're later uncharged in root
> after reparenting, it should still break even. (Or did I miss some stock
> imbalance?)

I traced it and can see they are reparented objects and that the root
group's counters are zero (or negative if I run madvise06 multiple times)
before a drain takes place. I'm guessing this is because the root group
has 'use_hierarchy' set to false so that the child's page_counter parents
are set to NULL. However I will check, because I'm not sure about
either.

>
> (But the patch seems justifiable to me as objects (not)charged directly to
> root memcg may be incorrectly uncharged.)
>
> Thanks,
> Michal


-- 
Thank you,
Richard.


Re: [RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-15 Thread Richard Palethorpe
Hello Roman,

Roman Gushchin  writes:

> Hi Richard!
>
>> SLAB objects which outlive their memcg are moved to their parent
>> memcg where they may be uncharged. However if they are moved to the
>> root memcg, uncharging will result in negative page counter values as
>> root has no page counters.
>> 
>> To prevent this, we check whether we are about to uncharge the root
>> memcg and skip it if we are. Possibly instead; the obj_cgroups should
>> be removed from their slabs and any per cpu stocks instead of
>> reparenting them to root?
>
> It would be really complex. I think your fix is totally fine.
> We have similar checks in cancel_charge(), uncharge_batch(),
> mem_cgroup_swapout(), mem_cgroup_uncharge_swap() etc.
>
>
> Acked-by: Roman Gushchin 
>
> Thanks!

Great I will respin.

-- 
Thank you,
Richard.


[RFC PATCH] mm: memcg/slab: Stop reparented obj_cgroups from charging root

2020-10-14 Thread Richard Palethorpe
.845096] CR2: 7f0579ee0384 CR3: 2cc0a000 CR4: 06f0
[9.845319] Call Trace:
[9.845429] __memcg_kmem_uncharge (mm/memcontrol.c:3022)
[9.845582] drain_obj_stock (./include/linux/rcupdate.h:689 
mm/memcontrol.c:3114)
[9.845684] drain_local_stock (mm/memcontrol.c:2255)
[9.845789] process_one_work (./arch/x86/include/asm/jump_label.h:25 
./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 
kernel/workqueue.c:2274)
[9.845898] worker_thread (./include/linux/list.h:282 
kernel/workqueue.c:2416)
[9.846034] ? process_one_work (kernel/workqueue.c:2358)
[9.846162] kthread (kernel/kthread.c:292)
[9.846271] ? __kthread_bind_mask (kernel/kthread.c:245)
[9.846420] ret_from_fork (arch/x86/entry/entry_64.S:300)
[9.846531] ---[ end trace 8b5647c1eba9d18a ]---

Reported-By: l...@lists.linux.it
Signed-off-by: Richard Palethorpe 
Cc: Johannes Weiner 
Cc: Roman Gushchin 
Cc: Andrew Morton 
Cc: Shakeel Butt 
Cc: Christoph Lameter 
Cc: Michal Hocko 
Cc: Tejun Heo 
Cc: Vlastimil Babka 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
---
 mm/memcontrol.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6877c765b8d0..214e1fe4e9a2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -291,7 +291,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 
	spin_lock_irqsave(&css_set_lock, flags);
memcg = obj_cgroup_memcg(objcg);
-   if (nr_pages)
+   if (nr_pages && !mem_cgroup_is_root(memcg))
__memcg_kmem_uncharge(memcg, nr_pages);
	list_del(&objcg->list);
mem_cgroup_put(memcg);
@@ -3100,6 +3100,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
struct obj_cgroup *old = stock->cached_objcg;
+   struct mem_cgroup *memcg;
 
if (!old)
return;
@@ -3110,7 +3111,9 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 
if (nr_pages) {
rcu_read_lock();
-   __memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
+   memcg = obj_cgroup_memcg(old);
+   if (!mem_cgroup_is_root(memcg))
+   __memcg_kmem_uncharge(memcg, nr_pages);
rcu_read_unlock();
}
 
-- 
2.28.0



[PATCH v2] nvdimm: Avoid race between probe and reading device attributes

2020-06-15 Thread Richard Palethorpe
+0x2b0/0x2b0
[   12.832333]  ? nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.832975]  ? mutex_trylock+0x2b0/0x2b0
[   12.833500]  ? nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.834122]  ? prepare_ftrace_return+0xa1/0xf0
[   12.834724]  ? ftrace_graph_caller+0x6b/0xa0
[   12.835269]  ? acpi_label_write+0x390/0x390 [nfit]
[   12.835909]  ? nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.836558]  ? nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.837179]  nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.837802]  nvdimm_bus_probe+0x110/0x6b0 [libnvdimm]
[   12.838470]  really_probe+0x212/0x9a0
[   12.838954]  driver_probe_device+0x1cd/0x300
[   12.839511]  ? driver_probe_device+0x5/0x300
[   12.840063]  device_driver_attach+0xe7/0x120
[   12.840623]  bind_store+0x18d/0x230
[   12.841075]  kernfs_fop_write+0x200/0x420
[   12.841606]  vfs_write+0x154/0x450
[   12.842047]  ksys_write+0xf9/0x1d0
[   12.842497]  ? __ia32_sys_read+0xb0/0xb0
[   12.843010]  do_syscall_64+0x95/0x4a0
[   12.843495]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[   12.844140] RIP: 0033:0x7f5b235d3563
[   12.844607] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 
00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
[   12.846877] RSP: 002b:7fff1c3bc578 EFLAGS: 0246 ORIG_RAX: 
0001
[   12.847822] RAX: ffda RBX: 0006 RCX: 7f5b235d3563
[   12.848717] RDX: 0006 RSI: 55f9576710d0 RDI: 0001
[   12.849594] RBP: 55f9576710d0 R08: 000a R09: 
[   12.850470] R10:  R11: 0246 R12: 0006
[   12.851333] R13: 7f5b236a3500 R14: 0006 R15: 7f5b236a3700
[   12.852247]
[   12.852466] Allocated by task 225:
[   12.852893]  save_stack+0x1b/0x40
[   12.853310]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   12.853918]  kmem_cache_alloc_node+0xef/0x270
[   12.854475]  copy_process+0x485/0x6130
[   12.854945]  _do_fork+0xf1/0xb40
[   12.855353]  __do_sys_clone+0xc3/0x100
[   12.855843]  do_syscall_64+0x95/0x4a0
[   12.856302]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[   12.856939]
[   12.857140] Freed by task 0:
[   12.857522]  save_stack+0x1b/0x40
[   12.857940]  __kasan_slab_free+0x12c/0x170
[   12.858464]  kmem_cache_free+0xb0/0x330
[   12.858945]  rcu_core+0x55f/0x19f0
[   12.859385]  __do_softirq+0x228/0x944
[   12.859869]
[   12.860075] The buggy address belongs to the object at 888065c26200
[   12.860075]  which belongs to the cache task_struct of size 6016
[   12.861638] The buggy address is located 56 bytes inside of
[   12.861638]  6016-byte region [888065c26200, 888065c27980)
[   12.863084] The buggy address belongs to the page:
[   12.863702] page:ea0001970800 refcount:1 mapcount:0 
mapping:21ee3712 index:0x0 head:ea0001970800 order:3 
compound_mapcount:0 compound_pincount:0
[   12.865478] flags: 0x810200(slab|head)
[   12.866039] raw: 00810200  00010001 
888066c0f980
[   12.867010] raw:  80050005 0001 

[   12.867986] page dumped because: kasan: bad access detected
[   12.868696]
[   12.868900] Memory state around the buggy address:
[   12.869514]  888065c26100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[   12.870414]  888065c26180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[   12.871318] >888065c26200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[   12.872238] ^
[   12.872870]  888065c26280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[   12.873754]  888065c26300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[   12.874640]
==

This can be prevented by setting the driver data after initialisation is
complete.

Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm 
device-driver infrastructure")
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Ira Weiny 
Cc: linux-nvd...@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: Coly Li 
Signed-off-by: Richard Palethorpe 
---

V2:
+ Reviewed by Coly and removed unnecessary lock

 drivers/nvdimm/dimm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index 7d4ddc4d9322..3d3988e1d9a0 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -43,7 +43,6 @@ static int nvdimm_probe(struct device *dev)
if (!ndd)
return -ENOMEM;
 
-   dev_set_drvdata(dev, ndd);
ndd->dpa.name = dev_name(dev);
ndd->ns_current = -1;
ndd->ns_next = -1;
@@ -106,6 +105,8 @@ static int nvdimm_probe(struct device *dev)
if (rc)
goto err;
 
+   dev_set_drvdata(dev, ndd);
+
return 0;
 
  err:
-- 
2.26.2



[PATCH] nvdimm: Avoid race between probe and reading device attributes

2020-05-04 Thread Richard Palethorpe
+0x2b0/0x2b0
[   12.832333]  ? nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.832975]  ? mutex_trylock+0x2b0/0x2b0
[   12.833500]  ? nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.834122]  ? prepare_ftrace_return+0xa1/0xf0
[   12.834724]  ? ftrace_graph_caller+0x6b/0xa0
[   12.835269]  ? acpi_label_write+0x390/0x390 [nfit]
[   12.835909]  ? nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.836558]  ? nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.837179]  nvdimm_probe+0x259/0x420 [libnvdimm]
[   12.837802]  nvdimm_bus_probe+0x110/0x6b0 [libnvdimm]
[   12.838470]  really_probe+0x212/0x9a0
[   12.838954]  driver_probe_device+0x1cd/0x300
[   12.839511]  ? driver_probe_device+0x5/0x300
[   12.840063]  device_driver_attach+0xe7/0x120
[   12.840623]  bind_store+0x18d/0x230
[   12.841075]  kernfs_fop_write+0x200/0x420
[   12.841606]  vfs_write+0x154/0x450
[   12.842047]  ksys_write+0xf9/0x1d0
[   12.842497]  ? __ia32_sys_read+0xb0/0xb0
[   12.843010]  do_syscall_64+0x95/0x4a0
[   12.843495]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[   12.844140] RIP: 0033:0x7f5b235d3563
[   12.844607] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 
00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
[   12.846877] RSP: 002b:7fff1c3bc578 EFLAGS: 0246 ORIG_RAX: 
0001
[   12.847822] RAX: ffda RBX: 0006 RCX: 7f5b235d3563
[   12.848717] RDX: 0006 RSI: 55f9576710d0 RDI: 0001
[   12.849594] RBP: 55f9576710d0 R08: 000a R09: 
[   12.850470] R10:  R11: 0246 R12: 0006
[   12.851333] R13: 7f5b236a3500 R14: 0006 R15: 7f5b236a3700
[   12.852247]
[   12.852466] Allocated by task 225:
[   12.852893]  save_stack+0x1b/0x40
[   12.853310]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   12.853918]  kmem_cache_alloc_node+0xef/0x270
[   12.854475]  copy_process+0x485/0x6130
[   12.854945]  _do_fork+0xf1/0xb40
[   12.855353]  __do_sys_clone+0xc3/0x100
[   12.855843]  do_syscall_64+0x95/0x4a0
[   12.856302]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[   12.856939]
[   12.857140] Freed by task 0:
[   12.857522]  save_stack+0x1b/0x40
[   12.857940]  __kasan_slab_free+0x12c/0x170
[   12.858464]  kmem_cache_free+0xb0/0x330
[   12.858945]  rcu_core+0x55f/0x19f0
[   12.859385]  __do_softirq+0x228/0x944
[   12.859869]
[   12.860075] The buggy address belongs to the object at 888065c26200
[   12.860075]  which belongs to the cache task_struct of size 6016
[   12.861638] The buggy address is located 56 bytes inside of
[   12.861638]  6016-byte region [888065c26200, 888065c27980)
[   12.863084] The buggy address belongs to the page:
[   12.863702] page:ea0001970800 refcount:1 mapcount:0 
mapping:21ee3712 index:0x0 head:ea0001970800 order:3 
compound_mapcount:0 compound_pincount:0
[   12.865478] flags: 0x810200(slab|head)
[   12.866039] raw: 00810200  00010001 
888066c0f980
[   12.867010] raw:  80050005 0001 

[   12.867986] page dumped because: kasan: bad access detected
[   12.868696]
[   12.868900] Memory state around the buggy address:
[   12.869514]  888065c26100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[   12.870414]  888065c26180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[   12.871318] >888065c26200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[   12.872238] ^
[   12.872870]  888065c26280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[   12.873754]  888065c26300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[   12.874640]
==

This can be prevented by setting the driver data after initialisation is
complete. As a precaution the bus lock is also taken when setting the driver
data.

Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm 
device-driver infrastructure")
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Ira Weiny 
Cc: linux-nvd...@lists.01.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Richard Palethorpe 
---
 drivers/nvdimm/dimm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index 7d4ddc4d9322..c757e8a5094d 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -43,7 +43,6 @@ static int nvdimm_probe(struct device *dev)
if (!ndd)
return -ENOMEM;
 
-   dev_set_drvdata(dev, ndd);
ndd->dpa.name = dev_name(dev);
ndd->ns_current = -1;
ndd->ns_next = -1;
@@ -106,6 +105,10 @@ static int nvdimm_probe(struct device *dev)
if (rc)
goto err;
 
+   nvdimm_bus_lock(dev);
+   dev_set_drvdata(dev, ndd);
+   nvdimm_bus_unlock(dev);
+
return 0;
 
  err:
-- 
2.26.1



Re: [LTP] 12abeb544d: ltp.read_all_dev.fail

2019-09-23 Thread Richard Palethorpe
Hello,

kernel test robot  writes:

> FYI, we noticed the following commit (built with gcc-7):
>
> commit: 12abeb544d548f55f56323fc6e5e6c0fb74f58e1 ("horrible test hack")
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/luto/linux.git 
> random/kill-it
>
> in testcase: ltp
> with following parameters:
>
>   disk: 1HDD
>   fs: ext4
>   test: fs-02
>
> test-description: The LTP testsuite contains a collection of tools for 
> testing the Linux kernel and related features.
> test-url: http://linux-test-project.github.io/
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
>
> <<>>
> tag=read_all_dev stime=1569106866
> cmdline="read_all -d /dev -p -q -r 10"
> contacts=""
> analysis=exit
> <<>>
> tst_test.c:1108: INFO: Timeout per run is 0h 05m 00s
> Test timeouted, sending SIGKILL!
> tst_test.c:1148: INFO: If you are running on slow machine, try exporting 
> LTP_TIMEOUT_MUL > 1
> tst_test.c:1149: BROK: Test killed! (timeout?)

So perhaps this is caused by reads of /dev/random hanging? At any rate,
I suppose this is intended to deliberately break something, so we can
ignore it.

--
Thank you,
Richard.


Re: [LTP] 3239b6f29b ("KEYS: return full count in keyring_read() if buffer is too small"): ltp.keyctl06.fail

2017-12-05 Thread Richard Palethorpe
Hello,

Eric Biggers writes:

> On Fri, Dec 01, 2017 at 10:40:12AM +0800, Fengguang Wu wrote:
>> Hi Eric,
>> 
>> We noticed LTP keyctl06 test regression in
>> 
>> commit: 3239b6f29bdfb4b0a2ba59df995fc9e6f4df7f1f ("KEYS: return full count 
>> in keyring_read() if buffer is too small")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>> 
>> on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 
>> 64G memory
>> with following parameters:
>> 
>>  disk: 1HDD
>>  fs: ext4
>>  test: syscalls_part2
>> 
>> test-description: The LTP testsuite contains a collection of tools for 
>> testing the Linux kernel and related features.
>> test-url: http://linux-test-project.github.io/
>> 
>> To reproduce:
>> 
>>  git clone https://github.com/intel/lkp-tests.git
>>  cd lkp-tests
>>  bin/lkp install job.yaml  # job file is attached in this email
>>  bin/lkp run job.yaml
>> 
>> testcase/path_params/tbox_group/run: ltp/1HDD-ext4-syscalls_part2/ivb44
>> 
>
> Did this include LTP commit f21703fe45d3 ("syscalls/keyctl06: update to test 
> for
> follow-on fix")?  It would be really helpful if these reports would include 
> the
> LTP commit that was used.
>
> Eric

We are getting the following result, even with the follow up patch:

keyctl06.c:68: FAIL: KEYCTL_READ returned 4 but expected 8

This is with the SUSE SLE15 kernel 4.12.14. I haven't had a chance to
investigate it yet.

-- 
Thank you,
Richard.

