Re: [ANNOUNCE] 5.10.199-rt97

2023-11-11 Thread Mike Galbraith
On Sat, 2023-11-11 at 10:54 +0100, Pavel Machek wrote:
> Hi!
>
> > I'm pleased to announce the 5.10.199-rt97 stable release.
> >
> > This release is an update to the new stable 5.10.199 version; no
> > RT-specific changes were made, with the exception of a fix for a build
> > failure due to upstream changes affecting only the PREEMPT_RT code.
>
> Thanks for the release. Do you plan a 5.10.200-based one by chance? .199
> is buggy on hardware we care about, so it would make our job a bit
> easier.

FWIW, the extracted (git format-patch) patch set slid into 5.10.200
here without so much as an offset.

-Mike


[tip: x86/urgent] x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access

2021-04-20 Thread tip-bot2 for Mike Galbraith
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 5849cdf8c120e3979c57d34be55b92d90a77a47e
Gitweb: https://git.kernel.org/tip/5849cdf8c120e3979c57d34be55b92d90a77a47e
Author: Mike Galbraith
AuthorDate: Fri, 16 Apr 2021 14:02:07 +02:00
Committer: Borislav Petkov
CommitterDate: Tue, 20 Apr 2021 17:32:46 +02:00

x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access

The commit in the Fixes: tag added support for kexec-ing a kernel on
panic using a new system call. As part of that, it prepares a memory
map for the new kernel.

However, while doing so, it wrongly accesses memory it has not
allocated: it accesses the first element of the cmem->ranges[] array in
memmap_exclude_ranges() but it has not allocated the memory for it in
crash_setup_memmap_entries(). As KASAN reports:

  BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
  Write of size 8 at addr c9426008 by task kexec/1187

  (gdb) list *crash_setup_memmap_entries+0x17e
  0x8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
  317  unsigned long long mend)
  318 {
  319 unsigned long start, end;
  320
  321 cmem->ranges[0].start = mstart;
  322 cmem->ranges[0].end = mend;
  323 cmem->nr_ranges = 1;
  324
  325 /* Exclude elf header region */
  326 start = image->arch.elf_load_addr;
  (gdb)

Make sure the ranges array has a single element allocated.

 [ bp: Write a proper commit message. ]

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Mike Galbraith 
Signed-off-by: Borislav Petkov 
Reviewed-by: Dave Young 
Cc: 
Link: https://lkml.kernel.org/r/725fa3dc1da2737f0f6188a1a9701bead257ea9d.ca...@gmx.de
---
 arch/x86/kernel/crash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index a8f3af2..b1deacb 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -337,7 +337,7 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
struct crash_memmap_data cmd;
struct crash_mem *cmem;
 
-   cmem = vzalloc(sizeof(struct crash_mem));
+   cmem = vzalloc(struct_size(cmem, ranges, 1));
if (!cmem)
return -ENOMEM;
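
For readers who haven't met struct_size(): a minimal sketch of why the
original allocation was short, assuming the upstream layout of struct
crash_mem (a small header followed by a flexible ranges[] tail array,
which sizeof() does not count); the helper name below is invented for
illustration:

#include <linux/overflow.h>
#include <linux/vmalloc.h>

struct crash_mem_range {
	unsigned long long start, end;
};

struct crash_mem {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct crash_mem_range ranges[];	/* flexible array member */
};

static struct crash_mem *cmem_alloc_demo(void)
{
	struct crash_mem *cmem;

	/*
	 * vzalloc(sizeof(struct crash_mem)) covers only the header, so
	 * the first write to cmem->ranges[0] lands past the allocation,
	 * which is exactly what KASAN flagged.  struct_size(cmem, ranges, 1)
	 * evaluates to sizeof(*cmem) + 1 * sizeof(cmem->ranges[0]), with
	 * overflow checking, so one trailing element is included.
	 */
	cmem = vzalloc(struct_size(cmem, ranges, 1));
	return cmem;
}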
 


Re: [patch] x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-16 Thread Mike Galbraith
On Fri, 2021-04-16 at 23:44 +0200, Thomas Gleixner wrote:
>
> Can all of you involved stop this sandpit fight and do something useful
> to fix that obvious bug already?

?? We're not fighting afaik.  Boris hated my changelog enough to offer
to write a better one, and I'm fine with that.  It's a seven-year-old
*latent* buglet of microscopic proportions, hardly a pressing issue.

-Mike



Re: [patch] x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-16 Thread Mike Galbraith
On Fri, 2021-04-16 at 16:44 +0200, Borislav Petkov wrote:
> On Fri, Apr 16, 2021 at 03:16:07PM +0200, Mike Galbraith wrote:
> > On Fri, 2021-04-16 at 14:16 +0200, Borislav Petkov wrote:
> > >
> > > Please be more verbose and structure your commit message like this:
> >
> > Hrmph, I thought it was too verbose for a dinky one-liner if anything.
>
> Please look at how other commit messages in tip have free text - not
> only tools output.
>
> Also, this looks like a fix for some previous commit. Please dig out
> which commit introduced the issue and put its commit ID in a Fixes: tag
> above your S-o-B.
>
> If you don't have time or desire to do that, you can say so and I'll do
> it myself when I get a chance.

Ok, bin it for the nonce.

-Mike



BUG: KASAN: slab-out-of-bounds in acpi_cppc_processor_probe+0x15c/0xa50

2021-04-16 Thread Mike Galbraith
[6.343387] BUG: KASAN: slab-out-of-bounds in 
acpi_cppc_processor_probe+0x15c/0xa50
[6.343474] Read of size 4 at addr 888120cf1630 by task swapper/0/1

[6.343565] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G  I   
5.12.0.g8b1fdf9-tip #2
[6.343654] Hardware name: HP HP Spectre x360 Convertible/804F, BIOS F.47 
11/22/2017
[6.343735] Call Trace:
[6.343766]  ? acpi_cppc_processor_probe+0x15c/0xa50
[6.343824]  dump_stack+0x8a/0xb5
[6.343865]  print_address_description.constprop.0+0x16/0xa0
[6.343931]  kasan_report+0xcb/0x110
[6.343974]  ? acpi_cppc_processor_probe+0x15c/0xa50
[6.344032]  acpi_cppc_processor_probe+0x15c/0xa50
[6.344086]  ? mutex_unlock+0x1d/0x40
[6.344130]  ? kernfs_add_one+0x1b1/0x210
[6.344177]  ? __might_sleep+0x31/0xd0
[6.344223]  ? acpi_get_psd_map+0x2d0/0x2d0
[6.344271]  ? mutex_lock+0x91/0xd0
[6.344313]  __acpi_processor_start+0x4e/0x150
[6.344364]  acpi_processor_start+0x3d/0x60
[6.344412]  really_probe+0x182/0x6c0
[6.344458]  driver_probe_device+0x13f/0x1d0
[6.346259]  device_driver_attach+0x110/0x120
[6.347081]  ? device_driver_attach+0x120/0x120
[6.347081]  __driver_attach+0xae/0x190
[6.347081]  ? device_driver_attach+0x120/0x120
[6.347081]  bus_for_each_dev+0xd8/0x120
[6.347081]  ? subsys_dev_iter_exit+0x10/0x10
[6.347081]  bus_add_driver+0x1f8/0x2e0
[6.347081]  driver_register+0x10f/0x190
[6.347081]  acpi_processor_driver_init+0x2f/0xc3
[6.347081]  ? acpi_pci_slot_init+0x11/0x11
[6.347081]  do_one_initcall+0x71/0x260
[6.347081]  ? trace_event_raw_event_initcall_finish+0x120/0x120
[6.347081]  ? parameq+0x90/0x90
[6.347081]  ? kasan_unpoison+0x21/0x50
[6.347081]  ? __kasan_slab_alloc+0x24/0x70
[6.347081]  do_initcalls+0xff/0x129
[6.347081]  kernel_init_freeable+0x19c/0x1ce
[6.347081]  ? rest_init+0xc6/0xc6
[6.347081]  kernel_init+0xd/0x11a
[6.347081]  ret_from_fork+0x1f/0x30

[6.347081] Allocated by task 1:
[6.347081]  kasan_save_stack+0x1b/0x40
[6.347081]  __kasan_kmalloc+0x7a/0x90
[6.347081]  acpi_ut_initialize_buffer+0x41/0x8b
[6.347081]  acpi_evaluate_object+0x306/0x395
[6.347081]  acpi_evaluate_object_typed+0xd4/0x201
[6.347081]  acpi_cppc_processor_probe+0xa0/0xa50
[6.347081]  __acpi_processor_start+0x4e/0x150
[6.347081]  acpi_processor_start+0x3d/0x60
[6.347081]  really_probe+0x182/0x6c0
[6.347081]  driver_probe_device+0x13f/0x1d0
[6.347081]  device_driver_attach+0x110/0x120
[6.347081]  __driver_attach+0xae/0x190
[6.347081]  bus_for_each_dev+0xd8/0x120
[6.347081]  bus_add_driver+0x1f8/0x2e0
[6.347081]  driver_register+0x10f/0x190
[6.347081]  acpi_processor_driver_init+0x2f/0xc3
[6.347081]  do_one_initcall+0x71/0x260
[6.347081]  do_initcalls+0xff/0x129
[6.347081]  kernel_init_freeable+0x19c/0x1ce
[6.347081]  kernel_init+0xd/0x11a
[6.347081]  ret_from_fork+0x1f/0x30

[6.347081] The buggy address belongs to the object at 888120cf1600
which belongs to the cache kmalloc-64 of size 64
[6.347081] The buggy address is located 48 bytes inside of
64-byte region [888120cf1600, 888120cf1640)
[6.347081] The buggy address belongs to the page:
[6.347081] page:1f073982 refcount:1 mapcount:0 
mapping: index:0x0 pfn:0x120cf1
[6.347081] flags: 0x8200(slab)
[6.347081] raw: 8200 dead0100 dead0122 
888100042640
[6.347081] raw:  80200020 0001 

[6.347081] page dumped because: kasan: bad access detected

[6.347081] Memory state around the buggy address:
[6.347081]  888120cf1500: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc 
fc
[6.347081]  888120cf1580: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc 
fc
[6.347081] >888120cf1600: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc 
fc
[6.347081]  ^
[6.347081]  888120cf1680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[6.347081]  888120cf1700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc



Re: [patch] x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-16 Thread Mike Galbraith
On Fri, 2021-04-16 at 14:16 +0200, Borislav Petkov wrote:
>
> Please be more verbose and structure your commit message like this:

Hrmph, I thought it was too verbose for a dinky one-liner if anything.  I
showed the complaint along with an 8x10 color glossy crime scene photo,
then explained why it happened and what to do about it with a perhaps
terse but perfectly clear sentence.

-Mike



[patch] x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-16 Thread Mike Galbraith
[   15.428011] BUG: KASAN: vmalloc-out-of-bounds in 
crash_setup_memmap_entries+0x17e/0x3a0
[   15.428018] Write of size 8 at addr c9426008 by task kexec/1187

(gdb) list *crash_setup_memmap_entries+0x17e
0x8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
317  unsigned long long mend)
318 {
319 unsigned long start, end;
320
321 cmem->ranges[0].start = mstart;
322 cmem->ranges[0].end = mend;
323 cmem->nr_ranges = 1;
324
325 /* Exclude elf header region */
326 start = image->arch.elf_load_addr;
(gdb)

Append missing struct crash_mem_range to cmem.

Signed-off-by: Mike Galbraith 
---
 arch/x86/kernel/crash.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -337,7 +337,7 @@ int crash_setup_memmap_entries(struct ki
struct crash_memmap_data cmd;
struct crash_mem *cmem;

-   cmem = vzalloc(sizeof(struct crash_mem));
+   cmem = vzalloc(struct_size(cmem, ranges, 1));
if (!cmem)
return -ENOMEM;




Re: x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-16 Thread Mike Galbraith
On Fri, 2021-04-16 at 19:07 +0800, Dave Young wrote:
>
> > We're excluding two ranges; allocate the scratch space we need to do that.
>
> I think 1 range should be fine; have you tested 1?

Have now, and vzalloc(struct_size(cmem, ranges, 1)) worked just fine.

-Mike



Re: Question on KASAN calltrace record in RT

2021-04-15 Thread Mike Galbraith
On Wed, 2021-04-14 at 07:26 +0200, Dmitry Vyukov wrote:
>
> > [   15.428008] 
> > ==
> > [   15.428011] BUG: KASAN: vmalloc-out-of-bounds in 
> > crash_setup_memmap_entries+0x17e/0x3a0
>
> This looks like a genuine kernel bug on first glance. I think it needs
> to be fixed rather than ignored.

I posted a fix, so KASAN earns another notch in its pistol grips.

-Mike



x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-15 Thread Mike Galbraith
x86/crash: fix crash_setup_memmap_entries() KASAN vmalloc-out-of-bounds gripe

[   15.428011] BUG: KASAN: vmalloc-out-of-bounds in 
crash_setup_memmap_entries+0x17e/0x3a0
[   15.428018] Write of size 8 at addr c9426008 by task kexec/1187

(gdb) list *crash_setup_memmap_entries+0x17e
0x8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
317  unsigned long long mend)
318 {
319 unsigned long start, end;
320
321 cmem->ranges[0].start = mstart;
322 cmem->ranges[0].end = mend;
323 cmem->nr_ranges = 1;
324
325 /* Exclude elf header region */
326 start = image->arch.elf_load_addr;
(gdb)

We're excluding two ranges; allocate the scratch space we need to do that.

Signed-off-by: Mike Galbraith 
---
 arch/x86/kernel/crash.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -337,7 +337,7 @@ int crash_setup_memmap_entries(struct ki
struct crash_memmap_data cmd;
struct crash_mem *cmem;

-   cmem = vzalloc(sizeof(struct crash_mem));
+   cmem = vzalloc(sizeof(struct crash_mem)+(2*sizeof(struct crash_mem_range)));
if (!cmem)
return -ENOMEM;




[patch] kasan: make it RT aware

2021-04-14 Thread Mike Galbraith
On Wed, 2021-04-14 at 08:15 +0200, Mike Galbraith wrote:
> On Wed, 2021-04-14 at 07:26 +0200, Dmitry Vyukov wrote:
> > On Wed, Apr 14, 2021 at 6:00 AM Mike Galbraith  wrote:
> >
> > > [0.692437] BUG: sleeping function called from invalid context at 
> > > kernel/locking/rtmutex.c:943
> > > [0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, 
> > > name: swapper/0
> > > [0.692442] Preemption disabled at:
> > > [0.692443] [] on_each_cpu_cond_mask+0x30/0xb0
> > > [0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 
> > > 5.12.0.g2afefec-tip-rt #5
> > > [0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
> > > 09/23/2013
> > > [0.692456] Call Trace:
> > > [0.692458]  ? on_each_cpu_cond_mask+0x30/0xb0
> > > [0.692462]  dump_stack+0x8a/0xb5
> > > [0.692467]  ___might_sleep.cold+0xfe/0x112
> > > [0.692471]  rt_spin_lock+0x1c/0x60
> >
> > HI Mike,
> >
> > If freeing pages from smp_call_function is not OK, then perhaps we
> > need just to collect the objects to be freed to the task/CPU that
> > executes kasan_quarantine_remove_cache and it will free them (we know
> > it can free objects).
>
> Yeah, RT will have to shove freeing into preemptible context.

There's a very similar problem addressed in the RT patch set, so I used
the free samples on top of your *very* convenient hint that pesky
preallocation is optional, to seemingly make KASAN a happy RT camper.
Dunno if RT maintainers would prefer something like this over simply
disabling KASAN for RT configs, but what the heck, I'll show it.

kasan: make it RT aware

Skip preallocation when not possible for RT, and move cache removal
from IPI to synchronous work.

Signed-off-by: Mike Galbraith 
---
 lib/stackdepot.c  |   10 +-
 mm/kasan/quarantine.c |   49 +
 2 files changed, 54 insertions(+), 5 deletions(-)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -71,7 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX
 static int depot_index;
 static int next_slab_inited;
 static size_t depot_offset;
-static DEFINE_SPINLOCK(depot_lock);
+static DEFINE_RAW_SPINLOCK(depot_lock);

 static bool init_stack_slab(void **prealloc)
 {
@@ -265,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
-   u32 hash;
+   u32 hash, may_prealloc = !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible();

if (unlikely(nr_entries == 0) || stack_depot_disable)
goto fast_exit;
@@ -291,7 +291,7 @@ depot_stack_handle_t stack_depot_save(un
 * The smp_load_acquire() here pairs with smp_store_release() to
 * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
 */
-   if (unlikely(!smp_load_acquire(&next_slab_inited))) {
+   if (unlikely(!smp_load_acquire(&next_slab_inited) && may_prealloc)) {
/*
 * Zero out zone modifiers, as we don't have specific zone
 * requirements. Keep the flags related to allocation in atomic
@@ -305,7 +305,7 @@ depot_stack_handle_t stack_depot_save(un
prealloc = page_address(page);
}

-   spin_lock_irqsave(&depot_lock, flags);
+   raw_spin_lock_irqsave(&depot_lock, flags);

found = find_stack(*bucket, entries, nr_entries, hash);
if (!found) {
@@ -329,7 +329,7 @@ depot_stack_handle_t stack_depot_save(un
WARN_ON(!init_stack_slab(&prealloc));
}

-   spin_unlock_irqrestore(&depot_lock, flags);
+   raw_spin_unlock_irqrestore(&depot_lock, flags);
 exit:
if (prealloc) {
/* Nobody used this memory, ok to free it. */
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -19,6 +19,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 

 #include "../slab.h"
@@ -308,6 +311,48 @@ static void per_cpu_remove_cache(void *a
qlist_free_all(&to_free, cache);
 }

+#ifdef CONFIG_PREEMPT_RT
+struct remove_cache_work {
+   struct work_struct work;
+   struct kmem_cache *cache;
+};
+
+static DEFINE_MUTEX(remove_caches_lock);
+static DEFINE_PER_CPU(struct remove_cache_work, remove_cache_work);
+
+static void per_cpu_remove_cache_work(struct work_struct *w)
+{
+   struct remove_cache_work *rcw;
+
+   rcw = container_of(w, struct remove_cache_work, work);
+   per_cpu_remove_cache(rcw->cache);
+}
+
+static void per_cpu_remove_caches_sync(struct kmem_cache *cache)
+{
+   struct remove_cache_work *rcw;
+   unsigned int cpu;
+
+   cpus_read_lock();
+   mutex_lock(&remove_caches_lock);
+
+   for_each_online_cpu(cpu) {
+   rcw = &per_cpu(remove_cache_work, cpu);
+   INIT_WORK(&rcw->work, per_cpu_remove_cache_work);
+   rcw->cache = cache;
+   schedule_work_on(cpu, &rcw->work);
+   }
+
+   for_each_online_cpu(cpu)
+   flush_work(&per_cpu(remove_cache_work, cpu).work);
+
+   mutex_unlock(&remove_caches_lock);
+   cpus_read_unlock();
+}
+#endif

Re: Re: Question on KASAN calltrace record in RT

2021-04-14 Thread Mike Galbraith
On Wed, 2021-04-14 at 07:29 +0000, Zhang, Qiang wrote:
>
> if CONFIG_PREEMPT_RT is enabled but we are not in preemptible context,
> the prealloc should be allowed

No, you can't take an rtmutex when not preemptible.
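
To spell out why (a hedged sketch; the helper below is invented for
illustration, but the rule is the one the may_prealloc gate in the
stackdepot patch encodes): on PREEMPT_RT, spinlock_t and friends become
rt_mutex-based sleeping locks, so acquiring one with preemption or
interrupts disabled triggers the "BUG: sleeping function called from
invalid context" splat quoted earlier in the thread.

#include <linux/kconfig.h>
#include <linux/preempt.h>

static inline bool may_take_sleeping_lock(void)
{
	/* On !RT kernels spinlocks never sleep, so this is always safe;
	 * on RT we must be in preemptible task context. */
	return !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible();
}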

-Mike



Re: Question on KASAN calltrace record in RT

2021-04-14 Thread Mike Galbraith
On Wed, 2021-04-14 at 07:26 +0200, Dmitry Vyukov wrote:
> On Wed, Apr 14, 2021 at 6:00 AM Mike Galbraith  wrote:
>
> > [0.692437] BUG: sleeping function called from invalid context at 
> > kernel/locking/rtmutex.c:943
> > [0.692439] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, 
> > name: swapper/0
> > [0.692442] Preemption disabled at:
> > [0.692443] [] on_each_cpu_cond_mask+0x30/0xb0
> > [0.692451] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 
> > 5.12.0.g2afefec-tip-rt #5
> > [0.692454] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
> > 09/23/2013
> > [0.692456] Call Trace:
> > [0.692458]  ? on_each_cpu_cond_mask+0x30/0xb0
> > [0.692462]  dump_stack+0x8a/0xb5
> > [0.692467]  ___might_sleep.cold+0xfe/0x112
> > [0.692471]  rt_spin_lock+0x1c/0x60
>
> HI Mike,
>
> If freeing pages from smp_call_function is not OK, then perhaps we
> need just to collect the objects to be freed to the task/CPU that
> executes kasan_quarantine_remove_cache and it will free them (we know
> it can free objects).

Yeah, RT will have to shove freeing into preemptible context.

> >
> > [   15.428008] 
> > ==
> > [   15.428011] BUG: KASAN: vmalloc-out-of-bounds in 
> > crash_setup_memmap_entries+0x17e/0x3a0
>
> This looks like a genuine kernel bug on first glance. I think it needs
> to be fixed rather than ignored.

I figured KASAN probably knew what it was talking about, I just wanted
it to either go find something shiny or leave lockdep the heck alone.

-Mike



Re: Question on KASAN calltrace record in RT

2021-04-13 Thread Mike Galbraith
_bus_init+0x183/0x183
[0.692539]  ? intel_idle_init+0x36d/0x36d
[0.692543]  ? acpi_bus_init+0x183/0x183
[0.692546]  do_one_initcall+0x71/0x300
[0.692550]  ? trace_event_raw_event_initcall_finish+0x120/0x120
[0.692553]  ? parameq+0x90/0x90
[0.692556]  ? __wake_up_common+0x1e0/0x200
[0.692560]  ? kasan_unpoison+0x21/0x50
[0.692562]  ? __kasan_slab_alloc+0x24/0x70
[0.692567]  do_initcalls+0xff/0x129
[0.692571]  kernel_init_freeable+0x19c/0x1ce
[0.692574]  ? rest_init+0xc6/0xc6
[0.692577]  kernel_init+0xd/0x11a
[0.692580]  ret_from_fork+0x1f/0x30

[   15.428008] 
==
[   15.428011] BUG: KASAN: vmalloc-out-of-bounds in 
crash_setup_memmap_entries+0x17e/0x3a0
[   15.428018] Write of size 8 at addr c9426008 by task kexec/1187
[   15.428022] CPU: 2 PID: 1187 Comm: kexec Tainted: GW   E 
5.12.0.g2afefec-tip-rt #5
[   15.428025] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[   15.428027] Call Trace:
[   15.428029]  ? crash_setup_memmap_entries+0x17e/0x3a0
[   15.428032]  dump_stack+0x8a/0xb5
[   15.428037]  print_address_description.constprop.0+0x16/0xa0
[   15.428044]  kasan_report+0xc4/0x100
[   15.428047]  ? crash_setup_memmap_entries+0x17e/0x3a0
[   15.428050]  crash_setup_memmap_entries+0x17e/0x3a0
[   15.428053]  ? strcmp+0x2e/0x50
[   15.428057]  ? native_machine_crash_shutdown+0x240/0x240
[   15.428059]  ? kexec_purgatory_find_symbol.isra.0+0x145/0x1a0
[   15.428066]  setup_boot_parameters+0x181/0x5c0
[   15.428069]  bzImage64_load+0x6b5/0x740
[   15.428072]  ? bzImage64_probe+0x140/0x140
[   15.428075]  ? iov_iter_kvec+0x5f/0x70
[   15.428080]  ? rw_verify_area+0x80/0x80
[   15.428087]  ? __might_sleep+0x31/0xd0
[   15.428091]  ? __might_sleep+0x31/0xd0
[   15.428094]  ? ___might_sleep+0xc9/0xe0
[   15.428096]  ? bzImage64_probe+0x140/0x140
[   15.428099]  arch_kexec_kernel_image_load+0x102/0x130
[   15.428102]  kimage_file_alloc_init+0xda/0x290
[   15.428107]  __do_sys_kexec_file_load+0x21f/0x390
[   15.428110]  ? __x64_sys_open+0x100/0x100
[   15.428113]  ? kexec_calculate_store_digests+0x390/0x390
[   15.428117]  ? rcu_nocb_flush_deferred_wakeup+0x36/0x50
[   15.428122]  do_syscall_64+0x3d/0x80
[   15.428127]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   15.428132] RIP: 0033:0x7f46ad026759
[   15.428135] Code: 00 48 81 c4 80 00 00 00 89 f0 c3 66 0f 1f 44 00 00 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 0f d7 2b 00 f7 d8 64 89 01 48
[   15.428137] RSP: 002b:7ffcf6f96788 EFLAGS: 0206 ORIG_RAX: 
0140
[   15.428141] RAX: ffda RBX: 0006 RCX: 7f46ad026759
[   15.428143] RDX: 0182 RSI: 0005 RDI: 0003
[   15.428145] RBP: 7ffcf6f96a28 R08: 0002 R09: 
[   15.428146] R10: 00b0d5e0 R11: 0206 R12: 0004
[   15.428148] R13:  R14:  R15: 
[   15.428152] Memory state around the buggy address:
[   15.428164]  c9425f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 
f8
[   15.428166]  c9425f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 
f8
[   15.428168] >c9426000: 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 
f8
[   15.428169]   ^
[   15.428171]  c9426080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 
f8
[   15.428172]  c9426100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 
f8
[   15.428173] 
======
[   15.428174] Disabling lock debugging due to kernel taint

kasan: stop grumbling about CRASH_DUMP

Signed-off-by: Mike Galbraith 
---
 arch/x86/kernel/Makefile |1 +
 kernel/Makefile  |1 +
 2 files changed, 2 insertions(+)

--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_X86_TSC)   += trace_clock.o
 obj-$(CONFIG_CRASH_CORE)   += crash_core_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
+KASAN_SANITIZE_crash.o := n
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
 obj-y  += kprobes/
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_CRASH_CORE) += crash_core.o
 obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
+KASAN_SANITIZE_kexec_file.o := n
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o



Re: [RT v5.12-rc3-rt3] __call_rcu with KASAN causes invalid sleeping function call

2021-03-26 Thread Mike Galbraith
On Fri, 2021-03-26 at 16:20 -0500, Andrew Halaney wrote:
> Hi,
>
> I booted the RT kernel (v5.12-rc3-rt3) with KASAN enabled for the first
> time today and noticed this:

That should probably be one of the auto-disabled config options.  I
started down the fix the KASAN gripes path, and quickly determined that
that was a job for someone with motivation that I fortunately lacked.

-Mike



Re: [ANNOUNCE] v5.12-rc3-rt3

2021-03-21 Thread Mike Galbraith
On Sun, 2021-03-21 at 08:46 +0100, Mike Galbraith wrote:
> On Sat, 2021-03-20 at 09:18 +0100, Mike Galbraith wrote:
> > On Fri, 2021-03-19 at 23:33 +0100, Sebastian Andrzej Siewior wrote:
> > > Dear RT folks!
> > >
> > > I'm pleased to announce the v5.12-rc3-rt3 patch set.
> >
> > My little rpi4b is fairly unhappy with 5.12-rt, whereas 5.11-rt works
> > fine on it.  The below spew is endless, making boot endless.  I turned
> > it into a WARN_ON_ONCE to see if the thing would finish boot, and
> > surprisingly, it seems perfectly fine with that bad idea. Having not
> > the foggiest clue what I'm doing down in arm arch-land, bug is in no
> > immediate danger :)
>
> Actually, it looks like a defenseless little buglet, and this gripe
> simply wants to be disabled for RT.

Or completely removed instead.

It's entirely possible I'm missing something obvious to arm experts,
but I don't _think_ the register read needs protection, leaving me
wondering why arch_faults_on_old_pte() was born with that warning.

-Mike



Re: [ANNOUNCE] v5.12-rc3-rt3

2021-03-21 Thread Mike Galbraith
On Sat, 2021-03-20 at 09:18 +0100, Mike Galbraith wrote:
> On Fri, 2021-03-19 at 23:33 +0100, Sebastian Andrzej Siewior wrote:
> > Dear RT folks!
> >
> > I'm pleased to announce the v5.12-rc3-rt3 patch set.
>
> My little rpi4b is fairly unhappy with 5.12-rt, whereas 5.11-rt works
> fine on it.  The below spew is endless, making boot endless.  I turned
> it into a WARN_ON_ONCE to see if the thing would finish boot, and
> surprisingly, it seems perfectly fine with that bad idea. Having not
> the foggiest clue what I'm doing down in arm arch-land, bug is in no
> immediate danger :)

Actually, it looks like a defenseless little buglet, and this gripe
simply wants to be disabled for RT.

arm64: disable arch_faults_on_old_pte() preemptible() warning for RT

arch_faults_on_old_pte() was never called in < 5.12-rt, but 5.12 added
arch_wants_old_prefaulted_pte(), which is a wrapper thereof, and thus
finish_fault() -> do_set_pte() -> arch_wants_old_prefaulted_pte() now
calls it, in preemptible context, and a flood of complaints ensues.

Kill it for RT.

Signed-off-by: Mike Galbraith 
---
 arch/arm64/include/asm/pgtable.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -979,7 +979,7 @@ static inline void update_mmu_cache(stru
  */
 static inline bool arch_faults_on_old_pte(void)
 {
-   WARN_ON(preemptible());
+   WARN_ON(!IS_ENABLED(CONFIG_PREEMPT_RT) && preemptible());

return !cpu_has_hw_af();
 }

>
> [2.216913] WARNING: CPU: 0 PID: 1 at arch/arm64/include/asm/pgtable.h:982 
> do_set_pte+0x1cc/0x1d4
> [2.216949] Modules linked in:
> [2.216961] CPU: 0 PID: 1 Comm: init Not tainted 5.12.0.g425ed5a-v8-rt #33
> [2.216973] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
> [2.216979] pstate: 2005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
> [2.216990] pc : do_set_pte+0x1cc/0x1d4
> [2.217004] lr : filemap_map_pages+0x178/0x380
> [2.217016] sp : ffc01153bbb0
> [2.217020] x29: ffc01153bbb0 x28: fffe07d93080
> [2.217033] x27:  x26: ff8101c9e000
> [2.217044] x25: ff8101b40fd8 x24: 
> [2.217054] x23: ff8101674170 x22: 007fb1b4b000
> [2.217064] x21: fffe07d93080 x20: ffc01153bcf0
> [2.217073] x19: 0021f64c2fc3 x18: 
> [2.217082] x17:  x16: 
> [2.217091] x15:  x14: 
> [2.217100] x13:  x12: 
> [2.217108] x11:  x10: 
> [2.217117] x9 : ffc010209068 x8 : 000f
> [2.217126] x7 : ff8101e87c68 x6 : fffe
> [2.217135] x5 : 00101e8b x4 : ff8101e880a8
> [2.217144] x3 : 00200fc3 x2 : 
> [2.217153] x1 :  x0 : 
> [2.217162] Call trace:
> [2.217166]  do_set_pte+0x1cc/0x1d4
> [2.217181]  filemap_map_pages+0x178/0x380
> [2.217189]  __handle_mm_fault+0x75c/0x930
> [2.217202]  handle_mm_fault+0x178/0x25c
> [2.217214]  do_page_fault+0x16c/0x470
> [2.217233]  do_translation_fault+0xbc/0xd8
> [2.217244]  do_mem_abort+0x4c/0xbc
> [2.217259]  el0_ia+0x68/0xcc
> [2.217272]  el0_sync_handler+0x180/0x1b0
> [2.217284]  el0_sync+0x170/0x180



Re: [ANNOUNCE] v5.12-rc3-rt3

2021-03-20 Thread Mike Galbraith
On Fri, 2021-03-19 at 23:33 +0100, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v5.12-rc3-rt3 patch set.

My little rpi4b is fairly unhappy with 5.12-rt, whereas 5.11-rt works
fine on it.  The below spew is endless, making boot endless.  I turned
it into a WARN_ON_ONCE to see if the thing would finish boot, and
surprisingly, it seems perfectly fine with that bad idea. Having not
the foggiest clue what I'm doing down in arm arch-land, bug is in no
immediate danger :)

[2.216913] WARNING: CPU: 0 PID: 1 at arch/arm64/include/asm/pgtable.h:982 
do_set_pte+0x1cc/0x1d4
[2.216949] Modules linked in:
[2.216961] CPU: 0 PID: 1 Comm: init Not tainted 5.12.0.g425ed5a-v8-rt #33
[2.216973] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[2.216979] pstate: 2005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[2.216990] pc : do_set_pte+0x1cc/0x1d4
[2.217004] lr : filemap_map_pages+0x178/0x380
[2.217016] sp : ffc01153bbb0
[2.217020] x29: ffc01153bbb0 x28: fffe07d93080
[2.217033] x27:  x26: ff8101c9e000
[2.217044] x25: ff8101b40fd8 x24: 
[2.217054] x23: ff8101674170 x22: 007fb1b4b000
[2.217064] x21: fffe07d93080 x20: ffc01153bcf0
[2.217073] x19: 0021f64c2fc3 x18: 
[2.217082] x17:  x16: 
[2.217091] x15:  x14: 
[2.217100] x13:  x12: 
[2.217108] x11:  x10: 
[2.217117] x9 : ffc010209068 x8 : 000f
[2.217126] x7 : ff8101e87c68 x6 : fffe
[2.217135] x5 : 00101e8b x4 : ff8101e880a8
[2.217144] x3 : 00200fc3 x2 : 
[2.217153] x1 :  x0 : 
[2.217162] Call trace:
[2.217166]  do_set_pte+0x1cc/0x1d4
[2.217181]  filemap_map_pages+0x178/0x380
[2.217189]  __handle_mm_fault+0x75c/0x930
[2.217202]  handle_mm_fault+0x178/0x25c
[2.217214]  do_page_fault+0x16c/0x470
[2.217233]  do_translation_fault+0xbc/0xd8
[2.217244]  do_mem_abort+0x4c/0xbc
[2.217259]  el0_ia+0x68/0xcc
[2.217272]  el0_sync_handler+0x180/0x1b0
[2.217284]  el0_sync+0x170/0x180



Re: Re: [PATCH] futex: use wake_up_process() instead of wake_up_state()

2021-03-19 Thread Mike Galbraith
On Fri, 2021-03-19 at 17:15 +0800, Wang Qing wrote:
> >> On Fri, 2021-03-19 at 10:59 +0800, Wang Qing wrote:
> >> > Using wake_up_process() is more simpler and friendly,
> >> > and it is more convenient for analysis and statistics
> >>
> >> I likely needn't bother, and don't have a NAK to paste on this thing,
> >> but here's another copy of my NOPE for yet another gratuitous change
> >> with complete BS justification.
> >
> >Let me try a bit softer tone.  I think you're trying to help, but
> >ignoring feedback is not the way to achieve that goal.  My feedback was
> >and remains that your change is not an improvement, it's churn, but
> >more importantly, that changes require technical justification, which
> >you did not provide.  You were subsequently handed the justification
> >you lacked by none other than the maintainer of the code you were
> >modifying.  He told you that your change could become a tiny kernel
> >size optimization by converting like instances all in one patch.. which
> >you promptly ignored, instead submitting multiple patches with zero
> >justification.  That is not the path to success.
>
> Thank you for your reply. There are two reasons for sending the patch again.
> One is that I think this is only an improvement in format and has no
> substantial impact, so no verification is required.

No substantial impact is not an argument in favor, quite the contrary.

> The second one is that I want to hear more opinions from the maintainer.

The highly qualified opinion Ingo took the time to hand you on a silver
platter was not good enough?  I'm starting to regret being concerned
about having potentially discouraged a potential contributor.

> Because the entire kernel may have similar problems, I have to figure out
> whether this is tacitly accepted behavior.

My obviously misguided concerns just appeared in my rear view mirror.

> Also, I don't understand what you mean by "your change could become a
> tiny kernel size optimization by converting like instances all in one patch".

Go back and read what Ingo wrote.  While at it, read what the other two
maintainers who chimed in had to say before you consider doing any re-
submission.  The low hanging fruit you're trying to pick isn't anywhere
near as juicy as you seem to think it is.

> >>
> >> >
> >> > Signed-off-by: Wang Qing 
> >> > ---
> >> >  kernel/futex.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/kernel/futex.c b/kernel/futex.c
> >> > index e68db77..078a1f9
> >> > --- a/kernel/futex.c
> >> > +++ b/kernel/futex.c
> >> > @@ -1820,7 +1820,7 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
> >> >
> >> >  q->lock_ptr = &hb->lock;
> >> >
> >> > -wake_up_state(q->task, TASK_NORMAL);
> >> > +wake_up_process(q->task);
> >> >  }
> >> >
> >> >  /**
> >>
> >
>
>



Re: [PATCH] futex: use wake_up_process() instead of wake_up_state()

2021-03-19 Thread Mike Galbraith
On Fri, 2021-03-19 at 06:21 +0100, Mike Galbraith wrote:
> On Fri, 2021-03-19 at 10:59 +0800, Wang Qing wrote:
> > Using wake_up_process() is simpler and friendlier,
> > and it is more convenient for analysis and statistics
>
> I likely needn't bother, and don't have a NAK to paste on this thing,
> but here's another copy of my NOPE for yet another gratuitous change
> with complete BS justification.

Let me try a bit softer tone.  I think you're trying to help, but
ignoring feedback is not the way to achieve that goal.  My feedback was
and remains that your change is not an improvement, it's churn, but
more importantly, that changes require technical justification, which
you did not provide.  You were subsequently handed the justification
you lacked by none other than the maintainer of the code you were
modifying.  He told you that your change could become a tiny kernel
size optimization by converting like instances all in one patch.. which
you promptly ignored, instead submitting multiple patches with zero
justification.  That is not the path to success.

>
> >
> > Signed-off-by: Wang Qing 
> > ---
> >  kernel/futex.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index e68db77..078a1f9
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -1820,7 +1820,7 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
> >
> > q->lock_ptr = &hb->lock;
> >
> > -   wake_up_state(q->task, TASK_NORMAL);
> > +   wake_up_process(q->task);
> >  }
> >
> >  /**
>



Re: [PATCH] futex: use wake_up_process() instead of wake_up_state()

2021-03-18 Thread Mike Galbraith
On Fri, 2021-03-19 at 10:59 +0800, Wang Qing wrote:
> Using wake_up_process() is simpler and friendlier,
> and it is more convenient for analysis and statistics

I likely needn't bother, and don't have a NAK to paste on this thing,
but here's another copy of my NOPE for yet another gratuitous change
with complete BS justification.

>
> Signed-off-by: Wang Qing 
> ---
>  kernel/futex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index e68db77..078a1f9
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1820,7 +1820,7 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
>
>   q->lock_ptr = &hb->lock;
>
> - wake_up_state(q->task, TASK_NORMAL);
> + wake_up_process(q->task);
>  }
>
>  /**



Re: Re: [PATCH] sched: swait: use wake_up_process() instead of wake_up_state()

2021-03-17 Thread Mike Galbraith
On Thu, 2021-03-18 at 10:14 +0800, Wang Qing wrote:
> >>
> >> * Mike Galbraith  wrote:
> >>
> >> > On Tue, 2021-03-16 at 19:20 +0800, Wang Qing wrote:
> >> > > Why not just use wake_up_process().
> >> >
> >> > IMO this is not an improvement.  There are other places where explicit
> >> > TASK_NORMAL is used as well, and they're all perfectly clear as is.
> >>
> >> Arguably those could all be converted to wake_up_process() as well.
> >> It's a very small kernel code size optimization. There's about 3 such
> >> places, could be converted in a single patch.
> >
> >It's still pointless churn IMO.
>
> Using wake_up_process() is simpler and friendlier for beginners,
> and it is more convenient for analysis and statistics.

If that's your argument, it should have been in the change log. That
said, it's IMO still pretty darn weak. When presenting a patch, do what
Ingo did: show the technical merit, because that's what will determine
whether it flies or dies.

-Mike



Re: [PATCH] sched: swait: use wake_up_process() instead of wake_up_state()

2021-03-17 Thread Mike Galbraith
On Wed, 2021-03-17 at 10:46 +0100, Ingo Molnar wrote:
> * Mike Galbraith  wrote:
>
> > On Tue, 2021-03-16 at 19:20 +0800, Wang Qing wrote:
> > > Why not just use wake_up_process().
> >
> > IMO this is not an improvement.  There are other places where explicit
> > TASK_NORMAL is used as well, and they're all perfectly clear as is.
>
> Arguably those could all be converted to wake_up_process() as well.
> It's a very small kernel code size optimization. There's about 3 such
> places, could be converted in a single patch.

I still prefer the way it sits, but that's certainly a heck of a lot
better change justification than "why not" :)

-Mike



Re: [PATCH] sched: rename __prepare_to_swait() to add_swait_queue_locked()

2021-03-16 Thread Mike Galbraith
On Tue, 2021-03-16 at 19:59 +0800, Wang Qing wrote:
> This function just puts the wait entry into the queue; it does not do an
> operation similar to prepare_to_wait() in wait.c.
> And during the operation, the caller needs to hold the lock for protection.

I see zero benefit to churn like this.  You're taking a dinky little
file that's perfectly clear (and pretty), and restating the obvious.
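
For context, the two helpers being contrasted, condensed (paraphrased
from kernel/sched/wait.c and kernel/sched/swait.c; exact bodies vary by
version): prepare_to_wait() both queues the entry and sets the task
state under the lock, while __prepare_to_swait() only queues, leaving
locking and the state transition to its callers.

void prepare_to_wait(struct wait_queue_head *wq_head,
		     struct wait_queue_entry *wq_entry, int state)
{
	unsigned long flags;

	wq_entry->flags &= ~WQ_FLAG_EXCLUSIVE;
	spin_lock_irqsave(&wq_head->lock, flags);
	if (list_empty(&wq_entry->entry))
		__add_wait_queue(wq_head, wq_entry);
	set_current_state(state);	/* state transition included */
	spin_unlock_irqrestore(&wq_head->lock, flags);
}

void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
{
	/* caller holds q->lock and sets the task state itself */
	wait->task = current;
	if (list_empty(&wait->task_list))
		list_add_tail(&wait->task_list, &q->task_list);
}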

>
> Signed-off-by: Wang Qing 
> ---
>  kernel/sched/completion.c | 2 +-
>  kernel/sched/sched.h  | 2 +-
>  kernel/sched/swait.c  | 6 +++---
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> index a778554..3d28a5a
> --- a/kernel/sched/completion.c
> +++ b/kernel/sched/completion.c
> @@ -79,7 +79,7 @@ do_wait_for_common(struct completion *x,
>   timeout = -ERESTARTSYS;
>   break;
>   }
> - __prepare_to_swait(&x->wait, &wait);
> + add_swait_queue_locked(&x->wait, &wait);
>   __set_current_state(state);
>   raw_spin_unlock_irq(&x->wait.lock);
>   timeout = action(timeout);
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 10a1522..0516f50
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2719,4 +2719,4 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
>  #endif
>
>  void swake_up_all_locked(struct swait_queue_head *q);
> -void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
> +void add_swait_queue_locked(struct swait_queue_head *q, struct swait_queue *wait);
> diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
> index 7a24925..f48a544
> --- a/kernel/sched/swait.c
> +++ b/kernel/sched/swait.c
> @@ -82,7 +82,7 @@ void swake_up_all(struct swait_queue_head *q)
>  }
>  EXPORT_SYMBOL(swake_up_all);
>
> -void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
> +void add_swait_queue_locked(struct swait_queue_head *q, struct swait_queue *wait)
>  {
>   wait->task = current;
>   if (list_empty(&wait->task_list))
> @@ -94,7 +94,7 @@ void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *
>   unsigned long flags;
>
>   raw_spin_lock_irqsave(&q->lock, flags);
> - __prepare_to_swait(q, wait);
> + add_swait_queue_locked(q, wait);
>   set_current_state(state);
>   raw_spin_unlock_irqrestore(&q->lock, flags);
>  }
> @@ -114,7 +114,7 @@ long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait
>   list_del_init(&wait->task_list);
>   ret = -ERESTARTSYS;
>   } else {
> - __prepare_to_swait(q, wait);
> + add_swait_queue_locked(q, wait);
>   set_current_state(state);
>   }
>   raw_spin_unlock_irqrestore(&q->lock, flags);



Re: [PATCH] sched: swait: use wake_up_process() instead of wake_up_state()

2021-03-16 Thread Mike Galbraith
On Tue, 2021-03-16 at 19:20 +0800, Wang Qing wrote:
> Why not just use wake_up_process().

IMO this is not an improvement.  There are other places where explicit
TASK_NORMAL is used as well, and they're all perfectly clear as is.
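
For reference, the two calls are equivalent here (paraphrased from
kernel/sched/core.c, modulo version drift): wake_up_process() is just
wake_up_state() with TASK_NORMAL filled in, i.e. TASK_INTERRUPTIBLE |
TASK_UNINTERRUPTIBLE, so the proposed change is purely cosmetic.

int wake_up_process(struct task_struct *p)
{
	return try_to_wake_up(p, TASK_NORMAL, 0);
}

int wake_up_state(struct task_struct *p, unsigned int state)
{
	return try_to_wake_up(p, state, 0);
}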

> Signed-off-by: Wang Qing 
> ---
>  kernel/sched/swait.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
> index e1c655f..7a24925
> --- a/kernel/sched/swait.c
> +++ b/kernel/sched/swait.c
> @@ -69,7 +69,7 @@ void swake_up_all(struct swait_queue_head *q)
>   while (!list_empty(&tmp)) {
>   curr = list_first_entry(&tmp, typeof(*curr), task_list);
>
> - wake_up_state(curr->task, TASK_NORMAL);
> + wake_up_process(curr->task);
>   list_del_init(&curr->task_list);
>
>   if (list_empty(&tmp))



Re: [bisected] Re: nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-15 Thread Mike Galbraith
On Mon, 2021-03-15 at 09:53 +0100, Mike Galbraith wrote:
> On Mon, 2021-03-15 at 09:05 +0100, Christian König wrote:
> > Hi Mike,
> >
> > I'm pretty sure your bisection is a bit off.
>
> (huh?) Ah crap, yup, the spew from hell you plugged obliterated the
> lockdep gripe I was grepping for as go/nogo, and off into lala land we
> go.. twice.. whee :)  Oh well, the ordering gripe is clear enough
> without a whodoneit.

However, after having rummaged around, two minutes with gitk was enough
for 551620f2 to flash neon red, and one build later confirmed it.
-Mike



Re: [bisected] Re: nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-15 Thread Mike Galbraith
On Mon, 2021-03-15 at 09:05 +0100, Christian König wrote:
> Hi Mike,
>
> I'm pretty sure your bisection is a bit off.

(huh?) Ah crap, yup, the spew from hell you plugged obliterated the
lockdep gripe I was grepping for as go/nogo, and off into lala land we
go.. twice.. whee :)  Oh well, the ordering gripe is clear enough
without a whodoneit.

-Mike



[bisected] Re: nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-13 Thread Mike Galbraith
This little bugger bisected to...

b73cd1e2ebfc "drm/ttm: stop destroying pinned ghost object"

...and (the second time around) was confirmed on the spot.  However,
while the fingered commit still reverts cleanly, doing so at HEAD does
not make lockdep return to happy camper state (leading to bisection
#2), i.e. the fingered commit is only the beginning of nouveau's 5.12
cycle lockdep woes.

homer:..kernel/linux-master # quilt applied|grep revert
patches/revert-drm-ttm-Remove-pinned-bos-from-LRU-in-ttm_bo_move_to_lru_tail-v2.patch
patches/revert-drm-ttm-cleanup-LRU-handling-further.patch
patches/revert-drm-ttm-use-pin_count-more-extensively.patch
patches/revert-drm-ttm-stop-destroying-pinned-ghost-object.patch

That still ain't enough to appease lockdep at HEAD.  I'm not going to
muck about with it beyond that, since this looks a whole lot like yet
another example of "fixing stuff exposes other busted stuff".

On Wed, 2021-03-10 at 10:58 +0100, Mike Galbraith wrote:
> [   29.966927] ==
> [   29.966929] WARNING: possible circular locking dependency detected
> [   29.966932] 5.12.0.g05a59d7-master #2 Tainted: GW   E
> [   29.966934] --
> [   29.966937] X/2145 is trying to acquire lock:
> [   29.966939] 888120714518 (&cli->mutex){+.+.}-{3:3}, at: 
> nouveau_bo_move+0x11f/0x980 [nouveau]
> [   29.967002]
>but task is already holding lock:
> [   29.967004] 888123c201a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> nouveau_bo_pin+0x2b/0x310 [nouveau]
> [   29.967053]
>which lock already depends on the new lock.
>
> [   29.967056]
>the existing dependency chain (in reverse order) is:
> [   29.967058]
>-> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
> [   29.967063]__ww_mutex_lock.constprop.16+0xbe/0x10d0
> [   29.967069]nouveau_bo_pin+0x2b/0x310 [nouveau]
> [   29.967112]nouveau_channel_prep+0x106/0x2e0 [nouveau]
> [   29.967151]nouveau_channel_new+0x4f/0x760 [nouveau]
> [   29.967188]nouveau_abi16_ioctl_channel_alloc+0xdf/0x350 [nouveau]
> [   29.967223]drm_ioctl_kernel+0x91/0xe0 [drm]
> [   29.967245]drm_ioctl+0x2db/0x380 [drm]
> [   29.967259]nouveau_drm_ioctl+0x56/0xb0 [nouveau]
> [   29.967303]__x64_sys_ioctl+0x76/0xb0
> [   29.967307]do_syscall_64+0x33/0x40
> [   29.967310]entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   29.967314]
>-> #0 (&cli->mutex){+.+.}-{3:3}:
> [   29.967318]__lock_acquire+0x1494/0x1ac0
> [   29.967322]lock_acquire+0x23e/0x3b0
> [   29.967325]__mutex_lock+0x95/0x9d0
> [   29.967330]nouveau_bo_move+0x11f/0x980 [nouveau]
> [   29.967377]ttm_bo_handle_move_mem+0x79/0x130 [ttm]
> [   29.967384]ttm_bo_validate+0x156/0x1b0 [ttm]
> [   29.967390]nouveau_bo_validate+0x48/0x70 [nouveau]
> [   29.967438]nouveau_bo_pin+0x1de/0x310 [nouveau]
> [   29.967487]nv50_wndw_prepare_fb+0x53/0x4d0 [nouveau]
> [   29.967531]drm_atomic_helper_prepare_planes+0x8a/0x110 
> [drm_kms_helper]
> [   29.967547]nv50_disp_atomic_commit+0xa9/0x1b0 [nouveau]
> [   29.967593]drm_atomic_helper_update_plane+0x10a/0x150 
> [drm_kms_helper]
> [   29.967606]drm_mode_cursor_universal+0x10b/0x220 [drm]
> [   29.967627]drm_mode_cursor_common+0x190/0x200 [drm]
> [   29.967648]drm_mode_cursor_ioctl+0x3d/0x50 [drm]
> [   29.967669]drm_ioctl_kernel+0x91/0xe0 [drm]
> [   29.967684]drm_ioctl+0x2db/0x380 [drm]
> [   29.967699]nouveau_drm_ioctl+0x56/0xb0 [nouveau]
> [   29.967748]__x64_sys_ioctl+0x76/0xb0
> [   29.967752]do_syscall_64+0x33/0x40
> [   29.967756]entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   29.967760]
>other info that might help us debug this:
>
> [   29.967764]  Possible unsafe locking scenario:
>
> [   29.967767]CPU0CPU1
> [   29.967770]
> [   29.967772]   lock(reservation_ww_class_mutex);
> [   29.967776]lock(&cli->mutex);
> [   29.967779]
> lock(reservation_ww_class_mutex);
> [   29.967783]   lock(&cli->mutex);
> [   29.967786]
> *** DEADLOCK ***
>
> [   29.967790] 3 locks held by X/2145:
> [   29.967792]  #0: 88810365bcf8 (crtc_ww_class_acquire){+.+.}-{0:0}, at: 
> drm_mode_cursor_common+0x87/0x200 [drm]
> [   29.967817]  #1: 888108d9e098 (crtc_ww_class_mutex){+.+.}-{3:3}, at: 
> drm_modeset_lock+0xc3/0xe0 [drm]
> [   29.967841]  #2: 888123c201a0 
>

drm: unexpected static_call insn opcode 0xe9 at drm_gem_check_release_pagevec+0x2a/0x30 [drm]

2021-03-10 Thread Mike Galbraith
Box is aging i4790 box w. GTX980+nouveau.

[2.833432] [ cut here ]
[2.833448] unexpected static_call insn opcode 0xe9 at 
drm_gem_check_release_pagevec+0x2a/0x30 [drm]
[2.833505] WARNING: CPU: 2 PID: 329 at arch/x86/kernel/static_call.c:77 
__static_call_validate+0x71/0x80
[2.833522] Modules linked in: drm(E+) usbcore(E) video(E) button(E+) 
sd_mod(E) t10_pi(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio_ring(E) 
virtio(E) ext4(E) crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) 
dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) 
scsi_mod(E) efivarfs(E) autofs4(E)
[2.833595] CPU: 2 PID: 329 Comm: systemd-udevd Tainted: GE 
5.12.0.g05a59d7-master #2
[2.833598] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[2.833601] RIP: 0010:__static_call_validate+0x71/0x80
[2.833605] Code: 75 cc f3 c3 0f b6 49 04 38 4f 04 75 e1 f3 c3 f3 c3 48 89 
fa 0f b6 f0 48 c7 c7 48 9f 03 82 c6 05 e6 a3 3e 01 01 e8 1f 59 04 00 <0f> 0b c3 
66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55
[2.833608] RSP: 0018:888108d07c70 EFLAGS: 00010286
[2.833612] RAX: 0058 RBX:  RCX: 0001
[2.833614] RDX: 8001 RSI: 82066fc1 RDI: 
[2.833617] RBP: 811cbad0 R08: 0001 R09: 0001
[2.833619] R10: 888108d07c78 R11: 888108d07988 R12: a0353cea
[2.833622] R13:  R14: a0353cea R15: 8226c140
[2.833624] FS:  7fbe0b7e1d40() GS:88841ec8() 
knlGS:
[2.833627] CS:  0010 DS:  ES:  CR0: 80050033
[2.833630] CR2: 7fbe0b7eabb0 CR3: 000108d08001 CR4: 001706e0
[2.833633] Call Trace:
[2.833635]  arch_static_call_transform+0x5f/0x90
[2.833657]  __static_call_init.part.2+0xc2/0x210
[2.833667]  ? __SCT__tp_func_sched_update_nr_running_tp+0x8/0x8
[2.833672]  static_call_module_notify+0x14c/0x190
[2.833684]  notifier_call_chain_robust+0x56/0xc0
[2.833706]  blocking_notifier_call_chain_robust+0x41/0x60
[2.833720]  ? kfree+0x170/0x2b0
[2.833729]  load_module+0x1a19/0x21e0
[2.833767]  ? __do_sys_finit_module+0x94/0xe0
[2.833772]  __do_sys_finit_module+0x94/0xe0
[2.833802]  do_syscall_64+0x33/0x40
[2.833808]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[2.833812] RIP: 0033:0x7fbe0a620759
[2.833816] Code: 00 48 81 c4 80 00 00 00 89 f0 c3 66 0f 1f 44 00 00 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 0f d7 2b 00 f7 d8 64 89 01 48
[2.833819] RSP: 002b:7fff37392f08 EFLAGS: 0246 ORIG_RAX: 
0139
[2.833824] RAX: ffda RBX: 55792e98ecf0 RCX: 7fbe0a620759
[2.833826] RDX:  RSI: 7fbe0af5b87d RDI: 0006
[2.833829] RBP: 7fbe0af5b87d R08:  R09: 
[2.833831] R10: 0006 R11: 0246 R12: 0002
[2.833834] R13: 55792e9a7810 R14:  R15: 03938700
[2.833860] irq event stamp: 10273
[2.833862] hardirqs last  enabled at (10279): [] 
vprintk_emit+0x283/0x2b0
[2.833867] hardirqs last disabled at (10284): [] 
vprintk_emit+0x240/0x2b0
[2.833870] softirqs last  enabled at (7608): [] 
__do_softirq+0x365/0x492
[2.833875] softirqs last disabled at (7453): [] 
irq_exit_rcu+0xf6/0x100
[2.833879] ---[ end trace d6083471edf28ca7 ]---



nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-10 Thread Mike Galbraith


[   29.966927] ==
[   29.966929] WARNING: possible circular locking dependency detected
[   29.966932] 5.12.0.g05a59d7-master #2 Tainted: GW   E
[   29.966934] --
[   29.966937] X/2145 is trying to acquire lock:
[   29.966939] 888120714518 (&cli->mutex){+.+.}-{3:3}, at: 
nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.967002]
   but task is already holding lock:
[   29.967004] 888123c201a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967053]
   which lock already depends on the new lock.

[   29.967056]
   the existing dependency chain (in reverse order) is:
[   29.967058]
   -> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   29.967063]__ww_mutex_lock.constprop.16+0xbe/0x10d0
[   29.967069]nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967112]nouveau_channel_prep+0x106/0x2e0 [nouveau]
[   29.967151]nouveau_channel_new+0x4f/0x760 [nouveau]
[   29.967188]nouveau_abi16_ioctl_channel_alloc+0xdf/0x350 [nouveau]
[   29.967223]drm_ioctl_kernel+0x91/0xe0 [drm]
[   29.967245]drm_ioctl+0x2db/0x380 [drm]
[   29.967259]nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[   29.967303]__x64_sys_ioctl+0x76/0xb0
[   29.967307]do_syscall_64+0x33/0x40
[   29.967310]entry_SYSCALL_64_after_hwframe+0x44/0xae
[   29.967314]
   -> #0 (&cli->mutex){+.+.}-{3:3}:
[   29.967318]__lock_acquire+0x1494/0x1ac0
[   29.967322]lock_acquire+0x23e/0x3b0
[   29.967325]__mutex_lock+0x95/0x9d0
[   29.967330]nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.967377]ttm_bo_handle_move_mem+0x79/0x130 [ttm]
[   29.967384]ttm_bo_validate+0x156/0x1b0 [ttm]
[   29.967390]nouveau_bo_validate+0x48/0x70 [nouveau]
[   29.967438]nouveau_bo_pin+0x1de/0x310 [nouveau]
[   29.967487]nv50_wndw_prepare_fb+0x53/0x4d0 [nouveau]
[   29.967531]drm_atomic_helper_prepare_planes+0x8a/0x110 
[drm_kms_helper]
[   29.967547]nv50_disp_atomic_commit+0xa9/0x1b0 [nouveau]
[   29.967593]drm_atomic_helper_update_plane+0x10a/0x150 
[drm_kms_helper]
[   29.967606]drm_mode_cursor_universal+0x10b/0x220 [drm]
[   29.967627]drm_mode_cursor_common+0x190/0x200 [drm]
[   29.967648]drm_mode_cursor_ioctl+0x3d/0x50 [drm]
[   29.967669]drm_ioctl_kernel+0x91/0xe0 [drm]
[   29.967684]drm_ioctl+0x2db/0x380 [drm]
[   29.967699]nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[   29.967748]__x64_sys_ioctl+0x76/0xb0
[   29.967752]do_syscall_64+0x33/0x40
[   29.967756]entry_SYSCALL_64_after_hwframe+0x44/0xae
[   29.967760]
   other info that might help us debug this:

[   29.967764]  Possible unsafe locking scenario:

[   29.967767]CPU0CPU1
[   29.967770]
[   29.967772]   lock(reservation_ww_class_mutex);
[   29.967776]lock(&cli->mutex);
[   29.967779]lock(reservation_ww_class_mutex);
[   29.967783]   lock(&cli->mutex);
[   29.967786]
*** DEADLOCK ***

[   29.967790] 3 locks held by X/2145:
[   29.967792]  #0: 88810365bcf8 (crtc_ww_class_acquire){+.+.}-{0:0}, at: 
drm_mode_cursor_common+0x87/0x200 [drm]
[   29.967817]  #1: 888108d9e098 (crtc_ww_class_mutex){+.+.}-{3:3}, at: 
drm_modeset_lock+0xc3/0xe0 [drm]
[   29.967841]  #2: 888123c201a0 (reservation_ww_class_mutex){+.+.}-{3:3}, 
at: nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967896]
   stack backtrace:
[   29.967899] CPU: 6 PID: 2145 Comm: X Kdump: loaded Tainted: GW   E   
  5.12.0.g05a59d7-master #2
[   29.967904] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[   29.967908] Call Trace:
[   29.967911]  dump_stack+0x6d/0x89
[   29.967915]  check_noncircular+0xe7/0x100
[   29.967919]  ? nvkm_vram_map+0x48/0x50 [nouveau]
[   29.967959]  ? __lock_acquire+0x1494/0x1ac0
[   29.967963]  __lock_acquire+0x1494/0x1ac0
[   29.967967]  lock_acquire+0x23e/0x3b0
[   29.967971]  ? nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968020]  __mutex_lock+0x95/0x9d0
[   29.968024]  ? nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968070]  ? nvif_vmm_map+0xf4/0x110 [nouveau]
[   29.968093]  ? nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968137]  ? lock_release+0x160/0x280
[   29.968141]  ? nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968184]  nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968226]  ? up_write+0x17/0x130
[   29.968229]  ? unmap_mapping_pages+0x53/0x110
[   29.968234]  ttm_bo_handle_move_mem+0x79/0x130 [ttm]
[   29.968240]  ttm_bo_validate+0x156/0x1b0 [ttm]
[   29.968247]  nouveau_bo_validate+0x48/0x70 [nouveau]
[   29.968289]  nouveau_bo_pin+0x1de/0x310 [nouveau]
[   29.968330]  nv50_wndw_prepare_fb+0x53/0x4d0 [nouveau]

Re: futex breakage in 4.9 stable branch

2021-03-04 Thread Mike Galbraith
On Thu, 2021-03-04 at 19:47 +0100, Thomas Gleixner wrote:
> On Thu, Mar 04 2021 at 10:12, Mike Galbraith wrote:
> > On Mon, 2021-03-01 at 18:29 +0100, Ben Hutchings wrote:
> >
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -874,8 +874,12 @@ static void free_pi_state(struct futex_p
> >  * and has cleaned up the pi_state already
> >  */
> > if (pi_state->owner) {
> > +   unsigned long flags;
> > +
> > +   raw_spin_lock_irqsave(&pi_state->pi_mutex.wait_lock, flags);
> > pi_state_update_owner(pi_state, NULL);
> > rt_mutex_proxy_unlock(&pi_state->pi_mutex);
> > + raw_spin_unlock_irqrestore(&pi_state->pi_mutex.wait_lock, flags);
>
> This hunk is missing in 4.9 as well.
>
> > }
> >
> > if (current->pi_state_cache)
> > @@ -1406,7 +1410,7 @@ static int wake_futex_pi(u32 __user *uad
> > if (pi_state->owner != current)
> > return -EINVAL;
> >
> > -   raw_spin_lock(&pi_state->pi_mutex.wait_lock);
> > +   raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
> > new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> >
> > /*
>
> That looks correct.

Thanks for the eyeballs.  FWIW, updated 4.4.259-rt214 on top of that is
a happy camper as well.

-Mike



Re: futex breakage in 4.9 stable branch

2021-03-04 Thread Mike Galbraith
On Thu, 2021-03-04 at 14:11 +0100, Greg Kroah-Hartman wrote:
> On Thu, Mar 04, 2021 at 10:12:56AM +0100, Mike Galbraith wrote:
>
> > futex: fix 4.4-stable 34c8e1c2c025 backport inspired lockdep complaint
> >
> > 1. 34c8e1c2c025 "futex: Provide and use pi_state_update_owner()" was 
> > backported
> > to stable, leading to the therein assertion that 
> > pi_state->pi_mutex.wait_lock
> > be held triggering in 4.4-stable.  Fixing that leads to lockdep moan part 2.
> >
> > 2: b4abf91047cf "rtmutex: Make wait_lock irq safe" is absent in 4.4-stable, 
> > but
> > wake_futex_pi() nonetheless managed to acquire an unbalanced raw_spin_lock()
> > raw_spin_inlock_irq() pair, which inspires lockdep to moan after 
> > aforementioned
> > assert has been appeased.
> >
> > With this applied, futex tests pass, and no longer inspire lockdep gripeage.
> >
> > Not-Signed-off-by: Mike Galbraith 
> > ---
> >  kernel/futex.c |6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -874,8 +874,12 @@ static void free_pi_state(struct futex_p
> >  * and has cleaned up the pi_state already
> >  */
> > if (pi_state->owner) {
> > +   unsigned long flags;
> > +
> > +   raw_spin_lock_irqsave(&pi_state->pi_mutex.wait_lock, flags);
> > pi_state_update_owner(pi_state, NULL);
> > rt_mutex_proxy_unlock(&pi_state->pi_mutex);
> > +   raw_spin_unlock_irqrestore(&pi_state->pi_mutex.wait_lock, flags);
> > }
> >
> > if (current->pi_state_cache)
> > @@ -1406,7 +1410,7 @@ static int wake_futex_pi(u32 __user *uad
> > if (pi_state->owner != current)
> > return -EINVAL;
> >
> > -   raw_spin_lock(&pi_state->pi_mutex.wait_lock);
> > +   raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
> > new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> >
> > /*
> >
>
> Care to sign-off on it so that if this is correct, I can apply it?  :)

Consider it signed off iff Thomas acks it.  I think it's correct.. just
like the guys who have installed every other bug in the damn things,
just a wee bit less over-confident :)

-Mike



Re: futex breakage in 4.9 stable branch

2021-03-04 Thread Mike Galbraith
On Mon, 2021-03-01 at 18:29 +0100, Ben Hutchings wrote:
> On Mon, Mar 01, 2021 at 09:07:03AM +0100, Greg Kroah-Hartman wrote:
> > On Mon, Mar 01, 2021 at 01:13:08AM +0100, Ben Hutchings wrote:
> > > On Tue, 2021-02-23 at 15:00 +0100, Greg Kroah-Hartman wrote:
> > > > I'm announcing the release of the 4.9.258 kernel.
> > > >
> > > > All users of the 4.9 kernel series must upgrade.
> > > >
> > > > The updated 4.9.y git tree can be found at:
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git 
> > > > linux-4.9.y
> > > > and can be browsed at the normal kernel.org git web browser:
> > > >
> > >
> > > The backported futex fixes are still incomplete/broken in this version.
> > > If I enable lockdep and run the futex self-tests (from 5.10):
> > >
> > > - on 4.9.246, they pass with no lockdep output
> > > - on 4.9.257 and 4.9.258, they pass but futex_requeue_pi triggers a
> > >   lockdep splat
> > >
> > > I have a local branch that essentially updates futex and rtmutex in
> > > 4.9-stable to match 4.14-stable.  With this, the tests pass and lockdep
> > > is happy.
> > >
> > > Unfortunately, that branch has about another 60 commits.
>
> I have now rebased that on top of 4.9.258, and there are "only" 39
> commits.
>
> > > Further, the
> > > more we change futex in 4.9, the more difficult it is going to be to
> > > update the 4.9-rt branch.  But I don't see any better option available
> > > at the moment.
> > >
> > > Thoughts?
> >
> > There were some posted futex fixes for 4.9 (and 4.4) on the stable list
> > that I have not gotten to yet.
> >
> > Hopefully after these are merged (this week), these issues will be
> > resolved.
>
> I'm afraid they are not sufficient.
>
> > If not, then yes, they need to be fixed and any help you can provide
> > would be appreciated.
> >
> > As for "difficulty", yes, it's rough, but the changes backported were
> > required, for obvious reasons :(
>
> I had another look at the locking bug and I was able to make a series
> of 7 commits (on top of the 2 already queued) that is sufficient to
> make lockdep happy.  But I am not very confident that there won't be
> other regressions.  I'll send that over shortly.

This is all I had to do to make 4.4-stable a happy camper again.

futex: fix 4.4-stable 34c8e1c2c025 backport inspired lockdep complaint

1. 34c8e1c2c025 "futex: Provide and use pi_state_update_owner()" was backported
to stable, causing the assertion therein that pi_state->pi_mutex.wait_lock
be held to trigger in 4.4-stable.  Fixing that leads to lockdep moan part 2.

2: b4abf91047cf "rtmutex: Make wait_lock irq safe" is absent in 4.4-stable, but
wake_futex_pi() nonetheless managed to acquire an unbalanced raw_spin_lock()
raw_spin_unlock_irq() pair, which inspires lockdep to moan after aforementioned
assert has been appeased.

With this applied, futex tests pass, and no longer inspire lockdep gripeage.

Not-Signed-off-by: Mike Galbraith 
---
 kernel/futex.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -874,8 +874,12 @@ static void free_pi_state(struct futex_p
 * and has cleaned up the pi_state already
 */
if (pi_state->owner) {
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&pi_state->pi_mutex.wait_lock, flags);
pi_state_update_owner(pi_state, NULL);
rt_mutex_proxy_unlock(&pi_state->pi_mutex);
+   raw_spin_unlock_irqrestore(&pi_state->pi_mutex.wait_lock, flags);
}

if (current->pi_state_cache)
@@ -1406,7 +1410,7 @@ static int wake_futex_pi(u32 __user *uad
if (pi_state->owner != current)
return -EINVAL;

-   raw_spin_lock(&pi_state->pi_mutex.wait_lock);
+   raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);

/*



Re: drm/nouveau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 16:34 +0100, Christian König wrote:
> Any objections that I add a Reported-and-tested-by: Mike Galbraith
>  ?

Fine by me.

-Mike



Re: drm/nouveau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 14:26 +0100, Christian König wrote:
>
> On 10.02.21 at 13:22, Mike Galbraith wrote:
> > On Wed, 2021-02-10 at 12:44 +0100, Christian König wrote:
> >> Please try to add a "return NULL" at the beginning of ttm_pool_type_take().
> >>
> >> That should effectively disable using the pool.
> > That did away with the yield looping, but it doesn't take long for the
> > display to freeze.  I ssh'd in from lappy, but there was nada in dmesg.
>
> Yeah, that is expected. Without taking pages from the pool we leak
> memory like sieve.
>
> At least we could narrow down the problem quite a bit with that.
>
> Can you test the attached patch and see if it helps?

Yup, that seems to have fixed it all up.  Another one bites the dust ;)

-Mike



Re: drm/nouveau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 12:44 +0100, Christian König wrote:
> Please try to add a "return NULL" at the beginning of ttm_pool_type_take().
>
> That should effectively disable using the pool.

That did away with the yield looping, but it doesn't take long for the
display to freeze.  I ssh'd in from lappy, but there was nada in dmesg.

> Thanks for testing,

Happy to.

-Mike



Re: drm/nouveau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 11:52 +0100, Christian König wrote:
>
>
> You could try to replace the "for (order = min(MAX_ORDER - 1UL,
> __fls(num_pages)); num_pages;" in ttm_pool_alloc() with "for (order = 0;
> num_pages;" to get the old behavior.

That's a nogo too.

-Mike



Re: drm/nouveau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 11:42 +0100, Christian König wrote:
>
> Am 10.02.21 um 11:40 schrieb Mike Galbraith:
> > On Wed, 2021-02-10 at 11:34 +0100, Christian König wrote:
> >> Hi Mike,
> >>
> >> do you have more information than just system stuck in a loop?
> > No, strace shows no syscalls but sched_yield().
>
> Well you can try to comment out the call to register_shrinker() in
> ttm_pool.c, but apart from that I don't have much ideas.

Nogo.. off to suggestion #2 I go.

-Mike



Re: drm/nouveau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 11:34 +0100, Christian König wrote:
>
> What seems to happen here is that your system is low on resources and we
> just try to free up pages.

FWIW, box has oodles of generic ram free right after boot.

-Mike



Re: drm/nouveau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 11:34 +0100, Christian König wrote:
> Hi Mike,
>
> do you have more information than just system stuck in a loop?

No, strace shows no syscalls but sched_yield().

-Mike



drm/nouveau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
Greetings,

The symptom is tasks stuck waiting for lord knows what by calling
sched_yield() in a loop (less than wonderful, sched_yield() sucks).
After boot to KDE login, I immediately see tracker-extract chewing cpu
in aforementioned loop. Firing up evolution and poking 'new' to
compose, WebKitWebProcess joins in the yield loop fun.

Hand rolled reverts of 256dd44b "drm/ttm: nuke old page allocator" and
the fingered commit cures the problem for me at 207665fd in the bisect
log below, and at master and tip HEAD.

There's a "things that make ya go hmm" aspect to this thing though.  If
you look at the bisect log below, the starting "bad" is 207665fd.  That
commit DOES NOT exhibit the yield loop symptom immediately out of the
box, but DOES after applying the much needed fix...

660a59953f4f "drm/nouveau: fix multihop when move doesn't work"

...to prevent an earlier regression from quickly appearing, one which
Dave will likely recall having fixed.  Relevant?  No idea, but seems
worth mentioning.

Box: aging generic i4790 box with its equally aged Nvidia GTX 980.


461619f5c3242aaee9ec3f0b7072719bd86ea207 is the first bad commit
commit 461619f5c3242aaee9ec3f0b7072719bd86ea207
Author: Christian König 
Date:   Sat Oct 24 13:13:25 2020 +0200

drm/nouveau: switch to new allocator

It should be able to handle all cases now.

Signed-off-by: Christian König 
Reviewed-by: Dave Airlie 
Reviewed-by: Madhav Chauhan 
Tested-by: Huang Rui 
Link: https://patchwork.freedesktop.org/patch/397082/?series=83051&rev=1

 drivers/gpu/drm/nouveau/nouveau_bo.c  | 30 ++
 drivers/gpu/drm/nouveau/nouveau_drv.h |  1 -
 2 files changed, 2 insertions(+), 29 deletions(-)

git bisect start
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 3f995f8e0b540342612d3f6b1fc299f5bf486987
# bad: [207665fd37561f97591e74d0ee80f24bdf06b789] Merge tag 
'exynos-drm-next-for-v5.11' of 
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
git bisect bad 207665fd37561f97591e74d0ee80f24bdf06b789
# good: [f8394f232b1eab649ce2df5c5f15b0e528c92091] Linux 5.10-rc3
git bisect good f8394f232b1eab649ce2df5c5f15b0e528c92091
# good: [b3bf99daaee96a141536ce5c60a0d6dba6ec1d23] drm/i915/display: Defer 
initial modeset until after GGTT is initialised
git bisect good b3bf99daaee96a141536ce5c60a0d6dba6ec1d23
# good: [dfbbfe3c17651fa0fcf2658fb90317df08e52bb2] drm/amd/display: Add formats 
for DCC with 2/3 planes.
git bisect good dfbbfe3c17651fa0fcf2658fb90317df08e52bb2
# bad: [112e505a76de69f8667e2fe8da38433f754364a8] Merge drm/drm-next into 
drm-misc-next
git bisect bad 112e505a76de69f8667e2fe8da38433f754364a8
# bad: [49a3f51dfeeecb52c5aa28c5cb9592fe5e39bf95] drm/gem: Use struct 
dma_buf_map in GEM vmap ops and convert GEM backends
git bisect bad 49a3f51dfeeecb52c5aa28c5cb9592fe5e39bf95
# bad: [d7e0798925ea9272f8c8e66ceb1f7c51823e50ab] dt-bindings: display: bridge: 
Intel KeemBay DSI
git bisect bad d7e0798925ea9272f8c8e66ceb1f7c51823e50ab
# bad: [c489573b5b6ce6442ad4658d9d5ec77839b91622] Merge drm/drm-next into 
drm-misc-next
git bisect bad c489573b5b6ce6442ad4658d9d5ec77839b91622
# bad: [8567d51555c12d169c4e0f796030051fff1c318d] drm/vmwgfx: switch to new 
allocator
git bisect bad 8567d51555c12d169c4e0f796030051fff1c318d
# good: [5144eead3f8c80ac7f913c07139442fede94003e] drm: xlnx: Use 
dma_request_chan for DMA channel request
git bisect good 5144eead3f8c80ac7f913c07139442fede94003e
# good: [e93b2da9799e5cb97760969f3e1f02a5bdac29fe] drm/amdgpu: switch to new 
allocator v2
git bisect good e93b2da9799e5cb97760969f3e1f02a5bdac29fe
# bad: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new 
allocator
git bisect bad 461619f5c3242aaee9ec3f0b7072719bd86ea207
# good: [0fe3cf3a53b5c1205ec7d321be1185b075dff205] drm/radeon: switch to new 
allocator v2
git bisect good 0fe3cf3a53b5c1205ec7d321be1185b075dff205
# first bad commit: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: 
switch to new allocator



Re: [patch] preempt: select PREEMPT_DYNAMIC under PREEMPTION instead of PREEMPT

2021-02-09 Thread Mike Galbraith
On Tue, 2021-02-09 at 17:19 +0100, Peter Zijlstra wrote:
> On Tue, Feb 09, 2021 at 05:13:15PM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 09, 2021 at 05:05:14PM +0100, Mike Galbraith wrote:
> >
> > > ld: init/main.o: in function `trace_initcall_start':
> > > /backup/usr/local/src/kernel/linux-tip-rt/./include/trace/events/initcall.h:27:
> > >  undefined reference to `__SCT__preempt_schedule_notrace'
> >
> > Ooohh... this is because x86 can't build PREEMPT without PREEMPT_DYNAMIC
> > anymore. Maybe I should fix that. Lemme see what that would take.
>
> Does this work?

Yup, worked fine.

-Mike



Re: [patch] preempt: select PREEMPT_DYNAMIC under PREEMPTION instead of PREEMPT

2021-02-09 Thread Mike Galbraith
On Tue, 2021-02-09 at 17:13 +0100, Peter Zijlstra wrote:
> On Tue, Feb 09, 2021 at 05:05:14PM +0100, Mike Galbraith wrote:
>
> > ld: init/main.o: in function `trace_initcall_start':
> > /backup/usr/local/src/kernel/linux-tip-rt/./include/trace/events/initcall.h:27:
> >  undefined reference to `__SCT__preempt_schedule_notrace'
>
> Ooohh... this is because x86 can't build PREEMPT without PREEMPT_DYNAMIC
> anymore. Maybe I should fix that. Lemme see what that would take.

Uhoh, looks like something a lot worse managed to sneak into master.
While test-driving that tip-rt build, I couldn't help noticing that
evolution is going mad, chewing cpu and being useless enough that I
couldn't reply to you.  Unfortunately, it ain't my tip-rt tree, nor is
it virgin tip, the evilness is present in virgin master, but not in
5.10 stable.  Hohum, a hunting I shall go I suppose.

-Mike



Re: [patch] preempt: select PREEMPT_DYNAMIC under PREEMPTION instead of PREEMPT

2021-02-09 Thread Mike Galbraith
On Tue, 2021-02-09 at 16:13 +0100, Peter Zijlstra wrote:
> On Tue, Feb 09, 2021 at 02:45:37PM +0100, Mike Galbraith wrote:
> >
> > PREEMPT_RT and PREEMPT both need PREEMPT_DYNAMIC to build, so move
> > selection of PREEMPT_DYNAMIC to the common denominator, PREEMPTION.
>
> I'm confused, why would you want PREEMPT_DYNAMIC for
> PREEMPT_RT ?
>
> PREEMPT_RT without full preemption is just plain daft, no?

If you turn on PREEMPT, PREEMPT_DYNAMIC is selected because
HAVE_PREEMPT_DYNAMIC is true.

homer:..kernel/linux-tip # grep PREEMPT .config
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DEBUG_PREEMPT=y
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

Now over to tip-rt, with patchlet...

homer:..kernel/linux-tip-rt # grep PREEMPT .config
CONFIG_HAVE_PREEMPT_LAZY=y
CONFIG_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_RT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DEBUG_PREEMPT=y
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

...tree builds/runs.  Pop patchlet, and...

  LD  vmlinux.o
  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN modules.builtin
  LD  .tmp_vmlinux.kallsyms1
ld: init/main.o: in function `trace_initcall_start':
/backup/usr/local/src/kernel/linux-tip-rt/./include/trace/events/initcall.h:27: 
undefined reference to `__SCT__preempt_schedule_notrace'
ld: init/main.o: in function `trace_initcall_finish':
/backup/usr/local/src/kernel/linux-tip-rt/./include/trace/events/initcall.h:48: 
undefined reference to `__SCT__preempt_schedule_notrace'
ld: init/main.o: in function `trace_initcall_level':
/backup/usr/local/src/kernel/linux-tip-rt/./include/trace/events/initcall.h:10: 
undefined reference to `__SCT__preempt_schedule_notrace'
ld: init/main.o:(.static_call_sites+0x4): undefined reference to 
`__SCK__preempt_schedule_notrace'
ld: init/main.o:(.static_call_sites+0x14): undefined reference to 
`__SCK__preempt_schedule_notrace'
ld: init/main.o:(.static_call_sites+0x24): undefined reference to 
`__SCK__preempt_schedule_notrace'
ld: arch/x86/entry/vsyscall/vsyscall_64.o: in function `trace_emulate_vsyscall':
/backup/usr/local/src/kernel/linux-tip-rt/arch/x86/entry/vsyscall/vsyscall_trace.h:10:
 undefined reference to `__SCT__preempt_schedule_notrace'
ld: arch/x86/entry/vsyscall/vsyscall_64.o:(.static_call_sites+0x4): undefined 
reference to `__SCK__preempt_schedule_notrace'
ld: arch/x86/events/amd/ibs.o: in function `ibs_eilvt_valid':
/backup/usr/local/src/kernel/linux-tip-rt/arch/x86/events/amd/ibs.c:865: 
undefined reference to `__SCT__preempt_schedule'
ld: arch/x86/events/amd/ibs.o: in function `force_ibs_eilvt_setup':
/backup/usr/local/src/kernel/linux-tip-rt/arch/x86/events/amd/ibs.c:923: 
undefined reference to `__SCT__preempt_schedule'
ld: /backup/usr/local/src/kernel/linux-tip-rt/arch/x86/events/amd/ibs.c:943: 
undefined reference to `__SCT__preempt_schedule'
ld: arch/x86/events/amd/ibs.o: in function `ibs_eilvt_valid':
/backup/usr/local/src/kernel/linux-tip-rt/arch/x86/events/amd/ibs.c:865: 
undefined reference to `__SCT__preempt_schedule'
...




[patch] preempt: select PREEMPT_DYNAMIC under PREEMPTION instead of PREEMPT

2021-02-09 Thread Mike Galbraith


PREEMPT_RT and PREEMPT both need PREEMPT_DYNAMIC to build, so move
selection of PREEMPT_DYNAMIC to the common denominator, PREEMPTION.

Signed-off-by: Mike Galbraith 
---
 kernel/Kconfig.preempt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -40,7 +40,6 @@ config PREEMPT
depends on !ARCH_NO_PREEMPT
select PREEMPTION
select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK
-   select PREEMPT_DYNAMIC if HAVE_PREEMPT_DYNAMIC
help
  This option reduces the latency of the kernel by making
  all kernel code (that is not executing in a critical section)
@@ -81,6 +80,7 @@ config PREEMPT_COUNT
 config PREEMPTION
bool
select PREEMPT_COUNT
+   select PREEMPT_DYNAMIC if HAVE_PREEMPT_DYNAMIC

 config PREEMPT_DYNAMIC
bool



Re: [PATCH] zsmalloc: do not use bit_spin_lock

2020-12-20 Thread Mike Galbraith
On Sun, 2020-12-20 at 21:20 +, Song Bao Hua (Barry Song) wrote:
>
> > -Original Message-
> > From: Mike Galbraith [mailto:efa...@gmx.de]
> > Sent: Sunday, December 20, 2020 8:48 PM
> > To: Vitaly Wool ; LKML
> > ; linux-mm 
> > Cc: Song Bao Hua (Barry Song) ; Sebastian 
> > Andrzej
> > Siewior ; Minchan Kim ; 
> > NitinGupta
> > 
> > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> >
> > On Sun, 2020-12-20 at 02:23 +0100, Mike Galbraith wrote:
> > > On Sun, 2020-12-20 at 02:22 +0200, Vitaly Wool wrote:
> > > > zsmalloc takes bit spinlock in its _map() callback and releases it
> > > > only in unmap() which is unsafe and leads to zswap complaining
> > > > about scheduling in atomic context.
> > > >
> > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > bit spinlock completely and use a bit flag instead.
> > >
> > > It also does get_cpu_var() in map(), put_cpu_var() in unmap().
> >
> > That aside, the bit spinlock removal seems to hold up to beating in RT.
> > I stripped out the RT changes to replace the bit spinlocks, applied the
> > still needed atm might_sleep() fix, and the ltp zram and zswap tests are
> > running in a loop with no signs that it's a bad idea, so I hope that
> > makes it in (minus the preempt disabled spin which I whacked), as it
> > makes zsmalloc markedly more RT friendly.
> >
> > RT changes go from:
> >  1 file changed, 79 insertions(+), 6 deletions(-)
> > to:
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
>
> Sorry, would you like to show the change for
> "8 insertions(+), 3 deletions(-)"?

Sure.
---
 mm/zsmalloc.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -57,6 +57,7 @@
#include <linux/wait.h>
#include <linux/pagemap.h>
#include <linux/fs.h>
+#include <linux/local_lock.h>

 #define ZSPAGE_MAGIC   0x58

@@ -293,6 +294,7 @@ struct zspage {
 };

 struct mapping_area {
+   local_lock_t lock;
char *vm_buf; /* copy buffer for objects that span pages */
char *vm_addr; /* address of kmap_atomic()'ed pages */
enum zs_mapmode vm_mm; /* mapping mode */
@@ -455,7 +457,9 @@ MODULE_ALIAS("zpool-zsmalloc");
 #endif /* CONFIG_ZPOOL */

/* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
-static DEFINE_PER_CPU(struct mapping_area, zs_map_area);
+static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
+   .lock   = INIT_LOCAL_LOCK(lock),
+};

 static bool is_zspage_isolated(struct zspage *zspage)
 {
@@ -1276,7 +1280,8 @@ void *zs_map_object(struct zs_pool *pool
class = pool->size_class[class_idx];
off = (class->size * obj_idx) & ~PAGE_MASK;

-   area = &get_cpu_var(zs_map_area);
+   local_lock(&zs_map_area.lock);
+   area = this_cpu_ptr(&zs_map_area);
area->vm_mm = mm;
if (off + class->size <= PAGE_SIZE) {
/* this object is contained entirely within a page */
@@ -1330,7 +1335,7 @@ void zs_unmap_object(struct zs_pool *poo

__zs_unmap_object(area, pages, off, class->size);
}
-   put_cpu_var(zs_map_area);
+   local_unlock(&zs_map_area.lock);

migrate_read_unlock(zspage);
unpin_tag(handle);


> BTW, your original patch looks not right as
> crypto_wait_req(crypto_acomp_decompress()...)
> can sleep too.
>
> [copy from your original patch with comment]
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1258,20 +1258,20 @@ static int zswap_frontswap_load(unsigned
>
>   /* decompress */
>   dlen = PAGE_SIZE;
> + acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> + mutex_lock(acomp_ctx->mutex);
>   src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
>   if (zpool_evictable(entry->pool->zpool))
>   src += sizeof(struct zswap_header);
>
> - acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> - mutex_lock(acomp_ctx->mutex);
>   sg_init_one(&input, src, entry->length);
>   sg_init_table(&output, 1);
>   sg_set_page(&output, page, PAGE_SIZE, 0);
>   acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
>
> /*
>  * here crypto could sleep
>  !!*/

Hohum, another one for my Bitmaster-9000 patch shredder.

-Mike




Re: [PATCH] zsmalloc: do not use bit_spin_lock

2020-12-19 Thread Mike Galbraith
On Sun, 2020-12-20 at 02:23 +0100, Mike Galbraith wrote:
> On Sun, 2020-12-20 at 02:22 +0200, Vitaly Wool wrote:
> > zsmalloc takes bit spinlock in its _map() callback and releases it
> > only in unmap() which is unsafe and leads to zswap complaining
> > about scheduling in atomic context.
> >
> > To fix that and to improve RT properties of zsmalloc, remove that
> > bit spinlock completely and use a bit flag instead.
>
> It also does get_cpu_var() in map(), put_cpu_var() in unmap().

That aside, the bit spinlock removal seems to hold up to beating in RT.
I stripped out the RT changes to replace the bit spinlocks, applied the
still needed atm might_sleep() fix, and the ltp zram and zswap tests are
running in a loop with no signs that it's a bad idea, so I hope that
makes it in (minus the preempt disabled spin which I whacked), as it
makes zsmalloc markedly more RT friendly.

RT changes go from:
 1 file changed, 79 insertions(+), 6 deletions(-)
to:
 1 file changed, 8 insertions(+), 3 deletions(-)

-Mike



Re: [PATCH] zsmalloc: do not use bit_spin_lock

2020-12-19 Thread Mike Galbraith
On Sun, 2020-12-20 at 02:23 +0100, Mike Galbraith wrote:
> On Sun, 2020-12-20 at 02:22 +0200, Vitaly Wool wrote:
> > zsmalloc takes bit spinlock in its _map() callback and releases it
> > only in unmap() which is unsafe and leads to zswap complaining
> > about scheduling in atomic context.
> >
> > To fix that and to improve RT properties of zsmalloc, remove that
> > bit spinlock completely and use a bit flag instead.
>
> It also does get_cpu_var() in map(), put_cpu_var() in unmap().

Bah, I forgot to mention the config dependent rwlock, it's held across
map()/unmap() as well, so there are two more hurdles, not one.

-Mike



Re: [PATCH] zsmalloc: do not use bit_spin_lock

2020-12-19 Thread Mike Galbraith
On Sun, 2020-12-20 at 02:22 +0200, Vitaly Wool wrote:
> zsmalloc takes bit spinlock in its _map() callback and releases it
> only in unmap() which is unsafe and leads to zswap complaining
> about scheduling in atomic context.
>
> To fix that and to improve RT properties of zsmalloc, remove that
> bit spinlock completely and use a bit flag instead.


> -static void pin_tag(unsigned long handle) __acquires(bitlock)
> +static void pin_tag(unsigned long handle)
>  {
> - bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
> + preempt_disable();
> + while(test_and_set_bit(HANDLE_PIN_BIT, (unsigned long *)handle))
> + cpu_relax();
> + preempt_enable();
>  }

If try doesn't need to disable preemption, neither does pin.
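
Concretely, that would leave pin_tag() looking something like the below
(untested sketch, same HANDLE_PIN_BIT semantics as in the patch above):

static void pin_tag(unsigned long handle)
{
	/* spin until the pin bit is ours; no preemption fiddling needed */
	while (test_and_set_bit(HANDLE_PIN_BIT, (unsigned long *)handle))
		cpu_relax();
}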

-Mike




Re: [PATCH] zsmalloc: do not use bit_spin_lock

2020-12-19 Thread Mike Galbraith
On Sun, 2020-12-20 at 02:22 +0200, Vitaly Wool wrote:
> zsmalloc takes bit spinlock in its _map() callback and releases it
> only in unmap() which is unsafe and leads to zswap complaining
> about scheduling in atomic context.
>
> To fix that and to improve RT properties of zsmalloc, remove that
> bit spinlock completely and use a bit flag instead.

It also does get_cpu_var() in map(), put_cpu_var() in unmap().
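
For reference, those two wrap a preempt_disable()/preempt_enable() pair,
simplified from include/linux/percpu-defs.h:

#define get_cpu_var(var)			\
(*({					\
	preempt_disable();		\
	this_cpu_ptr(&(var));		\
}))

#define put_cpu_var(var)			\
do {					\
	(void)&(var);			\
	preempt_enable();		\
} while (0)

so sleeping anywhere between map() and unmap() is off limits for that
reason as well.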

-Mike



Re: [patch] zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

2020-12-19 Thread Mike Galbraith
(CC zsmalloc maintainers)

On Sat, 2020-12-19 at 11:59 +0100, Mike Galbraith wrote:
> On Sat, 2020-12-19 at 11:46 +0100, Vitaly Wool wrote:
> > On Sat, 19 Dec 2020, 11:27 Mike Galbraith,  wrote:
> >
> > > The kernel that generated that splat was NOT an RT kernel, it was plain
> > > master.today with a PREEMPT config.
> >
> >
> > I see, thanks. I don't think it makes things better for zsmalloc
> > though. From what I can see, the offending code is this:
> >
> > >/* From now on, migration cannot move the object */
> > >pin_tag(handle);
> >
> > Bit spinlock is taken in pin_tag(). I find the comment above somewhat
> > misleading, why is it necessary to take a spinlock to prevent
> > migration? I would guess an atomic flag should normally be enough.
> >
> > zswap is not broken here, it is zsmalloc that needs to be fixed.
>
> Cool, those damn bit spinlocks going away would be a case of happiness
> for RT as well :)
>
>   -Mike



Re: [patch] zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

2020-12-19 Thread Mike Galbraith
On Sat, 2020-12-19 at 11:46 +0100, Vitaly Wool wrote:
> On Sat, 19 Dec 2020, 11:27 Mike Galbraith,  wrote:
>
> > The kernel that generated that splat was NOT an RT kernel, it was plain
> > master.today with a PREEMPT config.
>
>
> I see, thanks. I don't think it makes things better for zsmalloc
> though. From what I can see, the offending code is this:
>
> >/* From now on, migration cannot move the object */
> >pin_tag(handle);
>
> Bit spinlock is taken in pin_tag(). I find the comment above somewhat
> misleading, why is it necessary to take a spinlock to prevent
> migration? I would guess an atomic flag should normally be enough.
>
> zswap is not broken here, it is zsmalloc that needs to be fixed.

Cool, those damn bit spinlocks going away would be a case of happiness
for RT as well :)
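
(The pain point being that bit_spin_lock() is an open-coded spin built on
preempt_disable(), simplified from include/linux/bit_spinlock.h:

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	preempt_disable();
	while (test_and_set_bit_lock(bitnum, addr)) {
		preempt_enable();
		do {
			cpu_relax();
		} while (test_bit(bitnum, addr));
		preempt_disable();
	}
}

so everything between lock and unlock runs non-preemptible, which RT's
sleeping locks can't tolerate.)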

-Mike



Re: [patch] zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

2020-12-19 Thread Mike Galbraith
On Sat, 2020-12-19 at 11:20 +0100, Vitaly Wool wrote:
> Hi Mike,
>
> On Sat, Dec 19, 2020 at 11:12 AM Mike Galbraith  wrote:
> >
> > (mailer partially munged formatting? resend)
> >
> > mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() 
> > splat
> >
> > zsmalloc map/unmap methods use preemption disabling bit spinlocks.  Take the
> > mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
> > in zswap_frontswap_store().
>
> oh wait... So is zsmalloc taking a spin lock in its map callback and
> releasing it only in unmap? In this case, I would rather keep zswap as
> is, mark zsmalloc as RT unsafe and have zsmalloc maintainer fix it.

The kernel that generated that splat was NOT an RT kernel, it was plain
master.today with a PREEMPT config.

-Mike



Re: [patch] zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

2020-12-19 Thread Mike Galbraith
(mailer partially munged formatting? resend)

mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() 
splat

zsmalloc map/unmap methods use preemption disabling bit spinlocks.  Take the
mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
in zswap_frontswap_store().

Signed-off-by: Mike Galbraith 
Fixes: 1ec3b5fe6eec "mm/zswap: move to use crypto_acomp API for hardware 
acceleration"
---
 mm/zswap.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1258,20 +1258,20 @@ static int zswap_frontswap_load(unsigned

/* decompress */
dlen = PAGE_SIZE;
+   acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+   mutex_lock(acomp_ctx->mutex);
src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
if (zpool_evictable(entry->pool->zpool))
src += sizeof(struct zswap_header);

-   acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-   mutex_lock(acomp_ctx->mutex);
sg_init_one(&input, src, entry->length);
sg_init_table(&output, 1);
sg_set_page(&output, page, PAGE_SIZE, 0);
acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
-   mutex_unlock(acomp_ctx->mutex);

zpool_unmap_handle(entry->pool->zpool, entry->handle);
+   mutex_unlock(acomp_ctx->mutex);
BUG_ON(ret);

 freeentry:



[patch] zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

2020-12-19 Thread Mike Galbraith
Greetings,

Beating on zswap+zsmalloc leads to the splat below, possible patch
below that.

[  139.794413] BUG: sleeping function called from invalid context at 
kernel/locking/mutex.c:935
[  139.794585] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 907, 
name: avahi-daemon
[  139.794608] 2 locks held by avahi-daemon/907:
[  139.794621]  #0: 8881010766d8 (&mm->mmap_lock#2){}-{3:3}, at: 
exc_page_fault+0x15b/0x6e0
[  139.794659]  #1: 8881b6b13110 (&zspage->lock){.+.+}-{2:2}, at: 
zs_map_object+0x89/0x350
[  139.794691] Preemption disabled at:
[  139.794693] [] zs_map_object+0x38/0x350
[  139.794719] CPU: 0 PID: 907 Comm: avahi-daemon Kdump: loaded Tainted: G  
  E 5.10.0.g3644e2d-preempt #4
[  139.794746] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[  139.794766] Call Trace:
[  139.794775]  dump_stack+0x77/0x97
[  139.794788]  ___might_sleep+0x17d/0x260
[  139.794806]  __mutex_lock+0x47/0x9d0
[  139.794823]  ? zswap_frontswap_load+0xcb/0x240
[  139.794841]  ? zs_map_object+0x248/0x350
[  139.794858]  ? zswap_frontswap_load+0xcb/0x240
[  139.794871]  zswap_frontswap_load+0xcb/0x240
[  139.794886]  ? lru_cache_add+0xe9/0x1b0
[  139.794904]  __frontswap_load+0x6e/0xd0
[  139.794921]  swap_readpage+0x70/0x240
[  139.794939]  swap_cluster_readahead+0x1d9/0x280
[  139.794964]  ? swapin_readahead+0x140/0x3e0
[  139.794979]  swapin_readahead+0x140/0x3e0
[  139.794997]  ? lookup_swap_cache+0x5c/0x190
[  139.795016]  ? do_swap_page+0x2f7/0x900
[  139.795030]  do_swap_page+0x2f7/0x900
[  139.795046]  handle_mm_fault+0xa06/0x13b0
[  139.795083]  exc_page_fault+0x2a3/0x6e0
[  139.795110]  ? asm_exc_page_fault+0x1e/0x30
[  139.795139]  ? asm_exc_page_fault+0x8/0x30
[  139.795166]  asm_exc_page_fault+0x1e/0x30
[  139.795191] RIP: 0033:0x560caa4074d0
[  139.795215] Code: 89 05 e4 74 21 00 0f 84 28 06 00 00 e8 d9 13 00 00 e8 b4 
12 00 00 44 8b 25 b9 75 21 00 45 85 e4 0f 85 ac 04 00 00 41 83 cf ff <48> 8b 3d 
b1 74 21 00 44 89 fe e8 d1 f6 ff ff 85 c0 89 c3 0f 88 d9
[  139.795299] RSP: 002b:7ffeb4ec5ce0 EFLAGS: 00010246
[  139.795316] RAX:  RBX:  RCX: 7ff1616e4db4
[  139.795336] RDX: 0001 RSI: 7ffeb4ec5c5f RDI: 0006
[  139.795356] RBP: 560cab63be60 R08: 560cab656220 R09: 560cab670750
[  139.795376] R10:  R11: 0246 R12: 
[  139.795395] R13: 560cab63f550 R14: 560cab63c000 R15: 

(CC Sebastian because RT has the ~same problem at the same spot,
curable the same way, or by fixing the upstream buglet then take
backports+fix, and nuking the associated RT patch)

mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() 
splat

zsmalloc map/unmap methods use preemption disabling bit spinlocks.  Take the
mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
in zswap_frontswap_store().

Signed-off-by: Mike Galbraith 
Fixes: 1ec3b5fe6eec "mm/zswap: move to use crypto_acomp API for hardware 
acceleration"
---
 mm/zswap.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1258,20 +1258,20 @@ static int zswap_frontswap_load(unsigned

/* decompress */
dlen = PAGE_SIZE;
+   acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+   mutex_lock(acomp_ctx->mutex);
src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
if (zpool_evictable(entry->pool->zpool))
src += sizeof(struct zswap_header);

-   acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-   mutex_lock(acomp_ctx->mutex);
sg_init_one(&input, src, entry->length);
sg_init_table(&output, 1);
sg_set_page(&output, page, PAGE_SIZE, 0);
acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
-   mutex_unlock(acomp_ctx->mutex);

zpool_unmap_handle(entry->pool->zpool, entry->handle);
+   mutex_unlock(acomp_ctx->mutex);
BUG_ON(ret);

 freeentry:



Re: [bisected] Re: regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop

2020-12-17 Thread Mike Galbraith
On Fri, 2020-12-18 at 05:45 +1000, David Airlie wrote:

> Does the attached patch help?

Yup, that seems to have done the trick.  Fast bug squashing by the drm
guys today, two slowly bisected, two quickly squashed.

-Mike



Re: [bisected] Re: drm, qxl: post 5.11 merge warning+explosion

2020-12-17 Thread Mike Galbraith
On Thu, 2020-12-17 at 17:38 +0100, Christian König wrote:
>
> Mike can you test the attached patch?

Yup, one-liner made it all better. That was quick like bunny.

-Mike



Re: [bisected] Re: drm, qxl: post 5.11 merge warning+explosion

2020-12-17 Thread Mike Galbraith
On Thu, 2020-12-17 at 17:24 +0100, Christian König wrote:
> Hi Mike,
>
> what exactly is the warning from qxl you are seeing?

[1.815561] WARNING: CPU: 7 PID: 355 at drivers/gpu/drm/ttm/ttm_pool.c:365 
ttm_pool_alloc+0x41b/0x540 [ttm]
[1.815561] Modules linked in: ext4(E) crc16(E) mbcache(E) jbd2(E) 
ata_generic(E) ata_piix(E) virtio_console(E) virtio_rng(E) virtio_blk(E) qxl(E) 
drm_ttm_helper(E) ttm(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) 
sysimgblt(E) ahci(E) fb_sys_fops(E) cec(E) libahci(E) uhci_hcd(E) ehci_pci(E) 
rc_core(E) ehci_hcd(E) crc32c_intel(E) serio_raw(E) virtio_pci(E) 
virtio_ring(E) 8139cp(E) virtio(E) libata(E) drm(E) usbcore(E) mii(E) sg(E) 
dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) 
scsi_mod(E) autofs4(E)
[1.815589] CPU: 7 PID: 355 Comm: kworker/7:2 Tainted: GE 
5.10.0.g489e9fe-master #26
[1.815590] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[1.815614] Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
[1.815621] RIP: 0010:ttm_pool_alloc+0x41b/0x540 [ttm]
[1.815623] Code: fc ff ff 89 ea 48 8d 04 d5 00 00 00 00 48 29 d0 48 8d 3c 
c5 00 1c 40 a0 e9 d7 fc ff ff 85 c0 0f 89 2f fc ff ff e9 28 fc ff ff <0f> 0b e9 
35 fc ff ff 89 e9 49 8b 7d 00 b8 00 10 00 00 48 d3 e0 45
[1.815623] RSP: 0018:888105d3b818 EFLAGS: 00010246
[1.815625] RAX:  RBX: 888106978800 RCX: 
[1.815626] RDX: 888105d3bc68 RSI: 0001 RDI: 888106238820
[1.815626] RBP: 888106238758 R08: c9296000 R09: 816b
[1.815627] R10: 0001 R11: c9296000 R12: 
[1.815628] R13: 888106238820 R14:  R15: 888106978800
[1.815628] FS:  () GS:888237dc() 
knlGS:
[1.815632] CS:  0010 DS:  ES:  CR0: 80050033
[1.815633] CR2: 7eff52a0d5b8 CR3: 02010003 CR4: 001706e0
[1.815633] Call Trace:
[1.815644]  ttm_tt_populate+0xb1/0xc0 [ttm]
[1.815647]  ttm_bo_move_memcpy+0x4a5/0x500 [ttm]
[1.815652]  qxl_bo_move+0x230/0x2f0 [qxl]
[1.815655]  ttm_bo_handle_move_mem+0x79/0x140 [ttm]
[1.815657]  ttm_bo_evict+0x124/0x250 [ttm]
[1.815693]  ? drm_mm_insert_node_in_range+0x55c/0x580 [drm]
[1.815696]  ttm_mem_evict_first+0x110/0x3d0 [ttm]
[1.815698]  ttm_bo_mem_space+0x261/0x270 [ttm]
[1.815702]  ? qxl_ttm_debugfs_init+0xb0/0xb0 [qxl]
[1.815705]  ttm_bo_validate+0x117/0x150 [ttm]
[1.815756]  ttm_bo_init_reserved+0x2c8/0x3c0 [ttm]
[1.815772]  qxl_bo_create+0x134/0x1d0 [qxl]
[1.815775]  ? qxl_ttm_debugfs_init+0xb0/0xb0 [qxl]
[1.815791]  qxl_alloc_bo_reserved+0x2c/0x90 [qxl]
[1.815794]  qxl_image_alloc_objects+0xa3/0x120 [qxl]
[1.815797]  qxl_draw_dirty_fb+0x155/0x450 [qxl]
[1.815815]  ? _cond_resched+0x15/0x40
[1.815819]  ? ww_mutex_lock_interruptible+0x12/0x60
[1.815822]  qxl_framebuffer_surface_dirty+0x14f/0x1a0 [qxl]
[1.815841]  drm_fb_helper_dirty_work+0x11d/0x180 [drm_kms_helper]
[1.815853]  process_one_work+0x1f5/0x3c0
[1.815866]  ? process_one_work+0x3c0/0x3c0
[1.815867]  worker_thread+0x2d/0x3d0
[1.815868]  ? process_one_work+0x3c0/0x3c0
[1.815872]  kthread+0x117/0x130
[1.815876]  ? kthread_park+0x90/0x90
[1.815880]  ret_from_fork+0x1f/0x30
[1.815886] ---[ end trace 51e464c1e89a1728 ]---
[1.815894] BUG: kernel NULL pointer dereference, address: 0230
[1.815895] #PF: supervisor read access in kernel mode
[1.815895] #PF: error_code(0x) - not-present page
[1.815896] PGD 0 P4D 0
[1.815898] Oops:  [#1] SMP NOPTI
[1.815900] CPU: 7 PID: 355 Comm: kworker/7:2 Tainted: GW   E 
5.10.0.g489e9fe-master #26
[1.815901] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[1.815916] Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
[1.815921] RIP: 0010:dma_map_page_attrs+0xf/0x1c0
[1.815922] Code: 1f 17 5b 01 48 85 c0 75 e3 31 c0 c3 66 66 2e 0f 1f 84 00 
00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 41 55 41 54 55 53 48 83 ec 08 <48> 8b 87 
30 02 00 00 48 85 c0 48 0f 44 05 e7 16 5b 01 41 83 f8 02
[1.815923] RSP: 0018:888105d3b7e8 EFLAGS: 00010296
[1.815924] RAX: 1000 RBX: 0001 RCX: 1000
[1.815924] RDX:  RSI: ea0004171e40 RDI: 
[1.815925] RBP:  R08:  R09: 
[1.815925] R10: ea0004171e40 R11: c9296000 R12: 0001
[1.815926] R13: 888106238820 R14: 888105d07100 R15: 888106978800
[1.815926] FS:  () GS:888237dc() 
knlGS:
[1.815928] CS:  0010 DS:  ES:  CR0: 80050033
[

[bisected] Re: drm, qxl: post 5.11 merge warning+explosion

2020-12-17 Thread Mike Galbraith
ee5d2a8e549e90325fcc31825269f89647cd6fac is the first bad commit
commit ee5d2a8e549e90325fcc31825269f89647cd6fac
Author: Christian König 
Date:   Sat Oct 24 13:10:28 2020 +0200

drm/ttm: wire up the new pool as default one v2

Provide the necessary parameters by all drivers and use the new pool alloc
when no driver specific function is provided.

v2: fix the GEM VRAM helpers

Signed-off-by: Christian König 
Reviewed-by: Dave Airlie 
Reviewed-by: Madhav Chauhan 
Tested-by: Huang Rui 
Link: https://patchwork.freedesktop.org/patch/397081/?series=83051&rev=1

 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  4 ++--
 drivers/gpu/drm/drm_gem_vram_helper.c   |  4 ++--
 drivers/gpu/drm/nouveau/nouveau_ttm.c   | 14 +-
 drivers/gpu/drm/qxl/qxl_ttm.c   |  5 ++---
 drivers/gpu/drm/radeon/radeon_ttm.c |  4 ++--
 drivers/gpu/drm/ttm/ttm_bo.c|  8 ++--
 drivers/gpu/drm/ttm/ttm_memory.c|  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c|  5 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  5 +++--
 include/drm/ttm/ttm_bo_driver.h | 11 +++
 10 files changed, 36 insertions(+), 26 deletions(-)

git bisect start 'drivers/gpu/drm/qxl'
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [accefff5b547a9a1d959c7e76ad539bf2480e78b] Merge tag 
'arm-soc-omap-genpd-5.11' of 
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad accefff5b547a9a1d959c7e76ad539bf2480e78b
# bad: [d635a69dd4981cc51f90293f5f64268620ed1565] Merge tag 'net-next-5.11' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad d635a69dd4981cc51f90293f5f64268620ed1565
# bad: [0ca2ce81eb8ee30f3ba8ac7967fef9cfbb44dbdb] Merge tag 'arm64-upstream' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
git bisect bad 0ca2ce81eb8ee30f3ba8ac7967fef9cfbb44dbdb
# bad: [f8aab60422c371425365d386dfd51e0c6c5b1041] drm/amdgpu: Initialise 
drm_gem_object_funcs for imported BOs
git bisect bad f8aab60422c371425365d386dfd51e0c6c5b1041
# bad: [c0f98d2f8b076bf3e3183aa547395f919c943a14] Merge tag 
'drm-misc-next-2020-11-05' of git://anongit.freedesktop.org/drm/drm-misc into 
drm-next
git bisect bad c0f98d2f8b076bf3e3183aa547395f919c943a14
# good: [6a6e5988a2657cd0c91f6f1a3e7d194599248b6d] drm/ttm: replace last 
move_notify with delete_mem_notify
git bisect good 6a6e5988a2657cd0c91f6f1a3e7d194599248b6d
# good: [f566fdcd6cc49a9d5b5d782f56e3e7cb243f01b8] drm/i915: Force VT'd 
workarounds when running as a guest OS
git bisect good f566fdcd6cc49a9d5b5d782f56e3e7cb243f01b8
# good: [e76ab2cf21c38331155ea613cdf18582f011c30f] drm/i915: Remove 
per-platform IIR HPD masking
git bisect good e76ab2cf21c38331155ea613cdf18582f011c30f
# bad: [268af50f38b1f2199a2e85e38073d7a25c20190c] drm/panfrost: Support 
cache-coherent integrations
git bisect bad 268af50f38b1f2199a2e85e38073d7a25c20190c
# good: [e000650375b65ff77c5ee852b5086f58c741179e] fbdev/atafb: Remove unused 
extern variables
git bisect good e000650375b65ff77c5ee852b5086f58c741179e
# bad: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new 
allocator
git bisect bad 461619f5c3242aaee9ec3f0b7072719bd86ea207
# good: [d099fc8f540add80f725014fdd4f7f49f3c58911] drm/ttm: new TT backend 
allocation pool v3
git bisect good d099fc8f540add80f725014fdd4f7f49f3c58911
# bad: [e93b2da9799e5cb97760969f3e1f02a5bdac29fe] drm/amdgpu: switch to new 
allocator v2
git bisect bad e93b2da9799e5cb97760969f3e1f02a5bdac29fe
# bad: [ee5d2a8e549e90325fcc31825269f89647cd6fac] drm/ttm: wire up the new pool 
as default one v2
git bisect bad ee5d2a8e549e90325fcc31825269f89647cd6fac
# first bad commit: [ee5d2a8e549e90325fcc31825269f89647cd6fac] drm/ttm: wire up 
the new pool as default one v2



[bisected] Re: regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop

2020-12-17 Thread Mike Galbraith
On Wed, 2020-12-16 at 14:31 +0100, Mike Galbraith wrote:
> When the below new to 5.11 cycle badness happens, it's time to reboot.
>
> ...
> [   27.467260] NFSD: Using UMH upcall client tracking operations.
> [   27.467273] NFSD: starting 90-second grace period (net f0a0)
> [   27.965138] Bridge firewalling registered
> [   39.096604] fuse: init (API version 7.32)
> [  961.579832] nouveau :01:00.0: fifo: fault 01 [WRITE] at 
> 0069f000 engine 15 [CE0] client 01 [HUB/CE0] reason 02 [PTE] on 
> channel 1 [00ff73d000 DRM]
> [  961.579840] nouveau :01:00.0: fifo: channel 1: killed
> [  961.579844] nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
> [  961.579850] nouveau :01:00.0: fifo: runlist 4: scheduled for recovery
> [  961.579853] nouveau :01:00.0: fifo: engine 4: scheduled for recovery
>
> Box is aging generic i4790 desktop box with...
> 01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] 
> (rev a1)

Bisection was straightforward.  A post-bisect test revert was equally
straightforward, and seems to confirm the fingered commit.

0c8c0659d7475b6304b67374caf15b56cf0be4f9 is the first bad commit
commit 0c8c0659d7475b6304b67374caf15b56cf0be4f9
Author: Dave Airlie 
Date:   Thu Oct 29 13:59:20 2020 +1000

drm/nouveau/ttm: use multihop

This removes the code to move resources directly between
SYSTEM and VRAM in favour of using the core ttm mulithop code.

Signed-off-by: Dave Airlie 
Acked-by: Daniel Vetter 
Reviewed-by: Christian König 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20201109005432.861936-4-airl...@gmail.com

 drivers/gpu/drm/nouveau/nouveau_bo.c | 112 ---
 1 file changed, 13 insertions(+), 99 deletions(-)

git bisect start 'drivers/gpu'
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [accefff5b547a9a1d959c7e76ad539bf2480e78b] Merge tag 
'arm-soc-omap-genpd-5.11' of 
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad accefff5b547a9a1d959c7e76ad539bf2480e78b
# bad: [d635a69dd4981cc51f90293f5f64268620ed1565] Merge tag 'net-next-5.11' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad d635a69dd4981cc51f90293f5f64268620ed1565
# bad: [0ca2ce81eb8ee30f3ba8ac7967fef9cfbb44dbdb] Merge tag 'arm64-upstream' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
git bisect bad 0ca2ce81eb8ee30f3ba8ac7967fef9cfbb44dbdb
# good: [f8aab60422c371425365d386dfd51e0c6c5b1041] drm/amdgpu: Initialise 
drm_gem_object_funcs for imported BOs
git bisect good f8aab60422c371425365d386dfd51e0c6c5b1041
# bad: [fab0fca1da5cdc48be051715cd9787df04fdce3a] Merge tag 'media/v5.11-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect bad fab0fca1da5cdc48be051715cd9787df04fdce3a
# bad: [bcc68bd8161261ceeb1a4ab02b5265758944f90d] Merge tag 
'auxdisplay-for-linus-v5.11' of git://github.com/ojeda/linux
git bisect bad bcc68bd8161261ceeb1a4ab02b5265758944f90d
# bad: [22f8c80566c4a29a0d8b5ebf24aa1fd1679b39e5] Merge tag 
'drm-misc-next-2020-11-18' of ssh://git.freedesktop.org/git/drm/drm-misc into 
drm-next
git bisect bad 22f8c80566c4a29a0d8b5ebf24aa1fd1679b39e5
# bad: [a1ac250a82a5e97db71f14101ff7468291a6aaef] fbcon: Avoid using 
FNTCHARCNT() and hard-coded built-in font charcount
git bisect bad a1ac250a82a5e97db71f14101ff7468291a6aaef
# good: [a39855076c859b7f6c58ed4da8f195a2a6cd3c7b] drm/cma-helper: Make default 
object functions the default
git bisect good a39855076c859b7f6c58ed4da8f195a2a6cd3c7b
# bad: [5f1f10998e7f0ba98a8efc27009cd9a11cff6616] 
drm/atmel-hlcdc/atmel_hlcdc_plane: Staticise local function 
'atmel_hlcdc_plane_setup_scaler()'
git bisect bad 5f1f10998e7f0ba98a8efc27009cd9a11cff6616
# good: [55c8bcaeccaa5c6d9e7a432ebd0a1717f488a3f4] drm: mxsfb: Implement 
.format_mod_supported
git bisect good 55c8bcaeccaa5c6d9e7a432ebd0a1717f488a3f4
# bad: [0c8c0659d7475b6304b67374caf15b56cf0be4f9] drm/nouveau/ttm: use multihop
git bisect bad 0c8c0659d7475b6304b67374caf15b56cf0be4f9
# good: [23d6ab1d4c503660632e7b18cbb571d62d9bf792] drm: remove 
pgprot_decrypted() before calls to io_remap_pfn_range()
git bisect good 23d6ab1d4c503660632e7b18cbb571d62d9bf792
# good: [ebdf565169af006ee3be8c40eecbfc77d28a3b84] drm/ttm: add multihop 
infrastrucutre (v3)
git bisect good ebdf565169af006ee3be8c40eecbfc77d28a3b84
# good: [f5a89a5cae812a39993be32e74c8ed7856b1e2b2] drm/amdgpu/ttm: use multihop
git bisect good f5a89a5cae812a39993be32e74c8ed7856b1e2b2
# first bad commit: [0c8c0659d7475b6304b67374caf15b56cf0be4f9] drm/nouveau/ttm: 
use multihop



Re: regression: 9a56493f6942 "uts: Use generic ns_common::count" broke makedumpfile 1.6.7

2020-12-16 Thread Mike Galbraith
On Wed, 2020-12-16 at 18:20 +0300, Kirill Tkhai wrote:
>
> We may consider a patch like the below till updated makedumpfile is not 
> published:

It's nice to be considerate to the tool folks, but that can leave a
wake of cruft (thinks printk ringbuffer.. ew), so it's probably best to
just let it break.  The why and wherefore is documented in this thread,
that's enough for anybody who needs a quick fix to get on with frobbing
their dang buggy bleeding edge kernel ;-)

> diff --git a/include/linux/utsname.h b/include/linux/utsname.h
> index 2b1737c9b244..d6bcad448f52 100644
> --- a/include/linux/utsname.h
> +++ b/include/linux/utsname.h
> @@ -8,6 +8,7 @@
>  #include <linux/nsproxy.h>
>  #include <linux/ns_common.h>
>  #include <linux/err.h>
> +#include <linux/kref.h>
>
>  enum uts_proc {
>   UTS_PROC_OSTYPE,
> @@ -21,6 +22,7 @@ struct user_namespace;
>  extern struct user_namespace init_user_ns;
>
>  struct uts_namespace {
> + struct kref unused;
>   struct new_utsname name;
>   struct user_namespace *user_ns;
>   struct ucounts *ucounts;



Re: regression: 9a56493f6942 "uts: Use generic ns_common::count" broke makedumpfile 1.6.7

2020-12-16 Thread Mike Galbraith
On Wed, 2020-12-16 at 15:31 +0100, Mike Galbraith wrote:
> On Wed, 2020-12-16 at 17:23 +0300, Kirill Tkhai wrote:
> >
> > Does this regression only cause that one error message "check_release: 
> > Can't get the kernel version"
> > is printed instead of another: "The kernel version is not supported."?
>
> The thing does indeed mutter about the kernel version, with or without
> 9a56493f6942 reverted, but it works despite the muttering with
> 9a56493f6942 reverted.

makedumpfile 1.6.7 source only claims to work up to linux-5.4.8, but
actually does work all the way up until 9a56493f6942.

If the answer here is that v1,6,7 has reached EOL, that's fine, I'll
just carry a revert until I can cobble together an updated package.

-Mike



Re: regression: 9a56493f6942 "uts: Use generic ns_common::count" broke makedumpfile 1.6.7

2020-12-16 Thread Mike Galbraith
On Wed, 2020-12-16 at 17:23 +0300, Kirill Tkhai wrote:
>
> Does this regression only cause that one error message "check_release: Can't 
> get the kernel version"
> is printed instead of another: "The kernel version is not supported."?

The thing does indeed mutter about the kernel version, with or without
9a56493f6942 reverted, but it works despite the muttering with
9a56493f6942 reverted.

-Mike



Re: drm, qxl: post 5.11 merge warning+explosion

2020-12-16 Thread Mike Galbraith


On Wed, 2020-12-16 at 03:41 +0100, Mike Galbraith wrote:
> Little kvm works fine with nomodeset, so will be busy for a while
> bisecting two other problems that popped up here this cycle. (hohum)
>
> [1.815561] WARNING: CPU: 7 PID: 355 at drivers/gpu/drm/ttm/ttm_pool.c:365 
> ttm_pool_alloc+0x41b/0x540 [ttm]
> [1.815561] Modules linked in: ext4(E) crc16(E) mbcache(E) jbd2(E) 
> ata_generic(E) ata_piix(E) virtio_console(E) virtio_rng(E) virtio_blk(E) 
> qxl(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) syscopyarea(E) 
> sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) cec(E) libahci(E) 
> uhci_hcd(E) ehci_pci(E) rc_core(E) ehci_hcd(E) crc32c_intel(E) serio_raw(E) 
> virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) libata(E) drm(E) usbcore(E) 
> mii(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) 
> scsi_dh_alua(E) scsi_mod(E) autofs4(E)
> [1.815589] CPU: 7 PID: 355 Comm: kworker/7:2 Tainted: GE 
> 5.10.0.g489e9fe-master #26
> [1.815590] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> [1.815614] Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
> [1.815621] RIP: 0010:ttm_pool_alloc+0x41b/0x540 [ttm]
> [1.815623] Code: fc ff ff 89 ea 48 8d 04 d5 00 00 00 00 48 29 d0 48 8d 3c 
> c5 00 1c 40 a0 e9 d7 fc ff ff 85 c0 0f 89 2f fc ff ff e9 28 fc ff ff <0f> 0b 
> e9 35 fc ff ff 89 e9 49 8b 7d 00 b8 00 10 00 00 48 d3 e0 45
> [1.815623] RSP: 0018:888105d3b818 EFLAGS: 00010246
> [1.815625] RAX:  RBX: 888106978800 RCX: 
> 
> [1.815626] RDX: 888105d3bc68 RSI: 0001 RDI: 
> 888106238820
> [1.815626] RBP: 888106238758 R08: c9296000 R09: 
> 816b
> [1.815627] R10: 0001 R11: c9296000 R12: 
> 
> [1.815628] R13: 888106238820 R14:  R15: 
> 888106978800
> [1.815628] FS:  () GS:888237dc() 
> knlGS:
> [1.815632] CS:  0010 DS:  ES:  CR0: 80050033
> [1.815633] CR2: 7eff52a0d5b8 CR3: 02010003 CR4: 
> 001706e0
> [1.815633] Call Trace:
> [1.815644]  ttm_tt_populate+0xb1/0xc0 [ttm]
> [1.815647]  ttm_bo_move_memcpy+0x4a5/0x500 [ttm]
> [1.815652]  qxl_bo_move+0x230/0x2f0 [qxl]
> [1.815655]  ttm_bo_handle_move_mem+0x79/0x140 [ttm]
> [1.815657]  ttm_bo_evict+0x124/0x250 [ttm]
> [1.815693]  ? drm_mm_insert_node_in_range+0x55c/0x580 [drm]
> [1.815696]  ttm_mem_evict_first+0x110/0x3d0 [ttm]
> [1.815698]  ttm_bo_mem_space+0x261/0x270 [ttm]
> [1.815702]  ? qxl_ttm_debugfs_init+0xb0/0xb0 [qxl]
> [1.815705]  ttm_bo_validate+0x117/0x150 [ttm]
> [1.815756]  ttm_bo_init_reserved+0x2c8/0x3c0 [ttm]
> [1.815772]  qxl_bo_create+0x134/0x1d0 [qxl]
> [1.815775]  ? qxl_ttm_debugfs_init+0xb0/0xb0 [qxl]
> [1.815791]  qxl_alloc_bo_reserved+0x2c/0x90 [qxl]
> [1.815794]  qxl_image_alloc_objects+0xa3/0x120 [qxl]
> [1.815797]  qxl_draw_dirty_fb+0x155/0x450 [qxl]
> [1.815815]  ? _cond_resched+0x15/0x40
> [1.815819]  ? ww_mutex_lock_interruptible+0x12/0x60
> [1.815822]  qxl_framebuffer_surface_dirty+0x14f/0x1a0 [qxl]
> [1.815841]  drm_fb_helper_dirty_work+0x11d/0x180 [drm_kms_helper]
> [1.815853]  process_one_work+0x1f5/0x3c0
> [1.815866]  ? process_one_work+0x3c0/0x3c0
> [1.815867]  worker_thread+0x2d/0x3d0
> [1.815868]  ? process_one_work+0x3c0/0x3c0
> [1.815872]  kthread+0x117/0x130
> [1.815876]  ? kthread_park+0x90/0x90
> [1.815880]  ret_from_fork+0x1f/0x30
> [1.815886] ---[ end trace 51e464c1e89a1728 ]---
> [1.815894] BUG: kernel NULL pointer dereference, address: 0230
> [1.815895] #PF: supervisor read access in kernel mode
> [1.815895] #PF: error_code(0x) - not-present page
> [1.815896] PGD 0 P4D 0
> [1.815898] Oops:  [#1] SMP NOPTI
> [1.815900] CPU: 7 PID: 355 Comm: kworker/7:2 Tainted: GW   E 
> 5.10.0.g489e9fe-master #26
> [1.815901] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> [1.815916] Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
> [1.815921] RIP: 0010:dma_map_page_attrs+0xf/0x1c0
> [1.815922] Code: 1f 17 5b 01 48 85 c0 75 e3 31 c0 c3 66 66 2e 0f 1f 84 00 
> 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 41 55 41 54 55 53 48 83 ec 08 <48> 8b 
> 87 30 02 00 00 48 85 c0 48 0f 44 05 e7 16 5b 01 41 83 f8 02
> [1.815923] RSP: 0018:888105d3b7e8 EFLAGS: 00010296
> [1.815924] RAX: 1000 RBX: 0001 

Re: regression: 9a56493f6942 "uts: Use generic ns_common::count" broke makedumpfile 1.6.7

2020-12-16 Thread Mike Galbraith
On Wed, 2020-12-16 at 15:35 +0300, Kirill Tkhai wrote:
> Hi, Alexander,
>
> On 16.12.2020 14:02, Mike Galbraith wrote:
> > Greetings,
> >
> > With this commit, bisected and confirmed, kdump stops working here,
> > makedumpfile saying "check_release: Can't get the kernel version".
>
> hasn't your commit 55d9e11398a4 "kdump: append uts_namespace.name offset to 
> VMCOREINFO"
> fixed this issue?

FWIW, I applied the below, but it didn't help.

---
 kernel/crash_core.c |1 +
 1 file changed, 1 insertion(+)

--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -447,6 +447,7 @@ static int __init crash_save_vmcoreinfo_
VMCOREINFO_PAGESIZE(PAGE_SIZE);

VMCOREINFO_SYMBOL(init_uts_ns);
+   VMCOREINFO_OFFSET(uts_namespace, name);
VMCOREINFO_SYMBOL(node_online_map);
 #ifdef CONFIG_MMU
VMCOREINFO_SYMBOL_ARRAY(swapper_pg_dir);



regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop

2020-12-16 Thread Mike Galbraith
When the below new to 5.11 cycle badness happens, it's time to reboot.

...
[   27.467260] NFSD: Using UMH upcall client tracking operations.
[   27.467273] NFSD: starting 90-second grace period (net f0a0)
[   27.965138] Bridge firewalling registered
[   39.096604] fuse: init (API version 7.32)
[  961.579832] nouveau :01:00.0: fifo: fault 01 [WRITE] at 0069f000 
engine 15 [CE0] client 01 [HUB/CE0] reason 02 [PTE] on channel 1 [00ff73d000 
DRM]
[  961.579840] nouveau :01:00.0: fifo: channel 1: killed
[  961.579844] nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
[  961.579850] nouveau :01:00.0: fifo: runlist 4: scheduled for recovery
[  961.579853] nouveau :01:00.0: fifo: engine 4: scheduled for recovery

Box is aging generic i4790 desktop box with...
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] 
(rev a1)

-Mike



regression: 9a56493f6942 "uts: Use generic ns_common::count" broke makedumpfile 1.6.7

2020-12-16 Thread Mike Galbraith
Greetings,

With this commit, bisected and confirmed, kdump stops working here,
makedumpfile saying "check_release: Can't get the kernel version".

-Mike



drm, qxl: post 5.11 merge warning+explosion

2020-12-15 Thread Mike Galbraith
Little kvm works fine with nomodeset, so will be busy for a while
bisecting two other problems that popped up here this cycle. (hohum)

[1.815561] WARNING: CPU: 7 PID: 355 at drivers/gpu/drm/ttm/ttm_pool.c:365 
ttm_pool_alloc+0x41b/0x540 [ttm]
[1.815561] Modules linked in: ext4(E) crc16(E) mbcache(E) jbd2(E) 
ata_generic(E) ata_piix(E) virtio_console(E) virtio_rng(E) virtio_blk(E) qxl(E) 
drm_ttm_helper(E) ttm(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) 
sysimgblt(E) ahci(E) fb_sys_fops(E) cec(E) libahci(E) uhci_hcd(E) ehci_pci(E) 
rc_core(E) ehci_hcd(E) crc32c_intel(E) serio_raw(E) virtio_pci(E) 
virtio_ring(E) 8139cp(E) virtio(E) libata(E) drm(E) usbcore(E) mii(E) sg(E) 
dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) 
scsi_mod(E) autofs4(E)
[1.815589] CPU: 7 PID: 355 Comm: kworker/7:2 Tainted: GE 
5.10.0.g489e9fe-master #26
[1.815590] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[1.815614] Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
[1.815621] RIP: 0010:ttm_pool_alloc+0x41b/0x540 [ttm]
[1.815623] Code: fc ff ff 89 ea 48 8d 04 d5 00 00 00 00 48 29 d0 48 8d 3c 
c5 00 1c 40 a0 e9 d7 fc ff ff 85 c0 0f 89 2f fc ff ff e9 28 fc ff ff <0f> 0b e9 
35 fc ff ff 89 e9 49 8b 7d 00 b8 00 10 00 00 48 d3 e0 45
[1.815623] RSP: 0018:888105d3b818 EFLAGS: 00010246
[1.815625] RAX:  RBX: 888106978800 RCX: 
[1.815626] RDX: 888105d3bc68 RSI: 0001 RDI: 888106238820
[1.815626] RBP: 888106238758 R08: c9296000 R09: 816b
[1.815627] R10: 0001 R11: c9296000 R12: 
[1.815628] R13: 888106238820 R14:  R15: 888106978800
[1.815628] FS:  () GS:888237dc() 
knlGS:
[1.815632] CS:  0010 DS:  ES:  CR0: 80050033
[1.815633] CR2: 7eff52a0d5b8 CR3: 02010003 CR4: 001706e0
[1.815633] Call Trace:
[1.815644]  ttm_tt_populate+0xb1/0xc0 [ttm]
[1.815647]  ttm_bo_move_memcpy+0x4a5/0x500 [ttm]
[1.815652]  qxl_bo_move+0x230/0x2f0 [qxl]
[1.815655]  ttm_bo_handle_move_mem+0x79/0x140 [ttm]
[1.815657]  ttm_bo_evict+0x124/0x250 [ttm]
[1.815693]  ? drm_mm_insert_node_in_range+0x55c/0x580 [drm]
[1.815696]  ttm_mem_evict_first+0x110/0x3d0 [ttm]
[1.815698]  ttm_bo_mem_space+0x261/0x270 [ttm]
[1.815702]  ? qxl_ttm_debugfs_init+0xb0/0xb0 [qxl]
[1.815705]  ttm_bo_validate+0x117/0x150 [ttm]
[1.815756]  ttm_bo_init_reserved+0x2c8/0x3c0 [ttm]
[1.815772]  qxl_bo_create+0x134/0x1d0 [qxl]
[1.815775]  ? qxl_ttm_debugfs_init+0xb0/0xb0 [qxl]
[1.815791]  qxl_alloc_bo_reserved+0x2c/0x90 [qxl]
[1.815794]  qxl_image_alloc_objects+0xa3/0x120 [qxl]
[1.815797]  qxl_draw_dirty_fb+0x155/0x450 [qxl]
[1.815815]  ? _cond_resched+0x15/0x40
[1.815819]  ? ww_mutex_lock_interruptible+0x12/0x60
[1.815822]  qxl_framebuffer_surface_dirty+0x14f/0x1a0 [qxl]
[1.815841]  drm_fb_helper_dirty_work+0x11d/0x180 [drm_kms_helper]
[1.815853]  process_one_work+0x1f5/0x3c0
[1.815866]  ? process_one_work+0x3c0/0x3c0
[1.815867]  worker_thread+0x2d/0x3d0
[1.815868]  ? process_one_work+0x3c0/0x3c0
[1.815872]  kthread+0x117/0x130
[1.815876]  ? kthread_park+0x90/0x90
[1.815880]  ret_from_fork+0x1f/0x30
[1.815886] ---[ end trace 51e464c1e89a1728 ]---
[1.815894] BUG: kernel NULL pointer dereference, address: 0230
[1.815895] #PF: supervisor read access in kernel mode
[1.815895] #PF: error_code(0x) - not-present page
[1.815896] PGD 0 P4D 0
[1.815898] Oops:  [#1] SMP NOPTI
[1.815900] CPU: 7 PID: 355 Comm: kworker/7:2 Tainted: GW   E 
5.10.0.g489e9fe-master #26
[1.815901] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[1.815916] Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
[1.815921] RIP: 0010:dma_map_page_attrs+0xf/0x1c0
[1.815922] Code: 1f 17 5b 01 48 85 c0 75 e3 31 c0 c3 66 66 2e 0f 1f 84 00 
00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 41 55 41 54 55 53 48 83 ec 08 <48> 8b 87 
30 02 00 00 48 85 c0 48 0f 44 05 e7 16 5b 01 41 83 f8 02
[1.815923] RSP: 0018:888105d3b7e8 EFLAGS: 00010296
[1.815924] RAX: 1000 RBX: 0001 RCX: 1000
[1.815924] RDX:  RSI: ea0004171e40 RDI: 
[1.815925] RBP:  R08:  R09: 
[1.815925] R10: ea0004171e40 R11: c9296000 R12: 0001
[1.815926] R13: 888106238820 R14: 888105d07100 R15: 888106978800
[1.815926] FS:  () GS:888237dc() 
knlGS:
[1.815928] CS:  0010 DS:  ES:  CR0: 

Re: sched: exporting linux-rt migrate_disable for ZFS

2020-12-14 Thread Mike Galbraith
On Mon, 2020-12-14 at 13:33 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-08 23:58:27 [+], Orivej Desh wrote:
> > Greetings!
> Hi,
>
> > With sched-Add-migrate_disable.patch first released in v5.9-rc8-rt14 [1]
> > linux-rt defines functions migrate_disable and migrate_enable.
> > They are used in linux headers in various macros and static inline
> > functions, and in particular in kmap_atomic and kunmap_atomic.
> > The latter are needed by ZFS which currently fails to build against
> > 5.9-rt [2] because these functions are exported with EXPORT_SYMBOL_GPL.
> > Could you export them with EXPORT_SYMBOL instead?
>
> This is out of my jurisdiction, so just a few notes:
> - v5.9 is out of maintenance. Be careful.
> - We don't export symbols for out-of-tree modules.

Hm, dejavu all over again.  I recall this issue long ago having been
resolved in favor of exporting via plain EXPORT_SYMBOL.

https://lore.kernel.org/linux-rt-users/1321359802.4181.1.camel@frodo/

From 859a31c5ec958326dd046f4e41f6fa0db0ce98c3 Mon Sep 17 00:00:00 2001
From: Steven Rostedt 
Date: Mon, 16 Apr 2012 21:51:54 -0400
Subject: rt: Make migrate_disable/enable() and __rt_mutex_init non-GPL only

Modules that load on the normal vanilla kernel should also load on
an -rt kernel as well. This does not mean we condone non-GPL modules,
we are only being consistent.

Signed-off-by: Steven Rostedt 
---
 kernel/rtmutex.c | 2 +-
 kernel/sched.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index a02fbd9f9583..d58db993f5cd 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1282,7 +1282,7 @@ void __rt_mutex_init(struct rt_mutex *lock, const char 
*name)

debug_rt_mutex_init(lock, name);
 }
-EXPORT_SYMBOL_GPL(__rt_mutex_init);
+EXPORT_SYMBOL(__rt_mutex_init);

 /**
  * rt_mutex_init_proxy_locked - initialize and lock a rt_mutex on behalf of a
diff --git a/kernel/sched.c b/kernel/sched.c
index df3048139d1b..ed3433573641 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4255,7 +4255,7 @@ void migrate_disable(void)
p->migrate_disable = 1;
preempt_enable();
 }
-EXPORT_SYMBOL_GPL(migrate_disable);
+EXPORT_SYMBOL(migrate_disable);

 void migrate_enable(void)
 {
@@ -4307,7 +4307,7 @@ void migrate_enable(void)
unpin_current_cpu();
preempt_enable();
 }
-EXPORT_SYMBOL_GPL(migrate_enable);
+EXPORT_SYMBOL(migrate_enable);
 #else
 static inline void update_migrate_disable(struct task_struct *p) { }
 #define migrate_disabled_updated(p)0
--
2.29.2
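For context, the reason a non-GPL module hits these exports at all: on RT,
kmap_atomic()/kunmap_atomic() inline migrate_disable()/migrate_enable() into
every caller. A hedged sketch of the shape for the 64-bit !HIGHMEM case,
paraphrased rather than copied from the RT header:

static inline void *kmap_atomic(struct page *page)
{
	migrate_disable();	/* with EXPORT_SYMBOL_GPL, a non-GPL
				 * module fails to link against -rt */
	pagefault_disable();
	return page_address(page);	/* no highmem mapping needed */
}

static inline void __kunmap_atomic(void *addr)
{
	pagefault_enable();
	migrate_enable();
}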




Re: [RT] 5.9-rt14 softirq_ctrl.lock vs listening_hash[i].lock lockdep splat

2020-12-09 Thread Mike Galbraith
On Wed, 2020-12-09 at 11:05 +0100, Peter Zijlstra wrote:
>
> > [   47.844585]  Possible unsafe locking scenario:
> >
> > [   47.844586]        CPU0                    CPU1
> > [   47.844586]        ----                    ----
> > [   47.844587]   lock(&h->listening_hash[i].lock);
> > [   47.844588]                                lock((softirq_ctrl.lock).lock);
> > [   47.844588]                                lock(&h->listening_hash[i].lock);
> > [   47.844589]   lock((softirq_ctrl.lock).lock);
> > [   47.844590]
> > [   47.844590]  *** DEADLOCK ***
> >
> >
> So I've been looking at these local_lock vs lockdep splats for a bit,
> and unlike the IRQ inversions as reported here:
>
>   
> https://lore.kernel.org/linux-usb/20201029174348.omqiwjqy64teb...@linutronix.de/
>
> I think the above is an actual real problem (for RT).
>
> AFAICT the above translates to:
>
>   inet_listen()
> lock_sock()
>   spin_lock_bh(&sk->sk_lock.slock);
>   acquire(softirq_ctrl);
>   acquire(&sk->sk_lock.slock);
>
> inet_csk_listen_start()
>   sk->sk_prot->hash() := inet_hash()
>   local_bh_disable()
>   __inet_hash()
> spin_lock(&ilb->lock);
>   acquire(&ilb->lock);
>
>   
>
>   tcp4_seq_next()
> listening_get_next()
>   spin_lock(&ilb->lock);
>   acquire(&ilb->lock);
>
>   tcp4_seq_show()
> get_tcp4_sock()
>   sock_i_ino()
>   read_lock_bh(&sk->sk_callback_lock);
> acquire(softirq_ctrl) // < whoops
> acquire(&sk->sk_callback_lock)
>
>
> Which you can run in two tasks on the same CPU (and thus get the same
> softirq_ctrl local_lock), and deadlock.
>
> By holding softirq_ctrl we serialize against softirq-context
> (in-softirq) but that isn't relevant here, since neither context is
> that.
>
> On !RT there isn't a problem because softirq_ctrl isn't an actual lock,
> but the moment that turns into a real lock (like on RT) you're up a
> creek.
>
> In general we have the rule that as long as a lock is only ever used
> from task context (like the above ilb->lock, afaict) then it doesn't
> matter if you also take it with (soft)irqs disabled or not. But this
> softirq scheme breaks that. If you ever take a lock with BH disabled,
> you must now always take it with BH disabled, otherwise you risk
> deadlocks against the softirq_ctrl lock.
>
> Or am I missing something obvious (again) ?

Sebastian fixed this via...

From 0fe43be6c32e05d0dd692069d41a40c5453a2195 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior 
Date: Mon, 12 Oct 2020 17:33:54 +0200
Subject: tcp: Remove superfluous BH-disable around listening_hash

Commit
   9652dc2eb9e40 ("tcp: relax listening_hash operations")

removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.

Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock, invoke inet_ehash_nolisten() with disabled BH.
inet_unhash() conditionally acquires listening_hash->lock.

Reported-by: Mike Galbraith 
Signed-off-by: Sebastian Andrzej Siewior 
Link: 
https://lore.kernel.org/linux-rt-users/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.ca...@gmx.de/
Signed-off-by: Mike Galbraith 
---
 net/ipv4/inet_hashtables.c  | 19 ---
 net/ipv6/inet6_hashtables.c |  5 +
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 45fb450b4522..5fb95030e7c0 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -635,7 +635,9 @@ int __inet_hash(struct sock *sk, struct sock *osk)
int err = 0;

if (sk->sk_state != TCP_LISTEN) {
+   local_bh_disable();
inet_ehash_nolisten(sk, osk, NULL);
+   local_bh_enable();
return 0;
}
WARN_ON(!sk_unhashed(sk));
@@ -667,11 +669,8 @@ int inet_hash(struct sock *sk)
 {
int err = 0;

-   if (sk->sk_state != TCP_CLOSE) {
-   local_bh_disable();
+   if (sk->sk_state != TCP_CLOSE)
err = __inet_hash(sk, NULL);
-   local_bh_enable();
-   }

return err;
 }
@@ -682,17 +681,20 @@ void inet_unhash(struct sock *sk)
struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
struct inet_listen_hashbucket *ilb = NULL;
spinlock_t *lock;
+   bool state_listen;

if (sk_unhashed(sk))
return;

if (sk->sk_state == TCP_LISTEN) {
+   state_listen = true;
ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)];
-   lock = &ilb->lock;
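An aside to make the rule Peter states above concrete; this is a hypothetical
minimal example, not code from the thread. On RT, local_bh_disable() acquires
the per-CPU softirq_ctrl.lock, so:

static DEFINE_SPINLOCK(ilb_lock);	/* stand-in for ilb->lock */

static void path_a(void)	/* cf. inet_listen() -> __inet_hash() */
{
	local_bh_disable();	/* RT: takes softirq_ctrl.lock */
	spin_lock(&ilb_lock);	/* order: softirq_ctrl, then ilb */
	spin_unlock(&ilb_lock);
	local_bh_enable();
}

static void path_b(void)	/* cf. tcp4_seq_show() -> sock_i_ino() */
{
	spin_lock(&ilb_lock);	/* taken without BH disabled... */
	local_bh_disable();	/* ...order: ilb, then softirq_ctrl */
	local_bh_enable();
	spin_unlock(&ilb_lock);
}

/* Two tasks on one CPU, one in each path, can deadlock on RT; on !RT
 * softirq_ctrl.lock is not a real lock and nothing ever blocks. */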

Re: scheduling while atomic in z3fold

2020-12-08 Thread Mike Galbraith
On Wed, 2020-12-09 at 07:13 +0100, Mike Galbraith wrote:
> On Wed, 2020-12-09 at 00:26 +0100, Vitaly Wool wrote:
> > Hi Mike,
> >
> > On 2020-12-07 16:41, Mike Galbraith wrote:
> > > On Mon, 2020-12-07 at 16:21 +0100, Vitaly Wool wrote:
> > >> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith  wrote:
> > >>>
> > >>
> > >>> Unfortunately, that made zero difference.
> > >>
> > >> Okay, I suggest that you submit the patch that changes read_lock() to
> > >> write_lock() in __release_z3fold_page() and I'll ack it then.
> > >> I would like to rewrite the code so that write_lock is not necessary
> > >> there but I don't want to hold you back and it isn't likely that I'll
> > >> complete this today.
> > >
> > > Nah, I'm in no rush... especially not to sign off on "Because the
> > > little voices in my head said this bit should look like that bit over
> > > yonder, and testing _seems_ to indicate they're right about that" :)
> > >
> > >   -Mike
> > >
> >
> > okay, thanks. Would this make things better:
>
> Yup, z3fold became RT tolerant with this (un-munged and) applied.

Below is the other change that any RT users of z3fold will need.

mm, z3fold: Remove preempt disabled sections for RT

Replace get_cpu_ptr() with migrate_disable()+this_cpu_ptr() so RT can take
spinlocks that become sleeping locks.

Signed-off-by: Mike Galbraith 
---
 mm/z3fold.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -617,14 +617,16 @@ static inline void add_to_unbuddied(stru
 {
if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
zhdr->middle_chunks == 0) {
-   struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+   struct list_head *unbuddied;
int freechunks = num_free_chunks(zhdr);
+
+   migrate_disable();
+   unbuddied = this_cpu_ptr(pool->unbuddied);
spin_lock(&pool->lock);
list_add(&zhdr->buddy, &unbuddied[freechunks]);
spin_unlock(&pool->lock);
zhdr->cpu = smp_processor_id();
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
}
 }

@@ -861,8 +863,9 @@ static inline struct z3fold_header *__z3
int chunks = size_to_chunks(size), i;

 lookup:
+   migrate_disable();
/* First, try to find an unbuddied z3fold page. */
-   unbuddied = get_cpu_ptr(pool->unbuddied);
+   unbuddied = this_cpu_ptr(pool->unbuddied);
for_each_unbuddied_list(i, chunks) {
struct list_head *l = &unbuddied[i];

@@ -880,7 +883,7 @@ static inline struct z3fold_header *__z3
!z3fold_page_trylock(zhdr)) {
spin_unlock(&pool->lock);
zhdr = NULL;
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
if (can_sleep)
cond_resched();
goto lookup;
@@ -894,7 +897,7 @@ static inline struct z3fold_header *__z3
test_bit(PAGE_CLAIMED, &page->private)) {
z3fold_page_unlock(zhdr);
zhdr = NULL;
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
if (can_sleep)
cond_resched();
goto lookup;
@@ -909,7 +912,7 @@ static inline struct z3fold_header *__z3
kref_get(&zhdr->refcount);
break;
}
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();

if (!zhdr) {
int cpu;
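The conversion in miniature, using a made-up per-CPU list rather than
z3fold's structures: get_cpu_ptr() wraps the section in preempt_disable(),
which on RT forbids taking a spinlock once spinlocks are sleeping locks,
while migrate_disable() only pins the task to its CPU and leaves it
preemptible.

static DEFINE_PER_CPU(struct list_head, pcpu_list);	/* hypothetical;
							 * init'd at boot */
static DEFINE_SPINLOCK(pcpu_list_lock);	/* a sleeping lock on RT */

static void add_item(struct list_head *item)
{
	struct list_head *head;

	migrate_disable();		/* pin to CPU, stay preemptible */
	head = this_cpu_ptr(&pcpu_list);
	spin_lock(&pcpu_list_lock);	/* may sleep on RT: fine */
	list_add(item, head);
	spin_unlock(&pcpu_list_lock);
	migrate_enable();
}

/* The replaced form, get_cpu_ptr(&pcpu_list)/put_cpu_ptr(&pcpu_list),
 * disables preemption across the same section and produces the
 * "scheduling while atomic" splat once spin_lock() can sleep. */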



Re: scheduling while atomic in z3fold

2020-12-08 Thread Mike Galbraith
On Wed, 2020-12-09 at 00:26 +0100, Vitaly Wool wrote:
> Hi Mike,
>
> On 2020-12-07 16:41, Mike Galbraith wrote:
> > On Mon, 2020-12-07 at 16:21 +0100, Vitaly Wool wrote:
> >> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith  wrote:
> >>>
> >>
> >>> Unfortunately, that made zero difference.
> >>
> >> Okay, I suggest that you submit the patch that changes read_lock() to
> >> write_lock() in __release_z3fold_page() and I'll ack it then.
> >> I would like to rewrite the code so that write_lock is not necessary
> >> there but I don't want to hold you back and it isn't likely that I'll
> >> complete this today.
> >
> > Nah, I'm in no rush... especially not to sign off on "Because the
> > little voices in my head said this bit should look like that bit over
> > yonder, and testing _seems_ to indicate they're right about that" :)
> >
> > -Mike
> >
>
> okay, thanks. Would this make things better:

Yup, z3fold became RT tolerant with this (un-munged and) applied.

BTW, I don't think my write_lock() hacklet actually plugged the hole
that leads to the below.  I think it just reduced the odds of the two
meeting enough to make it look ~solid in fairly limited test time.

[ 3369.373023] kworker/-7413  4.12 3368809247us : do_compact_page: 
zhdr: 95d93abd8000 zhdr->slots: 95d951f5df80 zhdr->slots->slot[0]: 0
[ 3369.373025] kworker/-7413  4.12 3368809248us : do_compact_page: 
old_handle 95d951f5df98 *old_handle was 95d93abd804f now is 
95da3ab8104c
[ 3369.373027] kworker/-7413  4.11 3368809249us : 
__release_z3fold_page.constprop.25: freed 95d951f5df80
[ 3369.373028] -
[ 3369.373029] CR2: 0018
crash> p debug_handle | grep '\[2'
  [2]: 95dc1ecac0d0
crash> rd 95dc1ecac0d0
95dc1ecac0d0:  95d951f5df98...Q
crash> p debug_zhdr | grep '\[2'
  [2]: 95dc1ecac0c8
crash> rd 95dc1ecac0c8
95dc1ecac0c8:  95da3ab81000...:  <== kworker is 
not working on same zhdr as free_handle()..
crash> p debug_slots | grep '\[2'
  [2]: 95dc1ecac0c0
crash> rd 95dc1ecac0c0
95dc1ecac0c0:  95d951f5df80...Q  <== ..but is 
the same slots, and frees it under free_handle().
crash> bt -sx  <== bt leading to use after free+corruption+explosion 1us later.
PID: 9334   TASK: 95d95a1eb3c0  CPU: 2   COMMAND: "swapoff"
 #0 [b4248847f8f0] machine_kexec+0x16e at a605f8ce
 #1 [b4248847f938] __crash_kexec+0xd2 at a614c162
 #2 [b4248847f9f8] crash_kexec+0x30 at a614d350
 #3 [b4248847fa08] oops_end+0xca at a602680a
 #4 [b4248847fa28] no_context+0x14d at a606d7cd
 #5 [b4248847fa88] exc_page_fault+0x2b8 at a68bdb88
 #6 [b4248847fae0] asm_exc_page_fault+0x1e at a6a00ace
 #7 [b4248847fb68] mark_wakeup_next_waiter+0x51 at a60ea121
 #8 [b4248847fbd0] rt_mutex_futex_unlock+0x4f at a68c93ef
 #9 [b4248847fc10] z3fold_zpool_free+0x593 at c0ecb663 [z3fold]
#10 [b4248847fc78] zswap_free_entry+0x43 at a627c823
#11 [b4248847fc88] zswap_frontswap_invalidate_page+0x8a at a627c92a
#12 [b4248847fcb0] __frontswap_invalidate_page+0x48 at a627c018
#13 [b4248847fcd8] swapcache_free_entries+0x1ee at a6276f5e
#14 [b4248847fd20] free_swap_slot+0x9f at a627b8ff
#15 [b4248847fd40] delete_from_swap_cache+0x61 at a6274621
#16 [b4248847fd60] try_to_free_swap+0x70 at a6277520
#17 [b4248847fd70] unuse_vma+0x55c at a627869c
#18 [b4248847fe90] try_to_unuse+0x139 at a6278e89
#19 [b4248847fee8] __x64_sys_swapoff+0x1eb at a62798cb
#20 [b4248847ff40] do_syscall_64+0x33 at a68b9ab3
#21 [b4248847ff50] entry_SYSCALL_64_after_hwframe+0x44 at a6a0007c
RIP: 7fbd835a5d17  RSP: 7ffd60634458  RFLAGS: 0202
RAX: ffda  RBX: 559540e34b60  RCX: 7fbd835a5d17
RDX: 0001  RSI: 0001  RDI: 559540e34b60
RBP: 0001   R8: 7ffd606344c0   R9: 0003
R10: 559540e34721  R11: 0202  R12: 0001
R13:   R14: 7ffd606344c0  R15: 
ORIG_RAX: 00a8  CS: 0033  SS: 002b
crash>

>
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index 18feaa0bc537..340c38a5ffac 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -303,10 +303,9 @@ static inline void put_z3fold_header(struct
> z3fold_header *zhdr)
>   z3fold_page_unlock(zhdr);
>   }
>
> -static inline void free_handle(unsigned long handle)

Re: scheduling while atomic in z3fold

2020-12-07 Thread Mike Galbraith
On Mon, 2020-12-07 at 16:21 +0100, Vitaly Wool wrote:
> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith  wrote:
> >
>
> > Unfortunately, that made zero difference.
>
> Okay, I suggest that you submit the patch that changes read_lock() to
> write_lock() in __release_z3fold_page() and I'll ack it then.
> I would like to rewrite the code so that write_lock is not necessary
> there but I don't want to hold you back and it isn't likely that I'll
> complete this today.

Nah, I'm in no rush... especially not to sign off on "Because the
little voices in my head said this bit should look like that bit over
yonder, and testing _seems_ to indicate they're right about that" :)

-Mike



Re: scheduling while atomic in z3fold

2020-12-07 Thread Mike Galbraith
On Mon, 2020-12-07 at 12:52 +0100, Vitaly Wool wrote:
>
> Thanks. This trace beats me because I don't quite get how this could
> have happened.

I swear there's a mythical creature loose in there somewhere ;-)
Everything looks just peachy up to the instant it goes boom, then you
find in the wreckage that which was most definitely NOT there just a
few ns ago.

> Hitting write_unlock at line 341 would mean that HANDLES_ORPHANED bit
> is set but obviously it isn't.
> Could you please comment out the ".shrink = z3fold_zpool_shrink" line
> and retry?

Unfortunately, that made zero difference.

-Mike



Re: scheduling while atomic in z3fold

2020-12-06 Thread Mike Galbraith
On Mon, 2020-12-07 at 02:05 +0100, Vitaly Wool wrote:
>
> Could you please try the following patch in your setup:

crash> gdb list *z3fold_zpool_free+0x527
0xc0e14487 is in z3fold_zpool_free (mm/z3fold.c:341).
336 if (slots->slot[i]) {
337 is_free = false;
338 break;
339 }
340 }
341 write_unlock(&slots->lock);  <== boom
342
343 if (is_free) {
344 struct z3fold_pool *pool = slots_to_pool(slots);
345
crash> z3fold_buddy_slots -x 99a3287b8780
struct z3fold_buddy_slots {
  slot = {0xdeadbeef, 0xdeadbeef, 0xdeadbeef, 0xdeadbeef},
  pool = 0x99a3146b8400,
  lock = {
rtmutex = {
  wait_lock = {
raw_lock = {
  {
val = {
  counter = 0x1
},
{
  locked = 0x1,
  pending = 0x0
},
{
  locked_pending = 0x1,
  tail = 0x0
}
  }
}
  },
  waiters = {
rb_root = {
  rb_node = 0x99a3287b8e00
},
rb_leftmost = 0x0
  },
  owner = 0x99a355c24500,
  save_state = 0x1
},
readers = {
  counter = 0x8000
}
  }
}

> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index 18feaa0bc537..efe9a012643d 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -544,12 +544,17 @@ static void __release_z3fold_page(struct z3fold_header 
> *zhdr, bool locked)
>   break;
>   }
>   }
> - if (!is_free)
> + if (!is_free) {
>   set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
> - read_unlock(&zhdr->slots->lock);
> -
> - if (is_free)
> + read_unlock(&zhdr->slots->lock);
> + } else {
> + zhdr->slots->slot[0] =
> + zhdr->slots->slot[1] =
> + zhdr->slots->slot[2] =
> + zhdr->slots->slot[3] = 0xdeadbeef;
> + read_unlock(&zhdr->slots->lock);
>   kmem_cache_free(pool->c_handle, zhdr->slots);
> + }
>
>   if (locked)
>   z3fold_page_unlock(zhdr);



Re: scheduling while atomic in z3fold

2020-12-06 Thread Mike Galbraith
On Thu, 2020-12-03 at 14:39 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-03 09:18:21 [+0100], Mike Galbraith wrote:
> > On Thu, 2020-12-03 at 03:16 +0100, Mike Galbraith wrote:
> > > On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote:
> > > Looks like...
> > >
> > > d8f117abb380 z3fold: fix use-after-free when freeing handles
> > >
> > > ...wasn't completely effective...
> >
> > The top two hunks seem to have rendered the thing RT tolerant.
>
> Yes, it appears to. I have no idea if this is a proper fix or not.
> Without your write lock, after a few attempts, KASAN says:
>
> | BUG: KASAN: use-after-free in __pv_queued_spin_lock_slowpath+0x293/0x770
> | Write of size 2 at addr 88800e0e10aa by task kworker/u16:3/237

Things that make ya go hmmm...

I started poking at it thinking ok, given write_lock() fixes it, bad
juju must be happening under another read_lock(), so I poisoned the
structure about to be freed by __release_z3fold_page() under lock, and
opened a delay window for bad juju to materialize, but it didn't, so I
just poisoned instead, and sprinkled landmines all over the place.

My landmines are not triggering but the detonator is materializing
inside the structure containing the lock we explode on.  Somebody
blasts a recycled z3fold_buddy_slots into ram we were working on?

crash> bt -sx
PID: 11850  TASK: 933ee67f2280  CPU: 1   COMMAND: "swapoff"
 #0 [a7bf4e7f3900] machine_kexec+0x16e at 8405f86e
 #1 [a7bf4e7f3948] __crash_kexec+0xd2 at 8414c182
 #2 [a7bf4e7f3a08] crash_kexec+0x30 at 8414d370
 #3 [a7bf4e7f3a18] oops_end+0xca at 8402680a
 #4 [a7bf4e7f3a38] no_context+0x14d at 8406d7ed
 #5 [a7bf4e7f3a98] exc_page_fault+0x2b8 at 848bdb88
 #6 [a7bf4e7f3af0] asm_exc_page_fault+0x1e at 84a00ace
 #7 [a7bf4e7f3b78] mark_wakeup_next_waiter+0x51 at 840ea141
 #8 [a7bf4e7f3bd8] rt_mutex_futex_unlock+0x4f at 848c93ef
 #9 [a7bf4e7f3c18] z3fold_zpool_free+0x580 at c0f37680 [z3fold]
#10 [a7bf4e7f3c78] zswap_free_entry+0x43 at 8427c863
#11 [a7bf4e7f3c88] zswap_frontswap_invalidate_page+0x8a at 8427c96a
#12 [a7bf4e7f3cb0] __frontswap_invalidate_page+0x48 at 8427c058
#13 [a7bf4e7f3cd8] swapcache_free_entries+0x1ee at 84276f8e
#14 [a7bf4e7f3d20] free_swap_slot+0x9f at 8427b93f
#15 [a7bf4e7f3d40] delete_from_swap_cache+0x61 at 84274651
#16 [a7bf4e7f3d60] try_to_free_swap+0x70 at 84277550
#17 [a7bf4e7f3d70] unuse_vma+0x55c at 842786cc
#18 [a7bf4e7f3e90] try_to_unuse+0x139 at 84278eb9
#19 [a7bf4e7f3ee8] __x64_sys_swapoff+0x1eb at 842798fb
#20 [a7bf4e7f3f40] do_syscall_64+0x33 at 848b9ab3
#21 [a7bf4e7f3f50] entry_SYSCALL_64_after_hwframe+0x44 at 84a0007c
RIP: 7fdd9ce26d17  RSP: 7ffe99f98238  RFLAGS: 0202
RAX: ffda  RBX: 5625219e8b60  RCX: 7fdd9ce26d17
RDX: 0001  RSI: 0001  RDI: 5625219e8b60
RBP: 0001   R8: 7ffe99f982a0   R9: 0003
R10: 5625219e8721  R11: 0202  R12: 0001
R13:   R14: 7ffe99f982a0  R15: 
ORIG_RAX: 00a8  CS: 0033  SS: 002b
crash> bt -e
PID: 11850  TASK: 933ee67f2280  CPU: 1   COMMAND: "swapoff"

  KERNEL-MODE EXCEPTION FRAME AT: a7bf4e7f3950
[exception RIP: mark_wakeup_next_waiter+81]
RIP: 840ea141  RSP: a7bf4e7f3ba0  RFLAGS: 00010046
RAX:   RBX:   RCX: 0001
RDX: 0001  RSI: a7bf4e7f3bf0  RDI: 0001
RBP: 933e5a7baba8   R8: 933f502c2000   R9: 933e5a7babc0
R10: 0001  R11: 855938b8  R12: a7bf4e7f3be0
R13: a7bf4e7f3bf0  R14: 0018  R15: 933ee67f2280
ORIG_RAX:   CS: 0010  SS: 0018
...
crash> gdb list *z3fold_zpool_free+0x580
0xc0f37680 is in z3fold_zpool_free (mm/z3fold.c:338).
warning: Source file is more recent than executable.
333
334 /* we are freeing a foreign handle if we are here */
335 zhdr->foreign_handles--;
336 is_free = true;
337 if (!test_bit(HANDLES_ORPHANED, &slots->pool)) {
338 write_unlock(&slots->lock);
339 return;
340 }
341 for (i = 0; i <= BUDDY_MASK; i++) {
342 if (slots->slot[i]) {
crash> z3fold_buddy_slots -x 933e5a7bab80
struct z3fold_buddy_slots {
  slot = {0x0, 0x0, 0x0, 0x0},
  pool = 0x933e4b6d8e00,
  lock = {
rtmutex = {
  wait_lock = {
raw_lock = {
  {
val = {
  counter = 0x1

Re: scheduling while atomic in z3fold

2020-12-03 Thread Mike Galbraith
On Thu, 2020-12-03 at 03:16 +0100, Mike Galbraith wrote:
> On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote:
> Looks like...
>
> d8f117abb380 z3fold: fix use-after-free when freeing handles
>
> ...wasn't completely effective...

The top two hunks seem to have rendered the thing RT tolerant.

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 18feaa0bc537..851d9f4f1644 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -537,7 +537,7 @@ static void __release_z3fold_page(struct z3fold_header 
*zhdr, bool locked)
spin_unlock(&pool->lock);

/* If there are no foreign handles, free the handles array */
-   read_lock(&zhdr->slots->lock);
+   write_lock(&zhdr->slots->lock);
for (i = 0; i <= BUDDY_MASK; i++) {
if (zhdr->slots->slot[i]) {
is_free = false;
@@ -546,7 +546,7 @@ static void __release_z3fold_page(struct z3fold_header 
*zhdr, bool locked)
}
if (!is_free)
set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
-   read_unlock(&zhdr->slots->lock);
+   write_unlock(&zhdr->slots->lock);

if (is_free)
kmem_cache_free(pool->c_handle, zhdr->slots);
@@ -642,14 +642,16 @@ static inline void add_to_unbuddied(struct z3fold_pool 
*pool,
 {
if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
zhdr->middle_chunks == 0) {
-   struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+   struct list_head *unbuddied;
int freechunks = num_free_chunks(zhdr);
+
+   migrate_disable();
+   unbuddied = this_cpu_ptr(pool->unbuddied);
spin_lock(&pool->lock);
list_add(&zhdr->buddy, &unbuddied[freechunks]);
spin_unlock(&pool->lock);
zhdr->cpu = smp_processor_id();
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
}
 }

@@ -886,8 +888,9 @@ static inline struct z3fold_header *__z3fold_alloc(struct 
z3fold_pool *pool,
int chunks = size_to_chunks(size), i;

 lookup:
+   migrate_disable();
/* First, try to find an unbuddied z3fold page. */
-   unbuddied = get_cpu_ptr(pool->unbuddied);
+   unbuddied = this_cpu_ptr(pool->unbuddied);
for_each_unbuddied_list(i, chunks) {
struct list_head *l = &unbuddied[i];

@@ -905,7 +908,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct 
z3fold_pool *pool,
!z3fold_page_trylock(zhdr)) {
spin_unlock(&pool->lock);
zhdr = NULL;
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
if (can_sleep)
cond_resched();
goto lookup;
@@ -919,7 +922,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct 
z3fold_pool *pool,
test_bit(PAGE_CLAIMED, &page->private)) {
z3fold_page_unlock(zhdr);
zhdr = NULL;
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
if (can_sleep)
cond_resched();
goto lookup;
@@ -934,7 +937,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct 
z3fold_pool *pool,
kref_get(&zhdr->refcount);
break;
}
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();

if (!zhdr) {
int cpu;



Re: scheduling while atomic in z3fold

2020-12-02 Thread Mike Galbraith
On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-02 03:30:27 [+0100], Mike Galbraith wrote:
>
> > What I'm seeing is the below.  rt_mutex_has_waiters() says yup we have
> > a waiter, rt_mutex_top_waiter() emits the missing cached leftmost, and
> > rt_mutex_dequeue_pi() chokes on it.  Lock is buggered.
>
> correct. So this:
>
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -342,7 +342,7 @@ static inline void free_handle(unsigned long handle)
>
>   if (is_free) {
>   struct z3fold_pool *pool = slots_to_pool(slots);
> -
> + memset(slots, 0xee, sizeof(struct z3fold_buddy_slots));
>   kmem_cache_free(pool->c_handle, slots);
>   }
>  }
> @@ -548,8 +549,10 @@ static void __release_z3fold_page(struct z3fold_header 
> *zhdr, bool locked)
>   set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
>   read_unlock(&zhdr->slots->lock);
>
> - if (is_free)
> + if (is_free) {
> + memset(zhdr->slots, 0xdd, sizeof(struct z3fold_buddy_slots));
>   kmem_cache_free(pool->c_handle, zhdr->slots);
> + }
>
>   if (locked)
>   z3fold_page_unlock(zhdr);
>
> resulted in:
>
> |[  377.200696] Out of memory: Killed process 284358 (oom01) 
> total-vm:15780488kB, anon-rss:150624kB, file-rss:0kB, shmem-rss:0kB, UID:0 
> pgtables:16760kB oom_score_adj:0
> |[  377.205438] [ cut here ]
> |[  377.205441] pvqspinlock: lock 0x8880105c6828 has corrupted value 
> 0x!
> |[  377.205448] WARNING: CPU: 6 PID: 72 at 
> kernel/locking/qspinlock_paravirt.h:498 
> __pv_queued_spin_unlock_slowpath+0xb3/0xc0
> |[  377.205455] Modules linked in:
> |[  377.205456] CPU: 6 PID: 72 Comm: oom_reaper Not tainted 
> 5.10.0-rc6-rt13-rt+ #103
> |[  377.205458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.14.0-1 04/01/2014
> |[  377.205460] RIP: 0010:__pv_queued_spin_unlock_slowpath+0xb3/0xc0
> …
> |[  377.205475] Call Trace:
> |[  377.205477]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
> |[  377.205481]  .slowpath+0x9/0xe
> |[  377.205483]  _raw_spin_unlock_irqrestore+0x5/0x50
> |[  377.205486]  rt_mutex_futex_unlock+0x9e/0xb0
> |[  377.205488]  z3fold_free+0x2b0/0x470
> |[  377.205491]  zswap_free_entry+0x7d/0xc0
> |[  377.205493]  zswap_frontswap_invalidate_page+0x87/0x90
> |[  377.205495]  __frontswap_invalidate_page+0x58/0x90
> |[  377.205496]  swap_range_free.constprop.0+0x99/0xb0
> |[  377.205499]  swapcache_free_entries+0x131/0x390
> |[  377.205501]  free_swap_slot+0x99/0xc0
> |[  377.205502]  __swap_entry_free+0x8a/0xa0
> |[  377.205504]  free_swap_and_cache+0x36/0xd0
> |[  377.205506]  zap_pte_range+0x16a/0x940
> |[  377.205509]  unmap_page_range+0x1d8/0x310
> |[  377.205514]  __oom_reap_task_mm+0xe7/0x190
> |[  377.205520]  oom_reap_task_mm+0x5a/0x280
> |[  377.205521]  oom_reaper+0x98/0x1c0
> |[  377.205525]  kthread+0x18c/0x1b0
> |[  377.205528]  ret_from_fork+0x22/0x30
> |[  377.205531] ---[ end trace 0002 ]---
>
> Then I reverted commit
>4a3ac9311dac3 ("mm/z3fold.c: add inter-page compaction")
>
> and it seems to work now. Any suggestions? It looks like use-after-free.

Looks like...

d8f117abb380 z3fold: fix use-after-free when freeing handles

...wasn't completely effective.  write_unlock() in handle_free() is
where I see explosions.  Only trouble is that 4a3ac9311dac3 arrived in
5.5, and my 5.[5-7]-rt refuse to reproduce (d8f117abb380 applied),
whereas 5.9 and 5.10 do so quite reliably.  There is a heisenbug aspect
though, one trace_printk() in handle_free() made bug go hide, so the
heisen-fairies in earlier trees were probably just messing with me..
because they can, being the twisted little freaks they are ;-)

-Mike



Re: scheduling while atomic in z3fold

2020-12-01 Thread Mike Galbraith
On Mon, 2020-11-30 at 17:03 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 16:01:11 [+0100], Mike Galbraith wrote:
> > On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > > How do you test this? I triggered a few oom-killer and I have here git
> > > gc running for a few hours now… Everything is fine.
> >
> > In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> > insanely, it explodes quite reliably here with either z3fold or
> > zsmalloc.. but not with zbud.
>
> This just passed. It however killed my git-gc task which wasn't done.
> Let me try tomorrow with your config.

What I'm seeing is the below.  rt_mutex_has_waiters() says yup we have
a waiter, rt_mutex_top_waiter() emits the missing cached leftmost, and
rt_mutex_dequeue_pi() chokes on it.  Lock is buggered.

[  894.376962] BUG: kernel NULL pointer dereference, address: 0018
[  894.377639] #PF: supervisor read access in kernel mode
[  894.378130] #PF: error_code(0x) - not-present page
[  894.378735] PGD 0 P4D 0
[  894.378974] Oops:  [#1] PREEMPT_RT SMP PTI
[  894.379384] CPU: 2 PID: 78 Comm: oom_reaper Kdump: loaded Tainted: G 
   E 5.9.11-rt20-rt #9
[  894.380253] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[  894.381352] RIP: 0010:mark_wakeup_next_waiter+0x51/0x150
[  894.381869] Code: 00 00 49 89 f5 e8 9f 1c 7c 00 48 8b 5d 10 48 85 db 74 0a 
48 3b 6b 38 0f 85 00 01 00 00 65 4c 8b 3c 25 c0 8d 01 00 4c 8d 73 18 <4c> 39 73 
18 0f 85 94 00 00 00 65 48 8b 3c 25 c0 8d 01 00 48 8b 87
[  894.383640] RSP: 0018:b792802cfb18 EFLAGS: 00010046
[  894.384135] RAX:  RBX:  RCX: 0001
[  894.384804] RDX: 0001 RSI: b792802cfb68 RDI: 0001
[  894.385473] RBP: 997b4e508628 R08: 997b39075000 R09: 997a47800db0
[  894.386134] R10:  R11: 8a58f4d8 R12: b792802cfb58
[  894.387030] R13: b792802cfb68 R14: 0018 R15: 997a7f1d3300
[  894.387715] FS:  () GS:997b77c8() 
knlGS:
[  894.388476] CS:  0010 DS:  ES:  CR0: 80050033
[  894.389209] CR2: 0018 CR3: 0001cc156006 CR4: 001706e0
[  894.389881] Call Trace:
[  894.390127]  rt_mutex_futex_unlock+0x4f/0x90
[  894.390547]  z3fold_zpool_free+0x539/0x5c0
[  894.390930]  zswap_free_entry+0x43/0x50
[  894.391193]  zswap_frontswap_invalidate_page+0x8a/0x90
[  894.391544]  __frontswap_invalidate_page+0x48/0x80
[  894.391875]  swapcache_free_entries+0x1ee/0x330
[  894.392189]  ? rt_mutex_futex_unlock+0x65/0x90
[  894.392502]  free_swap_slot+0xad/0xc0
[  894.392757]  __swap_entry_free+0x70/0x90
[  894.393046]  free_swap_and_cache+0x39/0xe0
[  894.393351]  unmap_page_range+0x5e1/0xb30
[  894.393646]  ? flush_tlb_mm_range+0xfb/0x170
[  894.393965]  __oom_reap_task_mm+0xb2/0x170
[  894.394254]  ? __switch_to+0x12a/0x520
[  894.394514]  oom_reaper+0x119/0x540
[  894.394756]  ? wait_woken+0xa0/0xa0
[  894.394997]  ? __oom_reap_task_mm+0x170/0x170
[  894.395297]  kthread+0x169/0x180
[  894.395535]  ? kthread_park+0x90/0x90
[  894.395867]  ret_from_fork+0x22/0x30
[  894.396252] Modules linked in: ebtable_filter(E) ebtables(E) uinput(E) 
fuse(E) rpcsec_gss_krb5(E) nfsv4(E) xt_comment(E) dns_resolver(E) nfs(E) 
nf_log_ipv6(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) nfs_ssc(E) 
fscache(E>
[  894.396280]  cryptd(E) glue_helper(E) pcspkr(E) nfsd(E) auth_rpcgss(E) 
nfs_acl(E) lockd(E) grace(E) sunrpc(E) sch_fq_codel(E) hid_generic(E) usbhid(E) 
ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) virtio_console(E) 
virtio_blk(E)>
[  894.406791] Dumping ftrace buffer:
[  894.407037](ftrace buffer empty)
[  894.407293] CR2: 0018

crash> gdb list *mark_wakeup_next_waiter+0x51
0x810e87e1 is in mark_wakeup_next_waiter (kernel/locking/rtmutex.c:362).
357 }
358
359 static void
360 rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter 
*waiter)
361 {
362 if (RB_EMPTY_NODE(&waiter->pi_tree_entry))
363 return;
364
365 rb_erase_cached(&waiter->pi_tree_entry, &task->pi_waiters);
366 RB_CLEAR_NODE(&waiter->pi_tree_entry);

crash> rwlock_t -x 0x997b4e508628
struct rwlock_t {
  rtmutex = {
wait_lock = {
  raw_lock = {
{
  val = {
counter = 0x1
  },
  {
locked = 0x1,
pending = 0x0
  },
  {
locked_pending = 0x1,
tail = 0x0
  }
}
  }
},
waiters = {
  rb_root = {
rb_node = 0x997b4e508580
  },
  rb_leftmost = 0x0
},
owner = 0x997a7f1d3300,
save_state = 0x1
  },
  readers = {
counter = 0x8000
  }
}
crash> rb_root 0x997b4e508580
struct rb_root {
  rb_node = 0x0
}
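A hedged reading of the dump: waiters.rb_root.rb_node is non-NULL while
rb_leftmost is NULL, so the top waiter is computed from a NULL leftmost node.
Assuming the 5.9-era layout, tree_entry is the first field of struct
rt_mutex_waiter and pi_tree_entry sits 24 bytes (0x18) in, which lines up
with the faulting address:

/* Illustration only, layout assumed as stated: */
struct rt_mutex_waiter *w;

w = rb_entry(NULL /* waiters.rb_leftmost */,
	     struct rt_mutex_waiter, tree_entry);	/* offset 0, w == NULL */
RB_EMPTY_NODE(&w->pi_tree_entry);	/* reads *(long *)0x18 -> the oops */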



Re: scheduling while atomic in z3fold

2020-11-30 Thread Mike Galbraith
On Mon, 2020-11-30 at 17:32 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 17:27:17 [+0100], Mike Galbraith wrote:
> > > This just passed. It however killed my git-gc task which wasn't done.
> > > Let me try tomorrow with your config.
> >
> > FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
> > so (as expected) it's not some devel tree oopsie.
>
> v5.9 has the same missing local-lock init due to merged local-lock
> upstream. I don't remember when this get_cpu_ptr() came it.

Looks like get_cpu_ptr() arrived in 4.14.

I patched up zswap/z3fold in my old 5.1-rt tree (old forward port of
4.19-rt), turned them on with a script, and it all worked fine, so the
RT zswap allergy lies somewhere upstream of there it seems.

If I were to start merging zswap and friends forward, I'd likely
eventually meet whatever RT is allergic to.  I might try that, but
won't be the least surprised if it turns into a god awful mess of
dependencies.

Hopping forward in rt branches first and patching them up will bracket
the little bugger a lot quicker.

-Mike



Re: scheduling while atomic in z3fold

2020-11-30 Thread Mike Galbraith
On Mon, 2020-11-30 at 17:27 +0100, Mike Galbraith wrote:
> On Mon, 2020-11-30 at 17:03 +0100, Sebastian Andrzej Siewior wrote:
> > On 2020-11-30 16:01:11 [+0100], Mike Galbraith wrote:
> > > On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > > > How do you test this? I triggered a few oom-killer and I have here git
> > > > gc running for a few hours now… Everything is fine.
> > >
> > > In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> > > insanely, it explodes quite reliably here with either z3fold or
> > > zsmalloc.. but not with zbud.
> >
> > This just passed. It however killed my git-gc task which wasn't done.
> > Let me try tomorrow with your config.
>
> FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
> so (as expected) it's not some devel tree oopsie.

It reproduces in full distro KVM too.

-Mike



Re: scheduling while atomic in z3fold

2020-11-30 Thread Mike Galbraith
On Mon, 2020-11-30 at 17:32 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 17:27:17 [+0100], Mike Galbraith wrote:
> > > This just passed. It however killed my git-gc task which wasn't done.
> > > Let me try tomorrow with your config.
> >
> > FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
> > so (as expected) it's not some devel tree oopsie.
>
> v5.9 has the same missing local-lock init due to merged local-lock
> upstream. I don't remember when this get_cpu_ptr() came it.

This was with your test patch applied.

-Mike



Re: scheduling while atomic in z3fold

2020-11-30 Thread Mike Galbraith
On Mon, 2020-11-30 at 17:03 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 16:01:11 [+0100], Mike Galbraith wrote:
> > On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > > How do you test this? I triggered a few oom-killer and I have here git
> > > gc running for a few hours now… Everything is fine.
> >
> > In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> > insanely, it explodes quite reliably here with either z3fold or
> > zsmalloc.. but not with zbud.
>
> This just passed. It however killed my git-gc task which wasn't done.
> Let me try tomorrow with your config.

FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
so (as expected) it's not some devel tree oopsie.

-Mike



Re: scheduling while atomic in z3fold

2020-11-30 Thread Mike Galbraith
On Mon, 2020-11-30 at 16:01 +0100, Mike Galbraith wrote:
> On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > How do you test this? I triggered a few oom-killer and I have here git
> > gc running for a few hours now… Everything is fine.
>
> In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> insanely, it explodes quite reliably here with either z3fold or
> zsmalloc.. but not with zbud.

My config attached in case it's relevant.


config.gz
Description: application/gzip


Re: scheduling while atomic in z3fold

2020-11-30 Thread Mike Galbraith
On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> How do you test this? I triggered a few oom-killer and I have here git
> gc running for a few hours now… Everything is fine.

In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
insanely, it explodes quite reliably here with either z3fold or
zsmalloc.. but not with zbud.

-Mike



Re: scheduling while atomic in z3fold

2020-11-30 Thread Mike Galbraith
On Mon, 2020-11-30 at 14:20 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-29 12:41:14 [+0100], Mike Galbraith wrote:
> > On Sun, 2020-11-29 at 12:29 +0100, Oleksandr Natalenko wrote:
> > >
> > > Ummm so do compressors explode under non-rt kernel in your tests as
> > > well, or it is just -rt that triggers this?
> >
> > I only tested a non-rt kernel with z3fold, which worked just fine.
>
> I tried this and it did not not explode yet. Mike, can you please
> confirm?

This explodes in write_unlock() as mine did.   Oleksandr's local_lock()
variant explodes in the lock he added.  (ew, corruption)

I think I'll try a stable-rt tree.  This master tree _should_ be fine
given it seems to work just peachy for everything else, but my box is
the only one making boom... and also NOT making boom with the zbud
compressor.  Hrmph.

crash.rt> bt -sx
PID: 27961  TASK: 8f6ad5344500  CPU: 4   COMMAND: "oom01"
 #0 [a439d4747708] machine_kexec+0x16e at bb05eace
 #1 [a439d4747750] __crash_kexec+0x5a at bb145e5a
 #2 [a439d4747810] crash_kexec+0x24 at bb147004
 #3 [a439d4747818] oops_end+0x93 at bb025c03
 #4 [a439d4747838] exc_page_fault+0x6b at bb9011bb
 #5 [a439d4747860] asm_exc_page_fault+0x1e at bba00ace
 #6 [a439d47478e8] mark_wakeup_next_waiter+0x51 at bb0e6781
 #7 [a439d4747950] rt_mutex_futex_unlock+0x50 at bb90bae0
 #8 [a439d4747990] z3fold_free+0x4a8 at bb2bf8a8
 #9 [a439d47479f0] zswap_free_entry+0x82 at bb285dd2
#10 [a439d4747a08] zswap_frontswap_invalidate_page+0x8c at bb285edc
#11 [a439d4747a30] __frontswap_invalidate_page+0x4e at bb28548e
#12 [a439d4747a58] swap_range_free.constprop.0+0x9e at bb27fd4e
#13 [a439d4747a78] swapcache_free_entries+0x10d at bb28101d
#14 [a439d4747ac0] free_swap_slot+0xac at bb284d8c
#15 [a439d4747ae0] __swap_entry_free+0x8f at bb2808ff
#16 [a439d4747b08] free_swap_and_cache+0x3b at bb2829db
#17 [a439d4747b38] zap_pte_range+0x164 at bb258004
#18 [a439d4747bc0] unmap_page_range+0x1dd at bb258b6d
#19 [a439d4747c38] __oom_reap_task_mm+0xd5 at bb2235c5
#20 [a439d4747d08] exit_mmap+0x154 at bb264084
#21 [a439d4747da0] mmput+0x4e at bb07e66e
#22 [a439d4747db0] exit_mm+0x172 at bb088372
#23 [a439d4747df0] do_exit+0x1a8 at bb088588
#24 [a439d4747e20] do_group_exit+0x39 at bb0888a9
#25 [a439d4747e48] get_signal+0x155 at bb096ef5
#26 [a439d4747e98] arch_do_signal+0x1a at bb0224ba
#27 [a439d4747f18] exit_to_user_mode_loop+0xc7 at bb11c037
#28 [a439d4747f38] exit_to_user_mode_prepare+0x6a at bb11c12a
#29 [a439d4747f48] irqentry_exit_to_user_mode+0x5 at bb901ae5
#30 [a439d4747f50] asm_exc_page_fault+0x1e at bba00ace
RIP: 00414300  RSP: 7f193033bec0  RFLAGS: 00010206
RAX: 1000  RBX: c000  RCX: a6e6a000
RDX: 7f19159a4000  RSI: c000  RDI: 
RBP: 7f186eb3a000   R8:    R9: 
R10: 0022  R11: 0246  R12: 1000
R13: 0001  R14: 0001  R15: 7ffd0ffb96d0
ORIG_RAX:   CS: 0033  SS: 002b

>



Re: scheduling while atomic in z3fold

2020-11-29 Thread Mike Galbraith
On Sun, 2020-11-29 at 12:29 +0100, Oleksandr Natalenko wrote:
>
> Ummm so do compressors explode under non-rt kernel in your tests as
> well, or it is just -rt that triggers this?

I only tested a non-rt kernel with z3fold, which worked just fine.

-Mike



Re: scheduling while atomic in z3fold

2020-11-29 Thread Mike Galbraith
On Sun, 2020-11-29 at 10:21 +0100, Mike Galbraith wrote:
> On Sun, 2020-11-29 at 08:48 +0100, Mike Galbraith wrote:
> > On Sun, 2020-11-29 at 07:41 +0100, Mike Galbraith wrote:
> > > On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
> > > >
> > > > > > Shouldn't the list manipulation be protected with
> > > > > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> > > >
> > > > Totally untested:
> > >
> > > Hrm, the thing doesn't seem to care deeply about preemption being
> > > disabled, so adding another lock may be overkill.  It looks like you
> > > could get the job done via migrate_disable()+this_cpu_ptr().
> >
> > There is however an ever so tiny chance that I'm wrong about that :)
>
> Or not, your local_lock+this_cpu_ptr version exploded too.
>
> Perhaps there's a bit of non-rt related racy racy going on in zswap
> thingy that makes swap an even less wonderful idea for RT than usual.

Raciness seems to be restricted to pool compressor.  "zbud" seems to be
solid, virgin "zsmalloc" explodes, as does "z3fold" regardless which of
us puts his grubby fingerprints on it.

Exploding compressors survived zero runs of runltp -f mm, I declared
zbud to be at least kinda sorta stable after box survived five runs.

-Mike



Re: scheduling while atomic in z3fold

2020-11-29 Thread Mike Galbraith
On Sun, 2020-11-29 at 08:48 +0100, Mike Galbraith wrote:
> On Sun, 2020-11-29 at 07:41 +0100, Mike Galbraith wrote:
> > On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
> > >
> > > > > Shouldn't the list manipulation be protected with
> > > > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> > >
> > > Totally untested:
> >
> > Hrm, the thing doesn't seem to care deeply about preemption being
> > disabled, so adding another lock may be overkill.  It looks like you
> > could get the job done via migrate_disable()+this_cpu_ptr().
>
> There is however an ever so tiny chance that I'm wrong about that :)

Or not, your local_lock+this_cpu_ptr version exploded too.

Perhaps there's a bit of non-rt related racy racy going on in zswap
thingy that makes swap an even less wonderful idea for RT than usual.

crash.rt> bt -s
PID: 32399  TASK: 8e4528cd8000  CPU: 4   COMMAND: "cc1"
 #0 [9f0f1228f488] machine_kexec+366 at 8c05f87e
 #1 [9f0f1228f4d0] __crash_kexec+210 at 8c14c052
 #2 [9f0f1228f590] crash_kexec+48 at 8c14d240
 #3 [9f0f1228f5a0] oops_end+202 at 8c02680a
 #4 [9f0f1228f5c0] exc_general_protection+403 at 8c8be413
 #5 [9f0f1228f650] asm_exc_general_protection+30 at 8ca00a0e
 #6 [9f0f1228f6d8] __z3fold_alloc+118 at 8c2b4ea6
 #7 [9f0f1228f760] z3fold_zpool_malloc+115 at 8c2b56c3
 #8 [9f0f1228f7c8] zswap_frontswap_store+789 at 8c27d335
 #9 [9f0f1228f828] __frontswap_store+110 at 8c27bafe
#10 [9f0f1228f858] swap_writepage+55 at 8c273b17
#11 [9f0f1228f870] shmem_writepage+612 at 8c232964
#12 [9f0f1228f8a8] pageout+210 at 8c225f12
#13 [9f0f1228f928] shrink_page_list+2428 at 8c22744c
#14 [9f0f1228f9c0] shrink_inactive_list+534 at 8c229746
#15 [9f0f1228fa68] shrink_lruvec+927 at 8c22a35f
#16 [9f0f1228fb78] shrink_node+567 at 8c22a7d7
#17 [9f0f1228fbf8] do_try_to_free_pages+185 at 8c22ad39
#18 [9f0f1228fc40] try_to_free_pages+201 at 8c22c239
#19 [9f0f1228fcd0] __alloc_pages_slowpath.constprop.111+1056 at 
8c26eb70
#20 [9f0f1228fda8] __alloc_pages_nodemask+786 at 8c26f7e2
#21 [9f0f1228fe00] alloc_pages_vma+309 at 8c288f15
#22 [9f0f1228fe40] handle_mm_fault+1687 at 8c24ee97
#23 [9f0f1228fef8] exc_page_fault+821 at 8c8c1be5
#24 [9f0f1228ff50] asm_exc_page_fault+30 at 8ca00ace
RIP: 010fea3b  RSP: 7ffc88ad5a50  RFLAGS: 00010206
RAX: 7f4a548d1638  RBX: 7f4a5c0c5000  RCX: 
RDX: 0002  RSI: 000f  RDI: 00018001
RBP: 7f4a5547a000   R8: 00018000   R9: 0024
R10: 7f4a5523a000  R11: 00098628  R12: 7f4a548d1638
R13: 7f4a54ab1478  R14: 5002ac29  R15: 
ORIG_RAX:   CS: 0033  SS: 002b
crash.rt>




Re: scheduling while atomic in z3fold

2020-11-28 Thread Mike Galbraith
On Sun, 2020-11-29 at 07:41 +0100, Mike Galbraith wrote:
> On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
> >
> > > > Shouldn't the list manipulation be protected with
> > > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> >
> > Totally untested:
>
> Hrm, the thing doesn't seem to care deeply about preemption being
> disabled, so adding another lock may be overkill.  It looks like you
> could get the job done via migrate_disable()+this_cpu_ptr().

There is however an ever so tiny chance that I'm wrong about that :)

crash.rt> bt -s
PID: 6699   TASK: 913c464b5640  CPU: 0   COMMAND: "oom01"
 #0 [b6b94adff6f0] machine_kexec+366 at bd05f87e
 #1 [b6b94adff738] __crash_kexec+210 at bd14c052
 #2 [b6b94adff7f8] crash_kexec+48 at bd14d240
 #3 [b6b94adff808] oops_end+202 at bd02680a
 #4 [b6b94adff828] no_context+333 at bd06d7ed
 #5 [b6b94adff888] exc_page_fault+696 at bd8c0b68
 #6 [b6b94adff8e0] asm_exc_page_fault+30 at bda00ace
 #7 [b6b94adff968] mark_wakeup_next_waiter+81 at bd0ea1e1
 #8 [b6b94adff9c8] rt_mutex_futex_unlock+79 at bd8cc3cf
 #9 [b6b94adffa08] z3fold_zpool_free+1319 at bd2b6b17
#10 [b6b94adffa68] zswap_free_entry+67 at bd27c6f3
#11 [b6b94adffa78] zswap_frontswap_invalidate_page+138 at bd27c7fa
#12 [b6b94adffaa0] __frontswap_invalidate_page+72 at bd27bee8
#13 [b6b94adffac8] swapcache_free_entries+494 at bd276e1e
#14 [b6b94adffb10] free_swap_slot+173 at bd27b7dd
#15 [b6b94adffb30] __swap_entry_free+112 at bd2768d0
#16 [b6b94adffb58] free_swap_and_cache+57 at bd278939
#17 [b6b94adffb80] unmap_page_range+1485 at bd24c52d
#18 [b6b94adffc40] __oom_reap_task_mm+178 at bd218f02
#19 [b6b94adffd10] exit_mmap+339 at bd257da3
#20 [b6b94adffdb0] mmput+78 at bd07fe7e
#21 [b6b94adffdc0] do_exit+822 at bd089bc6
#22 [b6b94adffe28] do_group_exit+71 at bd08a547
#23 [b6b94adffe50] get_signal+319 at bd0979ff
#24 [b6b94adffe98] arch_do_signal+30 at bd022cbe
#25 [b6b94adfff28] exit_to_user_mode_prepare+293 at bd1223e5
#26 [b6b94adfff48] irqentry_exit_to_user_mode+5 at bd8c1675
#27 [b6b94adfff50] asm_exc_page_fault+30 at bda00ace
RIP: 00414300  RSP: 7f5ddf065ec0  RFLAGS: 00010206
RAX: 1000  RBX: c000  RCX: adf28000
RDX: 7f5d0bf8d000  RSI: c000  RDI: 
RBP: 7f5c5e065000   R8:    R9: 
R10: 0022  R11: 0246  R12: 1000
R13: 0001  R14: 0001  R15: 7ffc953ebcd0
ORIG_RAX:   CS: 0033  SS: 002b
crash.rt>




Re: scheduling while atomic in z3fold

2020-11-28 Thread Mike Galbraith
On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
>
> > > Shouldn't the list manipulation be protected with
> > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
>
> Totally untested:

Hrm, the thing doesn't seem to care deeply about preemption being
disabled, so adding another lock may be overkill.  It looks like you
could get the job done via migrate_disable()+this_cpu_ptr().

In the case of the list in __z3fold_alloc(), after the unlocked peek,
it double checks under the existing lock.  There's not a whole lot of
difference between another cpu a few lines down in the same function
diddling the list and a preemption doing the same, so why bother?

Ditto add_to_unbuddied().  Flow decisions are being made based on zhdr-
>first/middle/last_chunks state prior to preemption being disabled as
the thing sits, so presumably their stability is not dependent thereon,
while the list is protected by the already existing lock, making the
preempt_disable() look more incidental than intentional.

Equally untested :)

---
 mm/z3fold.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -642,14 +642,16 @@ static inline void add_to_unbuddied(stru
 {
if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
zhdr->middle_chunks == 0) {
-   struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+   struct list_head *unbuddied;
int freechunks = num_free_chunks(zhdr);
+
+   migrate_disable();
+   unbuddied = this_cpu_ptr(pool->unbuddied);
spin_lock(&pool->lock);
list_add(&zhdr->buddy, &unbuddied[freechunks]);
spin_unlock(&pool->lock);
zhdr->cpu = smp_processor_id();
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
}
 }

@@ -886,8 +888,9 @@ static inline struct z3fold_header *__z3
int chunks = size_to_chunks(size), i;

 lookup:
+   migrate_disable();
/* First, try to find an unbuddied z3fold page. */
-   unbuddied = get_cpu_ptr(pool->unbuddied);
+   unbuddied = this_cpu_ptr(pool->unbuddied);
for_each_unbuddied_list(i, chunks) {
struct list_head *l = &unbuddied[i];

@@ -905,7 +908,7 @@ static inline struct z3fold_header *__z3
!z3fold_page_trylock(zhdr)) {
spin_unlock(&pool->lock);
zhdr = NULL;
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
if (can_sleep)
cond_resched();
goto lookup;
@@ -919,7 +922,7 @@ static inline struct z3fold_header *__z3
test_bit(PAGE_CLAIMED, &page->private)) {
z3fold_page_unlock(zhdr);
zhdr = NULL;
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
if (can_sleep)
cond_resched();
goto lookup;
@@ -934,7 +937,7 @@ static inline struct z3fold_header *__z3
kref_get(&zhdr->refcount);
break;
}
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();

if (!zhdr) {
int cpu;
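The "unlocked peek, then re-check under the existing lock" shape referenced
above, written out generically (hypothetical types, not z3fold's code):

struct item {
	struct list_head node;
};

static struct item *grab_first(struct list_head *head, spinlock_t *lock)
{
	struct item *it;

retry:
	/* Cheap unlocked peek; may race with concurrent list updates. */
	it = list_first_entry_or_null(head, struct item, node);
	if (!it)
		return NULL;

	spin_lock(lock);
	/* Re-check under the lock: the peeked entry may be gone by now. */
	if (it != list_first_entry_or_null(head, struct item, node)) {
		spin_unlock(lock);
		goto retry;
	}
	list_del_init(&it->node);
	spin_unlock(lock);
	return it;
}

/* A preemption between peek and lock is no worse than another CPU
 * racing in the same window, which is why migrate_disable() suffices. */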



nouveau: WARNING: CPU: 0 PID: 20957 at drivers/gpu/drm/nouveau/nvif/vmm.c:71

2020-11-19 Thread Mike Galbraith
[15561.391527] [ cut here ]
[15561.391560] WARNING: CPU: 0 PID: 20957 at 
drivers/gpu/drm/nouveau/nvif/vmm.c:71 nvif_vmm_put+0x4a/0x50 [nouveau]
[15561.391562] Modules linked in: nls_utf8(E) isofs(E) fuse(E) msr(E) 
xt_comment(E) br_netfilter(E) xt_physdev(E) nfnetlink_cthelper(E) nfnetlink(E) 
ebtable_filter(E) ebtables(E) af_packet(E) bridge(E) stp(E) llc(E) 
iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) xt_pkttype(E) xt_tcpudp(E) 
ip6t_REJECT(E) nf_reject_ipv6(E) ipt_REJECT(E) nf_reject_ipv4(E) 
iptable_filter(E) bpfilter(E) ip6table_mangle(E) ip_tables(E) xt_conntrack(E) 
nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) hid_logitech_hidpp(E) sr_mod(E) 
usblp(E) cdrom(E) hid_logitech_dj(E) joydev(E) intel_rapl_msr(E) 
intel_rapl_common(E) at24(E) mei_hdcp(E) iTCO_wdt(E) regmap_i2c(E) 
intel_pmc_bxt(E) iTCO_vendor_support(E) snd_hda_codec_realtek(E) 
snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_codec_hdmi(E) 
x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_intel(E) coretemp(E) 
snd_intel_dspcfg(E) kvm_intel(E) snd_hda_codec(E) snd_hwdep(E) snd_hda_core(E) 
kvm(E)
[15561.391586]  nls_iso8859_1(E) nls_cp437(E) snd_pcm(E) irqbypass(E) 
crct10dif_pclmul(E) snd_timer(E) crc32_pclmul(E) r8169(E) 
ghash_clmulni_intel(E) snd(E) aesni_intel(E) realtek(E) crypto_simd(E) 
i2c_i801(E) mei_me(E) mdio_devres(E) cryptd(E) pcspkr(E) soundcore(E) 
i2c_smbus(E) lpc_ich(E) glue_helper(E) mfd_core(E) libphy(E) mei(E) fan(E) 
thermal(E) intel_smartconnect(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) 
grace(E) sch_fq_codel(E) sunrpc(E) nfs_ssc(E) uas(E) usb_storage(E) 
hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) 
syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) xhci_pci(E) cec(E) 
ahci(E) rc_core(E) ehci_pci(E) xhci_hcd(E) ttm(E) libahci(E) ehci_hcd(E) 
libata(E) drm(E) usbcore(E) video(E) button(E) sd_mod(E) t10_pi(E) vfat(E) 
fat(E) virtio_blk(E) virtio_mmio(E) virtio_ring(E) virtio(E) ext4(E) 
crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E) 
dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)
[15561.391626]  efivarfs(E) autofs4(E)
[15561.391637] CPU: 0 PID: 20957 Comm: kworker/0:4 Kdump: loaded Tainted: G S   
   E 5.10.0.g9c87c9f-master #3
[15561.391640] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[15561.391667] Workqueue: events nouveau_cli_work [nouveau]
[15561.391682] RIP: 0010:nvif_vmm_put+0x4a/0x50 [nouveau]
[15561.391684] Code: 00 00 00 48 89 e2 48 c7 04 24 00 00 00 00 48 89 44 24 08 
e8 48 e7 ff ff 85 c0 75 0e 48 c7 43 08 00 00 00 00 48 83 c4 10 5b c3 <0f> 0b eb 
ee 66 90 0f 1f 44 00 00 53 48 83 ec 18 83 fe 01 48 8b 5c
[15561.391686] RSP: :8881feca7e08 EFLAGS: 00010282
[15561.391688] RAX: fffe RBX: 8881feca7e28 RCX: 
[15561.391690] RDX: 0010 RSI: 8881feca7d80 RDI: 8881feca7e18
[15561.391692] RBP: 8881feca7e50 R08: 01dc5000 R09: 
[15561.391693] R10: 82003de8 R11: fefefefefefefeff R12: dead0122
[15561.391695] R13: dead0100 R14: 888102fa9328 R15: 888102fa9308
[15561.391697] FS:  () GS:88841ec0() 
knlGS:
[15561.391698] CS:  0010 DS:  ES:  CR0: 80050033
[15561.391700] CR2: 7fd692058000 CR3: 03c10002 CR4: 001706f0
[15561.391701] Call Trace:
[15561.391729]  nouveau_vma_del+0x58/0xa0 [nouveau]
[15561.391755]  nouveau_gem_object_delete_work+0x26/0x40 [nouveau]
[15561.391782]  nouveau_cli_work+0x76/0x120 [nouveau]
[15561.391786]  ? __schedule+0x35c/0x770
[15561.391790]  process_one_work+0x1f5/0x3c0
[15561.391792]  ? process_one_work+0x3c0/0x3c0
[15561.391794]  worker_thread+0x2d/0x3d0
[15561.391796]  ? process_one_work+0x3c0/0x3c0
[15561.391798]  kthread+0x117/0x130
[15561.391800]  ? kthread_park+0x90/0x90
[15561.391803]  ret_from_fork+0x1f/0x30
[15561.391806] ---[ end trace 1f8ba448e97e64e0 ]---



Re: [PATCH RT 1/5] net: Properly annotate the try-lock for the seqlock

2020-11-14 Thread Mike Galbraith
On Sat, 2020-11-14 at 13:24 -0600, Tom Zanussi wrote:
> On Sat, 2020-11-14 at 20:00 +0100, Mike Galbraith wrote:
>
> > __raw_write_seqcount_end() is an integral part of write_sequnlock(),
> > but we do seem to be missing a seqcount_release() in 5.4-rt.
> >
>
> Yep, you're right, it's just the missing seqcount_release() - I'll
> resubmit with just that.

Or just drop the backport, since it adds annotation, while the original
was fixing existing annotation.

__raw_write_seqcount_begin() called in 5.4-rt try_write_seqlock() is
not annotated, while write_seqcount_begin() called by the 5.9-rt
version leads to the broken annotation that the original then fixed.

-Mike



Re: [PATCH RT 1/5] net: Properly annotate the try-lock for the seqlock

2020-11-14 Thread Mike Galbraith
On Fri, 2020-11-13 at 14:14 -0600, Tom Zanussi wrote:
>
> This patch seems to fix it for me:

If there was any discussion about this patch, I missed it.

> From 4855377d0cb34b1b67a5c6d84cc8609c9da0bc3e Mon Sep 17 00:00:00 2001
> Message-Id: 
> <4855377d0cb34b1b67a5c6d84cc8609c9da0bc3e.1605297603.git.zanu...@kernel.org>
> From: Tom Zanussi 
> Date: Fri, 13 Nov 2020 13:04:15 -0600
> Subject: [PATCH] net: Add missing __raw_write_seqcount_end() and
>  seqcount_release()
>
> The patch ('net: Properly annotate the try-lock for the seqlock") adds
> __raw_write_seqcount_begin() in qdisc_run_begin() but omits the
> corresponding __raw_write_seqcount_end() and seqcount_release() in
> qdisc_run_end().
>
> Add it unconditionally, since qdisc_run_end() is never called unless
> qdisc_run_begin() succeeds, and if it succeeds,
> __raw_write_seqcount_begin() and seqcount_acquire() will have been called.
>
> Signed-off-by: Tom Zanussi 
> ---
>  include/net/sch_generic.h | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 112d2dca8b08..c5ccce4f8f62 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -192,7 +192,11 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  static inline void qdisc_run_end(struct Qdisc *qdisc)
>  {
>  #ifdef CONFIG_PREEMPT_RT
> +	seqcount_t *s = &qdisc->running.seqcount;
> +
> 	write_sequnlock(&qdisc->running);
> +	__raw_write_seqcount_end(s);
> +	seqcount_release(&s->dep_map, 1, _RET_IP_);
>  #else
> 	write_seqcount_end(&qdisc->running);
>  #endif

__raw_write_seqcount_end() is an integral part of write_sequnlock(),
but we do seem to be missing a seqcount_release() in 5.4-rt.

-Mike



[tip: locking/urgent] futex: Handle transient "ownerless" rtmutex state correctly

2020-11-07 Thread tip-bot2 for Mike Galbraith
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID: 9f5d1c336a10c0d24e83e40b4c1b9539f7dba627
Gitweb:
https://git.kernel.org/tip/9f5d1c336a10c0d24e83e40b4c1b9539f7dba627
Author:Mike Galbraith 
AuthorDate:Wed, 04 Nov 2020 16:12:44 +01:00
Committer: Thomas Gleixner 
CommitterDate: Sat, 07 Nov 2020 22:07:04 +01:00

futex: Handle transient "ownerless" rtmutex state correctly

Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:

Task Prio     Operation
T1   120      lock(F)
T2   120      lock(F)   -> blocks (top waiter)
T3   50 (RT)  lock(F)   -> boosts T1 and blocks (new top waiter)
XX   timeout/           -> wakes T2
     signal
T1   50       unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2   120      cleanup   -> try_to_take_rt_mutex() fails because T3 is the top waiter
                           and the lower priority T2 cannot steal the lock.
                        -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.

The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().

Drop the locks, so that T3 can make progress, and then try the fixup again.

Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.

[ tglx: Wrote comment and changelog ]

Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan 
Signed-off-by: Mike Galbraith 
Signed-off-by: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7@nanos.tec.linutronix.de
---
 kernel/futex.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f8614ef..ac32887 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2380,10 +2380,22 @@ retry:
 		}
 
 		/*
-		 * Since we just failed the trylock; there must be an owner.
+		 * The trylock just failed, so either there is an owner or
+		 * there is a higher priority waiter than this one.
 		 */
 		newowner = rt_mutex_owner(&pi_state->pi_mutex);
-		BUG_ON(!newowner);
+		/*
+		 * If the higher priority waiter has not yet taken over the
+		 * rtmutex then newowner is NULL. We can't return here with
+		 * that state because it's inconsistent vs. the user space
+		 * state. So drop the locks and try again. It's a valid
+		 * situation and not any different from the other retry
+		 * conditions.
+		 */
+		if (unlikely(!newowner)) {
+			err = -EAGAIN;
+			goto handle_err;
+		}
 	} else {
 		WARN_ON_ONCE(argowner != current);
 		if (oldowner == current) {

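For readers who want to poke at the scenario, the interleaving above can
be staged from userspace with PI futexes via pthread PRIO_INHERIT
mutexes. The sketch below only sets up the T1/T2/T3 shape; actually
hitting the transient rtmutex->owner == NULL window needs the timing
control of Gratian's reproducer (thread names, priorities and delays
here are illustrative assumptions, not his code). Build with -lpthread,
run as root for SCHED_FIFO:

	#include <pthread.h>
	#include <sched.h>
	#include <time.h>
	#include <unistd.h>

	static pthread_mutex_t F;	/* the futex "F" from the changelog */

	static void *t2_fn(void *arg)	/* T2, prio 120: waiter that times out */
	{
		struct timespec ts;

		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_nsec += 50 * 1000 * 1000;	/* 50ms timeout */
		if (ts.tv_nsec >= 1000000000L) {
			ts.tv_sec++;
			ts.tv_nsec -= 1000000000L;
		}
		/* the timeout path is what runs fixup_pi_state_owner() */
		if (pthread_mutex_timedlock(&F, &ts) == 0)
			pthread_mutex_unlock(&F);
		return NULL;
	}

	static void *t3_fn(void *arg)	/* T3, SCHED_FIFO 50: boosts T1 */
	{
		pthread_mutex_lock(&F);		/* becomes top waiter */
		pthread_mutex_unlock(&F);
		return NULL;
	}

	int main(void)
	{
		struct sched_param sp = { .sched_priority = 50 };
		pthread_mutexattr_t ma;
		pthread_attr_t rt;
		pthread_t t2, t3;

		pthread_mutexattr_init(&ma);
		pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
		pthread_mutex_init(&F, &ma);

		pthread_attr_init(&rt);
		pthread_attr_setschedpolicy(&rt, SCHED_FIFO);
		pthread_attr_setschedparam(&rt, &sp);
		pthread_attr_setinheritsched(&rt, PTHREAD_EXPLICIT_SCHED);

		pthread_mutex_lock(&F);		/* T1 owns F */
		pthread_create(&t2, NULL, t2_fn, NULL);
		pthread_create(&t3, &rt, t3_fn, NULL);	/* fails w/o root */
		usleep(50 * 1000);		/* let both block; T2's timeout fires */
		pthread_mutex_unlock(&F);	/* wakes T3: owner transiently NULL */

		pthread_join(t2, NULL);
		pthread_join(t3, NULL);
		return 0;
	}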

Re: [tip: locking/urgent] futex: Handle transient "ownerless" rtmutex state correctly

2020-11-07 Thread Mike Galbraith
On Fri, 2020-11-06 at 21:26 +, tip-bot2 for Mike Galbraith wrote:
>
> ---
>  kernel/futex.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index f8614ef..7406914 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2380,10 +2380,22 @@ retry:
>   }
>
>   /*
> -  * Since we just failed the trylock; there must be an owner.
> +  * The trylock just failed, so either there is an owner or
> +  * there is a higher priority waiter than this one.
>*/
>   newowner = rt_mutex_owner(&pi_state->pi_mutex);
> - BUG_ON(!newowner);
> + /*
> +  * If the higher priority waiter has not yet taken over the
> +  * rtmutex then newowner is NULL. We can't return here with
> +  * that state because it's inconsistent vs. the user space
> +  * state. So drop the locks and try again. It's a valid
> +  * situation and not any different from the other retry
> +  * conditions.
> +  */
> + if (unlikely(!newowner)) {
> + ret = -EAGAIN;
^^^

My box just discovered an unnoticed typo.  That 'ret' should read 'err'
so we goto retry, else fbomb_v2 proggy will trigger gripeage.

[   44.089233] fuse: init (API version 7.32)
[   78.485163] [ cut here ]
[   78.485171] WARNING: CPU: 1 PID: 4557 at kernel/futex.c:2482 
fixup_pi_state_owner.isra.17+0x125/0x350
[   78.485171] [ cut here ]
[   78.485174] WARNING: CPU: 2 PID: 4559 at kernel/futex.c:1486 
do_futex+0x920/0xaf0
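
For reference, why the stray 'ret' bites: the error path dispatches on
'err', not 'ret'. Schematically (shape only, not the literal 5.10
source):

	handle_err:
		/* drop q->lock_ptr and pi_state->pi_mutex.wait_lock */
		switch (err) {
		case -EFAULT:			/* fault in uaddr, then retry */
		case -EAGAIN:			/* cond_resched(), then retry */
			break;
		default:
			WARN_ON_ONCE(1);	/* err == 0 lands here */
			break;
		}
		/* relock and goto retry */

With -EAGAIN parked in 'ret', err stays 0, the default branch fires the
futex.c:2482 warning above, and the stale -EAGAIN in 'ret' can escape to
the caller, hence the second gripe in do_futex().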


-Mike



[tip: locking/urgent] futex: Handle transient "ownerless" rtmutex state correctly

2020-11-06 Thread tip-bot2 for Mike Galbraith
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID: 63c1b4db662a0967dd7839a2fbaa5300e553901d
Gitweb:
https://git.kernel.org/tip/63c1b4db662a0967dd7839a2fbaa5300e553901d
Author:Mike Galbraith 
AuthorDate:Wed, 04 Nov 2020 16:12:44 +01:00
Committer: Thomas Gleixner 
CommitterDate: Fri, 06 Nov 2020 22:24:58 +01:00

futex: Handle transient "ownerless" rtmutex state correctly

Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:

Task Prio     Operation
T1   120      lock(F)
T2   120      lock(F)   -> blocks (top waiter)
T3   50 (RT)  lock(F)   -> boosts T1 and blocks (new top waiter)
XX   timeout/           -> wakes T2
     signal
T1   50       unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2   120      cleanup   -> try_to_take_rt_mutex() fails because T3 is the top waiter
                           and the lower priority T2 cannot steal the lock.
                        -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.

The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().

Drop the locks, so that T3 can make progress, and then try the fixup again.

Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above. 

[ tglx: Wrote comment and changelog ]

Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan 
Signed-off-by: Mike Galbraith 
Signed-off-by: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7@nanos.tec.linutronix.de

---
 kernel/futex.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f8614ef..7406914 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2380,10 +2380,22 @@ retry:
 		}
 
 		/*
-		 * Since we just failed the trylock; there must be an owner.
+		 * The trylock just failed, so either there is an owner or
+		 * there is a higher priority waiter than this one.
 		 */
 		newowner = rt_mutex_owner(&pi_state->pi_mutex);
-		BUG_ON(!newowner);
+		/*
+		 * If the higher priority waiter has not yet taken over the
+		 * rtmutex then newowner is NULL. We can't return here with
+		 * that state because it's inconsistent vs. the user space
+		 * state. So drop the locks and try again. It's a valid
+		 * situation and not any different from the other retry
+		 * conditions.
+		 */
+		if (unlikely(!newowner)) {
+			ret = -EAGAIN;
+			goto handle_err;
+		}
 	} else {
 		WARN_ON_ONCE(argowner != current);
 		if (oldowner == current) {


Re: v5.8+ powersave governor breakage?

2020-11-05 Thread Mike Galbraith
On Thu, 2020-11-05 at 19:02 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 5, 2020 4:08:30 PM CET Mike Galbraith wrote:
> > On Thu, 2020-11-05 at 15:31 +0100, Rafael J. Wysocki wrote:
> > > On Monday, November 2, 2020 7:18:41 AM CET Mike Galbraith wrote:
> > >
> > > > Desktop box did, it gained a working ondemand, while its previously
> > > > working powersave went broke.
> > >
> > > Most likely that's because it was handled by intel_pstate in the "active" 
> > > mode
> > > previously, while it is now handled by it in the "passive" mode...
> >
> > Perhaps the user interface should then nak switching to powersave as it
> > used to nak switching to ondemand?
>
> It cannot do that if the powersave governor is configured in.
>
> [Essentially, the problem is that the "powersave" thing advertised by
> intel_pstate in the "active" mode is not really the powersave governor,
> but that is a mistake made in the past and cannot be undone.  Sorry about
> that.]

Hohum.  A little unfortunate, but it probably only affects a few aging
boxen like mine, and I now know better than to ever again do that.
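
As an aside for anyone else bitten: the two "powersave"s can be told
apart from userspace by checking which scaling driver is behind the
governor (a minimal sketch against the standard cpufreq sysfs ABI; the
assumption that intel_pstate in passive mode registers as the
"intel_cpufreq" driver is mine, not from this thread):

	#include <stdio.h>
	#include <string.h>

	#define CPUFREQ "/sys/devices/system/cpu/cpu0/cpufreq/"

	static void read_str(const char *file, char buf[64])
	{
		char path[128];
		FILE *f;

		snprintf(path, sizeof(path), CPUFREQ "%s", file);
		buf[0] = '\0';
		f = fopen(path, "r");
		if (f) {
			if (fscanf(f, "%63s", buf) != 1)
				buf[0] = '\0';
			fclose(f);
		}
	}

	int main(void)
	{
		char drv[64], gov[64];

		read_str("scaling_driver", drv);
		read_str("scaling_governor", gov);

		if (!strcmp(gov, "powersave"))
			printf("\"powersave\" via %s: %s\n", drv,
			       !strcmp(drv, "intel_pstate") ?
			       "intel_pstate internal policy (active mode)" :
			       "the generic lowest-frequency governor");
		else
			printf("governor: %s (driver %s)\n", gov, drv);
		return 0;
	}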



Re: v5.8+ powersave governor breakage?

2020-11-05 Thread Mike Galbraith
On Thu, 2020-11-05 at 15:31 +0100, Rafael J. Wysocki wrote:
> On Monday, November 2, 2020 7:18:41 AM CET Mike Galbraith wrote:
>
> > Desktop box did, it gained a working ondemand, while its previously
> > working powersave went broke.
>
> Most likely that's because it was handled by intel_pstate in the "active" mode
> previously, while it is now handled by it in the "passive" mode...

Perhaps the user interface should then nak switching to powersave as it
used to nak switching to ondemand?

-Mike



Re: [PATCH] futex: Handle transient "ownerless" rtmutex state correctly

2020-11-04 Thread Mike Galbraith
On Wed, 2020-11-04 at 16:12 +0100, Thomas Gleixner wrote:
> From: Mike Galbraith 

Hrmph.  How about a suggested-by, or just take it.  That the dinky diag
hit the mark (not _entirely_ by accident, but..;) is irrelevant, you did
all of the work to make it a patch.

-Mike

> Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
> This is one possible chain of events leading to this:
>
> Task Prio     Operation
> T1   120      lock(F)
> T2   120      lock(F)   -> blocks (top waiter)
> T3   50 (RT)  lock(F)   -> boosts T1 and blocks (new top waiter)
> XX   timeout/           -> wakes T2
>      signal
> T1   50       unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
> T2   120      cleanup   -> try_to_take_rt_mutex() fails because T3 is the top waiter
>                            and the lower priority T2 cannot steal the lock.
>                         -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()
>
> The comment states that this is invalid and rt_mutex_real_owner() must
> return a non NULL owner when the trylock failed, but in case of a queued
> and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
> state. The higher priority waiter has simply not yet managed to take over
> the rtmutex.
>
> The BUG_ON() is therefore wrong and this is just another retry condition in
> fixup_pi_state_owner().
>
> Drop the locks, so that T3 can make progress, and then try the fixup again.
>
> Gratian provided a great analysis, traces and a reproducer. The analysis is
> to the point, but it confused the hell out of that tglx dude who had to
> page in all the futex horrors again. Condensed version is above.
>
> [ tglx: Wrote comment and changelog ]
>
> Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
> Reported-by: Gratian Crisan 
> Signed-off-by: Mike Galbraith 
> Signed-off-by: Thomas Gleixner 
> Cc: sta...@vger.kernel.org
> Link: https://lore.kernel.org/r/87a6w6x7bb@ni.com
> ---
>  kernel/futex.c |   16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2380,10 +2380,22 @@ static int fixup_pi_state_owner(u32 __us
>   }
>
>   /*
> -  * Since we just failed the trylock; there must be an owner.
> +  * The trylock just failed, so either there is an owner or
> +  * there is a higher priority waiter than this one.
>*/
>   newowner = rt_mutex_owner(&pi_state->pi_mutex);
> - BUG_ON(!newowner);
> + /*
> +  * If the higher priority waiter has not yet taken over the
> +  * rtmutex then newowner is NULL. We can't return here with
> +  * that state because it's inconsistent vs. the user space
> +  * state. So drop the locks and try again. It's a valid
> +  * situation and not any different from the other retry
> +  * conditions.
> +  */
> + if (unlikely(!newowner)) {
> + ret = -EAGAIN;
> + goto handle_err;
> + }
>   } else {
>   WARN_ON_ONCE(argowner != current);
>   if (oldowner == current) {


