Re: [Nouveau] dri, nouveau: BUG: KASAN: use-after-free in dma_fence_signal_timestamp_locked+0x399/0x430

2021-08-28 Thread Mike Galbraith
On Sat, 2021-08-28 at 11:38 +0200, Mike Galbraith wrote:
> Enabling kasan or kcsan in my GTX-980 equipped box will in fairly short
> order...

Correction: kasan does NOT reproduce on demand.  My bottom line remains
the same though, before enabling, either fix it, or evict it, lest it
take testing center stage ala "Hey, over here, me me fix me" :)

-Mike


[Nouveau] dri, nouveau: BUG: KASAN: use-after-free in dma_fence_signal_timestamp_locked+0x399/0x430

2021-08-28 Thread Mike Galbraith
Enabling kasan or kcsan in my GTX-980 equipped box will in fairly short
order result in emission of a use-after-free detection gripe (no access
assert in kcsan case.. same same), immediately followed by a small
mushroom cloud as the kernel attempts to access the twilight zone.

The below (brought to you by me forgetting to boot nomodeset despite
knowing full well that nouveau WILL muck up any testing with either of
these tools:) is x86-tip, with lockdep and kasan enabled.  Branch isn't
really relevant, it explodes just as readily in master.

[  604.071721] ==
[  604.072204] BUG: KASAN: use-after-free in dma_fence_signal_timestamp_locked+0x399/0x430
[  604.072269] Read of size 8 at addr 8881fffa0b28 by task swapper/1/0
[  604.072330] 
[  604.072351] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G    E 5.14.0.g29fb75d-tip_debug #19
[  604.072439] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  604.072502] Call Trace:
[  604.072530]  
[  604.072563]  dump_stack_lvl+0x45/0x59
[  604.072605]  print_address_description.constprop.0+0x1f/0x140
[  604.072650]  ? dma_fence_signal_timestamp_locked+0x399/0x430
[  604.072708]  kasan_report.cold+0x83/0xdf
[  604.072761]  ? dma_fence_signal_timestamp_locked+0x399/0x430
[  604.072820]  dma_fence_signal_timestamp_locked+0x399/0x430
[  604.072865]  ? perf_trace_dma_fence+0x940/0x940
[  604.072916]  ? ktime_get+0x64/0x160
[  604.072955]  ? ktime_get+0x99/0x160
[  604.072981]  nouveau_fence_signal+0x11/0x210 [nouveau]
[  604.073161]  nouveau_fence_wait_uevent_handler+0x116/0x220 [nouveau]
[  604.07]  ? __lock_release+0xec/0x4e0
[  604.073367]  nvif_notify+0x276/0x4f0 [nouveau]
[  604.073490]  ? nvif_notify_get+0x170/0x170 [nouveau]
[  604.073623]  ? nvkm_notify_send+0x195/0x510 [nouveau]
[  604.073760]  ? do_raw_spin_unlock+0x55/0x1f0
[  604.073814]  nvkm_notify_send+0x238/0x510 [nouveau]
[  604.073903]  ? do_raw_spin_unlock+0x55/0x1f0
[  604.073934]  nvkm_event_send+0x1e3/0x2d0 [nouveau]
[  604.074069]  ? validate_chain+0x124/0xd50
[  604.074096]  nvkm_fifo_uevent+0x60/0x70 [nouveau]
[  604.074257]  ? nvkm_fifo_cevent+0x20/0x20 [nouveau]
[  604.074367]  ? check_prev_add+0x20c0/0x20c0
[  604.074409]  ? mark_lock+0xc3/0xac0
[  604.074448]  gk104_fifo_intr+0x627/0x960 [nouveau]
[  604.074585]  nvkm_mc_intr+0x407/0x5e0 [nouveau]
[  604.074715]  ? __lock_acquire+0xad9/0x17b0
[  604.074765]  nvkm_pci_intr+0x12b/0x190 [nouveau]
[  604.074912]  ? nvkm_pci_init+0x1d0/0x1d0 [nouveau]
[  604.075076]  ? nvkm_pci_init+0x1d0/0x1d0 [nouveau]
[  604.075202]  __handle_irq_event_percpu+0x24a/0x640
[  604.075240]  handle_irq_event+0xef/0x230
[  604.075285]  ? handle_irq_event_percpu+0x100/0x100
[  604.075348]  handle_edge_irq+0x20d/0xb70
[  604.075408]  __common_interrupt+0x94/0x1e0
[  604.075459]  common_interrupt+0x9f/0xd0
[  604.075503]  
[  604.075533]  asm_common_interrupt+0x1e/0x40
[  604.075576] RIP: 0010:cpuidle_enter_state+0x1f8/0x8d0
[  604.075629] Code: 00 41 8b 77 04 bf ff ff ff ff e8 43 ef ff ff 31 ff e8 0c 15 fe fe 80 7c 24 08 00 0f 85 9e 01 00 00 e8 bc aa 22 ff fb 45 85 e4 <0f> 88 8c 02 00 00 49 63 ec 48 8d 44 6d 00 48 8d 44 85 00 48 8d 7c
[  604.075781] RSP: 0018:8881009bfdc8 EFLAGS: 0206
[  604.075835] RAX: 00701531 RBX: 83a34520 RCX: 1078ba21
[  604.075899] RDX:  RSI: 82e83020 RDI: 82fa1660
[  604.075962] RBP: 0003 R08: 0001 R09: 83c5f617
[  604.076025] R10: fbfff078bec2 R11: 0001 R12: 0003
[  604.076088] R13: 8883ce8c564c R14: 008ca56ecfa2 R15: 8883ce8c5648
[  604.076179]  ? cpuidle_enter_state+0x1f4/0x8d0
[  604.076238]  cpuidle_enter+0x4a/0xa0
[  604.076283]  cpuidle_idle_call+0x255/0x3c0
[  604.076328]  ? arch_cpu_idle_exit+0x40/0x40
[  604.076372]  ? tsc_verify_tsc_adjust+0x9c/0x2e0
[  604.076418]  ? lockdep_hardirqs_off+0x90/0xd0
[  604.076472]  do_idle+0xd7/0x140
[  604.076513]  cpu_startup_entry+0x19/0x20
[  604.076554]  start_secondary+0x250/0x2f0
[  604.076598]  ? set_cpu_sibling_map+0x1c20/0x1c20
[  604.076657]  secondary_startup_64_no_verify+0xb0/0xbb
[  604.076742] 
[  604.076762] Allocated by task 2004:
[  604.076796]  kasan_save_stack+0x1b/0x40
[  604.076836]  __kasan_kmalloc+0x7c/0x90
[  604.076873]  nouveau_gem_object_close+0x300/0x7f0 [nouveau]
[  604.077060]  drm_gem_object_release_handle+0x69/0xf0 [drm]
[  604.077171]  drm_gem_handle_delete+0x5b/0xa0 [drm]
[  604.077260]  drm_ioctl_kernel+0x1a7/0x240 [drm]
[  604.077349]  drm_ioctl+0x400/0x8b0 [drm]
[  604.077453]  nouveau_drm_ioctl+0xec/0x230 [nouveau]
[  604.077630]  __x64_sys_ioctl+0x11c/0x170
[  604.077671]  do_syscall_64+0x38/0x90
[  604.077707]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  604.077754] 
[  604.02] Freed by task 4941:
[  604.077803]  kasan_save_stack+0x1b/0x40
[  604.077840]  kasan_set_track+0x1c/0x30
[  604.077877]  

[Nouveau] kcsan+slub+nouveau+threadirqs --> kaBoOm

2021-08-19 Thread Mike Galbraith
Greetings,

I had thought SLAB_FREELIST_HARDENED=y was also a required explosion
ingredient, but turns out it's not, as I got a NULL pointer explosion
(below first explosion) as I was composing this with it turned off.

There are various stack traces w. SLAB_FREELIST_HARDENED enabled, all
ending in an allocation exploding in slub::freelist_ptr(). I first met
this in RT after twiddling KCSAN to make it usable in RT kernels, but
it's not RT related, as shown by the virgin master explosions below,
you just have to add threadirqs to make virgin source explode.

Below the second explosion is an interesting looking kcsan use after
free assertion.  It came from a tip-rt kernel, but that _seems_ to be
irrelevant.  It was emitted immediately before a SLAB_FREELIST_HARDENED
slub::freelist_ptr() explosion common to virgin master, master-rt, tip
and tip-rt.. iow everywhere.  Many kcsan grumbles are common ground
too, so I have no reason to suspect it of being any more inventive at
sending me on snipe hunts in RT kernels than it is in virgin source.
  
SLAB_FREELIST_HARDENED=y
[ 3404.198096] general protection fault, probably for non-canonical address 0xf6a2bc6e32c35c19:  [#1] PREEMPT SMP NOPTI
[ 3404.198172] CPU: 1 PID: 2068 Comm: X Kdump: loaded Tainted: GE     5.14.0.gd6d09a6-master-kcsan #7
[ 3404.198271] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 3404.198336] RIP: 0010:__kmalloc+0xa9/0x3e0
[ 3404.198360] Code: 48 8b 70 08 48 39 f2 75 e7 48 83 78 10 00 4c 8b 20 0f 84 f3 02 00 00 4d 85 e4 0f 84 ea 02 00 00 41 8b 46 28 49 8b 3e 4c 01 e0 <48> 8b 18 48 89 c1 49 33 9e b8 00 00 00 4c 89 e0 48 0f c9 48 31 cb
[ 3404.198479] RSP: 0018:888103d5b840 EFLAGS: 00010282
[ 3404.198494] RAX: f6a2bc6e32c35c19 RBX: 8881021ac170 RCX: 000308e0
[ 3404.198543] RDX: 0076f479 RSI: 0076f479 RDI: 000308e0
[ 3404.198559] RBP: 0cc0 R08:  R09: 00018882c1e255b0
[ 3404.198636] R10: 0001 R11: 00018882c1e255b7 R12: f6a2bc6e32c35be9
[ 3404.198723] R13: 888103d5b930 R14: 888100042600 R15: a06b6b7a
[ 3404.198739] FS:  7f11832b36c0() GS:88840ec4() knlGS:
[ 3404.198782] CS:  0010 DS:  ES:  CR0: 80050033
[ 3404.198818] CR2: 7f1113f68000 CR3: 00012380e006 CR4: 001706e0
[ 3404.198902] Call Trace:
[ 3404.198914]  nvif_object_ctor+0xca/0x2c0 [nouveau]
[ 3404.201154]  ? nvkm_memory_unref+0x35/0x60 [nouveau]
[ 3404.202793]  ? nvkm_uvmm_mthd_map.isra.0+0x1e5/0x370 [nouveau]
[ 3404.204822]  nvif_mem_ctor_type+0x11b/0x1f0 [nouveau]
[ 3404.206816]  ? wq_calc_node_cpumask+0xd0/0x180
[ 3404.206833]  ? sugov_update_single_freq+0x62/0x180
[ 3404.206861]  ? copyout+0x6e/0x80
[ 3404.206873]  ? pollwake+0x2a/0xf0
[ 3404.206900]  nouveau_mem_vram+0x14f/0x270 [nouveau]
[ 3404.208761]  nouveau_vram_manager_new+0x108/0x140 [nouveau]
[ 3404.210775]  ? dma_resv_reserve_shared+0x21e/0x2b0
[ 3404.210865]  ttm_resource_alloc+0x70/0x80 [ttm]
[ 3404.210962]  ttm_bo_mem_space+0xfc/0x400 [ttm]
[ 3404.211089]  ? ttm_bo_mem_compat+0x81/0xb0 [ttm]
[ 3404.211317]  ttm_bo_validate+0xa9/0x1d0 [ttm]
[ 3404.211382]  ? _raw_write_unlock+0x1b/0x30
[ 3404.211477]  ? drm_vma_offset_add+0x3b/0x70 [drm]
[ 3404.212429]  ttm_bo_init_reserved+0x300/0x3c0 [ttm]
[ 3404.212568]  ttm_bo_init+0x92/0x140 [ttm]
[ 3404.212682]  ? nouveau_ttm_io_mem_free+0x90/0x90 [nouveau]
[ 3404.214454]  nouveau_bo_init+0x90/0xa0 [nouveau]
[ 3404.216180]  ? nouveau_ttm_io_mem_free+0x90/0x90 [nouveau]
[ 3404.218047]  nouveau_gem_new+0xe9/0x190 [nouveau]
[ 3404.219953]  nouveau_gem_ioctl_new+0xaa/0x150 [nouveau]
[ 3404.222151]  ? nouveau_gem_new+0x190/0x190 [nouveau]
[ 3404.224316]  drm_ioctl_kernel+0xd7/0x130 [drm]
[ 3404.224998]  ? nouveau_gem_new+0x190/0x190 [nouveau]
[ 3404.226734]  drm_ioctl+0x28c/0x4a0 [drm]
[ 3404.227516]  ? __fdget+0xf/0x10
[ 3404.227562]  ? __rcu_read_unlock+0x53/0x70
[ 3404.227608]  nouveau_drm_ioctl+0x8a/0x100 [nouveau]
[ 3404.229971]  __x64_sys_ioctl+0xb2/0xd0
[ 3404.229985]  do_syscall_64+0x36/0x80
[ 3404.230118]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3404.230159] RIP: 0033:0x7f1180b5a807
[ 3404.230170] Code: b3 66 90 48 8b 05 89 76 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 76 2d 00 f7 d8 64 89 01 48
[ 3404.230272] RSP: 002b:7ffca7b71fe8 EFLAGS: 0246 ORIG_RAX: 0010
[ 3404.230289] RAX: ffda RBX: 55abce254dc0 RCX: 7f1180b5a807
[ 3404.230374] RDX: 7ffca7b72040 RSI: c0306480 RDI: 000e
[ 3404.230388] RBP: 7ffca7b72040 R08:  R09: 000c
[ 3404.230454] R10: 0030 R11: 0246 R12: c0306480
[ 3404.230518] R13: 000e R14: 55abce27d9d0 R15: 55abcd76caf0
[ 3404.230535] Modules linked in: af_packet(E) ip6table_mangle(E) ip6table_raw(E) iptable_raw(E) 

Re: [Nouveau] drm/nouveau: lockdep circular locking dependency report

2021-06-27 Thread Mike Galbraith
I've now applied a revert of 551620f2a3816397266dfd812cd8b3be89f14be4
to all trees where lockdep may be enabled to re-hide the inversion.  It
thus won't ever remind me of its existence, thus I won't be inspired
to pass that reminder along.

-Mike

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] drm/nouveau: lockdep circular locking dependency report

2021-06-27 Thread Mike Galbraith
Having forgotten to boot nomodeset when running lockdep enabled
kernels, I was reminded that this gripe is still alive and well.

Graphics card is same old GTX-980 in same old box as last report.  It's
harmless other than mucking up testing, but since it reminded me again,
I'll pass it along again.

[   29.130076] ==
[   29.130079] WARNING: possible circular locking dependency detected
[   29.130081] 5.13.0.g625acff-master #4 Tainted: GE
[   29.130084] --
[   29.130087] X/2064 is trying to acquire lock:
[   29.130089] 888120a54518 (&cli->mutex){+.+.}-{3:3}, at: nouveau_bo_move+0x11c/0x830 [nouveau]
[   29.130160]
   but task is already holding lock:
[   29.130162] 888100912da0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: nouveau_bo_pin+0x2b/0x320 [nouveau]
[   29.130217]
   which lock already depends on the new lock.

[   29.130220]
   the existing dependency chain (in reverse order) is:
[   29.130223]
   -> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   29.130227]        lock_acquire+0x258/0x2f0
[   29.130232]        __ww_mutex_lock.constprop.17+0xbe/0x1090
[   29.130237]        nouveau_bo_pin+0x2b/0x320 [nouveau]
[   29.130285]        nouveau_channel_prep+0x106/0x2e0 [nouveau]
[   29.130328]        nouveau_channel_new+0x4f/0x760 [nouveau]
[   29.130369]        nouveau_abi16_ioctl_channel_alloc+0xdf/0x350 [nouveau]
[   29.130407]        drm_ioctl_kernel+0x8f/0xe0 [drm]
[   29.130430]        drm_ioctl+0x2db/0x380 [drm]
[   29.130446]        nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[   29.130495]        __x64_sys_ioctl+0x73/0xb0
[   29.130499]        do_syscall_64+0x39/0x80
[   29.130502]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   29.130506]
   -> #0 (&cli->mutex){+.+.}-{3:3}:
[   29.130510]        validate_chain+0xbb8/0x1740
[   29.130514]        __lock_acquire+0x8ab/0xc20
[   29.130516]        lock_acquire+0x258/0x2f0
[   29.130520]        __mutex_lock+0x95/0x9b0
[   29.130523]        nouveau_bo_move+0x11c/0x830 [nouveau]
[   29.130571]        ttm_bo_handle_move_mem+0x76/0x130 [ttm]
[   29.130576]        ttm_bo_validate+0x156/0x1b0 [ttm]
[   29.130581]        nouveau_bo_validate+0x48/0x70 [nouveau]
[   29.130628]        nouveau_bo_pin+0x1ec/0x320 [nouveau]
[   29.130673]        nv50_wndw_prepare_fb+0x53/0x4d0 [nouveau]
[   29.130715]        drm_atomic_helper_prepare_planes+0x87/0x110 [drm_kms_helper]
[   29.130731]        nv50_disp_atomic_commit+0xa9/0x1b0 [nouveau]
[   29.130776]        drm_atomic_helper_update_plane+0x10a/0x150 [drm_kms_helper]
[   29.130788]        drm_mode_cursor_universal+0x10b/0x220 [drm]
[   29.130810]        drm_mode_cursor_common+0x190/0x200 [drm]
[   29.130828]        drm_mode_cursor_ioctl+0x3d/0x50 [drm]
[   29.130845]        drm_ioctl_kernel+0x8f/0xe0 [drm]
[   29.130870]        drm_ioctl+0x2db/0x380 [drm]
[   29.130884]        nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[   29.130932]        __x64_sys_ioctl+0x73/0xb0
[   29.130936]        do_syscall_64+0x39/0x80
[   29.130939]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   29.130943]
   other info that might help us debug this:

[   29.130947]  Possible unsafe locking scenario:

[   29.130950]        CPU0                    CPU1
[   29.130952]        ----                    ----
[   29.130955]   lock(reservation_ww_class_mutex);
[   29.130958]                                lock(&cli->mutex);
[   29.130961]                                lock(reservation_ww_class_mutex);
[   29.130965]   lock(&cli->mutex);
[   29.130967]
*** DEADLOCK ***

[   29.130970] 3 locks held by X/2064:
[   29.130973]  #0: 888103ecfcf0 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_cursor_common+0x87/0x200 [drm]
[   29.130996]  #1: 8881209e00c0 (crtc_ww_class_mutex){+.+.}-{3:3}, at: drm_modeset_backoff+0xe4/0x190 [drm]
[   29.131020]  #2: 888100912da0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: nouveau_bo_pin+0x2b/0x320 [nouveau]
[   29.131073]
   stack backtrace:
[   29.131076] CPU: 5 PID: 2064 Comm: X Kdump: loaded Tainted: GE     5.13.0.g625acff-master #4
[   29.131081] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[   29.131084] Call Trace:
[   29.131087]  dump_stack+0x7f/0xad
[   29.131091]  check_noncircular+0x10c/0x120
[   29.131096]  ? nvkm_vmm_map+0xca/0x3c0 [nouveau]
[   29.131144]  ? validate_chain+0xbb8/0x1740
[   29.131148]  validate_chain+0xbb8/0x1740
[   29.131154]  __lock_acquire+0x8ab/0xc20
[   29.131158]  lock_acquire+0x258/0x2f0
[   29.131162]  ? nouveau_bo_move+0x11c/0x830 [nouveau]
[   29.131212]  __mutex_lock+0x95/0x9b0
[   29.131216]  ? nouveau_bo_move+0x11c/0x830 [nouveau]
[   29.131265]  ? nvif_vmm_map+0xf4/0x110 [nouveau]
[   29.131291]  ? nouveau_bo_move+0x11c/0x830 [nouveau]
[   29.131341]  ? nouveau_bo_move+0x11c/0x830 [nouveau]
[   29.131390]  
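
Stripped of the stack traces, the "possible unsafe locking scenario" above is two lock-order edges pointing in opposite directions. Below is a minimal Python sketch of that bookkeeping, assuming a much simplified model in which every acquisition records a "held -> new" edge. LockOrderGraph and its methods are hypothetical names made up for illustration; real lockdep tracks far more state than this.

```python
class LockOrderGraph:
    """Toy model of lock-order tracking (hypothetical, not lockdep's real structures)."""

    def __init__(self):
        self.edges = {}  # lock name -> set of locks taken while it was held

    def _reachable(self, src, dst):
        # Depth-first search: is there an ordering path src -> ... -> dst?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, held, new):
        """Record that `new` is taken while `held` is held.

        Returns True if the new edge closes a cycle (a possible ABBA deadlock).
        """
        closes_cycle = self._reachable(new, held)
        self.edges.setdefault(held, set()).add(new)
        return closes_cycle


graph = LockOrderGraph()
# Existing dependency (#1 in the report): reservation_ww_class_mutex
# already depends on cli->mutex.
first = graph.acquire("cli->mutex", "reservation_ww_class_mutex")
# New acquisition (#0 in the report): cli->mutex is wanted while
# reservation_ww_class_mutex is held.
second = graph.acquire("reservation_ww_class_mutex", "cli->mutex")
print(first, second)
```

Only the second acquisition is flagged, because only then does a path already exist in the opposite direction. That is also why the report fires on the cursor ioctl path even though neither path is wrong on its own.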

Re: [Nouveau] [bisected] Re: nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-15 Thread Mike Galbraith
On Mon, 2021-03-15 at 09:53 +0100, Mike Galbraith wrote:
> On Mon, 2021-03-15 at 09:05 +0100, Christian König wrote:
> > Hi Mike,
> >
> > I'm pretty sure your bisection is a bit off.
>
> (huh?) Ah crap, yup, the spew from hell you plugged obliterated the
> lockdep gripe I was grepping for as go/nogo, and off into lala land we
> go.. twice.. whee :)  Oh well, the ordering gripe is clear enough
> without a whodoneit.

However, after having rummaged around, two minutes with gitk was enough
for 551620f2 to flash neon red, and one build later confirm it.

-Mike



Re: [Nouveau] [bisected] Re: nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-15 Thread Mike Galbraith
On Mon, 2021-03-15 at 09:05 +0100, Christian König wrote:
> Hi Mike,
>
> I'm pretty sure your bisection is a bit off.

(huh?) Ah crap, yup, the spew from hell you plugged obliterated the
lockdep gripe I was grepping for as go/nogo, and off into lala land we
go.. twice.. whee :)  Oh well, the ordering gripe is clear enough
without a whodoneit.

-Mike



[Nouveau] [bisected] Re: nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-13 Thread Mike Galbraith
This little bugger bisected to...

b73cd1e2ebfc "drm/ttm: stop destroying pinned ghost object"

...and (the second time around) was confirmed on the spot.  However,
while the fingered commit still reverts cleanly, doing so at HEAD does
not make lockdep return to happy camper state (leading to bisection
#2), ie the fingered commit is only the beginning of nouveau's 5.12
cycle lockdep woes.

homer:..kernel/linux-master # quilt applied|grep revert
patches/revert-drm-ttm-Remove-pinned-bos-from-LRU-in-ttm_bo_move_to_lru_tail-v2.patch
patches/revert-drm-ttm-cleanup-LRU-handling-further.patch
patches/revert-drm-ttm-use-pin_count-more-extensively.patch
patches/revert-drm-ttm-stop-destroying-pinned-ghost-object.patch

That still ain't enough to appease lockdep at HEAD.  I'm not going to
muck about with it beyond that, since this looks a whole lot like yet
another example of "fixing stuff exposes other busted stuff".

On Wed, 2021-03-10 at 10:58 +0100, Mike Galbraith wrote:
> [   29.966927] ==
> [   29.966929] WARNING: possible circular locking dependency detected
> [   29.966932] 5.12.0.g05a59d7-master #2 Tainted: GW   E
> [   29.966934] --
> [   29.966937] X/2145 is trying to acquire lock:
> [   29.966939] 888120714518 (&cli->mutex){+.+.}-{3:3}, at: nouveau_bo_move+0x11f/0x980 [nouveau]
> [   29.967002]
>but task is already holding lock:
> [   29.967004] 888123c201a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: nouveau_bo_pin+0x2b/0x310 [nouveau]
> [   29.967053]
>which lock already depends on the new lock.
>
> [   29.967056]
>the existing dependency chain (in reverse order) is:
> [   29.967058]
>-> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
> [   29.967063]__ww_mutex_lock.constprop.16+0xbe/0x10d0
> [   29.967069]nouveau_bo_pin+0x2b/0x310 [nouveau]
> [   29.967112]nouveau_channel_prep+0x106/0x2e0 [nouveau]
> [   29.967151]nouveau_channel_new+0x4f/0x760 [nouveau]
> [   29.967188]nouveau_abi16_ioctl_channel_alloc+0xdf/0x350 [nouveau]
> [   29.967223]drm_ioctl_kernel+0x91/0xe0 [drm]
> [   29.967245]drm_ioctl+0x2db/0x380 [drm]
> [   29.967259]nouveau_drm_ioctl+0x56/0xb0 [nouveau]
> [   29.967303]__x64_sys_ioctl+0x76/0xb0
> [   29.967307]do_syscall_64+0x33/0x40
> [   29.967310]entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   29.967314]
>-> #0 (>mutex){+.+.}-{3:3}:
> [   29.967318]__lock_acquire+0x1494/0x1ac0
> [   29.967322]lock_acquire+0x23e/0x3b0
> [   29.967325]__mutex_lock+0x95/0x9d0
> [   29.967330]nouveau_bo_move+0x11f/0x980 [nouveau]
> [   29.967377]ttm_bo_handle_move_mem+0x79/0x130 [ttm]
> [   29.967384]ttm_bo_validate+0x156/0x1b0 [ttm]
> [   29.967390]nouveau_bo_validate+0x48/0x70 [nouveau]
> [   29.967438]nouveau_bo_pin+0x1de/0x310 [nouveau]
> [   29.967487]nv50_wndw_prepare_fb+0x53/0x4d0 [nouveau]
> [   29.967531]drm_atomic_helper_prepare_planes+0x8a/0x110 
> [drm_kms_helper]
> [   29.967547]nv50_disp_atomic_commit+0xa9/0x1b0 [nouveau]
> [   29.967593]drm_atomic_helper_update_plane+0x10a/0x150 
> [drm_kms_helper]
> [   29.967606]drm_mode_cursor_universal+0x10b/0x220 [drm]
> [   29.967627]drm_mode_cursor_common+0x190/0x200 [drm]
> [   29.967648]drm_mode_cursor_ioctl+0x3d/0x50 [drm]
> [   29.967669]drm_ioctl_kernel+0x91/0xe0 [drm]
> [   29.967684]drm_ioctl+0x2db/0x380 [drm]
> [   29.967699]nouveau_drm_ioctl+0x56/0xb0 [nouveau]
> [   29.967748]__x64_sys_ioctl+0x76/0xb0
> [   29.967752]do_syscall_64+0x33/0x40
> [   29.967756]entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   29.967760]
>other info that might help us debug this:
>
> [   29.967764]  Possible unsafe locking scenario:
>
> [   29.967767]        CPU0                    CPU1
> [   29.967770]        ----                    ----
> [   29.967772]   lock(reservation_ww_class_mutex);
> [   29.967776]                                lock(&cli->mutex);
> [   29.967779]                                lock(reservation_ww_class_mutex);
> [   29.967783]   lock(&cli->mutex);
> [   29.967786]
> *** DEADLOCK ***
>
> [   29.967790] 3 locks held by X/2145:
> [   29.967792]  #0: 88810365bcf8 (crtc_ww_class_acquire){+.+.}-{0:0}, at: 
> drm_mode_cursor_common+0x87/0x200 [drm]
> [   29.967817]  #1: 888108d9e098 (crtc_ww_class_mutex){+.+.}-{3:3}, at: 
> drm_modeset_lock+0xc3/0xe0 [drm]
> [   29.967841]  #2: 888123c201a0 
>

[Nouveau] nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-10 Thread Mike Galbraith


[   29.966927] ==
[   29.966929] WARNING: possible circular locking dependency detected
[   29.966932] 5.12.0.g05a59d7-master #2 Tainted: GW   E
[   29.966934] --
[   29.966937] X/2145 is trying to acquire lock:
[   29.966939] 888120714518 (&cli->mutex){+.+.}-{3:3}, at: nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.967002]
   but task is already holding lock:
[   29.967004] 888123c201a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967053]
   which lock already depends on the new lock.

[   29.967056]
   the existing dependency chain (in reverse order) is:
[   29.967058]
   -> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   29.967063]        __ww_mutex_lock.constprop.16+0xbe/0x10d0
[   29.967069]        nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967112]        nouveau_channel_prep+0x106/0x2e0 [nouveau]
[   29.967151]        nouveau_channel_new+0x4f/0x760 [nouveau]
[   29.967188]        nouveau_abi16_ioctl_channel_alloc+0xdf/0x350 [nouveau]
[   29.967223]        drm_ioctl_kernel+0x91/0xe0 [drm]
[   29.967245]        drm_ioctl+0x2db/0x380 [drm]
[   29.967259]        nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[   29.967303]        __x64_sys_ioctl+0x76/0xb0
[   29.967307]        do_syscall_64+0x33/0x40
[   29.967310]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   29.967314]
   -> #0 (&cli->mutex){+.+.}-{3:3}:
[   29.967318]        __lock_acquire+0x1494/0x1ac0
[   29.967322]        lock_acquire+0x23e/0x3b0
[   29.967325]        __mutex_lock+0x95/0x9d0
[   29.967330]        nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.967377]        ttm_bo_handle_move_mem+0x79/0x130 [ttm]
[   29.967384]        ttm_bo_validate+0x156/0x1b0 [ttm]
[   29.967390]        nouveau_bo_validate+0x48/0x70 [nouveau]
[   29.967438]        nouveau_bo_pin+0x1de/0x310 [nouveau]
[   29.967487]        nv50_wndw_prepare_fb+0x53/0x4d0 [nouveau]
[   29.967531]        drm_atomic_helper_prepare_planes+0x8a/0x110 [drm_kms_helper]
[   29.967547]        nv50_disp_atomic_commit+0xa9/0x1b0 [nouveau]
[   29.967593]        drm_atomic_helper_update_plane+0x10a/0x150 [drm_kms_helper]
[   29.967606]        drm_mode_cursor_universal+0x10b/0x220 [drm]
[   29.967627]        drm_mode_cursor_common+0x190/0x200 [drm]
[   29.967648]        drm_mode_cursor_ioctl+0x3d/0x50 [drm]
[   29.967669]        drm_ioctl_kernel+0x91/0xe0 [drm]
[   29.967684]        drm_ioctl+0x2db/0x380 [drm]
[   29.967699]        nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[   29.967748]        __x64_sys_ioctl+0x76/0xb0
[   29.967752]        do_syscall_64+0x33/0x40
[   29.967756]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   29.967760]
   other info that might help us debug this:

[   29.967764]  Possible unsafe locking scenario:

[   29.967767]        CPU0                    CPU1
[   29.967770]        ----                    ----
[   29.967772]   lock(reservation_ww_class_mutex);
[   29.967776]                                lock(&cli->mutex);
[   29.967779]                                lock(reservation_ww_class_mutex);
[   29.967783]   lock(&cli->mutex);
[   29.967786]
*** DEADLOCK ***

[   29.967790] 3 locks held by X/2145:
[   29.967792]  #0: 88810365bcf8 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_cursor_common+0x87/0x200 [drm]
[   29.967817]  #1: 888108d9e098 (crtc_ww_class_mutex){+.+.}-{3:3}, at: drm_modeset_lock+0xc3/0xe0 [drm]
[   29.967841]  #2: 888123c201a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967896]
   stack backtrace:
[   29.967899] CPU: 6 PID: 2145 Comm: X Kdump: loaded Tainted: GW   E     5.12.0.g05a59d7-master #2
[   29.967904] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[   29.967908] Call Trace:
[   29.967911]  dump_stack+0x6d/0x89
[   29.967915]  check_noncircular+0xe7/0x100
[   29.967919]  ? nvkm_vram_map+0x48/0x50 [nouveau]
[   29.967959]  ? __lock_acquire+0x1494/0x1ac0
[   29.967963]  __lock_acquire+0x1494/0x1ac0
[   29.967967]  lock_acquire+0x23e/0x3b0
[   29.967971]  ? nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968020]  __mutex_lock+0x95/0x9d0
[   29.968024]  ? nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968070]  ? nvif_vmm_map+0xf4/0x110 [nouveau]
[   29.968093]  ? nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968137]  ? lock_release+0x160/0x280
[   29.968141]  ? nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968184]  nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.968226]  ? up_write+0x17/0x130
[   29.968229]  ? unmap_mapping_pages+0x53/0x110
[   29.968234]  ttm_bo_handle_move_mem+0x79/0x130 [ttm]
[   29.968240]  ttm_bo_validate+0x156/0x1b0 [ttm]
[   29.968247]  nouveau_bo_validate+0x48/0x70 [nouveau]
[   29.968289]  nouveau_bo_pin+0x1de/0x310 [nouveau]
[   29.968330]  nv50_wndw_prepare_fb+0x53/0x4d0 [nouveau]
[   

Re: [Nouveau] drm/nouneau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 14:26 +0100, Christian König wrote:
>
> Am 10.02.21 um 13:22 schrieb Mike Galbraith:
> > On Wed, 2021-02-10 at 12:44 +0100, Christian König wrote:
> >> Please try to add a "return NULL" at the beginning of ttm_pool_type_take().
> >>
> >> That should effectively disable using the pool.
> > That did away with the yield looping, but it doesn't take long for the
> > display to freeze.  I ssh'd in from lappy, but there was nada in dmesg.
>
> Yeah, that is expected. Without taking pages from the pool we leak
> memory like a sieve.
>
> At least we could narrow down the problem quite a bit with that.
>
> Can you test the attached patch and see if it helps?

Yup, that seems to have fixed it all up.  Another one bites the dust ;)

-Mike
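
Why the "return NULL" experiment was expected to leak can be shown with a toy model. The sketch below is Python, not the TTM code, and assumes freed pages are always parked in the pool and only the take path ever hands them back. PagePool, churn and both counters are hypothetical names chosen for illustration.

```python
class PagePool:
    """Toy page pool (hypothetical; does not mirror the real ttm_pool code)."""

    def __init__(self, take_enabled=True):
        self.take_enabled = take_enabled
        self.cached = []          # pages parked in the pool after free()
        self.allocated_total = 0  # pages ever requested from the "system"

    def take(self):
        if not self.take_enabled:
            return None           # models the suggested early "return NULL"
        return self.cached.pop() if self.cached else None

    def alloc(self):
        page = self.take()
        if page is None:
            # Pool had nothing to offer: fall back to a fresh allocation.
            self.allocated_total += 1
            page = self.allocated_total
        return page

    def free(self, page):
        # Assumption: freed pages always go back to the pool.
        self.cached.append(page)


def churn(pool, rounds):
    """Allocate and free one page `rounds` times; return (parked, ever allocated)."""
    for _ in range(rounds):
        pool.free(pool.alloc())
    return len(pool.cached), pool.allocated_total


print(churn(PagePool(take_enabled=True), 1000))
print(churn(PagePool(take_enabled=False), 1000))
```

With the take path working, one page is recycled forever. With it short-circuited, every round costs a fresh page and the parked pile only grows, which fits a display that freezes after a while with nothing in dmesg.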



Re: [Nouveau] drm/nouneau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 12:44 +0100, Christian König wrote:
> Please try to add a "return NULL" at the beginning of ttm_pool_type_take().
>
> That should effectively disable using the pool.

That did away with the yield looping, but it doesn't take long for the
display to freeze.  I ssh'd in from lappy, but there was nada in dmesg.

> Thanks for testing,

Happy to.

-Mike



Re: [Nouveau] drm/nouneau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 11:52 +0100, Christian König wrote:
>
>
> You could try to replace the "for (order = min(MAX_ORDER - 1UL,
> __fls(num_pages)); num_pages;" in ttm_pool_alloc() with "for (order = 0;
> num_pages;" to get the old behavior.

That's a nogo too.

-Mike
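
For reference, the two loop headers being compared differ only in the starting order. The Python model below is a rough sketch under two assumptions that are not in the quoted text: that MAX_ORDER is 11, and that the order is re-clamped on every iteration so a chunk never exceeds what is left. fls and chunk_orders are illustrative names, not kernel functions.

```python
def fls(n: int) -> int:
    """Index of the most significant set bit, mirroring the kernel's __fls()."""
    if n <= 0:
        raise ValueError("fls() is undefined for n <= 0")
    return n.bit_length() - 1


def chunk_orders(num_pages: int, max_order: int = 11, start_high: bool = True) -> list:
    """Return the allocation order used for each chunk until num_pages is covered.

    start_high=True  models  order = min(MAX_ORDER - 1, __fls(num_pages))
    start_high=False models  order = 0  (the suggested "old behavior")
    """
    orders = []
    if num_pages <= 0:
        return orders
    order = min(max_order - 1, fls(num_pages)) if start_high else 0
    while num_pages:
        # Assumption: never pick a chunk larger than the pages still needed.
        order = min(order, fls(num_pages))
        orders.append(order)
        num_pages -= 1 << order
    return orders


print(chunk_orders(13))                    # largest chunks first
print(chunk_orders(13, start_high=False))  # one page at a time
```

In this model a 13 page request becomes chunks of 8, 4 and 1 pages with the new header, and thirteen single pages with the old one. Forcing order 0 therefore only changes chunk sizes.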



Re: [Nouveau] drm/nouneau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 11:42 +0100, Christian König wrote:
>
> Am 10.02.21 um 11:40 schrieb Mike Galbraith:
> > On Wed, 2021-02-10 at 11:34 +0100, Christian König wrote:
> >> Hi Mike,
> >>
> >> do you have more information than just system stuck in a loop?
> > No, strace shows no syscalls but sched_yield().
>
> Well you can try to comment out the call to register_shrinker() in
> ttm_pool.c, but apart from that I don't have many ideas.

Nogo.. off to suggestion #2 I go.

-Mike



Re: [Nouveau] drm/nouneau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
On Wed, 2021-02-10 at 11:34 +0100, Christian König wrote:
> Hi Mike,
>
> do you have more information than just system stuck in a loop?

No, strace shows no syscalls but sched_yield().

-Mike



[Nouveau] drm/nouneau: 5.11 cycle regression bisected to 461619f5c324 "drm/nouveau: switch to new allocator"

2021-02-10 Thread Mike Galbraith
Greetings,

The symptom is tasks stuck waiting for lord knows what by calling
sched_yield() in a loop (less than wonderful, sched_yield() sucks).
After boot to KDE login, I immediately see tracker-extract chewing cpu
in aforementioned loop. Firing up evolution and poking 'new' to
compose, WebKitWebProcess joins in the yield loop fun.
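
For anyone who hasn't met the pattern, a yield loop looks roughly like the Python sketch below. It is an illustration only: what the stuck tasks are actually polling for is unknown, and yield_wait and TrueAfter are hypothetical names invented here.

```python
import os


def yield_wait(condition, max_yields=1_000_000):
    """Poll condition(), giving up the CPU with sched_yield() between checks."""
    yields = 0
    while not condition():
        os.sched_yield()  # the only syscall this loop itself issues
        yields += 1
        if yields >= max_yields:
            raise TimeoutError("condition never became true")
    return yields


class TrueAfter:
    """Stand-in condition that reports False for the first `checks` calls."""

    def __init__(self, checks):
        self.remaining = checks

    def __call__(self):
        self.remaining -= 1
        return self.remaining < 0


spins = yield_wait(TrueAfter(5))
print(spins)
```

If the condition never flips, the loop never sleeps. It just keeps handing the CPU back and asking for it again, which is why it shows up as a task chewing cpu.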

Hand rolled reverts of 256dd44b "drm/ttm: nuke old page allocator" and
the fingered commit cure the problem for me at 207665fd in the bisect
log below, and at master and tip HEAD.

There's a "things that make ya go hmm" aspect to this thing though.  If
you look at the bisect log below, the starting "bad" is 207665fd.  That
commit DOES NOT exhibit the yield loop symptom immediately out of the
box, but DOES after applying the much needed fix...

660a59953f4f "drm/nouveau: fix multihop when move doesn't work"

...to prevent an earlier regression from quickly appearing, one which
Dave will likely recall having fixed.  Relevant?  No idea, but seems
worth mentioning.

Box: aging generic i4790 box with its equally aged Nvidia GTX 980.


461619f5c3242aaee9ec3f0b7072719bd86ea207 is the first bad commit
commit 461619f5c3242aaee9ec3f0b7072719bd86ea207
Author: Christian König 
Date:   Sat Oct 24 13:13:25 2020 +0200

drm/nouveau: switch to new allocator

It should be able to handle all cases now.

Signed-off-by: Christian König 
Reviewed-by: Dave Airlie 
Reviewed-by: Madhav Chauhan 
Tested-by: Huang Rui 
Link: https://patchwork.freedesktop.org/patch/397082/?series=83051&rev=1

 drivers/gpu/drm/nouveau/nouveau_bo.c  | 30 ++
 drivers/gpu/drm/nouveau/nouveau_drv.h |  1 -
 2 files changed, 2 insertions(+), 29 deletions(-)

git bisect start
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 3f995f8e0b540342612d3f6b1fc299f5bf486987
# bad: [207665fd37561f97591e74d0ee80f24bdf06b789] Merge tag 'exynos-drm-next-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
git bisect bad 207665fd37561f97591e74d0ee80f24bdf06b789
# good: [f8394f232b1eab649ce2df5c5f15b0e528c92091] Linux 5.10-rc3
git bisect good f8394f232b1eab649ce2df5c5f15b0e528c92091
# good: [b3bf99daaee96a141536ce5c60a0d6dba6ec1d23] drm/i915/display: Defer initial modeset until after GGTT is initialised
git bisect good b3bf99daaee96a141536ce5c60a0d6dba6ec1d23
# good: [dfbbfe3c17651fa0fcf2658fb90317df08e52bb2] drm/amd/display: Add formats for DCC with 2/3 planes.
git bisect good dfbbfe3c17651fa0fcf2658fb90317df08e52bb2
# bad: [112e505a76de69f8667e2fe8da38433f754364a8] Merge drm/drm-next into drm-misc-next
git bisect bad 112e505a76de69f8667e2fe8da38433f754364a8
# bad: [49a3f51dfeeecb52c5aa28c5cb9592fe5e39bf95] drm/gem: Use struct dma_buf_map in GEM vmap ops and convert GEM backends
git bisect bad 49a3f51dfeeecb52c5aa28c5cb9592fe5e39bf95
# bad: [d7e0798925ea9272f8c8e66ceb1f7c51823e50ab] dt-bindings: display: bridge: Intel KeemBay DSI
git bisect bad d7e0798925ea9272f8c8e66ceb1f7c51823e50ab
# bad: [c489573b5b6ce6442ad4658d9d5ec77839b91622] Merge drm/drm-next into drm-misc-next
git bisect bad c489573b5b6ce6442ad4658d9d5ec77839b91622
# bad: [8567d51555c12d169c4e0f796030051fff1c318d] drm/vmwgfx: switch to new allocator
git bisect bad 8567d51555c12d169c4e0f796030051fff1c318d
# good: [5144eead3f8c80ac7f913c07139442fede94003e] drm: xlnx: Use dma_request_chan for DMA channel request
git bisect good 5144eead3f8c80ac7f913c07139442fede94003e
# good: [e93b2da9799e5cb97760969f3e1f02a5bdac29fe] drm/amdgpu: switch to new allocator v2
git bisect good e93b2da9799e5cb97760969f3e1f02a5bdac29fe
# bad: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator
git bisect bad 461619f5c3242aaee9ec3f0b7072719bd86ea207
# good: [0fe3cf3a53b5c1205ec7d321be1185b075dff205] drm/radeon: switch to new allocator v2
git bisect good 0fe3cf3a53b5c1205ec7d321be1185b075dff205
# first bad commit: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [bisected] Re: regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop

2020-12-17 Thread Mike Galbraith
On Fri, 2020-12-18 at 05:45 +1000, David Airlie wrote:

> Does the attached patch help?

Yup, that seems to have done the trick.  Fast bug squashing by the drm
guys today, two slowly bisected, two quickly squashed.

-Mike



[Nouveau] [bisected] Re: regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop

2020-12-17 Thread Mike Galbraith
On Wed, 2020-12-16 at 14:31 +0100, Mike Galbraith wrote:
> When the below new to 5.11 cycle badness happens, it's time to reboot.
>
> ...
> [   27.467260] NFSD: Using UMH upcall client tracking operations.
> [   27.467273] NFSD: starting 90-second grace period (net f0a0)
> [   27.965138] Bridge firewalling registered
> [   39.096604] fuse: init (API version 7.32)
> [  961.579832] nouveau :01:00.0: fifo: fault 01 [WRITE] at 
> 0069f000 engine 15 [CE0] client 01 [HUB/CE0] reason 02 [PTE] on 
> channel 1 [00ff73d000 DRM]
> [  961.579840] nouveau :01:00.0: fifo: channel 1: killed
> [  961.579844] nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
> [  961.579850] nouveau :01:00.0: fifo: runlist 4: scheduled for recovery
> [  961.579853] nouveau :01:00.0: fifo: engine 4: scheduled for recovery
>
> Box is aging generic i4790 desktop box with...
> 01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] 
> (rev a1)

Bisection was straightforward.  A post-bisect test revert was equally
straightforward, and seems to confirm the fingered commit.

0c8c0659d7475b6304b67374caf15b56cf0be4f9 is the first bad commit
commit 0c8c0659d7475b6304b67374caf15b56cf0be4f9
Author: Dave Airlie 
Date:   Thu Oct 29 13:59:20 2020 +1000

drm/nouveau/ttm: use multihop

This removes the code to move resources directly between
SYSTEM and VRAM in favour of using the core ttm mulithop code.

Signed-off-by: Dave Airlie 
Acked-by: Daniel Vetter 
Reviewed-by: Christian König 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20201109005432.861936-4-airl...@gmail.com

 drivers/gpu/drm/nouveau/nouveau_bo.c | 112 ---
 1 file changed, 13 insertions(+), 99 deletions(-)

git bisect start 'drivers/gpu'
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [accefff5b547a9a1d959c7e76ad539bf2480e78b] Merge tag 'arm-soc-omap-genpd-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad accefff5b547a9a1d959c7e76ad539bf2480e78b
# bad: [d635a69dd4981cc51f90293f5f64268620ed1565] Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad d635a69dd4981cc51f90293f5f64268620ed1565
# bad: [0ca2ce81eb8ee30f3ba8ac7967fef9cfbb44dbdb] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
git bisect bad 0ca2ce81eb8ee30f3ba8ac7967fef9cfbb44dbdb
# good: [f8aab60422c371425365d386dfd51e0c6c5b1041] drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs
git bisect good f8aab60422c371425365d386dfd51e0c6c5b1041
# bad: [fab0fca1da5cdc48be051715cd9787df04fdce3a] Merge tag 'media/v5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect bad fab0fca1da5cdc48be051715cd9787df04fdce3a
# bad: [bcc68bd8161261ceeb1a4ab02b5265758944f90d] Merge tag 'auxdisplay-for-linus-v5.11' of git://github.com/ojeda/linux
git bisect bad bcc68bd8161261ceeb1a4ab02b5265758944f90d
# bad: [22f8c80566c4a29a0d8b5ebf24aa1fd1679b39e5] Merge tag 'drm-misc-next-2020-11-18' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-next
git bisect bad 22f8c80566c4a29a0d8b5ebf24aa1fd1679b39e5
# bad: [a1ac250a82a5e97db71f14101ff7468291a6aaef] fbcon: Avoid using FNTCHARCNT() and hard-coded built-in font charcount
git bisect bad a1ac250a82a5e97db71f14101ff7468291a6aaef
# good: [a39855076c859b7f6c58ed4da8f195a2a6cd3c7b] drm/cma-helper: Make default object functions the default
git bisect good a39855076c859b7f6c58ed4da8f195a2a6cd3c7b
# bad: [5f1f10998e7f0ba98a8efc27009cd9a11cff6616] drm/atmel-hlcdc/atmel_hlcdc_plane: Staticise local function 'atmel_hlcdc_plane_setup_scaler()'
git bisect bad 5f1f10998e7f0ba98a8efc27009cd9a11cff6616
# good: [55c8bcaeccaa5c6d9e7a432ebd0a1717f488a3f4] drm: mxsfb: Implement .format_mod_supported
git bisect good 55c8bcaeccaa5c6d9e7a432ebd0a1717f488a3f4
# bad: [0c8c0659d7475b6304b67374caf15b56cf0be4f9] drm/nouveau/ttm: use multihop
git bisect bad 0c8c0659d7475b6304b67374caf15b56cf0be4f9
# good: [23d6ab1d4c503660632e7b18cbb571d62d9bf792] drm: remove pgprot_decrypted() before calls to io_remap_pfn_range()
git bisect good 23d6ab1d4c503660632e7b18cbb571d62d9bf792
# good: [ebdf565169af006ee3be8c40eecbfc77d28a3b84] drm/ttm: add multihop infrastrucutre (v3)
git bisect good ebdf565169af006ee3be8c40eecbfc77d28a3b84
# good: [f5a89a5cae812a39993be32e74c8ed7856b1e2b2] drm/amdgpu/ttm: use multihop
git bisect good f5a89a5cae812a39993be32e74c8ed7856b1e2b2
# first bad commit: [0c8c0659d7475b6304b67374caf15b56cf0be4f9] drm/nouveau/ttm: use multihop
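
For readers new to bisect logs like the one above, the search behind them is essentially a binary search for the first bad revision. The sketch below is a simplified illustration only: it assumes a linear history indexed by integers, whereas git walks a commit graph, and every model_ name is hypothetical rather than anything git provides.

```c
#include <stddef.h>

/* Caller-supplied test: returns nonzero if revision `idx` shows the bug. */
typedef int (*model_is_bad_fn)(size_t idx, void *ctx);

/*
 * Find the first bad revision in a linear history, given that `good` is
 * known good, `bad` is known bad, and good < bad.  `steps` (optional)
 * receives the number of revisions that had to be tested.
 */
static size_t model_first_bad(size_t good, size_t bad,
			      model_is_bad_fn is_bad, void *ctx, int *steps)
{
	int n = 0;

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* corresponds to "git bisect bad" */
		else
			good = mid;	/* corresponds to "git bisect good" */
		n++;
	}
	if (steps)
		*steps = n;
	return bad;
}

/* Example predicate: every revision at or after *ctx is bad. */
static int model_bad_from(size_t idx, void *ctx)
{
	return idx >= *(size_t *)ctx;
}
```

Because the candidate range halves with every test, a window of many thousands of commits needs only on the order of a dozen builds, which is why the log stays this short.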



[Nouveau] regression: nouveau fifo: fault 01 ==> channel 1: killed ==> dead desktop

2020-12-16 Thread Mike Galbraith
When the below new to 5.11 cycle badness happens, it's time to reboot.

...
[   27.467260] NFSD: Using UMH upcall client tracking operations.
[   27.467273] NFSD: starting 90-second grace period (net f0a0)
[   27.965138] Bridge firewalling registered
[   39.096604] fuse: init (API version 7.32)
[  961.579832] nouveau :01:00.0: fifo: fault 01 [WRITE] at 0069f000 
engine 15 [CE0] client 01 [HUB/CE0] reason 02 [PTE] on channel 1 [00ff73d000 
DRM]
[  961.579840] nouveau :01:00.0: fifo: channel 1: killed
[  961.579844] nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
[  961.579850] nouveau :01:00.0: fifo: runlist 4: scheduled for recovery
[  961.579853] nouveau :01:00.0: fifo: engine 4: scheduled for recovery

Box is aging generic i4790 desktop box with...
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] 
(rev a1)

-Mike



[Nouveau] nouveau: WARNING: CPU: 0 PID: 20957 at drivers/gpu/drm/nouveau/nvif/vmm.c:71

2020-11-19 Thread Mike Galbraith
[15561.391527] [ cut here ]
[15561.391560] WARNING: CPU: 0 PID: 20957 at 
drivers/gpu/drm/nouveau/nvif/vmm.c:71 nvif_vmm_put+0x4a/0x50 [nouveau]
[15561.391562] Modules linked in: nls_utf8(E) isofs(E) fuse(E) msr(E) 
xt_comment(E) br_netfilter(E) xt_physdev(E) nfnetlink_cthelper(E) nfnetlink(E) 
ebtable_filter(E) ebtables(E) af_packet(E) bridge(E) stp(E) llc(E) 
iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) xt_pkttype(E) xt_tcpudp(E) 
ip6t_REJECT(E) nf_reject_ipv6(E) ipt_REJECT(E) nf_reject_ipv4(E) 
iptable_filter(E) bpfilter(E) ip6table_mangle(E) ip_tables(E) xt_conntrack(E) 
nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) hid_logitech_hidpp(E) sr_mod(E) 
usblp(E) cdrom(E) hid_logitech_dj(E) joydev(E) intel_rapl_msr(E) 
intel_rapl_common(E) at24(E) mei_hdcp(E) iTCO_wdt(E) regmap_i2c(E) 
intel_pmc_bxt(E) iTCO_vendor_support(E) snd_hda_codec_realtek(E) 
snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_codec_hdmi(E) 
x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_intel(E) coretemp(E) 
snd_intel_dspcfg(E) kvm_intel(E) snd_hda_codec(E) snd_hwdep(E) snd_hda_core(E) 
kvm(E)
[15561.391586]  nls_iso8859_1(E) nls_cp437(E) snd_pcm(E) irqbypass(E) 
crct10dif_pclmul(E) snd_timer(E) crc32_pclmul(E) r8169(E) 
ghash_clmulni_intel(E) snd(E) aesni_intel(E) realtek(E) crypto_simd(E) 
i2c_i801(E) mei_me(E) mdio_devres(E) cryptd(E) pcspkr(E) soundcore(E) 
i2c_smbus(E) lpc_ich(E) glue_helper(E) mfd_core(E) libphy(E) mei(E) fan(E) 
thermal(E) intel_smartconnect(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) 
grace(E) sch_fq_codel(E) sunrpc(E) nfs_ssc(E) uas(E) usb_storage(E) 
hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) 
syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) xhci_pci(E) cec(E) 
ahci(E) rc_core(E) ehci_pci(E) xhci_hcd(E) ttm(E) libahci(E) ehci_hcd(E) 
libata(E) drm(E) usbcore(E) video(E) button(E) sd_mod(E) t10_pi(E) vfat(E) 
fat(E) virtio_blk(E) virtio_mmio(E) virtio_ring(E) virtio(E) ext4(E) 
crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E) 
dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod
 (E)
[15561.391626]  efivarfs(E) autofs4(E)
[15561.391637] CPU: 0 PID: 20957 Comm: kworker/0:4 Kdump: loaded Tainted: G S   
   E 5.10.0.g9c87c9f-master #3
[15561.391640] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[15561.391667] Workqueue: events nouveau_cli_work [nouveau]
[15561.391682] RIP: 0010:nvif_vmm_put+0x4a/0x50 [nouveau]
[15561.391684] Code: 00 00 00 48 89 e2 48 c7 04 24 00 00 00 00 48 89 44 24 08 
e8 48 e7 ff ff 85 c0 75 0e 48 c7 43 08 00 00 00 00 48 83 c4 10 5b c3 <0f> 0b eb 
ee 66 90 0f 1f 44 00 00 53 48 83 ec 18 83 fe 01 48 8b 5c
[15561.391686] RSP: :8881feca7e08 EFLAGS: 00010282
[15561.391688] RAX: fffe RBX: 8881feca7e28 RCX: 
[15561.391690] RDX: 0010 RSI: 8881feca7d80 RDI: 8881feca7e18
[15561.391692] RBP: 8881feca7e50 R08: 01dc5000 R09: 
[15561.391693] R10: 82003de8 R11: fefefefefefefeff R12: dead0122
[15561.391695] R13: dead0100 R14: 888102fa9328 R15: 888102fa9308
[15561.391697] FS:  () GS:88841ec0() 
knlGS:
[15561.391698] CS:  0010 DS:  ES:  CR0: 80050033
[15561.391700] CR2: 7fd692058000 CR3: 03c10002 CR4: 001706f0
[15561.391701] Call Trace:
[15561.391729]  nouveau_vma_del+0x58/0xa0 [nouveau]
[15561.391755]  nouveau_gem_object_delete_work+0x26/0x40 [nouveau]
[15561.391782]  nouveau_cli_work+0x76/0x120 [nouveau]
[15561.391786]  ? __schedule+0x35c/0x770
[15561.391790]  process_one_work+0x1f5/0x3c0
[15561.391792]  ? process_one_work+0x3c0/0x3c0
[15561.391794]  worker_thread+0x2d/0x3d0
[15561.391796]  ? process_one_work+0x3c0/0x3c0
[15561.391798]  kthread+0x117/0x130
[15561.391800]  ? kthread_park+0x90/0x90
[15561.391803]  ret_from_fork+0x1f/0x30
[15561.391806] ---[ end trace 1f8ba448e97e64e0 ]---



Re: [Nouveau] kvm+nouveau induced lockdep gripe

2020-10-23 Thread Mike Galbraith
On Sat, 2020-10-24 at 10:22 +0800, Hillf Danton wrote:
>
> Looks like we can break the lock chain by moving ttm bo's release
> method out of mmap_lock, see diff below.

Ah, the perfect complement to morning java, a patchlet to wedge in and
see what happens.

wedge/build/boot 

Mmm, box says no banana... a lot.

[   30.456921] 
[   30.456924] WARNING: inconsistent lock state
[   30.456928] 5.9.0.gf11901e-master #2 Tainted: G S  E
[   30.456932] 
[   30.456935] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   30.456940] ksoftirqd/4/36 [HC0[0]:SC1[1]:HE1:SE0] takes:
[   30.456944] 8e2c8bde9e40 (>vm_lock){++?+}-{2:2}, at: 
drm_vma_offset_remove+0x14/0x70 [drm]
[   30.456976] {SOFTIRQ-ON-W} state was registered at:
[   30.456982]   lock_acquire+0x1a7/0x3b0
[   30.456987]   _raw_write_lock+0x2f/0x40
[   30.457006]   drm_vma_offset_add+0x1c/0x60 [drm]
[   30.457013]   ttm_bo_init_reserved+0x28b/0x460 [ttm]
[   30.457020]   ttm_bo_init+0x57/0x110 [ttm]
[   30.457066]   nouveau_bo_init+0xb0/0xc0 [nouveau]
[   30.457108]   nouveau_bo_new+0x4d/0x60 [nouveau]
[   30.457145]   nv84_fence_create+0xb9/0x130 [nouveau]
[   30.457180]   nvc0_fence_create+0xe/0x47 [nouveau]
[   30.457221]   nouveau_drm_device_init+0x3d9/0x800 [nouveau]
[   30.457262]   nouveau_drm_probe+0xfb/0x200 [nouveau]
[   30.457268]   local_pci_probe+0x42/0x90
[   30.457272]   pci_device_probe+0xe7/0x1a0
[   30.457276]   really_probe+0xf7/0x4d0
[   30.457280]   driver_probe_device+0x5d/0x140
[   30.457284]   device_driver_attach+0x4f/0x60
[   30.457288]   __driver_attach+0xa4/0x140
[   30.457292]   bus_for_each_dev+0x67/0x90
[   30.457296]   bus_add_driver+0x18c/0x230
[   30.457299]   driver_register+0x5b/0xf0
[   30.457304]   do_one_initcall+0x54/0x2f0
[   30.457309]   do_init_module+0x5b/0x21b
[   30.457314]   load_module+0x1e40/0x2370
[   30.457317]   __do_sys_finit_module+0x98/0xe0
[   30.457321]   do_syscall_64+0x33/0x40
[   30.457326]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.457329] irq event stamp: 366850
[   30.457335] hardirqs last  enabled at (366850): [] 
rcu_nocb_unlock_irqrestore+0x4f/0x60
[   30.457342] hardirqs last disabled at (366849): [] 
rcu_do_batch+0x59f/0x990
[   30.457347] softirqs last  enabled at (366834): [] 
__do_softirq+0x2d7/0x4a4
[   30.457357] softirqs last disabled at (366839): [] 
run_ksoftirqd+0x32/0x60
[   30.457363]
   other info that might help us debug this:
[   30.457369]  Possible unsafe locking scenario:

[   30.457375]CPU0
[   30.457378]
[   30.457381]   lock(>vm_lock);
[   30.457386]   
[   30.457389] lock(>vm_lock);
[   30.457394]
*** DEADLOCK ***
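
The rule behind a report like this can be sketched in user-space C. This is a toy model under stated assumptions, not lockdep's implementation: the model_ names and bit values are made up for illustration. The idea is that a lock class remembers how it has been used, and a history containing both "taken with softirqs enabled" (the probe-time drm_vma_offset_add() path) and "taken from softirq context" (the ksoftirqd drm_vma_offset_remove() path) is flagged.

```c
#include <stdbool.h>

/* Usage bits recorded per lock class in this model (hypothetical values). */
#define MODEL_USED_IN_SOFTIRQ		0x1u	/* taken from softirq context */
#define MODEL_USED_SOFTIRQ_ENABLED	0x2u	/* taken with softirqs enabled */

struct model_lock_class {
	unsigned int usage;
};

/*
 * Record one acquisition.  Returns false when the combined history shows
 * the lock both taken inside a softirq and taken with softirqs enabled,
 * meaning a softirq could interrupt a holder and spin on the same lock.
 */
static bool model_mark_acquire(struct model_lock_class *lc,
			       bool in_softirq, bool softirqs_enabled)
{
	const unsigned int both =
		MODEL_USED_IN_SOFTIRQ | MODEL_USED_SOFTIRQ_ENABLED;

	if (in_softirq)
		lc->usage |= MODEL_USED_IN_SOFTIRQ;
	if (softirqs_enabled)
		lc->usage |= MODEL_USED_SOFTIRQ_ENABLED;

	return (lc->usage & both) != both;
}
```

In the model, a class that is only ever taken with softirqs disabled stays consistent, whether or not one of those acquisitions happens in softirq context.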





Re: [Nouveau] kvm+nouveau induced lockdep gripe

2020-10-23 Thread Mike Galbraith
On Fri, 2020-10-23 at 11:01 +0200, Sebastian Andrzej Siewior wrote:
> On 2020-10-22 07:28:20 [+0200], Mike Galbraith wrote:
> > I've only as yet seen nouveau lockdep gripage when firing up one of my
> > full distro KVM's.
>
> Could you please check !RT with the `threadirqs' command line option? I
> don't think RT is doing here anything different (except for having
> threaded interrupts enabled by default).

Yup, you are correct, RT is innocent.


[   70.135201] ==
[   70.135206] WARNING: possible circular locking dependency detected
[   70.135211] 5.9.0.gf989335-master #1 Tainted: GE
[   70.135216] --
[   70.135220] libvirtd/1838 is trying to acquire lock:
[   70.135225] 983590c2d5a8 (>mmap_lock#2){}-{3:3}, at: 
mpol_rebind_mm+0x1e/0x50
[   70.135239]
   but task is already holding lock:
[   70.135244] 8a585410 (_rwsem){}-{0:0}, at: 
cpuset_attach+0x38/0x390
[   70.135256]
   which lock already depends on the new lock.

[   70.135261]
   the existing dependency chain (in reverse order) is:
[   70.135266]
   -> #3 (_rwsem){}-{0:0}:
[   70.135275]cpuset_read_lock+0x39/0xd0
[   70.135282]__sched_setscheduler+0x456/0xa90
[   70.135287]_sched_setscheduler+0x69/0x70
[   70.135292]__kthread_create_on_node+0x114/0x170
[   70.135297]kthread_create_on_node+0x37/0x40
[   70.135306]setup_irq_thread+0x37/0x90
[   70.135312]__setup_irq+0x4e0/0x7c0
[   70.135318]request_threaded_irq+0xf8/0x160
[   70.135371]nvkm_pci_oneinit+0x4c/0x70 [nouveau]
[   70.135399]nvkm_subdev_init+0x60/0x1e0 [nouveau]
[   70.135449]nvkm_device_init+0x10b/0x240 [nouveau]
[   70.135506]nvkm_udevice_init+0x49/0x70 [nouveau]
[   70.135531]nvkm_object_init+0x3d/0x180 [nouveau]
[   70.13]nvkm_ioctl_new+0x1a1/0x260 [nouveau]
[   70.135578]nvkm_ioctl+0x10a/0x240 [nouveau]
[   70.135600]nvif_object_ctor+0xeb/0x150 [nouveau]
[   70.135622]nvif_device_ctor+0x1f/0x60 [nouveau]
[   70.135668]nouveau_cli_init+0x1ac/0x590 [nouveau]
[   70.135711]nouveau_drm_device_init+0x68/0x800 [nouveau]
[   70.135753]nouveau_drm_probe+0xfb/0x200 [nouveau]
[   70.135761]local_pci_probe+0x42/0x90
[   70.135767]pci_device_probe+0xe7/0x1a0
[   70.135773]really_probe+0xf7/0x4d0
[   70.135779]driver_probe_device+0x5d/0x140
[   70.135785]device_driver_attach+0x4f/0x60
[   70.135790]__driver_attach+0xa4/0x140
[   70.135796]bus_for_each_dev+0x67/0x90
[   70.135801]bus_add_driver+0x18c/0x230
[   70.135807]driver_register+0x5b/0xf0
[   70.135813]do_one_initcall+0x54/0x2f0
[   70.135819]do_init_module+0x5b/0x21b
[   70.135825]load_module+0x1e40/0x2370
[   70.135830]__do_sys_finit_module+0x98/0xe0
[   70.135836]do_syscall_64+0x33/0x40
[   70.135842]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   70.135847]
   -> #2 (>mutex){+.+.}-{3:3}:
[   70.135857]__mutex_lock+0x90/0x9c0
[   70.135902]nvkm_udevice_fini+0x23/0x70 [nouveau]
[   70.135927]nvkm_object_fini+0xb8/0x210 [nouveau]
[   70.135951]nvkm_object_fini+0x73/0x210 [nouveau]
[   70.135974]nvkm_ioctl_del+0x7e/0xa0 [nouveau]
[   70.135997]nvkm_ioctl+0x10a/0x240 [nouveau]
[   70.136019]nvif_object_dtor+0x4a/0x60 [nouveau]
[   70.136040]nvif_client_dtor+0xe/0x40 [nouveau]
[   70.136085]nouveau_cli_fini+0x7a/0x90 [nouveau]
[   70.136128]nouveau_drm_postclose+0xaa/0xe0 [nouveau]
[   70.136150]drm_file_free.part.7+0x273/0x2c0 [drm]
[   70.136165]drm_release+0x6e/0xf0 [drm]
[   70.136171]__fput+0xb2/0x260
[   70.136177]task_work_run+0x73/0xc0
[   70.136183]exit_to_user_mode_prepare+0x1a5/0x1d0
[   70.136189]syscall_exit_to_user_mode+0x46/0x2a0
[   70.136195]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   70.136200]
   -> #1 (>lock){+.+.}-{3:3}:
[   70.136209]__mutex_lock+0x90/0x9c0
[   70.136252]nouveau_mem_fini+0x4c/0x70 [nouveau]
[   70.136294]nouveau_sgdma_destroy+0x20/0x50 [nouveau]
[   70.136302]ttm_bo_cleanup_memtype_use+0x3e/0x60 [ttm]
[   70.136310]ttm_bo_release+0x29c/0x600 [ttm]
[   70.136317]ttm_bo_vm_close+0x15/0x30 [ttm]
[   70.136324]remove_vma+0x3e/0x70
[   70.136329]__do_munmap+0x2b7/0x4f0
[   70.136333]__vm_munmap+0x5b/0xa0
[   70.136338]__x64_sys_munmap+0x27/0x30
[   70.136343]do_syscall_64+0x33/0x40
[   70.136349]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   70.136354]
   -> #0 (>mmap_lock#2){}-{3:3}:
[   70.136365]__lock_acquire+0x149d/0x
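
The "circular locking dependency" check can be pictured as cycle detection on a graph of lock orderings. The following is a minimal sketch under that assumption, not the kernel's code: locks are plain integers, all model_ names are hypothetical, and the graph is a small fixed-size matrix. In terms of the report, the existing chain #0 -> #1 -> #2 -> #3 is already recorded, and the new acquisition asks for an edge from #3 back to #0.

```c
#include <stdbool.h>

#define MODEL_MAX_LOCKS 8

/* dep[a][b] is true when lock b has been acquired while holding lock a. */
struct model_lock_graph {
	bool dep[MODEL_MAX_LOCKS][MODEL_MAX_LOCKS];
};

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool model_reachable(const struct model_lock_graph *g, int from, int to,
			    bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MODEL_MAX_LOCKS; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    model_reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "acquire `next` while holding `held`".  Returns false, without
 * recording the edge, if doing so would close a cycle in the graph.
 */
static bool model_add_dependency(struct model_lock_graph *g, int held, int next)
{
	bool seen[MODEL_MAX_LOCKS] = { false };

	if (model_reachable(g, next, held, seen))
		return false;
	g->dep[held][next] = true;
	return true;
}
```

The point of checking orderings rather than waiting for an actual hang is that the warning fires the first time the conflicting order is observed, even if the tasks involved never collide at run time.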

[Nouveau] nouveau: BUG: Invalid wait context

2020-09-09 Thread Mike Galbraith
Greetings,

Box is an aging generic i4790 + GTX-980 desktop.

[ 1143.133663] =
[ 1143.133666] [ BUG: Invalid wait context ]
[ 1143.133671] 5.9.0.g34d4ddd-preempt #2 Tainted: G S  E
[ 1143.133675] -
[ 1143.133678] X/2015 is trying to lock:
[ 1143.133682] 8d3e9efd63d8 (>lock){..-.}-{3:3}, at: 
get_page_from_freelist+0x6ed/0x1e10
[ 1143.133694] other info that might help us debug this:
[ 1143.133697] context-{5:5}
[ 1143.133700] 4 locks held by X/2015:
[ 1143.133703]  #0: 8d3e562d30c0 (>mutex){+.+.}-{4:4}, at: 
nouveau_abi16_get+0x2c/0x60 [nouveau]
[ 1143.133756]  #1: a9a6c0c57d30 
(reservation_ww_class_acquire){+.+.}-{0:0}, at: drm_ioctl_kernel+0x91/0xe0 [drm]
[ 1143.133785]  #2: 8d3e3dcef1a0 (reservation_ww_class_mutex){+.+.}-{4:4}, 
at: nouveau_gem_ioctl_pushbuf+0x63b/0x1cb0 [nouveau]
[ 1143.133834]  #3: 8d3e9ec9ea10 (krc.lock){-.-.}-{2:2}, at: 
kvfree_call_rcu+0x65/0x210
[ 1143.133845] stack backtrace:
[ 1143.133850] CPU: 2 PID: 2015 Comm: X Kdump: loaded Tainted: G S  E   
  5.9.0.g34d4ddd-preempt #2
[ 1143.133856] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[ 1143.133862] Call Trace:
[ 1143.133872]  dump_stack+0x77/0x9b
[ 1143.133879]  __lock_acquire+0x629/0xbd0
[ 1143.133887]  ? try_to_wake_up+0x250/0x860
[ 1143.133894]  lock_acquire+0x92/0x390
[ 1143.133900]  ? get_page_from_freelist+0x6ed/0x1e10
[ 1143.133908]  ? __mutex_unlock_slowpath+0x124/0x280
[ 1143.133915]  _raw_spin_lock+0x2f/0x40
[ 1143.133921]  ? get_page_from_freelist+0x6ed/0x1e10
[ 1143.133927]  get_page_from_freelist+0x6ed/0x1e10
[ 1143.133937]  ? lock_acquire+0x92/0x390
[ 1143.133943]  ? kvfree_call_rcu+0x65/0x210
[ 1143.133949]  __alloc_pages_nodemask+0x173/0x3e0
[ 1143.133957]  __get_free_pages+0xd/0x40
[ 1143.133962]  kvfree_call_rcu+0x135/0x210
[ 1143.134002]  nouveau_fence_unref+0x36/0x50 [nouveau]
[ 1143.134045]  validate_fini_no_ticket.isra.8+0x138/0x240 [nouveau]
[ 1143.134090]  nouveau_gem_ioctl_pushbuf+0x10c8/0x1cb0 [nouveau]
[ 1143.134136]  ? nouveau_gem_ioctl_new+0xc0/0xc0 [nouveau]
[ 1143.134159]  ? drm_ioctl_kernel+0x91/0xe0 [drm]
[ 1143.134170]  drm_ioctl_kernel+0x91/0xe0 [drm]
[ 1143.134182]  drm_ioctl+0x2db/0x380 [drm]
[ 1143.134211]  ? nouveau_gem_ioctl_new+0xc0/0xc0 [nouveau]
[ 1143.134217]  ? _raw_spin_unlock_irqrestore+0x47/0x60
[ 1143.134222]  ? lockdep_hardirqs_on+0x78/0x100
[ 1143.134226]  ? _raw_spin_unlock_irqrestore+0x34/0x60
[ 1143.134257]  nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[ 1143.134263]  __x64_sys_ioctl+0x8e/0xd0
[ 1143.134267]  ? lockdep_hardirqs_on+0x78/0x100
[ 1143.134271]  do_syscall_64+0x33/0x40
[ 1143.134276]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1143.134280] RIP: 0033:0x7f02e2a84ac7
[ 1143.134284] Code: b3 66 90 48 8b 05 d1 13 2c 00 64 c7 00 26 00 00 00 48 c7 
c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d a1 13 2c 00 f7 d8 64 89 01 48
[ 1143.134293] RSP: 002b:7ffe2c2739e8 EFLAGS: 0246 ORIG_RAX: 
0010
[ 1143.134298] RAX: ffda RBX: 560c3f86ec08 RCX: 7f02e2a84ac7
[ 1143.134302] RDX: 7ffe2c273a50 RSI: c0406481 RDI: 000e
[ 1143.134306] RBP: 7ffe2c273a50 R08: 560c3f846820 R09: 560c3f87fc08
[ 1143.134310] R10: 560c401dbb38 R11: 0246 R12: c0406481
[ 1143.134314] R13: 000e R14: 560c3f846820 R15: 560c3f84a010



Re: [Nouveau] [patch] swiotlb: fix ignored DMA_ATTR_NO_WARN request

2018-05-12 Thread Mike Galbraith
To conclude this snail-like thread (/me=walking wounded), with the
v4.16.8 hunk below, traces showing that swiotlb_alloc_coherent() was
being asked to not bother warning started showing up after the box had
been flogged for a while.

Whatever finally happens with swiotlb (seems to be in flux), other
folks meeting annoying gripeage can find bandaids in the interim.

The End

v4.16.8 !DMA_DIRECT_OPS
Xorg-3105  [001]   2156.711471: swiotlb_alloc_coherent+0xa7/0x1e0: yup
Xorg-3105  [001]   2156.711497: 
 => ttm_dma_populate+0x23c/0x310 [ttm]
 => ttm_tt_bind+0x31/0x60 [ttm]
 => ttm_bo_handle_move_mem+0x527/0x580 [ttm]
 => ttm_bo_validate+0xfb/0x110 [ttm]
 => ttm_bo_init_reserved+0x289/0x450 [ttm]
 => ttm_bo_init+0x77/0xd0 [ttm]
 => nouveau_bo_new+0x3fc/0x5e0 [nouveau]
 => nouveau_gem_new+0x66/0x110 [nouveau]
 => nouveau_gem_ioctl_new+0x48/0xc0 [nouveau]
 => drm_ioctl_kernel+0x66/0xb0 [drm]
 => drm_ioctl+0x2a4/0x360 [drm]
 => nouveau_drm_ioctl+0x50/0xb0 [nouveau]
 => do_vfs_ioctl+0x92/0x5e0
 => SyS_ioctl+0x3b/0x70
 => do_syscall_64+0x74/0x1a0
 => entry_SYSCALL_64_after_hwframe+0x3d/0xa2

--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -28,10 +28,8 @@ void *x86_swiotlb_alloc_coherent(struct
 	 * swiotlb_alloc_coherent() will print a warning when the DMA
 	 * memory allocation ultimately failed.
 	 */
-	flags |= __GFP_NOWARN;
-
-	vaddr = dma_generic_alloc_coherent(hwdev, size, dma_handle, flags,
-					   attrs);
+	vaddr = dma_generic_alloc_coherent(hwdev, size, dma_handle,
+					   flags | __GFP_NOWARN, attrs);
 	if (vaddr)
 		return vaddr;
 


[Nouveau] [patch] swiotlb: fix ignored DMA_ATTR_NO_WARN request

2018-05-11 Thread Mike Galbraith

In the trace below, swiotlb_alloc() is called with __GFP_NOWARN.  It ORs
DMA_ATTR_NO_WARN into attrs and passes that to swiotlb_alloc_buffer(),
which does NOT pass it on to swiotlb_tbl_map_single(), leading to an
ever-repeating warning that the caller of swiotlb_alloc() explicitly
asked to be squelched.  Pass the caller's request for silence onward.
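
The dropped request can be modeled in a few lines of user-space C. This is only a sketch: the model_ functions and the flag values are hypothetical stand-ins invented for illustration, not the kernel's definitions, and only the shape of the call chain follows the description above.

```c
#include <stdbool.h>

/* Hypothetical stand-in values; the kernel's real definitions differ. */
#define MODEL_GFP_NOWARN	0x1u
#define MODEL_DMA_ATTR_NO_WARN	0x2u

/* Number of "buffer is full" warnings the model would have printed. */
static int model_warnings;

/* Stand-in for the mapping step: always fails, warns unless told not to. */
static bool model_tbl_map_single(unsigned long attrs)
{
	if (!(attrs & MODEL_DMA_ATTR_NO_WARN))
		model_warnings++;
	return false;
}

/* Before the patch: attrs is received, but a literal 0 is passed down. */
static bool model_alloc_buffer_before(unsigned long attrs)
{
	(void)attrs;
	return model_tbl_map_single(0);
}

/* After the patch: the caller's attrs are passed down unchanged. */
static bool model_alloc_buffer_after(unsigned long attrs)
{
	return model_tbl_map_single(attrs);
}

/*
 * Stand-in for the allocator entry point: translate the gfp flag into a
 * DMA attribute, run one failing allocation, return the warning count.
 */
static int model_alloc(unsigned int gfp, bool patched)
{
	unsigned long attrs =
		(gfp & MODEL_GFP_NOWARN) ? MODEL_DMA_ATTR_NO_WARN : 0;

	model_warnings = 0;
	if (patched)
		model_alloc_buffer_after(attrs);
	else
		model_alloc_buffer_before(attrs);
	return model_warnings;
}
```

With the unpatched path the model still warns even though the caller asked for silence. With the patched path it warns only when no such request was made.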

 Xorg-3170  [006]    963.866098: swiotlb_alloc+0x1d/0x1a0: gfp & 
__GFP_NOWARN
 Xorg-3170  [006]    963.866101: 
 => ttm_dma_populate+0x250/0x310 [ttm]
 => ttm_tt_populate+0x28/0x70 [ttm]
 => ttm_tt_bind+0x26/0x60 [ttm]
 => ttm_bo_handle_move_mem+0x51a/0x580 [ttm]
 => ttm_bo_validate+0xfa/0x110 [ttm]
 => ttm_bo_init_reserved+0x296/0x450 [ttm]
 => ttm_bo_init+0x73/0xd0 [ttm]
 => nouveau_bo_new+0x3eb/0x5c0 [nouveau]
 => nouveau_gem_new+0x66/0x110 [nouveau]
 => nouveau_gem_ioctl_new+0x48/0xc0 [nouveau]
 => drm_ioctl_kernel+0x66/0xb0 [drm]
 => drm_ioctl+0x28d/0x340 [drm]
 => nouveau_drm_ioctl+0x50/0xb0 [nouveau]
 => do_vfs_ioctl+0x92/0x5e0
 => ksys_ioctl+0x3a/0x70
 => __x64_sys_ioctl+0x16/0x20
 => do_syscall_64+0x5b/0x180
 => entry_SYSCALL_64_after_hwframe+0x44/0xa9
 Xorg-3170  [006]    963.866917: swiotlb_tbl_map_single+0x29b/0x2d0: 
swiotlb buffer is full (sz: 2097152 bytes)

Signed-off-by: Mike Galbraith <efa...@gmx.de>
---
 lib/swiotlb.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -714,7 +714,7 @@ swiotlb_alloc_buffer(struct device *dev,
 
 	phys_addr = swiotlb_tbl_map_single(dev,
 			__phys_to_dma(dev, io_tlb_start),
-			0, size, DMA_FROM_DEVICE, 0);
+			0, size, DMA_FROM_DEVICE, attrs);
 	if (phys_addr == SWIOTLB_MAP_ERROR)
 		goto out_warn;
 


Re: [Nouveau] kernel spew from nouveau/ swiotlb

2018-05-11 Thread Mike Galbraith
On Thu, 2018-05-10 at 12:28 +0200, Mike Galbraith wrote:
> On Thu, 2018-05-10 at 11:10 +0200, Mike Galbraith wrote:
> > Greetings,
> > 
> > When box is earning its keep, nouveau/swiotlb grumble.. a LOT.  The
> > below is from master.today.
> > 
> > [12594.640959] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> > bytes)
> > [12594.693000] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> > bytes)
> > [12594.713787] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> > bytes)
> > [12594.743413] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> > bytes)
> > [12594.796740] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> > bytes)
> > [12607.000774] swiotlb_tbl_map_single: 54 callbacks suppressed
> > [12607.000776] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> > bytes)
> > [12607.347941] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> > bytes)
> > [12608.677038] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> > bytes)
> > homer:/novell/ssh # dmesg|grep 'swiotlb buffer is full'|wc -l
> > 2052
> > homer:/novell/ssh # dmesg|grep 'callbacks suppressed'|wc -l
> > 171
> > 
> > lib/swiotlb.c:
> >  573 not_found:
> >  574 spin_unlock_irqrestore(_tlb_lock, flags);
> >  575 if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
> >  576 dev_warn(hwdev, "swiotlb buffer is full (sz: %zd 
> > bytes)\n", size);
> > 
> > Does nouveau perhaps want one of those DMA_ATTR_NO_WARN thingies?
> 
> Or should ttm perhaps always use the one on hand?  (seems to work)

No it didn't, I just didn't wait long enough for spew to start...

> ---
>  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -342,7 +342,7 @@ static struct dma_page *__ttm_dma_alloc_
>   if (!d_page)
>   return NULL;
>  
> - if (pool->type & IS_HUGE)
> + if (1 || pool->type & IS_HUGE)
>   attrs = DMA_ATTR_NO_WARN;
>  
>   vaddr = dma_alloc_attrs(pool->dev, pool->size, _page->dma,

While IS_HUGE is indeed false on my box, it just doesn't matter,
because when we get to either the old or the new alloc(), it calls
swiotlb_alloc_buffer(), which drops attrs passed to it on the floor,
making it unlikely that alloc() caller wishes are granted.

-Mike


Re: [Nouveau] kernel spew from nouveau/ swiotlb

2018-05-10 Thread Mike Galbraith
On Thu, 2018-05-10 at 17:31 +0200, Mike Galbraith wrote:
> On Thu, 2018-05-10 at 10:31 -0400, Jerome Glisse wrote:
> > 
> > Could you bisect ? I would love to point finger upstream to the DMA
> > folk who made changes to that API without testing with GPU.
> 
> Rummaging a bit, it might be...
> 

(unsend, whack duplicate line, munge, send;)

> nouveau_bo_new()
> ...
> ttm_dma_pool_alloc_new_pages()
>   dma_alloc_attrs()
> ops->alloc() == x86_swiotlb_alloc_coherent()
> x86_swiotlb_alloc_coherent() flags |= __GFP_NOWARN;
>   swiotlb_alloc_coherent(..flags)
> swiotlb_alloc_coherent(..flags) attrs = (flags & __GFP_NOWARN) ? 
> DMA_ATTR_NO_WARN : 0;
>   swiotlb_alloc_buffer(..attrs)
*  swiotlb_tbl_map_single(..0) passed 0 vs attrs, gripeage follows

Or something like that.


Re: [Nouveau] kernel spew from nouveau/ swiotlb

2018-05-10 Thread Mike Galbraith
On Thu, 2018-05-10 at 10:31 -0400, Jerome Glisse wrote:
> 
> Could you bisect ? I would love to point finger upstream to the DMA
> folk who made changes to that API without testing with GPU.

Rummaging a bit, it might be...

nouveau_bo_new()
...
ttm_dma_pool_alloc_new_pages()
  dma_alloc_attrs()
ops->alloc() == x86_swiotlb_alloc_coherent()
x86_swiotlb_alloc_coherent() flags |= __GFP_NOWARN;
  swiotlb_alloc_coherent(..flags)
swiotlb_alloc_coherent(..flags) attrs = (flags & __GFP_NOWARN) ? 
DMA_ATTR_NO_WARN : 0;
  swiotlb_alloc_buffer(..attr)
swiotlb_alloc_buffer(..0)  <== hm, pass zero instead of attr?
  swiotlb_tbl_map_single() gripeage

...that?

-Mike


Re: [Nouveau] kernel spew from nouveau/ swiotlb

2018-05-10 Thread Mike Galbraith
On Thu, 2018-05-10 at 11:10 +0200, Mike Galbraith wrote:
> Greetings,
> 
> When box is earning its keep, nouveau/swiotlb grumble.. a LOT.  The
> below is from master.today.
> 
> [12594.640959] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> bytes)
> [12594.693000] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> bytes)
> [12594.713787] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> bytes)
> [12594.743413] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> bytes)
> [12594.796740] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> bytes)
> [12607.000774] swiotlb_tbl_map_single: 54 callbacks suppressed
> [12607.000776] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> bytes)
> [12607.347941] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> bytes)
> [12608.677038] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 
> bytes)
> homer:/novell/ssh # dmesg|grep 'swiotlb buffer is full'|wc -l
> 2052
> homer:/novell/ssh # dmesg|grep 'callbacks suppressed'|wc -l
> 171
> 
> lib/swiotlb.c:
>  573 not_found:
>  574 spin_unlock_irqrestore(_tlb_lock, flags);
>  575 if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
>  576 dev_warn(hwdev, "swiotlb buffer is full (sz: %zd 
> bytes)\n", size);
> 
> Does nouveau perhaps want one of those DMA_ATTR_NO_WARN thingies?

Or should ttm perhaps always use the one on hand?  (seems to work)

---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -342,7 +342,7 @@ static struct dma_page *__ttm_dma_alloc_
if (!d_page)
return NULL;
 
-   if (pool->type & IS_HUGE)
+   if (1 || pool->type & IS_HUGE)
attrs = DMA_ATTR_NO_WARN;
 
vaddr = dma_alloc_attrs(pool->dev, pool->size, _page->dma,


[Nouveau] kernel spew from nouveau/ swiotlb

2018-05-10 Thread Mike Galbraith
Greetings,

When box is earning its keep, nouveau/swiotlb grumble.. a LOT.  The
below is from master.today.

[12594.640959] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[12594.693000] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[12594.713787] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[12594.743413] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[12594.796740] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[12607.000774] swiotlb_tbl_map_single: 54 callbacks suppressed
[12607.000776] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[12607.347941] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[12608.677038] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
homer:/novell/ssh # dmesg|grep 'swiotlb buffer is full'|wc -l
2052
homer:/novell/ssh # dmesg|grep 'callbacks suppressed'|wc -l
171

lib/swiotlb.c:
 573 not_found:
 574 spin_unlock_irqrestore(&io_tlb_lock, flags);
 575 if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
 576 dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);

Does nouveau perhaps want one of those DMA_ATTR_NO_WARN thingies?

-Mike
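
For reference, the gate quoted from lib/swiotlb.c above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: the flag value is illustrative, and struct ratelimit, ratelimit_ok() and map_failed() are hypothetical stand-ins (the kernel's printk_ratelimit() is time based, not a simple counter).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative value only; the real flag lives in <linux/dma-mapping.h>. */
#define DMA_ATTR_NO_WARN (1UL << 8)

/* Hypothetical stand-in for printk_ratelimit(): let `burst` messages
 * through, then count everything else as suppressed. */
struct ratelimit {
	int burst;
	int printed;
	int suppressed;
};

static bool ratelimit_ok(struct ratelimit *rl)
{
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/* Model of the not_found path: returns true if a warning was emitted.
 * With DMA_ATTR_NO_WARN set, the rate limiter is never even consulted. */
static bool map_failed(struct ratelimit *rl, unsigned long attrs, size_t size)
{
	assert(size > 0);
	if (!(attrs & DMA_ATTR_NO_WARN) && ratelimit_ok(rl)) {
		fprintf(stderr, "swiotlb buffer is full (sz: %zu bytes)\n", size);
		return true;
	}
	return false;
}
```

In this model a caller that passes DMA_ATTR_NO_WARN produces neither a message nor a "callbacks suppressed" count, which is the effect the question above is after.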


[Nouveau] nouveau: swiotlb buffer is full (sz: 2097152 bytes)/swiotlb: coherent allocation failed, size=2097152 spam

2018-04-09 Thread Mike Galbraith
Greetings,

Box is i4790 w. GTX 980 running virgin master (.today).

All I have to do to trigger a slew of these warnings is to fire up
firefox, point it at a youtube clip, and let it autoplay while I do
routine kernel merge/build maintenance.  nouveau doesn't seem to care
deeply, but moans again and again and again...

 726 [2.743823] fb: switching to nouveaufb from EFI VGA
 727 [2.743850] Console: switching to colour dummy device 80x25
 728 [2.743973] nouveau :01:00.0: NVIDIA GM204 (124000a1)
...
 758 [2.826604] nouveau :01:00.0: bios: version 84.04.1f.00.02
 759 [2.827479] nouveau :01:00.0: fb: 4096 MiB GDDR5
 760 [2.827506] nouveau :01:00.0: bus: MMIO write of 8195 FAULT at 10eb14 [ IBUS ]
 761 [2.891876] [TTM] Zone  kernel: Available graphics memory: 7927764 kiB
 762 [2.891880] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
 763 [2.891881] [TTM] Initializing pool allocator
 764 [2.891885] [TTM] Initializing DMA pool allocator
 765 [2.891895] nouveau :01:00.0: DRM: VRAM: 4096 MiB
 766 [2.891897] nouveau :01:00.0: DRM: GART: 1048576 MiB
 767 [2.891900] nouveau :01:00.0: DRM: TMDS table version 2.0
 768 [2.891902] nouveau :01:00.0: DRM: DCB version 4.1
 769 [2.891904] nouveau :01:00.0: DRM: DCB outp 00: 01000f02 00020030
 770 [2.891906] nouveau :01:00.0: DRM: DCB outp 01: 02000f00 
 771 [2.891908] nouveau :01:00.0: DRM: DCB outp 02: 02811f76 04400020
 772 [2.891910] nouveau :01:00.0: DRM: DCB outp 03: 02011f72 00020020
 773 [2.891912] nouveau :01:00.0: DRM: DCB outp 04: 04822f86 04400010
 774 [2.891914] nouveau :01:00.0: DRM: DCB outp 05: 04022f82 00020010
 775 [2.891916] nouveau :01:00.0: DRM: DCB outp 06: 04833f96 04400020
 776 [2.891918] nouveau :01:00.0: DRM: DCB outp 07: 04033f92 00020020
 777 [2.891920] nouveau :01:00.0: DRM: DCB outp 08: 02044f62 00020010
 778 [2.891922] nouveau :01:00.0: DRM: DCB outp 15: 01df5ff8 
 779 [2.891924] nouveau :01:00.0: DRM: DCB conn 00: 1030
 780 [2.891926] nouveau :01:00.0: DRM: DCB conn 01: 00020146
 781 [2.891928] nouveau :01:00.0: DRM: DCB conn 02: 01000246
 782 [2.891929] nouveau :01:00.0: DRM: DCB conn 03: 02000346
 783 [2.891931] nouveau :01:00.0: DRM: DCB conn 04: 00010461
 784 [2.891933] nouveau :01:00.0: DRM: DCB conn 05: 0570
 785 [2.953898] nouveau :01:00.0: DRM: failed to create encoder 1/8/0: -19
 786 [2.953902] nouveau :01:00.0: DRM: Virtual-1 has no encoders, removing
 787 [2.953928] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
 788 [2.953930] [drm] Driver supports precise vblank timestamp query.
 789 [3.006557] nouveau :01:00.0: DRM: MM: using COPY for buffer copies
...
 811 [3.329981] nouveau :01:00.0: DRM: allocated 1920x1080 fb: 0x8, bo fed1a05d
 812 [3.331002] fbcon: nouveaufb (fb0) is primary device
 813 [3.376014] usb 3-10: new full-speed USB device number 3 using xhci_hcd
 814 [3.376174] hid-generic 0003:0E8F:0020.0002: input,hidraw1: USB HID v1.10 Mouse [GASIA PS2toUSB Adapter] on usb-:00:14.0-1/input1
 815 [3.526234] usb 3-10: New USB device found, idVendor=046d, idProduct=c52b, bcdDevice=12.01
 816 [3.526235] usb 3-10: New USB device strings: Mfr=1, Product=2, SerialNumber=0
 817 [3.526237] usb 3-10: Product: USB Receiver
 818 [3.526237] usb 3-10: Manufacturer: Logitech
 819 [3.63] Console: switching to colour frame buffer device 240x67
 820 [3.872710] nouveau :01:00.0: fb0: nouveaufb frame buffer device
 821 [3.892080] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0 on minor 0
...
[ 6253.341530] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 6253.341535] nouveau :01:00.0: swiotlb: coherent allocation failed, size=2097152
[ 6253.341539] CPU: 2 PID: 3740 Comm: Xorg Kdump: loaded Tainted: G E 4.16.0.gf8cf2f1-default #687
[ 6253.341541] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 6253.341543] Call Trace:
[ 6253.341553]  dump_stack+0x78/0xb3
[ 6253.341559]  swiotlb_alloc+0x134/0x170
[ 6253.341567]  ttm_dma_pool_alloc_new_pages+0x161/0x3c0 [ttm]
[ 6253.341574]  ttm_dma_pool_get_pages+0xe0/0x1c0 [ttm]
[ 6253.341580]  ttm_dma_populate+0x250/0x310 [ttm]
[ 6253.341586]  ttm_tt_populate+0x28/0x70 [ttm]
[ 6253.341591]  ttm_tt_bind+0x26/0x60 [ttm]
[ 6253.341596]  ttm_bo_handle_move_mem+0x51a/0x580 [ttm]
[ 6253.341612]  ? drm_mm_insert_node_in_range+0x42b/0x480 [drm]
[ 6253.341617]  ttm_bo_validate+0xfa/0x110 [ttm]
[ 6253.341622]  ? _raw_write_unlock+0x12/0x30
[ 6253.341634]  ? drm_vma_offset_add+0x5c/0x70 [drm]
[ 6253.341638]  ttm_bo_init_reserved+0x296/0x450 [ttm]
[ 6253.341643]  ttm_bo_init+0x73/0xd0 [ttm]
[ 6253.341675]  ? nv10_bo_put_tile_region+0x50/0x50 [nouveau]
[ 6253.341704]  

Re: [Nouveau] [PATCH v3] drm/nouveau: Move irq setup/teardown to pci ctor/dtor

2018-01-25 Thread Mike Galbraith
On Fri, 2018-01-26 at 02:20 +0100, Adam Borowski wrote:
> On Thu, Jan 25, 2018 at 06:29:53PM -0500, Lyude Paul wrote:
> > This was made apparent by what appeared to be a regression in the
> > mainline kernel that started introducing suspend/resume issues for
> > nouveau:
> > 
> > a0c9259dc4e1 (irq/matrix: Spread interrupts on allocation)
> 
> I'm just a dumb user here, but I confirm:
> CPU: AMD Phenom II X6 1055T, GPU: GTX 560 Ti
> 100% fail to resume GPU on 4.15-rc*, 100% ok with your patch.

Ditto.. and my GTX 980 works again.

-Mike


[Nouveau] [nouveau] grumble/gripe ... fifo: read fault ... channel 12 killed! (eternal freeze-frame)

2018-01-02 Thread Mike Galbraith
Twice now with v4.15-rc6, my display has gone belly up.

Note: swiotlb: suppress warning when __GFP_NOWARN is set v2 is applied,
but I don't _think_ it was the first time it happened.

[ 3729.558261] nouveau :01:00.0: gr: TRAP ch 2 [00ff842000 Xorg[3413]]
[ 3729.558269] nouveau :01:00.0: gr: GPC0/TPC0/TEX: 8041
[ 3729.558273] nouveau :01:00.0: gr: GPC0/TPC1/TEX: 8041
[ 3729.558277] nouveau :01:00.0: gr: GPC0/TPC2/TEX: 8041
[ 3729.558280] nouveau :01:00.0: gr: GPC0/TPC3/TEX: 8041
[ 3729.558286] nouveau :01:00.0: gr: GPC1/TPC0/TEX: 8041
[ 3729.558289] nouveau :01:00.0: gr: GPC1/TPC1/TEX: 8041
[ 3729.558293] nouveau :01:00.0: gr: GPC1/TPC2/TEX: 8041
[ 3729.558297] nouveau :01:00.0: gr: GPC1/TPC3/TEX: 8041
[ 3729.558302] nouveau :01:00.0: gr: GPC2/TPC0/TEX: 8041
[ 3729.558305] nouveau :01:00.0: gr: GPC2/TPC1/TEX: 8041
[ 3729.558309] nouveau :01:00.0: gr: GPC2/TPC2/TEX: 8041
[ 3729.558313] nouveau :01:00.0: gr: GPC2/TPC3/TEX: 8041
[ 3729.558318] nouveau :01:00.0: gr: GPC3/TPC0/TEX: 8041
[ 3729.558322] nouveau :01:00.0: gr: GPC3/TPC1/TEX: 8041
[ 3729.558325] nouveau :01:00.0: gr: GPC3/TPC2/TEX: 8041
[ 3729.558329] nouveau :01:00.0: gr: GPC3/TPC3/TEX: 8041
[ 3729.558336] nouveau :01:00.0: fifo: read fault at 000a9dd000 engine 00 [GR] client 0a [GPC2/T1_3] reason 02 [PTE] on channel 2 [00ff842000 Xorg[3413]]
[ 3729.558341] nouveau :01:00.0: fifo: channel 2: killed
[ 3729.558343] nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
[ 3729.558346] nouveau :01:00.0: fifo: engine 0: scheduled for recovery
[ 3729.558352] nouveau :01:00.0: fifo: engine 7: scheduled for recovery
[ 3729.558355] nouveau :01:00.0: Xorg[3413]: channel 2 killed!
[ 3729.562994] nouveau :01:00.0: gr: TRAP ch 12 [00fd2d6000 plasmashell[3898]]
[ 3729.563011] nouveau :01:00.0: gr: GPC0/TPC0/TEX: 8041
[ 3729.563015] nouveau :01:00.0: gr: GPC0/TPC1/TEX: 8041
[ 3729.563018] nouveau :01:00.0: gr: GPC0/TPC2/TEX: 8041
[ 3729.563022] nouveau :01:00.0: gr: GPC0/TPC3/TEX: 8041
[ 3729.563027] nouveau :01:00.0: gr: GPC1/TPC0/TEX: 8041
[ 3729.563031] nouveau :01:00.0: gr: GPC1/TPC1/TEX: 8041
[ 3729.563034] nouveau :01:00.0: gr: GPC1/TPC2/TEX: 8041
[ 3729.563038] nouveau :01:00.0: gr: GPC1/TPC3/TEX: 8041
[ 3729.563043] nouveau :01:00.0: gr: GPC2/TPC0/TEX: 8041
[ 3729.563047] nouveau :01:00.0: gr: GPC2/TPC1/TEX: 8041
[ 3729.563050] nouveau :01:00.0: gr: GPC2/TPC2/TEX: 8041
[ 3729.563054] nouveau :01:00.0: gr: GPC2/TPC3/TEX: 8041
[ 3729.563059] nouveau :01:00.0: gr: GPC3/TPC0/TEX: 8041
[ 3729.563063] nouveau :01:00.0: gr: GPC3/TPC1/TEX: 8041
[ 3729.563066] nouveau :01:00.0: gr: GPC3/TPC2/TEX: 8041
[ 3729.563070] nouveau :01:00.0: gr: GPC3/TPC3/TEX: 8041
[ 3729.563078] nouveau :01:00.0: fifo: read fault at 0004be4000 engine 00 [GR] client 01 [GPC0/T1_0] reason 02 [PTE] on channel 12 [00fd2d6000 plasmashell[3898]]
[ 3729.563083] nouveau :01:00.0: fifo: channel 12: killed
[ 3729.563085] nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
[ 3729.563089] nouveau :01:00.0: fifo: engine 0: scheduled for recovery
[ 3729.563092] nouveau :01:00.0: plasmashell[3898]: channel 12 killed!


Re: [Nouveau] nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152

2017-12-31 Thread Mike Galbraith
On Sun, 2017-12-31 at 13:27 -0500, Ilia Mirkin wrote:
> On Tue, Dec 19, 2017 at 8:45 AM, Christian König
> <ckoenig.leichtzumer...@gmail.com> wrote:
> > Am 19.12.2017 um 11:39 schrieb Michel Dänzer:
> >>
> >> On 2017-12-19 11:37 AM, Michel Dänzer wrote:
> >>>
> >>> On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
> >>>>
> >>>> On 12/18/17 7:06 PM, Mike Galbraith wrote:
> >>>>>
> >>>>> Greetings,
> >>>>>
> >>>>> Kernel bound workloads seem to trigger the below for whatever reason.
> >>>>>I only see this when beating up NFS.  There was a kworker wakeup
> >>>>> latency issue, but with a bandaid applied to fix that up, I can still
> >>>>> trigger this.
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> i have seen this one as well with my system, but i could not find an
> >>>> easy way to trigger it for bisecting purpose. If you can trigger it
> >>>> conveniently, a bisect would be nice!
> >>>
> >>> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a
> >>> backup, creating memory pressure. I happen to have just finished
> >>> bisecting, the result is:
> >>>
> >>> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
> >>> commit 648bc3574716400acc06f99915815f80d9563783
> >>> Author: Christian König <christian.koe...@amd.com>
> >>> Date:   Thu Jul 6 09:59:43 2017 +0200
> >>>
> >>>  drm/ttm: add transparent huge page support for DMA allocations v2
> >>>
> >>>  Try to allocate huge pages when it makes sense.
> >>>
> >>>  v2: fix comment and use ifdef
> >>>
> >>>
> >> BTW, I haven't noticed any bad effects other than the dmesg splats, so
> >> maybe it's just noise about transient failures for which there is a
> >> proper fallback in place.
> >
> >
> > Yeah, I think that is exactly what happens here.
> >
> > We try to allocate a huge page, but fail and so fall back to using multiple
> > 4k pages instead.
> >
> > Going to send out a patch to suppress the warning.
> 
> Hi Christian,
> 
> Did you ever send out such a patch? I didn't see one on the list, but
> perhaps I missed it. One definitely hasn't made it upstream yet. (I
> just hit the issue myself with Linus's tree from last night.)

Actually, that wants a bit more methinks, because while the stack dump
goes away, you still get spammed, it just comes in smaller chunks.

-Mike
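
The fallback Christian describes in the quoted text (try one huge page, and on failure use multiple 4k pages) can be sketched in user space like this. It is a toy model, not the TTM code: alloc_fn, populate() and the two example allocators are hypothetical names, and a real driver would also unwind partial allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SMALL_PAGE_SIZE 4096UL
#define HUGE_PAGE_SIZE  2097152UL	/* the 2 MiB size seen in the logs */

/* Hypothetical allocator hook: returns true if `size` bytes could be
 * obtained. It stands in for the real DMA allocator. */
typedef bool (*alloc_fn)(size_t size);

/* Try a single huge page first; if that fails, fall back to small pages.
 * Returns the number of pages used, or 0 if even the fallback failed. */
static size_t populate(alloc_fn try_alloc, size_t bytes)
{
	size_t i, n;

	assert(bytes % SMALL_PAGE_SIZE == 0);

	if (bytes == HUGE_PAGE_SIZE && try_alloc(HUGE_PAGE_SIZE))
		return 1;

	n = bytes / SMALL_PAGE_SIZE;
	for (i = 0; i < n; i++)
		if (!try_alloc(SMALL_PAGE_SIZE))
			return 0;	/* a real driver would free pages here */
	return n;
}

/* Example allocators: one that can never satisfy a huge request,
 * and one that always succeeds. */
static bool no_huge_pages(size_t size) { return size <= SMALL_PAGE_SIZE; }
static bool always_ok(size_t size) { (void)size; return true; }
```

With no_huge_pages() a 2097152 byte request ends up as 512 small pages. Each failed huge attempt is still a chance for the allocator to print something, which is why silencing only the stack dump leaves the one-line spam in place.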


Re: [Nouveau] nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152

2017-12-18 Thread Mike Galbraith
On Mon, 2017-12-18 at 20:01 +0100, Tobias Klausmann wrote:
> On 12/18/17 7:06 PM, Mike Galbraith wrote:
> > Greetings,
> >
> > Kernel bound workloads seem to trigger the below for whatever reason.
> >   I only see this when beating up NFS.  There was a kworker wakeup
> > latency issue, but with a bandaid applied to fix that up, I can still
> > trigger this.
> 
> 
> Hi,
> 
> i have seen this one as well with my system, but i could not find an 
> easy way to trigger it for bisecting purpose. If you can trigger it 
> conveniently, a bisect would be nice!

Workload permitting.  To reproduce, mount your box NFS, cd to somewhere
in the NFS mount, and just do bonnie -s .  There, maybe
you'll beat me to it.  I hope so, I have multiple kernels doing the
annoying "baby birds in a nest" thing at me literally endlessly :)

-Mike


[Nouveau] nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152

2017-12-18 Thread Mike Galbraith
Greetings,

Kernel bound workloads seem to trigger the below for whatever reason.
 I only see this when beating up NFS.  There was a kworker wakeup
latency issue, but with a bandaid applied to fix that up, I can still
trigger this.

[ 1313.811031] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 1313.811035] swiotlb: coherent allocation failed for device :01:00.0 size=2097152
[ 1313.811038] CPU: 6 PID: 3026 Comm: Xorg Tainted: G E 4.15.0.g1291a0d5-master #355
[ 1313.811040] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 1313.811041] Call Trace:
[ 1313.811049]  dump_stack+0x7c/0xb6
[ 1313.811053]  swiotlb_alloc_coherent+0x13f/0x150
[ 1313.811060]  ttm_dma_pool_alloc_new_pages+0x106/0x3c0 [ttm]
[ 1313.811066]  ttm_dma_pool_get_pages+0x10a/0x1e0 [ttm]
[ 1313.811070]  ttm_dma_populate+0x21f/0x2f0 [ttm]
[ 1313.811075]  ttm_tt_bind+0x2f/0x60 [ttm]
[ 1313.811079]  ttm_bo_handle_move_mem+0x51f/0x580 [ttm]
[ 1313.811084]  ? ttm_bo_handle_move_mem+0x5/0x580 [ttm]
[ 1313.811088]  ttm_bo_validate+0x10c/0x120 [ttm]
[ 1313.811092]  ? ttm_bo_validate+0x5/0x120 [ttm]
[ 1313.811106]  ? drm_mode_setcrtc+0x20e/0x540 [drm]
[ 1313.811109]  ttm_bo_init_reserved+0x290/0x490 [ttm]
[ 1313.84]  ttm_bo_init+0x52/0xb0 [ttm]
[ 1313.811141]  ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811163]  nouveau_bo_new+0x465/0x5e0 [nouveau]
[ 1313.811184]  ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811203]  nouveau_gem_new+0x66/0x110 [nouveau]
[ 1313.811223]  ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811241]  nouveau_gem_ioctl_new+0x48/0xc0 [nouveau]
[ 1313.811249]  drm_ioctl_kernel+0x64/0xb0 [drm]
[ 1313.811257]  drm_ioctl+0x2a4/0x360 [drm]
[ 1313.811276]  ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811285]  ? drm_ioctl+0x5/0x360 [drm]
[ 1313.811304]  nouveau_drm_ioctl+0x50/0xb0 [nouveau]
[ 1313.811308]  do_vfs_ioctl+0x90/0x690
[ 1313.811311]  ? do_vfs_ioctl+0x5/0x690
[ 1313.811313]  SyS_ioctl+0x3b/0x70
[ 1313.811316]  entry_SYSCALL_64_fastpath+0x1f/0x91
[ 1313.811320] RIP: 0033:0x7f3234746227
[ 1313.811321] RSP: 002b:7ffc3ace0408 EFLAGS: 3246 ORIG_RAX: 0010
[ 1313.811324] RAX: ffda RBX: 025515d0 RCX: 7f3234746227
[ 1313.811325] RDX: 7ffc3ace0460 RSI: c0306480 RDI: 000b
[ 1313.811326] RBP: 00824120 R08: 02548f80 R09: 025490d0
[ 1313.811328] R10:  R11: 3246 R12: 093d
[ 1313.811329] R13: 02aff74c R14: 00824150 R15: 


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Wed, 2017-07-12 at 07:37 -0400, Ilia Mirkin wrote:
> On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith <efa...@gmx.de> wrote:
> > On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
> >> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> >> >
> >> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> >> > too much trouble, a bisect would be pretty useful.
> >>
> >> Bisection seemingly went fine, but the result is odd.
> >>
> >> e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit
> >
> > But it really really is bad.  Looking at gitk fork in the road leading
> > to it...
> >
> > 52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
> > e4e818cc2d7c drm: make drm_panel.h self-contained - good
> > 9cf8f5802f39 drm: add missing declaration to drm_blend.h  - good
> >
> > Before the git highway splits, all is well.  The lane with commits
> > works fine at both ends, but e98c58e55f68 is busted.  Merge artifact?
> 
> Hmmm... that tree does not appear to have gotten a v4.12 backmerge at
> any point. The last backmerge from Linus as far as I can tell was
> v4.11-rc7. Could be an interaction with some out-of-tree change.

Ok, a network outage gave me time to go hunting.  Indeed it is a bad
interaction with the tree DRM merged into.  All DRM did was to slip a
WARN_ON_ONCE() that nouveau triggers into a kernel module where such
things no longer warn, they blow the box out of the water.  I made a
dinky testcase module (attached), and bisected to the real root

19d436268dde95389c616bb3819da73f0a8b28a8 is the first bad commit
commit 19d436268dde95389c616bb3819da73f0a8b28a8
Author: Peter Zijlstra <pet...@infradead.org>
Date:   Sat Feb 25 08:56:53 2017 +0100

debug: Add _ONCE() logic to report_bug()

Josh suggested moving the _ONCE logic inside the trap handler, using a
bit in the bug_entry::flags field, avoiding the need for the extra
variable.

Sadly this only works for WARN_ON_ONCE(), since the others have
printk() statements prior to triggering the trap.

Still, this saves a fair amount of text and some data:

      text    data  filename
  10682460 4530992  defconfig-build/vmlinux.orig
  10665111 4530096  defconfig-build/vmlinux.patched

Suggested-by: Josh Poimboeuf <jpoim...@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Arnd Bergmann <a...@arndb.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Ingo Molnar <mi...@kernel.org>

:04 04 9f47f66ec4c234f6ee8e2a09e991c95fe47cf2c1 3e92aa9e77b39ed075ae2c3bdf041d92ef898f62 M  arch
:04 04 34f70b73d40c82533dd7df9b289106be69e2fa8d dd5d7248694a36b3e170f2dca5d9c4121535a990 M  include
:04 04 f6e627b0d378f0a00d2987fdd0c7b215306e6e3c b360d4ee2579744cce530184d7dab13493f73ee0 M  lib

---
 kernel/Makefile |2 ++
 kernel/foo.c|   15 +++
 2 files changed, 17 insertions(+)

--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,8 @@ obj-$(CONFIG_MEMBARRIER) += membarrier.o
 
 obj-$(CONFIG_HAS_IOMEM) += memremap.o
 
+obj-m += foo.o
+
 $(obj)/configs.o: $(obj)/config_data.h
 
 targets += config_data.gz
--- /dev/null
+++ b/kernel/foo.c
@@ -0,0 +1,15 @@
+#include <linux/module.h>	/* header names were lost in archiving; these two are assumed */
+#include <linux/kernel.h>
+
+static int __init foo_init(void)
+{
+	printk(KERN_INFO "foo: module loaded\n");
+	WARN_ON_ONCE(1);
+	return 0;
+}
+
+static void __exit foo_exit(void) { }
+
+module_init(foo_init);
+module_exit(foo_exit);
+MODULE_LICENSE("GPL");


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Wed, 2017-07-12 at 07:37 -0400, Ilia Mirkin wrote:
> On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith <efa...@gmx.de> wrote:
> > On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
> >> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> >> >
> >> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> >> > too much trouble, a bisect would be pretty useful.
> >>
> >> Bisection seemingly went fine, but the result is odd.
> >>
> >> e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit
> >
> > But it really really is bad.  Looking at gitk fork in the road leading
> > to it...
> >
> > 52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
> > e4e818cc2d7c drm: make drm_panel.h self-contained - good
> > 9cf8f5802f39 drm: add missing declaration to drm_blend.h  - good
> >
> > Before the git highway splits, all is well.  The lane with commits
> > works fine at both ends, but e98c58e55f68 is busted.  Merge artifact?
> 
> Hmmm... that tree does not appear to have gotten a v4.12 backmerge at
> any point. The last backmerge from Linus as far as I can tell was
> v4.11-rc7. Could be an interaction with some out-of-tree change.

FWIW, checking out the fingered commit then..

git log --oneline 52d9d38c183b..e98c58e55f68|grep nouveau and reverting
the lot helped not at all.

Checking out 6b7781b42dc9 and reverting the fingered commit did.  Given
the nouveau bits reverted are mostly the vblank changes, CC to Daniel,
maybe he'll know why both GTX 980 and GeForce 8600 GT get all upset.

Either I'm damn lucky, both of my nvidia equipped boxen going boom 100%
repeatably, or there are a lot of folks out there who haven't yet tried
suspend with our latest/greatest kernel.  I suspect the latter.

-Mike



Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> 
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Bisection seemingly went fine, but the result is odd.

e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

-Mike





Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Fri, 2017-07-14 at 14:42 -0500, Josh Poimboeuf wrote:
> 
> Does this fix it?

Yup, both READONLY __bug_table and "extra stern" warning are gone.

> diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
> index 39e702d..aa6b202 100644
> --- a/arch/x86/include/asm/bug.h
> +++ b/arch/x86/include/asm/bug.h
> @@ -35,7 +35,7 @@
>  #define _BUG_FLAGS(ins, flags) \
>  do { \
>   asm volatile("1:\t" ins "\n"\
> -  ".pushsection __bug_table,\"a\"\n" \
> +  ".pushsection __bug_table,\"aw\"\n"\
>"2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"   \
>"\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"   \
>"\t.word %c1""\t# bug_entry::line\n"   \
> @@ -52,7 +52,7 @@ do { \
>  #define _BUG_FLAGS(ins, flags) \
>  do { \
>   asm volatile("1:\t" ins "\n"\
> -  ".pushsection __bug_table,\"a\"\n" \
> +  ".pushsection __bug_table,\"aw\"\n"\
>"2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"   \
>"\t.word %c0""\t# bug_entry::flags\n"  \
>"\t.org 2b+%c1\n"  \
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Fri, 2017-07-14 at 17:50 +0200, Peter Zijlstra wrote:
> On Fri, Jul 14, 2017 at 03:36:08PM +0200, Mike Galbraith wrote:
> > Ok, a network outage gave me time to go hunting.  Indeed it is a bad
> > interaction with the tree DRM merged into.  All DRM did was to slip a
> > WARN_ON_ONCE() that nouveau triggers into a kernel module where such
> > things no longer warn, they blow the box out of the water.  I made a
> > dinky testcase module (attached), and bisected to the real root
> > 
> > 19d436268dde95389c616bb3819da73f0a8b28a8 is the first bad commit
> > commit 19d436268dde95389c616bb3819da73f0a8b28a8
> > Author: Peter Zijlstra <pet...@infradead.org>
> > Date:   Sat Feb 25 08:56:53 2017 +0100
> > 
> > debug: Add _ONCE() logic to report_bug()
> 
> Urgh, is for some mysterious reason the __bug_table section of modules
> ending up in RO memory?
> 
> I forever get lost in that link magic :/

+1

drm.ko
 20 __bug_table   0630      0004bff3  2**0
  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
vmlinux
 15 __bug_table   ba84  81af26c0  01af26c0  00cf26c0  2**0
  CONTENTS, ALLOC, LOAD, READONLY, DATA

Danged if I know... um um RELOC business mucks things up?

-Mike
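
A user-space sketch of the _ONCE logic described in the commit message may help show why a READONLY __bug_table matters: the once-only state lives in the table entry itself, so the first hit has to write to it. The struct layout, the flag values and the name report_once() are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits, modelled on the bug_entry::flags idea. */
#define BUGFLAG_WARNING (1 << 0)
#define BUGFLAG_ONCE    (1 << 1)
#define BUGFLAG_DONE    (1 << 2)

/* Simplified table entry; the real one uses relative addresses. */
struct bug_entry {
	const char *file;
	unsigned short line;
	unsigned short flags;
};

/* Returns true if this hit should print a warning.
 * For _ONCE entries the first hit marks the entry as done. */
static bool report_once(struct bug_entry *bug)
{
	assert(bug != NULL);

	if (bug->flags & BUGFLAG_ONCE) {
		if (bug->flags & BUGFLAG_DONE)
			return false;		/* already reported */
		/* This store is the part that needs a writable table. */
		bug->flags |= BUGFLAG_DONE;
	}
	return true;
}
```

If the entry sat in memory mapped read-only, the store in report_once() would fault instead of marking the entry, which would be consistent with a page fault taken inside report_bug().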


[Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
Greetings,

I met $subject in master-rt post drm merge, but taking the config
(attached) to virgin v4.12-10624-g9967468c0a10, it's reproducible.

  KERNEL: vmlinux-4.12.0.g9967468-preempt.gz
DUMPFILE: vmcore
CPUS: 8
DATE: Tue Jul 11 18:55:28 2017
  UPTIME: 00:02:03
LOAD AVERAGE: 3.43, 1.39, 0.52
   TASKS: 467
NODENAME: homer
 RELEASE: 4.12.0.g9967468-preempt
 VERSION: #155 SMP PREEMPT Tue Jul 11 18:18:11 CEST 2017
 MACHINE: x86_64  (3591 Mhz)
  MEMORY: 16 GB
   PANIC: "BUG: unable to handle kernel paging request at a022990f"
 PID: 4658
 COMMAND: "kworker/u16:26"
TASK: 8803c6068f80  [THREAD_INFO: 8803c6068f80]
 CPU: 7
   STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 4658   TASK: 8803c6068f80  CPU: 7   COMMAND: "kworker/u16:26"
 #0 [c900039f76a0] machine_kexec at 810481fc
 #1 [c900039f76f0] __crash_kexec at 81109e3a
 #2 [c900039f77b0] crash_kexec at 8110adc9
 #3 [c900039f77c8] oops_end at 8101d059
 #4 [c900039f77e8] no_context at 81055ce5
 #5 [c900039f7838] do_page_fault at 81056c5b
 #6 [c900039f7860] page_fault at 81690a88
[exception RIP: report_bug+93]
RIP: 8167227d  RSP: c900039f7918  RFLAGS: 00010002
RAX: a0229905  RBX: a020af0f  RCX: 0001
RDX: 0907  RSI: a020af11  RDI: 98f6
RBP: c900039f7a58   R8: 0001   R9: 03fc
R10: 81a01906  R11: 8803f84711f8  R12: a02231fb
R13: 0260  R14: 0004  R15: 0006
ORIG_RAX:   CS: 0010  SS: 0018
 #7 [c900039f7910] report_bug at 81672248
 #8 [c900039f7938] fixup_bug at 8101af85
 #9 [c900039f7950] do_trap at 8101b0d9
#10 [c900039f79a0] do_error_trap at 8101b190
#11 [c900039f7a50] invalid_op at 8169063e
[exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335]
RIP: a020af0f  RSP: c900039f7b00  RFLAGS: 00010086
RAX: a04fa100  RBX: 8803f9550800  RCX: 0001
RDX: a0228a58  RSI: 0001  RDI: a022321b
RBP: c900039f7b80   R8:    R9: a020adc0
R10: a048a1b0  R11: 8803f84711f8  R12: 0001
R13: 8803f8471000  R14: c900039f7b94  R15: c900039f7bd0
ORIG_RAX:   CS: 0010  SS: 0018
#12 [c900039f7b18] gf119_head_vblank_put at a04422f9 [nouveau]
#13 [c900039f7b88] drm_get_last_vbltimestamp at a020ad91 [drm]
#14 [c900039f7ba8] drm_update_vblank_count at a020b3e1 [drm]
#15 [c900039f7c10] drm_vblank_disable_and_save at a020bbe9 [drm]
#16 [c900039f7c40] drm_crtc_vblank_off at a020c3c0 [drm]
#17 [c900039f7cb0] nouveau_display_fini at a048a4d6 [nouveau]
#18 [c900039f7ce0] nouveau_display_suspend at a048ac4f [nouveau]
#19 [c900039f7d00] nouveau_do_suspend at a047e5ec [nouveau]
#20 [c900039f7d38] nouveau_pmops_suspend at a047e77d [nouveau]
#21 [c900039f7d50] pci_pm_suspend at 813b1ff0
#22 [c900039f7d80] dpm_run_callback at 814c4dbd
#23 [c900039f7db8] __device_suspend at 814c5a61
#24 [c900039f7e30] async_suspend at 814c5cfa
#25 [c900039f7e48] async_run_entry_fn at 81091683
#26 [c900039f7e70] process_one_work at 810882bc
#27 [c900039f7eb0] worker_thread at 8108854a
#28 [c900039f7f10] kthread at 8108e387
#29 [c900039f7f50] ret_from_fork at 8168fa85
crash> gdb list *drm_calc_vbltimestamp_from_scanoutpos+335
0xa020af0f is in drm_calc_vbltimestamp_from_scanoutpos (drivers/gpu/drm/drm_vblank.c:608).
603 /* If mode timing undefined, just return as no-op:
604  * Happens during initial modesetting of a crtc.
605  */
606 if (mode->crtc_clock == 0) {
607 DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
608 WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
609 
610 return false;
611 }
612 
crash> gdb list *report_bug+93
0x8167227d is in report_bug (lib/bug.c:177).
172 return BUG_TRAP_TYPE_WARN;
173 
174 /*
175  * Since this is the only store, concurrency is not an issue.
176  */
177 bug->flags |= BUGFLAG_DONE;
178 }
179 }
180 
181 if (warning) {
crash>

config.xz
Description: application/xz
___
Nouveau mailing list
Nouveau@lists.freedesktop.org

Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Fri, 2017-07-14 at 17:05 +0200, Tobias Klausmann wrote:
> On 7/14/17 3:41 PM, Mike Galbraith wrote:
> > On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:
> >>   All DRM did was to slip a
> >> WARN_ON_ONCE() that nouveau triggers into a kernel module where such
> >> things no longer warn, they blow the box out of the water.
> > BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c
> > into a WARN_ONCE(), and all is peachy, you get the warning, box lives.
> >
> > ---
> >   drivers/gpu/drm/drm_vblank.c |3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > --- a/drivers/gpu/drm/drm_vblank.c
> > +++ b/drivers/gpu/drm/drm_vblank.c
> > @@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
> >  */
> > if (mode->crtc_clock == 0) {
> > DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
> > -   WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
> > +   WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",
> > + dev->driver->name);
> >   
> > return false;
> > }
> 
> 
> Hey,
> 
> confirmed this helps saving the box, but we still have to find the root 
> cause! Backtrace with the above fix applied (and the one which came in 
> with the latest drm-fixes merge)!

Yeah, I'll be reporting some extra whining from my 8600 GT backup box.

-Mike


[Nouveau] [drm/nouveau] GeForce 8600 GT boot/suspend grumbling

2017-07-17 Thread Mike Galbraith
Greetings,

box: bog standard [tc]rusty old Nvidia equipped Q6600 Medion (Aldi) deskside
kernel: master.today (v4.12-11690-gccd5d1b91f22)

lspci -nn -d 10de:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [GeForce 8600 GT] [10de:0402] (rev a1)

abbreviated dmesg:
...
[3.720990] fb: switching to nouveaufb from VESA VGA
[3.744489] Console: switching to colour dummy device 80x25
[3.744966] nouveau :01:00.0: NVIDIA G84 (084200a2)
...
[3.846963] usbcore: registered new interface driver uas
[3.849938] nouveau :01:00.0: bios: version 60.84.6e.00.12
[3.870769] hid-generic 0003:04CA:002B.0002: input,hidraw1: USB HID v1.11 Keyboard [Liteon Wireless keyboard and mouse] on usb-:00:1d.0-1/input0
[3.870773] nouveau :01:00.0: bios: M0203T not found
[3.870774] nouveau :01:00.0: bios: M0203E not matched!
[3.870777] nouveau :01:00.0: fb: 256 MiB DDR2
[3.871168] input: Liteon Wireless keyboard and mouse as /devices/pci:00/:00:1d.0/usb4/4-1/4-1:1.1/0003:04CA:002B.0003/input/input7
[3.896090] usb 3-2: new low-speed USB device number 3 using uhci_hcd
[3.919101] [TTM] Zone  kernel: Available graphics memory: 3881208 kiB
[3.919106] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[3.919110] [TTM] Initializing pool allocator
[3.919120] [TTM] Initializing DMA pool allocator
[3.919141] nouveau :01:00.0: DRM: VRAM: 256 MiB
[3.919146] nouveau :01:00.0: DRM: GART: 1048576 MiB
[3.919152] nouveau :01:00.0: DRM: TMDS table version 2.0
[3.919157] nouveau :01:00.0: DRM: DCB version 4.0
[3.919162] nouveau :01:00.0: DRM: DCB outp 00: 04000310 0028
[3.919167] nouveau :01:00.0: DRM: DCB outp 01: 02011300 0028
[3.919171] nouveau :01:00.0: DRM: DCB outp 02: 01011302 0030
[3.919176] nouveau :01:00.0: DRM: DCB outp 03: 02022322 00020010
[3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083
[3.919185] nouveau :01:00.0: DRM: DCB conn 00: 
[3.919189] nouveau :01:00.0: DRM: DCB conn 01: 1130
[3.919194] nouveau :01:00.0: DRM: DCB conn 02: 2261
[3.919198] nouveau :01:00.0: DRM: DCB conn 03: 0310
[3.919202] nouveau :01:00.0: DRM: DCB conn 04: 0311
[3.919206] nouveau :01:00.0: DRM: DCB conn 05: 0313
[3.919258] [ cut here ]
[3.919316] WARNING: CPU: 3 PID: 224 at drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c:83 nvkm_outp_xlat.isra.0+0x26/0x80 [nouveau]
[3.919322] Modules linked in: uas(E) usb_storage(E) hid_generic(E+) usbhid(E) nouveau(E+) wmi(E) video(E) i2c_algo_bit(E) ahci(E+) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) firewire_ohci(E) libata(E) firewire_core(E) crc_itu_t(E) ehci_pci(E+) serio_raw(E) ttm(E) button(E) drm(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[3.919360] CPU: 3 PID: 224 Comm: systemd-udevd Tainted: GE   4.12.0.gccd5d1b-master #186
[3.919366] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
[3.919370] task: 880211cd3d40 task.stack: c9714000
[3.919412] RIP: 0010:nvkm_outp_xlat.isra.0+0x26/0x80 [nouveau]
[3.919417] RSP: 0018:c97177b0 EFLAGS: 00010202
[3.919421] RAX: 88021128fc08 RBX: 880211c0aa80 RCX: c9717870
[3.919425] RDX: c97177fc RSI:  RDI: 0001
[3.919429] RBP: 88021128fc10 R08: 880211c0aa80 R09: 880211c0aa80
[3.919433] R10:  R11: ea00084cf980 R12: 8802130f5500
[3.919437] R13: 880211c0a9d0 R14: 0003 R15: 0004
[3.919442] FS:  7fe2035b68c0() GS:88022fd8() knlGS:
[3.919448] CS:  0010 DS:  ES:  CR0: 80050033
[3.919452] CR2: 7fe203586000 CR3: 0002133e3000 CR4: 06e0
[3.919456] Call Trace:
[3.919500]  nvkm_outp_ctor+0x105/0x130 [nouveau]
[3.919508]  ? kmem_cache_alloc_trace+0x135/0x140
[3.919550]  nvkm_disp_oneinit+0x132/0x510 [nouveau]
[3.919583]  nvkm_engine_init+0x74/0x1d0 [nouveau]
[3.919617]  nvkm_subdev_init+0xaf/0x200 [nouveau]
[3.919648]  nvkm_engine_ref+0x4a/0x70 [nouveau]
[3.919681]  nvkm_ioctl_new+0x118/0x280 [nouveau]
[3.919705]  ? drm_property_create+0x100/0x150 [drm]
[3.919746]  ? nvkm_udevice_map+0x40/0x40 [nouveau]
[3.919779]  nvkm_ioctl+0x13c/0x230 [nouveau]
[3.919785]  ? try_to_grab_pending+0xa7/0x130
[3.919816]  nvif_object_init+0xc0/0x130 [nouveau]
[3.919859]  nouveau_display_create+0x13e/0x630 [nouveau]
[3.919903]  nouveau_drm_load+0x1e2/0x8d0 [nouveau]
[3.919910]  ? sysfs_do_create_link_sd.isra.2+0x6b/0xb0
[3.919924]  drm_dev_register+0x139/0x1d0 [drm]
[3.919930]  ? pci_read_config_word.part.9+0x47/0x60
[

Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Fri, 2017-07-14 at 18:10 +0200, Peter Zijlstra wrote:
> On Fri, Jul 14, 2017 at 05:58:18PM +0200, Mike Galbraith wrote:
> > On Fri, 2017-07-14 at 17:50 +0200, Peter Zijlstra wrote:
> 
> > > Urgh, is for some mysterious reason the __bug_table section of modules
> > > ending up in RO memory?
> > > 
> > > I forever get lost in that link magic :/
> > 
> > +1
> > 
> > drm.ko
> >  20 __bug_table   0630      0004bff3  
> > 2**0
> >   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
> > vmlinux
> >  15 __bug_table   ba84  81af26c0  01af26c0  00cf26c0  
> > 2**0
> >   CONTENTS, ALLOC, LOAD, READONLY, DATA
> > 
> > Danged if I know... um um RELOC business mucks things up?
> 
> Argh, it shouldn't be READONLY for vmlinux either, but apparently that
> is working for mysterious reasons.
> 
> Some architectures were in fact complaining that I broke that, and hence
> patch:
> 
> b5effd3815cc ("debug: Fix __bug_table[] in arch linker scripts")
> 
> I think we need professional help with this linking stuff, but who to
> ask?

Andy Lutomirski?


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
> Some details that may be useful in analysis of the bug:
> 
> 1. lspci -nn -d 10de:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)

> 2. What displays, if any, you have plugged into the NVIDIA board when
> this happens?

A Philips 273V, via DVI.

> 3. Any boot parameters, esp relating to ACPI, PM, or related?

None for those, what's there that will be unfamiliar to you are for
patches that aren't applied.

nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0
nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60
ignore_loglevel crashkernel=256M,high

-Mike


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:
>  All DRM did was to slip a
> WARN_ON_ONCE() that nouveau triggers into a kernel module where such
> things no longer warn, they blow the box out of the water.

BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c
into a WARN_ONCE(), and all is peachy, you get the warning, box lives.

---
 drivers/gpu/drm/drm_vblank.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
 	 */
 	if (mode->crtc_clock == 0) {
 		DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
-		WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
+		WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",
+			  dev->driver->name);
 
 		return false;
 	}
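
For readers who want to see the "warn once, box lives" behaviour in
isolation, here is a minimal userspace sketch. It is only a model: the
names SKETCH_WARN_ONCE, sketch_warn_count and sketch_calc_timestamp are
hypothetical and are not kernel APIs, and keeping the once-flag in an
ordinary writable static at each call site is an assumption made for
illustration, not a statement about how the kernel macros store their
state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (hypothetical helper). */
static int sketch_warn_count;

/*
 * Userspace stand-in for a warn-once macro. Each call site gets its own
 * writable static flag, so the message is printed at most once per site.
 */
#define SKETCH_WARN_ONCE(cond, ...)                   \
    do {                                              \
        static bool warned_;                          \
        if ((cond) && !warned_) {                     \
            warned_ = true;                           \
            sketch_warn_count++;                      \
            fprintf(stderr, __VA_ARGS__);             \
        }                                             \
    } while (0)

/*
 * Mirrors the shape of the patched branch: an uninitialized mode
 * (crtc_clock == 0) is a no-op that warns once for atomic drivers.
 */
static bool sketch_calc_timestamp(int crtc_clock, bool uses_atomic,
                                  const char *driver_name)
{
    assert(driver_name != NULL);

    if (crtc_clock == 0) {
        SKETCH_WARN_ONCE(uses_atomic, "%s: report me.\n", driver_name);
        return false;
    }
    return true;
}
```

Calling sketch_calc_timestamp(0, true, "nouveau") repeatedly returns
false every time but writes the message to stderr only on the first
call.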


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Tue, 2017-07-11 at 20:53 +0200, Mike Galbraith wrote:
> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> 
> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> > too much trouble, a bisect would be pretty useful.
> 
> Vacation -> back to work happens in the very early AM, so bisection
> will have to wait a bit.

Hm, my backup workstation (old GeForce 8600 GT box) has the same issue,
so perhaps I can bisect it as I work on backlog (multitasking: screw up
multiple tasks concurrently).

-Mike


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> 
> OK, thanks. So in other words, a fairly standard desktop with a PCIe
> board plugged in. No funny business. (Laptops can create a ton of
> additional weirdness, which I assumed you had since you were talking
> about STR.)

Yup, garden variety deskside box.

> My best guess is that gf119_head_vblank_put either has a bogus head id
> (should be in the 0..3 range) which causes it to do an out-of-bounds
> read on MMIO space, or that the MMIO mapping has already been removed
> by the time nouveau_display_suspend runs. Adding Ben Skeggs for
> additional insight.
> 
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Vacation -> back to work happens in the very early AM, so bisection
will have to wait a bit.

-Mike
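
The two guesses quoted above (a head id outside the 0..3 range, or an
MMIO mapping that is already gone by suspend time) can be sketched as
guard checks in plain userspace C. Everything here is hypothetical:
struct sketch_disp, SKETCH_MAX_HEADS and sketch_head_vblank_put are
illustration-only names, the array merely stands in for MMIO space, and
none of it is taken from the nouveau source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_HEADS 4 /* the 0..3 range mentioned in the quote */

/* Stand-in for a display block with one register word per head. */
struct sketch_disp {
    uint32_t head_regs[SKETCH_MAX_HEADS];
    bool mapped; /* false once the (pretend) MMIO mapping is removed */
};

/*
 * Clears bit 0 of the per-head register, but only when both guessed
 * failure modes are ruled out. Returns false instead of touching memory
 * when the mapping is gone or the head id is out of range.
 */
static bool sketch_head_vblank_put(struct sketch_disp *disp, int head)
{
    assert(disp != NULL);

    if (!disp->mapped)
        return false; /* guess 2: mapping already removed */
    if (head < 0 || head >= SKETCH_MAX_HEADS)
        return false; /* guess 1: bogus head id */

    disp->head_regs[head] &= ~UINT32_C(1);
    return true;
}
```

Checks like these would only narrow down which guess applies; they say
nothing about where a bad head id or a stale mapping would come from.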


Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-17 Thread Mike Galbraith
On Fri, 2017-07-14 at 17:10 +0200, Karol Herbst wrote:
> Yeah, we shouldn't let the machine die. Are there more WARN_ON_ONCE
> usage we could convert to WARN_ONCE?

Shooting the messenger is generally considered uncool :)

-Mike