Re: Kernel Oops with: "drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status" [bisected].

2019-09-02 Thread Przemek Socha
On Sunday, 1 September 2019 19:40:25 CEST you wrote:
> On Sun, Sep 1, 2019 at 1:09 PM Przemek Socha  wrote:
> > Hello everyone,
> > 
> > after today's sync with the amd-staging-drm-next repo my machine was hit by
> > a kernel Oops.
> > Maybe my google-fu is weak, but I could not find a fix on patchwork for
> > this that has been implemented or is planned.
> > 
> > Machine is a Lenovo netbook with a6-6310 APU, R4 (CIK).
> > 
> > I have done bisection and here are the results:
> > 
> > 
> > 1.  dmesg output from pstore after kernel panic:
> > 
> > <6>[   13.133880] [drm] amdgpu kernel modesetting enabled.
> > <6>[   13.133923] amdgpu :00:01.0:
> > remove_conflicting_pci_framebuffers: bar 0: 0xe000 -> 0xefff
> > <6>[   13.133927] amdgpu :00:01.0:
> > remove_conflicting_pci_framebuffers: bar 2: 0xf000 -> 0xf07f
> > <6>[   13.133930] amdgpu :00:01.0:
> > remove_conflicting_pci_framebuffers: bar 5: 0xf0c0 -> 0xf0c3
> > <7>[   13.133933] checking generic (e000 42) vs hw (e000 1000)
> > <6>[   13.133935] fb0: switching to amdgpudrmfb from EFI VGA
> > <6>[   13.133999] Console: switching to colour dummy device 80x25
> > <6>[   13.136463] [drm] initializing kernel modesetting (MULLINS
> > 0x1002:0x9851 0x17AA:0x3801 0x00).
> > <6>[   13.136826] [drm] register mmio base: 0xF0C0
> > <6>[   13.136827] [drm] register mmio size: 262144
> > <6>[   13.136837] [drm] add ip block number 0 
> > <6>[   13.136839] [drm] add ip block number 1 
> > <6>[   13.136840] [drm] add ip block number 2 
> > <6>[   13.136842] [drm] add ip block number 3 
> > <6>[   13.136844] [drm] add ip block number 4 
> > <6>[   13.136845] [drm] add ip block number 5 
> > <6>[   13.136847] [drm] add ip block number 6 
> > <6>[   13.136849] [drm] add ip block number 7 
> > <6>[   13.136850] [drm] add ip block number 8 
> > <6>[   13.136857] amdgpu :00:01.0: kfd not supported on this ASIC
> > <6>[   13.136916] ATOM BIOS: BR45787.ts5
> > <6>[   13.137031] [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
> > fragment size is 9-bit
> > <6>[   13.137042] amdgpu :00:01.0: VRAM: 1024M 0x00F4 -
> > 0x00F43FFF (1024M used)
> > <6>[   13.137046] amdgpu :00:01.0: GART: 1024M 0x00FF -
> > 0x00FF3FFF
> > <6>[   13.137056] [drm] Detected VRAM RAM=1024M, BAR=1024M
> > <6>[   13.137057] [drm] RAM width 64bits UNKNOWN
> > <6>[   13.138102] sdhci: Secure Digital Host Controller Interface driver
> > <6>[   13.138105] sdhci: Copyright(c) Pierre Ossman
> > <6>[   13.138741] [TTM] Zone  kernel: Available graphics memory: 3541568 KiB
> > <6>[   13.138744] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
> > <6>[   13.138745] [TTM] Initializing pool allocator
> > <6>[   13.138754] [TTM] Initializing DMA pool allocator
> > <6>[   13.138882] [drm] amdgpu: 1024M of VRAM memory ready
> > <6>[   13.138891] [drm] amdgpu: 3072M of GTT memory ready.
> > <6>[   13.138932] [drm] GART: num cpu pages 262144, num gpu pages 262144
> > <6>[   13.138970] [drm] PCIE GART of 1024M enabled (table at
> > 0x00F400401000).
> > <6>[   13.176861] [drm] Internal thermal controller without fan control
> > <6>[   13.176865] [drm] amdgpu: dpm initialized
> > <6>[   13.176872] [drm] Found UVD firmware Version: 1.64 Family ID: 9
> > <6>[   13.178133] sdhci-pci :00:14.7: SDHCI controller found
> > [1022:7813] (rev 1)
> > <6>[   13.180552] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
> > <6>[   13.186202] kvm: Nested Virtualization enabled
> > <6>[   13.186205] kvm: Nested Paging enabled
> > <6>[   13.191378] mmc0: SDHCI controller on PCI [:00:14.7] using ADMA
> > <3>[   13.196258] [drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB:
> > invalid powerlevel state: 0!
> > <4>[   13.196308] [drm] Unsupported Connector type:5!
> > <6>[   13.213496] [drm] Display Core initialized with v3.2.48!
> > <6>[   13.221850] [drm] SADs count is: -2, don't need to read it
> > <6>[   13.230392] ath: phy0: WB335 2-ANT card detected
> > <6>[   13.230395] ath: phy0: Set BT/WLAN RX diversity capability
> > <6>[   13.247472] ath: phy0: Enable LNA combining
> > <6>[   13.248570] ath: phy0: ASPM enabled: 0x43

Kernel Oops with: "drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status" [bisected].

2019-09-01 Thread Przemek Socha
Hello everyone,

after today's sync with the amd-staging-drm-next repo my machine was hit by a 
kernel Oops.
Maybe my google-fu is weak, but I could not find a fix on patchwork for this 
that has been implemented or is planned.

Machine is a Lenovo netbook with a6-6310 APU, R4 (CIK).

I have done bisection and here are the results:


1.  dmesg output from pstore after kernel panic:

<6>[   13.133880] [drm] amdgpu kernel modesetting enabled.
<6>[   13.133923] amdgpu :00:01.0: remove_conflicting_pci_framebuffers: bar 
0: 0xe000 -> 0xefff
<6>[   13.133927] amdgpu :00:01.0: remove_conflicting_pci_framebuffers: bar 
2: 0xf000 -> 0xf07f
<6>[   13.133930] amdgpu :00:01.0: remove_conflicting_pci_framebuffers: bar 
5: 0xf0c0 -> 0xf0c3
<7>[   13.133933] checking generic (e000 42) vs hw (e000 1000)
<6>[   13.133935] fb0: switching to amdgpudrmfb from EFI VGA
<6>[   13.133999] Console: switching to colour dummy device 80x25
<6>[   13.136463] [drm] initializing kernel modesetting (MULLINS 0x1002:0x9851 
0x17AA:0x3801 0x00).
<6>[   13.136826] [drm] register mmio base: 0xF0C0
<6>[   13.136827] [drm] register mmio size: 262144
<6>[   13.136837] [drm] add ip block number 0 
<6>[   13.136839] [drm] add ip block number 1 
<6>[   13.136840] [drm] add ip block number 2 
<6>[   13.136842] [drm] add ip block number 3 
<6>[   13.136844] [drm] add ip block number 4 
<6>[   13.136845] [drm] add ip block number 5 
<6>[   13.136847] [drm] add ip block number 6 
<6>[   13.136849] [drm] add ip block number 7 
<6>[   13.136850] [drm] add ip block number 8 
<6>[   13.136857] amdgpu :00:01.0: kfd not supported on this ASIC
<6>[   13.136916] ATOM BIOS: BR45787.ts5
<6>[   13.137031] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, 
fragment size is 9-bit
<6>[   13.137042] amdgpu :00:01.0: VRAM: 1024M 0x00F4 - 
0x00F43FFF (1024M used)
<6>[   13.137046] amdgpu :00:01.0: GART: 1024M 0x00FF - 
0x00FF3FFF
<6>[   13.137056] [drm] Detected VRAM RAM=1024M, BAR=1024M
<6>[   13.137057] [drm] RAM width 64bits UNKNOWN
<6>[   13.138102] sdhci: Secure Digital Host Controller Interface driver
<6>[   13.138105] sdhci: Copyright(c) Pierre Ossman
<6>[   13.138741] [TTM] Zone  kernel: Available graphics memory: 3541568 KiB
<6>[   13.138744] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
<6>[   13.138745] [TTM] Initializing pool allocator
<6>[   13.138754] [TTM] Initializing DMA pool allocator
<6>[   13.138882] [drm] amdgpu: 1024M of VRAM memory ready
<6>[   13.138891] [drm] amdgpu: 3072M of GTT memory ready.
<6>[   13.138932] [drm] GART: num cpu pages 262144, num gpu pages 262144
<6>[   13.138970] [drm] PCIE GART of 1024M enabled (table at 
0x00F400401000).
<6>[   13.176861] [drm] Internal thermal controller without fan control
<6>[   13.176865] [drm] amdgpu: dpm initialized
<6>[   13.176872] [drm] Found UVD firmware Version: 1.64 Family ID: 9
<6>[   13.178133] sdhci-pci :00:14.7: SDHCI controller found [1022:7813] 
(rev 1)
<6>[   13.180552] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
<6>[   13.186202] kvm: Nested Virtualization enabled
<6>[   13.186205] kvm: Nested Paging enabled
<6>[   13.191378] mmc0: SDHCI controller on PCI [:00:14.7] using ADMA
<3>[   13.196258] [drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB: 
invalid powerlevel state: 0!
<4>[   13.196308] [drm] Unsupported Connector type:5!
<6>[   13.213496] [drm] Display Core initialized with v3.2.48!
<6>[   13.221850] [drm] SADs count is: -2, don't need to read it
<6>[   13.230392] ath: phy0: WB335 2-ANT card detected
<6>[   13.230395] ath: phy0: Set BT/WLAN RX diversity capability
<6>[   13.247472] ath: phy0: Enable LNA combining
<6>[   13.248570] ath: phy0: ASPM enabled: 0x43
<7>[   13.248574] ath: EEPROM regdomain: 0x6a
<7>[   13.248575] ath: EEPROM indicates we should expect a direct regpair map
<7>[   13.248579] ath: Country alpha2 being used: 00
<7>[   13.248580] ath: Regpair used: 0x6a
<7>[   13.261552] ieee80211 phy0: Selected rate control algorithm 
'minstrel_ht'
<6>[   13.261857] ieee80211 phy0: Atheros AR9565 Rev:1 mem=0xa9f1c040, 
irq=43
<6>[   13.296215] ath9k :01:00.0 wlp1s0: renamed from wlan0
<6>[   13.304323] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
<6>[   13.304325] [drm] Driver supports precise vblank timestamp query.
<6>[   13.321092] [drm] UVD initialized successfully.
<6>[   13.373473] usb 1-1: new high-speed USB device number 2 using ehci-pci
<6>[   13.386794] usb 4-1: new high-speed USB device number 2 using ehci-pci
<6>[   13.442287] [drm] VCE initialized successfully.
<1>[   13.444174] BUG: kernel NULL pointer dereference, address: 
00a8
<1>[   13.444191] #PF: supervisor read access in kernel mode
<1>[   13.444197] #PF: error_code(0x) - not-present page
<6>[   13.444202] PGD 0 P4D 0 
<4>[   13.444210] Oops:  [#1] PREEMPT SMP
<4>[   13.444218] CPU: 1 PID: 3311 Comm: laptop_mode Not 

Re: amd-staging-drm-next, CIK apu, [TTM] Erroneous page count. Leaking pages. [BISECTED]

2019-04-11 Thread Przemek Socha
On Thursday, 11 April 2019 19:31:27 CEST you wrote:
> Yeah, that is a known issue and already fixed in amd-staging-drm-next.
> 
> You just need to wait for the public mirror to update.
> 
> Christian.
> 
> On 11.04.19 at 18:53, Przemek Socha wrote:
> 
> > Hi All,
> > after today's sync with amd-staging-drm-next branch I have problems with
> > my Mullins APU Netbook.
> > Kernel log is flooded with "[TTM] Erroneous page count. Leaking pages."
> > message.
> >
> > Bisected, and it seems that "drm/ttm: fix start page for huge page check
> > in ttm_put_pages()" commit is causing this.
> >
> > git bisect log:
> > git bisect start
> > # good: [c2af9f9e84183a4dbcb53a74abbe3362dc181682] drm/amd/display: fix cursor black issue
> > git bisect good c2af9f9e84183a4dbcb53a74abbe3362dc181682
> > # bad: [b07c394a327fc9e435ee03288584c111fa73d963] drm/amd/display: fix is odm head pipe logic
> > git bisect bad b07c394a327fc9e435ee03288584c111fa73d963
> > # bad: [367fa96ad66fef31bf73916cf4949c475f42563f] drm/amd/display: use proper formula to calculate bandwidth from timing
> > git bisect bad 367fa96ad66fef31bf73916cf4949c475f42563f
> > # good: [94275c6d27deebbce5055e1bb1959986051b93f2] drm/amdgpu: Add a check to avoid panic because of unexpected irqs
> > git bisect good 94275c6d27deebbce5055e1bb1959986051b93f2
> > # bad: [8198a32b1211c6f625623bae22629cf852531097] drm/amd/display: Initialize stream_update with memset
> > git bisect bad 8198a32b1211c6f625623bae22629cf852531097
> > # bad: [e16858a7e6e7dc00345703682d490b96f2883b99] drm/ttm: fix start page for huge page check in ttm_put_pages()
> > git bisect bad e16858a7e6e7dc00345703682d490b96f2883b99
> > # good: [b414dfd1f975bbf4c9ee4600f9bedcf77bae84ed] drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
> > git bisect good b414dfd1f975bbf4c9ee4600f9bedcf77bae84ed
> > # first bad commit: [e16858a7e6e7dc00345703682d490b96f2883b99] drm/ttm: fix start page for huge page check in ttm_put_pages()
> >
> >
> >
> > e16858a7e6e7dc00345703682d490b96f2883b99 is the first bad commit
> > commit e16858a7e6e7dc00345703682d490b96f2883b99
> > Author: Christian König 
> > Date:   Tue Apr 2 09:29:35 2019 +0200
> >
> >
> >
> >  drm/ttm: fix start page for huge page check in ttm_put_pages()
> >  
> >  The first page entry is always the same with itself.
> >  
> >  Signed-off-by: Christian König 
> >  Reviewed-by: Michel Dänzer 
> >  Reviewed-by: Junwei Zhang 
> >  Reviewed-by: Huang Rui 
> >
> >
> >
> > Any help is appreciated.
> > Thanks,
> > Przemek.
> 
> 

Thanks, and sorry about all the fuss.
Przemek.

signature.asc
Description: This is a digitally signed message part.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

amd-staging-drm-next, CIK apu, [TTM] Erroneous page count. Leaking pages. [BISECTED]

2019-04-11 Thread Przemek Socha
Hi All,
after today's sync with amd-staging-drm-next branch I have problems with my 
Mullins APU Netbook.
Kernel log is flooded with "[TTM] Erroneous page count. Leaking pages." 
message.

Bisected, and it seems that "drm/ttm: fix start page for huge page check in 
ttm_put_pages()" commit is causing this.

git bisect log:
git bisect start
# good: [c2af9f9e84183a4dbcb53a74abbe3362dc181682] drm/amd/display: fix cursor black issue
git bisect good c2af9f9e84183a4dbcb53a74abbe3362dc181682
# bad: [b07c394a327fc9e435ee03288584c111fa73d963] drm/amd/display: fix is odm head pipe logic
git bisect bad b07c394a327fc9e435ee03288584c111fa73d963
# bad: [367fa96ad66fef31bf73916cf4949c475f42563f] drm/amd/display: use proper formula to calculate bandwidth from timing
git bisect bad 367fa96ad66fef31bf73916cf4949c475f42563f
# good: [94275c6d27deebbce5055e1bb1959986051b93f2] drm/amdgpu: Add a check to avoid panic because of unexpected irqs
git bisect good 94275c6d27deebbce5055e1bb1959986051b93f2
# bad: [8198a32b1211c6f625623bae22629cf852531097] drm/amd/display: Initialize stream_update with memset
git bisect bad 8198a32b1211c6f625623bae22629cf852531097
# bad: [e16858a7e6e7dc00345703682d490b96f2883b99] drm/ttm: fix start page for huge page check in ttm_put_pages()
git bisect bad e16858a7e6e7dc00345703682d490b96f2883b99
# good: [b414dfd1f975bbf4c9ee4600f9bedcf77bae84ed] drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
git bisect good b414dfd1f975bbf4c9ee4600f9bedcf77bae84ed
# first bad commit: [e16858a7e6e7dc00345703682d490b96f2883b99] drm/ttm: fix start page for huge page check in ttm_put_pages()

e16858a7e6e7dc00345703682d490b96f2883b99 is the first bad commit
commit e16858a7e6e7dc00345703682d490b96f2883b99
Author: Christian König 
Date:   Tue Apr 2 09:29:35 2019 +0200

drm/ttm: fix start page for huge page check in ttm_put_pages()

The first page entry is always the same with itself.

Signed-off-by: Christian König 
Reviewed-by: Michel Dänzer 
Reviewed-by: Junwei Zhang 
Reviewed-by: Huang Rui 
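
The commit message's reasoning ("the first page entry is always the same with itself") can be illustrated with a small sketch of a contiguity check over an array of page pointers. This is only an illustration under assumptions: `page_sketch` and `is_contiguous_run_sketch` are hypothetical names, and the code is not the actual ttm_put_pages() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page descriptor; not the kernel's struct page. */
struct page_sketch { int unused; };

/*
 * Returns true when pages[0..run_len-1] point at consecutive descriptors,
 * i.e. pages[j] == pages[0] + j for every j in the run.
 * The loop starts at j = 1 because pages[0] == pages[0] + 0 holds trivially,
 * which is the observation made in the commit message.
 */
static bool is_contiguous_run_sketch(struct page_sketch *const *pages,
                                     size_t run_len)
{
    assert(pages != NULL && run_len > 0);

    struct page_sketch *first = pages[0];

    for (size_t j = 1; j < run_len; j++) {
        /* Index j and the offset added to 'first' must stay in step. */
        if (pages[j] != first + j)
            return false;
    }
    return true;
}
```

Note that the loop index and the pointer offset have to advance together; if they drift apart by one, a check of this kind misjudges runs that are in fact contiguous.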

Any help is appreciated.
Thanks,
Przemek.


amd-staging-drm-next - [drm] REG_WAIT timeout 1us * 80000 tries - dce_abm_set_pipe line:62

2019-02-28 Thread Przemek Socha
Hi all,

today I spotted a warning during the hibernation (S4) process, while the machine 
was disabling all hardware and writing the hibernation image to disk, just 
before "amdgpu :00:01.0: GPU pci config reset" and the disabling of the EC 
interrupt.

Besides that, everything works just fine. The system hibernates and resumes 
correctly, so I have no idea whether I should worry or not.

The system is a Lenovo G50-45 with an A6-6310 APU and R4 (Mullins).

>[14469.490249] [drm] REG_WAIT timeout 1us * 80000 tries - dce_abm_set_pipe line:62
>[14469.490427] WARNING: CPU: 3 PID: 32028 at drivers/gpu/drm/amd/amdgpu/../
display/dc/dc_helper.c:277 generic_reg_wait.cold.3+0x2a/0x31 [amdgpu]
>[14469.490429] Modules linked in: rfcomm nf_tables ebtable_nat ip_set 
nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables overlay squashfs 
loop bnep ipv6 ath3k btusb btintel bluetooth ecdh_generic rtsx_usb_ms 
memstick rtsx_usb_sdmmc uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_v4l2 rtsx_usb videobuf2_common videodev media ath9k kvm_amd 
ath9k_common ath9k_hw kvm irqbypass sdhci_pci cqhci sdhci crc32_pclmul 
ghash_clmulni_intel serio_raw mmc_core mac80211 amdgpu ath xhci_pci xhci_hcd 
cfg80211 mfd_core chash gpu_sched ehci_pci ttm ehci_hcd sp5100_tco
>[14469.490488] CPU: 3 PID: 32028 Comm: kworker/u8:13 Not tainted 5.0.0-rc1+ 
#71
>[14469.490490] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 
08/04/2016
>[14469.490499] Workqueue: events_unbound async_run_entry_fn
>[14469.490590] RIP: 0010:generic_reg_wait.cold.3+0x2a/0x31 [amdgpu]
>[14469.490595] Code: 44 8b 44 24 68 48 c7 c7 30 2f 43 c0 48 8b 4c 24 60 8b 54 
24 58 8b 74 24 04 e8 16 ed 37 ef 41 83 7c 24 20 01 0f 84 d6 a3 fe ff <0f> 0b e9 
cf a3 fe ff e8 4d c1 eb ff 48 c7 c7 00 a0 4b c0 e8 a1 77
>[14469.490598] RSP: 0018:9759425ff6e0 EFLAGS: 00010297
>[14469.490602] RAX: 0043 RBX: 00013881 RCX: 

>[14469.490605] RDX:  RSI: 0096 RDI: 
>
>[14469.490608] RBP: 1620 R08: 0004 R09: 
0001bb40
>[14469.490611] R10: 02e453506252 R11: 0043 R12: 
8d2552416100
>[14469.490613] R13:  R14: 0001 R15: 
0001
>[14469.490617] FS:  () GS:8d2557b8() knlGS:

>[14469.490620] CS:  0010 DS:  ES:  CR0: 80050033
>[14469.490623] CR2: 7efbb0564038 CR3: 000212b5e000 CR4: 
000406e0
>[14469.490625] Call Trace:
>[14469.490743]  dce_abm_set_pipe+0x47/0x2a8 [amdgpu]
>[14469.490855]  dce_abm_immediate_disable+0x15/0x208 [amdgpu]
>[14469.490949]  dc_link_set_abm_disable+0x31/0x40 [amdgpu]
>[14469.491045]  dce110_blank_stream+0x69/0x70 [amdgpu]
>[14469.491139]  core_link_disable_stream+0x3e/0x238 [amdgpu]
>[14469.491236]  dce110_reset_hw_ctx_wrap+0xbe/0x1e0 [amdgpu]
>[14469.491333]  dce110_apply_ctx_to_hw+0x46/0x768 [amdgpu]
>[14469.491428]  ? amdgpu_pm_compute_clocks.part.11+0x265/0x4d8 [amdgpu]
>[14469.491539]  ? dm_pp_apply_display_requirements+0x1dd/0x1f8 [amdgpu]
>[14469.491633]  dc_commit_state+0x35e/0x9f0 [amdgpu]
>[14469.491731]  ? dce110_timing_generator_get_position+0x71/0x160 [amdgpu]
>[14469.491842]  amdgpu_dm_atomic_commit_tail+0x4b4/0x1cf0 [amdgpu]
>[14469.491941]  ? dce110_timing_generator_get_crtc_scanoutpos+0x75/0x130 
[amdgpu]
>[14469.492031]  ? dc_stream_get_scanoutpos+0x70/0x90 [amdgpu]
>[14469.492140]  ? dm_crtc_get_scanoutpos+0x61/0xb0 [amdgpu]
>[14469.492234]  ? amdgpu_display_get_crtc_scanoutpos+0x80/0x168 [amdgpu]
>[14469.492330]  ? dce110_timing_generator_get_vblank_counter+0x26/0xa0 
[amdgpu]
>[14469.492340]  ? _raw_spin_unlock_irqrestore+0xf/0x28
>[14469.492346]  ? __wake_up_common_lock+0x84/0xb8
>[14469.492456]  ? amdgpu_dm_atomic_commit_tail+0x1cf0/0x1cf0 [amdgpu]
>[14469.492462]  ? preempt_count_add+0x74/0xa0
>[14469.492467]  ? _raw_spin_lock_irq+0xf/0x30
>[14469.492471]  ? _raw_spin_unlock_irq+0xe/0x20
>[14469.492478]  ? wait_for_completion_timeout+0x101/0x128
>[14469.492486]  ? drm_atomic_helper_setup_commit+0x4a7/0x660
>[14469.492493]  ? drm_atomic_helper_commit+0x107/0x418
>[14469.492499]  drm_atomic_helper_commit+0x107/0x418
>[14469.492507]  __drm_atomic_helper_disable_all.constprop.30+0x141/0x150
>[14469.492514]  drm_atomic_helper_suspend+0xe5/0x118
>[14469.492625]  dm_suspend+0x20/0xb8 [amdgpu]
>[14469.492716]  amdgpu_device_ip_suspend_phase1+0x94/0xc0 [amdgpu]
>[14469.492808]  amdgpu_device_suspend+0x2e8/0x490 [amdgpu]
>[14469.492817]  pci_pm_freeze+0x4c/0xc8
>[14469.492823]  ? pci_pm_poweroff+0xd0/0xd0
>[14469.492829]  dpm_run_callback+0x2a/0x120
>[14469.492837]  __device_suspend+0x200/0x7e8
>[14469.492844]  async_suspend+0x15/0x88
>[14469.492849]  async_run_entry_fn+0x32/0xd8
>[14469.492856]  process_one_work+0x1f4/0x428
>[14469.492863]  worker_thread+0x43/0x490
>[14469.492869]  ? process_one_work+0x428/0x428
>[14469.492873]  kthread+0x15d/0x180
>[14469.492878]  ? kthread_create_on_node+0x60/0x60
>[14469.492884]  
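
The wording of the warning ("timeout 1us * 80000 tries" in the subject line) suggests a bounded polling loop: read a register, wait a fixed delay, and give up after a fixed number of attempts. A minimal sketch of that pattern follows, assuming hypothetical callback types and names (`reg_read_fn`, `delay_us_fn`, `reg_wait_sketch`); it is not the amdgpu generic_reg_wait() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register accessor: returns the current register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Hypothetical delay hook, so the sketch does not depend on a real timer. */
typedef void (*delay_us_fn)(unsigned int us);

/*
 * Poll until (reg & mask) == expected, waiting delay_us between reads,
 * for at most max_tries reads. Returns true on success, false on timeout.
 * *tries_used reports how many reads were performed.
 */
static bool reg_wait_sketch(reg_read_fn read_reg, void *ctx,
                            uint32_t mask, uint32_t expected,
                            unsigned int delay_us, unsigned int max_tries,
                            delay_us_fn delay, unsigned int *tries_used)
{
    assert(read_reg != NULL && delay != NULL && tries_used != NULL);

    for (unsigned int i = 0; i < max_tries; i++) {
        if ((read_reg(ctx) & mask) == expected) {
            *tries_used = i + 1;
            return true;
        }
        delay(delay_us);
    }

    /* A caller would log "timeout <delay_us>us * <max_tries> tries" here. */
    *tries_used = max_tries;
    return false;
}

/* Fake register for experiments: reports "ready" after a given number of reads. */
struct fake_reg { unsigned int reads; unsigned int ready_after; };

static uint32_t fake_read(void *ctx)
{
    struct fake_reg *r = ctx;

    r->reads++;
    return r->reads >= r->ready_after ? 0x1u : 0x0u;
}

/* No-op delay; a real caller would sleep or busy-wait here. */
static void fake_delay(unsigned int us) { (void)us; }
```

With a 1 us delay and 80000 tries, a loop like this gives the hardware roughly 80 ms to respond before the timeout path is taken.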

Re: BUG - unable to handle null pointer, bisected - drm/amd/display: add gpio lock/unlock

2019-02-07 Thread Przemek Socha
On Thursday, 7 February 2019 22:59:59 CET you wrote:

> > I'll post a fix shortly.
> 
> Fix merged to amd-staging-drm-next.
> 
> Harry
> 


I apologize for the late response, 
and thank you very much.

I had a problem applying the patch on top of a clean amd-staging-drm-next, 
I suppose because it comes as one chunk (but my patch-fu could also be weak), 
so I had to modify it like this:
"
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -1127,10 +1127,11 @@
link->dc->res_pool->funcs->link_init(link);
 
link->hpd_gpio = get_hpd_gpio(link->ctx->dc_bios, link->link_id, 
link->ctx->gpio_service);
-   dal_gpio_open(link->hpd_gpio, GPIO_MODE_INTERRUPT);
-   dal_gpio_unlock_pin(link->hpd_gpio);
-   if (link->hpd_gpio != NULL)
-   link->irq_source_hpd = dal_irq_get_source(link->hpd_gpio);
+if (link->hpd_gpio != NULL) {
+   dal_gpio_open(link->hpd_gpio, GPIO_MODE_INTERRUPT);
+   dal_gpio_unlock_pin(link->hpd_gpio);
+link->irq_source_hpd = dal_irq_get_source(link->hpd_gpio);
+   }
 
switch (link->link_id.id) {
case CONNECTOR_ID_HDMI_TYPE_A:
"

After that, machine works as it should.

So this patch also works on Mullins apu.
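
The effect of the modified hunk can be shown in isolation: every use of `hpd_gpio` moves inside the NULL check, so a link without an HPD pin no longer dereferences a NULL pointer. The sketch below uses simplified, hypothetical stand-ins (`gpio_sketch`, `link_sketch`, `link_setup_hpd_sketch`) rather than the real amdgpu display types and functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real amdgpu definitions. */
struct gpio_sketch { int opened; int locked; int irq_source; };
struct link_sketch { struct gpio_sketch *hpd_gpio; int irq_source_hpd; };

#define IRQ_SOURCE_INVALID_SKETCH (-1)

/* Simplified model: opening a pin also takes its lock. */
static void gpio_open_sketch(struct gpio_sketch *g) { g->opened = 1; g->locked = 1; }
static void gpio_unlock_sketch(struct gpio_sketch *g) { g->locked = 0; }

/* Every access to hpd_gpio sits inside the NULL check, as in the hunk above. */
static void link_setup_hpd_sketch(struct link_sketch *link,
                                  struct gpio_sketch *hpd)
{
    assert(link != NULL);

    link->hpd_gpio = hpd; /* may be NULL when the connector has no HPD pin */
    link->irq_source_hpd = IRQ_SOURCE_INVALID_SKETCH;

    if (link->hpd_gpio != NULL) {
        gpio_open_sketch(link->hpd_gpio);
        gpio_unlock_sketch(link->hpd_gpio);
        link->irq_source_hpd = link->hpd_gpio->irq_source;
    }
}
```

Calling `link_setup_hpd_sketch(&link, NULL)` leaves the IRQ source at the invalid marker instead of faulting, which mirrors the behaviour seen after applying the fix.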

Once again, thank you all very much.

Przemek.






BUG - unable to handle null pointer, bisected - drm/amd/display: add gpio lock/unlock

2019-02-06 Thread Przemek Socha
Good morning,

on my Lenovo G50-45 (A6-6310 APU with R4 Mullins), commit 
e261568f94d6c37ebb94d3c4b3f8a3085375dd9d causes a kernel Oops (unable to 
handle NULL pointer).
I cross-checked by reverting the troublesome commit; without it the machine 
works fine.

Here is a part of the Oops message from pstore:


<1>[   13.200310] BUG: unable to handle kernel NULL pointer dereference at 
0008
<1>[   13.200323] #PF error: [normal kernel read fault]
<6>[   13.200328] PGD 0 P4D 0 
<4>[   13.200335] Oops:  [#1] PREEMPT SMP
<4>[   13.200342] CPU: 2 PID: 2961 Comm: udevd Not tainted 5.0.0-rc1+ #47
<4>[   13.200347] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 
08/04/2016
<4>[   13.200450] RIP: 0010:dal_gpio_open_ex+0x0/0x30 [amdgpu]
<4>[   13.200456] Code: d6 48 89 de 48 89 ef e8 6e f8 ff ff 84 c0 74 c7 48 89 
e8 
5b 5d c3 0f 0b 31 ed 5b 48 89 e8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 <48> 83 
7f 08 00 74 08 0f 0b b8 05 00 00 00 c3 89 77 18 8b 57 14 4c
<4>[   13.200466] RSP: 0018:b78e82bb7650 EFLAGS: 00010282
<4>[   13.200471] RAX:  RBX: b78e82bb76a4 RCX: 

<4>[   13.200476] RDX: 0006 RSI: 0004 RDI: 

<4>[   13.200480] RBP: a1d695e93300 R08: 0003 R09: 
a1d692456600
<4>[   13.200485] R10: f7dc88574dc0 R11: b78e82bb75b8 R12: 
a1d695c68700
<4>[   13.200490] R13: c07ef5a0 R14: b78e82bb79b8 R15: 
a1d692456600
<4>[   13.200495] FS:  7f9c3fcac300() GS:a1d697b0() knlGS:

<4>[   13.200501] CS:  0010 DS:  ES:  CR0: 80050033
<4>[   13.200506] CR2: 0008 CR3: 0002124a CR4: 
000406e0
<4>[   13.200510] Call Trace:
<4>[   13.200605]  construct+0x15f/0x710 [amdgpu]
<4>[   13.200710]  link_create+0x2e/0x48 [amdgpu]
<4>[   13.200803]  dc_create+0x2c0/0x5f0 [amdgpu]
<4>[   13.200899]  dm_hw_init+0xe0/0x150 [amdgpu]
<4>[   13.200990]  amdgpu_device_init.cold.38+0xe06/0xf67 [amdgpu]
<4>[   13.201002]  ? kmalloc_order+0x13/0x38
<4>[   13.201102]  amdgpu_driver_load_kms+0x60/0x210 [amdgpu]
<4>[   13.201112]  drm_dev_register+0x10e/0x150
<4>[   13.201207]  amdgpu_pci_probe+0xb8/0x118 [amdgpu]
<4>[   13.201217]  ? _raw_spin_unlock_irqrestore+0xf/0x28
<4>[   13.201226]  pci_device_probe+0xd1/0x158
<4>[   13.201234]  really_probe+0xee/0x2a0
<4>[   13.201241]  driver_probe_device+0x4a/0xb0
<4>[   13.201247]  __driver_attach+0xaf/0xc8
<4>[   13.201253]  ? driver_probe_device+0xb0/0xb0
<4>[   13.201258]  bus_for_each_dev+0x6f/0xb8
<4>[   13.201265]  bus_add_driver+0x197/0x1d8
<4>[   13.201271]  ? 0xc0933000
<4>[   13.201276]  driver_register+0x66/0xa8
<4>[   13.201281]  ? 0xc0933000
<4>[   13.201287]  do_one_initcall+0x41/0x1e2
<4>[   13.201294]  ? wake_up_page_bit+0x21/0x100
<4>[   13.201301]  ? kmem_cache_alloc_trace+0x2e/0x1a0
<4>[   13.201308]  ? do_init_module+0x1d/0x1e0
<4>[   13.201315]  do_init_module+0x55/0x1e0
<4>[   13.201321]  load_module+0x205c/0x2488
<4>[   13.201329]  ? vfs_read+0x10e/0x138
<4>[   13.201337]  ? __do_sys_finit_module+0xba/0xd8
<4>[   13.201342]  __do_sys_finit_module+0xba/0xd8
<4>[   13.201350]  do_syscall_64+0x50/0x168
<4>[   13.201357]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[   13.201364] RIP: 0033:0x7f9c3fdcf409
<4>[   13.201371] Code: 18 c3 e8 3a 98 01 00 66 2e 0f 1f 84 00 00 00 00 00 48 
89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 
3d 01 f0 ff ff 73 01 c3 48 8b 0d 47 6a 0c 00 f7 d8 64 89 01 48
<4>[   13.201381] RSP: 002b:7fff9b4824f8 EFLAGS: 0246 ORIG_RAX: 
0139
<4>[   13.201389] RAX: ffda RBX: 559d56fe1780 RCX: 
7f9c3fdcf409
<4>[   13.201394] RDX:  RSI: 559d570385c0 RDI: 
000e
<4>[   13.201399] RBP:  R08:  R09: 
7fff9b482610
<4>[   13.201404] R10: 000e R11: 0246 R12: 
559d56ff2120
<4>[   13.201409] R13: 0002 R14: 559d570385c0 R15: 
559d56fe1780
<4>[   13.201416] Modules linked in: kvm_amd kvm ath9k irqbypass crc32_pclmul 
ghash_clmulni_intel serio_raw ath9k_common ath9k_hw sdhci_pci cqhci sdhci 
amdgpu(+) mmc_core mac80211 ath mfd_core chash cfg80211 gpu_sched ttm xhci_pci 
ehci_pci xhci_hcd ehci_hcd sp5100_tco
<4>[   13.201448] CR2: 0008
<4>[   13.206222] ---[ end trace 2244da3024c5ad93 ]---


Here is a full git bisect log on amd-staging-drm-next branch synced today:

git bisect start
# good: [e1be4cb583800db36ed7f6303f7a8c205be24ceb] drm/amd/display: Use memset 
to initialize variables in fill_plane_dcc_attributes
git bisect good e1be4cb583800db36ed7f6303f7a8c205be24ceb
# bad: [25fa5507b06b8cfbec6db7933615ae603516bb7b] drm/amd/display: Disconnect 
mpcc when changing tg
git bisect bad 25fa5507b06b8cfbec6db7933615ae603516bb7b
# good: [e7b4cc9edcbe9c07e5bae2dbdebb04b054e3ff5b] drm/amd/display: Remove 
FreeSync timing changed debug output
git bisect good 

Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.

2019-01-31 Thread Przemek Socha
On Thursday, 31 January 2019 17:56:32 CET you wrote:

> In my experience only the last chunk of the patch is necessary.  Can you 
> try this without:
> 
> 
>  >> + vm->bulk_moveable = false;
> 
> 
> Too?
> 
> Thanks,
> Tom

Sure.

I have applied only the last chunk of the patch on top of today's 
amd-staging-drm-next pull:

> >> @@ -2772,6 +2773,9 @@  void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
> >>    struct amdgpu_vm_bo_base **base;
> >>
> >>    if (bo) {
> >> +          if (bo->tbo.resv == vm->root.base.bo->tbo.resv)
> >> +                  vm->bulk_moveable = false;
> >> +
> >>            for (base = &bo_va->base.bo->vm_bo; *base;
> >>                 base = &(*base)->next) {
> >>                    if (*base != &bo_va->base)

and it seems to be working as expected also. 

Thanks,
Przemek.



Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.

2019-01-31 Thread Przemek Socha
On Wednesday, 30 January 2019 13:42:33 CET you wrote:
> Does the attached patch fix the issue?
> 
> Christian.

I have tested this one also - "drm/amdgpu: partial revert cleanup setting 
bulk_movable v2"

>We still need to set bulk_movable to false when new BOs are added or removed.
>
>v2: also set it to false on removal
>
>Signed-off-by: Christian König 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>index 79f9dde70bc0..822546a149fa 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>@@ -332,6 +332,7 @@  static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>   if (bo->tbo.resv != vm->root.base.bo->tbo.resv)
>   return;
> 
>+  vm->bulk_moveable = false;
>   if (bo->tbo.type == ttm_bo_type_kernel)
>   amdgpu_vm_bo_relocated(base);
>   else
>@@ -2772,6 +2773,9 @@  void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
>   struct amdgpu_vm_bo_base **base;
> 
>   if (bo) {
>+  if (bo->tbo.resv == vm->root.base.bo->tbo.resv)
>+  vm->bulk_moveable = false;
>+
>   for (base = &bo_va->base.bo->vm_bo; *base;
>    base = &(*base)->next) {
>   if (*base != &bo_va->base)

and so far I have no lockup and Oops, so I think this one is ok.
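
The idea in the patch can be sketched outside the kernel: `bulk_moveable` behaves like a cached "still valid" flag, and it has to be cleared whenever a BO that shares the root reservation object is added to or removed from the VM. The types and helpers below (`vm_sketch`, `bo_sketch`, `vm_bo_add_sketch`, and so on) are hypothetical simplifications for illustration, not the amdgpu structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real amdgpu/ttm structures. */
struct resv_sketch { int id; };
struct bo_sketch { struct resv_sketch *resv; };
struct vm_sketch {
    struct resv_sketch *root_resv; /* reservation object of the root BO */
    bool bulk_moveable;            /* cached "bulk move state is valid" flag */
};

/* Only BOs sharing the root reservation object affect the cached state. */
static bool shares_root_resv(const struct vm_sketch *vm,
                             const struct bo_sketch *bo)
{
    return bo != NULL && bo->resv == vm->root_resv;
}

/* Adding such a BO invalidates the cache (first hunk of the patch). */
static void vm_bo_add_sketch(struct vm_sketch *vm, const struct bo_sketch *bo)
{
    assert(vm != NULL);
    if (shares_root_resv(vm, bo))
        vm->bulk_moveable = false;
}

/* Removing such a BO invalidates it too (the "v2" hunk of the patch). */
static void vm_bo_remove_sketch(struct vm_sketch *vm, const struct bo_sketch *bo)
{
    assert(vm != NULL);
    if (shares_root_resv(vm, bo))
        vm->bulk_moveable = false;
}

/* Stand-in for whatever later rebuilds the bulk state and revalidates it. */
static void vm_rebuild_bulk_sketch(struct vm_sketch *vm)
{
    vm->bulk_moveable = true;
}
```

If the flag were left set after a removal, later code would keep trusting stale state, which is the kind of inconsistency the removal hunk guards against.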

Thank you very much,
Przemek.



Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.

2019-01-30 Thread Przemek Socha
On Wednesday, 30 January 2019 13:42:33 CET you wrote:
> Does the attached patch fix the issue?
> 
> Christian.
> 
> .

Thanks for the rapid response, but unfortunately no. 
The system freezes and only the mouse pointer still moves (I cannot switch 
TTYs or reboot with the power button, and the three-finger salute does not 
work either).

Here is a trace log after applying the patch. I'm attaching it because it 
looks different:

<4>[   46.864336] [ cut here ]
<2>[   46.864343] kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:196!
<4>[   46.864361] invalid opcode:  [#1] PREEMPT SMP
<4>[   46.864369] CPU: 3 PID: 10966 Comm: plasmashel:cs0 Not tainted 5.0.0-
rc1+ #44
<4>[   46.864373] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 
08/04/2016
<4>[   46.864388] RIP: 0010:ttm_bo_ref_bug+0x0/0x8 [ttm]
<4>[   46.864393] Code: 00 00 08 00 75 0c 48 83 c7 0c 4c 39 cf 75 ab 31 c0 c3 
b8 01 00 00 00 c3 66 90 f0 ff 8f a4 00 00 00 c3 0f 1f 84 00 00 00 00 00 <0f> 0b 
66 0f 1f 44 00 00 53 48 8b 07 48 89 fb 48 8b 40 18 48 8b 40
<4>[   46.864397] RSP: 0018:a86fc1263af8 EFLAGS: 00010247
<4>[   46.864403] RAX: 8c7b133a787c RBX: a86fc1263c48 RCX: 
8c7b0f7698f8
<4>[   46.864406] RDX: 8c7b133a78f8 RSI: 8c7b11aa2800 RDI: 
8c7b133a787c
<4>[   46.864410] RBP: 8c7ac16d1b38 R08: 8c7b1348d0f8 R09: 
a86fc12639b0
<4>[   46.864414] R10: cfd6c84d07c0 R11: 0003 R12: 
c0364c10
<4>[   46.864417] R13: a86fc1263be0 R14:  R15: 
a86fc1263c48
<4>[   46.864422] FS:  7f3e34019700() GS:8c7b17b8() knlGS:

<4>[   46.864426] CS:  0010 DS:  ES:  CR0: 80050033
<4>[   46.864430] CR2: 7fb1cb765000 CR3: 00021333e000 CR4: 
000406e0
<4>[   46.864433] Call Trace:
<4>[   46.864446]  ttm_bo_del_from_lru+0xab/0xc8 [ttm]
<4>[   46.864456]  ttm_eu_reserve_buffers+0x140/0x2c8 [ttm]
<4>[   46.864557]  amdgpu_cs_ioctl+0x4ee/0x1b08 [amdgpu]
<4>[   46.864575]  ? __switch_to_asm+0x40/0x70
<4>[   46.864668]  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
<4>[   46.864678]  drm_ioctl_kernel+0xa4/0xe8
<4>[   46.864686]  drm_ioctl+0x1db/0x358
<4>[   46.864767]  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
<4>[   46.864848]  amdgpu_drm_ioctl+0x44/0x78 [amdgpu]
<4>[   46.864859]  do_vfs_ioctl+0x9f/0x618
<4>[   46.864867]  ksys_ioctl+0x5b/0x88
<4>[   46.864874]  __x64_sys_ioctl+0x11/0x18
<4>[   46.864881]  do_syscall_64+0x50/0x168
<4>[   46.864888]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[   46.864895] RIP: 0033:0x7f3e4a939fa7
<4>[   46.864900] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 
8d 
dc 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 
01 f0 ff ff 73 01 c3 48 8b 0d a9 ae 0c 00 f7 d8 64 89 01 48
<4>[   46.864904] RSP: 002b:7f3e34018ab8 EFLAGS: 0246 ORIG_RAX: 
0010
<4>[   46.864909] RAX: ffda RBX: 7f3e34018c58 RCX: 
7f3e4a939fa7
<4>[   46.864913] RDX: 7f3e34018b40 RSI: c0186444 RDI: 
0010
<4>[   46.864916] RBP: 7f3e34018b40 R08: 7f3e34018c80 R09: 
7f3e34018c58
<4>[   46.864920] R10: 7f3e34018ca0 R11: 0246 R12: 
c0186444
<4>[   46.864923] R13: 0010 R14: 5e550d70 R15: 
0003
<4>[   46.864929] Modules linked in: rfcomm nf_tables ebtable_nat ip_set 
nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables overlay squashfs 
loop bnep ipv6 rtsx_usb_ms memstick rtsx_usb_sdmmc rtsx_usb ath3k btusb 
btintel bluetooth ecdh_generic uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_v4l2 videobuf2_common videodev media kvm_amd ath9k kvm ath9k_common 
irqbypass ath9k_hw crc32_pclmul mac80211 sdhci_pci cqhci sdhci 
ghash_clmulni_intel serio_raw mmc_core ath cfg80211 amdgpu mfd_core chash 
gpu_sched xhci_pci ttm ehci_pci xhci_hcd ehci_hcd sp5100_tco
<4>[   46.864981] ---[ end trace 7bdf1a5927cdc874 ]---

Thanks,
Przemek.





signature.asc
Description: This is a digitally signed message part.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.

2019-01-30 Thread Przemek Socha
Good morning,

after the last pull from the amd-staging-drm-next tree (29th of January) I get 
random Oopses on an A6-6310 APU with R4 (Mullins) graphics.

Here is the Oops part of the log taken from pstore:

<1>[   55.166270] BUG: unable to handle kernel NULL pointer dereference at 
0208
<1>[   55.166281] #PF error: [normal kernel read fault]
<6>[   55.166285] PGD 0 P4D 0 
<4>[   55.166293] Oops:  [#1] PREEMPT SMP
<4>[   55.166301] CPU: 3 PID: 11006 Comm: kwin_x11:cs0 Not tainted 5.0.0-rc1+ 
#44
<4>[   55.166305] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 
08/04/2016
<4>[   55.166320] RIP: 0010:ttm_bo_bulk_move_lru_tail+0xd3/0x188 [ttm]
<4>[   55.166326] Code: 00 4c 8b 0a 48 8b 81 a8 00 00 00 48 81 c1 a8 00 00 00 
49 89 02 4c 8b 92 b0 00 00 00 4c 89 50 08 44 89 c0 48 c1 e0 04 4c 01 c8 <4c> 
8b 90 08 02 00 00 4d 89 1a 4c 8b 90 08 02 00 00 4c 89 92 b0 00
<4>[   55.166330] RSP: 0018:a8bdc0f33b18 EFLAGS: 00010246
<4>[   55.166335] RAX:  RBX:  RCX: 
9cfa935778f8
<4>[   55.166339] RDX: 9cfa950c5050 RSI: 0070 RDI: 
9cfa93575dd0
<4>[   55.166342] RBP: 9cfa5d44d800 R08:  R09: 

<4>[   55.166346] R10: 9cfa8f7730f8 R11: 9cfa950c50f8 R12: 
9cfa93575dd0
<4>[   55.166350] R13: 9cfa93575800 R14: 0001 R15: 
c03adc10
<4>[   55.166355] FS:  7fb327fff700() GS:9cfa97b8() knlGS:

<4>[   55.166359] CS:  0010 DS:  ES:  CR0: 80050033
<4>[   55.166363] CR2: 0208 CR3: 0002150f CR4: 
000406e0
<4>[   55.166366] Call Trace:
<4>[   55.166477]  amdgpu_vm_move_to_lru_tail+0xe4/0x100 [amdgpu]
<4>[   55.166563]  amdgpu_cs_ioctl+0x14e7/0x1b08 [amdgpu]
<4>[   55.166586]  ? __switch_to_asm+0x40/0x70
<4>[   55.166689]  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
<4>[   55.166698]  drm_ioctl_kernel+0xa4/0xe8
<4>[   55.166707]  drm_ioctl+0x1db/0x358
<4>[   55.166805]  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
<4>[   55.166901]  amdgpu_drm_ioctl+0x44/0x78 [amdgpu]
<4>[   55.166931]  do_vfs_ioctl+0x9f/0x618
<4>[   55.166940]  ksys_ioctl+0x5b/0x88
<4>[   55.166947]  __x64_sys_ioctl+0x11/0x18
<4>[   55.166955]  do_syscall_64+0x50/0x168
<4>[   55.166963]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[   55.166969] RIP: 0033:0x7fb34b035fa7
<4>[   55.166974] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 
8d 
dc 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 
01 f0 ff ff 73 01 c3 48 8b 0d a9 ae 0c 00 f7 d8 64 89 01 48
<4>[   55.166978] RSP: 002b:7fb327ffea88 EFLAGS: 0246 ORIG_RAX: 
0010
<4>[   55.166984] RAX: ffda RBX: 7fb327ffec58 RCX: 
7fb34b035fa7
<4>[   55.166987] RDX: 7fb327ffeb10 RSI: c0186444 RDI: 
0010
<4>[   55.166991] RBP: 7fb327ffeb10 R08: 7fb327ffec80 R09: 
7fb327ffec58
<4>[   55.166995] R10: 7fb327ffeca0 R11: 0246 R12: 
c0186444
<4>[   55.166998] R13: 0010 R14: 55ecd2705dc0 R15: 
0003
<4>[   55.167004] Modules linked in: rfcomm nf_tables ebtable_nat ip_set 
nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables overlay squashfs 
loop bnep ipv6 rtsx_usb_ms memstick rtsx_usb_sdmmc rtsx_usb uvcvideo 
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev 
media ath3k btusb btintel bluetooth ecdh_generic ath9k ath9k_common kvm_amd 
ath9k_hw sdhci_pci kvm cqhci irqbypass mac80211 sdhci crc32_pclmul 
ghash_clmulni_intel ath serio_raw mmc_core cfg80211 amdgpu mfd_core chash 
gpu_sched xhci_pci ttm xhci_hcd ehci_pci ehci_hcd sp5100_tco
<4>[   55.167063] CR2: 0208
<4>[   55.167069] ---[ end trace bf1c4be089002236 ]---
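
A rough reading of the trace above, for anyone following along: the bytes marked with `<4c>` on the "Code:" line decode to a 64-bit load from RAX plus a 32-bit displacement, and that displacement matches the reported fault address 0208. The sketch below is only a hand decoder for this one instruction form (REX prefix, opcode 8b, ModRM with disp32); it assumes that the RAX field, which prints blank in this archived log, was zero.

```python
import struct

# Bytes at the faulting RIP, copied from the "Code:" line starting at <4c>.
insn = bytes.fromhex("4c 8b 90 08 02 00 00")

def decode_mov_disp32(insn):
    """Decode 'REX 8b ModRM disp32' (mov reg, [rm + disp32]) by hand.

    Returns (destination register number, base register number, displacement).
    Only this one encoding is handled; anything else raises AssertionError.
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    assert rex & 0xF0 == 0x40, "expected a REX prefix"
    assert opcode == 0x8B, "expected mov r64, r/m64"
    assert modrm >> 6 == 0b10, "expected a disp32 addressing form"
    assert modrm & 7 != 4, "SIB byte forms are not handled here"
    reg = ((modrm >> 3) & 7) | ((rex & 0x4) << 1)  # REX.R extends the reg field
    rm = (modrm & 7) | ((rex & 0x1) << 3)          # REX.B extends the rm field
    disp = struct.unpack("<i", insn[3:7])[0]       # little-endian signed disp32
    return reg, rm, disp

dest, base, disp = decode_mov_disp32(insn)
ASSUMED_RAX = 0  # assumption: the blank RAX field in the log was zero
fault_address = ASSUMED_RAX + disp
print(f"mov r{dest}, [reg{base} + {disp:#x}] -> fault at {fault_address:#x}")
```

Register number 0 is RAX and 10 is R10, so under that assumption the load is from a NULL base plus 0x208, which is consistent with the "NULL pointer dereference at 0208" line.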

I bisected it, and the bad commit seems to be "drm/amdgpu: cleanup setting 
bulk_movable". I hope this is relevant.

full git bisect log:

git bisect start
# good: [10117450735c7a7c0858095fb46a860e7037cb9a] drm/amd/display: add -msse2 
to prevent Clang from emitting libcalls to undefined SW FP routines
git bisect good 10117450735c7a7c0858095fb46a860e7037cb9a
# bad: [b9c6252b7f980e7e03c0bf659a251798b36a8094] Revert "drm/amd/display: add 
-msse2 to prevent Clang from emitting libcalls to undefined SW FP routines"
git bisect bad b9c6252b7f980e7e03c0bf659a251798b36a8094
# good: [1de29da5b7281c9a8427d84948bf3d77bc4b8d16] drm: disable uncached DMA 
optimization for ARM and arm64
git bisect good 1de29da5b7281c9a8427d84948bf3d77bc4b8d16
# good: [bbf48cae572b39c4df6023b01d6f8de66ef41b34] Revert "test patch for hpd 
dpms check"
git bisect good bbf48cae572b39c4df6023b01d6f8de66ef41b34
# good: [257b75d373c77d6792d0011f7379398ba60799ec] drm/amdgpu: Show XGMI node 
and hive message per device only once
git bisect good 257b75d373c77d6792d0011f7379398ba60799ec
# good: [4d771657c533d8fe3b574c561084f66aebc77bb6] drm/amdgpu: cleanup 
amdgpu_pte_update_params
git bisect good 4d771657c533d8fe3b574c561084f66aebc77bb6
# 

Re: BISECTED- amd-staging-drm-next, xorg-server segfault A6-6310 APU - R4 Mullins.

2019-01-10 Thread Przemek Socha
On Thursday, 10 January 2019 17:22:07 CET you wrote:
> On 2019-01-10 5:06 p.m., Przemek Socha wrote:
> > On Thursday, 10 January 2019 12:25:01 CET you wrote:
> >> On 2019-01-10 10:44 a.m., Przemek Socha wrote:
> >>> Hi,
> >>> after yesterday's fetch of amd-staging-drm-next tree from agd5f git repo
> >>> my
> >>> xorg server is segfaulting when starting up.
> >>> 
> >>> I am using gentoo ~amd64, xorg-server 1.20.3, xf86-video-amdgpu-18.1.0.
> >>> Machine is an old Lenovo g50-45 netbook with A6-6310 APU - R4 Mullins.
> >>> 
> >>> - excerpt from Xorg.log:
> >>> 
> >>> "[21.878] (II) AMDGPU(0): Setting screen physical size to 700 x 270
> >>> [21.880] (EE)
> >>> [21.880] (EE) Backtrace:
> >>> [21.880] (EE) 0: /usr/bin/X (xorg_backtrace+0x4d) [0x559df051f0bd]
> >>> [21.880] (EE) 1: /usr/bin/X (0x559df0376000+0x1acc89)
> >>> [0x559df0522c89]
> >>> [21.880] (EE) 2: /lib64/libpthread.so.0 (0x7f6f2edad000+0x14560)
> >>> [0x7f6f2edc1560]
> >>> [21.880] (EE) 3: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
> >>> (0x7f6f2f32b000+0x14fce) [0x7f6f2f33ffce]
> >>> [21.880] (EE) 4: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
> >>> (0x7f6f2f32b000+0xd1c4) [0x7f6f2f3381c4]
> >>> [21.880] (EE) 5: /usr/bin/X (0x559df0376000+0xdf024)
> >>> [0x559df0455024]
> >>> [21.881] (EE) 6: /usr/bin/X (InitRootWindow+0x11) [0x559df03f8761]
> >>> [21.881] (EE) 7: /usr/bin/X (0x559df0376000+0x5b574)
> >>> [0x559df03d1574]
> >>> [21.881] (EE) 8: /lib64/libc.so.6 (__libc_start_main+0xee)
> >>> [0x7f6f2ec054ce]
> >>> [21.881] (EE) 9: /usr/bin/X (_start+0x2a) [0x559df03bb00a]
> >>> [21.881] (EE)
> >>> [21.881] (EE) Segmentation fault at address 0x4
> >>> [21.881] (EE)
> >>> Fatal server error:
> >>> [21.881] (EE) Caught signal 11 (Segmentation fault). Server aborting
> >>> [21.881] (EE)
> >>> [21.881] (EE)
> >>> Please consult the The X.Org Foundation support
> >>> 
> >>>at http://wiki.x.org
> >>>  
> >>>  for help.
> >>> 
> >>> [21.881] (EE) Please also check the log file at
> >>> "/var/log/Xorg.0.log"
> >>> for additional information.
> >>> [21.881] (EE)
> >>> [21.881] (II) AIGLX: Suspending AIGLX clients for VT switch
> >>> [21.957] (EE) Server terminated with error (1). Closing log file."
> >>> 
> >>> 
> >>> I am not sure whether I messed anything up, but git bisect gives these
> >>> results:
> >>> 
> >>> [...]
> >>> 
> >>> 79c6b898011958fba7722528d567b64e1cdc8dbe is the first bad commit
> >>> commit 79c6b898011958fba7722528d567b64e1cdc8dbe
> >>> Author: Yu Zhao 
> >>> Date:   Mon Jan 7 15:51:14 2019 -0700
> >>> 
> >>> drm/amdgpu: validate user pitch alignment
> >>> 
> >>> Userspace may request pitch alignment that is not supported by GPU.
> >>> Some requests 32, but GPU ignores it and uses default 64 when cpp is
> >>> 4. If GEM object is allocated based on the smaller alignment, GPU
> >>> DMA will go out of bound.
> >>> 
> >>> Cc: sta...@vger.kernel.org # v4.2+
> >>> Reviewed-by: Michel Dänzer 
> >>> Signed-off-by: Yu Zhao 
> >>> :
> >>> :04 04 5338964e9975e461ceedb27f6342c2896f54607a
> >>> 
> >>> ed2f04fc9b665b27b1905fd60b7d2a3933d1fdcc M  drivers
> >> 
> >> Thanks for tracking this down. It turns out the check added by this
> >> change is too strict for linear framebuffers. I've sent a patch
> >> reverting it for review: https://patchwork.freedesktop.org/patch/276122/
> >> 
> >> Sorry I didn't realize this issue when reviewing this change.
> > 
> > Thanks for the swift response and sorry about the delay on my side.
> 
> No worries.
> 
> > Unfortunately, applying this patch alone does not help in my case (though
> > it may be necessary after all).
> > 
> > To start xorg-server I had to apply your patch and, on top of that,
> > revert "drm/amdgpu: validate user pitch alignment" -
> > 79c6b898011958fba7722528d567b64e1cdc8dbe.
> > 
> > Now x

Re: BISECTED- amd-staging-drm-next, xorg-server segfault A6-6310 APU - R4 Mullins.

2019-01-10 Thread Przemek Socha
On Thursday, 10 January 2019 12:25:01 CET you wrote:
> On 2019-01-10 10:44 a.m., Przemek Socha wrote:
> > Hi,
> > after yesterday's fetch of amd-staging-drm-next tree from agd5f git repo
> > my
> > xorg server is segfaulting when starting up.
> > 
> > I am using gentoo ~amd64, xorg-server 1.20.3, xf86-video-amdgpu-18.1.0.
> > Machine is an old Lenovo g50-45 netbook with A6-6310 APU - R4 Mullins.
> > 
> > - excerpt from Xorg.log:
> > 
> > "[21.878] (II) AMDGPU(0): Setting screen physical size to 700 x 270
> > [21.880] (EE)
> > [21.880] (EE) Backtrace:
> > [21.880] (EE) 0: /usr/bin/X (xorg_backtrace+0x4d) [0x559df051f0bd]
> > [21.880] (EE) 1: /usr/bin/X (0x559df0376000+0x1acc89) [0x559df0522c89]
> > [21.880] (EE) 2: /lib64/libpthread.so.0 (0x7f6f2edad000+0x14560)
> > [0x7f6f2edc1560]
> > [21.880] (EE) 3: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
> > (0x7f6f2f32b000+0x14fce) [0x7f6f2f33ffce]
> > [21.880] (EE) 4: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
> > (0x7f6f2f32b000+0xd1c4) [0x7f6f2f3381c4]
> > [21.880] (EE) 5: /usr/bin/X (0x559df0376000+0xdf024) [0x559df0455024]
> > [21.881] (EE) 6: /usr/bin/X (InitRootWindow+0x11) [0x559df03f8761]
> > [21.881] (EE) 7: /usr/bin/X (0x559df0376000+0x5b574) [0x559df03d1574]
> > [21.881] (EE) 8: /lib64/libc.so.6 (__libc_start_main+0xee)
> > [0x7f6f2ec054ce]
> > [21.881] (EE) 9: /usr/bin/X (_start+0x2a) [0x559df03bb00a]
> > [21.881] (EE)
> > [21.881] (EE) Segmentation fault at address 0x4
> > [21.881] (EE)
> > Fatal server error:
> > [21.881] (EE) Caught signal 11 (Segmentation fault). Server aborting
> > [21.881] (EE)
> > [21.881] (EE)
> > Please consult the The X.Org Foundation support
> > 
> >  at http://wiki.x.org
> >  
> >  for help.
> > 
> > [21.881] (EE) Please also check the log file at "/var/log/Xorg.0.log"
> > for additional information.
> > [21.881] (EE)
> > [21.881] (II) AIGLX: Suspending AIGLX clients for VT switch
> > [21.957] (EE) Server terminated with error (1). Closing log file."
> > 
> > 
> > I am not sure whether I messed anything up, but git bisect gives these
> > results:
> > 
> > [...]
> > 
> > 79c6b898011958fba7722528d567b64e1cdc8dbe is the first bad commit
> > commit 79c6b898011958fba7722528d567b64e1cdc8dbe
> > Author: Yu Zhao 
> > Date:   Mon Jan 7 15:51:14 2019 -0700
> > 
> > drm/amdgpu: validate user pitch alignment
> > 
> > Userspace may request pitch alignment that is not supported by GPU.
> > Some requests 32, but GPU ignores it and uses default 64 when cpp is
> > 4. If GEM object is allocated based on the smaller alignment, GPU
> > DMA will go out of bound.
> > 
> > Cc: sta...@vger.kernel.org # v4.2+
> > Reviewed-by: Michel Dänzer 
> > Signed-off-by: Yu Zhao 
> > :
> > :04 04 5338964e9975e461ceedb27f6342c2896f54607a
> > 
> > ed2f04fc9b665b27b1905fd60b7d2a3933d1fdcc M  drivers
> 
> Thanks for tracking this down. It turns out the check added by this
> change is too strict for linear framebuffers. I've sent a patch
> reverting it for review: https://patchwork.freedesktop.org/patch/276122/
> 
> Sorry I didn't realize this issue when reviewing this change.

Thanks for the swift response and sorry about the delay on my side.

Unfortunately, applying this patch alone does not help in my case (though it 
may be necessary after all). 

To start xorg-server I had to apply your patch and, on top of that, revert 
"drm/amdgpu: validate user pitch alignment" - 
79c6b898011958fba7722528d567b64e1cdc8dbe.

Now the xorg server starts, but I have doubts about system stability, because 
the description of the reverted commit says that "if GEM object is allocated 
based on the smaller alignment, GPU DMA will go out of bound". 
Sorry about being a layman on this.
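
To make the quoted concern concrete, here is a minimal sketch of the arithmetic, not the driver's actual code. It assumes the 32 and 64 in the commit message are alignments in pixels, and the 1366x768 framebuffer is a hypothetical example rather than a value taken from this thread.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def buffer_bytes(width_px, height_px, cpp, align_px):
    """Bytes needed when each row is padded to a multiple of align_px pixels."""
    pitch_px = align_up(width_px, align_px)
    return pitch_px * cpp * height_px

# Hypothetical framebuffer; cpp = 4 bytes per pixel as in the commit message.
WIDTH, HEIGHT, CPP = 1366, 768, 4

allocated = buffer_bytes(WIDTH, HEIGHT, CPP, 32)    # what userspace sized for
used_by_gpu = buffer_bytes(WIDTH, HEIGHT, CPP, 64)  # what the GPU would touch
overrun = used_by_gpu - allocated

print(f"allocated:   {allocated} bytes")
print(f"used by GPU: {used_by_gpu} bytes")
print(f"overrun:     {overrun} bytes")
```

Under these assumptions the GPU would address 98304 bytes more than the object holds. Widths that are already a multiple of 64 pixels give no overrun, so whether a given setup is affected depends on the mode in use.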

Thanks,
Przemek.



BISECTED- amd-staging-drm-next, xorg-server segfault A6-6310 APU - R4 Mullins.

2019-01-10 Thread Przemek Socha
Hi,
after yesterday's fetch of amd-staging-drm-next tree from agd5f git repo my 
xorg server is segfaulting when starting up.

I am using gentoo ~amd64, xorg-server 1.20.3, xf86-video-amdgpu-18.1.0. 
Machine is an old Lenovo g50-45 netbook with A6-6310 APU - R4 Mullins.

- excerpt from Xorg.log:

"[21.878] (II) AMDGPU(0): Setting screen physical size to 700 x 270
[21.880] (EE) 
[21.880] (EE) Backtrace:
[21.880] (EE) 0: /usr/bin/X (xorg_backtrace+0x4d) [0x559df051f0bd]
[21.880] (EE) 1: /usr/bin/X (0x559df0376000+0x1acc89) [0x559df0522c89]
[21.880] (EE) 2: /lib64/libpthread.so.0 (0x7f6f2edad000+0x14560) 
[0x7f6f2edc1560]
[21.880] (EE) 3: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so 
(0x7f6f2f32b000+0x14fce) [0x7f6f2f33ffce]
[21.880] (EE) 4: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so 
(0x7f6f2f32b000+0xd1c4) [0x7f6f2f3381c4]
[21.880] (EE) 5: /usr/bin/X (0x559df0376000+0xdf024) [0x559df0455024]
[21.881] (EE) 6: /usr/bin/X (InitRootWindow+0x11) [0x559df03f8761]
[21.881] (EE) 7: /usr/bin/X (0x559df0376000+0x5b574) [0x559df03d1574]
[21.881] (EE) 8: /lib64/libc.so.6 (__libc_start_main+0xee) 
[0x7f6f2ec054ce]
[21.881] (EE) 9: /usr/bin/X (_start+0x2a) [0x559df03bb00a]
[21.881] (EE) 
[21.881] (EE) Segmentation fault at address 0x4
[21.881] (EE) 
Fatal server error:
[21.881] (EE) Caught signal 11 (Segmentation fault). Server aborting
[21.881] (EE) 
[21.881] (EE) 
Please consult the The X.Org Foundation support 
 at http://wiki.x.org
 for help. 
[21.881] (EE) Please also check the log file at "/var/log/Xorg.0.log" for 
additional information.
[21.881] (EE) 
[21.881] (II) AIGLX: Suspending AIGLX clients for VT switch
[21.957] (EE) Server terminated with error (1). Closing log file."
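
The frames without symbol names are printed as (load base + offset) [absolute address]. A minimal sketch of pulling out the per-module offsets and checking that base plus offset matches the bracketed address; the frame strings are copied from the excerpt above with the mail line wraps joined. Offsets like these are what one would typically hand to a symbolizer such as addr2line, assuming debug symbols for the module are available.

```python
import re

# path (base+offset) [absolute], as printed for frames without a symbol name
FRAME = re.compile(r"(\S+) \((0x[0-9a-f]+)\+(0x[0-9a-f]+)\) \[(0x[0-9a-f]+)\]")

frames = [
    "/usr/lib64/xorg/modules/drivers/amdgpu_drv.so "
    "(0x7f6f2f32b000+0x14fce) [0x7f6f2f33ffce]",
    "/usr/lib64/xorg/modules/drivers/amdgpu_drv.so "
    "(0x7f6f2f32b000+0xd1c4) [0x7f6f2f3381c4]",
    "/usr/bin/X (0x559df0376000+0xdf024) [0x559df0455024]",
]

def module_offsets(frames):
    """Return (module path, offset, consistent) for each parsable frame."""
    result = []
    for frame in frames:
        match = FRAME.search(frame)
        if match is None:
            continue  # frames with a symbol name use a different layout
        path = match.group(1)
        base, offset, absolute = (int(match.group(i), 16) for i in (2, 3, 4))
        result.append((path, offset, base + offset == absolute))
    return result

for path, offset, ok in module_offsets(frames):
    print(f"{path} +{offset:#x} consistent={ok}")
```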


I am not sure whether I messed anything up, but git bisect gives these results:

git bisect log
git bisect start
# good: [d9c54d61df327dc93374b718d7941a09e02e32e1] drm/amdgpu: Add new VegaM 
pci id
git bisect good d9c54d61df327dc93374b718d7941a09e02e32e1
# bad: [d2d07f246b126b23d02af0603b83866a3c3e2483] drm/amdgpu/si: add missing 
header to fix compilation
git bisect bad d2d07f246b126b23d02af0603b83866a3c3e2483
# good: [abc0add47f449a02f8b784d43f4578723fbd6ac9] drm/amdgpu: Add message 
print when unable to get valid hive
git bisect good abc0add47f449a02f8b784d43f4578723fbd6ac9
# bad: [0b0ff923cb4279ee12913dd2b597146538b76a8b] drm/amdgpu: set 
WRITE_BURST_LENGTH to 64B to workaround SDMA1 hang
git bisect bad 0b0ff923cb4279ee12913dd2b597146538b76a8b
# good: [320a0512c8e42fde01be5c47f33fef07d264c6d4] drm/amd/powerplay: create 
pp_od_clk_voltage device file under OD support
git bisect good 320a0512c8e42fde01be5c47f33fef07d264c6d4
# bad: [79c6b898011958fba7722528d567b64e1cdc8dbe] drm/amdgpu: validate user 
pitch alignment
git bisect bad 79c6b898011958fba7722528d567b64e1cdc8dbe
# good: [069d809a94db281efcb6a55f1c0e4a088af0a4cb] drm/amd/powerplay: drop the 
unnecessary uclk hard min setting
git bisect good 069d809a94db281efcb6a55f1c0e4a088af0a4cb
# first bad commit: [79c6b898011958fba7722528d567b64e1cdc8dbe] drm/amdgpu: 
validate user pitch alignment


79c6b898011958fba7722528d567b64e1cdc8dbe is the first bad commit
commit 79c6b898011958fba7722528d567b64e1cdc8dbe
Author: Yu Zhao 
Date:   Mon Jan 7 15:51:14 2019 -0700

drm/amdgpu: validate user pitch alignment

Userspace may request pitch alignment that is not supported by GPU.
Some requests 32, but GPU ignores it and uses default 64 when cpp is
4. If GEM object is allocated based on the smaller alignment, GPU
DMA will go out of bound.

Cc: sta...@vger.kernel.org # v4.2+
Reviewed-by: Michel Dänzer 
Signed-off-by: Yu Zhao 

:040000 040000 5338964e9975e461ceedb27f6342c2896f54607a ed2f04fc9b665b27b1905fd60b7d2a3933d1fdcc M	drivers

Any help is appreciated.

Thanks,
Przemek.

