[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-07-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

Michel Dänzer  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #82 from Michel Dänzer  ---
(In reply to Hadet from comment #81)
> Having some similar issues, specifically after closing games running in Wine.

Please file your own report. The reporter of this one marked it as resolved.

-- 
You are receiving this mail because:
You are the assignee for the bug.___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-07-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

Hadet  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|WORKSFORME  |---


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-07-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #81 from Hadet  ---
Created attachment 144797
  --> https://bugs.freedesktop.org/attachment.cgi?id=144797&action=edit
After AMDGPU crashes

Having some similar issues, specifically after closing games running in Wine.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-03-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

Allan  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #80 from Allan  ---
Closing this issue. Here is a summary for a quick look:

The problem: amdgpu hangs suddenly and nothing can kill it.

Solution: the driver became more stable over time.

The causes:
1 - The driver itself used to be less stable.
2 - The kernel didn't support Ryzen CPUs properly, leading to segfaults and
unexpected behavior. If this is your case, use one of the kernels already listed here.
3 - Corsair RAM is not a good match for Ryzen, especially modules without some
kind of "Ryzen ready" seal. Aiming for the best performance on Intel platforms
made them not support the JEDEC standards properly when using the SPD profile,
even if you try relaxing the latencies.
Thus, bad RAM -> unexpected behavior, including from the driver.

Additional information:
I was only able to test for a few days (a week or so) before the GPU showed BGA
problems.
It was working fine.

If I am ever able to test it again and find another scenario where the driver
hangs and can't be killed, I'll report it here.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-03-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #79 from Mauro Gaspari  ---
Please go ahead and close it; I will open a new one. No problem.

Cheers 
Mauro


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-03-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #78 from l...@protonmail.ch ---
I think this issue should be closed. I am not yet sure if my issue was the same
as yours (doesn't seem likely), but if it wasn't the same, I'll just open a new
one if needed. Likewise, those whose issues have not been resolved by this
should just open a new issue IMO.

 Original Message 
On 10 Mar 2019, 11:07, wrote:

> [Comment # 77](https://bugs.freedesktop.org/show_bug.cgi?id=105733#c77) on 
> [bug 105733](https://bugs.freedesktop.org/show_bug.cgi?id=105733) from 
> [Allan](mailto:allan4...@gmail.com)
>
> Also, I need instructions of what to do with the status of the bug.
>
> It worked for me, but there are some users discussing it yet.
>
> I'll wait for a response. Please cite me.
>
> ---
> You are receiving this mail because:
>
> - You are on the CC list for the bug.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-03-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #77 from Allan  ---
Also, I need instructions on what to do with the status of the bug.

It worked for me, but some users are still discussing it.

I'll wait for a response. Please mention me in your reply.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-03-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #76 from Allan  ---
To clarify which kernels to aim for if you are using Ryzen + amdgpu:

1 - drm-next-4.21-wip
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.21-wip
2 - drm-next-5.2-wip
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-5.2-wip
3 - amd-staging-drm-next
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

No generic kernel provided by Debian (as an example; most distros likely follow
a similar policy and so lead to the same result) is enough to handle the Ryzen
CPU properly yet.
I have tested up to 4.19.0-1-amd64 (Debian 4.19.12-1) from the Debian repos.

Some fixes were still waiting for their pull request to be accepted.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-03-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #75 from Allan  ---
Well, after a long time I'm back to tell you what happened:

A very nice AMD staff member was following up with me about the CPU, and that
ended up (apparently) solving the problems I had with the video card.


1. Regarding the kernel timing
(In reply to fin4478 from comment #52)
> To prevent random kernel lock ups with Ryzen, fix this with bios, set to
> Typical Current Idle  in the bios Advanced/AMD CBS menu.
> 
> Use latest AMD wip kernel and Oibaf ppa Mesa. Disable display composting and
> vsync with Xfce. Use 300Hz kernel timer.
> 
> Working kernel config file for my system as attachment.

Yes, I tried it a lot, believe me: every possible combination, 300 Hz, 250 Hz,
1000 Hz, your config, linux-firmware drivers. At least 10 attempts with
variations of your config, including a plain one that only enabled dm-crypt,
which is not enabled in yours.

2. Regarding the PSU profile
As already suggested by fin4478 and requested by AMD, I asked BIOSTAR for a
BIOS that allowed changing it. They sent me a beta version to test.

No luck at all; it was unrelated.

3. The madness
Nothing worked, but the CPU was already OK. The motherboard was already OK, yet
the video card was still hanging sometimes, now even on Windows.

So I took a shot in the dark, suspecting some nonsensical incompatibility with
the RAM.

And that was it. Even after sending it in for warranty, even after running 100+
tests, the RAM was the issue.

It was a Corsair Vengeance kit: 2x4GB DDR4 CL15, 2133MHz SPD (JEDEC), 3000MHz
XMP2.

Even at JEDEC specifications it caused the system to fail.

Even when I relaxed the latencies considerably, it still caused failures.

It was what caused the amdgpu driver to fail, along with any heavy
application. Since data passes through system RAM before being sent to VRAM, it
makes sense that the driver/device would end up processing something unexpected.

I warn everyone who uses Corsair memory, especially modules without their
"Ryzen ready" branding. Even though there's a standard called JEDEC, they
simply don't implement it very well.

It was the reason why sometimes I could use the system for 1-2 hours, and
sometimes not even 5 minutes before crashing. There is some kind of instability
there.

I sold it to a guy who uses an 8700K or something; I explained the situation
and he agreed. So far (more than 2 months) there has not been a single issue
related to the memory chips. They must have done something to optimize for
Intel beyond the XMP profile and compromised the entire design, along with 1
year of my life and a bunch of money.

But the fixes to amdgpu over time did prove useful, so it was not only the
RAM's fault: using the same RAM chips, I had far fewer problems than when I
first reported this bug.

Now I'm using a G.Skill Trident Z 3200MHz kit @ 2666MHz, the speed AMD
guarantees the 1800X works with. Stable, without a single problem related to
it.

4. As if to confirm that I had won the lottery for a non-working system, my
RX480 died a month ago, probably because of a BGA problem.

Then I found a label on the card, looked it up, and discovered that a seller
had sold me a refurbished product as new.

So I'm deciding whether to sue him or just fix the card.

I mention this because it is why I can't test again until I get another AMD
card. In the meantime I'm using the NVIDIA card that I couldn't sell.

5. The funny part.

The NVIDIA driver, which seemed a lot more stable at first, started failing
like hell after I replaced the truly problematic CPU.

And the amdgpu driver became more stable, more than any other driver on Linux
or Windows.


Well, I think that this is it. I'll return when I'm able to test amdgpu again.

But the verdict for now is:

I tested the RX480 with amdgpu without a single problem. Not intensively, just
common tests, and I played a little Left 4 Dead 2 without any issue (a good
sign, since it always used to crash).

The card showed the BGA problem while I was using a variant of the Adrenalin
driver on Windows, doing some checks requested by AMD.

Cheers, everyone.
Prefer G.Skill over Corsair.
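Section 1 of this comment refers to fin4478's advice to use a 300 Hz kernel timer. That is the kernel's CONFIG_HZ build option (selecting CONFIG_HZ_300=y yields CONFIG_HZ=300). To check what a running kernel was built with, the Python sketch below reads CONFIG_HZ from the kernel config. It assumes the distribution installs that config as /boot/config-<release>, as Debian and Ubuntu do; it is an illustrative helper, not something posted in the bug.

```python
import platform
import re
from pathlib import Path
from typing import Optional


def kernel_timer_hz(config_text: str) -> Optional[int]:
    """Return the numeric CONFIG_HZ value from kernel .config text, or None."""
    # Only the plain "CONFIG_HZ=<n>" line matches; "CONFIG_HZ_300=y" does not.
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def running_kernel_config() -> Optional[str]:
    """Read /boot/config-<release> for the running kernel, if the distro ships it."""
    path = Path("/boot") / f"config-{platform.release()}"
    return path.read_text() if path.exists() else None


if __name__ == "__main__":
    text = running_kernel_config()
    if text is None:
        print("No /boot/config for the running kernel (try /proc/config.gz)")
    else:
        print("CONFIG_HZ =", kernel_timer_hz(text))
```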


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-03-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #74 from Mauro Gaspari  ---
Quick update.
After the latest updates on my Kubuntu 18.10 with the Padoka unstable PPA, I am
noticing great improvements. DXVK performance with DX11 is greatly improved with
LLVM 9.0.0, Mesa 19.0.1-devel seems stable, and so far I have had no freezes.

I am currently using: Kubuntu 18.10 with Mesa 19.1.0-devel - padoka PPA,
DRM3.26.0, 4.18.0-16-generic, LLVM 9.0.0
This is the PPA being used:
https://launchpad.net/~paulo-miguel-dias/+archive/ubuntu/mesa/

Further explanation and installation guides included here:
https://github.com/lutris/lutris/wiki/Installing-drivers

I hope this helps.
Cheers
Mauro


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #73 from Mauro Gaspari  ---
This problem affects me as well. It has for quite some time. 
My setup: 
CPU AMD Ryzen 7 2700X
RAM 64GB DDR4 3200
GPU AMD Vega RX 64

Since this issue has plagued me for quite a while, I even tried installing
Windows 10, and I can confirm there are no issues at all there. That said, AMD
drivers were quite bad at the Vega launch on Windows too.

In my experience the bug comes and goes with the Mesa version in use, or with
the combination of kernel and Mesa. I can reproduce the issue easily by playing
some games. Some extra tests I ran to make sure it was not a hardware or game
issue:
- Same games work fine on Windows on the same hardware, same BIOS settings, etc.
- Same games work fine on my Nvidia+Intel based laptop, running the same Linux
distributions and kernels.

For example, Kubuntu 18.04.1 with the open-source AMDGPU drivers was fine for
me, without the bug, for a very long time. Then, a couple of weeks ago, a Mesa
update came and I started having the freeze again.
I upgraded to 18.10 and still had the freeze. I added the oibaf PPA, and the
issue was gone. After a few weeks an update came and the issue started
happening again. I am now using the padoka PPA but am still having the freeze.
The same problem also happens for me on openSUSE Tumbleweed and Arch on the
same machine.

I tried disabling the compositor, disabling vsync, changing the compositor on my
KDE Plasma, and running games in windowed vs. full-screen mode. Nothing helped.

Also please note that before upgrading my CPU and Motherboard, I was running
Vega RX64 on an Intel CPU, and I had the same issues.

Below is some info I saved a while back when running openSUSE Tumbleweed. If
needed, I can grab more recent logs and system info and post them.
I am also going to try installing Kubuntu 18.04.1 with the AMDGPU-PRO
proprietary drivers to see if there is any difference.


---First time I noticed the issue:

OS: OpenSUSE tumbleweed x86_64 updated (2018 04 21)
Kernel: 4.16.2-1-default
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 3.0 Mesa 18.0.0
GPU: AMD Radeon RX Vega 64 8GB

Apr 21 17:08:34 STUDIO kernel: [drm:gfx_v9_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
Apr 21 17:08:34 STUDIO kernel: [drm] No hardware hang detected. Did some blocks stall?
Apr 21 17:08:44 STUDIO kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=128859, last emitted seq=128861
Apr 21 17:08:44 STUDIO kernel: [drm] No hardware hang detected. Did some blocks stall?
-- Reboot --
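The amdgpu_job_timedout message above reports a ring (here gfx) whose last emitted fence sequence number is ahead of the last signaled one, i.e. work was submitted that the GPU never completed. To pull these out of a saved log (for example the output of `journalctl -k -b -1`, which shows the previous boot's kernel messages when the journal is persistent), a small Python sketch like the following could help. It assumes one kernel message per line and matches only the two wordings of the message seen in this thread; the script name in the usage comment is hypothetical.

```python
import re
import sys
from typing import Iterable, List, NamedTuple


class RingTimeout(NamedTuple):
    ring: str      # ring name, e.g. "gfx"
    signaled: int  # last fence sequence number the GPU signaled (completed)
    emitted: int   # last fence sequence number the driver emitted (submitted)


# Assumed to cover both wordings quoted in this bug:
#   "ring gfx timeout, last signaled seq=128859, last emitted seq=128861"
#   "ring gfx timeout, signaled seq=274140, emitted seq=274142"
TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, (?:last )?signaled seq=(\d+), (?:last )?emitted seq=(\d+)"
)


def find_ring_timeouts(lines: Iterable[str]) -> List[RingTimeout]:
    """Return every ring timeout found, expecting one kernel message per line."""
    found = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            found.append(RingTimeout(m.group(1), int(m.group(2)), int(m.group(3))))
    return found


if __name__ == "__main__":
    # Hypothetical usage: python3 ring_timeouts.py prev-boot.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for t in find_ring_timeouts(log):
                pending = t.emitted - t.signaled
                print(f"ring {t.ring}: {pending} job(s) emitted but never signaled")
```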


dmesg lines related to amdgpu:

[3.407020] [drm] amdgpu kernel modesetting enabled.
[3.411462] fb: switching to amdgpudrmfb from VESA VGA
[3.426163] amdgpu :04:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x
[3.426261] amdgpu :04:00.0: VRAM: 8176M 0x00F4 - 0x00F5FEFF (8176M used)
[3.426263] amdgpu :04:00.0: GTT: 256M 0x00F6 - 0x00F60FFF
[3.426371] [drm] amdgpu: 8176M of VRAM memory ready
[3.426372] [drm] amdgpu: 8176M of GTT memory ready.
[4.031665] fbcon: amdgpudrmfb (fb0) is primary device
[4.083803] amdgpu :04:00.0: fb0: amdgpudrmfb frame buffer device
[4.096086] amdgpu :04:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[4.096088] amdgpu :04:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[4.096089] amdgpu :04:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[4.096090] amdgpu :04:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[4.096091] amdgpu :04:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[4.096093] amdgpu :04:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[4.096094] amdgpu :04:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[4.096095] amdgpu :04:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[4.096096] amdgpu :04:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[4.096098] amdgpu :04:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[4.096099] amdgpu :04:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[4.096100] amdgpu :04:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
[4.096101] amdgpu :04:00.0: ring 12(uvd) uses VM inv eng 6 on hub 1
[4.096103] amdgpu :04:00.0: ring 13(uvd_enc0) uses VM inv eng 7 on hub 1
[4.096104] amdgpu :04:00.0: ring 14(uvd_enc1) uses VM inv eng 8 on hub 1
[4.096105] amdgpu :04:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
[4.096107] amdgpu :04:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
[4.096108] amdgpu :04:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
[4.096662] [drm] Initialized amdgpu 3.23.0 20150101 for :04:00.0 on minor 0


---It was identified as this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=105317
After I upgraded Tumbleweed to Mesa 18.0.1, the issue was gone.


--- Later on I had the same bug again.
OS: OpenSUSE tumblewe

[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #72 from castor_fou  ---
I tried the suggestion from comment 64: ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2

After 2 days without any hang, I've just had one.
I am desperate about this problem; it has only happened since the upgrade to
18.04. I had no issues for 4 years with 16.04 and earlier versions.

My mitigation is a cron job that restarts display-manager twice a day. What a
pitiful solution.
0 12,19 * * * /bin/systemctl restart display-manager
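Before judging whether a parameter such as the ivrs_ioapic overrides from comment 64 helps, it is worth confirming that it actually reached the running kernel: /proc/cmdline holds the command line the kernel was booted with. A minimal Python sketch that parses it; splitting on whitespace is a simplifying assumption that ignores quoted values containing spaces, which is fine for parameters like these.

```python
from pathlib import Path
from typing import Dict, List


def parse_cmdline(cmdline: str) -> Dict[str, List[str]]:
    """Map each kernel parameter name to every value given for it.

    Bare flags such as "quiet" map to [""]. Only the first "=" separates
    name from value, so "root=UUID=..." keeps "UUID=..." as its value.
    """
    params: Dict[str, List[str]] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params.setdefault(key, []).append(value)
    return params


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        params = parse_cmdline(cmdline_file.read_text())
        for key in ("ivrs_ioapic[4]", "ivrs_ioapic[5]", "idle"):
            print(key, "=", params.get(key, ["<not set>"]))
```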



[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #71 from Garry Hurley Jr  ---
What I want to know is what is calling your machine ‘localhorst’? 

Sent from my iPhone

> On Nov 20, 2018, at 9:15 AM, bugzilla-dae...@freedesktop.org wrote:
> 
> Comment # 47 on bug 105733 from Allan
> I have really bad news.
> 
> I'm delaying a lot to answer because I literally sent for warranty or replaced
> ALL of my components in the PC.
> 
> The CPU (R7 1800X) was replaced from a batch 21 to a new by AMD itself batched
> 35.
> 
> But OK, let's talk about the amdgpu :
> 
> (In reply to Andrey Grodzovsky from comment #25)
> > (In reply to Allan from comment #12)
> > Can you build latest kernel (4.18) and grab again latest firmware and try
> > again ?
> > Links to kernel and firmware:
> > https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> > https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
> >  
> 
> For reasons already explained here I couldn't either compile or test it 
> before,
> so please don't be mad with me :
> - Sold my old PC.
> - My notebook was completely filled with files.
> - Components on warranty. Testing everything else.
> 
> So I managed to borrow a PC to test the video cards. I have tested only the
> nvidia one to prove for AMD that the GPU is working and the pci-controller (a
> guess of mine) of the CPU/chipset that is broken. Going to test the RX480 on
> this PC as soon as possible. My warranties are expiring and I had to enumerate
> priorities.
> 
> I already said it here but, with the 1800X I couldn't even clone the git
> repository (the checksum always fails, tried many times).
> 
> Then I managed to free some space on my notebook and started to build
> yesterday.
> - Included amd-ucode firmware.
> - Included polaris10 firmware (for RX480).
> - Made some optimizations for Ryzen as described on Gentoo's dedicated
> page.
> 
> Compiled, version 4.20-rc1 as present in the branch. No errors reported.
> 
> There are 2 main applications that are easier to test right now to find the
> problems :
> - Metro 2033 Redux through steam.
> - Left for Dead 2 through steam.
> 
> Started Metro 2033, worked for some minutes with no issue, but it was for some
> reason without any sound. Closed. Turned off the HDMI audio on pavucontrol to
> use only the default output. Restarted steam.
> 
> Started Left for Dead 2 this time. Was able to change graphics settings to max
> without AA and vsync. Played for 15 seconds and got a screen freeze. Waited 
> for
> a script to record properly the logs and temps. Hard rebooted. This time even
> my BIOS/EFI screen had a green background, but still operational. Everything
> was green except the text. Rebooted again, got back to normal colors.
> 
> And here are the logs :
> 
> kern.log about Firefox usage :
> > Nov 14 05:26:50 desk kernel: [  324.714998] Chrome_~dThread[1788]: segfault 
> > at 0 ip 7fbfee5e3181 sp 7fbfec2d1ad0 error 6 in 
> > libxul.so[7fbfee5cf000+3a2c000]
> 
> It suggests that the CPU either still has problematic microcode or is
> defective.
> 
> dmesg about amdgpu screen freeze :
> > [ 3323.920795] amdgpu :09:00.0: GPU fault detected: 146 0x080c for 
> > process hl2_linux pid 14648 thread amdgpu_cs:0 pid 14653
> > [ 3323.920799] amdgpu :09:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   
> > 0x
> > [ 3323.920801] amdgpu :09:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 
> > 0x0200800C
> > [ 3323.920804] amdgpu :09:00.0: VM fault (0x0c, vmid 1, pasid 32774) at 
> > page 0, read from 'TC0' (0x54433000) (8)
> > [ 3334.103233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, 
> > signaled seq=274140, emitted seq=274142
> > [ 3334.103239] amdgpu :09:00.0: GPU reset begin!
> > [ 3344.332607] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* 
> > [CRTC:46:crtc-0] hw_done or flip_done timed out
> > [ 3504.834097] INFO: task kworker/u32:2:3872 blocked for more than 120 
> > seconds.
> > [ 3504.834103]   Not tainted 4.20.0-rc1-amd #2
> > [ 3504.834105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> > this message.
> > [ 3504.834107] kworker/u32:2   D0  3872  2 0x8000
> > [ 3504.834123] Workqueue: events_unbound commit_work [drm_kms_helper]
> > [ 3504.834126] Call Trace:
> > [ 3504.834133]  ? __schedule+0x2a0/0x880
> > [ 3504.834136]  schedule+0x28/0x80
> > [ 3504.834139]  schedule_timeout+0x25d/0x380
> > [ 3504.834217]  ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu]
> > [ 3504.834292]  ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 
> > [amdgpu]
> > [ 3504.834297]  dma_fence_default_wait+0x23b/0x2a0
> > [ 3504.834301]  ? dma_fence_release+0x90/0x90
> > [ 3504.834304]  dma_fence_wait_timeout+0xdd/0x100
> > [ 3504.834308]  reservation_object_wait_timeout_rcu+0x161/0x270
> > [ 3504.834387]  amdgpu_dm_do_flip+0x112/0x370 [amdgpu]
> > [ 3504.834468]  amdgpu_dm_atomic_commit_tail+0x68b/0xcd0 [amdgpu]
> > [ 3504.834472]  ?

[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #70 from jake.hed...@yahoo.com ---
OK, that seems to have stabilized my system. It withstood at least 4 hours of
constant use: I went idle, stressed it, went idle again, and had no crashes.

My current setup is Buster with idle=nomwait. I am going to add idle=nomwait to
my boot parameters permanently for now and keep reading about the behavior so I
can troubleshoot better going forward. From a cursory glance, this seems to
indicate it was a CPU issue rather than a display problem. Is that way off
base? Thank you for the suggestion; I will report back if the issue recurs.



[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #69 from jake.hed...@yahoo.com ---
Hi Alex, comment 64 did not resolve the issue. It did seem to delay the crash,
but ultimately did not fix it. I will try the idle=nomwait parameter now. If I
am still stuck, I also have another suggestion: limiting the MHz of the GPU
itself.



[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #68 from Alex Deucher  ---
(In reply to jake.hedges from comment #66)
> It really did not take too long to crash it with even with the params.  I
> back to square one.  Thinking I will at least try a few different distros
> and possibly upgrade some hardware though I am not disappointed in their
> performance until I have used linux.  Anyways, I will keep experimenting and
> report back.

Do the suggestions in comment 64 or comment 67 help?



[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #66 from jake.hed...@yahoo.com ---
It really did not take long to crash, even with the params. I am back to
square one. I'm thinking I will at least try a few different distros and
possibly upgrade some hardware, though I was not disappointed in its
performance until I started using Linux. Anyway, I will keep experimenting and
report back.

--- Comment #67 from Alex Deucher  ---
For those with AMD platforms, does adding idle=nomwait on the kernel command
line in grub help?
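On Debian and Ubuntu, a parameter like idle=nomwait is usually made permanent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then running `sudo update-grub` (other distributions typically use `grub-mkconfig -o /boot/grub/grub.cfg` or `grub2-mkconfig`). The Python sketch below shows only that text edit on the file's contents; it assumes the variable sits on one line in double quotes, and it deliberately neither writes the file nor regenerates the GRUB config. The helper name add_kernel_param is hypothetical.

```python
import re


def add_kernel_param(grub_defaults: str, param: str) -> str:
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there."""
    # Assumes the form GRUB_CMDLINE_LINUX_DEFAULT="..." on a single line.
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def _append(m: "re.Match[str]") -> str:
        current = m.group(2).split()
        if param in current:
            return m.group(0)  # already present: leave the line unchanged
        return m.group(1) + " ".join(current + [param]) + m.group(3)

    return pattern.sub(_append, grub_defaults, count=1)
```

Editing the file by hand works just as well; the point of the sketch is that the new parameter is appended to the existing list (for example "quiet splash") rather than replacing it.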



[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #65 from jake.hed...@yahoo.com ---
Adding my PC to the pile affected by this:

Ryzen 5 1600
Aorus RX 480
Debian (stretch)
2x8GB G.Skill DIMM (previously overclocked, but now everything in BIOS is
"optimized default")
ASUS ROG STRIX B350-F with latest BIOS/AGESA 4207

I am a Windows migrant who went cold turkey into Linux. Debian has been kind to
me, minus a few hiccups and reinstalls. Sadly, my very first few installations
were the most stable. I have finally pinpointed my issue to this thread.

/var/log/syslog shows the GPU failure messages right before the crash. The
issue seems to occur whether or not I have linux-firmware installed. Most
recently, I had a crash simply from opening "Show Applications" in GNOME.

The crash is the same as others have described. The screen blips and goes
black, and the fans spin up to high speed. I did not test ssh, but the reset
button does not work; the machine must be hard powered down to recover. Twice
now this has corrupted the file system to the point where it would not boot
normally. Since I was not versed enough to recover it manually, I just
reinstalled.

As this is my first hard shot at Linux, this is quite a damper on what was a
very exciting change.  For now, I have applied the kernel params mentioned
above and will report back should I crash again.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-02-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #64 from lada.dvor...@gmail.com ---
I've been facing freezes for many days on a Ryzen 1600 + RX560. I have tried
BIOS, kernel, and Mesa updates, and the kernel parameters "processor.nocst=1
iommu=pt amggpu.vm_update_mode=3", but they didn't help. Finally I tried the
kernel parameters ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2 and that did
the trick. No more freezing.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #63 from l...@protonmail.ch ---
Well, my GPU doesn't even work properly on Windows anymore. I do not think the
GPU was originally faulty, as it *did* work without problems on Windows before,
but now after having used it on Linux, it has the exact same problems on
Windows. Hopefully I can get it replaced, but I will not use it on Linux
anymore for fear of fucking it up again.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #62 from l...@protonmail.ch ---
What changes happened in 5.0rc3 that could have fixed this? I will check
whether I still experience problems with 5.0rc3 when I get the chance.

Also, can you elaborate on what you mean by draining the built-in battery? Are
you using a laptop, or are you referring to some other built-in battery? Excuse
my ignorance.

(In reply to Zheng Luo from comment #61)
> I experienced similar problems, but mine are much worse. I can't recover from
> a black screen after a reboot/hard reset unless I drain the built-in battery.
> However, this problem disappears in 5.0rc3 (in contrast to the buggy 4.20).
> I strongly suspect some kind of firmware corruption.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #61 from Zheng Luo  ---
I experienced similar problems, but mine are much worse. I can't recover from a
black screen after a reboot/hard reset unless I drain the built-in battery.
However, this problem disappears in 5.0rc3 (in contrast to the buggy 4.20). I
strongly suspect some kind of firmware corruption.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #60 from l...@protonmail.ch ---
dwagner, my problem persists even if I completely power the system down: I
shut it down by holding the power button and then turn the PSU completely off.
I have not tried shutting it off using only the PSU, for fear of damaging my
hardware, although I will try that at least once next time.

Also, it is reassuring to see that I am not the only one experiencing such odd
behavior, but could it be that we are all simply using faulty hardware?


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #59 from dwagner  ---
I don't think your observations indicate a hardware defect.

I also have a reproducible "hysteresis" effect with my RX460 GPU:
When I experience the bug scenario I reported in
https://bugs.freedesktop.org/show_bug.cgi?id=102322 and then reboot by pressing
the reset button, the BIOS greeting and the GRUB loader are consistently not
shown (just a black screen, although the connected TV indicates a valid HDMI
signal); the screen only lights up again once Linux sets the console video mode
during boot. If at that point, or at any time thereafter, I reboot either by
typing "reboot" or by pressing the RESET button, the BIOS greeting and GRUB menu
are shown as normal.

I think this is just due to incomplete initialization upon reset, because if I
power down the machine by switching off the power supply and then reboot, the
BIOS and GRUB menu always come up. It seems to me that pressing the RESET
button just doesn't reset as much as an actual power-down does.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #58 from krutoiles...@gmail.com ---
The corruption suggestion is interesting. My RX580 does this, and now it
won't even boot on Windows anymore; it just crashes.



[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

l...@protonmail.ch changed:

   What|Removed |Added

 CC||l...@protonmail.ch

--- Comment #57 from l...@protonmail.ch ---
Created attachment 143206
  --> https://bugs.freedesktop.org/attachment.cgi?id=143206&action=edit
I get these errors when attempting to boot after a normal GPU hang and KMS
happens

Recently I've been getting another type of hang. After a normal hang, where my
screen shows garbled output, I can't even get past KMS in the next couple of
boots. I can fix this by flipping my VBIOS switch, which strongly leads me to
believe that amdgpu somehow corrupts the GPU's firmware. I have attached the
error I get at boot when KMS kicks in after a hang during normal use. The
monitor doesn't display anything when this happens, but I can still toggle
Caps Lock etc.; however, I cannot shut the machine down normally. I have not
tested whether SSH and such still work.

This honestly makes me doubt whether what I am experiencing is the same bug; is
it simply a faulty GPU? I am using a Sapphire RX 580 4GB, which I bought used
from a Windows user. It *did* work for him, so at least it obviously isn't
entirely broken.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #56 from l...@protonmail.ch ---
Also, I forgot to mention that the new GPU recovery feature doesn't work; it
produces the following errors in dmesg:

jan 16 16:43:26 las kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2792, emitted seq=2795
jan 16 16:43:26 las kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
jan 16 16:43:26 las kernel: amdgpu :08:00.0: GPU reset begin!
jan 16 16:43:26 las kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=7013, emitted seq=7015
jan 16 16:43:26 las kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process sway pid 1328 thread sway:cs0 pid 1329
jan 16 16:43:26 las kernel: amdgpu :08:00.0: GPU reset begin!
jan 16 16:43:26 las kernel: amdgpu: [powerplay] 
 failed to send message 261 ret is 0 
jan 16 16:43:27 las kernel: amdgpu: [powerplay] 
 last message was failed ret is 0
jan 16 16:43:28 las kernel: amdgpu: [powerplay] 
 failed to send message 261 ret is 0 

(Fetched through `journalctl -ekb`)

However, this stopped once I switched to DVI-D; I now get no errors at all.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #55 from l...@protonmail.ch ---
I have a very similar problem, with a few differences: my entire screen
becomes one single color, which doesn't seem to be entirely random. Sometimes
it is grey, other times a bluish tint, but never a color like black.
In addition, Num Lock etc. are still responsive for a short while, although the
response delay seems to grow rapidly each second, so the system very soon
appears completely unresponsive.

My system:
CPU: AMD Ryzen 5 1600
GPU: Sapphire NITRO+ RX 580 4 GB
Motherboard: ASUS ROG STRIX X470-F
Kernel: 4.20.1
Distribution: NixOS
WM: Sway or i3, happens in both

I am using DVI-D, if that is at all relevant.

Oddly enough, even though the symptoms have stayed exactly the same the entire
time, the error messages I get vary widely. At one point I was getting the "GPU
fault detected" errors; at other times it would say that an sdma0 ring or gfx
ring timed out; and now I get no errors at all when it happens. That last change
seems to have come after I switched from an HDMI display to a DVI-D display
(oddly enough, the hangs also seem to have become much less frequent?).
Another interesting thing is that when I was using 4.18.12 or lower, I could
avoid this problem entirely by flipping my VBIOS switch away from the IO ports.
In addition, once it starts happening, if I reboot by holding down the power
button and then turning the system on normally, it happens again soon after I
launch my WM. This can seemingly be avoided by completely disconnecting the
system from power, e.g. by turning off my PSU.


This might actually be a completely unrelated bug, but the symptoms fit well
enough that it could be the same one.
It could perhaps also be a hardware problem, since it is very odd that the
errors I get keep changing; or maybe it is multiple bugs that look the same? In
addition, I can't find a reliable way to reproduce the issue on demand other
than waiting for it to happen, although graphics-intensive work does accelerate
it considerably.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2019-01-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #54 from OliverHB  ---
Has anyone tried switching to a text console (CTRL-ALT-F[1-6]) and back to the
graphical screen (usually CTRL-ALT-F7)? That does the trick for me! However, I
wouldn't mind a solution that makes this unnecessary...


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #53 from fin4...@hotmail.com ---
Created attachment 142573
  --> https://bugs.freedesktop.org/attachment.cgi?id=142573&action=edit
AMD wip kernel config with 1000Hz timer


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #52 from fin4...@hotmail.com ---
To prevent random kernel lockups with Ryzen, fix this in the BIOS: set
Typical Current Idle in the BIOS Advanced/AMD CBS menu.

Use the latest AMD WIP kernel and Mesa from the Oibaf PPA. Disable display
compositing and vsync in Xfce. Use a 300Hz kernel timer.

A working kernel config file for my system is attached.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #51 from Allan  ---
I tried to install the RX480 in the other PC, but the card is so big that it
touches the RAM slot's tabs, so I can't install it.

Incidentally, the errors seem to be delayed a little when setting
randomize_va_space=0. I was testing it for the CPU and noticed that amdgpu took
longer to fail, but it still failed.

What happened:
- the screen became grainy with pinkish colors, as usual
- the behavior extended to the desktop
- but I could still operate the system
- the tty was black and white (normal)
- I could restart the X server
- colors went back to normal after restarting
- I tried the same application again
- it crashed and froze the system

Main difference:
- now I can sometimes kill the tasks/restart the X server

I recorded the time of each event, as follows:

(Firefox was open in the background while I tried to play Left 4 Dead 2
through Steam)

1. Recoverable delay with grainy colors (L4D2 started at 11:48; it occurred at
11:50 after some delay while loading the game menu)
> [Thu Nov 22 11:48:03 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=11477, emitted seq=11480
> [Thu Nov 22 11:48:03 2018] amdgpu :09:00.0: GPU reset begin!
> [Thu Nov 22 11:48:03 2018] amdgpu :09:00.0: GPU pci config reset
> [Thu Nov 22 11:48:03 2018] amdgpu :09:00.0: GPU reset succeeded, trying to resume
> [Thu Nov 22 11:48:03 2018] [drm] PCIE GART of 256M enabled (table at 0x00F40030).
> [Thu Nov 22 11:48:03 2018] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
> [Thu Nov 22 11:48:04 2018] amdgpu :09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.3.1 test failed (-110)
> [Thu Nov 22 11:48:04 2018] [drm] UVD and UVD ENC initialized successfully.
> [Thu Nov 22 11:48:04 2018] [drm] VCE initialized successfully.
> [Thu Nov 22 11:48:04 2018] [drm] recover vram bo from shadow start
> [Thu Nov 22 11:48:04 2018] [drm] recover vram bo from shadow done
> [Thu Nov 22 11:48:04 2018] [drm] Skip scheduling IBs!
> [Thu Nov 22 11:48:04 2018] [drm] Skip scheduling IBs!
> [Thu Nov 22 11:48:04 2018] amdgpu :09:00.0: GPU reset(1) succeeded!
> [Thu Nov 22 11:48:04 2018] [drm] Skip scheduling IBs!
> [Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:48:06 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:50:46 2018] show_signal_msg: 9 callbacks suppressed
> [Thu Nov 22 11:50:46 2018] Chrome_~dThread[1734]: segfault at 0 ip 7f7926c4c181 sp 7f792493aad0 error 6 in libxul.so[7f7926c38000+3a2c000]
> [Thu Nov 22 11:50:46 2018] Code: 15 dc f2 5f 04 48 89 10 c7 04 25 00 00 00 00 7c 09 00 00 e8 21 60 ff ff 90 48 8b 05 f9 2a 9b 05 48 8d 0d 22 f3 5f 04 48 89 08  04 25 00 00 00 00 02 0a 00 00 e8 ff 5f ff ff e8 0a f3 ff ff 48
> [Thu Nov 22 11:50:46 2018] Chrome_~dThread[1885]: segfault at 0 ip 7f7fa150a181 sp 7f7f9f1f8ad0 error 6 in libxul.so[7f7fa14f6000+3a2c000]
> [Thu Nov 22 11:50:46 2018] Chrome_~dThread[8072]: segfault at 0 ip 7fffededa181 sp 7fffebbc8ad0 error 6
> [Thu Nov 22 11:50:46 2018] Code: 15 dc f2 5f 04 48 89 10 c7 04 25 00 00 00 00 7c 09 00 00 e8 21 60 ff ff 90 48 8b 05 f9 2a 9b 05 48 8d 0d 22 f3 5f 04 48 89 08  04 25 00 00 00 00 02 0a 00 00 e8 ff 5f ff ff e8 0a f3 ff ff 48
> [Thu Nov 22 11:50:46 2018]  in libxul.so[7fffedec6000+3a2c000]
> [Thu Nov 22 11:50:46 2018] Code: 15 dc f2 5f 04 48 89 10 c7 04 25 00 00 00 00 7c 09 00 00 e8 21 60 ff ff 90 48 8b 05 f9 2a 9b 05 48 8d 0d 22 f3 5f 04 48 89 08  04 25 00 00 00 00 02 0a 00 00 e8 ff 5f ff ff e8 0a f3 ff ff 48
> [Thu Nov 22 11:50:46 2018] Chrome_~dThread[1931]: segfault at 0 ip 7f8dc581f181 sp 7f8dc350dad0 error 6 in libxul.so[7f8dc580b000+3a2c000]
> [Thu Nov 22 11:50:46 2018] Code: 15 dc f2 5f 04 48 89 10 c7 04 25 00 00 00 00 7c 09 00 00 e8 21 60 ff ff 90 48 8b 05 f9 2a 9b 05 48 8d 0d 22 f3 5f 04 48 89 08  04 25 00 00 00 00 02 0a 00 00 e8 ff 5f ff ff e8 0a f3 ff ff 48
kern.log = dmesg

2. Unrecoverable crash (L4D2 started at 12:00 and ran fine until 12:55, when
everything crashed)
dmesg:
> [Thu Nov 22 12:55:04 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1688198, emitted seq=1688200
> [Thu Nov 22 12:55:04 2018] amdgpu :09:00.0: GPU reset begin!
> [Thu Nov 22 12:55:14 2018] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:46:crtc-0] hw_done or flip_done timed out
kern.log = dmesg

The Xorg log does not report anything useful.


(In reply to russianneuromancer from comment #50)
> Can't tell you about RX480, but I know for s

[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #50 from russianneuroman...@ya.ru ---
>  And this is why I'm thinking that the 1800X has a defective pci-controller. 
> And it is also the second part of the "really bad news". Maybe it is 
> happening mostly with ryzen processors?

I can't speak for the RX480, but I know for sure that at least the Vega 64 is
totally fine with the 1800X's PCI controller: not a single unsolvable
graphics-related issue in a year (so far, all the issues I had were solved by
upgrading the kernel and/or Mesa).


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #49 from Allan  ---
Haha, I knew I could count on a genuine, up-to-date segfault (thanks to
Ryzen):
> /var/log/kern.log:Nov 20 13:42:20 desk kernel: [ 9940.857175] Cameras IPC[1957]: segfault at 0 ip 55ea219b1cc2 sp 7f390e1fe8b0 error 6
> /var/log/kern.log:Nov 20 13:42:20 desk kernel: [ 9940.857184] Cameras IPC[2999]: segfault at 0 ip 5651d2cf7cc2 sp 7f95f6fb48b0 error 6
> /var/log/kern.log:Nov 20 13:42:20 desk kernel: [ 9940.857232] Chrome_~dThread[1809]: segfault at 0 ip 7f3942529181 sp 7f3940217ad0 error 6 in libxul.so[7f3942515000+3a2c000]
> /var/log/kern.log:Nov 20 13:42:20 desk kernel: [ 9940.857264] Chrome_~dThread[2448]: segfault at 0 ip 7f963661a181 sp 7f9634308ad0 error 6 in libxul.so[7f9636606000+3a2c000]


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #48 from Allan  ---
Damn, ignore the kern.log report; it is outdated.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #47 from Allan  ---
I have really bad news.

I've taken a long time to answer because I literally sent ALL of the components
in my PC in for warranty service or replaced them.

The CPU (R7 1800X) was replaced by AMD itself: my batch-21 chip was swapped for
a new one from batch 35.

But OK, let's talk about amdgpu:

(In reply to Andrey Grodzovsky from comment #25)
> (In reply to Allan from comment #12)
> Can you build latest kernel (4.18) and grab again latest firmware and try
> again ?
> Links to kernel and firmware:
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/ 

For reasons already explained here, I couldn't compile or test it before, so
please don't be mad at me:
- I sold my old PC.
- My notebook's disk was completely full.
- My components were out on warranty, and I was testing everything else.

So I managed to borrow a PC to test the video cards. So far I have tested only
the NVIDIA one, to prove to AMD that the GPU is working and that it is the PCI
controller of the CPU/chipset that is broken (a guess of mine). I am going to
test the RX480 on this PC as soon as possible. My warranties are expiring and I
had to set priorities.

As I already said here, with the 1800X I couldn't even clone the git repository
(the checksum always failed; I tried many times).

Then I managed to free some space on my notebook and started the build
yesterday:
- Included the amd-ucode firmware.
- Included the polaris10 firmware (for the RX480).
- Made some optimizations for Ryzen as described on Gentoo's dedicated page.

Compiled version 4.20-rc1, as present in the branch. No errors were reported.

There are two main applications that currently make the problems easiest to
reproduce:
- Metro 2033 Redux through Steam.
- Left 4 Dead 2 through Steam.

I started Metro 2033; it worked for some minutes with no issue, but for some
reason there was no sound. I closed it, turned off the HDMI audio in
pavucontrol to use only the default output, and restarted Steam.

This time I started Left 4 Dead 2. I was able to change the graphics settings
to max, without AA and vsync. I played for 15 seconds and got a screen freeze.
I waited for a script to properly record the logs and temperatures, then hard
rebooted. This time even my BIOS/EFI screen had a green background, though it
was still operational; everything was green except the text. I rebooted again
and got back to normal colors.

And here are the logs:

kern.log from Firefox usage:
> Nov 14 05:26:50 desk kernel: [  324.714998] Chrome_~dThread[1788]: segfault at 0 ip 7fbfee5e3181 sp 7fbfec2d1ad0 error 6 in libxul.so[7fbfee5cf000+3a2c000]

This suggests that the CPU either still has problematic microcode or is
defective.

dmesg from the amdgpu screen freeze:
> [ 3323.920795] amdgpu :09:00.0: GPU fault detected: 146 0x080c for process hl2_linux pid 14648 thread amdgpu_cs:0 pid 14653
> [ 3323.920799] amdgpu :09:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x
> [ 3323.920801] amdgpu :09:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0200800C
> [ 3323.920804] amdgpu :09:00.0: VM fault (0x0c, vmid 1, pasid 32774) at page 0, read from 'TC0' (0x54433000) (8)
> [ 3334.103233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=274140, emitted seq=274142
> [ 3334.103239] amdgpu :09:00.0: GPU reset begin!
> [ 3344.332607] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:46:crtc-0] hw_done or flip_done timed out
> [ 3504.834097] INFO: task kworker/u32:2:3872 blocked for more than 120 seconds.
> [ 3504.834103]   Not tainted 4.20.0-rc1-amd #2
> [ 3504.834105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3504.834107] kworker/u32:2   D0  3872  2 0x8000
> [ 3504.834123] Workqueue: events_unbound commit_work [drm_kms_helper]
> [ 3504.834126] Call Trace:
> [ 3504.834133]  ? __schedule+0x2a0/0x880
> [ 3504.834136]  schedule+0x28/0x80
> [ 3504.834139]  schedule_timeout+0x25d/0x380
> [ 3504.834217]  ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu]
> [ 3504.834292]  ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu]
> [ 3504.834297]  dma_fence_default_wait+0x23b/0x2a0
> [ 3504.834301]  ? dma_fence_release+0x90/0x90
> [ 3504.834304]  dma_fence_wait_timeout+0xdd/0x100
> [ 3504.834308]  reservation_object_wait_timeout_rcu+0x161/0x270
> [ 3504.834387]  amdgpu_dm_do_flip+0x112/0x370 [amdgpu]
> [ 3504.834468]  amdgpu_dm_atomic_commit_tail+0x68b/0xcd0 [amdgpu]
> [ 3504.834472]  ? __switch_to_asm+0x40/0x70
> [ 3504.834475]  ? wait_for_completion_timeout+0x3b/0x1a0
> [ 3504.834477]  ? __switch_to_asm+0x34/0x70
> [ 3504.834480]  ? __switch_to_asm+0x40/0x70
> [ 3504.834483]  ? __switch_to+0x1ba/0x450
> [ 3504.834492]  commit_tail+0x3d/0x70 [drm_kms_helper]
> [ 3504.834497]  process_one_work+0x1aa/0x3a0
> [ 3504.834500]  worker_thread+0x30/0x3a0
> [ 3504.834503]  ? drain_workqueue+0x130/0x130
> [ 3504.8

[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #46 from Kent Ross  ---
Created attachment 142511
  --> https://bugs.freedesktop.org/attachment.cgi?id=142511&action=edit
dmesg logs for failure

Other items of potential relevance:

I have two screens, one at 3840x2160 and one at 2560x1600. When I've
experienced this failure (I haven't tried a wide variety of applications), it
has been with games that do not have exclusive control of the screen, running
in the desktop compositor.

The second screen also freezes, but applications running on that screen, such
as a Chrome window playing streaming video, keep playing their audio
uninterrupted.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #45 from Kent Ross  ---
This happens to me as well. I first noticed it when I had a dual-GPU setup,
but since then I have completely reinstalled with only the AMD GPU (a Vega 64)
and it still happens. The failures are similar to those bernhardu noted. I have
not yet had a failure simply using Chrome and desktop applications, but it is
typically reproducible within 5 to 60 minutes in a 3D game like Dota 2.

I suspected it might be related to memory stability, but the machine it happens
on happily passes both memtest and mprime. The lockup still occurs even when
the memory is underclocked by 25% (retaining the same timings and voltage, so
that's a full 25% overhead for every command).

I have:

- Intel 7980XE CPU
- Ubuntu Cosmic, linux-image-4.18.0-11-generic
- default amdgpu drivers

I have also tried updated amdgpu packages from ppa:oibaf/graphics-drivers; the
failure is the same.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #44 from krutoiles...@gmail.com ---
No, mine passes memtest as well, but I was seeing failures in mprime. I was
also on Corsair Vengeance originally. See if you can borrow a stick from
someone to test with.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #43 from Philipp  ---
I've got 2x8GB DDR4 Corsair Vengeance LPX RAM (oh dear, those names).
I have run a few rounds of memtest without any errors so far, but I'll run a
few more hours today when I get the chance.

Did you switch your RAM because of memtest error reports or other concerns?


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #42 from krutoiles...@gmail.com ---
What RAM is in the machine? I swapped mine for G.Skill and the freezes are
completely gone now.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #41 from Philipp  ---
I can second much of what John W. says. The crashes have become less frequent
with recent firmware/kernel versions, but they still happen.
For me, too, the crashes only started on my Vega 64 when I threw out my ancient
Intel CPU and replaced it with an AMD Ryzen 5 1600 on a GR-AB350M-Gaming 3
board.
I've done stability tests on that other OS, so I don't think I've got faulty
hardware here.

One of my crash logs:

Nov 16 15:18:29 localhorst kernel: amdgpu :08:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:7 pasid:32776, for process RocketLeague pid 6347 thread
RocketLeag:cs0 pid 6400
)
Nov 16 15:18:29 localhorst kernel: amdgpu :08:00.0:   at address
0x800319593000 from 27
Nov 16 15:18:29 localhorst kernel: amdgpu :08:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x0070053C
Nov 16 15:18:30 localhorst kernel: amdgpu :08:00.0: [gfxhub] VMC page fault
(src_id:0 ring:220 vmid:7 pasid:32776, for process RocketLeague pid 6347 thread
RocketLeag:cs0 pid 6400
)
Nov 16 15:18:30 localhorst kernel: amdgpu :08:00.0:   at address
0x8201004e from 27
Nov 16 15:18:30 localhorst kernel: amdgpu :08:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x007013B8
Nov 16 15:18:40 localhorst kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring gfx timeout, signaled seq=38153, emitted seq=38155
Nov 16 15:18:40 localhorst kernel: [drm] GPU recovery disabled.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-11-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #40 from John W.  ---
Is there any resolution or work being done on this issue?
I've tried the frequency hack and it slightly delayed the issue.
I also tried the latest AMD staging kernel with the latest firmware and XF86
driver, and found that the issue still happened, but somewhat less often.
Reading my journalctl logs, I found that when it occurs it sometimes attempts
to recover, but in the process loses VRAM and freezes the screen, covered in
odd colors.
At least when this occurs the machine is otherwise functional and I can switch
TTYs and kill X11.
I'm using a 580, and I've added the relevant logs of the attempted recovery.

Nov 02 15:31:26 Towering-DG kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring sdma1 timeout, signaled seq=59193, emitted seq=59194
Nov 02 15:31:27 Towering-DG kernel: amdgpu :01:00.0: GPU reset begin!
Nov 02 15:31:27 Towering-DG kernel: amdgpu :01:00.0: GPU pci config reset
Nov 02 15:31:27 Towering-DG kernel: amdgpu :01:00.0: GPU reset succeeded,
trying to resume
Nov 02 15:31:27 Towering-DG kernel: [drm] PCIE GART of 256M enabled (table at
0x00F40030).
Nov 02 15:31:27 Towering-DG kernel: [drm:amdgpu_device_gpu_recover [amdgpu]]
*ERROR* VRAM is lost!
Nov 02 15:31:27 Towering-DG kernel: amdgpu :01:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.2.1 test failed
(-110)

(Note: Usually it's ring SDMA0 instead of SDMA1 and occasionally GFX)
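As a rough aid for tallying which ring times out, here is a minimal POSIX sh sketch. It assumes the kernel log text arrives on stdin (for example from journalctl -k -b -1 after a reboot, which needs a persistent journal), that each timeout message sits on a single line rather than wrapped as in the copies above, and that grep supports -o (GNU and BSD grep do). The function name count_ring_timeouts is hypothetical.

```sh
# Hypothetical helper: count amdgpu ring timeouts per ring name.
# Expects kernel log lines on stdin, e.g.
#   [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=...
count_ring_timeouts() {
    # Keep only the "ring <name> timeout" fragment of each matching line,
    # then tally by ring name and print "<ring> <count>" sorted by name.
    grep -o 'ring [A-Za-z0-9_.]* timeout' |
        awk '{ count[$2]++ } END { for (r in count) print r, count[r] }' |
        sort
}
```

Usage: journalctl -k -b -1 | count_ring_timeouts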


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-10-31 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #39 from Allan  ---
I can't clone the git repo using the command:
 "git clone git://people.freedesktop.org/~agd5f/linux"

At first I got checksum errors; I found that the processor had a bug and
replaced it through the warranty process, and now I'm getting:

"
Cloning into 'linux'...
remote: Enumerating objects: 6619592, done.
remote: Counting objects: 100% (6619592/6619592), done.
remote: Compressing objects: 100% (989580/989580), done.
remote: Total 6619592 (delta 5587252), reused 6617842 (delta 5585574)   
Receiving objects: 100% (6619592/6619592), 1.18 GiB | 896.00 KiB/s, done.
Resolving deltas: 100% (5587252/5587252), done.
fatal: did not receive expected object 22906b31d43fbb88c62d2f4b18c5bd2d0e3cebc1
fatal: index-pack failed
"

I get this error even when using:

"git clone -b amd-staging-drm-next --single-branch
git://people.freedesktop.org/~agd5f/linux"

Any tips? Am I making a mistake?


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #38 from Jan Jurzitza  ---
So the "manual" frequency hack fixes games and graphics-intensive applications
for me (or at least delays the problem by more than 15 hours). There are still
occasional real crashes with my setup (I don't know whether the GPU or the CPU
causes them) that happen after simply browsing a lot, especially pages with
lots of SVGs and images, but those happened before the manual hack as well and
don't seem to be related to this issue.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-09-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #37 from markusr...@gmail.com ---
(In reply to markusraat from comment #36)
> (In reply to markusraat from comment #35)
> > It might be that kernel option apci=ht ( also apci=off ) solve the problem.
> > It is taking more time to waiting the possible problem appearance. At least
> > it worth of testing. But this is not maybe the final solution for this bug?
> > 
> > [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.18.5-041805-generic
> > root=UUID=c3df607f-ac6e-11e8-9f6b-3497f638e103 ro acpi=ht
> > [0.00] Malformed early option 'acpi'
> 
> Okay, the "acpi=off" or "acpi=ht" was the miss shot.
> 
> But changing from motherboard bios GPU PCIe speed auto > gen3 is giving very
> promissing results! I also rose logging level from grub settings to
> "loglevel=8" but I haven't got regenerated the crash. I will reply if this
> fails again.

Nope,

Sep  6 16:04:31 x99 org.gnome.Shell.desktop[2332]: [Child 18594, MediaPlayback
#2] WARNING: Decoder=7fbcd7976d40 Decode error: NS_ERROR_DOM_MEDIA_FATAL_ERR
(0x806e0005) -
RefPtr,
mozilla::MediaResult, true> >
mozilla::MediaSourceTrackDemuxer::DoGetSamples(int32_t): manager is detached.:
file
/build/firefox-oscv9o/firefox-61.0.1+build1/dom/media/MediaDecoderStateMachine.cpp,
line 3411
Sep  6 16:04:31 x99 org.gnome.Shell.desktop[2332]: [Child 18594, MediaPlayback
#1] WARNING: Decoder=7fbcd7976d40 Decode error: NS_ERROR_DOM_MEDIA_FATAL_ERR
(0x806e0005) -
RefPtr,
mozilla::MediaResult, true> >
mozilla::MediaSourceTrackDemuxer::DoGetSamples(int32_t): manager is detached.:
file
/build/firefox-oscv9o/firefox-61.0.1+build1/dom/media/MediaDecoderStateMachine.cpp,
line 3411
Sep  6 16:04:31 x99 org.gnome.Shell.desktop[2332]: [Child 18594, MediaPlayback
#3] WARNING: Decoder=7fbcd7976d40 Decode error: NS_ERROR_DOM_MEDIA_FATAL_ERR
(0x806e0005) -
RefPtr,
mozilla::MediaResult, true> >
mozilla::MediaSourceTrackDemuxer::DoGetSamples(int32_t): manager is detached.:
file
/build/firefox-oscv9o/firefox-61.0.1+build1/dom/media/MediaDecoderStateMachine.cpp,
line 3411


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-09-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #36 from markusr...@gmail.com ---
(In reply to markusraat from comment #35)
> It might be that kernel option apci=ht ( also apci=off ) solve the problem.
> It is taking more time to waiting the possible problem appearance. At least
> it worth of testing. But this is not maybe the final solution for this bug?
> 
> [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.18.5-041805-generic
> root=UUID=c3df607f-ac6e-11e8-9f6b-3497f638e103 ro acpi=ht
> [0.00] Malformed early option 'acpi'

Okay, "acpi=off" and "acpi=ht" were a miss.

But changing the GPU PCIe speed in the motherboard BIOS from auto to Gen3 is
giving very promising results! I also raised the log level to "loglevel=8" in
the GRUB settings, but I haven't been able to reproduce the crash since. I will
reply if it fails again.
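To check that a GRUB change such as loglevel=8 actually reached the running kernel, one can read /proc/cmdline. Below is a minimal POSIX sh sketch; the name cmdline_param is hypothetical, and it splits on spaces only, so quoted values containing spaces are not handled. A parameter showing up here does not mean the kernel accepted it: comment #35 shows "Malformed early option 'acpi'" even though acpi=ht was on the command line.

```sh
# Hypothetical helper: print the value of one kernel command-line parameter.
# Reads the command line on stdin (normally /proc/cmdline); prints nothing if
# the parameter is absent. If it appears more than once, the last one wins here.
cmdline_param() {
    tr ' ' '\n' | awk -v key="$1" '
        # Match "key=value" only at the start of a token, so that
        # "acpi" does not also match "acpi_osi=...".
        index($0, key "=") == 1 { val = substr($0, length(key) + 2); found = 1 }
        END { if (found) print val }'
}
```

Usage: cmdline_param loglevel < /proc/cmdline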


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-31 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #35 from markusr...@gmail.com ---
It might be that the kernel option acpi=ht (or acpi=off) solves the problem. It
is taking longer for the problem to appear. It is at least worth testing, but
maybe this is not the final solution for this bug?

[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.18.5-041805-generic
root=UUID=c3df607f-ac6e-11e8-9f6b-3497f638e103 ro acpi=ht
[0.00] Malformed early option 'acpi'


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #34 from markusr...@gmail.com ---
I have exactly the same problem here. YouTube videos make the system crash
randomly, but it also happens without video playback. This bug makes the whole
system unusable. I have run system RAM memtests.

[   10.390343] [drm] amdgpu kernel modesetting enabled.
[   10.401936] fb: switching to amdgpudrmfb from EFI VGA
[   10.402439] amdgpu :01:00.0: enabling device (0106 -> 0107)
[   10.402655] amdgpu :01:00.0: VRAM: 4096M 0x00F4 -
0x00F4 (4096M used)
[   10.402658] amdgpu :01:00.0: GTT: 1024M 0x -
0x3FFF
[   10.402898] [drm] amdgpu: 4096M of VRAM memory ready
[   10.402900] [drm] amdgpu: 4096M of GTT memory ready.
[   10.658069] fbcon: amdgpudrmfb (fb0) is primary device
[   10.710971] amdgpu :01:00.0: fb0: amdgpudrmfb frame buffer device
[   10.744284] [drm] Initialized amdgpu 3.26.0 20150101 for :01:00.0 on
minor 0

[   10.390343] [drm] amdgpu kernel modesetting enabled.
[   10.401936] fb: switching to amdgpudrmfb from EFI VGA
[   10.402567] [drm] initializing kernel modesetting (FIJI 0x1002:0x7300
0x1002:0x0B36 0xCA).
[   10.402578] [drm] register mmio base: 0xFBE0
[   10.402579] [drm] register mmio size: 262144
[   10.402585] [drm] probing gen 2 caps for device 8086:2f08 = 77a3103/e
[   10.402587] [drm] probing mlw for device 8086:2f08 = 77a3103
[   10.402589] [drm] add ip block number 0 
[   10.402590] [drm] add ip block number 1 
[   10.402592] [drm] add ip block number 2 
[   10.402593] [drm] add ip block number 3 
[   10.402594] [drm] add ip block number 4 
[   10.402595] [drm] add ip block number 5 
[   10.402597] [drm] add ip block number 6 
[   10.402598] [drm] add ip block number 7 
[   10.402599] [drm] add ip block number 8 
[   10.402606] [drm] UVD is enabled in physical mode
[   10.402607] [drm] VCE enabled in physical mode
[   10.402648] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment
size is 9-bit
[   10.402662] [drm] Detected VRAM RAM=4096M, BAR=256M
[   10.402664] [drm] RAM width 512bits HBM
[   10.402898] [drm] amdgpu: 4096M of VRAM memory ready
[   10.402900] [drm] amdgpu: 4096M of GTT memory ready.
[   10.402906] [drm] GART: num cpu pages 262144, num gpu pages 262144
[   10.402941] [drm] PCIE GART of 1024M enabled (table at 0x00F40030).
[   10.403779] [drm] Found UVD firmware Version: 1.87 Family ID: 12
[   10.403784] [drm] UVD ENC is disabled
[   10.404378] [drm] Found VCE firmware Version: 53.20 Binary ID: 3
[   10.466617] [drm] dce110_link_encoder_construct: Failed to get
encoder_cap_info from VBIOS with error code 4!
[   10.480097] [drm] Display Core initialized with v3.1.44!
[   10.515673] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   10.515675] [drm] Driver supports precise vblank timestamp query.
[   10.550650] [drm] UVD initialized successfully.
[   10.650581] [drm] VCE initialized successfully.
[   10.658015] [drm] fb mappable at 0xC098C000
[   10.658017] [drm] vram apper at 0xC000
[   10.658018] [drm] size 14745600
[   10.658019] [drm] fb depth is 24
[   10.658020] [drm]pitch is 10240
[   10.658069] fbcon: amdgpudrmfb (fb0) is primary device
[   10.677347] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk
681000 pix_clk 241500
[   10.710971] amdgpu :01:00.0: fb0: amdgpudrmfb frame buffer device
[   10.744284] [drm] Initialized amdgpu 3.26.0 20150101 for :01:00.0 on
minor 0

System:Host: x99 Kernel: 4.18.5-041805-generic x86_64 bits: 64 gcc: 8.2.0
   Desktop: Gnome 3.28.3 (Gtk 3.22.30-1ubuntu1) Distro: Ubuntu 18.04.1
LTS
Machine:   Device: desktop System: ASUS product: All Series serial: N/A
   Mobo: ASUSTeK model: STRIX X99 GAMING v: Rev 1.xx serial: N/A
   UEFI: American Megatrends v: 1902 date: 03/21/2018
CPU:   18 core Intel Xeon E5-2696 v3 (-MT-MCP-) arch: Haswell rev.2 cache:
46080 KB
   flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 82945
   clock speeds: max: 3800 MHz 1: 1202 MHz 2: 1202 MHz 3: 1202 MHz 4:
1202 MHz 5: 1203 MHz 6: 1202 MHz
   7: 1203 MHz 8: 1284 MHz 9: 1203 MHz 10: 1202 MHz 11: 1203 MHz 12:
1202 MHz 13: 1202 MHz 14: 1355 MHz
   15: 1202 MHz 16: 1202 MHz 17: 1202 MHz 18: 1203 MHz 19: 1206 MHz 20:
1204 MHz 21: 1204 MHz
   22: 1205 MHz 23: 1204 MHz 24: 1204 MHz 25: 1203 MHz 26: 1324 MHz 27:
1203 MHz 28: 1206 MHz
   29: 1205 MHz 30: 1203 MHz 31: 1204 MHz 32: 1697 MHz 33: 1204 MHz 34:
1204 MHz 35: 1204 MHz
   36: 1202 MHz
Graphics:  Card: Advanced Micro Devices [AMD/ATI] Fiji [Radeon R9 FURY / NANO
Series] bus-ID: 01:00.0
   Display Server: wayland (X.Org 1.19.6 ) driver: amdgpu Resolution:
2560x1440@59.91hz
   OpenGL: renderer: AMD Radeon R9 Fury Series (FIJI, DRM 3.26.0,
4.18.5-041805-generic, LLVM 6.0.0)
   version: 4.5 Mesa 18.1.7 - padoka PPA Direct Render: Yes
Audio: Card-1 Advanced Micro 

[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #33 from Allan  ---
(In reply to Jan Jurzitza from comment #31)
> I have found a workaround (amd patched kernel not required):
> 
> cat /sys/class/drm/card0/device/pp_dpm_sclk
> # insert appropriate index here, I went for 1077Mhz
> echo 3 > /sys/class/drm/card0/device/pp_dpm_sclk
> 
> Makes the GPU a bit slower (changes clock to 1077 Mhz on my card) for the
> session, but at least applications don't freeze the system anymore now (or
> at least this is delaying it so much that it works for multiple hours, but
> it didn't freeze for me yet)
> 
> Though because of the slowdown I don't think this is a good solution
> long-term. Maybe a hint towards a solution though maybe? What I noticed in
> radeon-profile is that on auto it is capable of running at the boost
> frequency (1266 Mhz) and not limited to the base frequency the product page
> specifies (1120 Mhz) by default, so I changed it here and it basically fixed
> it.
> 
> Fixes the issue on kernel 4.18.4

Even though I didn't mention it, I tried it.

It worked for me for a while, mostly while I wasn't doing actual 3D rendering
but running OpenCL code instead.

But it never worked as a workaround, because the errors still happened, only
at random times.

And this is exactly why I didn't mention it before.

I still need to test it on kernel 4.18, though.

###

In the meantime: it seems that the warranty process for my motherboard will
take a long time to finish.

I borrowed an old PC from my aunt and I hope that it will be enough to compile
the kernel and test the GPU. It is going to be fun to compile a kernel on a
1.6GHz dual core (1C/2T).


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #32 from dwagner  ---
(In reply to Jan Jurzitza from comment #31)
> I have found a workaround (amd patched kernel not required):
> 
> cat /sys/class/drm/card0/device/pp_dpm_sclk
> # insert appropriate index here, I went for 1077Mhz
> echo 3 > /sys/class/drm/card0/device/pp_dpm_sclk
> 
> Makes the GPU a bit slower (changes clock to 1077 Mhz on my card) for the
> session, but at least applications don't freeze the system anymore now (or
> at least this is delaying it so much that it works for multiple hours, but
> it didn't freeze for me yet)

As long as /sys/class/drm/card0/device/power_dpm_force_performance_level is set
to "auto", this write to pp_dpm_sclk won't have a lasting effect, as dynamic
power management changes this clock setting all the time.

For the symptoms I reported in bug
https://bugs.freedesktop.org/show_bug.cgi?id=102322 I found that actually
disabling dynamic power management prevents them from happening, but I do need
an 

echo manual >power_dpm_force_performance_level

for this (regardless of what values I write to pp_dpm_sclk and pp_dpm_mclk
thereafter).

Caveat: every mode change or re-enabling of a screen will silently discard a
previous "manual" setting, so it needs to be re-applied afterwards; this is the
subject of bug report https://bugs.freedesktop.org/show_bug.cgi?id=107141
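The sequence described above (first force "manual", then pick a clock level, and re-apply after any mode change) can be wrapped in a small POSIX sh function so it is easy to re-run. This is a sketch: the function name pin_sclk_level is hypothetical, card0 is assumed to be the AMD GPU, and writing to the real sysfs files requires root.

```sh
# Hypothetical helper: force manual DPM, then select one shader clock level.
#   $1 = level index as listed in pp_dpm_sclk
#   $2 = device directory (defaults to card0; adjust if the GPU is another card)
pin_sclk_level() {
    level=$1
    dev=${2:-/sys/class/drm/card0/device}
    # With "auto", dynamic power management keeps overriding pp_dpm_sclk,
    # so switch to "manual" first.
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    echo "$level" > "$dev/pp_dpm_sclk" || return 1
    # Print the current mode so a later silent reset to "auto" is easy to spot.
    cat "$dev/power_dpm_force_performance_level"
}
```

Run it from a root shell (for example pin_sclk_level 3), and run it again after every mode change or screen re-enable.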


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #31 from Jan Jurzitza  ---
I have found a workaround (amd patched kernel not required):

cat /sys/class/drm/card0/device/pp_dpm_sclk
# insert appropriate index here, I went for 1077Mhz
echo 3 > /sys/class/drm/card0/device/pp_dpm_sclk

This makes the GPU a bit slower (it changes the clock to 1077 MHz on my card)
for the session, but at least applications don't freeze the system anymore (or
at least this delays it so much that it works for multiple hours; it hasn't
frozen for me yet).

Though because of the slowdown I don't think this is a good long-term solution.
Maybe it is a hint towards a solution, though? What I noticed in radeon-profile
is that on auto the card is capable of running at the boost frequency (1266 MHz)
by default and is not limited to the base frequency the product page specifies
(1120 MHz), so I changed it here and that basically fixed it.

This fixes the issue on kernel 4.18.4.
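The workaround leaves the index choice to the reader ("insert appropriate index here"). Assuming the usual pp_dpm_sclk layout of one "N: <clock>Mhz" line per level, with "*" marking the active one, a small POSIX sh/awk sketch can pick the highest level at or below a target clock. The name sclk_index_at_most is hypothetical.

```sh
# Hypothetical helper: print the index of the highest DPM level whose clock is
# at or below $1 (MHz). Reads pp_dpm_sclk-style lines on stdin, e.g.
#   3: 1077Mhz
#   7: 1266Mhz *
# Prints nothing if no level is low enough.
sclk_index_at_most() {
    awk -v max="$1" '
        /^[0-9]+:/ {
            idx = $1; sub(/:$/, "", idx)                # "3:" -> "3"
            mhz = $2; sub(/[Mm][Hh][Zz]$/, "", mhz)     # "1077Mhz" -> "1077"
            if (mhz + 0 <= max + 0 && (best == "" || mhz + 0 > bestmhz)) {
                best = idx; bestmhz = mhz + 0
            }
        }
        END { if (best != "") print best }'
}
```

Usage: sclk_index_at_most 1100 < /sys/class/drm/card0/device/pp_dpm_sclk, whose output can then be written back to pp_dpm_sclk as in the commands above.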


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #30 from Jan Jurzitza  ---
(In reply to Andrey Grodzovsky from comment #29)
> > ...
> This is just a warning meaning you use CPU to update GPU page tables, any
> reason why ? try passing kernel  
>  amdgpu.vm_update_mode=0 instead.

Yes, I had been experimenting with kernel flags trying to fix it. I had it set
to 0 before and it was happening then too. I have also tried that variation
with amdgpu.dc=0 and amdgpu.dc=1; the one with update_mode=1 I only tried with
amdgpu.dc=0.
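When juggling flags like these, it helps to record which values the driver actually ended up with. Loaded module parameters are exposed under /sys/module/<module>/parameters/; this minimal POSIX sh sketch assumes dc and vm_update_mode are present there, as on amdgpu kernels of this era (names may differ on other versions). The function name show_amdgpu_params is hypothetical.

```sh
# Hypothetical helper: print the effective value of selected amdgpu parameters.
#   $1 = parameters directory (defaults to the sysfs location of loaded amdgpu)
show_amdgpu_params() {
    params_dir=${1:-/sys/module/amdgpu/parameters}
    for p in dc vm_update_mode; do
        if [ -r "$params_dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$params_dir/$p")"
        else
            # Parameter not exposed (or module not loaded) on this kernel.
            printf '%s=<unavailable>\n' "$p"
        fi
    done
}
```

Running it once per test boot and pasting the output next to the logs makes it unambiguous which combination was in effect.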

> > and then the issue OP posted too:
> > 
> > 
> > ...
> > 
> > 
> > Happens on pretty much any application using Vulkan after some time or Core
> > OpenGL applications too. Doesn't happen on normal desktop usage with Chrome.
> 
> So is it only Vulkan specific ?

No, Core OpenGL apps too. It hasn't happened with legacy OpenGL apps yet (or
with any Wine DirectX app, actually; I'm not sure whether they use core or
legacy), but of course that doesn't mean it couldn't happen there too.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #29 from Andrey Grodzovsky  ---
(In reply to Jan Jurzitza from comment #28)
> (In reply to Andrey Grodzovsky from comment #25)
> 
> Still same issue happening here on both projects built from git. One issue
> here which doesn't seem completely related:
> Aug 23 20:41:20 archlinux kernel: [ cut here ]
> Aug 23 20:41:20 archlinux kernel: CPU update of VM recommended only for
> large BAR system
> Aug 23 20:41:20 archlinux kernel: WARNING: CPU: 5 PID: 1092 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2606 amdgpu_vm_init+0x477/0x490
> [amdgpu]
> Aug 23 20:41:20 archlinux kernel: Modules linked in: bnep nct6775 hwmon_vid
> joydev btusb btrtl btbcm btintel bluetooth snd_usb_audio snd_usbmidi_lib
> snd_rawmidi input_leds snd_seq_device ecdh_generic mousedev nls_iso8859_1
> nls_cp437 vfat fat btrfs zstd_compress libcrc32c zstd_decompress xxhash xor
> arc4 amdkfd amd_iommu_v2 amdgpu iwlmvm mac80211 edac_mce_amd led_class
> kvm_amd iwlwifi snd_hda_codec_realtek chash gpu_sched kvm snd_hda_codec_hdmi
> snd_hda_codec_generic ttm snd_hda_intel drm_kms_helper irqbypass
> snd_hda_codec cfg80211 morus1280_avx2 drm morus1280_sse2 morus1280_glue
> morus640_sse2 morus640_glue snd_hda_core aegis256_aesni aegis128l_aesni
> aegis128_aesni igb snd_hwdep crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel snd_pcm pcbc snd_timer agpgart evdev ccp sp5100_tco
> aesni_intel snd syscopyarea i2c_algo_bit sysfillrect
> Aug 23 20:41:20 archlinux kernel:  aes_x86_64 wmi_bmof mac_hid crypto_simd
> sysimgblt raid6_pq cryptd glue_helper fb_sys_fops soundcore k10temp
> i2c_piix4 dca rfkill rng_core wmi button acpi_cpufreq sch_fq_codel
> vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) sg crypto_user
> ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto sr_mod
> cdrom sd_mod uas usb_storage hid_uclogic hid_generic usbhid hid ahci libahci
> xhci_pci libata crc32c_intel xhci_hcd usbcore scsi_mod usb_common
> Aug 23 20:41:20 archlinux kernel: CPU: 5 PID: 1092 Comm: Xorg.wrap Tainted:
> G   O  4.18.0-rc1-5024f8dfe478 #1
> Aug 23 20:41:20 archlinux kernel: Hardware name: To Be Filled By O.E.M. To
> Be Filled By O.E.M./X370 Gaming-ITX/ac, BIOS P3.40 11/07/2017
> Aug 23 20:41:20 archlinux kernel: RIP: 0010:amdgpu_vm_init+0x477/0x490
> [amdgpu]
> Aug 23 20:41:20 archlinux kernel: Code: b8 08 d8 ff ff e8 79 89 7c e8 e9 ee
> fe ff ff 41 89 ef e9 e6 fe ff ff 48 c7 c7 08 65 f0 c0 c6 05 41 af 2b 00 01
> e8 a3 8f 37 e8 <0f> 0b 0f b6 8b 60 01 00 00 e9 b4 fc ff ff e8 26 8d 37 e8 66
> 0f 1f 
> Aug 23 20:41:20 archlinux kernel: RSP: 0018:acc2c8df7b60 EFLAGS: 00010286
> Aug 23 20:41:20 archlinux kernel: RAX:  RBX:
> 9b10f7bf9000 RCX: 0006
> Aug 23 20:41:20 archlinux kernel: RDX: 0007 RSI:
> 0002 RDI: 9b10fe7564d0
> Aug 23 20:41:20 archlinux kernel: RBP: 9b10f564 R08:
> 001856da5330 R09: 0036
> Aug 23 20:41:20 archlinux kernel: R10: 0424 R11:
> 0006ad48 R12: 9b10f7bf90b8
> Aug 23 20:41:20 archlinux kernel: R13: 000a R14:
>  R15: 
> Aug 23 20:41:20 archlinux kernel: FS:  7fcf6cc95500()
> GS:9b10fe74() knlGS:
> Aug 23 20:41:20 archlinux kernel: CS:  0010 DS:  ES:  CR0:
> 80050033
> Aug 23 20:41:20 archlinux kernel: CR2: 7fcf6cb1d960 CR3:
> 0007e119 CR4: 003406e0
> Aug 23 20:41:20 archlinux kernel: Call Trace:
> Aug 23 20:41:20 archlinux kernel:  ? ida_simple_get+0x91/0xf0
> Aug 23 20:41:20 archlinux kernel:  amdgpu_driver_open_kms+0x83/0x1d0 [amdgpu]
> Aug 23 20:41:20 archlinux kernel:  drm_open+0x20b/0x440 [drm]
> Aug 23 20:41:20 archlinux kernel:  drm_stub_open+0xaf/0xf0 [drm]
> Aug 23 20:41:20 archlinux kernel:  chrdev_open+0xa3/0x1b0
> Aug 23 20:41:20 archlinux kernel:  ? cdev_put.part.3+0x20/0x20
> Aug 23 20:41:20 archlinux kernel:  do_dentry_open+0x1ab/0x2d0
> Aug 23 20:41:20 archlinux kernel:  path_openat+0x31b/0x1440
> Aug 23 20:41:20 archlinux kernel:  ? alloc_set_pte+0x1fd/0x4e0
> Aug 23 20:41:20 archlinux kernel:  do_filp_open+0x93/0x100
> Aug 23 20:41:20 archlinux kernel:  ? __check_object_size+0x9c/0x171
> Aug 23 20:41:20 archlinux kernel:  do_sys_open+0x186/0x210
> Aug 23 20:41:20 archlinux kernel:  do_syscall_64+0x4e/0x100
> Aug 23 20:41:20 archlinux kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Aug 23 20:41:20 archlinux kernel: RIP: 0033:0x7fcf6cbbc452
> Aug 23 20:41:20 archlinux kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4c
> 48 8d 05 f5 70 0d 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c
> ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33
> 0c 25 
> Aug 23 20:41:20 archlinux kernel: RSP: 002b:7ffe9a15b0a0 EFLAGS:
> 0246 ORIG_RAX: 0101
> Aug 23 20:41:20 archlinux kernel: RAX: ffda RBX:
>  RCX: 7fcf6cbbc45

[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #28 from Jan Jurzitza  ---
(In reply to Andrey Grodzovsky from comment #25)

The same issue is still happening here with both projects built from git. One
issue here that doesn't seem completely related:
Aug 23 20:41:20 archlinux kernel: [ cut here ]
Aug 23 20:41:20 archlinux kernel: CPU update of VM recommended only for large
BAR system
Aug 23 20:41:20 archlinux kernel: WARNING: CPU: 5 PID: 1092 at
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2606 amdgpu_vm_init+0x477/0x490 [amdgpu]
Aug 23 20:41:20 archlinux kernel: Modules linked in: bnep nct6775 hwmon_vid
joydev btusb btrtl btbcm btintel bluetooth snd_usb_audio snd_usbmidi_lib
snd_rawmidi input_leds snd_seq_device ecdh_generic mousedev nls_iso8859_1
nls_cp437 vfat fat btrfs zstd_compress libcrc32c zstd_decompress xxhash xor
arc4 amdkfd amd_iommu_v2 amdgpu iwlmvm mac80211 edac_mce_amd led_class kvm_amd
iwlwifi snd_hda_codec_realtek chash gpu_sched kvm snd_hda_codec_hdmi
snd_hda_codec_generic ttm snd_hda_intel drm_kms_helper irqbypass snd_hda_codec
cfg80211 morus1280_avx2 drm morus1280_sse2 morus1280_glue morus640_sse2
morus640_glue snd_hda_core aegis256_aesni aegis128l_aesni aegis128_aesni igb
snd_hwdep crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm pcbc
snd_timer agpgart evdev ccp sp5100_tco aesni_intel snd syscopyarea i2c_algo_bit
sysfillrect
Aug 23 20:41:20 archlinux kernel:  aes_x86_64 wmi_bmof mac_hid crypto_simd
sysimgblt raid6_pq cryptd glue_helper fb_sys_fops soundcore k10temp i2c_piix4
dca rfkill rng_core wmi button acpi_cpufreq sch_fq_codel vboxnetflt(O)
vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) sg crypto_user ip_tables x_tables
ext4 crc32c_generic crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod uas
usb_storage hid_uclogic hid_generic usbhid hid ahci libahci xhci_pci libata
crc32c_intel xhci_hcd usbcore scsi_mod usb_common
Aug 23 20:41:20 archlinux kernel: CPU: 5 PID: 1092 Comm: Xorg.wrap Tainted: G  
O  4.18.0-rc1-5024f8dfe478 #1
Aug 23 20:41:20 archlinux kernel: Hardware name: To Be Filled By O.E.M. To Be
Filled By O.E.M./X370 Gaming-ITX/ac, BIOS P3.40 11/07/2017
Aug 23 20:41:20 archlinux kernel: RIP: 0010:amdgpu_vm_init+0x477/0x490 [amdgpu]
Aug 23 20:41:20 archlinux kernel: Code: b8 08 d8 ff ff e8 79 89 7c e8 e9 ee fe
ff ff 41 89 ef e9 e6 fe ff ff 48 c7 c7 08 65 f0 c0 c6 05 41 af 2b 00 01 e8 a3
8f 37 e8 <0f> 0b 0f b6 8b 60 01 00 00 e9 b4 fc ff ff e8 26 8d 37 e8 66 0f 1f 
Aug 23 20:41:20 archlinux kernel: RSP: 0018:acc2c8df7b60 EFLAGS: 00010286
Aug 23 20:41:20 archlinux kernel: RAX:  RBX: 9b10f7bf9000
RCX: 0006
Aug 23 20:41:20 archlinux kernel: RDX: 0007 RSI: 0002
RDI: 9b10fe7564d0
Aug 23 20:41:20 archlinux kernel: RBP: 9b10f564 R08: 001856da5330
R09: 0036
Aug 23 20:41:20 archlinux kernel: R10: 0424 R11: 0006ad48
R12: 9b10f7bf90b8
Aug 23 20:41:20 archlinux kernel: R13: 000a R14: 
R15: 
Aug 23 20:41:20 archlinux kernel: FS:  7fcf6cc95500()
GS:9b10fe74() knlGS:
Aug 23 20:41:20 archlinux kernel: CS:  0010 DS:  ES:  CR0:
80050033
Aug 23 20:41:20 archlinux kernel: CR2: 7fcf6cb1d960 CR3: 0007e119
CR4: 003406e0
Aug 23 20:41:20 archlinux kernel: Call Trace:
Aug 23 20:41:20 archlinux kernel:  ? ida_simple_get+0x91/0xf0
Aug 23 20:41:20 archlinux kernel:  amdgpu_driver_open_kms+0x83/0x1d0 [amdgpu]
Aug 23 20:41:20 archlinux kernel:  drm_open+0x20b/0x440 [drm]
Aug 23 20:41:20 archlinux kernel:  drm_stub_open+0xaf/0xf0 [drm]
Aug 23 20:41:20 archlinux kernel:  chrdev_open+0xa3/0x1b0
Aug 23 20:41:20 archlinux kernel:  ? cdev_put.part.3+0x20/0x20
Aug 23 20:41:20 archlinux kernel:  do_dentry_open+0x1ab/0x2d0
Aug 23 20:41:20 archlinux kernel:  path_openat+0x31b/0x1440
Aug 23 20:41:20 archlinux kernel:  ? alloc_set_pte+0x1fd/0x4e0
Aug 23 20:41:20 archlinux kernel:  do_filp_open+0x93/0x100
Aug 23 20:41:20 archlinux kernel:  ? __check_object_size+0x9c/0x171
Aug 23 20:41:20 archlinux kernel:  do_sys_open+0x186/0x210
Aug 23 20:41:20 archlinux kernel:  do_syscall_64+0x4e/0x100
Aug 23 20:41:20 archlinux kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 23 20:41:20 archlinux kernel: RIP: 0033:0x7fcf6cbbc452
Aug 23 20:41:20 archlinux kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48
8d 05 f5 70 0d 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff
ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25 
Aug 23 20:41:20 archlinux kernel: RSP: 002b:7ffe9a15b0a0 EFLAGS: 0246
ORIG_RAX: 0101
Aug 23 20:41:20 archlinux kernel: RAX: ffda RBX: 
RCX: 7fcf6cbbc452
Aug 23 20:41:20 archlinux kernel: RDX: 0002 RSI: 7ffe9a15b180
RDI: ff9c
Aug 23 20:41:20 archlinux kernel: RBP: 7ffe9a15b130 R08: 
R09: 
Aug 23 

[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #27 from Suloev Dmitry  ---
Looks like all my problems fixed in latest kernel. Thx!

-- 
You are receiving this mail because:
You are the assignee for the bug.___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #26 from Allan  ---
I will do it as soon as possible, but it may take a while (maybe a month),
because my motherboard showed many issues and I'm requesting a refund to buy
another one.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #25 from Andrey Grodzovsky  ---
(In reply to Allan from comment #12)
> My system sometimes started to power down for no reason, even when using the
> GTX1070 (nvidia|nouveau).
> Then I installed a Windows image just to check whether the kernel was the
> problem.
> 
> Well, for now it *SEEMS* that it isn't *ONLY* the driver/kernel:
> - The RX480 was freezing in the same way, so I sent it in for warranty.
> - The RX580 ran problematically; almost always I got a message like "DX11:
> device disconnected" or "Mantle: Device lost".
> - The GTX1070 was running fine for 1 day, then it became the same as the
> RX580, and to my bad luck the system started to power down after a random
> time (5 min to 2 hours +/-).
> 
> For sure the driver/kernel (amdgpu/linux) has its faults here, and here's
> why:
> - On Windows, the only card that sometimes froze the system was the RX480,
> because it was really broken.
> - In other cases, when a failure happened (with Nvidia or AMD), the system
> was able to regain control over the device.
>  - Maybe by doing a soft reset?
>  - Maybe just by killing the driver and starting it again?
>  - Maybe just by stopping the processes that were using the GPU, to avoid a
> big chain of resulting problems?
> - Neither the RX580 nor the GTX1070 has a dual BIOS AFAIK. Maybe the RX480
> does, but I did not test it.
> 
> Then:
> - Checked and replaced the PCIe power cables: OK.
> - Tested the power supply (luckily the AX860i has a self-test): OK.
> - Cleaned all slots with a brush: OK.
> - Tested the CPU and RAM again: OK.
> 
> But I must be having very bad luck: the problems persisted.
> 
> I've sent the motherboard in for warranty. I'm waiting for the diagnosis and
> a fix.
> 
> I'll report back here as soon as I can.
> 
> Thoughts for now:
> - Not being able to kill the processes *is* a problem that concerns only
> amdgpu, and it is either a problem of the driver itself (most likely) or of
> the kernel.

We recently fixed the issue of not being able to kill a process that is stuck,
like yours, waiting for a fence to signal in kernel mode.

Can you build the latest kernel (4.18), grab the latest firmware again, and try
again?
Links to kernel and firmware:
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/ 

> - The driver is not capable of regaining control of the device.
> - It is impossible to kill child PIDs when something hangs while using amdgpu.
> - Yes, it occurred once or twice with the nvidia proprietary driver too, but
> it was probably caused by the faulty motherboard that I'm waiting to have
> fixed.
> - Using nouveau was the happiest path, but unfortunately nouveau does not
> support Pascal at all yet. It keeps the card at the minimum clock (300 or
> 400MHz) and it is not yet possible to increase the card's speed. So it is not
> a viable way to work.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #24 from krutoiles...@gmail.com ---
Created attachment 141053
  --> https://bugs.freedesktop.org/attachment.cgi?id=141053&action=edit
dmesg after logging into the system from remote machine.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-08-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #23 from krutoiles...@gmail.com ---
Similar issues. The most reliable way for me to reproduce it is to play Dota 2.
While it's not 100%, it does seem to trigger reliably in about 1 out of 4
attempts. It also happens with other apps, such as Chrome when visiting a
school library website, or Firefox. The system even hangs right after login
occasionally.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-07-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #22 from Suloev Dmitry  ---
With IOMMU and DC enabled, the system can't even boot.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-07-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #21 from Suloev Dmitry  ---
Created attachment 140825
  --> https://bugs.freedesktop.org/attachment.cgi?id=140825&action=edit
amdgpu with dc enabled

And a different traceback with amdgpu.dc enabled.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-07-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #20 from Suloev Dmitry  ---
Created attachment 140823
  --> https://bugs.freedesktop.org/attachment.cgi?id=140823&action=edit
Memory manager not clean during takedown.

But everything changes with iommu enabled!


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-07-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #19 from Suloev Dmitry  ---
Created attachment 140822
  --> https://bugs.freedesktop.org/attachment.cgi?id=140822&action=edit
startx.log

I can even run X with IOMMU disabled, but when I start Firefox, X hangs.
However, gpu_recovery tries to reset the GPU and I can get back to the console.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-07-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

Suloev Dmitry  changed:

   What|Removed |Added

 Attachment #140820|0   |1
is obsolete||

--- Comment #18 from Suloev Dmitry  ---
Created attachment 140821
  --> https://bugs.freedesktop.org/attachment.cgi?id=140821&action=edit
amdgpu timeout with iommu disabled


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-07-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #17 from Suloev Dmitry  ---
Created attachment 140820
  --> https://bugs.freedesktop.org/attachment.cgi?id=140820&action=edit
amdgpu timeout with iommu enabled


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-07-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #16 from Suloev Dmitry  ---
This issue looks pretty similar to one of mine.
But in addition to this, I found a few more bugs in the amdgpu+iommu+drm
combination.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-07-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #15 from alpa...@gmail.com ---
Maybe similar: Fedora 28 with the latest updates, an RX 550, and a monitor on
DisplayPort. The kernel is running with the nopti flag.

When I lock the computer (Ctrl-L in GNOME) and leave it for 30 minutes, the
display will not come back. The monitor wakes up, and after a few seconds it
goes back to sleep with a "no signal" message. Ctrl-Alt-Fx does not work.

Something else I noticed: when I work for a long time, I sometimes get a message
from the monitor saying there is no signal and that it will go to sleep. I
cancel the message and continue working, but it seems something odd is
happening with the driver.

Note: before this I was using an NVIDIA Quadro with no such problems.

[ 2058.885223] kernel BUG at mm/slub.c:296!
[ 2058.885233] invalid opcode:  [#1] SMP NOPTI
[ 2058.885235] Modules linked in: nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_raw iptable_security ebtable_filter ebtables
ip6table_filter ip6_tables fuse ip_set nfnetlink bridge stp llc libcrc32c
binfmt_misc smsc47b397 intel_powerclamp coretemp kvm_intel kvm hp_wmi
sparse_keymap irqbypass iTCO_wdt iTCO_vendor_support rfkill gpio_ich
crct10dif_pclmul crc32_pclmul wmi_bmof ghash_clmulni_intel
snd_hda_codec_realtek intel_cstate snd_hda_codec_generic intel_uncore
snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core
[ 2058.885285]  snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd lpc_ich
soundcore i7core_edac shpchp wmi acpi_cpufreq amdkfd amd_iommu_v2 amdgpu chash
i2c_algo_bit gpu_sched drm_kms_helper ttm crc32c_intel firewire_ohci serio_raw
drm tg3 firewire_core nvme crc_itu_t nvme_core i2c_dev [last unloaded:
ip6_tables]
[ 2058.885314] CPU: 3 PID: 6943 Comm: Xorg Tainted: G  I  
4.17.3-200.fc28.x86_64 #1
[ 2058.885316] Hardware name: Hewlett-Packard HP Z400 Workstation/0B4Ch, BIOS
786G3 v03.60 02/24/2016
[ 2058.885324] RIP: 0010:kfree+0x165/0x180
[ 2058.885326] RSP: 0018:acf38341faf0 EFLAGS: 00010246
[ 2058.885329] RAX: 96d7ea198c00 RBX: 96d7ea198c00 RCX:
96d7ea198c00
[ 2058.885332] RDX: 73a0 RSI: 96da572e6160 RDI:
96da56c06e80
[ 2058.885336] RBP: 96d8ef407200 R08:  R09:
c05ffbb8
[ 2058.885339] R10: e7044ea86600 R11: 0a00 R12:
c05ffbb8
[ 2058.885342] R13: 96d7ea19e000 R14: 96d7ea19cc00 R15:
96da49111000
[ 2058.885346] FS:  7f34dc902ac0() GS:96da572c()
knlGS:
[ 2058.885349] CS:  0010 DS:  ES:  CR0: 80050033
[ 2058.885352] CR2: 7ff5e410 CR3: 000492ece005 CR4:
000206e0
[ 2058.885354] Call Trace:
[ 2058.885456]  dc_stream_release+0x28/0x50 [amdgpu]
[ 2058.885535]  dm_update_crtcs_state+0x1be/0x4d0 [amdgpu]
[ 2058.885614]  amdgpu_dm_atomic_check+0x1b1/0x3b0 [amdgpu]
[ 2058.885642]  drm_atomic_check_only+0x360/0x4f0 [drm]
[ 2058.885663]  drm_atomic_commit+0x13/0x50 [drm]
[ 2058.885682]  drm_atomic_connector_commit_dpms+0xdb/0x100 [drm]
[ 2058.885701]  drm_mode_obj_set_property_ioctl+0x178/0x280 [drm]
[ 2058.885721]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 2058.885739]  drm_mode_connector_property_set_ioctl+0x39/0x60 [drm]
[ 2058.885756]  drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 2058.885773]  drm_ioctl+0x1b3/0x370 [drm]
[ 2058.885792]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 2058.885843]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 2058.885849]  do_vfs_ioctl+0xa4/0x610
[ 2058.885853]  ksys_ioctl+0x60/0x90
[ 2058.885857]  __x64_sys_ioctl+0x16/0x20
[ 2058.885863]  do_syscall_64+0x5b/0x160
[ 2058.885870]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2058.885873] RIP: 0033:0x7f34d9b90e17
[ 2058.885876] RSP: 002b:7ffd08e499a8 EFLAGS: 0246 ORIG_RAX:
0010
[ 2058.885879] RAX: ffda RBX: 02872380 RCX:
7f34d9b90e17
[ 2058.885882] RDX: 7ffd08e499e0 RSI: c01064ab RDI:
000c
[ 2058.885884] RBP: 7ffd08e499e0 R08: 0001 R09:

[ 2058.885887] R10: 0001 R11: 0246 R12:
c01064ab
[ 2058.885889] R13: 000c R14: 028727a0 R15:
00830c01
[ 2058.885892] Code: 74 05 41 0f b6 72 69 5b 4c 89 d7 5d 41 5c e9 c3 bb f8 ff
48 89 d9 48 89 da 41 b8 01 00 00 00 5b 4c 89 d6 5d 41 5c e9 7b f6 ff ff <0f> 0b
0f 0b 49 8b 42 20 a8 01 75 c1 0f 0b 48 8b 3d 36 73 fa 00 
[ 2058.885937] RIP: kfree+0x165/0x180 RSP: acf38341faf0
[ 2058.885955] ---[ end trace 71e7210e68d99a2b ]---


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-05-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #14 from Koz Ross  ---
I also seem to be having similar issues - I have given a full report as bug
#106434.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-05-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #13 from emmanuel.boudrea...@polymtl.ca ---
I seem to have this same issue when opening an e-mail with a certain picture
using Emacs. I'm using Arch Linux and Wayland (GNOME Shell). It is very easy to
reproduce, so let me know if more logs or debugging would help.

AMD Ryzen 5 1600
Radeon RX 560
Kernel: 4.16.5-1-ARCH
amdgpu 18.0.1-1
mesa 18.0.1-1


These are the drm- and amd-related dmesg logs:

[3.516074] amdgpu :20:00.0: Invalid PCI ROM header signature: expecting
0xaa55, got 0x
[3.516140] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment
size is 9-bit
[3.516703] amdgpu :20:00.0: VRAM: 4096M 0x00F4 -
0x00F4 (4096M used)
[3.516707] amdgpu :20:00.0: GTT: 256M 0x -
0x0FFF
[3.516716] [drm] Detected VRAM RAM=4096M, BAR=256M
[3.516718] [drm] RAM width 128bits GDDR5
[3.516870] [drm] amdgpu: 4096M of VRAM memory ready
[3.516873] [drm] amdgpu: 4096M of GTT memory ready.
[3.516891] [drm] GART: num cpu pages 65536, num gpu pages 65536
[3.517011] [drm] PCIE GART of 256M enabled (table at 0x00F40004).
[3.517987] [drm] Chained IB support enabled!
[3.524025] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[3.525784] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[3.597334] amdgpu: [powerplay] 
[3.597356] amdgpu: [powerplay] 
[3.606949] [drm] DM_PPLIB: values for Engine clock
[3.606951] [drm] DM_PPLIB:   21400
[3.606952] [drm] DM_PPLIB:   38700
[3.606953] [drm] DM_PPLIB:   84300
[3.606953] [drm] DM_PPLIB:   99500
[3.606954] [drm] DM_PPLIB:   106200
[3.606955] [drm] DM_PPLIB:   110800
[3.606956] [drm] DM_PPLIB:   114900
[3.606956] [drm] DM_PPLIB:   122600
[3.606957] [drm] DM_PPLIB: Validation clocks:
[3.606958] [drm] DM_PPLIB:engine_max_clock: 122600
[3.606959] [drm] DM_PPLIB:memory_max_clock: 15
[3.606960] [drm] DM_PPLIB:level   : 0
[3.606962] [drm] DM_PPLIB: values for Memory clock
[3.606963] [drm] DM_PPLIB:   3
[3.606964] [drm] DM_PPLIB:   62500
[3.606964] [drm] DM_PPLIB:   15
[3.606965] [drm] DM_PPLIB: Validation clocks:
[3.606966] [drm] DM_PPLIB:engine_max_clock: 122600
[3.606967] [drm] DM_PPLIB:memory_max_clock: 15
[3.606967] [drm] DM_PPLIB:level   : 0
[3.617049] [drm] Display Core initialized with v3.1.27!
[3.642678] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[3.642680] [drm] Driver supports precise vblank timestamp query.
[3.672521] [drm] UVD and UVD ENC initialized successfully.
[3.773428] [drm] VCE initialized successfully.
[4.301585] [drm] fb mappable at 0xE056A000
[4.301587] [drm] vram apper at 0xE000
[4.301588] [drm] size 11059200
[4.301589] [drm] fb depth is 24
[4.301590] [drm]pitch is 10240
[4.301702] fbcon: amdgpudrmfb (fb0) is primary device
[4.358740] amdgpu :20:00.0: fb0: amdgpudrmfb frame buffer device
[4.371876] [drm] Initialized amdgpu 3.23.0 20150101 for :20:00.0 on
minor 0
[   55.222527] amdgpu :20:00.0: GPU fault detected: 147 0x04f04802
[   55.222536] amdgpu :20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x0050309E
[   55.222540] amdgpu :20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x06048002
[   55.222545] amdgpu :20:00.0: VM fault (0x02, vmid 3) at page 5255326,
read from 'TC0' (0x54433000) (72)
[   65.330363] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
last signaled seq=701, last emitted seq=704
[   65.330378] [drm] IP block:gfx_v8_0 is hung!
[   65.330425] [drm] GPU recovery disabled.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-04-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #12 from Allan  ---
My system sometimes started to power down for no reason, even when using the
GTX1070 (nvidia|nouveau).
Then I installed a Windows image just to check whether the kernel was the
problem.

Well, for now it *SEEMS* that it isn't *ONLY* the driver/kernel:
- The RX480 was freezing in the same way, so I sent it in for warranty.
- The RX580 ran problematically; almost always I got a message like "DX11:
device disconnected" or "Mantle: Device lost".
- The GTX1070 was running fine for 1 day, then it became the same as the RX580,
and to my bad luck the system started to power down after a random time (5 min
to 2 hours +/-).

For sure the driver/kernel (amdgpu/linux) has its faults here, and here's why:
- On Windows, the only card that sometimes froze the system was the RX480,
because it was really broken.
- In other cases, when a failure happened (with Nvidia or AMD), the system was
able to regain control over the device.
 - Maybe by doing a soft reset?
 - Maybe just by killing the driver and starting it again?
 - Maybe just by stopping the processes that were using the GPU, to avoid a big
chain of resulting problems?
- Neither the RX580 nor the GTX1070 has a dual BIOS AFAIK. Maybe the RX480 does,
but I did not test it.

Then:
- Checked and replaced the PCIe power cables: OK.
- Tested the power supply (luckily the AX860i has a self-test): OK.
- Cleaned all slots with a brush: OK.
- Tested the CPU and RAM again: OK.

But I must be having very bad luck: the problems persisted.

I've sent the motherboard in for warranty. I'm waiting for the diagnosis and a
fix.

I'll report back here as soon as I can.

Thoughts for now:
- Not being able to kill the processes *is* a problem that concerns only amdgpu,
and it is either a problem of the driver itself (most likely) or of the kernel.
- The driver is not capable of regaining control of the device.
- It is impossible to kill child PIDs when something hangs while using amdgpu.
- Yes, it occurred once or twice with the nvidia proprietary driver too, but it
was probably caused by the faulty motherboard that I'm waiting to have fixed.
- Using nouveau was the happiest path, but unfortunately nouveau does not
support Pascal at all yet. It keeps the card at the minimum clock (300 or
400MHz) and it is not yet possible to increase the card's speed. So it is not a
viable way to work.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-04-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #11 from txtsd  ---
This happens to me too. I run a Ryzen 2400G on an MSI B350 Tomahawk.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-04-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #10 from Peter  ---
Dear All,

I have a similar problem:
Kernel 4.15.16, Xeon CPU E3-1505M, Radeon R9 M295X.

The laptop runs fine as long as I'm not accessing the discrete Radeon GPU.

DRI_PRIME=0 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics P530 (Skylake GT2)

is fine. However,
DRI_PRIME=1 glxinfo | grep "OpenGL renderer"

doesn't respond, can't be killed, and after a while the laptop freezes
completely.
Early in the 4.15 release series it was working fine.
I even got a significantly higher frame rate in SuperTuxKart using
amdgpu instead of the Intel graphics.

But I don't know what else got updated besides the kernel.

Best regards,
Peter


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-04-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #9 from Allan  ---
bernhardu is correct, because in ALL cases it is impossible to kill some
processes (the culprit and anything Xorg-related).

Maybe something related to the chipset (X370?)

I don't know whether chipsets other than those for Ryzen are having these
problems.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-04-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #8 from Allan  ---
Switching between package versions changes the time before crashing a little.

(On Debian.)
For example, forcing the installation of libdrm-amdgpu1-dbg pulls in older
packages along with it, and then it mainly crashes while using something inside
a Docker container.

Upgrading to a newer one (unstable, testing) always results in a crash, sooner
or later.

Now I got this error:

[ 1812.460184] Watchdog[3376]: segfault at 0 ip f5011ce7 sp
ae0f89d0 error 6 in libcef.so[f1d3+419c000]
[53310.478516] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport
link status failed
[53310.478547] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock
recovery failed
[53310.833870] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport
link status failed
[53310.833900] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock
recovery failed

If I had to guess, I'd say that some concurrency issue is going on, because
sometimes it works like a charm, and sometimes the system becomes hell itself
and is unusable.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-04-04 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #7 from bernhardu  ---
Just a note, that this might be a similar issue as my #104345:
- both using Ryzen CPUs,
- amdgpu kernel module with Polaris GPUs,
- with unkillable processes
- "GPU fault detected" messages followed by
  "task ... blocked for more than 120 seconds"


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-04-01 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #6 from Allan  ---
TL;DR: I don't have any idea of what is happening. The errors aren't clear, I
haven't found a reliable way of reproducing it, and I'm in need of help.

That's exactly the problem... I'm going crazy over this.

I've been trying to understand what is happening for weeks...

So... I'll give you a brief (long) description:

I've been running an RX 580. Sometimes the system used to freeze like this, and
I started to suspect that the card was faulty.

Then I got an RX 480, and I was planning to sell the RX 580.

I compiled a kernel with the Polaris firmware binaries, etc... It was going very
well until a system upgrade.

Then "here we go again"... same problems... and now it seems like the RX 480
fails twice as fast as the RX 580.

If you are asking yourself "what kind of failures?", I'll summarize: code 147,
code 146, chrome_dthread libxul.so (for both firefox and chromium), and a big
call trace saying amdgpu was blocked for more than 120 seconds. All of this
comes with the screen frozen and ignoring keyboard input and mouse clicks; the
only thing that really works is moving the mouse cursor.

When does it happen? After a few minutes of running YouTube or Unigine Valley,
or after some random time (from minutes to several hours) running an OpenCL
task, for example.

Then I started to think about the other components...
- RAM ? Checked and running if the screen hangs, some ssh tests run fine.
- CPU ? Never had a problem about it as far as I remember. Ssh tests run fine.
- MOBO ? I really don't know. That's why :
 I had been having some sound cracklings, indicating that some power
management could be tainted.
 I noticed that disabling IOMMU decreased the amount of crashes
significantly... but unfortunately after updating the BIOS/EFI the option of
enabling/disabling it simply was removed... I'll be contacting the
manufacturer. So I can't affirm that it was the cause.
 I started to think that something nasty was going on with the power
supply.
- POWER SUPPLY? I bet it is not.
 I have a 5-year-old Aerocool 80 Plus Silver 800W power supply. It has always
been a very good PSU... it powered an HD7970GHz (290W TDP) most of the
time without a single problem.
 But okay... maybe the capacitors were faulty (as the mobo manufacturer
suggested when I asked about the sound). So I bought an AX860i. If there is
any better PSU than this in the 800W range... I'd like to know. It is 80 Plus
Platinum certified... even though the certification is not re-verified for
years (which makes it almost irrelevant, to be honest). I had a Corsair HX600
before and it was outstanding... and an AX is better than an HX, so... only a
Titanium unit that costs more than my mobo and CPU together would be better.
 Guess what? The same problems. Actually, now it sometimes shuts down.
- KERNEL? I was thinking that the problem was 4.15, because with it a failure is
about 5x more likely. But it also happens with the very stable 4.13. Maybe I'll
try other kernels... but AFAIK, the further back we go in kernel versions, the
fewer amdgpu features we have.
 Also, with the RX480 the video output started to fail when I configure the
DisplayPort output for 144Hz. My screen can handle 160Hz with adaptive sync,
but that never worked with amdgpu.
 DisplayPort/HDMI audio with DC/DAL support in 4.15 is a myth and NEVER
works. If I set amdgpu.dc=1 with the RX580 it simply produces no sound, and
with the RX480 it hangs the system when starting pavucontrol. When forcing the
output to HDMI/DP there is simply no sound either way (but pavucontrol shows
that something was supposed to be playing).
 While running only a TTY, the chance of a crash is very low. But it still
happens after some random time when running an OpenCL application, as said
before.
 When using RX580+1070 or RX480+1070 for VFIO, I noticed that unbinding the
NVIDIA card extended the working time before a crash. (This was also one
reason I thought the PSU was faulty.)

Now the "best" part: running a single GPU leads to the same problems... :/


I'm not sure about anything right now. I'll try only the 1070 for some time to
make sure that amdgpu is the only problem here.

I have never touched the amdgpu code, but it seems to me that either I sell the
cards or I fix it myself, because I'm not finding anything related.

-- 
You are receiving this mail because:
You are the assignee for the bug.___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #5 from Emil Velikov  ---
Hi Allan, just sharing some ideas; I'm not working on the AMD drivers.

Make sure you're not using libdrm* 2.4.90 - it has some nasty bugs.

Afterwards, try to track down exactly what's causing the problem and find a
simple way to reproduce it.


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-03-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #4 from Allan  ---
If you set amdgpu.dc=1 as a boot parameter and then try opening pavucontrol,
the screen hangs with artifacts (the mouse cursor keeps moving) and you get this
error:


```
[  125.640254] amdgpu :0e:00.0: GPU fault detected: 147 0x04f00402
[  125.640259] amdgpu :0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001C389E
[  125.640262] amdgpu :0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04004002
[  125.640264] amdgpu :0e:00.0: VM fault (0x02, vmid 2) at page 1849502, read from 'TC1' (0x54433100) (4)
[  125.640641] amdgpu :0e:00.0: GPU fault detected: 147 0x05004802
[  125.640643] amdgpu :0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001C38A0
[  125.640644] amdgpu :0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04048002
[  125.640646] amdgpu :0e:00.0: VM fault (0x02, vmid 2) at page 1849504, read from 'TC4' (0x54433400) (72)
```

I'm using kernel 4.15.

Then, when you request a poweroff over ssh, the call trace appears again and
hangs the system, so you have to do a hard reset.
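The fault lines above are internally consistent, which makes them easy to cross-check when comparing reports. Below is a minimal sketch, assuming only the line shape shown in this log (the helper and field names are hypothetical, and other kernel versions may format the line differently). In this log, the decimal page number is the same value as VM_CONTEXT1_PROTECTION_FAULT_ADDR (1849502 = 0x001C389E), and the hex value after the client name is that name spelled out in ASCII (0x54 'T', 0x43 'C', 0x31 '1').

```python
import re

# Matches the amdgpu "VM fault" summary line quoted above. Only the shape seen
# in this log is assumed; other kernel versions may print it differently.
VM_FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+)\) at page (\d+),\s*"
    r"(read|write) from '(\w+)' \((0x[0-9a-fA-F]+)\) \((\d+)\)"
)


def decode_client(tag_hex: str) -> str:
    """Turn the hex tag into text, e.g. '0x54433100' -> 'TC1' (NUL padding dropped)."""
    raw = int(tag_hex, 16).to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii")


def parse_vm_fault(line: str) -> dict:
    """Extract the fields of one VM fault line (field names are hypothetical)."""
    m = VM_FAULT_RE.search(line)
    if m is None:
        raise ValueError("not an amdgpu VM fault line")
    code, vmid, page, access, client, tag, last = m.groups()
    return {
        "code": code,
        "vmid": int(vmid),
        "page": int(page),
        # Same number in hex; in this log it equals ..._PROTECTION_FAULT_ADDR.
        "page_hex": f"0x{int(page):08X}",
        "access": access,
        "client": client,
        "client_from_tag": decode_client(tag),
        "last_field": int(last),
    }


line = ("amdgpu :0e:00.0: VM fault (0x02, vmid 2) at page 1849502, "
        "read from 'TC1' (0x54433100) (4)")
info = parse_vm_fault(line)
print(info["page_hex"], info["client_from_tag"])  # 0x001C389E TC1
```

Grouping many such lines by client name and vmid is one quick way to see whether different crashes hit the same hardware block.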


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-03-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #3 from Allan  ---
Update: now this error appears in dmesg too:

```
[ 1502.683100] Chrome_~dThread[2218]: segfault at 0 ip 7f53a4452bd3 sp 7f53a0899ad0 error 6 in libxul.so[7f53a3f3e000+4e2a000]
[ 1502.689186] Chrome_~dThread[2694]: segfault at 0 ip 7f2ef4552bd3 sp 7f2ef0999ad0 error 6 in libxul.so[7f2ef403e000+4e2a000]
[ 1502.689275] Chrome_~dThread[2300]: segfault at 0 ip 7fc55ad52bd3 sp 7fc557199ad0 error 6 in libxul.so[7fc55a83e000+4e2a000]
[ 1502.689287] Chrome_~dThread[2781]: segfault at 0 ip 7f2ce4852bd3 sp 7f2ce0c99ad0 error 6 in libxul.so[7f2ce433e000+4e2a000]
```
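A quick way to check whether these four segfaults are the same crash is to subtract the library's load base (the first number inside `libxul.so[...]`) from the instruction pointer. The sketch below assumes only the line format shown in this log; the helper names are hypothetical. For all four reports the offset comes out identical (0x514bd3), and x86 page-fault error code 6 means a user-mode write to a non-present page, which fits "segfault at 0" (a write through a NULL pointer).

```python
import re

# Shape of the kernel's user-space segfault report as it appears in the log above.
SEGV_RE = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp\s+([0-9a-f]+) error (\d+) "
    r"in ([\w.+-]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_offset(line: str) -> tuple[str, int, int]:
    """Return (library, ip - load base, error code) for one segfault line."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    _addr, ip, _sp, err, lib, base, _size = m.groups()
    return lib, int(ip, 16) - int(base, 16), int(err)


def decode_x86_pf_error(code: int) -> str:
    """Decode the low three bits of the x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


# The four reports above, without the timestamp and thread prefix.
lines = [
    "segfault at 0 ip 7f53a4452bd3 sp 7f53a0899ad0 error 6 in libxul.so[7f53a3f3e000+4e2a000]",
    "segfault at 0 ip 7f2ef4552bd3 sp 7f2ef0999ad0 error 6 in libxul.so[7f2ef403e000+4e2a000]",
    "segfault at 0 ip 7fc55ad52bd3 sp 7fc557199ad0 error 6 in libxul.so[7fc55a83e000+4e2a000]",
    "segfault at 0 ip 7f2ce4852bd3 sp 7f2ce0c99ad0 error 6 in libxul.so[7f2ce433e000+4e2a000]",
]
results = {(lib, hex(off), decode_x86_pf_error(err))
           for lib, off, err in map(fault_offset, lines)}
print(results)
# {('libxul.so', '0x514bd3', 'user-mode write, page not present')}
```

Because the set collapses to a single entry, all four threads crashed at the same instruction in libxul.so, which points to one bug in the browser process rather than random memory corruption.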


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-03-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #2 from Allan  ---
I tried getting all the binaries available here:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu

Even though I included the Polaris binaries in the kernel, some binaries were
missing (exactly the ones that were required...).

I had seen that before, but since it sometimes worked, I just assumed that
some other binary was being used instead.

Well... I launched Unigine Valley as a test and now the problem is even worse:

[From dmesg]
```
[  517.630633] amdgpu :0e:00.0: GPU fault detected: 147 0x4802
[  517.630636] amdgpu :0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x
[  517.630638] amdgpu :0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
[  517.630640] amdgpu :0e:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
[  517.630644] amdgpu :0e:00.0: GPU fault detected: 147 0x4802
[  517.630645] amdgpu :0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x
[  517.630646] amdgpu :0e:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08084002
[  517.630648] amdgpu :0e:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC7' (0x54433700) (132)
```

The symptoms and behavior are the same as above. I got the output over ssh,
because only the cursor was moving and nothing else was working.

So... did my card die, or is it a bug?

By the way... I also have an RX580, and the problem described at first was
happening with it too. (I had not tried forcing the binaries before.)


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-03-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

Allan  changed:

   What|Removed |Added

   Priority|medium  |highest
   Severity|critical|blocker

--- Comment #1 from Allan  ---
Basically, it blocks:
- killing pids
- shutting down
- xorg
- quitting xorg


[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

2018-03-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105733

Bug ID: 105733
   Summary: Amdgpu randomly hangs and only ssh works. Mouse cursor
moves sometimes but does nothing. Keyboard stops
working.
   Product: DRI
   Version: XOrg git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: critical
  Priority: medium
 Component: DRM/AMDgpu
  Assignee: dri-devel@lists.freedesktop.org
  Reporter: allan4...@gmail.com

Created attachment 138344
  --> https://bugs.freedesktop.org/attachment.cgi?id=138344&action=edit
dmesg, killing pids, shutting down, unloading amdgpu, xorg log

WHAT HAPPENS
- Amdgpu hangs without any clear clue as to what is happening.
- The mouse cursor responds to movement when the system is not completely
frozen, but it does nothing else.
- The keyboard's Num Lock state freezes, and even a PS/2 keyboard does not
work.
- The video freezes.
- Only ssh works, and of course only when the system is not completely frozen.
- The most irritating part: the system cannot be shut down, no matter what you
do:
-- If you press the power button on the case, the only response you get on the
display is a console indicating that the X server is shutting down. But nothing
else happens, and the system can't be turned off.
-- If you try anything over ssh ("init 0", "poweroff", "shutdown -P 0 -h",
"reboot"), it simply does not work. It keeps waiting for something that never
happens, and you have to press Ctrl+C to get back to the ssh session. In one
attempt it closed the ssh daemon, but the shutdown itself never happened... even
after 30 minutes.
-- It is IMPOSSIBLE to force-unload amdgpu using "rmmod -f amdgpu". The command
takes forever and never returns; it only hangs the ssh session.
-- It is IMPOSSIBLE to properly kill some X-related PIDs. If you try to kill
them, either nothing happens or the process ends up in a defunct state. Not even
a "su -c 'kill -9 '" works.

TIPS
- The crashes that still allow an ssh connection almost always happen when
Firefox is open and playing a video (Netflix, YouTube) or running WhatsApp Web.
- The crashes that hang the entire computer may occur at any time.

OBSERVATIONS
- I use a custom kernel (based on 4.15). I tried including the Polaris binaries
for my card, which showed an improvement (fewer freezes) for a while. But
now it is the same again.
- I use an NVIDIA card in the second PCI-e slot for VFIO. It is a must, and I
disable nouveau as well... It should not be a reason for the failures. I also
tried with another AMD card / no card in the second slot. The results were the
same, as far as I remember.

SYSTEM SPECS
- Custom kernel compilation optimized for Ryzen
(https://wiki.gentoo.org/wiki/Ryzen) and using Polaris binaries
(https://wiki.gentoo.org/wiki/AMDGPU)
- Chipset X370 (mobo)
- RX480 in first slot
- GTX 1070 on second slot.
- Also tried with an RX 580 in the second slot.
- Also tried with nothing in the second slot.
- i3wm launched via startx
