Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-10-02 Thread Thomas Martitz

On 02.10.18 at 22:03, Bjorn Helgaas wrote:

Hi Thomas,

On Mon, Oct 01, 2018 at 04:25:06PM +0200, Thomas Martitz wrote:

On 01.10.18 at 06:57, Daniel Drake wrote:

On Sun, Sep 30, 2018 at 5:07 AM Thomas Martitz  wrote:

The latest iteration does not work on my HP system. The GPU fails to
power up just like the unpatched kernel.


That's weird, I would not expect a behaviour change in the latest
patch. pci_restore_config_dword() has some debug messages, could you
please make them visible and show logs again?
Also remind us of the PCI device address of the parent bridge (lspci -vt)


I'll follow up with more of the requested information on bugzilla
(Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069).

On a quick re-check, it seems to depend on whether I used the eGPU before
the initial suspend. If I run glxgears (with DRI_PRIME=1) before suspend, it
seems fine.


Does the patch ([1]) make things *worse* compared to v4.19-rc5?



No, certainly not. It does look like a different issue since resuming now
works at least if I used the eGPU in some way before suspend 
(DRI_PRIME=1 glxgears seems to be enough, I assume glxinfo would work as 
well).


Without the patch resuming the eGPU does not work whatsoever.

Please ship the patch. I'll hopefully sort this other issue out.

Best regards.

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-10-01 Thread Thomas Martitz

On 01.10.18 at 06:57, Daniel Drake wrote:

On Sun, Sep 30, 2018 at 5:07 AM Thomas Martitz  wrote:

The latest iteration does not work on my HP system. The GPU fails to
power up just like the unpatched kernel.


That's weird, I would not expect a behaviour change in the latest
patch. pci_restore_config_dword() has some debug messages, could you
please make them visible and show logs again?
Also remind us of the PCI device address of the parent bridge (lspci -vt)



I'll follow up with more of the requested information on bugzilla
(Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069).

On a quick re-check, it seems to depend on if I used the eGPU before
the initial suspend. If I run glxgears (with DRI_PRIME=1) before suspend 
it seems fine.


Best regards.


Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-29 Thread Thomas Martitz

On 27.09.18 at 22:52, Bjorn Helgaas wrote:

[+cc LKML]

On Tue, Sep 18, 2018 at 04:32:44PM -0500, Bjorn Helgaas wrote:

On Thu, Sep 13, 2018 at 11:37:45AM +0800, Daniel Drake wrote:

On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

 fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
   [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
 DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

Runtime suspend/resume works fine, only S3 suspend is affected.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
the cases that I checked, this register has value 0 and we just have to
rewrite that value.

Linux already saves and restores PCI config space during suspend/resume,
but this register was being skipped because upon resume, it already
has value 0 (the correct, pre-suspend value).

Intel appear to have previously acknowledged this behaviour and the
requirement to rewrite this register.
https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23

Based on that, rewrite the prefetch register values even when that
appears unnecessary.
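
To make the skip concrete, here is a small Python model of the behaviour described above. It is a sketch, not the kernel code: FakeBridge and restore_dword are hypothetical names, and the force flag is an assumption about how "rewrite even when that appears unnecessary" could be expressed. Only the register name and its offset (0x28) are taken from the kernel's pci_regs.h.

```python
PCI_PREF_BASE_UPPER32 = 0x28  # 'Prefetchable Base Upper 32 Bits' register offset


class FakeBridge:
    """Hypothetical stand-in for a bridge's config space; records every write."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.writes = []

    def read(self, offset):
        return self.regs[offset]

    def write(self, offset, value):
        self.regs[offset] = value
        self.writes.append((offset, value))


def restore_dword(bridge, offset, saved, force=False):
    """Restore one saved register value; return True if a write was issued.

    Without force, the write is skipped when the register already holds
    the saved value. That is the case that leaves the bridge broken.
    """
    if not force and bridge.read(offset) == saved:
        return False
    bridge.write(offset, saved)
    return True
```

With a register that reads back 0 and a saved value of 0, restore_dword(bridge, PCI_PREF_BASE_UPPER32, 0) issues no write at all, while passing force=True issues exactly one write of the same value.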

We have confirmed this solution on all the affected models we have
in-hands (X542UQ, UX533FD, X530UN, V272UN).

Additionally, this solves an issue where r8169 MSI-X interrupts were
broken after S3 suspend/resume on Asus X441UAR. This issue was recently
worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
Aimfor-tech laptop that we had not yet patched. I suspect it will also
fix the issue that was worked around in commit 7c53a722459c ("r8169:
don't use MSI-X on RTL8168g").

Thomas Martitz reports that this change also solves an issue where
the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
after S3 suspend/resume.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
Signed-off-by: Daniel Drake 


Applied with Rafael's and Peter's reviewed-by to pci/enumeration for v4.20.
Thanks for the huge investigative effort!


Since this looks low-risk and fixes several painful issues, I think
this merits a stable tag and being included in v4.19 (instead of
waiting for v4.20).

I moved it to for-linus for v4.19.  Let me know if you object.



The latest iteration does not work on my HP system. The GPU fails to 
power up just like the unpatched kernel.


[  516.833580] amdgpu :01:00.0: Refused to change power state, currently in D3
[  516.912885] amdgpu :01:00.0: Refused to change power state, currently in D3
[  516.929175] amdgpu :01:00.0: Refused to change power state, currently in D3
[  521.932435] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  521.932440] [drm:amdgpu_atom_execute_table_locked] *ERROR* atombios stuck executing C392 (len 62, WS 0, PS 0) @ 0xC3AE
[  521.932442] [drm:amdgpu_atom_execute_table_locked] *ERROR* atombios stuck executing ADB8 (len 140, WS 0, PS 8) @ 0xADD3
[  521.932444] [drm:amdgpu_device_resume] *ERROR* amdgpu asic init failed
[  522.883309] amdgpu :01:00.0: Wait for MC idle timedout !
[  523.831676] amdgpu :01:00.0: Wait for MC idle timedout !
[  523.832931] [drm] PCIE GART of 256M enabled (table at 0x00F4).
[  523.836807] amdgpu: [powerplay] Failed to send Message.
[  523.836862] amdgpu: [powerplay] SMC address must be 4 byte aligned.
[  523.836863] amdgpu: [powerplay] [AVFS][Polaris10_SetupGfxLvlStruct] Problems copying VRConfig value over to SMC
[  523.836864] amdgpu: [powerplay] [AVFS][Polaris10_AVFSEventMgr] Could not Copy Graphics Level table over to SMU

[  523.836908] amdgpu: [powerplay]
last message was failed ret is 65535
[  523.836924] amdgpu: [powerplay]
failed to send message 252 ret is 65535
[  523.836949] amdgpu: [powerplay]
last message was failed ret is 65535
[  523.836965] amdgpu: [powerplay]
failed to send message 253 ret is 65535
[  523.836989] amdgpu: [powerplay]
last message was failed ret is 65535
[  523.837006] amdgpu: [powerplay]
failed to send message 250 ret is 65535
[  523.837029] amdgpu: [powerplay]
last message was failed ret is 65535
[  523.837045] amdgpu: [powerplay]






---
  drivers/pci/pci.c | 25 +
  1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 29ff9619b5fa..5d58220b6997 100644
--- a/drivers/pci/pci

Re: [Nouveau] [PATCH] PCI: Reprogram bridge prefetch registers on resume

2018-09-10 Thread Thomas Martitz

Hello Daniel,

On 07.09.18 at 07:36, Daniel Drake wrote:

On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

 fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04 
[HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
 DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

Runtime suspend/resume works fine, only S3 suspend is affected.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
the cases that I checked, this register has value 0 and we just have to
rewrite that value.

It's very strange that rewriting the exact same register value
makes a difference, but it definitely makes the issue go away.
It's not just acting as some kind of memory barrier, because rewriting
other bridge registers does not work around the issue. There's something
magic in this particular register. We have confirmed this on all
the affected models we have in-hands (X542UQ, UX533FD, X530UN, V272UN).

Additionally, this workaround solves an issue where r8169 MSI-X
interrupts were broken after S3 suspend/resume on Asus X441UAR. This
issue was recently worked around in commit 7bb05b85bc2d ("r8169:
don't use MSI-X on RTL8106e"). It also fixes the same issue on
RTL6186evl/8111evl on an Aimfor-tech laptop that we had not yet
patched. I suspect it will also fix the issue that was worked around in
commit 7c53a722459c ("r8169: don't use MSI-X on RTL8168g").

Thomas Martitz reports that this workaround also solves an issue where
the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
after S3 suspend/resume.



I can confirm that this exact patch also helps on my HP Zbook. Thanks 
for your work on this, resume has been a real pain until now.






  drivers/pci/pci-driver.c | 14 ++
  drivers/pci/setup-bus.c  |  2 +-
  include/linux/pci.h  |  1 +
  3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..034f816570ad 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -524,6 +524,20 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 	pci_power_up(pci_dev);
 	pci_restore_state(pci_dev);
 	pci_pme_restore(pci_dev);
+
+	/*
+	 * Redo the PCI bridge prefetch register setup.
+	 *
+	 * This works around an Intel PCI bridge issue seen on Asus and HP
+	 * laptops, where the GPU is not usable after S3 resume.
+	 * Even though PCI bridge register contents appear to be intact
+	 * at resume time, rewriting the value of PREF_BASE_UPPER32 is
+	 * required to make the GPU work.
+	 * Windows 10 also reprograms these registers during S3 resume.
+	 */
+	if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
+		pci_setup_bridge_mmio_pref(pci_dev);
+
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 }
  
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1824e83b4..cb88288d2a69 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -630,7 +630,7 @@ static void pci_setup_bridge_mmio(struct pci_dev *bridge)
 	pci_write_config_dword(bridge, PCI_MEMORY_BASE, l);
 }
 
-static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
+void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
 {
 	struct resource *res;
 	struct pci_bus_region region;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..b15828fc26a4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -934,6 +934,7 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
  unsigned int pci_scan_child_bus(struct pci_bus *bus);
  void pci_bus_add_device(struct pci_dev *dev);
+void pci_setup_bridge_mmio_pref(struct pci_dev *bridge);
  void pci_read_bridge_bases(struct pci_bus *child);
  struct resource *pci_find_parent_resource(const struct pci_dev *dev,
  struct resource *res);
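
A note on the class check in the pci-driver.c hunk above: pci_dev->class holds a 24-bit value (base class, subclass, programming interface), so comparing against PCI_CLASS_BRIDGE_PCI << 8 only matches bridges whose programming-interface byte is 0. The Python sketch below illustrates the arithmetic. The helper names are hypothetical, and the looser variant is shown only for contrast; it is not part of the patch.

```python
PCI_CLASS_BRIDGE_PCI = 0x0604  # base class 0x06 (bridge), subclass 0x04 (PCI-to-PCI)


def is_pci_bridge_exact(class_code):
    """Same comparison as the patch: the full 24-bit class must be 0x060400."""
    return class_code == PCI_CLASS_BRIDGE_PCI << 8


def is_pci_bridge_any_progif(class_code):
    """Looser variant (not in the patch): ignore the programming-interface byte."""
    return class_code >> 8 == PCI_CLASS_BRIDGE_PCI
```

For example, a normal-decode bridge (0x060400) passes both checks, while a subtractive-decode bridge (0x060401) passes only the looser one.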





Re: [Nouveau] [PATCH] PCI: Reprogram bridge prefetch registers on resume

2018-09-08 Thread Thomas Martitz

On 07.09.18 at 17:05, Peter Wu wrote:

On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
<..>

Thomas Martitz reports that this workaround also solves an issue where
the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
after S3 suspend/resume.


Where was this claimed? It is not stated in the linked bug:
(https://bugs.freedesktop.org/show_bug.cgi?id=105760)




Actually, I reported that https://patchwork.kernel.org/patch/10583229/ 
works. I updated the bug now, I didn't do so yet because it's closed.


However, I did not actually test the exact patch *this* thread is about. 
Do you want me to give it a spin?


Best regards.


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-06 Thread Thomas Martitz

On 31.08.2018 at 09:30, Daniel Drake wrote:

On over 40 Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

 fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04 
[HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
 DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register. In the cases that I checked,
this register has value 0 and we just have to rewrite that value.

It's very strange that rewriting the exact same register value
makes a difference, but it definitely makes the issue go away.
It's not just acting as some kind of memory barrier, because rewriting
other bridge registers does not work around the issue. There's something
magic in this particular register.

We examined our database of Asus hardware and identified 43 products
that we believe are affected. Checking the nvidia GPU parent PCI bridge
on each one, in total 5 Intel PCI bridges need quirking as below.
The quirk will run on bridges even where no nvidia GPU is connected,
but it should be harmless, and we at least limit it to only running
on Asus products.

This fix was tested on all the affected models that we have in hands
(X542UQ, UX533FD, X530UN, V272UN).


Hello,

this patch helps on my HP Zbook 14u G5 which otherwise fails to resume 
the dGPU after suspend. In this case it's a radeon gpu (polaris 10). Of 
course I had to remove the check for ASUS, but made no other changes.


With this patch I can successfully run "DRI_PRIME=1 glxinfo | grep -i 
renderer" and see the radeon, as well as "DRI_PRIME=1 glxgears", after 
resuming from suspend. Attempting that without the patch makes the system 
hang for a few seconds followed by lots of powerplay errors in dmesg. 
glxinfo/gears sometimes use the Intel graphics or show a blank window.
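
For anyone who wants to script the same check, a minimal Python sketch follows. It assumes glxinfo is installed and that DRI_PRIME=1 selects the second GPU, as in the command quoted above. Both function names are hypothetical.

```python
import os
import subprocess


def renderer_lines(glxinfo_output):
    """Keep the lines that mention 'renderer', ignoring case (like grep -i)."""
    return [line for line in glxinfo_output.splitlines()
            if "renderer" in line.lower()]


def query_prime_renderer():
    """Run glxinfo with DRI_PRIME=1 and return its renderer lines.

    Raises CalledProcessError if glxinfo fails and FileNotFoundError if
    it is not installed.
    """
    env = dict(os.environ, DRI_PRIME="1")
    result = subprocess.run(["glxinfo"], env=env, capture_output=True,
                            text=True, check=True)
    return renderer_lines(result.stdout)
```

Calling query_prime_renderer() once before suspend and once after resume, then comparing the two results, gives a quick indication of whether the offload GPU is still reachable.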


FWIW, this problem was discussed a lot in bug 
https://bugs.freedesktop.org/show_bug.cgi?id=105760 (it's closed only 
because the original bug crash is solved but the root problem is still 
unfixed). Therefore I add Peter Wu and Alex Deucher who attempted to 
help me out already.


I think this supports your other mail where you suggest it should be 
done unconditionally.


Thanks for the patch!

Best regards


Re: [Nouveau] [RFC PATCH v2 0/7] stabilize kepler reclocking

2016-01-06 Thread Thomas Martitz

On 31.12.2015 at 10:56, Thomas Martitz wrote:

On 31 December 2015 at 03:54:43 CET, Karol Herbst <nouv...@karolherbst.de> wrote:

On 30 December 2015 at 23:53, Thomas Martitz <ku...@rockbox.org> wrote:

On 25.12.2015 at 18:43, Pierre Moreau wrote:

Hello,

Maybe my e-mail client is messing with me, but I couldn't find any dmesg output
attached to your e-mail. Could you please try to attach it again?

By the way, since you have a Kepler, you should try booting with
"nouveau.War00C800_0=1". That workaround is enabled by default in 4.4-rc5
(IIRC), but that might not be the case on Karol's branch.

Best regards,
Pierre


I did not find that parameter under /sys/module/nouveau/parameters.
Google led me to try nouveau.config=War00C800_0=1 instead.

Anyway, I added both to the kernel cmdline. Unfortunately it made no
difference. Dota 2 still freezes up the system pretty much immediately
(when 0f pstate is in use).

well so that means that it doesn't freeze when not using 0f?


That's right.



I suspect that nouveau currently just sets the voltage too low, but this is
another issue and this issue will be tackled next year.

You could replace info.min with info.max in
drm/nouveau/nvkm/subdev/volt/base.c:nvkm_volt_map
but this will result in more heat generated by your gpu and nouveau currently
does not verify if the overheating protection is set up sanely (though it
should be set up already by the gpu itself)


What's your recommendation to determine suitable values? I could try the binary 
driver under Linux or win10. Under win10 I know some tools (e.g. msi 
afterburner). Note that I know next to nothing about the Kepler architecture 
itself, just the basic voltage-frequency-temperature interactions.


Just a follow up. Your suggestion worked fine for me. Should this change 
be merged?


Best regards.


Re: [Nouveau] Nouveau support for GeForce GT 730 or GTX 750 Ti?

2016-01-03 Thread Thomas Martitz

On 03.01.2016 at 19:52, Csányi Pál wrote:

2016-01-03 19:32 GMT+01:00 Ilia Mirkin :

On Sun, Jan 3, 2016 at 1:04 PM, Csányi Pál  wrote:

I want to use nouveau driver in the future too.

So I'm in doubt in that that whether to buy
GeForce GT730
So, does nouveau support these cards?

Should I choose GTX 750 Ti or GT 730 or neither of them?

Is that true that GTX 750 Ti is Maxwell and may work with
xf86-video-nouveau v1.0.11, but the next v1.0.12 removed the GM10x
support?
So should I buy instead the GT 730 card?

I would very much recommend a GT 730 over a GTX 750 Ti in terms of
nouveau support, although they are not comparable in price or
performance (GT 730 will be cheaper and slower). But on the bright
side you should be able to reclock a GT 730, so perhaps with nouveau
it really will be faster.
Oh, one last thought, I know NVIDIA loves to rebrand marketing
names... make sure you don't get a GT 730 that's really a Fermi -- if
it says 48 or 96 cores, it's probably a Fermi. If it says 192 or 384
cores then it will almost certainly be a Kepler (which is what you
want).

I can't know without buying it whether it is Fermi or Kepler, right?
I get only this information about that card:
VGA GIGABYTE NVIDIA GEFORCE GT730, GV-N730D5-2GI, 2GB DDR5,
902/5000MHz, HDMI, DVI-D, D-sub

Is this a Fermi or a Kepler card?


In general I'd look out for Kepler-based cards. These are better 
supported by nouveau and there are some cards which outperform (in 
hardware) a 750 Ti (which is maxwell). I've got a 650 Ti which is slower 
(in hardware) than a 750 Ti (but it should be faster than a 730[1]) and 
I'm pretty happy with it.


[1]: 
http://www.pc-specs.com/gpu/comparison-versus/886/875/geforce-gt-730-vs-geforce-gtx-650-ti



Best regards.


Re: [Nouveau] [RFC PATCH v2 0/7] stabilize kepler reclocking

2015-12-31 Thread Thomas Martitz
On 31 December 2015 at 03:54:43 CET, Karol Herbst <nouv...@karolherbst.de> wrote:
>> On 30 December 2015 at 23:53, Thomas Martitz <ku...@rockbox.org> wrote:
>>
>> On 25.12.2015 at 18:43, Pierre Moreau wrote:
>> > Hello,
>> >
>> > Maybe my e-mail client is messing with me, but I couldn't find any dmesg
>> > output attached to your e-mail. Could you please try to attach it again?
>> >
>> > By the way, since you have a Kepler, you should try booting with
>> > "nouveau.War00C800_0=1". That workaround is enabled by default in 4.4-rc5
>> > (IIRC), but that might not be the case on Karol's branch.
>> >
>> > Best regards,
>> > Pierre
>> >
>>
>> I did not find that parameter under /sys/module/nouveau/parameters.
>> Google led me to try nouveau.config=War00C800_0=1 instead.
>>
>> Anyway, I added both to the kernel cmdline. Unfortunately it made no
>> difference. Dota 2 still freezes up the system pretty much immediately
>> (when 0f pstate is in use).
>
>well so that means that it doesn't freeze when not using 0f?
>

That's right.


>I suspect that nouveau currently just sets the voltage too low, but this is
>another issue and this issue will be tackled next year.
>
>You could replace info.min with info.max in
>drm/nouveau/nvkm/subdev/volt/base.c:nvkm_volt_map
>but this will result in more heat generated by your gpu and nouveau currently
>does not verify if the overheating protection is set up sanely (though it
>should be set up already by the gpu itself)
>

What's your recommendation to determine suitable values? I could try the binary 
driver under Linux or win10. Under win10 I know some tools (e.g. msi 
afterburner). Note that I know next to nothing about the Kepler architecture 
itself, just the basic voltage-frequency-temperature interactions.

>Just keep an eye open on sensors while testing this.
>

What's your recommendation here to continuously read temperature data under 
nouveau?
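
One possible approach, sketched in Python, is to poll the hwmon interface. This assumes the nouveau driver registers a hwmon device whose name file reads "nouveau" and which exposes temp1_input in millidegrees Celsius. I have not confirmed that for this card or kernel, and the function names are hypothetical.

```python
import glob
import os
import time


def find_hwmon(name="nouveau", root="/sys/class/hwmon"):
    """Return the first hwmon directory whose 'name' file matches, else None."""
    for directory in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(directory, "name")) as handle:
                if handle.read().strip() == name:
                    return directory
        except OSError:
            continue  # entry without a readable name file
    return None


def read_temp_celsius(hwmon_dir, sensor="temp1_input"):
    """hwmon reports millidegrees Celsius; convert to degrees."""
    with open(os.path.join(hwmon_dir, sensor)) as handle:
        return int(handle.read().strip()) / 1000.0


def poll(hwmon_dir, samples, interval=1.0):
    """Collect a fixed number of readings, sleeping between them."""
    readings = []
    for index in range(samples):
        readings.append(read_temp_celsius(hwmon_dir))
        if index + 1 < samples:
            time.sleep(interval)
    return readings
```

Running poll(find_hwmon(), samples=60) in a second terminal while the game starts would show whether the temperature climbs before the lockup. Note that find_hwmon() returns None if no matching device exists, so check for that first.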

Best regards 



Re: [Nouveau] [RFC PATCH v2 0/7] stabilize kepler reclocking

2015-12-30 Thread Thomas Martitz

On 25.12.2015 at 18:43, Pierre Moreau wrote:

Hello,

Maybe my e-mail client is messing with me, but I couldn't find any dmesg output
attached to your e-mail. Could you please try to attach it again?

By the way, since you have a Kepler, you should try booting with
"nouveau.War00C800_0=1". That workaround is enabled by default in 4.4-rc5
(IIRC), but that might not be the case on Karol's branch.

Best regards,
Pierre



I did not find that parameter under /sys/module/nouveau/parameters. 
Google led me to try nouveau.config=War00C800_0=1 instead.
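
To check which form actually reached the kernel, the command line can be parsed directly. The Python sketch below assumes that nouveau.config takes a comma-separated list of key=value items; the function names are hypothetical. It reads nothing by itself unless read_text() is called on a real path such as /proc/cmdline or /sys/module/nouveau/parameters/config.

```python
CONFIG_PREFIX = "nouveau.config="


def nouveau_config_options(cmdline):
    """Collect the key=value items passed via nouveau.config=... on a cmdline."""
    options = {}
    for token in cmdline.split():
        if not token.startswith(CONFIG_PREFIX):
            continue
        for item in token[len(CONFIG_PREFIX):].split(","):
            key, sep, value = item.partition("=")
            if key:
                options[key] = value if sep else None
    return options


def read_text(path):
    """Read a small text file such as /proc/cmdline and strip whitespace."""
    with open(path) as handle:
        return handle.read().strip()
```

Note that a plain "nouveau.War00C800_0=1" token is not picked up by this parser, which matches the observation that no such module parameter exists.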


Anyway, I added both to the kernel cmdline. Unfortunately it made no 
difference. Dota 2 still freezes up the system pretty much immediately 
(when 0f pstate is in use).


I verified the config is active through cat 
/sys/module/nouveau/parameters/config (returns a single line 
"War00C800_0=1" as expected).


Best regards


Re: [Nouveau] [RFC PATCH v2 0/7] stabilize kepler reclocking

2015-12-24 Thread Thomas Martitz

Hello,

following up on myself, it was suggested on IRC that I better attach a 
dmesg output. Here's the output of a clean boot & echo 0f > 
/sys/.../pstate cycle.


I can't spot a message that relates to the reclock action, and there's 
only one weird "nouveau :04:00.0: clk: base: 7 MHz, boost: 7 MHz" 
message.


On the other hand:
# cat /sys/bus/pci/devices/:04:00.0/pstate
07: core 324 MHz memory 648 MHz
0a: core 549 MHz memory 1620 MHz
0f: core 1032 MHz memory 5400 MHz AC DC *
DC: core 1032 MHz memory 5400 MHz
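
In case it helps with collecting data points, this is a small Python sketch that parses the pstate listing above. The regular expression is fitted to exactly this output format and may not match other kernel versions; parse_pstate is a hypothetical name. I am also assuming, without having checked the driver, that the trailing "*" marks the selected entry.

```python
import re

# Matches lines such as "0f: core 1032 MHz memory 5400 MHz AC DC *".
PSTATE_LINE = re.compile(
    r"^(?P<id>[0-9A-Za-z]+):\s+core\s+(?P<core>\d+)\s+MHz"
    r"\s+memory\s+(?P<mem>\d+)\s+MHz(?P<flags>.*)$"
)


def parse_pstate(text):
    """Turn the pstate listing into a list of dicts, one per matching line."""
    entries = []
    for line in text.splitlines():
        match = PSTATE_LINE.match(line.strip())
        if match is None:
            continue  # ignore anything that does not look like a pstate line
        entries.append({
            "id": match["id"],
            "core_mhz": int(match["core"]),
            "memory_mhz": int(match["mem"]),
            "flags": match["flags"].split(),
        })
    return entries
```

Filtering the result for entries whose flags contain "*" gives the entry that appears to be selected (0f in the listing above).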

One additional data point: In a gnome3 session it can lock up even 
without starting a game, just by entering the gnome menu (I think the 
gnome desktop is hardware accelerated).


Hope that helps.

Best regards


Re: [Nouveau] [RFC PATCH v2 0/7] stabilize kepler reclocking

2015-12-24 Thread Thomas Martitz

Hello,

first of all, I'm new to this list, so please bear with me. On the other 
hand, I'm a graduate computer systems engineer with experience in Linux 
kernel code, so I can hopefully provide useful input/assistance on this 
topic.


I'm replying because I tried your patches on my setup in the hope they'd 
fix my lockups. Unfortunately they didn't, so I'm offering debug 
assistance to make my system work as well.


Here's my setup:
- Arch Linux
- Lenovo Thinkpad x230 laptop with integrated intel (ivy bridge)
- External GPU (eGPU) connected via ExpressCard <-> PCIe adaptor, a 
nvidia GeForce 650 Ti (Kepler). The PCIe link is unfortunately limited 
to Gen 1 and one lane.
- PRIME with eGPU as primary (following this guide: 
https://wiki.archlinux.org/index.php/PRIME#Discrete_Card_as_Primary_GPU)
- This setup works fine under Windows 10, and the eGPU gives a lot 
better performance than the intel chip even with the limited PCIe link.
- I compiled nouveau.ko, and nothing else, from your out-of-tree fork 
(branch stable_reclocking_kepler, HEAD at bc4767c (bios/fan: hardcode 
the fan mode to linear))


With and without your patches, I get lockups when attempting to reclock 
to the highest level (echo 0f > /sys/.../pstate) and then running an 
actual game (Dota 2 in this instance). Reclock as such appears to work 
fine initially, but the whole system locks up as soon as I start Dota 2. 
glxgears runs fine. echo 0a > /sys/.../pstate works as well, however the 
performance is poor.


I suspected that reclocking doesn't work fully in that the core voltage 
isn't ramped high enough, so that the GPU (and rest of the system) locks 
up when the load goes above some threshold (glxgears works after all). 
Unfortunately, I can't see any improvement with your patches.


FWIW, I made sure that the self-compiled nouveau.ko is used by running 
make install and then deleting the distro's shipped nouveau.ko. I 
verified this by running modprobe -nv nouveau (it also shows that 
pstate=1 is correctly passed to it). The nouveau.ko (.ko.gz actually) is 
placed under /lib/modules/4.3.3-2-ARCH/extra/nouveau.ko.gz.


Thanks for your ongoing effort to improve the situation on nvidia cards. 
I hope I can be of any help.


Best regards.
