[Nouveau] [PATCH] drm/nouveau/i2c: Disable i2c bus access after ->fini()

2019-04-03 Thread Lyude Paul
For a while, we've had the problem of i2c bus access not grabbing
a runtime PM ref when it's being used in userspace by i2c-dev, resulting
in nouveau spamming the kernel log with errors if anything attempts to
access the i2c bus while the GPU is in runtime suspend. An example:

[  130.078386] nouveau :01:00.0: i2c: aux 000d: begin idle timeout 

Since the GPU is in runtime suspend, the MMIO region that the i2c bus is
on isn't accessible. On x86, the standard behavior for accessing an
unavailable MMIO region is to just return ~0.

Except, that turned out to be a lie. While computers with a clean
conscience will return ~0 in this scenario, some machines will actually
completely hang a CPU on certain bad MMIO accesses. This was witnessed
with someone's Lenovo ThinkPad P50, where sensors-detect attempting to
access the i2c bus while the GPU was suspended would result in a CPU
hang:

  CPU: 5 PID: 12438 Comm: sensors-detect Not tainted 
5.0.0-0.rc4.git3.1.fc30.x86_64 #1
  Hardware name: LENOVO 20EQS64N17/20EQS64N17, BIOS N1EET74W (1.47 ) 11/21/2017
  RIP: 0010:ioread32+0x2b/0x30
  Code: 81 ff ff ff 03 00 77 20 48 81 ff 00 00 01 00 76 05 0f b7 d7 ed c3
  48 c7 c6 e1 0c 36 96 e8 2d ff ff ff b8 ff ff ff ff c3 8b 07  0f 1f
  40 00 49 89 f0 48 81 fe ff ff 03 00 76 04 40 88 3e c3 48
  RSP: 0018:aac3c5007b48 EFLAGS: 0292 ORIG_RAX: ff13
  RAX: 0000 RBX: 0000 RCX: 043017a97186
  RDX: 0aaa RSI: 0005 RDI: aac3c400e4e4
  RBP: 9e6443902c00 R08: aac3c400e4e4 R09: aac3c5007be7
  R10: 0004 R11: 0001 R12: 9e6445dd
  R13: e4e4 R14: 03c4 R15: 
  FS:  7f253155a740() GS:9e644f60() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 5630d1500358 CR3: 000417c44006 CR4: 003606e0
  DR0:  DR1:  DR2: 
  DR3:  DR6: fffe0ff0 DR7: 0400
  Call Trace:
   g94_i2c_aux_xfer+0x326/0x850 [nouveau]
   nvkm_i2c_aux_i2c_xfer+0x9e/0x140 [nouveau]
   __i2c_transfer+0x14b/0x620
   i2c_smbus_xfer_emulated+0x159/0x680
   ? _raw_spin_unlock_irqrestore+0x1/0x60
   ? rt_mutex_slowlock.constprop.0+0x13d/0x1e0
   ? __lock_is_held+0x59/0xa0
   __i2c_smbus_xfer+0x138/0x5a0
   i2c_smbus_xfer+0x4f/0x80
   i2cdev_ioctl_smbus+0x162/0x2d0 [i2c_dev]
   i2cdev_ioctl+0x1db/0x2c0 [i2c_dev]
   do_vfs_ioctl+0x408/0x750
   ksys_ioctl+0x5e/0x90
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x60/0x1e0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7f25317f546b
  Code: 0f 1e fa 48 8b 05 1d da 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
  ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
  f0 ff ff 73 01 c3 48 8b 0d ed d9 0c 00 f7 d8 64 89 01 48
  RSP: 002b:7ffc88caab68 EFLAGS: 0246 ORIG_RAX: 0010
  RAX: ffda RBX: 5630d0fe7260 RCX: 7f25317f546b
  RDX: 5630d1598e80 RSI: 0720 RDI: 0003
  RBP: 5630d155b968 R08: 0001 R09: 5630d15a1da0
  R10: 0070 R11: 0246 R12: 5630d1598e80
  R13: 5630d12f3d28 R14: 0720 R15: 5630d12f3ce0
  watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sensors-detect:12438]

Yikes! While I wanted to try to make it so that accessing an i2c bus on
nouveau would wake up the GPU as needed, airlied pointed out that pretty
much any usecase for userspace accessing an i2c bus on a GPU (mainly for
the DDC brightness control that some displays have) is going to only be
useful while there's at least one display enabled on the GPU anyway, and
the GPU never sleeps while there's displays running.

Since teaching the i2c bus to wake up the GPU on userspace accesses is a
good deal more difficult than it might seem, mostly due to the fact that
we have to use the i2c bus during runtime resume of the GPU, we instead
opt for the easiest solution: don't let userspace access i2c busses on
the GPU at all while it's in runtime suspend.
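That approach can be sketched in plain userspace C: gate every transfer on an enabled flag that the subdev's ->init() sets and ->fini() clears, all under the existing bus mutex. The names below (fake_bus, bus_init, bus_fini, bus_xfer) are illustrative stand-ins for this sketch, not the actual nouveau symbols:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct fake_bus {
	pthread_mutex_t mutex;
	bool enabled;	/* mirrors the u8 enabled field added to nvkm_i2c_bus */
};

/* ->init() path: GPU is (or is about to be) awake, allow bus access */
static void bus_init(struct fake_bus *bus)
{
	pthread_mutex_lock(&bus->mutex);
	bus->enabled = true;
	pthread_mutex_unlock(&bus->mutex);
}

/* ->fini() path: GPU is going into runtime suspend, forbid bus access */
static void bus_fini(struct fake_bus *bus)
{
	pthread_mutex_lock(&bus->mutex);
	bus->enabled = false;
	pthread_mutex_unlock(&bus->mutex);
}

/* Userspace-triggered transfer: refuse instead of touching dead MMIO */
static int bus_xfer(struct fake_bus *bus)
{
	int ret = 0;

	pthread_mutex_lock(&bus->mutex);
	if (!bus->enabled)
		ret = -EIO;	/* bus unavailable while GPU is suspended */
	/* else: the real code would perform the MMIO accesses here */
	pthread_mutex_unlock(&bus->mutex);
	return ret;
}
```

Because the flag is only ever flipped and tested under the bus mutex, a transfer already in flight when ->fini() runs completes first, and every transfer that starts afterwards fails cleanly with -EIO rather than poking unclocked registers.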

Signed-off-by: Lyude Paul 
Cc: sta...@vger.kernel.org
---
 .../gpu/drm/nouveau/include/nvkm/subdev/i2c.h |  1 +
 .../gpu/drm/nouveau/nvkm/subdev/i2c/base.c    |  5 +++++
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 21 ++++++++++++++++++++-
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.h |  1 +
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h
index eef54e9b5d77..7c45056c47dd 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h
@@ -36,6 +36,7 @@ struct nvkm_i2c_bus {
int id;
 
struct mutex mutex;
+   u8 enabled;
struct list_head head;
struct i2c_adapter i2c;
 };
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c 

Re: [Nouveau] [PATCH] pci/quirks: Add quirk to reset nvgpu at boot for the Lenovo ThinkPad P50

2019-04-03 Thread Lyude Paul
Hi, any update on this/do you guys need any more information here? Would very
much like to get this upstream

On Fri, 2019-03-22 at 06:30 -0500, Bjorn Helgaas wrote:
> On Thu, Mar 21, 2019 at 05:48:19PM -0500, Bjorn Helgaas wrote:
> > On Wed, Mar 13, 2019 at 06:25:02PM -0400, Lyude Paul wrote:
> > > On Fri, 2019-02-15 at 16:17 -0500, Lyude Paul wrote:
> > > > On Thu, 2019-02-14 at 18:43 -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote:
> > > > > > On a very specific subset of ThinkPad P50 SKUs, particularly
> > > > > > ones that come with a Quadro M1000M chip instead of the M2000M
> > > > > > variant, the BIOS seems to have a very nasty habit of not
> > > > > > always resetting the secondary Nvidia GPU between full reboots
> > > > > > if the laptop is configured in Hybrid Graphics mode. The
> > > > > > reason for this happening is unknown, but the following steps
> > > > > > and possibly a good bit of patience will reproduce the issue:
> > > > > > 
> > > > > > 1. Boot up the laptop normally in Hybrid graphics mode
> > > > > > 2. Make sure nouveau is loaded and that the GPU is awake
> > > > > > 3. Allow the nvidia GPU to runtime suspend itself after being idle
> > > > > > 4. Reboot the machine, the more sudden the better (e.g. sysrq-b may
> > > > > > help)
> > > > > > 5. If nouveau loads up properly, reboot the machine again and go
> > > > > > back to step 3 until you reproduce the issue
> > > > > > 
> > > > > > This results in some very strange behavior: the GPU will quite
> > > > > > literally be left in exactly the same state it was in when the
> > > > > > previously booted kernel started the reboot. This has all
> > > > > > sorts of bad side effects: for starters, this completely breaks
> > > > > > nouveau starting with a mysterious EVO channel failure that
> > > > > > happens well before we've actually used the EVO channel for
> > > > > > anything:
> > 
> > Thanks for the hybrid tutorial (snipped from this response).  IIUC,
> > what you said was that in hybrid mode, the Intel GPU drives the
> > built-in display and the Nvidia GPU drives any external displays and
> > may be used for DRI PRIME rendering (whatever that is).  But since you
> > say the Nvidia device gets runtime suspended, I assume there's no
> > external display here and you're not using DRI PRIME.
> > 
> > I wonder if it's related to the fact that the Nvidia GPU has been
> > runtime suspended before you do the reboot.  Can you try turning off
> > runtime power management for the GPU by setting the runpm module
> > parameter to 0?  I *think* this would be booting with
> > "nouveau.runpm=0".
> 
> Sorry, I wasn't really thinking here.  You already *said* this is
> related to runtime suspend.  It only happens when the Nvidia GPU has
> been suspended.
> 
> I don't know that much about suspend, but ISTR seeing comments about
> resuming devices before we shutdown.  If we do that, maybe there's
> some kind of race between that resume and the reboot?
> 
> Bjorn
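For reference, Bjorn's suggested experiment spelled out: the runpm module parameter is the one nouveau actually exposes; the modprobe.d file name below is just one conventional place to set it persistently.

```shell
# One-off test: append this to the kernel command line at boot:
#   nouveau.runpm=0
# Persistent alternative via modprobe configuration:
echo "options nouveau runpm=0" | sudo tee /etc/modprobe.d/nouveau-runpm.conf
```

With runtime PM disabled the GPU never enters runtime suspend, so if the stale-state-across-reboot symptom disappears, that narrows the problem to the suspended-at-reboot path.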
-- 
Cheers,
Lyude Paul

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau