[Nouveau] [PATCH] drm/nouveau/i2c: Disable i2c bus access after ->fini()
For a while, we've had the problem of i2c bus access not grabbing a runtime PM ref when it's being used in userspace by i2c-dev, resulting in nouveau spamming the kernel log with errors if anything attempts to access the i2c bus while the GPU is in runtime suspend. An example: [ 130.078386] nouveau :01:00.0: i2c: aux 000d: begin idle timeout Since the GPU is in runtime suspend, the MMIO region that the i2c bus is on isn't accessible. On x86, the standard behavior for accessing an unavailable MMIO region is to just return ~0. Except, that turned out to be a lie. While computers with a clean concious will return ~0 in this scenario, some machines will actually completely hang a CPU on certian bad MMIO accesses. This was witnessed with someone's Lenovo ThinkPad P50, where sensors-detect attempting to access the i2c bus while the GPU was suspended would result in a CPU hang: CPU: 5 PID: 12438 Comm: sensors-detect Not tainted 5.0.0-0.rc4.git3.1.fc30.x86_64 #1 Hardware name: LENOVO 20EQS64N17/20EQS64N17, BIOS N1EET74W (1.47 ) 11/21/2017 RIP: 0010:ioread32+0x2b/0x30 Code: 81 ff ff ff 03 00 77 20 48 81 ff 00 00 01 00 76 05 0f b7 d7 ed c3 48 c7 c6 e1 0c 36 96 e8 2d ff ff ff b8 ff ff ff ff c3 8b 07 0f 1f 40 00 49 89 f0 48 81 fe ff ff 03 00 76 04 40 88 3e c3 48 RSP: 0018:aac3c5007b48 EFLAGS: 0292 ORIG_RAX: ff13 RAX: 0000 RBX: 0000 RCX: 043017a97186 RDX: 0aaa RSI: 0005 RDI: aac3c400e4e4 RBP: 9e6443902c00 R08: aac3c400e4e4 R09: aac3c5007be7 R10: 0004 R11: 0001 R12: 9e6445dd R13: e4e4 R14: 03c4 R15: FS: 7f253155a740() GS:9e644f60() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 5630d1500358 CR3: 000417c44006 CR4: 003606e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: g94_i2c_aux_xfer+0x326/0x850 [nouveau] nvkm_i2c_aux_i2c_xfer+0x9e/0x140 [nouveau] __i2c_transfer+0x14b/0x620 i2c_smbus_xfer_emulated+0x159/0x680 ? _raw_spin_unlock_irqrestore+0x1/0x60 ? rt_mutex_slowlock.constprop.0+0x13d/0x1e0 ? __lock_is_held+0x59/0xa0 __i2c_smbus_xfer+0x138/0x5a0 i2c_smbus_xfer+0x4f/0x80 i2cdev_ioctl_smbus+0x162/0x2d0 [i2c_dev] i2cdev_ioctl+0x1db/0x2c0 [i2c_dev] do_vfs_ioctl+0x408/0x750 ksys_ioctl+0x5e/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x60/0x1e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f25317f546b Code: 0f 1e fa 48 8b 05 1d da 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed d9 0c 00 f7 d8 64 89 01 48 RSP: 002b:7ffc88caab68 EFLAGS: 0246 ORIG_RAX: 0010 RAX: ffda RBX: 5630d0fe7260 RCX: 7f25317f546b RDX: 5630d1598e80 RSI: 0720 RDI: 0003 RBP: 5630d155b968 R08: 0001 R09: 5630d15a1da0 R10: 0070 R11: 0246 R12: 5630d1598e80 R13: 5630d12f3d28 R14: 0720 R15: 5630d12f3ce0 watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sensors-detect:12438] Yikes! While I wanted to try to make it so that accessing an i2c bus on nouveau would wake up the GPU as needed, airlied pointed out that pretty much any usecase for userspace accessing an i2c bus on a GPU (mainly for the DDC brightness control that some displays have) is going to only be useful while there's at least one display enabled on the GPU anyway, and the GPU never sleeps while there's displays running. Since teaching the i2c bus to wake up the GPU on userspace accesses is a good deal more difficult than it might seem, mostly due to the fact that we have to use the i2c bus during runtime resume of the GPU, we instead opt for the easiest solution: don't let userspace access i2c busses on the GPU at all while it's in runtime suspend. Signed-off-by: Lyude Paul Cc: sta...@vger.kernel.org --- .../gpu/drm/nouveau/include/nvkm/subdev/i2c.h | 1 + .../gpu/drm/nouveau/nvkm/subdev/i2c/base.c| 5 + drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 21 ++- drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.h | 1 + 4 files changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h index eef54e9b5d77..7c45056c47dd 100644 --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h @@ -36,6 +36,7 @@ struct nvkm_i2c_bus { int id; struct mutex mutex; + u8 enabled; struct list_head head; struct i2c_adapter i2c; }; diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c
Re: [Nouveau] [PATCH] pci/quirks: Add quirk to reset nvgpu at boot for the Lenovo ThinkPad P50
Hi, any update on this/do you guys need any more information here? Would very much like to get this upstream On Fri, 2019-03-22 at 06:30 -0500, Bjorn Helgaas wrote: > On Thu, Mar 21, 2019 at 05:48:19PM -0500, Bjorn Helgaas wrote: > > On Wed, Mar 13, 2019 at 06:25:02PM -0400, Lyude Paul wrote: > > > On Fri, 2019-02-15 at 16:17 -0500, Lyude Paul wrote: > > > > On Thu, 2019-02-14 at 18:43 -0600, Bjorn Helgaas wrote: > > > > > On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote: > > > > > > On a very specific subset of ThinkPad P50 SKUs, particularly > > > > > > ones that come with a Quadro M1000M chip instead of the M2000M > > > > > > variant, the BIOS seems to have a very nasty habit of not > > > > > > always resetting the secondary Nvidia GPU between full reboots > > > > > > if the laptop is configured in Hybrid Graphics mode. The > > > > > > reason for this happening is unknown, but the following steps > > > > > > and possibly a good bit of patience will reproduce the issue: > > > > > > > > > > > > 1. Boot up the laptop normally in Hybrid graphics mode > > > > > > 2. Make sure nouveau is loaded and that the GPU is awake > > > > > > 2. Allow the nvidia GPU to runtime suspend itself after being idle > > > > > > 3. Reboot the machine, the more sudden the better (e.g sysrq-b may > > > > > > help) > > > > > > 4. If nouveau loads up properly, reboot the machine again and go > > > > > > back to > > > > > > step 2 until you reproduce the issue > > > > > > > > > > > > This results in some very strange behavior: the GPU will quite > > > > > > literally be left in exactly the same state it was in when the > > > > > > previously booted kernel started the reboot. This has all > > > > > > sorts of bad sideaffects: for starters, this completely breaks > > > > > > nouveau starting with a mysterious EVO channel failure that > > > > > > happens well before we've actually used the EVO channel for > > > > > > anything: > > > > Thanks for the hybrid tutorial (snipped from this response). IIUC, > > what you said was that in hybrid mode, the Intel GPU drives the > > built-in display and the Nvidia GPU drives any external displays and > > may be used for DRI PRIME rendering (whatever that is). But since you > > say the Nvidia device gets runtime suspended, I assume there's no > > external display here and you're not using DRI PRIME. > > > > I wonder if it's related to the fact that the Nvidia GPU has been > > runtime suspended before you do the reboot. Can you try turning of > > runtime power management for the GPU by setting the runpm module > > parameter to 0? I *think* this would be booting with > > "nouveau.runpm=0". > > Sorry, I wasn't really thinking here. You already *said* this is > related to runtime suspend. It only happens when the Nvidia GPU has > been suspended. > > I don't know that much about suspend, but ISTR seeing comments about > resuming devices before we shutdown. If we do that, maybe there's > some kind of race between that resume and the reboot? > > Bjorn -- Cheers, Lyude Paul ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau