On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <tred...@nvidia.com> wrote:
>
> On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote:
> > Hi Rafael,
> >
> > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation)
> > fails to suspend when running 5.11-rc kernels: bisected to
> > 5b6164d3465f ("driver core: Reorder devices on successful probe"),
> > and reverting that fixes it. dmesg.xz attached, but go ahead and ask
> > me to switch on a debug option to extract further info if that may help.
>
> Hi Hugh,
>
> Quoting what I think are the relevant parts of that log:

I'm not sure how I overlooked that part of the log. Oh well.

> [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug)
> [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6

This is a transport device read operation failing, but I'm not sure
how it is related to suspend.

> [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6.

And this is the rmi_write() in rmi_f01_suspend() failing AFAICS.

> [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6.
> [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6
> [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6
> [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6

So the call chain is
rmi_smb_suspend()->rmi_driver_suspend()->rmi_suspend_functions()->
suspend_one_function()->rmi_f01_suspend().

> [ 34.475130] PM: Device 6-002c failed to suspend: error -6
> [ 34.475187] PM: Some devices failed to suspend, or early wake event detected
> [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6).
> [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6).
> [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts!
> [ 34.487935] acpi LNXPOWER:02: Turning OFF
> [ 34.488707] acpi LNXPOWER:01: Turning OFF
> [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
> [ 34.489669] psmouse: probe of serio2 failed with error -1
> [ 34.489882] OOM killer enabled.
> [ 34.489891] Restarting tasks ... done.
> [ 34.589183] PM: suspend exit
> [ 34.589839] PM: suspend entry (s2idle)
> [ 34.605884] Filesystems sync: 0.017 seconds
> [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done.
> [ 34.613645] OOM killer disabled.
> [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug)
> [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6.
> [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6.
> [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6
> [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6
> [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6
> [ 34.653160] PM: Device 6-002c failed to suspend: error -6
> [ 34.653174] PM: Some devices failed to suspend, or early wake event detected
> [ 34.660515] OOM killer enabled.
> [ 34.660524] Restarting tasks ...
> [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
> [ 34.661591] psmouse: probe of serio2 failed with error -1
> [ 34.669469] done.
> [ 34.748386] PM: suspend exit
>
> I think what might be happening here is that the offending patch causes
> some devices to be reordered in a way different to how they were ordered
> originally and the rmi4 driver currently depends on that implicit order.

Yes, that's what appears to be happening.

> Interestingly one of the bugs that the offending patch fixes is similar
> in the failure mode but for the reverse reason: the implicit order
> causes suspend/resume to fail.
>
> I suspect that the underlying reason here is that rmi4 needs something
> in order to successfully suspend (i.e. read the IRQ status registers)
> that has already been suspended where it hadn't prior to the offending
> patch.

Definitely, something has been suspended prematurely.

> It can't be the I2C controller itself that has been suspended,
> because the parent/child relationship should prevent that from
> happening.

Well, assuming that there is such a parent-child dependency.

It looks like there is at least one level of indirection between i2c
and the affected device.

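To make the ordering argument concrete, here is a toy model of it. It is
only a sketch under stated assumptions: the device names "helper" and
"touchpad" are hypothetical, the dependency between them is assumed to be
implicit (no parent/child link), and "reorder on probe" is modelled as
moving one device to the end of the PM list, which is what the commit
subject suggests. The real driver core does more than this. The one part
that is not an assumption is that suspend walks the PM list from the tail
back to the head.

```python
def suspend_order(pm_list):
    """Suspend walks the PM list from the tail back to the head."""
    return list(reversed(pm_list))


def move_to_tail(pm_list, dev):
    """Simplified model of 'reorder on successful probe': dev goes last."""
    return [d for d in pm_list if d != dev] + [dev]


def first_broken_dependency(order, needs):
    """Return (device, supplier) for the first device that suspends after
    something it still needs, or None if the order is safe."""
    suspended = set()
    for dev in order:
        for supplier in needs.get(dev, ()):
            if supplier in suspended:
                return (dev, supplier)
        suspended.add(dev)
    return None


# Hypothetical devices: the touchpad silently relies on "helper".
needs = {"touchpad": ["helper"]}
registered = ["helper", "touchpad"]              # original registration order
reordered = move_to_tail(registered, "helper")   # helper probed successfully later

for label, pm_list in (("before", registered), ("after", reordered)):
    order = suspend_order(pm_list)
    print(label, order, first_broken_dependency(order, needs))
```

In the "before" case the touchpad suspends first and nothing is reported.
In the "after" case the helper has moved to the tail, so it suspends first
and the touchpad's suspend then depends on something that is already gone,
which is the kind of failure the log shows.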
> I'm not familiar with how exactly rmi4 works, so I'll have to do
> some digging to hopefully pinpoint exactly what's going wrong here.
>
> In the meantime, it would be useful to know what exactly the I2C
> hierarchy looks like. For example, what's the I2C controller that the
> RMI4 device is hooked up to. According to the above, that's I2C bus 6,
> so you should be able to find out some details about it by inspecting
> the corresponding sysfs nodes:
>
>   $ ls -l /sys/class/i2c-adapter/i2c-6/
>   $ cat /sys/class/i2c-adapter/i2c-6/name
>   $ ls -l /sys/class/i2c-adapter/i2c-6/device/
>
> Thierry
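
If it is easier to collect that information in one go, the three commands
above can be approximated with a short script. This is a minimal sketch,
not something anyone in this thread has asked for in this form: the
function name describe_i2c_adapter() and the sysfs_root parameter are made
up for illustration, and it assumes the /sys/class/i2c-adapter layout used
in the commands above is present on the system.

```python
import os


def describe_i2c_adapter(bus, sysfs_root="/sys"):
    """Collect the adapter name and the resolved sysfs paths for an I2C bus.

    Missing nodes are reported as None rather than raising.
    """
    base = os.path.join(sysfs_root, "class", "i2c-adapter", f"i2c-{bus}")
    info = {
        # Resolving the class symlink shows where the adapter sits in the
        # device hierarchy, i.e. which devices are its ancestors.
        "path": os.path.realpath(base),
        "name": None,
        "device": None,
    }
    try:
        with open(os.path.join(base, "name")) as f:
            info["name"] = f.read().strip()
    except OSError:
        pass
    device = os.path.join(base, "device")
    if os.path.exists(device):
        info["device"] = os.path.realpath(device)
    return info


if __name__ == "__main__":
    # Bus 6 is taken from the "rmi4_smbus 6-002c" lines in the log.
    for key, value in describe_i2c_adapter(6).items():
        print(f"{key}: {value}")
```

The resolved "path" is the interesting part here, because it shows how
many levels sit between the adapter and the touchpad device.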