Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)
On Thu, Jan 17, 2019 at 01:05:35PM +0100, Hans de Goede wrote:
> On 17-01-19 10:12, Dean Wallace wrote:
> > On 17-01-19, Mogens Jensen wrote:
> > > Kernel is compiled with SND_SOC_INTEL_CHT_BSW_MAX98090_TI_MACH and the
> > > quirk seems to have fixed the problem caused by commit 648e921888ad
> > > ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL"), as sound is now
> > > working if running "speaker-test" on my system which is clean ALSA.
>
> Note being "clean ALSA" is really not a good thing now a days,
> for lots of things we depend on pulseaudio (like setting
> up UCM mixer profiles).

FWIW I disagree, because PA never worked for me. I simply used

  alsaucm -c chtcx2072x set _verb HiFi

I was surprised to learn that PA does the ALSA UCM setup, and that it is not well documented that you need to do it by other means if you don't use PA.

https://bugzilla.kernel.org/show_bug.cgi?id=115531#c72

Regards,
Johannes
Re: [PATCH] PM: Document rules on using pm_runtime_resume() in system suspend callbacks
On Thu, Sep 21, 2017 at 02:39:30AM +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 20, 2017 at 6:27 PM, Johannes Stezenbach <j...@sig21.net> wrote:
> >
> > E.g. an audio codec could keep running
> > while the i2c bus used to program its registers can be runtime suspended.
> > If this is correct I think it would be useful to spell it out explicitly
> > in the documentation.
>
> That's because the i2c bus uses the ignore_children flag that allows
> it to override the general rules. :-)

Ah! I was looking at Documentation/driver-api/pm only (which is changed by your patch), but this is documented in Documentation/power (and obviously I hadn't checked the code, shame on me).

> direct_complete has nothing to do with this.

Oh? Reading again, do I get this right:

1. simple method: always call pm_runtime_resume() in ->suspend(), then suspend the driver again
2. optimization: if pm_runtime_suspended(), the driver's ->suspend() can possibly do nothing if conditions permit, otherwise it calls pm_runtime_resume() and then suspends
3. optimization: tell the PM core to skip ->suspend() via the return value from ->prepare(), which sets direct_complete

...and your patch only deals with 1 and 2.

Sorry to hijack your thread for a side discussion, it was inadvertent and due to my lack of understanding.

> First off, the PM core does check the direct_complete flag in
> __device_suspend() and does more-or-less what you are saying.
>
> However, that flag is initialized in device_prepare() with the help of
> the ->suspend() return value, because whether or not it makes sense to

You mean ->prepare(), right?

> set that flag depends on some conditions that may change between
> consecutive system suspend-resume cycles in general and need to be
> checked in advance before setting it.
>
> HTH

It does, however the question remains *why* it needs to check it in ->prepare() and not right before calling ->suspend(). Using ->prepare() for the purpose seems wrong, since it traverses the hierarchy in the "wrong" order. Only right before calling ->suspend() does the driver know whether its current state allows it to skip any further actions for suspend, because suspending children or other users may cause pm_runtime_resume() for it. (In the back of my head I have the scenario of bug #196861, where some completely different driver uses i2c via an ACPI OpRegion during its suspend.)

Thanks,
Johannes
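The three approaches enumerated above can be sketched as a toy userspace model. Everything in it (struct toy_dev and the toy_* functions) is hypothetical and only imitates the shape of pm_runtime_suspended(), pm_runtime_resume(), ->prepare() and direct_complete; it is not kernel code. The re-check in core_suspend() is a simplified stand-in for Rafael's remark that the core checks direct_complete again in __device_suspend().

```c
/*
 * Toy model (hypothetical, not kernel API) of the three approaches:
 * 1. always resume, then suspend; 2. skip if already runtime suspended
 * and allowed to stay so; 3. decide in ->prepare() (direct_complete)
 * and let the core skip ->suspend().
 */
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	bool runtime_suspended; /* stands in for pm_runtime_suspended(dev) */
	bool can_stay_off;      /* "if conditions permit" from approach 2 */
	int resumes;            /* runtime resumes forced by system suspend */
	int hw_suspends;        /* times the driver powered the device down */
};

static void toy_runtime_resume(struct toy_dev *d)
{
	if (d->runtime_suspended) {
		d->runtime_suspended = false;
		d->resumes++;
	}
}

static void toy_hw_suspend(struct toy_dev *d)
{
	assert(!d->runtime_suspended); /* registers must be accessible */
	d->hw_suspends++;
}

/* Approach 1: always resume, then suspend again. */
static void suspend_simple(struct toy_dev *d)
{
	toy_runtime_resume(d);
	toy_hw_suspend(d);
}

/* Approach 2: do nothing if already runtime suspended and allowed to stay so. */
static void suspend_check(struct toy_dev *d)
{
	if (d->runtime_suspended && d->can_stay_off)
		return;
	toy_runtime_resume(d);
	toy_hw_suspend(d);
}

/* Approach 3, phase 1: ->prepare() asks the core to skip ->suspend(). */
static bool toy_prepare(const struct toy_dev *d)
{
	return d->runtime_suspended && d->can_stay_off;
}

/* Approach 3, phase 2: honour the request only if still runtime suspended. */
static void core_suspend(struct toy_dev *d, bool direct_complete)
{
	if (direct_complete && d->runtime_suspended)
		return;
	suspend_check(d);
}
```

In this model approach 3 differs from approach 2 only in *when* the decision is taken; if something wakes the device between ->prepare() and its turn in the suspend pass, the prepare-time answer is simply ignored and ->suspend() runs after all.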
Re: [PATCH] PM: Document rules on using pm_runtime_resume() in system suspend callbacks
On Wed, Sep 20, 2017 at 04:01:32PM +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 20, 2017 at 2:28 PM, Ulf Hansson wrote:
> > On 20 September 2017 at 02:26, Rafael J. Wysocki wrote:
> >>
> >> Second, leaving devices in runtime suspend in the "suspend" phase of system
> >> suspend is fishy even when their runtime PM is disabled, because that doesn't
> >> guarantee anything regarding their children or possible consumers. Runtime
> >> PM may still be enabled for those devices at that time and runtime resume may
> >> be triggered for them later, in which case it all quickly falls apart.
> >
> > This is true, although to me this is a about a different problem and
> > has very little to do with pm_runtime_force_suspend().
> >
> > More precisely, whether runtime PM becomes disabled in the suspend
> > phase or suspend_late phase, really doesn't matter. Because in the end
> > this is about suspending/resuming devices in the correct order.
>
> Yes, it is, but this is not my point (I didn't make it clear enough I guess).
>
> At the time you make the decision to disable runtime PM for a parent
> (say) and leave it in runtime suspend, all of its children are
> suspended just fine (otherwise the parent wouldn't have been suspended
> too). However, you *also* need to make sure that there will be no
> attempts to resume any of them *after* that point, which practically
> means that either runtime PM has to have been disabled already for all
> of them at the time it is disabled for the parent, or there has to be
> another guarantee in place.
>
> That's why the core tries to enforce the "runtime PM disabled for the
> entire hierarchy below" guarantee for the devices with direct_complete
> set, but that may just be overkill in many cases. I guess it may be
> better to use WARN_ON() to catch the cases in which things may really
> go wrong.

I read this half a dozen times and I'm still confused. Moreover, Documentation/driver-api/pm/devices.rst says:

  Runtime Power Management model:

  Devices may also be put into low-power states while the system is running, independently of other power management activity in principle. However, devices are not generally independent of each other (for example, a parent device cannot be suspended unless all of its child devices have been suspended).
  ...

However, isn't it a fundamental difference between runtime suspend and system suspend that parent devices *can* be runtime suspended before their children? E.g. an audio codec could keep running while the i2c bus used to program its registers can be runtime suspended. If this is correct I think it would be useful to spell it out explicitly in the documentation.

During system suspend, the PM core will suspend children first, and if the child's ->suspend hook uses the i2c bus to access registers, it will implicitly runtime resume the i2c bus (e.g. due to pm_runtime_get_sync() in i2c_dw_xfer()). Later the PM core will ->suspend the i2c bus.

I have a hunch the root of the problem is that ->prepare walks the tree in top-down order, and its return value is used to decide about direct-complete. Why does it do that? Shouldn't the PM core check the direct_complete flag during ->suspend if the device is in runtime suspend, to decide whether to skip runtime resume + ->suspend for *this* device?

Johannes
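The ordering concern described above (an i2c bus that may runtime suspend while its codec child is active because ignore_children is set, a ->prepare() pass that runs parent-first, and a ->suspend() pass that runs child-first) can be illustrated with a deliberately simplified toy model. All names are hypothetical; the model intentionally leaves out the safeguards the real PM core has, such as the re-check in __device_suspend() that Rafael mentions in his reply.

```c
/*
 * Toy model (hypothetical names, not kernel API): a bus parent with an
 * active codec child. ->prepare() runs top-down and records "skip";
 * ->suspend() runs bottom-up, and the child's ->suspend() talks over the
 * bus, waking it after the skip decision was already taken.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
	struct toy_dev *parent;
	bool ignore_children;   /* cf. dev->power.ignore_children */
	bool runtime_suspended;
	int active_children;    /* maintained by the caller in this toy */
	bool skip_suspend;      /* decided during the top-down ->prepare() pass */
	bool stale_skip;        /* skip was decided, but the device was woken since */
};

/* Runtime suspend rule: refuse while children are active, unless ignored. */
static bool toy_runtime_suspend(struct toy_dev *d)
{
	assert(d->active_children >= 0);
	if (d->active_children > 0 && !d->ignore_children)
		return false;
	d->runtime_suspended = true;
	return true;
}

/* Stands in for a transfer doing pm_runtime_get_sync() on the bus. */
static void toy_bus_xfer(struct toy_dev *bus)
{
	bus->runtime_suspended = false;
}

/* ->prepare() pass, parents first: skip if runtime suspended right now. */
static void toy_prepare_pass(struct toy_dev **topdown, size_t n)
{
	for (size_t i = 0; i < n; i++)
		topdown[i]->skip_suspend = topdown[i]->runtime_suspended;
}

/* ->suspend() pass, children first: a child's ->suspend() uses its bus. */
static void toy_suspend_pass(struct toy_dev **topdown, size_t n)
{
	for (size_t i = n; i-- > 0;) {
		struct toy_dev *d = topdown[i];

		if (d->skip_suspend) {
			if (!d->runtime_suspended)
				d->stale_skip = true; /* prepare-time decision no longer holds */
			continue;
		}
		if (d->parent)
			toy_bus_xfer(d->parent); /* e.g. writing codec registers */
	}
}
```

With a bus and a codec, the bus is allowed to runtime suspend only because of ignore_children; ->prepare() then marks it "skip", and by the time its turn comes in the bottom-up pass the codec's ->suspend() has already woken it, which is exactly the stale decision the question above is about.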
Re: [PATCH 2/3] input/keyboard: Add support for Dollar Cove TI power button
On Tue, Aug 22, 2017 at 12:58:07PM +0200, Takashi Iwai wrote:
> I updated the patches and now pushed to topic/dollar-cove-ti-4.13-v2
> branch. Will resubmit v2 (tomorrow or later) once after gathering
> reviews.

FWIW I tested current Linus's master + topic/dollar-cove-ti-4.13-v2 + topic/soc-cx2072x-4.13 + my test patches, and saw no observable difference to topic/dollar-cove-ti-4.13 on the E200HA.

Still hoping someone could give me a hint about possible causes for the SoC entering only S0i1 instead of S0i3 (https://bugzilla.kernel.org/show_bug.cgi?id=193891). Where do I start looking?

Thanks,
Johannes
Re: Cherryview wake up events
On Thu, Feb 02, 2017 at 05:58:26PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 04:42:43PM +0100, Johannes Stezenbach wrote:
> > On Thu, Feb 02, 2017 at 05:02:12PM +0200, Mika Westerberg wrote:
> > > Is the model Asus E200HA? Something like this (sorry the information is
> > > in Finnish but the machine should look the same):
> > >
> > > https://www.karkkainen.com/verkkokauppa/asus-e200ha-fd0005ts-11-6--hd-kannettava-tietokone
> >
> > Looks right, mine is E200HA-FD0004TS but I think that just means dark blue
> > color.
>
> OK, we have one other Cherrytrail machine here which may have the same
> PMIC. We'll check that first and if it does not have the same, I'll
> order the above machine.

Probably it's too early to ask, but did you go for the E200HA, or what device are you going to use? And did you start poking at it, or in what timeframe can we expect some patches to test?

BTW, just to clarify about the test patches I added in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=193891
You can use them, but I also don't mind if they go to the garbage can; they were just quickly cobbled together for testing.

Thanks,
Johannes
Re: Cherryview wake up events
On Fri, Feb 03, 2017 at 12:00:00PM +0200, Mika Westerberg wrote:
> Just for book keeping purposes, can you file a kernel.org bugzilla bug
> about this and add all the necessary information, and your patches
> there? You can assign the bug directly to me.

I filed it but cannot assign it, so I added you to CC.
https://bugzilla.kernel.org/show_bug.cgi?id=193891

Thanks,
Johannes
Re: Cherryview wake up events
On Thu, Feb 02, 2017 at 05:58:26PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 04:42:43PM +0100, Johannes Stezenbach wrote:
> > On Thu, Feb 02, 2017 at 05:02:12PM +0200, Mika Westerberg wrote:
> > > Is the model Asus E200HA? Something like this (sorry the information is
> > > in Finnish but the machine should look the same):
> > >
> > > https://www.karkkainen.com/verkkokauppa/asus-e200ha-fd0005ts-11-6--hd-kannettava-tietokone
> >
> > Looks right, mine is E200HA-FD0004TS but I think that just means dark blue
> > color.
>
> OK, we have one other Cherrytrail machine here which may have the same
> PMIC. We'll check that first and if it does not have the same, I'll
> order the above machine.

In case it is useful to know, I installed Debian stretch following this:
https://wiki.debian.org/InstallingDebianOn/Asus/E200HA

I built my kernel using a relatively minimal kernel config; let me know if you want it.

I could also post the two patches which port the mfd and opregion drivers, but they are straightforward copies of intel_soc_pmic_crc.c and intel_pmic_crc.c from 4.10.0-rc6+, with code copied from the ProductionKernelQuilts patches and s/crc/dc_ti/ etc., except that I scamped the thermal handler to skip the ADC driver port for now.

Maybe I should've used intel_pmic_xpower.c instead of intel_pmic_crc.c, since as I write this I see there is a no-op intel_xpower_pmic_gpio_handler() registered.

This is the trick that fixes this:

  \_SB.PCI0.I2C7.PMI2.AVBG Integer 8be7b74d9be0 01 = 0001

But now it generates ACPI errors about the thermal zone, and "acpi -V" usually hangs it:

[5.500927] ACPI Exception: AE_ERROR, Returned by Handler for [UserDefinedRegion] (20160930/evregion-300)
[5.503842] No Local Variables are initialized for method [TMPR]
[5.506703] No Arguments are initialized for method [TMPR]
[5.509557] ACPI Error: Method parse/execution failed [\_SB.ATKD.TMPR] (Node 8a7d374e87f8), AE_ERROR (20160930/psparse-543)
[5.512481] ACPI Error: Method parse/execution failed [\_SB.ATKD.WMNB] (Node 8a7d374e7ed8), AE_ERROR (20160930/psparse-543)
[6.545403] i2c_designware 808622C1:06: controller timed out
[6.550763] ACPI Exception: AE_ERROR, Returned by Handler for [UserDefinedRegion] (20160930/evregion-300)
[6.555783] No Local Variables are initialized for method [TMPR]
[6.558769] No Arguments are initialized for method [TMPR]
[6.561571] ACPI Error: Method parse/execution failed [\_SB.ATKD.TMPR] (Node 8a7d374e87f8), AE_ERROR (20160930/psparse-543)
[6.564487] ACPI Error: Method parse/execution failed [\_SB.ATKD.WMNB] (Node 8a7d374e7ed8), AE_ERROR (20160930/psparse-543)

(I knew my thermal opregion code was preliminary, but I didn't expect it to error.)

Thanks,
Johannes
Re: Cherryview wake up events
On Thu, Feb 02, 2017 at 05:02:12PM +0200, Mika Westerberg wrote:
> OK, I guess it is easier if I just order one of those machines here and
> figure out how to get the PMIC driver working.

Oh, I assumed the bottleneck is developer time, not lack of hardware...

> Is the model Asus E200HA? Something like this (sorry the information is
> in Finnish but the machine should look the same):
>
> https://www.karkkainen.com/verkkokauppa/asus-e200ha-fd0005ts-11-6--hd-kannettava-tietokone

Looks right, mine is E200HA-FD0004TS but I think that just means dark blue color.

> Also can you remind me what exactly is not working so we can prioritize?

There are reports that hibernate isn't working, presumably because the storage is a 32GB eMMC. I've never tried it. The machine doesn't support ACPI S3 (suspend-to-RAM), so currently one has to boot+shutdown every time (or keep it running).

1. There seems to be no way to wake it up after "echo freeze >/sys/power/state". That is the reason for wanting the power button to wake it up; whether the PB creates an input event is secondary. (The LID also doesn't wake it up, but it does create an input event.)
2. I've no idea what the power consumption in the freeze state would be, so I guess support for the S0ix states is needed.
3. It randomly hangs at boot, often with a message related to an i2c timeout. I tried Hans de Goede's patches but they didn't work for me (the question is whether the semaphore address is the same for AXP288 and TI DCove; the DSDT has the _SEM method, so the semaphore is needed).

Everything else is secondary. E.g. there is an ADC driver for thermal readings used by the opregion driver; I didn't port it and just implemented it like for AXP288, with simple register reads (which might not work for TI). We could fix this later.

Thanks,
Johannes
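Item 1 above uses the sysfs interface directly. A minimal C sketch of the same steps: /sys/power/state lists the sleep states the kernel offers as a space-separated string, and writing one of those names requests it, like "echo freeze >/sys/power/state". The helper names are made up, and the path is a parameter so the code can be tried on a scratch file; writing to the real sysfs file needs root and actually suspends the machine.

```c
/* Hypothetical helpers for querying and requesting a sleep state via sysfs. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `state` appears as a whole word in the space-separated `list`. */
static bool state_listed(const char *list, const char *state)
{
	assert(list && state);
	size_t len = strlen(state);
	const char *p = list;

	if (len == 0)
		return false;
	while ((p = strstr(p, state)) != NULL) {
		bool start_ok = (p == list) || p[-1] == ' ';
		char end = p[len];

		if (start_ok && (end == '\0' || end == ' ' || end == '\n'))
			return true;
		p += len;
	}
	return false;
}

/*
 * Write `state` to `path` (normally /sys/power/state). Returns 0 on success.
 * Sysfs write errors may only surface when the stream is flushed, so the
 * fclose() result is checked too.
 */
static int request_state(const char *path, const char *state)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	int ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Checking the list first avoids writing a state such as "mem" on a machine that, like this one, has no S3 support.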
Re: Cherryview wake up events
On Thu, Feb 02, 2017 at 04:26:18PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 02:52:57PM +0100, Johannes Stezenbach wrote:
> > Looking at 0002-GPIO-Adding-AXP288-PMIC-GPIO-driver.patch from ProductionKernelQuilts,
> > it doesn't seem hard to do the same for the TI PMIC, but it needs information
> > from the PMIC datasheet for irq and gpio control registers.
> > Hopefully you have a patch or at least could provide the information.
>
> That patch looks like a GPIO driver for DCOVE. Did you try it already?

Hell, no. Without datasheets I can't check whether the registers are compatible between AXP288 and TI DCOVE (SND9039). Couldn't it damage the hardware if I mess up charger and voltage regulator related registers? Also, the current Linus' tree doesn't have the AXP288 GPIO driver, and ProductionKernelQuilts doesn't use the AXP288 GPIO driver for TI DCOVE.

Thanks,
Johannes
Re: Cherryview wake up events
On Thu, Feb 02, 2017 at 02:16:39PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 01:35:08PM +0200, Mika Westerberg wrote:
> > On Thu, Feb 02, 2017 at 12:12:22PM +0100, Johannes Stezenbach wrote:
> > > On Thu, Feb 02, 2017 at 12:31:22PM +0200, Mika Westerberg wrote:
> > > > On Thu, Feb 02, 2017 at 10:52:00AM +0100, Johannes Stezenbach wrote:
> > > > > OperationRegion (GPOP, GeneralPurposeIo, Zero, 0x0100)
> > > > > Field (GPOP, ByteAcc, NoLock, Preserve)
> > > > > {
> > > > >     Connection (
> > > > >         GpioIo (Exclusive, PullDefault, 0x,
> > > > >             0x, IoRestrictionOutputOnly,
> > > > >             "\\_SB.PCI0.I2C7.PMI2", 0x00,
> > > > >             ResourceConsumer, ,
> > > > >             )
> > > > >             {   // Pin list
> > > > >                 0x0020
> > > > >             }
> > > > >     ),
> > > > >     GMP0, 1,
> > > > > ...
> > > > > (repeat for many more pins)
> > > > >
> > > > > I guess it means it uses chv_gpio pins and can't work
> > > > > if the GPIO opregion is not registered?
> > > >
> > > > That is using GPIO pins of the PMI2 device - the PMIC GPIO driver, I
> > > > suppose.
> > > >
> > > > So in addition to the PMIC MFD driver, you need to have a GPIO driver
> > > > for Dollar Cove (I guess the quilt patch series included that as well?).
> > >
> > > Nope, I see it for AXP288 but didn't find it for TI DCove. And in
> > > current Linus' tree axp288_cells[] doesn't include gpio so
> > > I concluded it's not needed... what am I missing?
> >
> > So reading your DSDT there is that GPIO button array device \_SB.TBAD
> > which has one GpioInt() referencing \_SB.PCI0.I2C7.PMI2. I suppose that
> > is the power button GPIO.
> >
> > In order to use that there needs to be a GPIO driver exposing those
> > GPIOs to other drivers. So it is definitely needed.
>
> Actually, looking again the patches you found:
>
> https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/mfd-intel_soc_pmic-add-TI-variant-of-dollar-cove.patch
> https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/PWRBTN-add-driver-for-TI-PMIC.patch
>
> Did you try to them both? The latter seems to handle the power button
> by talking directly with the PMIC (instead of using a GPIO).

Nope, as I've written earlier:

> In ProductionKernelQuilts I found
> DC-TI-PMIC-disable-power-button-support.patch so I guess it
> might not be needed because it's probably handled by ACPI.

[ +0.000338] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[ +0.000127] ACPI: Power Button [PWRB]
...
[ +0.000248] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[ +0.000116] ACPI: Power Button [PWRF]

And I also have:

[ +0.04] soc_button_array INTCFD9:00: GPIO lookup for consumer soc_button_array
[ +0.02] soc_button_array INTCFD9:00: using ACPI for GPIO lookup
[ +0.03] acpi INTCFD9:00: GPIO: looking up soc_button_array-gpios
[ +0.04] acpi INTCFD9:00: GPIO: looking up soc_button_array-gpio
[ +0.03] acpi INTCFD9:00: GPIO: looking up 0 in _CRS
[ +0.000610] soc_button_array INTCFD9:00: lookup for GPIO soc_button_array failed

(This repeats for 5 buttons; one of them should succeed.)

> Let's include the original author (Ramakrishna) as well if we could get
> some information from him.

Looking at 0002-GPIO-Adding-AXP288-PMIC-GPIO-driver.patch from ProductionKernelQuilts, it doesn't seem hard to do the same for the TI PMIC, but it needs information from the PMIC datasheet for the irq and gpio control registers. Hopefully you have a patch or at least could provide the information.

Thanks,
Johannes
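Mika's point that a GPIO driver for the PMIC is needed can be illustrated with a toy model of the lookup sequence in the soc_button_array log above: named properties ("<con_id>-gpios", then "<con_id>-gpio") first, then the GpioInt() in _CRS by index. That resource points at \_SB.PCI0.I2C7.PMI2, and it can only be resolved if some GPIO controller driver has registered for that ACPI device. The structures, the other controller path and the pin number are hypothetical; this is not how gpiolib is actually written.

```c
/* Toy model (hypothetical, not gpiolib) of resolving a consumer's GPIO. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A GpioInt()/GpioIo() resource: which controller, which pin. */
struct toy_gpio_res {
	const char *controller; /* ACPI path of the GPIO provider */
	unsigned int pin;
};

/* A consumer device (e.g. INTCFD9:00) and its GPIO descriptions. */
struct toy_consumer {
	const char *dsd_name;           /* named GPIO property, or NULL */
	struct toy_gpio_res dsd_res;    /* resource that property points at */
	const struct toy_gpio_res *crs; /* GPIO resources in _CRS order */
	size_t n_crs;
};

/* GPIO controllers that have a driver registered, keyed by ACPI path. */
struct toy_registry {
	const char *paths[4];
	size_t n;
};

static bool toy_controller_registered(const struct toy_registry *r, const char *path)
{
	for (size_t i = 0; i < r->n; i++)
		if (strcmp(r->paths[i], path) == 0)
			return true;
	return false;
}

/* A resource resolves only if its controller has a registered driver. */
static int toy_resolve(const struct toy_registry *r, const struct toy_gpio_res *res)
{
	return toy_controller_registered(r, res->controller) ? (int)res->pin : -1;
}

/* Try "<con_id>-gpios", "<con_id>-gpio", then _CRS index `idx`; -1 on failure. */
static int toy_find_gpio(const struct toy_consumer *c, const struct toy_registry *r,
			 const char *con_id, size_t idx)
{
	static const char *const suffixes[] = { "gpios", "gpio" };
	char prop[64];

	assert(c && r && con_id);
	for (size_t i = 0; i < 2; i++) {
		snprintf(prop, sizeof(prop), "%s-%s", con_id, suffixes[i]);
		if (c->dsd_name && strcmp(c->dsd_name, prop) == 0)
			return toy_resolve(r, &c->dsd_res);
	}
	if (idx < c->n_crs)
		return toy_resolve(r, &c->crs[idx]);
	return -1;
}
```

In the model the lookup keeps failing until a provider for \_SB.PCI0.I2C7.PMI2 is registered, which matches the suggestion that the ACPI power button handling alone does not make the GpioInt() usable.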
Re: Cherryview wake up events
On Thu, Feb 02, 2017 at 12:31:22PM +0200, Mika Westerberg wrote: > On Thu, Feb 02, 2017 at 10:52:00AM +0100, Johannes Stezenbach wrote: > > OperationRegion (GPOP, GeneralPurposeIo, Zero, 0x0100) > > Field (GPOP, ByteAcc, NoLock, Preserve) > > { > > Connection ( > > GpioIo (Exclusive, PullDefault, 0x, 0x, > > IoRestrictionOutputOnly, > > "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, > > , > > ) > > { // Pin list > > 0x0020 > > } > > ), > > GMP0, 1, > > ... > > (repeat for many more pins) > > > > I guess it means it uses chv_gpio pins and can't work > > if the GPIO opregion is not registered? > > That is using GPIO pins of the PMI2 device - the PMIC GPIO driver, I > suppose. > > So in addition to the PMIC MFD driver, you need to have a GPIO driver > for Dollar Cove (I guess the quilt patch series included that as well?). Nope, I see one for the AXP288 but didn't find one for the TI DCove. And in Linus' current tree axp288_cells[] doesn't include a GPIO cell, so I concluded it's not needed... what am I missing? Thanks, Johannes
Re: Cherryview wake up events
Hi Mika, On Tue, Jan 31, 2017 at 03:37:40PM +0100, Johannes Stezenbach wrote: > - Powerbutton driver seems simple enough, the only specialty > of the TI dcove PB driver is the workarond for lost button > press event after resume. However, I still don't see how > the PB would cause thermal event irqs on E200HA and how the > PMIC driver would change it? In ProductionKernelQuilts I found DC-TI-PMIC-disable-power-button-support.patch, so I guess the power button support might not be needed because it's probably handled by ACPI. > I think the mfd driver would be similar to intel_soc_pmic_crc.c, > the dollar_cove_ti_powerbtn.c I would keep instead of merging > it into intel_mid_powerbtn.c. I guess what we need is in > drivers/acpi/pmic/ something similar to intel_pmic_crc.c, > the ProductionKernelQuilts has > 0001-ACPI-Adding-support-for-TI-pmic-opregion.patch. I have preliminary versions of the mfd and opregion drivers. While testing I found that the GPIO opregion is not registered: Excerpt from DSDT: https://linuxtv.org/~js/e200ha/dsdt.dsl Device (PMI2) { Name (_ADR, Zero) // _ADR: Address Name (_HID, "INT33F5" /* TI PMIC Controller */) // _HID: Hardware ID Name (_CID, "INT33F5" /* TI PMIC Controller */) // _CID: Compatible ID Name (_DDN, "TI PMIC Controller") // _DDN: DOS Device Name Name (_HRV, 0x03) // _HRV: Hardware Revision Name (_UID, One) // _UID: Unique ID Name (_DEP, Package (0x02) // _DEP: Dependencies { I2C7, GPO1 }) Method (_CRS, 0, NotSerialized) // _CRS: Current Resource Settings { Name (SBUF, ResourceTemplate () { I2cSerialBusV2 (0x005E, ControllerInitiated, 0x000F4240, AddressingMode7Bit, "\\_SB.PCI0.I2C7", 0x00, ResourceConsumer, , Exclusive, ) GpioInt (Level, ActiveHigh, Shared, PullDefault, 0x, "\\_SB.GPO1", 0x00, ResourceConsumer, , ) { // Pin list 0x000F } }) Return (SBUF) /* \_SB_.PCI0.I2C7.PMI2._CRS.SBUF */ } ...
Name (AVBL, Zero) Name (AVBD, Zero) Name (AVBG, Zero) Method (_REG, 2, NotSerialized) // _REG: Region Availability { If (Arg0 == 0x08) { AVBG = Arg1 } If (Arg0 == 0x8D) { AVBL = Arg1 } If (Arg0 == 0x8C) { AVBD = Arg1 } } acpidbg: \_SB.PCI0.I2C7.PMI2.AVBL Integer 8be7b74d97a8 01 = 0001 \_SB.PCI0.I2C7.PMI2.AVBD Integer 8be7b74d94d8 01 = 0001 \_SB.PCI0.I2C7.PMI2.AVBG Integer 8be7b74d9be0 01 = Any idea about it? devm_gpiochip_add_data() in chv_gpio_probe() indirectly calls acpi_gpiochip_add(), which should use _DEP to figure out that it has to call _REG, right? Also PMI2 has OperationRegion (GPOP, GeneralPurposeIo, Zero, 0x0100) Field (GPOP, ByteAcc, NoLock, Preserve) { Connection ( GpioIo (Exclusive, PullDefault, 0x, 0x, IoRestrictionOutputOnly, "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, , ) { // Pin list 0x0020 } ), GMP0, 1, ... (repeat for many more pins) I guess it means it uses chv_gpio pins and can't work if the GPIO opregion is not registered? FWIW, with the mfd driver, /proc/interrupts has 180: 0 0 0 0 chv-gpio 9 TI Dollar Cove I guess the 9 refers to the 10th pin in north_pins[], which is pin 0x000F, right? I boot with "dyndbg=file gpiolib* +p" and get [ +0.012798] acpi INT33F5:00: GPIO: looking up 0 in _CRS [ +0.000214] intel_soc_pmic_i2c i2c-INT33F5:00: GPIO lookup for consumer intel_soc_pmic [ +0.03] intel_soc_pmic_i2c i2c-INT33F5:00: using ACPI for GPIO lookup [ +0.05] acpi INT33F5:00: GPIO: looking up intel_soc_pmic-gpios [ +0.05] acpi INT33F5:00: GPIO: looking up intel_soc_pmic-gpio [ +0.05] acpi INT33F5:00: GPIO: looking up 0 in _
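For reference, the `Arg0` values tested in `_REG` above are ACPI address-space IDs. 0x08 is `GeneralPurposeIo` in the ACPI specification. 0x8C and 0x8D fall in the OEM-defined range (0x80-0xFF). Linux's `drivers/acpi/pmic/intel_pmic.c` uses 0x8D for the PMIC power opregion and 0x8C for the PMIC thermal opregion. Below is a small Python sketch that decodes the acpidbg values. It treats the empty `AVBG` value as 0, which is an assumption because that value is cut off in the output above. The helper and table names are illustrative only.

```python
# ACPI address-space IDs checked by PMI2._REG (see the ASL above).
SPACE_NAMES = {
    0x08: "GeneralPurposeIo (GPIO opregion)",
    0x8C: "OEM 0x8C (PMIC thermal opregion in Linux intel_pmic)",
    0x8D: "OEM 0x8D (PMIC power opregion in Linux intel_pmic)",
}

# Which flag the _REG method above stores for which address space.
FLAG_FOR_SPACE = {0x08: "AVBG", 0x8D: "AVBL", 0x8C: "AVBD"}


def missing_regions(flags):
    """Return the address spaces whose availability flag is not 1.

    `flags` maps AVBx names to the integers read with acpidbg; a flag
    that is absent is treated as 0 (handler never connected).
    """
    missing = []
    for space_id, flag in sorted(FLAG_FOR_SPACE.items()):
        if flags.get(flag, 0) != 1:
            missing.append(SPACE_NAMES[space_id])
    return missing


# AVBG assumed 0: its value is cut off in the acpidbg output above.
observed = {"AVBL": 1, "AVBD": 1, "AVBG": 0}
print(missing_regions(observed))
```

The ACPI specification has OSPM evaluate `_REG(space, 1)` once a handler for that address space becomes available. A flag stuck at 0 therefore points at a missing or late handler registration rather than at the AML itself.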
Re: Cherryview wake up events
Hi Andy and Mika, On Tue, Jan 31, 2017 at 12:05:07AM +0200, Andy Shevchenko wrote: > On Mon, Jan 30, 2017 at 10:57 PM, Johannes Stezenbach <j...@sig21.net> wrote: > > > > I checked the reference source code, my impression is the > > TI Dollar Cove and and AXP288 are completely different hardware. > > Thanks for checking. > > Yes, due to not obvious communication to PMIC. I suppose that the IP > core is quite similar in all of them, the difference is just how OS > and other MCUs in SoC communicate with it. > > So, basically what it means that I2C direct communication is prohibited here. Not sure about that, but I guess this is needed: https://lists.freedesktop.org/archives/intel-gfx/2017-January/117696.html > >> > This is known: http://www.spinics.net/lists/intel-gfx/msg117738.html > > > > Interestingly via this link I found Intel also published > > the TI DCove source in a patch series against an unspecified kernel: > > https://github.com/01org/ProductionKernelQuilts > > specifically > > https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/mfd-intel_soc_pmic-add-TI-variant-of-dollar-cove.patch > > https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/PWRBTN-add-driver-for-TI-PMIC.patch > > and some more (the series is quite messy). FWIW, now I came across yet another source for this driver: https://android.googlesource.com/kernel/x86/+/android-x86-grant-3.10-marshmallow-mr1-wear-release/drivers/external_drivers/drivers/mfd/intel_pmic/ (but seems to be older) > > For the Asus E200HA I'm not sure if the charger and coulomb > > counter drivers are needed since charging just works and > > the battery status is reported via ACPI. It seems these > > drivers are only for tablets without ACPI support, right? > > Have no idea. > > What that code reminds me is MID family of devices. So, power button > is (reasonable) easy to get support of in that case. > Look into drivers/platform/x86/intel_mid_powerbtn.c. 
I recently > updated it to support Basin Cove on Intel Edison. You seem to suggest I should try to tackle it myself, which I would do, but for one thing I don't want to step on Mika's toes, and secondly ISTR you indicated you have newer, better source than what is publicly available? If you want me to take it, please let me know which tree to work against and any other suggestions you have. Some more questions: - The power button driver seems simple enough; the only specialty of the TI dcove PB driver is the workaround for a lost button press event after resume. However, I still don't see how the PB would cause thermal event IRQs on the E200HA and how the PMIC driver would change that. - Wakeup from freeze state (the E200HA doesn't support suspend / ACPI S3) is only step 1; to make it usable we need S0ix support. Any hints about that? I think the mfd driver would be similar to intel_soc_pmic_crc.c, while I would keep dollar_cove_ti_powerbtn.c separate instead of merging it into intel_mid_powerbtn.c. I guess what we need in drivers/acpi/pmic/ is something similar to intel_pmic_crc.c; the ProductionKernelQuilts series has 0001-ACPI-Adding-support-for-TI-pmic-opregion.patch. Thanks, Johannes
Re: Cherryview wake up events
On Fri, Jan 27, 2017 at 02:30:58PM +0100, Johannes Stezenbach wrote: > On Fri, Jan 27, 2017 at 03:21:22PM +0200, Andy Shevchenko wrote: > > On Fri, Jan 27, 2017 at 1:38 PM, Johannes Stezenbach <j...@sig21.net> wrote: > > > On Fri, Jan 27, 2017 at 12:56:53AM +0200, Andy Shevchenko wrote: > > > > >> Had you tried to add ID to axp20x-i2c.c ? > > > > > > Nope, since I have no idea if the axp and TI hardware is similar. > > > > I think you would give a try. > > I'll check it. I checked the reference source code; my impression is that the TI Dollar Cove and the AXP288 are completely different hardware. > > > [5.331709] i2c_designware 808622C1:06: controller timed out > > > > > This is known: http://www.spinics.net/lists/intel-gfx/msg117738.html Interestingly, via this link I found that Intel also published the TI DCove source in a patch series against an unspecified kernel: https://github.com/01org/ProductionKernelQuilts specifically https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/mfd-intel_soc_pmic-add-TI-variant-of-dollar-cove.patch https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/PWRBTN-add-driver-for-TI-PMIC.patch and some more (the series is quite messy). For the Asus E200HA I'm not sure whether the charger and coulomb counter drivers are needed, since charging just works and the battery status is reported via ACPI. It seems these drivers are only for tablets without ACPI support, right? Thanks, Johannes
Re: Cherryview wake up events
On Fri, Jan 27, 2017 at 03:21:22PM +0200, Andy Shevchenko wrote: > On Fri, Jan 27, 2017 at 1:38 PM, Johannes Stezenbach <j...@sig21.net> wrote: > > On Fri, Jan 27, 2017 at 12:56:53AM +0200, Andy Shevchenko wrote: > > > And the same info is also in sysfs: > > > > # cat > > /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/808622C1\:06/INT33F4\:00/status > > 0 > > # cat > > /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/808622C1\:06/INT33F5\:00/status > > 15 > > > > The DSDT is still at https://linuxtv.org/~js/e200ha/ > > > >> Had you tried to add ID to axp20x-i2c.c ? > > > > Nope, since I have no idea if the axp and TI hardware is similar. > > I think you would give a try. I'll check it. > > There might be more issues, currently the machine hangs often > > during bootup at random points. I built i915 as a module and > > blacklisted it for autoloading so I can read the last message > > on the console. All I can say is that it is more likely to > > hang when the loglevel is high, i.e. it almost never succeeds > > with "debug" on kernel command line. Sometimes there are > > timeout errors from I2C: > > [4.307189] i2c_designware 808622C1:06: controller timed out > > [5.331709] i2c_designware 808622C1:06: controller timed out > > > > Once it has booted it is running stable. > > This is known: http://www.spinics.net/lists/intel-gfx/msg117738.html Not sure, because it happens with the i915 module not loaded (currently I load it manually after boot has completed). But thanks for the link. Johannes
Re: Cherryview wake up events
On Fri, Jan 27, 2017 at 12:56:53AM +0200, Andy Shevchenko wrote: > > I'm reading your long thread about the issue. Thanks for taking the time! > > but excluded CONFIG_MFD_AXP20X based on \_SB.PIC0.I2C7.PMI1._STA returning > > 0 in acpidbg, > > but \_SB.PIC0.I2C7.PMI1._STA returns 0xf > > Did you mean PMI2 in the second sentence? Yes, sorry for the copy & paste mistake. I just repeated it to confirm: In acpidbg: - execute \_SB.PCI0.I2C7.PMI1._STA Evaluating \_SB.PCI0.I2C7.PMI1._STA Evaluation of \_SB.PCI0.I2C7.PMI1._STA returned object a14a6742, external buffer length 18 [Integer] = - execute \_SB.PCI0.I2C7.PMI2._STA Evaluating \_SB.PCI0.I2C7.PMI2._STA Evaluation of \_SB.PCI0.I2C7.PMI2._STA returned object a14a6742, external buffer length 18 [Integer] = 000F And the same info is also in sysfs: # cat /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/808622C1\:06/INT33F4\:00/status 0 # cat /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/808622C1\:06/INT33F5\:00/status 15 The DSDT is still at https://linuxtv.org/~js/e200ha/ > Had you tried to add ID to axp20x-i2c.c ? Nope, since I have no idea whether the AXP and TI hardware are similar. There might be more issues; currently the machine often hangs during bootup at random points. I built i915 as a module and blacklisted it for autoloading so I can read the last message on the console. All I can say is that it is more likely to hang when the loglevel is high, i.e. it almost never succeeds with "debug" on the kernel command line. Sometimes there are timeout errors from I2C: [4.307189] i2c_designware 808622C1:06: controller timed out [5.331709] i2c_designware 808622C1:06: controller timed out Once it has booted, it runs stably. Thanks, Johannes
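The `_STA` results explain why only one of the two PMIC nodes gets enumerated. Per the ACPI specification, the bits mean the following:

* bit 0: present
* bit 1: enabled
* bit 2: shown in UI
* bit 3: functioning
* bit 4: battery present (batteries only)

The sysfs `status` file prints the same value in decimal, so 15 there equals 0x000F from acpidbg. Below is a minimal Python decoder. The table and function names are illustrative, not a kernel API.

```python
# _STA bit meanings from the ACPI specification.
STA_BITS = [
    (0x01, "present"),
    (0x02, "enabled"),
    (0x04, "shown in UI"),
    (0x08, "functioning"),
    (0x10, "battery present"),
]


def decode_sta(value):
    """Return the names of the _STA bits set in `value`."""
    return [name for mask, name in STA_BITS if value & mask]


# Values from sysfs (decimal) quoted above.
for dev, sta in (("PMI1 / INT33F4", 0), ("PMI2 / INT33F5", 15)):
    print(f"{dev}: {sta:#x} -> {decode_sta(sta) or ['not present']}")
```

With PMI1 reporting 0, Linux skips the INT33F4 (AXP288) node entirely. Only INT33F5, the TI Dollar Cove, is enumerated as present and functioning.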
Re: Cherryview wake up events
On Tue, Jan 24, 2017 at 04:28:29PM +0200, Andy Shevchenko wrote: > They probably release just almost all One Big Ugly patch from official > BSP, which by some reason, includes all Intel MID SoCs, Baytrail. > I think I know how Dollar Cove related code ended up there. But that > all stuff is a total mess. I agree. Probably it was a mistake to bring up this code here. Let me try to go back two steps: Could you please let me know if there is any progress in mainlining the TI Dollar Cove PMIC and related drivers? Is there a schedule? After waiting for four months I'm actually getting impatient, because by now Cherryview-based hardware seems to be going out of production and I fear the mainlining might never happen. Thanks, Johannes
Re: Cherryview wake up events
On Tue, Jan 24, 2017 at 01:14:16PM +0200, Andy Shevchenko wrote: > On Tue, Jan 24, 2017 at 11:41 AM, Johannes Stezenbach <j...@sig21.net> wrote: > > Meanwhile I found out the TI PMIC and power button drivers > > has been published as part of the Asus ZenFone Zoom (ZX551ML) > > Android kernel code drop (based on linux-3.10.x): > > > > https://www.asus.com/support/Download/39/1/0/26/BXbNqJplzZiLmk6G/32/ > > > > Please let me know if there is anything I could do > > to help get it mainlined soon. > > AFAIK ASuS Zenfone 2 (Intel based) series uses Intel Moorefield. It > has ShadyCove PMIC. So Asus released more than they needed. I confirmed that their source drop contains the TI Dollar Cove driver (dollar_cove_ti.c). Previously I searched for Android devices using CherryView, but the only one I could find is the Xiaomi MiPad 2, and its released kernel source doesn't contain the driver. Anyway, let me know if I can help to get it into mainline soon. Johannes
Re: Cherryview wake up events
Hi, On Mon, Dec 05, 2016 at 01:06:08PM +0200, Mika Westerberg wrote: > On Sun, Dec 04, 2016 at 07:52:19PM +0100, Johannes Stezenbach wrote: > > On Wed, Oct 05, 2016 at 04:05:11PM +0300, Mika Westerberg wrote: > > > On Wed, Oct 05, 2016 at 02:46:48PM +0200, Johannes Stezenbach wrote: > > > > On Fri, Sep 23, 2016 at 11:19:04AM +0300, Mika Westerberg wrote: > > > > > David (CC'd) is working on getting the Dollar Cove PMIC driver > > > > > upstreamed to the mainline kernel. > > > > > > > > May I ask when to expect a patch? I'm ready if you > > > > have something to test, even if it's not in > > > > shape for mainline yet. > > > > > > It typically takes quite some time to get all the legal stuff done > > > before the code can be published. And if people are busy with other > > > things it takes even more time. > > > > > > So please be patient, it will happen sooner or later ;-) > > > > I don't want to nag, but just so it doesn't drop off > > the TODO list due to "lack of interest": What's the > > status? Will Santa bring the the TI Dollar Cove PMIC driver? > > David, do you have any estimate? Meanwhile I found out that the TI PMIC and power button drivers have been published as part of the Asus ZenFone Zoom (ZX551ML) Android kernel code drop (based on linux-3.10.x): https://www.asus.com/support/Download/39/1/0/26/BXbNqJplzZiLmk6G/32/ Please let me know if there is anything I could do to help get it mainlined soon. Thanks, Johannes
Re: Cherryview wake up events
Hi, On Wed, Oct 05, 2016 at 04:05:11PM +0300, Mika Westerberg wrote: > On Wed, Oct 05, 2016 at 02:46:48PM +0200, Johannes Stezenbach wrote: > > On Fri, Sep 23, 2016 at 11:19:04AM +0300, Mika Westerberg wrote: > > > David (CC'd) is working on getting the Dollar Cove PMIC driver > > > upstreamed to the mainline kernel. > > > > May I ask when to expect a patch? I'm ready if you > > have something to test, even if it's not in > > shape for mainline yet. > > It typically takes quite some time to get all the legal stuff done > before the code can be published. And if people are busy with other > things it takes even more time. > > So please be patient, it will happen sooner or later ;-) I don't want to nag, but just so it doesn't drop off the TODO list due to "lack of interest": What's the status? Will Santa bring the TI Dollar Cove PMIC driver? While I'm at it, I also have questions about S0ix support in Linux to which I couldn't find answers by web search. Does S0ix depend on the PMIC driver? And will it be used during run time or only in "sleep" state (which would mean "echo freeze >/sys/power/state" since ACPI S3 isn't supported)? All I know so far is that it doesn't seem to be used (running 4.9.0-rc7+): /sys/kernel/debug/pmc_atom/sleep_state:S0IR Residency: 0us /sys/kernel/debug/pmc_atom/sleep_state:S0I1 Residency: 0us /sys/kernel/debug/pmc_atom/sleep_state:S0I2 Residency: 0us /sys/kernel/debug/pmc_atom/sleep_state:S0I3 Residency: 0us /sys/kernel/debug/pmc_atom/sleep_state:S0 Residency: 160934496us Thanks, Johannes
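Checking those residency counters by eye gets tedious across many suspend attempts. A minimal sketch that parses lines in the format shown above (with or without the grep "path:" prefix) and reports whether any S0ix substate has accumulated residency; the function names are made up for illustration and the format is assumed to stay exactly as printed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses one line as printed by the grep above, e.g.
 * "/sys/kernel/debug/pmc_atom/sleep_state:S0I3 Residency: 0us",
 * or the bare file content "S0I3 Residency: 0us".
 * Returns 1 and fills state/us on success, 0 otherwise. */
static int parse_residency(const char *line, char state[8],
                           unsigned long long *us)
{
    const char *p = line;

    /* Skip the "path:" prefix that grep adds (it starts with '/'). */
    if (line[0] == '/') {
        p = strchr(line, ':');
        if (!p)
            return 0;
        p++;
    }
    return sscanf(p, "%7s Residency: %llu", state, us) == 2;
}

/* Returns 1 if any S0ix substate (name starting with "S0I") reports a
 * non-zero residency, 0 if only plain S0 has accumulated time. */
static int any_s0ix_residency(const char *const lines[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        char state[8];
        unsigned long long us;

        if (parse_residency(lines[i], state, &us) &&
            strncmp(state, "S0I", 3) == 0 && us > 0)
            return 1;
    }
    return 0;
}
```

Fed the five lines quoted above, this returns 0, matching the observation that S0ix is not entered yet.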
Re: [PATCH v2 18/31] gp8psk: don't do DMA on stack
On Sun, Nov 06, 2016 at 11:51:14AM -0800, VDR User wrote: > I applied this patch to the 4.8.4 kernel driver (that I'm currently > running) and it caused nothing but "frontend 0/0 timed out while > tuning". Is there another patch that should be used in conjunction > with this? If not, this patch breaks the gp8psk driver. > > Thanks. Thanks for testing. "If it's not tested it's broken"... > On Tue, Oct 11, 2016 at 3:09 AM, Mauro Carvalho Chehab > wrote: > > index 5d0384dd45b5..fa215ad37f7b 100644 > > --- a/drivers/media/usb/dvb-usb/gp8psk.c > > +++ b/drivers/media/usb/dvb-usb/gp8psk.c > > int gp8psk_usb_in_op(struct dvb_usb_device *d, u8 req, u16 value, u16 > > index, u8 *b, int blen) > > { > > + struct gp8psk_state *st = d->priv; > > int ret = 0,try = 0; > > > > if ((ret = mutex_lock_interruptible(&d->usb_mutex))) > > return ret; > > > > while (ret >= 0 && ret != blen && try < 3) { > > + memcpy(st->data, b, blen); > > ret = usb_control_msg(d->udev, > > usb_rcvctrlpipe(d->udev,0), > > req, > > USB_TYPE_VENDOR | USB_DIR_IN, > > - value,index,b,blen, > > + value, index, st->data, blen, > > 2000); I guess for usb_in the memcpy should come after the usb_control_msg() and copy from st->data to b. Johannes
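To make the ordering point concrete, here is a user-space sketch of an IN transfer through a bounce buffer. `fake_control_in()` is a hypothetical stand-in for `usb_control_msg()` on an IN pipe and `struct fake_state` for the driver's private state (the 80-byte size is an assumption); only the copy direction and its placement after the transfer are the point.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char u8;

/* Hypothetical stand-in for struct gp8psk_state: data[] is the
 * DMA-capable buffer that replaces the caller's stack buffer. */
struct fake_state {
    u8 data[80];
};

/* Hypothetical stand-in for usb_control_msg() on an IN pipe: the
 * "device" writes blen reply bytes into buf and returns the count. */
static int fake_control_in(u8 *buf, int blen)
{
    for (int i = 0; i < blen; i++)
        buf[i] = (u8)(0xA0 + i);
    return blen;
}

/* IN transfer: receive into st->data, and only then copy the reply
 * out to the caller's buffer b. Copying b into st->data before the
 * transfer, with no copy afterwards, never delivers the reply to b. */
static int in_op(struct fake_state *st, u8 *b, int blen)
{
    int ret = 0, tries = 0;

    if (blen < 0 || blen > (int)sizeof(st->data))
        return -1;                       /* real code would use an errno */
    while (ret >= 0 && ret != blen && tries < 3) {
        ret = fake_control_in(st->data, blen);
        tries++;
    }
    if (ret != blen)
        return -1;
    memcpy(b, st->data, (size_t)blen);   /* device -> caller */
    return 0;
}
```

An OUT transfer would need the opposite: copy from b into st->data before sending.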
Re: [PATCH v2 02/31] cinergyT2-core: don't do DMA on stack
On Tue, Oct 11, 2016 at 07:09:17AM -0300, Mauro Carvalho Chehab wrote: > --- a/drivers/media/usb/dvb-usb/cinergyT2-core.c > +++ b/drivers/media/usb/dvb-usb/cinergyT2-core.c > @@ -41,6 +41,8 @@ DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nr); > > struct cinergyt2_state { > u8 rc_counter; > + unsigned char data[64]; > + struct mutex data_mutex; > }; Sometimes my thinking is slow, but it just occurred to me that this creates a potential issue with cache line sharing. On an architecture which manages cache coherence in software (ARM, MIPS, etc.) a write to e.g. rc_counter in this example would dirty the cache line, and a later writeback from the cache could overwrite parts of data[] which was received via DMA. In contrast, if the DMA buffer is allocated separately via kmalloc it is guaranteed to be safe with respect to cache line sharing. (See the bottom of Documentation/DMA-API-HOWTO.txt.) But of course DMA on the stack also had the same issue and no one ever noticed, so it's apparently not critical... Johannes
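The sharing problem can be checked with plain offset arithmetic. A minimal sketch, assuming a 64-byte cache line (the kernel uses per-architecture values such as L1_CACHE_BYTES / ARCH_DMA_MINALIGN, and usually spells the annotation `____cacheline_aligned`). It compares the layout from the patch with one where data[] starts on its own line; the aligned variant is only safe if the containing allocation is itself suitably aligned, which is why a separate kmalloc buffer is the route the HOWTO documents.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size for illustration only. */
#define CACHE_LINE 64

/* Layout from the patch: data[] can share a cache line with rc_counter. */
struct state_shared {
    unsigned char rc_counter;
    unsigned char data[64];
};

/* Mitigation sketch: start the DMA buffer on its own cache line; since
 * it is exactly one line long, a following member (e.g. the mutex)
 * would also start on a fresh line. */
struct state_aligned {
    unsigned char rc_counter;
    alignas(CACHE_LINE) unsigned char data[64];
};

/* Returns 1 if byte offset 'other' lies in any cache line touched by
 * the range [off, off + len). */
static int shares_line(size_t off, size_t len, size_t other)
{
    size_t first = off / CACHE_LINE;
    size_t last = (off + len - 1) / CACHE_LINE;
    size_t o = other / CACHE_LINE;

    return o >= first && o <= last;
}
```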
Re: Problem with VMAP_STACK=y
On Wed, Oct 05, 2016 at 06:04:50AM -0300, Mauro Carvalho Chehab wrote: > static int cinergyt2_frontend_attach(struct dvb_usb_adapter *adap) > { > - char query[] = { CINERGYT2_EP1_GET_FIRMWARE_VERSION }; > - char state[3]; > + struct dvb_usb_device *d = adap->dev; > + struct cinergyt2_state *st = d->priv; > int ret; > > adap->fe_adap[0].fe = cinergyt2_fe_attach(adap->dev); > > - ret = dvb_usb_generic_rw(adap->dev, query, sizeof(query), state, > - sizeof(state), 0); It seems to be missing this: st->data[0] = CINERGYT2_EP1_GET_FIRMWARE_VERSION; > + ret = dvb_usb_generic_rw(d, st->data, 1, st->data, 3, 0); > if (ret < 0) { > deb_rc("cinergyt2_power_ctrl() Failed to retrieve sleep " > "state info\n"); > @@ -141,13 +147,14 @@ static int repeatable_keys[] = { > static int cinergyt2_rc_query(struct dvb_usb_device *d, u32 *event, int > *state) > { > struct cinergyt2_state *st = d->priv; > - u8 key[5] = {0, 0, 0, 0, 0}, cmd = CINERGYT2_EP1_GET_RC_EVENTS; > int i; > > *state = REMOTE_NO_KEY_PRESSED; > > - dvb_usb_generic_rw(d, &cmd, 1, key, sizeof(key), 0); > - if (key[4] == 0xff) { > + st->data[0] = CINERGYT2_EP1_SLEEP_MODE; This should probably be st->data[0] = CINERGYT2_EP1_GET_RC_EVENTS; > + > + dvb_usb_generic_rw(d, st->data, 1, st->data, 5, 0); HTH, Johannes
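Why the command byte matters with a shared buffer: the reply overwrites data[0], so whatever was there before the next call is stale. A user-space sketch, where `fake_generic_rw()` is a hypothetical stand-in for `dvb_usb_generic_rw()` and the command values are placeholders (the real CINERGYT2_EP1_* values are not reproduced here).

```c
#include <assert.h>
#include <string.h>

typedef unsigned char u8;

/* Placeholder command codes, not the driver's real values. */
enum { CMD_GET_FIRMWARE_VERSION = 0x01, CMD_GET_RC_EVENTS = 0x02 };

/* Hypothetical stand-in for struct cinergyt2_state. */
struct fake_state {
    u8 data[64];
};

static u8 last_cmd;   /* request byte as seen by the fake device */

/* Stand-in for dvb_usb_generic_rw(): consumes wlen request bytes from
 * wbuf, then writes rlen reply bytes into rbuf. As in the patch, wbuf
 * and rbuf may be the same buffer, so the reply clobbers the request. */
static int fake_generic_rw(const u8 *wbuf, int wlen, u8 *rbuf, int rlen)
{
    if (wlen < 1 || rlen < 0)
        return -1;
    last_cmd = wbuf[0];
    memset(rbuf, 0xEE, (size_t)rlen);
    return 0;
}

/* data[0] still holds the previous reply, so the command byte has to
 * be written before every transfer. */
static int send_cmd(struct fake_state *st, u8 cmd, int rlen)
{
    st->data[0] = cmd;
    return fake_generic_rw(st->data, 1, st->data, rlen);
}
```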
Re: Cherryview wake up events
On Fri, Sep 23, 2016 at 11:19:04AM +0300, Mika Westerberg wrote: > David (CC'd) is working on getting the Dollar Cove PMIC driver > upstreamed to the mainline kernel. May I ask when to expect a patch? I'm ready if you have something to test, even if it's not in shape for mainline yet. Thanks, Johannes
Re: Cherryview wake up events
On Fri, Sep 23, 2016 at 11:19:04AM +0300, Mika Westerberg wrote: > On Wed, Sep 21, 2016 at 11:16:35AM +0200, Johannes Stezenbach wrote: > > There is an ADBG ("TI_DCOVE") in PMI2._STA, so Dollar Cove > > sounds like a good guess. > > David (CC'd) is working on getting the Dollar Cove PMIC driver > upstreamed to the mainline kernel. Excellent news! Repeating essential info to avoid any confusion, the PMIC is: Device (PMI2) { Name (_ADR, Zero) // _ADR: Address Name (_HID, "INT33F5" /* TI PMIC Controller */) // _HID: Hardware ID Name (_CID, "INT33F5" /* TI PMIC Controller */) // _CID: Compatible ID Name (_DDN, "TI PMIC Controller") // _DDN: DOS Device Name (Because the INT33F4 XPOWER PMIC also has ADBG ("XPWR_DCOVE"), I'm not sure "Dollar Cove" is a unique name.) I put the Asus E200HA DSDT at https://linuxtv.org/~js/e200ha/ Thanks, Johannes
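Drivers bind on the _HID/_CID strings, not on the ADBG debug names, so INT33F5 versus INT33F4 is the unambiguous key. As far as I know the Linux ACPI core matches a driver's ID table against the _HID and every _CID of a device. A simplified model of that rule follows; the struct and table contents are illustrative, not the kernel's `acpi_device_id` machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of an ACPI device's IDs: one _HID plus up to four
 * _CIDs (NULL-terminated). */
struct acpi_ids {
    const char *hid;
    const char *cids[4];
};

/* Returns 1 if the _HID or any _CID appears in the driver's
 * NULL-terminated ID table. */
static int ids_match(const struct acpi_ids *dev, const char *const table[])
{
    for (size_t t = 0; table[t]; t++) {
        if (dev->hid && strcmp(dev->hid, table[t]) == 0)
            return 1;
        for (size_t c = 0; c < 4 && dev->cids[c]; c++)
            if (strcmp(dev->cids[c], table[t]) == 0)
                return 1;
    }
    return 0;
}
```

With a hypothetical TI-PMIC table of { "INT33F5" }, PMI2 matches and PMI1 does not; the same rule means a device with _HID INTCFD9 and _CID PNP0C40 still matches a table listing PNP0C40.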
Re: Cherryview wake up events
On Wed, Sep 21, 2016 at 12:06:14PM +0300, Mika Westerberg wrote: > On Tue, Sep 20, 2016 at 11:11:53PM +0200, Johannes Stezenbach wrote: > > Or it is because the PNP0C40 device depends on GpioInt from PMIC > > which isn't available... > > > > Method (_CRS, 0, NotSerialized) // _CRS: Current Resource > > Settings > > { > > Name (CBUF, ResourceTemplate () > > { > > GpioInt (Edge, ActiveBoth, ExclusiveAndWake, PullUp, > > 0x0BB8, > > "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, , > > ) > > { // Pin list > > 0x0016 > > } > > }) > > Return (CBUF) /* \_SB_.TBAD._CRS.CBUF */ > > } > > Most likely this is the reason. I'll try to find if we have an existing > driver for this PMIC somewhere. I guess this is the Dollar Cove which is > successor of Crystal Cove IIRC which is already supported by the > mainline kernel. There is an ADBG ("TI_DCOVE") in PMI2._STA, so Dollar Cove sounds like a good guess. Thanks, Johannes
Re: Cherryview wake up events
On Tue, Sep 20, 2016 at 05:59:43PM +0200, Johannes Stezenbach wrote: > On Tue, Sep 20, 2016 at 01:40:14PM +0300, Mika Westerberg wrote: > > If yes, it probably does not have the normal Fixed power button but > > instead it has something called "Windows button array device" with > > _HID/_CID of PNP0C40. Looking at your dsdt.dsl, this looks to be the > > case. > > > > That device is driven by soc_button_array.c driver which can be enabled > > with CONFIG_KEYBOARD_GPIO=y and CONFIG_INPUT_SOC_BUTTON_ARRAY=y. Can you > > check if you have that enabled already? > > > > You should actually see it in /proc/interrupts with names like "power" > > and so on. > > I added CONFIG_INPUT_SOC_BUTTON_ARRAY=y, but no joy. > Maybe because the _HID is INTCFD9, only _CID is PNP0C40? > It also has a _DSM with UUID dfbcf3c5-e7a5-44e6-9c1f-29c76f6e059c. Or it is because the PNP0C40 device depends on GpioInt from PMIC which isn't available... Method (_CRS, 0, NotSerialized) // _CRS: Current Resource Settings { Name (CBUF, ResourceTemplate () { GpioInt (Edge, ActiveBoth, ExclusiveAndWake, PullUp, 0x0BB8, "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, , ) { // Pin list 0x0016 } }) Return (CBUF) /* \_SB_.TBAD._CRS.CBUF */ } Thanks, Johannes
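The dependency suspected above can be modeled simply: a GpioInt consumer can only get its interrupt once the controller named in ResourceSource (here "\\_SB.PCI0.I2C7.PMI2") has a registered GPIO driver. In Linux such a lookup typically fails with -EPROBE_DEFER until then. The registry below is hypothetical and only illustrates the ordering; EPROBE_DEFER (517) is kernel-internal and redefined here for the sketch.

```c
#include <assert.h>
#include <string.h>

/* Kernel-internal errno value, redefined for this user-space sketch. */
#define EPROBE_DEFER 517

/* Hypothetical registry of GPIO controllers with a bound driver, keyed
 * by the ACPI path used in GpioInt's ResourceSource. */
static const char *registered[8];
static int nregistered;

static void register_controller(const char *path)
{
    if (nregistered < 8)
        registered[nregistered++] = path;
}

/* Resolves a GpioInt (controller path, pin) pair. Without a registered
 * provider the consumer cannot proceed and must retry later. */
static int lookup_gpioint(const char *resource_source, int pin, int *out_pin)
{
    for (int i = 0; i < nregistered; i++) {
        if (strcmp(registered[i], resource_source) == 0) {
            *out_pin = pin;
            return 0;
        }
    }
    return -EPROBE_DEFER;
}
```

Without a PMI2 GPIO driver, the button device's pin 0x0016 stays unresolvable no matter which button driver is enabled.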
Re: Cherryview wake up events
On Tue, Sep 20, 2016 at 01:40:14PM +0300, Mika Westerberg wrote: > Can you check if you have: > > Hardware Reduced (V5) : 1 > > in that FADT table? Nope, it is "Hardware Reduced (V5) : 0". Now the FADT is also at https://linuxtv.org/~js/e200ha/ > If yes, it probably does not have the normal Fixed power button but > instead it has something called "Windows button array device" with > _HID/_CID of PNP0C40. Looking at your dsdt.dsl, this looks to be the > case. > > That device is driven by soc_button_array.c driver which can be enabled > with CONFIG_KEYBOARD_GPIO=y and CONFIG_INPUT_SOC_BUTTON_ARRAY=y. Can you > check if you have that enabled already? > > You should actually see it in /proc/interrupts with names like "power" > and so on. I added CONFIG_INPUT_SOC_BUTTON_ARRAY=y, but no joy. Maybe because the _HID is INTCFD9, only _CID is PNP0C40? It also has a _DSM with UUID dfbcf3c5-e7a5-44e6-9c1f-29c76f6e059c. BTW, lsinput already lists two "Power Button" devices, phys: "PNP0C0C/button/input0" phys: "LNXPWRBN/button/input0" None of them generates events in input-events. Thanks, Johannes
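The iasl labels above map onto bits of the FADT Flags field. Per the ACPI specification, bit 4 (PWR_BUTTON) set means the power button is a control-method device and clear means the fixed-feature PM1 button; bit 5 is the same for the sleep button; bit 20 is HW_REDUCED_ACPI (ACPICA names them ACPI_FADT_POWER_BUTTON, ACPI_FADT_SLEEP_BUTTON and ACPI_FADT_HW_REDUCED). A small decoder, with the E200HA values from this thread used as a test:

```c
#include <assert.h>
#include <stdint.h>

/* FADT Flags bits per the ACPI specification. */
#define FADT_POWER_BUTTON (1u << 4)   /* 1: control-method power button */
#define FADT_SLEEP_BUTTON (1u << 5)   /* 1: control-method sleep button */
#define FADT_HW_REDUCED   (1u << 20)  /* 1: hardware-reduced ACPI */

/* 1 if the power button uses the fixed-feature programming model. */
static int fixed_power_button(uint32_t flags)
{
    return (flags & FADT_POWER_BUTTON) == 0;
}

static int fixed_sleep_button(uint32_t flags)
{
    return (flags & FADT_SLEEP_BUTTON) == 0;
}

static int hw_reduced(uint32_t flags)
{
    return (flags & FADT_HW_REDUCED) != 0;
}
```

So "Control Method Power Button (V1) : 0" with "Hardware Reduced (V5) : 0" describes a fixed-feature power button on a full-hardware platform; a DSDT may still declare a PNP0C0C device in addition.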
Re: Cherryview wake up events
On Tue, Sep 20, 2016 at 12:18:40PM +0300, Mika Westerberg wrote: > On Mon, Sep 19, 2016 at 10:36:22PM +0200, Johannes Stezenbach wrote: > > Now my question is, is this pin 0x004E the same as this > > in /proc/interrupts which fires on LID event? > > > > 158: 2 0 0 0 chv-gpio 43 ACPI:Event > > Yes, it is that one and it triggers \_SB.GPO0._E4E() method to be called > whenever low edge is detected on the GPIO line. This method then handles > many things depending on what the AML code reads from ^^PCI0.I2C1.ENID > notifying the power button device (PWRB) among other things. Thanks for the confirmation, but it circles back to the question of how to map the numbers. Since the document that describes it is not public, it would be useful if you could add comments to pinctrl-cherryview.c explaining it. Or did I just miss something? > I suppose you already have CONFIG_ACPI_I2C_OPREGION=y in your .config? > That allows the AML code to access the I2C bus using the I2C driver. > > > The FADT has > > Control Method Power Button (V1) : 0 > > Control Method Sleep Button (V1) : 1 > > > > PWRBTN_EN in PM1 is set. But PWRBTN press causes thermal irq. > > Yeah, it uses control method power button (PNP0C0C) and ACPI GPIO event > to trigger changes in that. I'm confused again because I thought "Control Method Power Button (V1) : 0" means it is a fixed power button, however the DSDT also has Device (PWRB) { Name (_HID, EisaId ("PNP0C0C") /* Power Button Device */) // _HID: Hardware ID } Device (SLPB) { Name (_HID, EisaId ("PNP0C0E") /* Sleep Button Device */) // _HID: Hardware ID } > > No SCI (irq 9) is ever generated, except by writing to the > > BIOS_RLS bit in SMI_EN register (IO port 0x430). > > > > GPE block addresses in FADT are 0. GPE0a_EN register (IO 0x428) > > is set to 0x6000 (TCO_EN + PME_B0_EN, but none of the GPIO enables). > > > > Any advice how to continue? > > Please check that you have that CONFIG_ACPI_I2C_OPREGION=y and > CONFIG_MFD_AXP20X=y.
> > You should see the ACPI:Event interrupt count increasing in > /proc/interrupts when you press power button. When that works then we can > start thinking about adding wake support :) I had CONFIG_ACPI_I2C_OPREGION=y but excluded CONFIG_MFD_AXP20X based on \_SB.PCI0.I2C7.PMI1._STA returning 0 in acpidbg, Device (PMI1) { Name (_ADR, Zero) // _ADR: Address Name (_HID, "INT33F4" /* XPOWER PMIC Controller */) // _HID: Hardware ID Name (_CID, "INT33F4" /* XPOWER PMIC Controller */) // _CID: Compatible ID Name (_DDN, "XPOWER PMIC Controller") // _DDN: DOS Device Name but \_SB.PCI0.I2C7.PMI2._STA returns 0xf Device (PMI2) { Name (_ADR, Zero) // _ADR: Address Name (_HID, "INT33F5" /* TI PMIC Controller */) // _HID: Hardware ID Name (_CID, "INT33F5" /* TI PMIC Controller */) // _CID: Compatible ID Name (_DDN, "TI PMIC Controller") // _DDN: DOS Device Name So I tried CONFIG_MFD_AXP20X=y anyway, but as expected: no change. Since TI doesn't even have a product page for the SND9039 (only a few references in the TI support forum can be found), I'm not sure what can be done. So maybe a better short-term goal would be to get wakeup by LID working. However, I still wonder why the power button can trigger a thermal irq. Is it related to the PMIC? I couldn't find out where the thermal irq is routed. Thanks, Johannes
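The two _STA results above decode cleanly with the bit layout from the ACPI specification: bit 0 present, bit 1 enabled, bit 2 shown in UI, bit 3 functioning. A value of 0 means "not present", so the OS does not enumerate PMI1, while 0xf means PMI2 is present, enabled, shown and functioning. A minimal decoder:

```c
#include <assert.h>
#include <stdint.h>

/* _STA result bits per the ACPI specification. */
#define STA_PRESENT     (1u << 0)
#define STA_ENABLED     (1u << 1)
#define STA_SHOW_IN_UI  (1u << 2)
#define STA_FUNCTIONING (1u << 3)

static int sta_present(uint32_t sta)
{
    return (sta & STA_PRESENT) != 0;
}

/* 1 if all four device-state bits are set (the common 0xF value). */
static int sta_fully_on(uint32_t sta)
{
    const uint32_t all = STA_PRESENT | STA_ENABLED |
                         STA_SHOW_IN_UI | STA_FUNCTIONING;

    return (sta & all) == all;
}
```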
Re: Cherryview wake up events
On Mon, Sep 19, 2016 at 02:56:19PM +0300, Mika Westerberg wrote: > On Mon, Sep 19, 2016 at 01:21:17PM +0200, Johannes Stezenbach wrote: > > > > The LID causes a gpio irq: > > 158: 2 0 0 0 chv-gpio 43 ACPI:Event > > > > However, neither LID nor power button can wake up the > > device from "echo freeze >/sys/power/state". :-( > > The cherryview pinctrl driver does not (yet) support wake up events. It > currently just sets IRQCHIP_SKIP_SET_WAKE for the irqchip. OK, but shouldn't the wakeup usually be handled by ACPI? Clearly I don't understand this. I mean on the non-ACPI embedded ARM systems I'm used to I need to enable specific irqs as wakeup sources, but on ACPI, isn't SCI the implicit wakeup irq? Probably I'm just totally confused, so let me ask another way, below. > I can make you a test patch which adds support for wakes for the pinctrl > driver if you like to test it out. However, that will happen most likely > near end of the week as I have other things right now. That would be great! I found in the DSDT: Scope (_SB.GPO0) { Name (EVBF, Buffer (0x03) {}) CreateByteField (EVBF, Zero, EVST) CreateByteField (EVBF, One, ELEN) CreateByteField (EVBF, 0x02, ENVT) Name (LIDZ, One) Method (_E4E, 0, Serialized) // _Exx: Edge-Triggered GPE { Name (_T_0, Zero) // _T_x: Emitted by ASL Compiler If (^^PCI0.I2C1.AVBL != One) { Return (Zero) } EVBF = ^^PCI0.I2C1.ENID /* \_SB_.PCI0.I2C1.ENID */ ... _T_0 = ENVT /* \_SB_.GPO0.ENVT */ ... ElseIf (_T_0 == 0xA9) { Notify (PWRB, 0x80) // Status Change Break } and Device (GPO0) { ... Method (_AEI, 0, NotSerialized) // _AEI: ACPI Event Interrupts { Name (WBUF, ResourceTemplate () { GpioInt (Edge, ActiveLow, ExclusiveAndWake, PullUp, 0x, "\\_SB.GPO0", 0x00, ResourceConsumer, , ) { // Pin list 0x004E } }) If (OSID == One) { Return (WBUF) /* \_SB_.GPO0._AEI.WBUF */ } } and OSID is a field in OperationRegion (GNVS, SystemMemory, 0x7A158000, 0x0362) which is inside what /proc/iomem lists as "ACPI Non-volatile Storage".
OSID is read a lot in the DSDT but never written to. But calling \_SB.GPO0._AEI in acpidbg returns a buffer of size 25. Now my question is, is this pin 0x004E the same as this in /proc/interrupts which fires on LID event? 158: 2 0 0 0 chv-gpio 43 ACPI:Event The FADT has Control Method Power Button (V1) : 0 Control Method Sleep Button (V1) : 1 PWRBTN_EN in PM1 is set. But PWRBTN press causes thermal irq. No SCI (irq 9) is ever generated, except by writing to the BIOS_RLS bit in SMI_EN register (IO port 0x430). GPE block addresses in FADT are 0. GPE0a_EN register (IO 0x428) is set to 0x6000 (TCO_EN + PME_B0_EN, but none of the GPIO enables). Any advice how to continue? Thanks, Johannes
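Two pieces of arithmetic relate pin 0x004E to what Linux shows. The first helper follows the ACPI naming rule for GPIO-signaled events: `_Exx` for edge, `_Lxx` for level, with the pin as two uppercase hex digits; pins above 255 use `_EVT` instead, so the helper rejects them. The second is an assumption, not something stated in this thread: that GPO0 is the southwest community, which numbers its pins in groups of 15 slots with only 8 populated, so the linear GPIO offset would be group * 8 + index. It happens to turn 0x4E into the 43 seen in /proc/interrupts, but it should be checked against pinctrl-cherryview.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the ACPI GPIO event method name for a pin ("_E4E" for an
 * edge-triggered pin 0x4E). Returns -1 for pins above 255, which use
 * _EVT rather than a per-pin method. */
static int gpio_event_method(char out[5], int edge, unsigned pin)
{
    if (pin > 255)
        return -1;
    snprintf(out, 5, "_%c%02X", edge ? 'E' : 'L', pin);
    return 0;
}

/* ASSUMPTION (hypothetical layout): 15 pin numbers reserved per group,
 * 8 used. Returns the linear GPIO offset, or -1 for a pin in a gap. */
static int sw_pin_to_offset(unsigned pin)
{
    unsigned group = pin / 15, index = pin % 15;

    if (index >= 8)
        return -1;
    return (int)(group * 8 + index);
}
```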
Cherryview wake up events
Hi, Mika, I've been reading the thread about pinctrl-cherryview interrupts, but I have some basic questions in understanding the hardware and the relationship between ACPI and Linux drivers, so I decided to start a new thread. https://lkml.kernel.org/g/20160909085832.gk15...@lahna.fi.intel.com I have one Asus E200HA (Atom x5-Z8300) where the power button doesn't generate any ACPI events (no SCI), instead it causes a Thermal Event irq: TRM: 3 3 3 4 Thermal event interrupts [ 51.825488] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1) [ 51.826933] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1) [ 51.826965] mce: [Hardware Error]: Machine check events logged [ 51.841180] mce: [Hardware Error]: Machine check events logged (These events are logged only sometimes, usually a power button press only increments the TRM count.) I would like to understand how this is possible, when I boot with apic=debug I can't see anything claiming vector 0xfa. The LID causes a gpio irq: 158: 2 0 0 0 chv-gpio 43 ACPI:Event However, neither LID nor power button can wake up the device from "echo freeze >/sys/power/state". :-( "grep . /sys/firmware/acpi/interrupts/*" shows only zeros. I put the DSDT and some other tables at: https://linuxtv.org/~js/e200ha/ During the last weeks I read what I could about the hardware and ACPI, and poked at it with acpidbg, devmem, ioport and in kernel source, but to no avail. On Thu, Sep 15, 2016 at 06:52:10PM +0300, Mika Westerberg wrote: > It turns out that for north and southwest communities, they can only > generate GPIO interrupts for lower 8 interrupts (IntSel value). The upper > part (8-15) can only generate GPEs (General Purpose Events). I got the Atom Z8000 series datasheet from http://www.intel.com/content/www/us/en/processors/atom/atom-technical-resources.html and tried to find the source for this. The closest I could find is the GPIO_ROUT PMC register? 
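Mika's rule quoted above can be written as a small classifier. This follows the quote only: IntSel values 0..7 in the north and southwest communities produce GPIO interrupts, 8..15 only GPEs. The quote says nothing about the east and southeast communities, so treating them as unrestricted here is an assumption, and the enums are illustrative rather than driver code.

```c
#include <assert.h>

enum chv_community { CHV_SOUTHWEST, CHV_NORTH, CHV_EAST, CHV_SOUTHEAST };
enum intsel_kind { INTSEL_GPIO_IRQ, INTSEL_GPE_ONLY, INTSEL_INVALID };

/* Classifies an IntSel value (0..15) per the quoted explanation. */
static enum intsel_kind classify_intsel(enum chv_community c, unsigned intsel)
{
    if (intsel > 15)
        return INTSEL_INVALID;
    if ((c == CHV_NORTH || c == CHV_SOUTHWEST) && intsel >= 8)
        return INTSEL_GPE_ONLY;
    return INTSEL_GPIO_IRQ;   /* east/southeast: assumed unrestricted */
}
```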
However, the datasheet doesn't say whether the other interrupts not covered by GPIO_ROUT are fixed IRQ, SCI or "no effect". I also don't understand the mapping from IntSel irq to IO-APIC pin number, nor the mapping between the pin numbers used in DSDT GpioInt and the pin numbers in pinctrl-cherryview.c. Could you shed some light on this, or point out where I can find information? The datasheet seems to imply that the BIOS sets up IntSel. I'm generally confused about the responsibility of the BIOS vs. drivers making use of the information from the DSDT, e.g. Device (GPO1) has a list of GpioIo connections, and other devices like PMI2 use GpioInt from GPO1. My E200HA has the INT33F5 TI PMIC Controller, which according to Windows driver strings seems to be the SND9039. Does that mean I need a PMIC driver that reads the _CRS and configures the GPIO? BTW, the datasheet talks about 4 seconds for power button override, but it takes 10 seconds. Maybe that means the power button is connected to the TI PMIC, not to the Cherryview SoC? Another question is about the virtual GPIO device that exists in hardware and is used by the DSDT. How does that work, and why does pinctrl-cherryview.c exclude it? Sorry for so many questions; any info is appreciated, as is any suggestion on what to try to get the thing to wake up from freeze. I was totally unfamiliar with ACPI until now, but I think the DSDT has a nasty surprise in several _REG methods that use OEM-defined OperationRegion space IDs to set availability flags that are tested in other methods. So if the Windows drivers aren't loaded, those methods won't do anything, right? Does anyone have suggestions or even examples of how to deal with this? Thanks, Johannes
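Mika's rule quoted in this message can be written down as a small C helper. This is a sketch under stated assumptions: the helper and macro names are hypothetical; IntSel is taken to be the 4-bit field in bits 31:28 of a pad's PADCTRL0 register (this appears to be how pinctrl-cherryview.c reads it); and communities other than north and southwest are assumed to be unrestricted, which the quote does not actually say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: IntSel lives in bits 31:28 of a pad's PADCTRL0 register. */
#define PADCTRL0_INTSEL_SHIFT 28
#define PADCTRL0_INTSEL_MASK  (UINT32_C(0xF) << PADCTRL0_INTSEL_SHIFT)

enum chv_irq_kind {
    CHV_IRQ_GPIO,     /* can raise a regular GPIO interrupt */
    CHV_IRQ_GPE_ONLY, /* can only raise a General Purpose Event */
};

/* Extract the 4-bit IntSel value (0..15) from a PADCTRL0 register value. */
static unsigned chv_padctrl0_intsel(uint32_t padctrl0)
{
    return (unsigned)((padctrl0 & PADCTRL0_INTSEL_MASK) >> PADCTRL0_INTSEL_SHIFT);
}

/* Rule from the quoted mail: in the north and southwest communities only
 * IntSel 0..7 produce GPIO interrupts; IntSel 8..15 can only produce GPEs.
 * Other communities are assumed unrestricted here (not stated in the quote). */
static enum chv_irq_kind chv_intsel_kind(bool north_or_southwest, unsigned intsel)
{
    assert(intsel <= 15); /* IntSel is a 4-bit field */
    if (north_or_southwest && intsel >= 8)
        return CHV_IRQ_GPE_ONLY;
    return CHV_IRQ_GPIO;
}
```

If the rule holds, a GpioInt pad in those two communities whose IntSel ends up at 8 or above would never show up as a chv-gpio irq in /proc/interrupts. That may be worth checking for the pads the DSDT uses for wake events.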
Re: [PATCH 4.7 000/143] 4.7.3-stable review
On Thu, Sep 08, 2016 at 08:52:32AM +0200, Greg Kroah-Hartman wrote: > On Wed, Sep 07, 2016 at 04:59:37PM -0400, Levin, Alexander wrote: > > Hey Greg, > > > > For reference, I've generated a list of <=4.8-rc4 commits that look to me > > like stable material but are not in 4.7.3: > > > > 422eac3f7deae34dbaffd08e03e27f37a5394a56 (v4.8-rc1) tpm_crb: fix mapping of > > the buffers > > a36aa80f3cb2540fb1dbad6240852de4365a2e82 (v4.8-rc1) intel_th: Fix a > > deadlock in modprobing > > 7a1a47ce35821b40f5b2ce46379ba14393bc3873 (v4.8-rc1) intel_th: pci: Add Kaby > > Lake PCH-H support > > fa95986095e39205ea2fb5b5dafe271bca7eb8d1 (v4.8-rc1) drm/i915: Set legacy > > properties when using legacy gamma set IOCTL. (v2) > > 78f4f7c2341f1cf510152ad494108850fec1ae39 (v4.8-rc1) ALSA: hda/realtek - > > ALC891 headset mode for Dell > > 9b51fe3efe4c270005e34f55a97e5a84ad68e581 (v4.8-rc1) ALSA: hda - On-board > > speaker fixup on ACER Veriton > > 7d9595d848cdff5c7939f68eec39e0c5d36a1d67 (v4.8-rc1) dm rq: fix the starting > > and stopping of blk-mq queues > > 3b2c1710fac7fb278b760d1545e637cbb5ea5b5b (v4.8-rc2) drm/i915: Wait up to > > 3ms for the pcu to ack the cdclk change request on SKL > > c518189567eaf42b2ec50a4d982484c8e38799f8 (v4.8-rc3) net: macb: Correct CAPS > > mask > > 80788a0fbbdfbb125e3fd45a640cddb582160bc7 (v4.8-rc1) drm/i915/fbc: sanitize > > i915.enable_fbc during FBC init > > 0a491b96aa59a7232f6c1a81414aa57fb8de8594 (v4.8-rc3) drm/i915/fbc: FBC > > causes display flicker when VT-d is enabled on Skylake > > 3e103a65514c2947e53f3171b21255fbde8b60c6 (v4.8-rc4) ASoC: atmel_ssc_dai: > > Don't unconditionally reset SSC on stream startup > > 1b856086813be9371929b6cc62045f9fd470f5a0 (v4.8-rc4) block: Fix race > > triggered by blk_set_queue_dying() > > ae5b80d2b68eac945b124227dea34462118a6f01 (v4.8-rc4) drm/radeon: only apply > > the SS fractional workaround to RS[78]80 > > d9dc1702b297ec4a6bb9c0326a70641b322ba886 (v4.8-rc4) bcache: > > register_bcache(): call blkdev_put() when 
cache_alloc() fails > > acc9cf8c66c66b2cbbdb4a375537edee72be64df (v4.8-rc4) bcache: RESERVE_PRIO is > > too small by one when prio_buckets() is a power of two. > > 13f479b9df4e2bbf2d16e7e1b02f3f55f70e2455 (v4.8-rc4) drm/radeon: fix > > radeon_move_blit on 32bit systems > > d77976c414ed7f521b9c79b2a9dde0147a3cf754 (v4.8-rc4) ARC: export kmap > > c57653dc94d0db7bf63067433ceaa97bdcd0a312 (v4.8-rc4) ARC: export __udivdi3 > > for modules > > 6f00975c619064a18c23fd3aced325ae165a73b9 (v4.8-rc4) drm: Reject page_flip > > for !DRIVER_MODESET > > e9e5e3fae8da7e237049e00e0bfc9e32fd808fe8 (v4.8-rc4) bdev: fix NULL pointer > > dereference > > 6a33fa2b87513fee44cb8f0cd17b1acd6316bc6b (v4.8-rc4) irqchip/mips-gic: > > Cleanup chip and handler setup > > 2564970a381651865364974ea414384b569cb9c0 (v4.8-rc4) irqchip/mips-gic: > > Implement activate op for device domain > > c62fb260a86dde3df5b2905432caa0e9f6898434 (v4.8-rc4) IB/hfi1,IB/qib: Fix > > qp_stats sleep with rcu read lock held > > a77ec83a57890240c546df00ca5df1cdeedb1cc3 (v4.8-rc4) vhost/scsi: fix reuse > > of >iov[out] in response > > c0082e985fdf77b02fc9e0dac3b58504dcf11b7a (v4.8-rc4) ubifs: Fix assertion in > > layout_in_gaps() > > 17ce1eb0b64eb27d4f9180daae7495fa022c7b0d (v4.8-rc4) ubifs: Fix xattr > > generic handler usage > > 27727df240c7cc84f2ba6047c6f18d5addfd25ef (v4.8-rc4) timekeeping: Avoid > > taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING > > a4f8f6667f099036c88f231dcad4cf233652c824 (v4.8-rc4) timekeeping: Cap array > > access in timekeeping_debug > > 2e63ad4bd5dd583871e6602f9d398b9322d358d9 (v4.8-rc4) x86/apic: Do not init > > irq remapping if ioapic is disabled > > 9b47f77a680447e0132b2cf7fb82374e014bec1c (v4.8-rc4) nvme: Fix > > nvme_get/set_features() with a NULL result pointer > > 4d70dca4eadf2f95abe389116ac02b8439c2d16c (v4.8-rc4) block: make sure a big > > bio is split into at most 256 bvecs > > 9a035a40f7f3f6708b79224b86c5777a3334f7ea (v4.8-rc4) xenbus: don't look up > > transaction IDs for ordinary 
writes > > 299f6230bc6d0ccd5f95bb0fb865d80a9c7d5ccc (v4.8-rc4) dm flakey: fix reads to > > be issued if drop_writes configured > > b53e7d000d9e6e9fd2c6eb6b82d2783c67fd599e (v4.8-rc4) > > clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe > > function > > add1fa75101263ab4d74240f93000998d4325624 (v4.8-rc4) drm/atomic: Don't > > potentially reset color_mgmt_changed on successive property updates. > > > > Thanks for these, I'll look at them after I get through the other > "properly tagged" patches in my queue. I also have a long list of stuff > like this that I need to look at closer... And another one: b47820edd1634dc1208f9212b7ecfb4230610a23 ext4: avoid modifying checksum fields directly during checksum verification Sorry for the noise if you have it already, but there was no response to two pings in https://lkml.kernel.org/r/20160901164016.gb25...@birch.djwong.org Thanks, Johannes
Re: 4.7.0-rc7 ext4 error in dx_probe
On Fri, Aug 05, 2016 at 08:11:36PM +0200, Johannes Stezenbach wrote: > On Fri, Aug 05, 2016 at 10:02:28AM -0700, Darrick J. Wong wrote: > > > > When you're back on 4.7, can you apply this patch[1] to see if it fixes > > the problem? I speculate that the new parallel dir lookup code enables > > multiple threads to be verifying the same directory block buffer at the > > same time. > > > > [1] > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ext4/inode.c?id=b47820edd1634dc1208f9212b7ecfb4230610a23 > > I added the patch, rebuilt and rebooted. It will take some time > before I'll report back since the issue is so hard to reproduce. FWIW, so far the issue hasn't appeared again since I applied the patch to 4.7.0, and I have stressed it a bit with repo syncs, AOSP builds, rsync backups etc. Johannes
4.7.0: RCU stall in nf_conntrack
Hi, I just experienced a network hang with 4.7.0; it happened shortly after resuming from hibernate: [201988.443552] INFO: rcu_preempt detected stalls on CPUs/tasks: [201988.443556] Tasks blocked on level-0 rcu_node (CPUs 0-3): P14563 [201988.443557] (detected by 3, t=18002 jiffies, g=7365154, c=7365153, q=15274) [201988.443560] client_socket_t R running task0 14563 1 0x [201988.443563] 8800c427a900 e1b77832 880217603da0 810bf66a [201988.443565] 810bf5d1 8800c427a900 81e566c0 880217603dd0 [201988.443567] 8119a3cf 8802177d80c0 81e566c0 81f89ae0 [201988.443569] Call Trace: [201988.443571][] sched_show_task+0xfa/0x160 [201988.443585] [] ? sched_show_task+0x61/0x160 [201988.443587] [] rcu_print_detail_task_stall_rnp+0x52/0x76 [201988.443590] [] rcu_check_callbacks+0x866/0x9e0 [201988.443592] [] update_process_times+0x39/0x60 [201988.443594] [] tick_sched_handle.isra.5+0x21/0x60 [201988.443596] [] tick_sched_timer+0x42/0x70 [201988.443598] [] __hrtimer_run_queues+0x140/0x3c0 [201988.443599] [] ? tick_sched_handle.isra.5+0x60/0x60 [201988.443601] [] hrtimer_interrupt+0xb3/0x1c0 [201988.443603] [] local_apic_timer_interrupt+0x36/0x60 [201988.443606] [] smp_apic_timer_interrupt+0x3d/0x50 [201988.443607] [] apic_timer_interrupt+0x8c/0xa0 [201988.443608][] ? __nf_conntrack_find_get+0x285/0x420 [201988.443611] [] ? nf_conntrack_in+0x1d1/0x8d0 [201988.443612] [] nf_conntrack_in+0x1d1/0x8d0 [201988.443615] [] ipv4_conntrack_local+0x45/0x50 [201988.443616] [] nf_iterate+0x62/0x80 [201988.443618] [] nf_hook_slow+0xa0/0x110 [201988.443620] [] ? nf_hook_slow+0x5/0x110 [201988.443622] [] __ip_local_out+0xd8/0x120 [201988.443624] [] ? ip_forward_options+0x1f0/0x1f0 [201988.443625] [] ip_local_out+0x1c/0x70 [201988.443627] [] ip_queue_xmit+0x18f/0x450 [201988.443628] [] ? ip_queue_xmit+0x5/0x450 [201988.443630] [] tcp_transmit_skb+0x48b/0x8e0 [201988.443632] [] tcp_connect+0x629/0x830 [201988.443634] [] ?
secure_tcp_sequence_number+0x7f/0xe0 [201988.443636] [] tcp_v4_connect+0x2b9/0x460 [201988.443638] [] __inet_stream_connect+0xb2/0x310 [201988.443640] [] ? preempt_count_sub+0xa1/0x100 [201988.443642] [] ? lock_sock_nested+0x31/0x90 [201988.443644] [] ? __local_bh_enable_ip+0x6f/0xd0 [201988.443646] [] inet_stream_connect+0x38/0x50 [201988.443647] [] SyS_connect+0x7b/0xf0 [201988.443649] [] ? sock_alloc_file+0xa5/0x140 [201988.443651] [] ? trace_hardirqs_on_thunk+0x1a/0x1c [201988.443652] [] entry_SYSCALL_64_fastpath+0x1f/0xbd [201988.443654] client_socket_t R running task0 14563 1 0x [201988.443656] 8800c427a900 e1b77832 880217603da0 810bf66a [201988.443658] 810bf5d1 8800c427a900 81e566c0 880217603dd0 [201988.443660] 8119a3cf 8802177d80c0 81e566c0 81f89ae0 [201988.443662] Call Trace: [201988.443663][] sched_show_task+0xfa/0x160 [201988.443665] [] ? sched_show_task+0x61/0x160 [201988.443666] [] rcu_print_detail_task_stall_rnp+0x52/0x76 [201988.443668] [] rcu_check_callbacks+0x89f/0x9e0 [201988.443669] [] update_process_times+0x39/0x60 [201988.443671] [] tick_sched_handle.isra.5+0x21/0x60 [201988.443672] [] tick_sched_timer+0x42/0x70 [201988.443674] [] __hrtimer_run_queues+0x140/0x3c0 [201988.443675] [] ? tick_sched_handle.isra.5+0x60/0x60 [201988.443677] [] hrtimer_interrupt+0xb3/0x1c0 [201988.443679] [] local_apic_timer_interrupt+0x36/0x60 [201988.443680] [] smp_apic_timer_interrupt+0x3d/0x50 [201988.443682] [] apic_timer_interrupt+0x8c/0xa0 [201988.443682][] ? __nf_conntrack_find_get+0x285/0x420 [201988.443685] [] ? nf_conntrack_in+0x1d1/0x8d0 [201988.443686] [] nf_conntrack_in+0x1d1/0x8d0 [201988.443688] [] ipv4_conntrack_local+0x45/0x50 [201988.443689] [] nf_iterate+0x62/0x80 [201988.443691] [] nf_hook_slow+0xa0/0x110 [201988.443692] [] ? nf_hook_slow+0x5/0x110 [201988.443694] [] __ip_local_out+0xd8/0x120 [201988.443696] [] ? 
ip_forward_options+0x1f0/0x1f0 [201988.443697] [] ip_local_out+0x1c/0x70 [201988.443699] [] ip_queue_xmit+0x18f/0x450 [201988.443700] [] ? ip_queue_xmit+0x5/0x450 [201988.443702] [] tcp_transmit_skb+0x48b/0x8e0 [201988.443703] [] tcp_connect+0x629/0x830 [201988.443705] [] ? secure_tcp_sequence_number+0x7f/0xe0 [201988.443706] [] tcp_v4_connect+0x2b9/0x460 [201988.443708] [] __inet_stream_connect+0xb2/0x310 [201988.443710] [] ? preempt_count_sub+0xa1/0x100 [201988.443711] [] ? lock_sock_nested+0x31/0x90 [201988.443713] [] ? __local_bh_enable_ip+0x6f/0xd0 [201988.443715] [] inet_stream_connect+0x38/0x50 [201988.443716] [] SyS_connect+0x7b/0xf0 [201988.443718] [] ? sock_alloc_file+0xa5/0x140 [201988.443719] [] ? trace_hardirqs_on_thunk+0x1a/0x1c [201988.443720] [] entry_SYSCALL_64_fastpath+0x1f/0xbd [202168.442569] INFO: rcu_preempt detected stalls on CPUs/tasks: [202168.442572] Tasks
Re: 4.7.0-rc7 ext4 error in dx_probe
On Fri, Aug 05, 2016 at 10:02:28AM -0700, Darrick J. Wong wrote: > On Fri, Aug 05, 2016 at 12:35:44PM +0200, Johannes Stezenbach wrote: > > On Wed, Aug 03, 2016 at 05:50:26PM +0300, Török Edwin wrote: > > > I have just encountered a similar problem after I've recently upgraded to > > > 4.7.0: > > > [Wed Aug 3 11:08:57 2016] EXT4-fs error (device dm-1): dx_probe:740: > > > inode #13295: comm python: Directory index failed checksum > > > [Wed Aug 3 11:08:57 2016] Aborting journal on device dm-1-8. > > > [Wed Aug 3 11:08:57 2016] EXT4-fs (dm-1): Remounting filesystem read-only > > > [Wed Aug 3 11:08:57 2016] EXT4-fs error (device dm-1): > > > ext4_journal_check_start:56: Detected aborted journal > > > > It just happened again to me, this time hitting /usr/sbin/ > > on root fs. Meanwhile I ran memtest86 7.0 for two nights, > > it didn't find anything. I'm using hibernate regularly > > and I think so this only happened after a few hibernate/resume > > cycles, but no idea if that means anything. > > Now I'm back at 4.4.16 to see if it reproduces. > > When you're back on 4.7, can you apply this patch[1] to see if it fixes > the problem? I speculate that the new parallel dir lookup code enables > multiple threads to be verifying the same directory block buffer at the > same time. > > [1] > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ext4/inode.c?id=b47820edd1634dc1208f9212b7ecfb4230610a23 I added the patch, rebuilt and rebooted. It will take some time before I report back since the issue is so hard to reproduce. Thanks, Johannes
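Darrick's hypothesis, that two threads verify the same buffer at once, can be illustrated with a small self-contained C sketch. This is not the ext4 code. It uses FNV-1a as a stand-in for crc32c, a made-up block layout with a 4-byte checksum at a caller-chosen offset, and hypothetical helper names. It contrasts a verifier that temporarily zeroes the stored checksum inside the shared buffer with one that never writes to the buffer, which is the idea the referenced commit's title describes ("avoid modifying checksum fields directly during checksum verification").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SEED 2166136261u

/* Toy 32-bit checksum (FNV-1a), a stand-in for crc32c. Processing is
 * sequential, so checksumming pieces in order with a chained h equals
 * checksumming their concatenation. */
static uint32_t toy_csum(uint32_t h, const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Store the checksum of buf (computed with the field zeroed) at off. */
static void seal(uint8_t *buf, size_t len, size_t off)
{
    uint32_t c;
    assert(off + sizeof(c) <= len);
    memset(buf + off, 0, sizeof(c));
    c = toy_csum(TOY_SEED, buf, len);
    memcpy(buf + off, &c, sizeof(c));
}

/* Racy pattern: zero the stored checksum inside the shared buffer,
 * compute, then restore. A second thread verifying the same buffer in
 * the meantime may read the zeroed (or half-restored) field and report
 * a bogus mismatch. */
static int verify_inplace(uint8_t *buf, size_t len, size_t off)
{
    uint32_t stored, calc;
    assert(off + sizeof(stored) <= len);
    memcpy(&stored, buf + off, sizeof(stored));
    memset(buf + off, 0, sizeof(stored));       /* writes to the shared buffer */
    calc = toy_csum(TOY_SEED, buf, len);
    memcpy(buf + off, &stored, sizeof(stored)); /* restore */
    return calc == stored;
}

/* Read-only pattern: checksum the bytes before the field, a zero
 * placeholder, then the bytes after it. The buffer is never written. */
static int verify_readonly(const uint8_t *buf, size_t len, size_t off)
{
    static const uint8_t zero[sizeof(uint32_t)];
    uint32_t stored, h;
    assert(off + sizeof(zero) <= len);
    memcpy(&stored, buf + off, sizeof(stored));
    h = toy_csum(TOY_SEED, buf, off);
    h = toy_csum(h, zero, sizeof(zero));
    h = toy_csum(h, buf + off + sizeof(zero), len - off - sizeof(zero));
    return h == stored;
}
```

Single-threaded, both verifiers agree. The difference only matters when several readers share one buffer, because only verify_readonly is safe to run concurrently without extra locking.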
Re: 4.7.0-rc7 ext4 error in dx_probe
On Wed, Aug 03, 2016 at 05:50:26PM +0300, Török Edwin wrote: > I have just encountered a similar problem after I've recently upgraded to > 4.7.0: > [Wed Aug 3 11:08:57 2016] EXT4-fs error (device dm-1): dx_probe:740: inode > #13295: comm python: Directory index failed checksum > [Wed Aug 3 11:08:57 2016] Aborting journal on device dm-1-8. > [Wed Aug 3 11:08:57 2016] EXT4-fs (dm-1): Remounting filesystem read-only > [Wed Aug 3 11:08:57 2016] EXT4-fs error (device dm-1): > ext4_journal_check_start:56: Detected aborted journal > > I've rebooted in single-user mode, fsck fixed the filesystem, and rebooted, > filesystem is rw again now. > > inode #13295 seems to be this and I can list it now: > stat /usr/lib64/python3.4/site-packages > File: '/usr/lib64/python3.4/site-packages' > Size: 12288 Blocks: 24 IO Block: 4096 directory > Device: fd01h/64769d Inode: 13295 Links: 180 > Access: (0755/drwxr-xr-x) Uid: (0/root) Gid: (0/root) > Access: 2016-05-09 11:29:44.056661988 +0300 > Modify: 2016-08-01 00:34:24.029779875 +0300 > Change: 2016-08-01 00:34:24.029779875 +0300 > Birth: - > > The filesystem was /, I only noticed it was readonly after several hours when > I tried to install something: > /dev/mapper/vg--ssd-root on / type ext4 > (rw,noatime,errors=remount-ro,data=ordered) > > $ uname -a > Linux bolt 4.7.0-gentoo-rr #1 SMP Thu Jul 28 11:28:56 EEST 2016 x86_64 AMD > FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux > > FWIW I've been using ext4 for years and this is the first time I see this > message. > Prior to 4.7 I was on 4.6.1 -> 4.6.2 -> 4.6.3 -> 4.6.4. > > The kernel is from gentoo-sources + a patch for enabling AMD LWP (I had that > patch since 4.6.3 and its not related to I/O). > > If I see this message again what should I do to obtain more information to > trace down the root cause? It just happened again to me, this time hitting /usr/sbin/ on root fs. Meanwhile I ran memtest86 7.0 for two nights; it didn't find anything.
I'm using hibernate regularly and I think this only happened after a few hibernate/resume cycles, but I have no idea if that means anything. Now I'm back on 4.4.16 to see if it reproduces. Johannes
Re: 4.7.0-rc7 ext4 error in dx_probe
On Wed, Aug 03, 2016 at 05:50:26PM +0300, Török Edwin wrote:
> I have just encountered a similar problem after I've recently upgraded to
> 4.7.0:
> [Wed Aug 3 11:08:57 2016] EXT4-fs error (device dm-1): dx_probe:740: inode
> #13295: comm python: Directory index failed checksum
> [Wed Aug 3 11:08:57 2016] Aborting journal on device dm-1-8.
> [Wed Aug 3 11:08:57 2016] EXT4-fs (dm-1): Remounting filesystem read-only
> [Wed Aug 3 11:08:57 2016] EXT4-fs error (device dm-1):
> ext4_journal_check_start:56: Detected aborted journal
>
> I've rebooted in single-user mode, fsck fixed the filesystem, and rebooted,
> filesystem is rw again now.
>
> inode #13295 seems to be this and I can list it now:
> stat /usr/lib64/python3.4/site-packages
> File: '/usr/lib64/python3.4/site-packages'
> Size: 12288 Blocks: 24 IO Block: 4096 directory
> Device: fd01h/64769d Inode: 13295 Links: 180
> Access: (0755/drwxr-xr-x) Uid: (0/root) Gid: (0/root)
> Access: 2016-05-09 11:29:44.056661988 +0300
> Modify: 2016-08-01 00:34:24.029779875 +0300
> Change: 2016-08-01 00:34:24.029779875 +0300
> Birth: -
>
> The filesystem was /, I only noticed it was readonly after several hours when
> I tried to install something:
> /dev/mapper/vg--ssd-root on / type ext4
> (rw,noatime,errors=remount-ro,data=ordered)
>
> $ uname -a
> Linux bolt 4.7.0-gentoo-rr #1 SMP Thu Jul 28 11:28:56 EEST 2016 x86_64 AMD
> FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux
>
> FWIW I've been using ext4 for years and this is the first time I see this
> message.
> Prior to 4.7 I was on 4.6.1 -> 4.6.2 -> 4.6.3 -> 4.6.4.
>
> The kernel is from gentoo-sources + a patch for enabling AMD LWP (I had that
> patch since 4.6.3 and its not related to I/O).
>
> If I see this message again what should I do to obtain more information to
> trace down the root cause?

It just happened again to me, this time hitting /usr/sbin/ on the root fs.
Meanwhile I ran memtest86 7.0 for two nights and it didn't find anything.

I'm using hibernate regularly and I think this only happened after
a few hibernate/resume cycles, but I have no idea if that means anything.
Now I'm back on 4.4.16 to see if it reproduces.

Johannes
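Török only noticed hours later that / had been remounted read-only. A minimal sketch of a check that could be run periodically (from cron or a monitoring script, say); it assumes the standard /proc/mounts layout of six whitespace-separated fields (device, mountpoint, fstype, options, dump, pass) and that the kernel's errors=remount-ro action shows up as the plain `ro` option. The function names are hypothetical, not from the thread.

```python
def mount_options(proc_mounts_text, mountpoint):
    """Return the option list for `mountpoint` from /proc/mounts-style text, or None."""
    found = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # If something is mounted over the same path, the last entry is the visible one.
            found = fields[3].split(",")
    return found


def is_read_only(proc_mounts_text, mountpoint):
    """True if the mount carries the exact option "ro" (errors=remount-ro does not count)."""
    opts = mount_options(proc_mounts_text, mountpoint)
    if opts is None:
        raise KeyError(f"{mountpoint} is not mounted")
    return "ro" in opts
```

In practice one would pass it the contents of `open("/proc/mounts").read()` and alert when `is_read_only(text, "/")` becomes true.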
Re: 4.7.0-rc7 ext4 error in dx_probe
On Mon, Jul 18, 2016 at 04:17:23PM +0200, Johannes Stezenbach wrote:
> On Mon, Jul 18, 2016 at 09:38:43AM -0400, Theodore Ts'o wrote:
> > On Mon, Jul 18, 2016 at 12:57:07PM +0200, Johannes Stezenbach wrote:
> > >
> > > I'm running 4.7.0-rc7 with ext4 on lvm on dm-crypt on SSD
> > > and out of the blue on idle machine the following error
> > > message appeared:
> > >
> > > [373851.683131] EXT4-fs (dm-3): error count since last fsck: 1
> > > [373851.683151] EXT4-fs (dm-3): initial error at time 1468438194:
> > > dx_probe:740: inode 22288562
> > > [373851.683158] EXT4-fs (dm-3): last error at time 1468438194:
> > > dx_probe:740: inode 22288562
> > >
> > > inode 22288562 is a directory with ~800 small files in it,
> > > but AFAICT nothing was accessing it, no cron job running etc.
> > > No further error message was logged. Accessing the directory
> > > and the files in it also gives no further errors.

FWIW, now with 4.7.0 and errors=remount-ro it just happened again
during a git update (actually "repo sync -ld" of the AOSP/cm repository).
Again a directory with 321 small files. ls on the ro fs after the error
listed the directory without problems. Fsck fixed wrong inode and wrong
free block count. ls after fsck still listed the directory and
"git status" reported it as clean.

[72173.126740] EXT4-fs error (device dm-3): dx_probe:740: inode #12327817: comm git: Directory index failed checksum
[72173.131346] Aborting journal on device dm-3-8.
[72173.135884] EXT4-fs (dm-3): Remounting filesystem read-only

Since I upgraded the RAM from 4G to 8G not long ago, I suspect
that could be the root of the issue, although this RAM was taken
from another machine (which I had upgraded from 4G to 12G and
now downgraded to 8G) where it worked for ~2 years, also with
AOSP stuff.

Sigh...
Johannes
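With several of these errors piling up across machines (dm-3 here, dm-1 in Török's report), it can help to pull the device, function, source line, inode and command out of the logs mechanically. A sketch, assuming the message format exactly as quoted in this thread; the regex and helper name are hypothetical. The resulting inode numbers could then be mapped back to paths with debugfs's `ncheck` command.

```python
import re

# Pattern derived from the lines quoted in this thread, e.g.
# "EXT4-fs error (device dm-3): dx_probe:740: inode #12327817: comm git: Directory index failed checksum"
EXT4_INODE_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): "
    r"comm (?P<comm>[^:]+): "
    r"(?P<msg>.*)"  # '.' stops at the end of the log line
)


def parse_ext4_errors(log_text):
    """Yield one dict per per-inode EXT4-fs error found in dmesg/journal text."""
    for m in EXT4_INODE_ERROR.finditer(log_text):
        rec = m.groupdict()
        rec["line"] = int(rec["line"])
        rec["inode"] = int(rec["inode"])
        yield rec
```

Lines without an inode (such as "Aborting journal on device dm-3-8.") are simply skipped.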
Re: 4.7.0-rc7 ext4 error in dx_probe
On Mon, Jul 18, 2016 at 09:38:43AM -0400, Theodore Ts'o wrote:
> On Mon, Jul 18, 2016 at 12:57:07PM +0200, Johannes Stezenbach wrote:
> >
> > I'm running 4.7.0-rc7 with ext4 on lvm on dm-crypt on SSD
> > and out of the blue on idle machine the following error
> > message appeared:
> >
> > [373851.683131] EXT4-fs (dm-3): error count since last fsck: 1
> > [373851.683151] EXT4-fs (dm-3): initial error at time 1468438194:
> > dx_probe:740: inode 22288562
> > [373851.683158] EXT4-fs (dm-3): last error at time 1468438194:
> > dx_probe:740: inode 22288562
> >
> > inode 22288562 is a directory with ~800 small files in it,
> > but AFAICT nothing was accessing it, no cron job running etc.
> > No further error message was logged. Accessing the directory
> > and the files in it also gives no further errors.
>
> Yes, thes messages gets printed once a day in case there was a file
> system corruption detected earlier. The problem is people
> unfortunately run with their file systems set to errors=continue,
> which I sometimes refer to as the "don't worry, be happy" option. The
[snip]

I've not willingly done this, but I recently upgraded to a bigger SSD
and so created a new file system. The errors= mount option isn't
specified, so it uses the default from the superblock, and mkfs.ext4
defaulted to "Errors behavior: Continue" according to dumpe2fs -h.
I'm using Debian sid FWIW; I just checked the source of e2fsprogs-1.43.1
and found:

#define EXT2_ERRORS_DEFAULT EXT2_ERRORS_CONTINUE

During reboot after the crash I saw the usual "Clearing orphaned inode"
messages scroll by; however, they did not make it into the systemd
journal. So I suspect that if there were any other fsck errors during
boot, they were lost too, thanks to systemd-fsck.

Thanks for your detailed reply.
Johannes
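The situation described above (no errors= mount option, so the superblock default "Continue" applies) can be checked mechanically. A sketch, assuming the `dumpe2fs -h` header uses the wordings "Continue", "Remount read-only" and "Panic" and that an explicit errors= mount option overrides the superblock value, with the last one winning if repeated; the helper names are hypothetical. The persistent default itself can be changed with `tune2fs -e remount-ro <device>`.

```python
def superblock_errors_behavior(dumpe2fs_header):
    """Extract the 'Errors behavior:' value from `dumpe2fs -h` output, or None."""
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Errors behavior":
            return value.strip()
    return None


# Assumed mapping from mount-option spelling to dumpe2fs wording.
MOUNT_TO_SB = {
    "continue": "Continue",
    "remount-ro": "Remount read-only",
    "panic": "Panic",
}


def effective_errors_behavior(dumpe2fs_header, mount_options):
    """An explicit errors= mount option overrides the superblock default."""
    chosen = None
    for opt in mount_options.split(","):
        if opt.startswith("errors="):
            chosen = opt[len("errors="):]  # a later errors= overrides an earlier one
    if chosen is not None:
        return MOUNT_TO_SB[chosen]
    return superblock_errors_behavior(dumpe2fs_header)
```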
4.7.0-rc7 ext4 error in dx_probe
Hi,

I'm running 4.7.0-rc7 with ext4 on lvm on dm-crypt on SSD
and out of the blue on idle machine the following error
message appeared:

[373851.683131] EXT4-fs (dm-3): error count since last fsck: 1
[373851.683151] EXT4-fs (dm-3): initial error at time 1468438194: dx_probe:740: inode 22288562
[373851.683158] EXT4-fs (dm-3): last error at time 1468438194: dx_probe:740: inode 22288562

inode 22288562 is a directory with ~800 small files in it,
but AFAICT nothing was accessing it, no cron job running etc.
No further error message was logged. Accessing the directory
and the files in it also gives no further errors.

Searching back in the log at the time given by date -d @1468438194
I found:

Jul 13 21:29:54 foo kernel: EXT4-fs error (device dm-3): dx_probe:740: inode #22288562: comm git: Directory index failed checksum

Time to run fsck?
Is it the consequence of a previous crash (I had many recently)?

Johannes
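The "error count since last fsck" summary only gives a Unix timestamp, which `date -d @...` converts to local time. The same lookup in Python, as a sketch assuming the summary format quoted above (helper name hypothetical): 1468438194 is 2016-07-13 19:29:54 UTC, which matches the journal's 21:29:54 if the local clock was presumably on CEST (UTC+2).

```python
import re
from datetime import datetime, timezone

# Matches e.g. "EXT4-fs (dm-3): initial error at time 1468438194: dx_probe:740: inode 22288562"
SUMMARY = re.compile(
    r"EXT4-fs \((?P<device>[^)]+)\): (?P<which>initial|last) error at time "
    r"(?P<epoch>\d+): (?P<func>\w+):(?P<line>\d+): inode (?P<inode>\d+)"
)


def parse_error_summary(text):
    """Parse one periodic ext4 error summary line; return None if it doesn't match."""
    m = SUMMARY.search(text)
    if m is None:
        return None
    return {
        "device": m["device"],
        "which": m["which"],
        # The kernel stores seconds since the epoch; render it in UTC.
        "when": datetime.fromtimestamp(int(m["epoch"]), tz=timezone.utc),
        "func": m["func"],
        "line": int(m["line"]),
        "inode": int(m["inode"]),
    }
```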
Re: 4.6.2 frequent crashes under memory + IO pressure
(adding back Cc:, just dropped it to send the logs)

On Mon, Jun 27, 2016 at 01:35:14AM +0900, Tetsuo Handa wrote:
>
> It seems to me that GFP_NOIO allocation requests are depleting memory reserves
> because they are passing ALLOC_NO_WATERMARKS to get_page_from_freelist().
> But I'm not familiar with block layer / swap I/O operation. So, will you post
> to linux-mm ML for somebody else to help you?

Frankly I don't care that much about 4.6.y when 4.7 is fixed.
Or maybe the root issue is not fixed but the new oom code covers it.
Below I see both dm and kcryptd, so it is no surprise that using
swap on lvm on dm-crypt triggers it. Maybe it's not a new issue
in 4.6, just some random variation that makes it trigger more
easily with my particular workload.
So, unless you would like to keep going at it, I'd like to put
the issue to rest.

> kswapd0(766) 0x2201200
> 0x81167522 : get_page_from_freelist+0x0/0x82b [kernel]
> 0x81168127 : __alloc_pages_nodemask+0x3da/0x978 [kernel]
> 0x8119fb2a : new_slab+0xbc/0x3bb [kernel]
> 0x811a1acd : ___slab_alloc.constprop.22+0x2fb/0x37b [kernel]
> 0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
> 0x810c502d : put_lock_stats.isra.9+0xe/0x20 [kernel] (inexact)
> 0x811a1ba4 : __slab_alloc.isra.17.constprop.21+0x57/0x8b [kernel] (inexact)
> 0x811a1ba4 : __slab_alloc.isra.17.constprop.21+0x57/0x8b [kernel] (inexact)
> 0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
> 0x811a1c78 : kmem_cache_alloc+0xa0/0x1d6 [kernel] (inexact)
> 0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
> 0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
> 0x81162b7a : mempool_alloc+0x72/0x154 [kernel] (inexact)
> 0x810c6438 : __lock_acquire.isra.16+0x55e/0xb4c [kernel] (inexact)
> 0x8133fdc1 : bio_alloc_bioset+0xe8/0x1d7 [kernel] (inexact)
> 0x816342ea : alloc_tio+0x2d/0x47 [kernel] (inexact)
> 0x8163587e : __split_and_process_bio+0x310/0x3a3 [kernel] (inexact)
> 0x81635e15 : dm_make_request+0xb5/0xe2 [kernel] (inexact)
> 0x81347ae7 : generic_make_request+0xcc/0x180 [kernel] (inexact)
> 0x81347c98 : submit_bio+0xfd/0x145 [kernel] (inexact)
>
> kswapd0(766) 0x2201200
> 0x81167522 : get_page_from_freelist+0x0/0x82b [kernel]
> 0x81168127 : __alloc_pages_nodemask+0x3da/0x978 [kernel]
> 0x8119fb2a : new_slab+0xbc/0x3bb [kernel]
> 0x811a1acd : ___slab_alloc.constprop.22+0x2fb/0x37b [kernel]
> 0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
> 0x81640e29 : kcryptd_queue_crypt+0x63/0x68 [kernel] (inexact)
> 0x811a1ba4 : __slab_alloc.isra.17.constprop.21+0x57/0x8b [kernel] (inexact)
> 0x811a1ba4 : __slab_alloc.isra.17.constprop.21+0x57/0x8b [kernel] (inexact)
> 0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
> 0x811a1c78 : kmem_cache_alloc+0xa0/0x1d6 [kernel] (inexact)
> 0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
> 0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
> 0x81162b7a : mempool_alloc+0x72/0x154 [kernel] (inexact)
> 0x8101f5ba : sched_clock+0x9/0xd [kernel] (inexact)
> 0x810ae420 : local_clock+0x20/0x22 [kernel] (inexact)
> 0x8133fdc1 : bio_alloc_bioset+0xe8/0x1d7 [kernel] (inexact)
> 0x811983d0 : end_swap_bio_write+0x0/0x6a [kernel] (inexact)
> 0x8119854b : get_swap_bio+0x25/0x6c [kernel] (inexact)
> 0x811983d0 : end_swap_bio_write+0x0/0x6a [kernel] (inexact)
> 0x811988ef : __swap_writepage+0x1a9/0x225 [kernel] (inexact)
>
> >
> > > # ~/systemtap.tmp/bin/stap -e 'global traces_bt[65536];
> > > probe begin { printf("Probe start!\n"); }
> > > function dump_if_new(mask:long) {
> > >   bt = backtrace();
> > >   if (traces_bt[bt]++ == 0) {
> > >     printf("%s(%u) 0x%lx\n", execname(), pid(), mask);
> > >     print_backtrace();
> > >     printf("\n");
> > >   }
> > > }
> > > probe kernel.function("get_page_from_freelist") { if ($alloc_flags & 0x4) dump_if_new($gfp_mask); }
> > > probe kernel.function("gfp_pfmemalloc_allowed").return { if ($return != 0) dump_if_new($gfp_mask); }
> > > probe end { delete traces_bt; }'
> > ...
> > > # addr2line -i -e /usr/src/linux-4.6.2/vmlinux 0x811b9c82
> > > /usr/src/linux-4.6.2/mm/memory.c:1162
> > > /usr/src/linux-4.6.2/mm/memory.c:1241
> > > /usr/src/linux-4.6.2/mm/memory.c:1262
> > > /usr/src/linux-4.6.2/mm/memory.c:1283
> >
> > I'm attaching both the stap output and the serial console log,
> > not sure what you're looking for with addr2line. Let me know.
>
> I just meant how to find location in source code from addresses.

I meant the log is so large that I wouldn't know which addresses
would be interesting to look up.

Thanks,
Johannes
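Since the stap log is too large to pick addresses by eye, one way to narrow it down is to parse it and count which functions recur. A sketch, assuming the raw (unquoted) stap output has the shape produced by the script above: a `comm(pid) 0xmask` header from its printf, followed by print_backtrace() frames. As far as I understand systemtap, frames tagged "(inexact)" come from scanning the stack and may be spurious, so only exact frames are counted by default. All names here are hypothetical.

```python
import re
from collections import Counter

HEADER = re.compile(r"^(?P<comm>\S+)\((?P<pid>\d+)\) 0x(?P<mask>[0-9a-f]+)$")
FRAME = re.compile(
    r"^0x(?P<addr>[0-9a-f]+) : (?P<sym>[^+\s]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r" \[(?P<module>[^\]]+)\](?P<inexact> \(inexact\))?$"
)


def parse_stap_traces(text):
    """Split stap output into (comm, pid, gfp_mask, frames) records.

    Each frame is (address, symbol, exact), where exact is False for "(inexact)" frames.
    """
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        h = HEADER.match(line)
        if h:
            current = (h["comm"], int(h["pid"]), int(h["mask"], 16), [])
            records.append(current)
            continue
        f = FRAME.match(line)
        if f and current is not None:
            current[3].append((int(f["addr"], 16), f["sym"], f["inexact"] is None))
    return records


def top_symbols(records, exact_only=True, n=10):
    """Count how often each symbol appears across all recorded backtraces."""
    counts = Counter(
        sym
        for _, _, _, frames in records
        for _, sym, exact in frames
        if exact or not exact_only
    )
    return counts.most_common(n)
```

The addresses of the most frequent exact frames could then be fed to `addr2line -i -e vmlinux`, as in Tetsuo's example.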
Re: 4.6.2 frequent crashes under memory + IO pressure
On Sun, Jun 26, 2016 at 02:04:40AM +0900, Tetsuo Handa wrote:
> It seems to me that somebody is using ALLOC_NO_WATERMARKS (with possibly
> __GFP_NOWARN), but I don't know how to identify such callers. Maybe print
> backtrace from __alloc_pages_slowpath() when ALLOC_NO_WATERMARKS is used?

Wouldn't this create too much output for the slow serial console?
Or is this case supposed to be triggered rarely?

This crash testing is pretty painful, but I can try it tomorrow
if there is no better idea.

Johannes
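A back-of-envelope estimate of why serial console output is a concern. The thread does not say which baud rate was used, so this assumes 115200 baud with 8N1 framing (1 start + 8 data + 1 stop bit, i.e. 10 bits per character). On that assumption the ~1.7MB log mentioned in this thread takes roughly 148 seconds to transmit, and a single ~3 KB backtrace about a quarter of a second, so printing one on every ALLOC_NO_WATERMARKS allocation could easily fall behind.

```python
def serial_seconds(num_bytes, baud=115200, bits_per_char=10):
    """Seconds needed to push num_bytes through a UART.

    With 8N1 framing each character costs 10 bit times, so throughput
    is baud / bits_per_char characters per second (11520 B/s at 115200).
    """
    return num_bytes * bits_per_char / baud
```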
Re: 4.6.2 frequent crashes under memory + IO pressure
On Thu, Jun 23, 2016 at 08:26:35PM +0900, Tetsuo Handa wrote:
>
> Since you think you saw OOM messages with the older kernels, I assume that
> the OOM killer was invoked on your 4.6.2 kernel. The OOM reaper in Linux 4.6
> and Linux 4.7 will not help if the OOM killed process was between
> down_write(&mm->mmap_sem) and up_write(&mm->mmap_sem).
>
> I was not able to confirm whether the OOM killed process (I guess it was java)
> was holding mm->mmap_sem for write, for /proc/sys/kernel/hung_task_warnings
> dropped to 0 before traces of java threads are printed or console became
> unusable due to the "delayed: kcryptd_crypt, ..." line. Anyway, I think that
> kmallocwd will report it.
>
> > > It is sad that we haven't merged kmallocwd which will report
> > > which memory allocations are stalling
> > > (
> > > http://lkml.kernel.org/r/1462630604-23410-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
> > > ).
> >
> > Would you like me to try it? It wouldn't prevent the hang, though,
> > just print better debug ouptut to serial console, right?
> > Or would it OOM kill some process?
>
> Yes, but for bisection purpose, please try commit 78ebc2f7146156f4 without
> applying kmallocwd. If that commit helps avoiding flood of the allocation
> failure warnings, we can consider backporting it. If that commit does not
> help, I think you are reporting a new location which we should not use
> memory reserves.
>
> kmallocwd will not OOM kill some process. kmallocwd will not prevent the hang.
> kmallocwd just prints information of threads which are stalling inside memory
> allocation request.

First I tried today's git, linux-4.7-rc4-187-g086e3eb, and the good news
is that the oom killer seems to work very well and reliably killed the
offending task (java). It happened a few times: the AOSP build broke
and I restarted it until it completed. E.g.:

[ 2083.604374] Purging GPU memory, 0 pages freed, 4508 pages still pinned.
[ 2083.611000] 96 and 0 pages still available in the bound and unbound GPU page lists.
[ 2083.618815] make invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[ 2083.629257] make cpuset=/ mems_allowed=0
...
[ 2084.688753] Out of memory: Kill process 10431 (java) score 378 or sacrifice child
[ 2084.696593] Killed process 10431 (java) total-vm:5200964kB, anon-rss:2521764kB, file-rss:0kB, shmem-rss:0kB
[ 2084.938058] oom_reaper: reaped process 10431 (java), now anon-rss:0kB, file-rss:8kB, shmem-rss:0kB

Next I tried 4.6.2 with 78ebc2f7146156f4, then with kmallocwd (which
needed one manual fixup), then with both patches. It still livelocked
in all cases; the log spew looked a bit different with 78ebc2f7146156f4
applied but still continued endlessly. kmallocwd alone didn't trigger;
with both patches applied kmallocwd triggered, but:

[ 363.815595] MemAlloc-Info: stalling=33 dying=0 exiting=42 victim=0 oom_count=0
[ 363.815601] MemAlloc: kworker/0:0(4) flags=0x4208860 switches=212 seq=1 gfp=0x26012c0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_NOTRACK) order=0 delay=17984
** 1402 printk messages dropped **
[ 363.818816] [] __do_page_cache_readahead+0x144/0x29d
** 501 printk messages dropped **

I'll zip up the logs and send them off-list.

Thanks,
Johannes
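The oom-killer output above can be read a little more concretely: the kill line shows java with 2521764 kB (about 2.4 GiB) of anonymous memory, and the triggering request from make was order=2, i.e. 2^2 = 4 physically contiguous pages, or 16 KiB assuming 4 KiB pages. A sketch of both calculations, assuming the "Killed process" format shown above; the helper names are hypothetical.

```python
import re

KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB, shmem-rss:(?P<shmem_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Parse an oom-killer "Killed process" line; sizes stay in kB as printed."""
    m = KILLED.search(line)
    if m is None:
        return None
    rec = {k: int(v) for k, v in m.groupdict().items() if k != "comm"}
    rec["comm"] = m["comm"]
    return rec


def order_bytes(order, page_size=4096):
    """Size of a page-allocator request of the given order: 2**order contiguous pages."""
    return (1 << order) * page_size
```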
Re: 4.6.2 frequent crashes under memory + IO pressure
On Tue, Jun 21, 2016 at 08:47:51PM +0900, Tetsuo Handa wrote:
> Johannes Stezenbach wrote:
> >
> > a man's got to have a hobby, thus I'm running Android AOSP
> > builds on my home PC which has 4GB of RAM, 4GB swap.
> > Apparently it is not really adequate for the job but used to
> > work with a 4.4.10 kernel. Now I upgraded to 4.6.2
> > and it crashes usually within 30mins during compilation.
>
> Such reproducer is welcomed.
> You might be hitting OOM livelock using innocent workload.
>
> > The crash is a hard hang, mouse doesn't move, no reaction
> > to keyboard, nothing in logs (systemd journal) after reboot.
>
> Yes, it seems to me that your system is OOM livelocked.

I got from my crash log that X is hanging in
i915_gem_object_get_pages_gtt, and the network is dead due to
order 0 allocation errors causing a series of
"ath9k_htc: RX memory allocation error", which is what makes
the issue so unpleasant.

The particular command which triggers it seems to be Jill
from the Android Java toolchain
(http://tools.android.com/tech-docs/jackandjill), which runs as
"java -Xmx3500m -jar $(JILL_JAR)", i.e. potentially eating all
my available RAM when linking the Android framework.

Meanwhile I found some RAM, and linux-4.6.2 runs stable with 8GB
for this workload. The build time (for the partial AOSP rebuild
that fairly reliably triggered the hangup) dropped from ~20min
to ~17min (so it wasn't thrashing too badly), and swap usage
dropped from ~50% (of 4GB) to <5%.

> It is sad that we haven't merged kmallocwd which will report
> which memory allocations are stalling
> (
> http://lkml.kernel.org/r/1462630604-23410-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
> ).

Would you like me to try it? It wouldn't prevent the hang, though,
just print better debug output to the serial console, right?
Or would it OOM kill some process?

> > Then I tried 4.5.7, it seems to be stable so far.
> >
> > I'm using dm-crypt + lvm + ext4 (swap also in lvm).
> >
> > Now I hooked up a laptop to the serial port and captured
> > some logs of the crash which seems to be repeating
> >
> > [ 2240.842567] swapper/3: page allocation failure: order:0,
> > mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK)
> > or
> > [ 2241.167986] SLUB: Unable to allocate memory on node -1,
> > gfp=0x2080020(GFP_ATOMIC)
> >
> > over and over. Based on the backtraces in the log I decided
> > to hot-unplug USB devices, and twice the kernel came
> > back to live, but on the 3rd crash it was dead for good.
>
> The values
>
> DMA free:12kB min:32kB
> DMA32 free:2268kB min:6724kB
> Normal free:84kB min:928kB
>
> suggest that memory reserves are spent for pointless purpose. Maybe your
> system is falling into situation which was mitigated by commit
> 78ebc2f7146156f4 ("mm,writeback: don't use memory reserves for
> wb_start_writeback"). Thus, applying that commit to your 4.6.2 kernel
> might help avoiding flood of these allocation failure messages.

I could try. Could you let me know if booting with mem=4G is
equivalent, or do I need to use memmap= or physically remove the
RAM (which is not so easy since the CPU fan is in the way)?

> > Before I pressed the reset button I used SysRq-W. At the bottom
> > is a "BUG: workqueue lockup", it could be the result of
> > the log spew on serial console taking so long but it looks
> > like some IO is never completing.
>
> But even after you apply that commit, I guess you will still see silent hang
> up because the page allocator would think there is still reclaimable memory.
> So, is it possible to also try current linux.git kernels? I'd like to know
> whether "OOM detection rework" (which went to 4.7) helps giving up reclaiming
> and invoking the OOM killer with your workload.
>
> Maybe __GFP_FS allocations start invoking the OOM killer. But maybe __GFP_FS
> allocations still remain stuck waiting for !__GFP_FS allocations whereas
> !__GFP_FS allocations gives up without invoking the OOM killer (i.e.
> effectively no "give up").

I could also try. Same question about mem= though.

What is your opinion about older kernels (4.4, 4.5) working?
I think I've seen some OOM messages with the older kernels:
Jill was killed and I restarted the build to complete it.
A full bisect would take more than a day; I don't think
I have the time for it.
Since I use dm-crypt + lvm, should we add more Cc or do
you think it is an mm issue?

> > Below I'm pasting some log snippets, let me know if you like
> > it so much you want more of it ;-/ The total log is about 1.7MB.
>
> Yes, I'd like to browse it. Could you send it to me?

Did you get any additional insights from it?

Thanks,
Johannes
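Tetsuo's reading of the zone values is that every zone had dropped below its min watermark, the reserve that ordinary allocations are not supposed to dip into. A small sketch that flags such zones in show_mem-style output; it assumes the "ZONE free:NkB min:NkB" shape quoted above (real kernel lines are prefixed with "Node N" and carry more fields, which the regex tolerates). The helper name is hypothetical.

```python
import re

# Matches "DMA free:12kB min:32kB" as well as "Node 0 DMA free:12kB min:32kB low:40kB ..."
ZONE = re.compile(r"(?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB")


def zones_below_min(text):
    """Return [(zone, free_kB, min_kB)] for every zone whose free memory is under its min watermark."""
    below = []
    for line in text.splitlines():
        m = ZONE.search(line)
        if m:
            free, wmin = int(m["free"]), int(m["min"])
            if free < wmin:
                below.append((m["zone"], free, wmin))
    return below
```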
Re: 4.6.2 frequent crashes under memory + IO pressure
On Tue, Jun 21, 2016 at 08:47:51PM +0900, Tetsuo Handa wrote: > Johannes Stezenbach wrote: > > > > a man's got to have a hobby, thus I'm running Android AOSP > > builds on my home PC which has 4GB of RAM, 4GB swap. > > Apparently it is not really adequate for the job but used to > > work with a 4.4.10 kernel. Now I upgraded to 4.6.2 > > and it crashes usually within 30mins during compilation. > > Such reproducer is welcomed. > You might be hitting OOM livelock using innocent workload. > > > The crash is a hard hang, mouse doesn't move, no reaction > > to keyboard, nothing in logs (systemd journal) after reboot. > > Yes, it seems to me that your system is OOM livelocked. I got from my crash log that X is hanging in i915_gem_object_get_pages_gtt, and network is dead due to order 0 allocation errors causing a series of "ath9k_htc: RX memory allocation error", which is what makes the issue so unpleasant. The particular command which triggers it seems to be Jill from the Android Java toolchain (http://tools.android.com/tech-docs/jackandjill), which runs as "java -Xmx3500m -jar $(JILL_JAR)", i.e. potentially eating all my available RAM when linking the Android framework. Meanwhile I found some RAM and linux-4.6.2 runs stable with 8GB for this workload. The build time (for the partial AOSP rebuild that fairly reliably triggered the hangup) dropped from ~20min to ~17min (so it wasn't trashing too badly), swap usage dropped from ~50% (of 4GB) to <5%. > It is sad that we haven't merged kmallocwd which will report > which memory allocations are stalling > ( > http://lkml.kernel.org/r/1462630604-23410-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp > ). Would you like me to try it? It wouldn't prevent the hang, though, just print better debug ouptut to serial console, right? Or would it OOM kill some process? > > Then I tried 4.5.7, it seems to be stable so far. > > > > I'm using dm-crypt + lvm + ext4 (swap also in lvm). 
> > > > Now I hooked up a laptop to the serial port and captured > > some logs of the crash which seems to be repeating > > > > [ 2240.842567] swapper/3: page allocation failure: order:0, > > mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK) > > or > > [ 2241.167986] SLUB: Unable to allocate memory on node -1, > > gfp=0x2080020(GFP_ATOMIC) > > > > over and over. Based on the backtraces in the log I decided > > to hot-unplug USB devices, and twice the kernel came > > back to live, but on the 3rd crash it was dead for good. > > The values > > DMA free:12kB min:32kB > DMA32 free:2268kB min:6724kB > Normal free:84kB min:928kB > > suggest that memory reserves are spent for pointless purpose. Maybe your > system is > falling into situation which was mitigated by commit 78ebc2f7146156f4 > ("mm,writeback: > don't use memory reserves for wb_start_writeback"). Thus, applying that > commit to > your 4.6.2 kernel might help avoiding flood of these allocation failure > messages. I could try. Could you let me know if booting with mem=4G is equivalent, or do I need to use memmap= or physically remove the RAM (which is not so easy since the CPU fan is in the way). > > Before I pressed the reset button I used SysRq-W. At the bottom > > is a "BUG: workqueue lockup", it could be the result of > > the log spew on serial console taking so long but it looks > > like some IO is never completing. > > But even after you apply that commit, I guess you will still see silent hang > up > because the page allocator would think there is still reclaimable memory. So, > is > it possible to also try current linux.git kernels? I'd like to know whether > "OOM detection rework" (which went to 4.7) helps giving up reclaiming and > invoking the OOM killer with your workload. > > Maybe __GFP_FS allocations start invoking the OOM killer. But maybe __GFP_FS > allocations still remain stuck waiting for !__GFP_FS allocations whereas > !__GFP_FS > allocations gives up without invoking the OOM killer (i.e. 
effectively no > "give up"). I could also try. Same question about mem= though. What is your opinion about older kernels (4.4, 4.5) working? I think I've seen some OOM messages with the older kernels: Jill was killed and I restarted the build to complete it. A full bisect would take more than a day; I don't think I have the time for it. Since I use dm-crypt + lvm, should we add more people to Cc, or do you think it is an mm issue? > > Below I'm pasting some log snippets, let me know if you like > > it so much you want more of it ;-/ The total log is about 1.7MB. > > Yes, I'd like to browse it. Could you send it to me? Did you get any additional insights from it? Thanks, Johannes
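To see why even GFP_ATOMIC-class allocations keep failing with the free/min values quoted in the reply (DMA free:12kB min:32kB, DMA32 free:2268kB min:6724kB, Normal free:84kB min:928kB), here is a rough Python model of the order-0 zone watermark check. It is a sketch based on my reading of 4.6-era mm/page_alloc.c, not the kernel's code. It assumes that __GFP_HIGH (ALLOC_HIGH) halves the min watermark and that an atomic caller (ALLOC_HARDER) may dip a further quarter below it. lowmem_reserve is ignored, which only makes the model more lenient than the kernel, so a "fail" here should also be a fail in reality. The function names are hypothetical.

```python
# Simplified model of the order-0 watermark check for an atomic allocation
# (e.g. GFP_NOWAIT|__GFP_HIGH or GFP_ATOMIC), as read from 4.6-era
# mm/page_alloc.c. Assumptions: 4 KiB pages, lowmem_reserve ignored.
PAGE_KB = 4


def effective_min(min_pages: int, alloc_high: bool = True, alloc_harder: bool = True) -> int:
    """Min watermark (in pages) after the reserve access given to urgent callers."""
    m = min_pages
    if alloc_high:      # __GFP_HIGH: may use half of the min reserve
        m -= m // 2
    if alloc_harder:    # atomic caller: may dip a further quarter lower
        m -= m // 4
    return m


def watermark_ok(free_kb: int, min_kb: int, order: int = 0) -> bool:
    """True if a zone with free_kb free would pass the check for this order."""
    free_pages = free_kb // PAGE_KB - ((1 << order) - 1)
    return free_pages > effective_min(min_kb // PAGE_KB)


# free/min values (kB) as quoted in the reply above
ZONES = {"DMA": (12, 32), "DMA32": (2268, 6724), "Normal": (84, 928)}

if __name__ == "__main__":
    for name, (free_kb, min_kb) in ZONES.items():
        print(f"{name:6s} free={free_kb}kB min={min_kb}kB ok={watermark_ok(free_kb, min_kb)}")
```

In this model every zone fails, even with the relaxed threshold. That is consistent with the endless "page allocation failure: order:0" messages: the reserves that atomic callers such as the ath9k_htc RX path depend on are already used up.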
4.6.2 frequent crashes under memory + IO pressure
Hi, a man's got to have a hobby, so I'm running Android AOSP builds on my home PC, which has 4GB of RAM and 4GB of swap. Apparently that is not really adequate for the job, but it used to work with a 4.4.10 kernel. Now I upgraded to 4.6.2 and it usually crashes within 30 minutes during compilation. The crash is a hard hang: the mouse doesn't move, there is no reaction to the keyboard, and there is nothing in the logs (systemd journal) after reboot. Then I tried 4.5.7, and it seems to be stable so far. I'm using dm-crypt + lvm + ext4 (swap also in lvm). Now I hooked up a laptop to the serial port and captured some logs of the crash, which seems to be repeating [ 2240.842567] swapper/3: page allocation failure: order:0, mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK) or [ 2241.167986] SLUB: Unable to allocate memory on node -1, gfp=0x2080020(GFP_ATOMIC) over and over. Based on the backtraces in the log I decided to hot-unplug USB devices, and twice the kernel came back to life, but on the third crash it was dead for good. Before I pressed the reset button I used SysRq-W. At the bottom is a "BUG: workqueue lockup"; it could be the result of the log spew on the serial console taking so long, but it looks like some IO is never completing. Below I'm pasting some log snippets; let me know if you like it so much you want more of it ;-/ The total log is about 1.7MB. 
Thanks, Johannes

[ 2240.837431] warn_alloc_failed: 13 callbacks suppressed
[ 2240.842567] swapper/3: page allocation failure: order:0, mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK)
[ 2240.852384] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.6.2 #2
[ 2240.858215] Hardware name: System manufacturer System Product Name/P8H77-V, BIOS 1905 10/27/2014
[ 2240.866985] 0086 8d325b5c895ad90b 88011b603a90 81368f0c
[ 2240.874437] 88011b603b30 811659de
[ 2240.881907] 88011b603b40 02200021 88011b603b18 81f58240
[ 2240.889396] Call Trace:
[ 2240.891839] [] dump_stack+0x85/0xbe
[ 2240.897611] [] warn_alloc_failed+0x134/0x15c
[ 2240.903531] [] __alloc_pages_nodemask+0x7bd/0x978
[ 2240.909884] [] new_slab+0x129/0x3bb
[ 2240.915030] [] ___slab_alloc.constprop.22+0x2fb/0x37b
[ 2240.921730] [] ? __alloc_skb+0x55/0x1b4
[ 2240.927224] [] ? skb_release_data+0xc0/0xd0
[ 2240.933046] [] ? kfree+0x1c0/0x216
[ 2240.938089] [] __slab_alloc.isra.17.constprop.21+0x57/0x8b
[ 2240.945214] [] ? __slab_alloc.isra.17.constprop.21+0x57/0x8b
[ 2240.952520] [] ? __alloc_skb+0x55/0x1b4
[ 2240.957997] [] kmem_cache_alloc+0xa0/0x1d6
[ 2240.963734] [] ? __alloc_skb+0x55/0x1b4
[ 2240.969210] [] __alloc_skb+0x55/0x1b4
[ 2240.974524] [] ath9k_hif_usb_reg_in_cb+0xd4/0x181 [ath9k_htc]
[ 2240.981925] [] __usb_hcd_giveback_urb+0xa6/0x10b
[ 2240.988215] [] usb_giveback_urb_bh+0x9a/0xe4
[ 2240.994134] [] tasklet_hi_action+0x10c/0x11b
[ 2241.63] [] __do_softirq+0x182/0x377
[ 2241.005548] [] irq_exit+0x54/0xa8
[ 2241.010521] [] do_IRQ+0xc7/0xdf
[ 2241.015321] [] common_interrupt+0x8c/0x8c
[ 2241.020981] [] ? cpuidle_enter_state+0x1ae/0x251
[ 2241.027888] [] cpuidle_enter+0x17/0x19
[ 2241.033280] [] call_cpuidle+0x44/0x46
[ 2241.038600] [] cpu_startup_entry+0x2a7/0x378
[ 2241.044524] [] start_secondary+0x17c/0x192
[ 2241.050265] Mem-Info:
[ 2241.052543] active_anon:654174 inactive_anon:208849 isolated_anon:64
[ 2241.052543] active_file:4782 inactive_file:3878 isolated_file:0
[ 2241.052543] unevictable:1156 dirty:8 writeback:28052 unstable:0
[ 2241.052543] slab_reclaimable:13827 slab_unreclaimable:25768
[ 2241.052543] mapped:6794 shmem:3939 pagetables:5299 bounce:0
[ 2241.052543] free:424 free_pcp:39 free_cma:0
[ 2241.086414] DMA free:12kB min:32kB low:44kB high:56kB active_anon:28kB inactive_anon:84kB active_file:68kB inactive_file:40kB unevictable:124kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:124kB dirty:0kB writeback:0kB mapped:228kB shmem:36kB slab_reclaimable:552kB slab_unreclaimable:14656kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2241.128265] lowmem_reserve[]: 0 3156 3592 3592
[ 2241.132792] DMA32 free:2120kB min:6724kB low:9956kB high:13188kB active_anon:2414116kB inactive_anon:629228kB active_file:15184kB inactive_file:13336kB unevictable:3624kB isolated(anon):256kB isolated(file):0kB present:3334492kB managed:3243420kB mlocked:3624kB dirty:24kB writeback:104760kB mapped:21988kB shmem:13936kB slab_reclaimable:46356kB slab_unreclaimable:74196kB kernel_stack:4144kB pagetables:17708kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:92 all_unreclaimable? no
[ 2241.167769] kworker/u8:3: page allocation failure: order:0, mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK)
[ 2241.167771] CPU: 2 PID: 1470 Comm: kworker/u8:3 Not tainted 4.6.2 #2
[ 2241.167772] Hardware name: System
uas: order 7 page allocation failure in init_tag_map()
Hi, I bought a new backup disk which turned out to be UAS-capable, but when I plugged it in I got an order-7 page allocation failure. My hunch is that the .can_queue = 65536 in drivers/usb/storage/uas.c is much too large. Maybe 256 would be a practical value that matches the capabilities of existing hardware?

[1859683.261465] usb 4-2: new SuperSpeed USB device number 8 using xhci_hcd
[1859683.281986] scsi host18: uas
[1859683.282003] kworker/0:2: page allocation failure: order:7, mode:0x208c020
[1859683.282008] CPU: 0 PID: 6888 Comm: kworker/0:2 Not tainted 4.4.6 #1
[1859683.282011] Hardware name: System manufacturer System Product Name/P8H77-V, BIOS 1905 10/27/2014
[1859683.282017] Workqueue: usb_hub_wq hub_event
[1859683.282021] 0286 d38f5999 8800751674d0 813527de
[1859683.282026] 0208c020 880075167570 81157c56
[1859683.282031] 880075167580 880075167508 81f43840 00f438b8
[1859683.282036] Call Trace:
[1859683.282045] [] dump_stack+0x85/0xbe
[1859683.282050] [] warn_alloc_failed+0x12c/0x156
[1859683.282055] [] __alloc_pages_nodemask+0x73a/0x8f1
[1859683.282060] [] ? dev_vprintk_emit+0x1cb/0x1f1
[1859683.282065] [] alloc_kmem_pages+0x22/0x8a
[1859683.282069] [] kmalloc_order+0x18/0x46
[1859683.282072] [] kmalloc_order_trace+0x21/0xe9
[1859683.282077] [] __kmalloc+0x38/0x22f
[1859683.282081] [] ? __blk_queue_init_tags+0x2f/0x73
[1859683.282085] [] init_tag_map+0x54/0xa3
[1859683.282088] [] __blk_queue_init_tags+0x45/0x73
[1859683.282092] [] blk_init_tags+0x14/0x16
[1859683.282096] [] scsi_add_host_with_dma+0xc8/0x2a0
[1859683.282102] [] uas_probe+0x3aa/0x420 [uas]
[1859683.282107] [] usb_probe_interface+0x1a6/0x22d
[1859683.282112] [] driver_probe_device+0x173/0x3a6
[1859683.282116] [] __device_attach_driver+0x71/0x78
[1859683.282120] [] ? driver_allows_async_probing+0x31/0x31
[1859683.282124] [] bus_for_each_drv+0x8a/0xad
[1859683.282128] [] __device_attach+0xba/0x14f
[1859683.282132] [] device_initial_probe+0x13/0x15
[1859683.282136] [] bus_probe_device+0x33/0x9e
[1859683.282140] [] device_add+0x2e4/0x56e
[1859683.282144] [] usb_set_configuration+0x689/0x6d9
[1859683.282148] [] ? debug_smp_processor_id+0x17/0x19
[1859683.282152] [] generic_probe+0x43/0x73
[1859683.282156] [] usb_probe_device+0x53/0x66
[1859683.282159] [] driver_probe_device+0x173/0x3a6
[1859683.282163] [] __device_attach_driver+0x71/0x78
[1859683.282167] [] ? driver_allows_async_probing+0x31/0x31
[1859683.282171] [] bus_for_each_drv+0x8a/0xad
[1859683.282175] [] __device_attach+0xba/0x14f
[1859683.282179] [] device_initial_probe+0x13/0x15
[1859683.282183] [] bus_probe_device+0x33/0x9e
[1859683.282186] [] device_add+0x2e4/0x56e
[1859683.282191] [] usb_new_device+0x241/0x38a
[1859683.282194] [] hub_event+0xcb9/0x10f2
[1859683.282201] [] process_one_work+0x27f/0x4d7
[1859683.282206] [] ? put_lock_stats.isra.9+0xe/0x20
[1859683.282209] [] worker_thread+0x273/0x35b
[1859683.282214] [] ? rescuer_thread+0x2a7/0x2a7
[1859683.282217] [] kthread+0xff/0x107
[1859683.28] [] ? kthread_create_on_node+0x1ea/0x1ea
[1859683.282228] [] ret_from_fork+0x3f/0x70
[1859683.282231] [] ? kthread_create_on_node+0x1ea/0x1ea
[1859683.282234] Mem-Info:
[1859683.282241] active_anon:21278 inactive_anon:69854 isolated_anon:0
 active_file:212300 inactive_file:194346 isolated_file:0
 unevictable:2018 dirty:87 writeback:0 unstable:0
 slab_reclaimable:127644 slab_unreclaimable:12137
 mapped:11526 shmem:13394 pagetables:5007 bounce:0
 free:270678 free_pcp:1027 free_cma:0
[1859683.282252] DMA free:14412kB min:32kB low:40kB high:48kB active_anon:180kB inactive_anon:468kB active_file:268kB inactive_file:92kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:4kB writeback:0kB mapped:172kB shmem:328kB slab_reclaimable:208kB slab_unreclaimable:92kB kernel_stack:0kB pagetables:56kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1859683.282255] lowmem_reserve[]: 0 3162 3597 3597
[1859683.282267] DMA32 free:904468kB min:6728kB low:8408kB high:10092kB active_anon:66188kB inactive_anon:237164kB active_file:803244kB inactive_file:704168kB unevictable:7024kB isolated(anon):0kB isolated(file):0kB present:3334492kB managed:3243208kB mlocked:7024kB dirty:280kB writeback:0kB mapped:37116kB shmem:40212kB slab_reclaimable:435236kB slab_unreclaimable:37848kB kernel_stack:3968kB pagetables:16696kB unstable:0kB bounce:0kB free_pcp:2008kB local_pcp:632kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1859683.282270] lowmem_reserve[]: 0 0 435 435
[1859683.282281] Normal free:163832kB min:924kB low:1152kB high:1384kB active_anon:18744kB inactive_anon:41784kB active_file:45688kB inactive_file:73124kB
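Here is a back-of-the-envelope check of that hunch as a Python sketch. It assumes, from my reading of 4.4's block/blk-tag.c, that init_tag_map() allocates a tag_index array with one struct request pointer per tag and that can_queue is passed in as the depth. It also assumes a 64-bit kernel with 4 KiB pages. The helper names are hypothetical. Under those assumptions, 65536 tags need 512 KiB, which kmalloc() passes to the page allocator as an order-7 request, matching the kmalloc_order path in the trace above. With 256 tags the array fits well within one page.

```python
PAGE_SIZE = 4096    # assumption: x86_64 with 4 KiB pages
POINTER_SIZE = 8    # assumption: 64-bit kernel, sizeof(struct request *) == 8


def get_order(size: int) -> int:
    """Smallest order with (PAGE_SIZE << order) >= size, like the kernel's get_order()."""
    order = 0
    while (PAGE_SIZE << order) < size:
        order += 1
    return order


def tag_index_bytes(can_queue: int) -> int:
    """Bytes needed for a tag_index array holding one request pointer per tag."""
    return can_queue * POINTER_SIZE


if __name__ == "__main__":
    for depth in (65536, 256):
        size = tag_index_bytes(depth)
        print(f"can_queue={depth}: {size} bytes, order {get_order(size)}")
```

The Mem-Info above still reports DMA32 free:904468kB, so the failure is presumably caused by not finding 128 physically contiguous pages, not by running out of memory. Shrinking can_queue would sidestep the problem either way.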
Re: Kernel docs: muddying the waters a bit
On Mon, Mar 07, 2016 at 12:29:08AM +0100, Johannes Stezenbach wrote: > On Sat, Mar 05, 2016 at 11:29:37PM -0300, Mauro Carvalho Chehab wrote: > > > > I converted one of the big tables to CSV. At least now it recognized > > it as a table. Yet, the table was very badly formated: > > > > https://mchehab.fedorapeople.org/media-kabi-docs-test/rst_tests/packed-rgb.html > > > > This is how this table should look like: > > https://linuxtv.org/downloads/v4l-dvb-apis/packed-rgb.html > > > > Also, as this table has merged cells at the legend. I've no idea how > > to tell sphinx to do that on csv format. > > > > The RST files are on this git tree: > > https://git.linuxtv.org/mchehab/v4l2-docs-poc.git/ > > Yeah, seems it can't do merged cells in csv. Attached patch converts it > back to grid table format and fixes the table definition. > The html output looks usable, but clearly it is no fun to > work with tables in Sphinx. > > Sphinx' latex writer can't handle nested tables, though. > Python's docutils rst2latex can, but that doesn't help here. > rst2pdf also supports it. But I have doubts such a large > table would render OK in pdf without using landscape orientation. > I have not tried because I used python3-sphinx but rst2pdf > is only availble for Python2 in Debian so it does not integrate > with Sphinx. Just a quick idea: perhaps one alternative would be to use Graphviz to render the problematic tables. It supports an HTML-like syntax (including nested tables) and can be embedded in Sphinx documents: http://www.sphinx-doc.org/en/stable/ext/graphviz.html http://www.graphviz.org/content/node-shapes#html http://stackoverflow.com/questions/13890568/graphviz-html-nested-tables Johannes
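A minimal sketch of the Graphviz idea, assuming Python is used to generate the DOT source. It builds an HTML-like label for the V4L2_PIX_FMT_RGB565 row, with bit labels copied in table order from the packed-rgb.rst CSV. ROWSPAN and COLSPAN provide the merged cells that csv-table cannot express. The helper names and layout are illustrative only. The output would go under a `.. graphviz::` directive from the sphinx.ext.graphviz extension, which needs the dot binary installed.

```python
# Bit labels for the V4L2_PIX_FMT_RGB565 ('RGBP') row, in the order they
# appear in the packed-rgb table (illustrative data for this sketch).
RGB565_BYTES = {
    "Byte 0": ["g2", "g1", "g0", "b4", "b3", "b2", "b1", "b0"],
    "Byte 1": ["r4", "r3", "r2", "r1", "r0", "g5", "g4", "g3"],
}


def bit_cell(label: str) -> str:
    """Render a label like 'g5' as an HTML-like cell: g<SUB>5</SUB>."""
    return f"<TD>{label[0]}<SUB>{label[1:]}</SUB></TD>"


def rgb565_dot() -> str:
    """Return DOT source whose node label is a table with merged header cells."""
    # One header cell per byte, spanning that byte's eight bit columns.
    byte_headers = "".join(
        f'<TD COLSPAN="{len(bits)}">{name}</TD>' for name, bits in RGB565_BYTES.items()
    )
    # Identifier and fourcc cells span both the header row and the bit row.
    row1 = (
        '<TR><TD ROWSPAN="2">V4L2_PIX_FMT_RGB565</TD>'
        "<TD ROWSPAN=\"2\">'RGBP'</TD>" + byte_headers + "</TR>"
    )
    row2 = "<TR>" + "".join(bit_cell(b) for bits in RGB565_BYTES.values() for b in bits) + "</TR>"
    return (
        "digraph packed_rgb {\n"
        "  node [shape=plaintext];\n"
        '  fmt [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">\n'
        f"    {row1}\n"
        f"    {row2}\n"
        "  </TABLE>>];\n"
        "}\n"
    )


if __name__ == "__main__":
    print(rgb565_dot())
```

A <TD> can also contain a complete nested <TABLE>, which is the case the Stack Overflow link covers. That might replace the nested tables the LaTeX writer rejects, although the result would be an image, not text.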
Re: Kernel docs: muddying the waters a bit
On Sat, Mar 05, 2016 at 11:29:37PM -0300, Mauro Carvalho Chehab wrote: > > I converted one of the big tables to CSV. At least now it recognized > it as a table. Yet, the table was very badly formated: > > https://mchehab.fedorapeople.org/media-kabi-docs-test/rst_tests/packed-rgb.html > > This is how this table should look like: > https://linuxtv.org/downloads/v4l-dvb-apis/packed-rgb.html > > Also, as this table has merged cells at the legend. I've no idea how > to tell sphinx to do that on csv format. > > The RST files are on this git tree: > https://git.linuxtv.org/mchehab/v4l2-docs-poc.git/ Yeah, it seems it can't do merged cells in CSV. The attached patch converts it back to grid table format and fixes the table definition. The HTML output looks usable, but clearly it is no fun to work with tables in Sphinx. Sphinx's LaTeX writer can't handle nested tables, though. Python's docutils rst2latex can, but that doesn't help here. rst2pdf also supports them. But I doubt such a large table would render OK in PDF without using landscape orientation. I have not tried, because I used python3-sphinx, but rst2pdf is only available for Python 2 in Debian, so it does not integrate with Sphinx. 
Johannes

From 61674b398e778bd5ff644ffd493d5ff1cfaca0ef Mon Sep 17 00:00:00 2001
From: Johannes Stezenbach <j...@sig21.net>
Date: Sun, 6 Mar 2016 23:55:19 +0100
Subject: [PATCH] some progress for html output

---
 _static/borderless.css | 3 --
 _static/v4l2tables.css | 9 +
 _templates/layout.html | 9 +
 packed-rgb.rst | 88 +-
 pixfmt-yuyv.rst| 2 +-
 v4l-table-within-table.rst | 72 +++--
 6 files changed, 105 insertions(+), 78 deletions(-)
 delete mode 100644 _static/borderless.css
 create mode 100644 _static/v4l2tables.css

diff --git a/_static/borderless.css b/_static/borderless.css
deleted file mode 100644
index bfd4b01..000
--- a/_static/borderless.css
+++ /dev/null
@@ -1,3 +0,0 @@
-table#table-borderless {
-border: 1px solid black;
-}
diff --git a/_static/v4l2tables.css b/_static/v4l2tables.css
new file mode 100644
index 000..c045e45
--- /dev/null
+++ b/_static/v4l2tables.css
@@ -0,0 +1,9 @@
+table.noborder {
+border: 1px solid black;
+background: white;
+white-space: nowrap;
+}
+
+table.noborder td, table.noborder th {
+padding: 0px;
+}
diff --git a/_templates/layout.html b/_templates/layout.html
index b6bf12b..637332d 100644
--- a/_templates/layout.html
+++ b/_templates/layout.html
@@ -1,9 +1,2 @@
 {% extends "!layout.html" %}
-{% block tables %}
- table#table-borderless {
-border: 1px solid black;
-}
-
-{{ super() }}
-{% endblock %}
+{% set css_files = css_files + ["_static/v4l2tables.css"] %}
diff --git a/packed-rgb.rst b/packed-rgb.rst
index 352b91c..b4fcf3e 100644
--- a/packed-rgb.rst
+++ b/packed-rgb.rst
@@ -9,25 +9,46 @@ graphics frame buffers. They occupy 8, 16, 24 or 32 bits per pixel.
 These are all packed-pixel formats, meaning all the data for a pixel lie next to each other in memory.
-.. csv-table:: Table: Packed RGB Image Formats
- :header: Identifier,Code, ,Byte 0 in memory,Byte 1,Byte 2,Byte 3
+.. table:: Packed RGB Image Formats
+ :class: noborder
- ``V4L2_PIX_FMT_RGB332``,'RGB1',,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`1`,b\ :sub:`0`
- ``V4L2_PIX_FMT_ARGB444``,'AR12',,g\ :sub:`3`,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,a\ :sub:`3`,a\ :sub:`2`,a\ :sub:`1`,a\ :sub:`0`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`
- ``V4L2_PIX_FMT_XRGB444``,'XR12',,g\ :sub:`3`,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,-,-,-,-,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`
- ``V4L2_PIX_FMT_ARGB555``,'AR15',,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,a,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`4`,g\ :sub:`3`
- ``V4L2_PIX_FMT_XRGB555``,'XR15',,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,-,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`4`,g\ :sub:`3`
- ``V4L2_PIX_FMT_RGB565``,'RGBP',,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`5`,g\ :sub:`4`,g\ :sub:`3`
- ``V4L2_PIX_FMT_ARGB555X``,'AR15' | (1<<31),,a,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`4`,g\ :sub:`3`,,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`
- ``V4L2_PIX_FMT_XRGB555X``,'XR15' | (1<<31),,-,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`4`,g\ :sub:`3`,,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4
Re: Kernel docs: muddying the waters a bit
On Fri, Mar 04, 2016 at 09:59:50AM -0300, Mauro Carvalho Chehab wrote: > > 3) I tried to use a .. cssclass, as Johannes suggested, but > I was not able to include the CSS file. I suspect that this is > easy to fix, but I want to see if the cssclass will also work for > the pdf output as well. "cssclass" was (I think) a custom role defined in that example; unless you have defined such a custom role yourself, you can use plain "class". I have not looked deeper into the theming and template stuff. > 4) It seems that it can't produce nested tables in pdf: > > Markup is unsupported in LaTeX: > v4l-table-within-table:: nested tables are not yet implemented. > Makefile:115: recipe for target 'latexpdf' failed This: http://www.sphinx-doc.org/en/stable/markup/misc.html#tables suggests you need to add the tabularcolumns directive for complex tables. BTW, as an alternative to the ASCII-art input, there is also support for CSV and list tables: http://docutils.sourceforge.net/docs/ref/rst/directives.html#table Johannes
Re: Kernel docs: muddying the waters a bit
On Fri, Mar 04, 2016 at 10:29:08AM +0200, Jani Nikula wrote: > On Fri, 04 Mar 2016, Mauro Carvalho Chehab wrote: > > > > If, on the other hand, we decide to use RST, we'll very likely need to > > patch it to fulfill our needs in order to add proper table support. > > I've no idea how easy/difficult would be to do that, nor if Sphinx > > upstream would accept such changes. > > > > So, at the end of the day, we may end by having to carry on our own > > version of Sphinx inside our tree, with doesn't sound good, specially > > since it is not just a script, but a package with hundreds of > > files. > > If we end up having to modify Sphinx, it has a powerful extension > mechanism for this. We wouldn't have to worry about getting it merged to > Sphinx upstream, and we wouldn't have to carry a local version of all of > Sphinx. (In fact, the extension mechanism provides a future path for > doing kernel-doc within Sphinx instead of as a preprocessing step.) > > I know none of this alleviates your concerns with table supports right > now. I'll try to have a look at that a bit more. FWIW, I think table formatting in Sphinx works via style sheets. The mechanism is documented in the Python docutils docs that Sphinx is built upon. Basically you use the "class" or "role" directive and define the corresponding CSS or LaTeX (or rst2pdf) style. Here is one example (using a custom "cssclass" role): https://pythonhosted.org/sphinxjp.themes.basicstrap/sample.html Directives (especially role and class): http://www.sphinx-doc.org/en/stable/rest.html#directives LaTeX styling: http://docutils.readthedocs.org/en/sphinx-docs/user/latex.html#custom-interpreted-text-roles HTH, Johannes
Re: [PATCH 5/6] n_tty: Fix stuck write wakeup
On Sun, Dec 13, 2015 at 10:38:02AM -0800, Peter Hurley wrote: > On 12/13/2015 07:18 AM, Johannes Stezenbach wrote: > > > > There is a related bug that I meant to send a patch, but I > > never got around because the issue was found with proprietary > > userspace and ancient kernel. Maybe you could take care of it? > > The patch might not apply cleanly after your recent changes > > or might even be invalid now, please check. > > Thanks for the patch, Johannes! > > Yes, the patch below is still required to prevent excessive SIGIO > (and to prevent missed SIGIO when the amount actually copied just > happens to be exactly the amount left to be copied). > > I made some comments in the patch; can you re-submit with those > changes and the patch title in the subject? Or I'd happy to re-work > it and send it to Greg if you'd prefer; just let me know. Please rework it, currently I'm in lazy bum mode ;-) > > @@ -1991,7 +1992,7 @@ static ssize_t n_tty_write(struct tty_st > > break_out: > > __set_current_state(TASK_RUNNING); > > remove_wait_queue(&tty->write_wait, &wait); > > - if (b - buf != nr && tty->fasync) > > + if (b - buf != count && tty->fasync) > > ... this can be > > if (nr && tty->fasync) > set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); Yeah, that's way better. Thanks, Johannes -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/6] n_tty: Fix stuck write wakeup
Hi Peter, On Sat, Dec 12, 2015 at 02:16:38PM -0800, Peter Hurley wrote: > If signal-driven i/o is disabled while write wakeup is pending (ie., > n_tty_write() has set TTY_DO_WRITE_WAKEUP but then signal-driven i/o > is disabled), the TTY_DO_WRITE_WAKEUP bit will never be cleared and > will cause tty_wakeup() to always call n_tty_write_wakeup. > > Unconditionally clear the write wakeup, and since kill_fasync() > already checks if the fasync ptr is null, call kill_fasync() > unconditionally as well. ... > @@ -230,8 +230,8 @@ static ssize_t chars_in_buffer(struct tty_struct *tty) > > static void n_tty_write_wakeup(struct tty_struct *tty) > { > - if (tty->fasync && test_and_clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags)) > - kill_fasync(&tty->fasync, SIGIO, POLL_OUT); > + clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); > + kill_fasync(&tty->fasync, SIGIO, POLL_OUT); > } There is a related bug that I meant to send a patch for, but I never got around to it because the issue was found with proprietary userspace and an ancient kernel. Maybe you could take care of it? The patch might not apply cleanly after your recent changes, or might even be invalid now; please check. Thanks, Johannes

---
tty: n_tty: fix SIGIO for output

According to fcntl(2), "a SIGIO signal is sent whenever input or output becomes possible on that file descriptor", i.e. after the output buffer was full and now has space for new data. But in fact SIGIO is sent after every write. n_tty_write() should set TTY_DO_WRITE_WAKEUP only when not all data could be written to the buffer.

Signed-off-by: Johannes Stezenbach

--- drivers/char/n_tty.c.orig	2015-11-02 22:26:04.124227148 +0100
+++ drivers/char/n_tty.c	2015-11-02 22:26:10.644212115 +0100
@@ -1925,6 +1925,7 @@ static ssize_t n_tty_write(struct tty_st
 	DECLARE_WAITQUEUE(wait, current);
 	int c;
 	ssize_t retval = 0;
+	size_t count = nr;
 
 	/* Job control check -- must be done at start (POSIX.1 7.1.1.4). */
 	if (L_TOSTOP(tty) && file->f_op->write != redirected_tty_write) {
@@ -1991,7 +1992,7 @@ static ssize_t n_tty_write(struct tty_st
 break_out:
 	__set_current_state(TASK_RUNNING);
 	remove_wait_queue(&tty->write_wait, &wait);
-	if (b - buf != nr && tty->fasync)
+	if (b - buf != count && tty->fasync)
 		set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
 	return (b - buf) ? b - buf : retval;
 }
Re: [PATCH 1/6] n_tty: Always wake up read()/poll() if new input
Hi Peter, On Sat, Dec 12, 2015 at 02:16:34PM -0800, Peter Hurley wrote: > A read() in non-canonical mode when VMIN > 0 and VTIME == 0 does not > complete until at least VMIN chars have been read (or the user buffer is > full). In this infrequent read mode, n_tty_read() attempts to reduce > wakeups by computing the amount of data still necessary to complete the > read (minimum_to_wake) and only waking the read()/poll() when that much > unread data has been processed. This is the only read mode for which > new data does not necessarily generate a wakeup. > > However, this optimization is broken and commonly leads to hung reads > even though the necessary amount of data has been received. Since the > optimization is of marginal value anyway, just remove the whole > thing. This also remedies a race between a concurrent poll() and > read() in this mode, where the poll() can reset the minimum_to_wake > of the read() (and vice versa). ... > @@ -1632,7 +1631,7 @@ static void __receive_buf(struct tty_struct *tty, const > unsigned char *cp, > /* publish read_head to consumer */ > smp_store_release(&ldata->commit_head, ldata->read_head); > > - if ((read_cnt(ldata) >= ldata->minimum_to_wake) || L_EXTPROC(tty)) { > + if (read_cnt(ldata)) { > kill_fasync(&tty->fasync, SIGIO, POLL_IN); > wake_up_interruptible_poll(&tty->read_wait, POLLIN); > } Your patch looks fine; I just want to mention that there was some undocumented behaviour where async I/O took VMIN into account when deciding when to send SIGIO, but it was implemented incorrectly because minimum_to_wake was only updated in read() and poll(), not directly by the tcsetattr() ioctl. I think your change does the right thing to fix this case, too. I had to debug some proprietary code which dynamically changed VMIN based on the expected message size and thus sometimes wasn't woken up; in the end we decided to keep VMIN=1 to solve it.
Thanks, Johannes