Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

2019-01-17 Thread Johannes Stezenbach
On Thu, Jan 17, 2019 at 01:05:35PM +0100, Hans de Goede wrote:
> On 17-01-19 10:12, Dean Wallace wrote:
> > On 17-01-19, Mogens Jensen wrote:
> > > Kernel is compiled with SND_SOC_INTEL_CHT_BSW_MAX98090_TI_MACH and the 
> > > quirk seems to have fixed the problem caused by commit 648e921888ad 
> > > ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL"), as sound is now 
> > > working if running "speaker-test" on my system which is clean ALSA.
> 
> Note being "clean ALSA" is really not a good thing nowadays,
> for lots of things we depend on pulseaudio (like setting
> up UCM mixer profiles).

FWIW I disagree because PA never worked for me.  I simply used
"alsaucm -c chtcx2072x set _verb HiFi".  But I was surprised
that PA does the ALSA UCM setup but it's not documented well that
you need to do it by other means if you don't use PA.
https://bugzilla.kernel.org/show_bug.cgi?id=115531#c72


Regards,
Johannes


Re: [PATCH] PM: Document rules on using pm_runtime_resume() in system suspend callbacks

2017-09-21 Thread Johannes Stezenbach
On Thu, Sep 21, 2017 at 02:39:30AM +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 20, 2017 at 6:27 PM, Johannes Stezenbach <j...@sig21.net> wrote:
> >
> >  E.g. an audio codec could keep running
> > while the i2c bus used to program its registers can be runtime suspended.
> > If this is correct I think it would be useful to spell it out explicitly
> > in the documentation.
> 
> That's because the i2c bus uses the ignore_children flag that allows
> it to override the general rules. :-)

Ah!  I was looking at Documentation/driver-api/pm only (which is
changed by your patch), but this is documented in Documentation/power
(and obviously I hadn't checked the code, shame on me).
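
To make the ignore_children rule concrete, here is a minimal user-space sketch of the check as I understand it. It assumes a parent may only runtime suspend when nothing holds a usage reference and either no child is active or ignore_children is set. The struct and function names are hypothetical models, not the kernel's own types or API.

```c
/* User-space model only: model_pm and model_runtime_suspend_allowed()
 * are hypothetical names, not the kernel's struct dev_pm_info or PM core. */
#include <assert.h>
#include <stdbool.h>

struct model_pm {
    int  usage_count;      /* outstanding "get" references on the device */
    int  active_children;  /* children that are not runtime suspended */
    bool ignore_children;  /* parent may suspend despite active children */
};

/* Returns true if the model allows this device to runtime suspend now. */
bool model_runtime_suspend_allowed(const struct model_pm *pm)
{
    assert(pm && pm->usage_count >= 0 && pm->active_children >= 0);

    if (pm->usage_count > 0)
        return false;   /* somebody is using the device right now */
    if (!pm->ignore_children && pm->active_children > 0)
        return false;   /* general rule: active children block the parent */
    return true;        /* idle, and children are absent or ignored */
}
```

With ignore_children set, an i2c bus with a running codec below it can still be runtime suspended between transfers, which matches the audio codec example.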

> direct_complete has nothing to do with this.

Oh?  Reading again, do I get this right:

1. simple method: always call pm_runtime_resume() in ->suspend(),
   then suspend the driver again
2. optimization: if pm_runtime_suspended(), the driver's ->suspend()
   can possibly do nothing if conditions permit, otherwise it calls
   pm_runtime_resume() and then suspends
3. optimization: tell pm core to skip ->suspend() via return value
   from ->prepare() which sets direct_complete

...and your patch only deals with 1 and 2.
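
The three variants above can be sketched as a small user-space model, assuming my reading is right. All names here (model_dev, suspend_simple, suspend_optimized, core_suspend) are made up for illustration; the real callbacks take a struct device and the real core does more bookkeeping.

```c
/* User-space model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

struct model_dev {
    bool runtime_suspended;   /* stands in for pm_runtime_suspended() */
    bool state_ok_for_sleep;  /* "conditions permit" leaving it untouched */
    bool direct_complete;     /* derived from the ->prepare() return value */
    int  resumes;             /* how often the device had to be powered up */
    int  suspend_calls;       /* how often ->suspend() did real work */
};

void model_runtime_resume(struct model_dev *d)
{
    if (d->runtime_suspended) {
        d->runtime_suspended = false;
        d->resumes++;
    }
}

/* 1. simple method: always resume first, then do the full suspend */
void suspend_simple(struct model_dev *d)
{
    model_runtime_resume(d);
    d->suspend_calls++;
}

/* 2. driver-level optimization: do nothing if already in a suitable state */
void suspend_optimized(struct model_dev *d)
{
    if (d->runtime_suspended && d->state_ok_for_sleep)
        return;
    model_runtime_resume(d);
    d->suspend_calls++;
}

/* 3. core-level optimization: the callback is not invoked at all */
void core_suspend(struct model_dev *d, void (*suspend)(struct model_dev *))
{
    assert(d && suspend);
    if (d->direct_complete && d->runtime_suspended)
        return;
    suspend(d);
}
```

In this model, variant 1 always pays for a resume, variant 2 decides inside the driver, and variant 3 decides before the driver is even called.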

Sorry to hijack your thread for side discussion, it was
inadvertent due to my lack of understanding.


> First off, the PM core does check the direct_complete flag in
> __device_suspend() and does more-or-less what you are saying.
> 
> However, that flag is initialized in device_prepare() with the help of
> the ->suspend() return value, because whether or not it makes sense to

you mean ->prepare(), right?

> set that flag depends on some conditions that may change between
> consecutive system suspend-resume cycles in general and need to be
> checked in advance before setting it.
> 
> HTH

It does, however the question remains *why* it needs to check
it in ->prepare() and not right before calling ->suspend().
Using ->prepare() for the purpose seems wrong since it traverses
the hierarchy in the "wrong" order.  Only right before
calling ->suspend() the driver knows if its current state
allows it to skip any further actions for suspend, because
suspending children or other users may cause pm_runtime_resume()
for it.  (In the back of my head I have the scenario of
bug #196861, some completely different driver uses
i2c via ACPI OpRegion during its suspend.)
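
A toy model of the ordering concern, assuming a two-device hierarchy where index 0 is the parent (say, the i2c bus) and index 1 is a child or other user. The prepare pass runs parent first, the suspend pass runs child first. Everything here is hypothetical user-space code, not the PM core.

```c
/* User-space model only; model_node and both passes are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_NDEV 2   /* index 0: parent, index 1: child or other user */

struct model_node {
    bool runtime_suspended;
    bool direct_complete;   /* decided during the prepare pass */
    bool skipped;           /* suspend pass left the device untouched */
};

/* prepare pass, parents first: flag devices that are runtime suspended */
void model_prepare_all(struct model_node dev[MODEL_NDEV])
{
    assert(dev);
    for (int i = 0; i < MODEL_NDEV; i++)
        dev[i].direct_complete = dev[i].runtime_suspended;
}

/* suspend pass, children first: the flag is re-checked against the
 * runtime state the device has at that moment */
void model_suspend_all(struct model_node dev[MODEL_NDEV], bool child_uses_parent)
{
    assert(dev);
    for (int i = MODEL_NDEV - 1; i >= 0; i--) {
        dev[i].skipped = dev[i].direct_complete && dev[i].runtime_suspended;
        if (dev[i].skipped)
            continue;
        dev[i].direct_complete = false;
        if (i == 1 && child_uses_parent)
            dev[0].runtime_suspended = false;   /* implicit runtime resume */
    }
}
```

If the child touches the parent during its own suspend, the decision taken for the parent in the prepare pass is stale by the time the parent's turn comes, so only a check at suspend time sees the real state.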


Thanks,
Johannes


Re: [PATCH] PM: Document rules on using pm_runtime_resume() in system suspend callbacks

2017-09-20 Thread Johannes Stezenbach
On Wed, Sep 20, 2017 at 04:01:32PM +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 20, 2017 at 2:28 PM, Ulf Hansson  wrote:
> > On 20 September 2017 at 02:26, Rafael J. Wysocki  wrote:
> >>
> >> Second, leaving devices in runtime suspend in the "suspend" phase of system
> >> suspend is fishy even when their runtime PM is disabled, because that
> >> doesn't guarantee anything regarding their children or possible consumers.
> >> Runtime PM may still be enabled for those devices at that time and runtime
> >> resume may be triggered for them later, in which case it all quickly falls
> >> apart.
> >
> > This is true, although to me this is about a different problem and
> > has very little to do with pm_runtime_force_suspend().
> >
> > More precisely, whether runtime PM becomes disabled in the suspend
> > phase or suspend_late phase, really doesn't matter. Because in the end
> > this is about suspending/resuming devices in the correct order.
> 
> Yes, it is, but this is not my point (I didn't make it clear enough I guess).
> 
> At the time you make the decision to disable runtime PM for a parent
> (say) and leave it in runtime suspend, all of its children are
> suspended just fine (otherwise the parent wouldn't have been suspended
> too).  However, you *also* need to make sure that there will be no
> attempts to resume any of them *after* that point, which practically
> means that either runtime PM has to have been disabled already for all
> of them at the time it is disabled for the parent, or there has to be
> another guarantee in place.
> 
> That's why the core tries to enforce the "runtime PM disabled for the
> entire hierarchy below" guarantee for the devices with direct_complete
> set, but that may just be overkill in many cases.  I guess it may be
> better to use WARN_ON() to catch the cases in which things may really
> go wrong.

I read this half a dozen times and I'm still confused.
Moreover, Documentation/driver-api/pm/devices.rst says:

Runtime Power Management model:

Devices may also be put into low-power states while the system is
running, independently of other power management activity in principle.
However, devices are not generally independent of each other (for
example, a parent device cannot be suspended unless all of its child
devices have been suspended).  ...

However, isn't this a fundamental difference of runtime suspend
vs. system suspend that parent devices *can* be runtime suspended
before their children?  E.g. an audio codec could keep running
while the i2c bus used to program its registers can be runtime suspended.
If this is correct I think it would be useful to spell it out explicitly
in the documentation.

During system suspend, pm core will suspend children first,
and if the child's ->suspend hook uses the i2c bus to access registers,
it will implicitly runtime resume the i2c bus (e.g. due to pm_runtime_get_sync()
in i2c_dw_xfer()).  Later pm core will ->suspend the i2c bus.
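
The implicit resume can be illustrated with a user-space usage-counter sketch. It assumes the transfer path takes a reference (resuming the bus if needed) and drops it afterwards, in the spirit of pm_runtime_get_sync() in i2c_dw_xfer(). The model_* names are hypothetical and the model ignores autosuspend timers.

```c
/* User-space model only; not the i2c-designware driver or the PM core. */
#include <assert.h>
#include <stdbool.h>

struct model_bus {
    bool runtime_suspended;
    int  usage_count;    /* references held by in-flight transfers */
    int  resume_count;   /* how often the bus had to be woken up */
};

void model_get_sync(struct model_bus *bus)
{
    bus->usage_count++;
    if (bus->runtime_suspended) {
        bus->runtime_suspended = false;   /* implicit runtime resume */
        bus->resume_count++;
    }
}

void model_put(struct model_bus *bus)
{
    assert(bus->usage_count > 0);
    bus->usage_count--;
    /* the model leaves the bus active; it does not suspend again here */
}

int model_xfer(struct model_bus *bus)
{
    model_get_sync(bus);
    /* the register access over the bus would happen here */
    model_put(bus);
    return 0;
}

/* child's ->suspend hook: programs a codec register over the bus */
int model_codec_suspend(struct model_bus *bus)
{
    return model_xfer(bus);
}
```

A bus that was runtime suspended when system suspend started is active again after the child's hook has run, so its own ->suspend has real work to do later.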

I have a hunch the root of the problem is that ->prepare walks the tree
in top-down order, and its return value is used to decide about
direct-complete.  Why does it do that?  Shouldn't pm core check
the direct_complete flag during ->suspend if the device
is in runtime suspend, to decide whether to skip runtime resume + ->suspend
for *this* device?


Johannes


Re: [PATCH 2/3] input/keyboard: Add support for Dollar Cove TI power button

2017-08-22 Thread Johannes Stezenbach
On Tue, Aug 22, 2017 at 12:58:07PM +0200, Takashi Iwai wrote:
> I updated the patches and now pushed to topic/dollar-cove-ti-4.13-v2
> branch.  Will resubmit v2 (tomorrow or later) once after gathering
> reviews.

FWIW I tested current Linus's master + topic/dollar-cove-ti-4.13-v2
+ topic/soc-cx2072x-4.13 + my test patches, no observable
difference to topic/dollar-cove-ti-4.13 on E200HA.

Still hoping someone would give me a hint about possible
causes for the SoC entering S0i1 only instead of S0i3?
(https://bugzilla.kernel.org/show_bug.cgi?id=193891)
Where do I start looking?

Thanks,
Johannes


Re: Cherryview wake up events

2017-02-09 Thread Johannes Stezenbach
On Thu, Feb 02, 2017 at 05:58:26PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 04:42:43PM +0100, Johannes Stezenbach wrote:
> > On Thu, Feb 02, 2017 at 05:02:12PM +0200, Mika Westerberg wrote:
> > > Is the model Asus E200HA? Something like this (sorry the information is
> > > in Finnish but the machine should look the same):
> > > 
> > > https://www.karkkainen.com/verkkokauppa/asus-e200ha-fd0005ts-11-6--hd-kannettava-tietokone
> > 
> > > Looks right, mine is E200HA-FD0004TS but I think that just means
> > > dark blue color.
> 
> OK, we have one other Cherrytrail machine here which may have the same
> PMIC. We'll check that first and if it does not have the same, I'll
> order the above machine.

Probably it's too early to ask, but did you go for the E200HA
or what device are you going to use?  And did you start poking
at it, or what timeframe can we expect some patches to test?

BTW, just to clarify about the test patches I added in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=193891
You can use them but I also don't mind if they go to the
garbage can, they were just quickly cobbled together
for testing.


Thanks,
Johannes


Re: Cherryview wake up events

2017-02-03 Thread Johannes Stezenbach
On Fri, Feb 03, 2017 at 12:00:00PM +0200, Mika Westerberg wrote:
> Just for book keeping purposes, can you file a kernel.org bugzilla bug
> about this and add all the necessary information, and your patches
> there? You can assign the bug directly to me.

I filed it but cannot assign it, added you to CC.
https://bugzilla.kernel.org/show_bug.cgi?id=193891

Thanks,
Johannes


Re: Cherryview wake up events

2017-02-02 Thread Johannes Stezenbach
On Thu, Feb 02, 2017 at 05:58:26PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 04:42:43PM +0100, Johannes Stezenbach wrote:
> > On Thu, Feb 02, 2017 at 05:02:12PM +0200, Mika Westerberg wrote:
> > > Is the model Asus E200HA? Something like this (sorry the information is
> > > in Finnish but the machine should look the same):
> > > 
> > > https://www.karkkainen.com/verkkokauppa/asus-e200ha-fd0005ts-11-6--hd-kannettava-tietokone
> > 
> > > Looks right, mine is E200HA-FD0004TS but I think that just means
> > > dark blue color.
> 
> OK, we have one other Cherrytrail machine here which may have the same
> PMIC. We'll check that first and if it does not have the same, I'll
> order the above machine.

In case it is useful to know, I installed Debian stretch following this:
https://wiki.debian.org/InstallingDebianOn/Asus/E200HA

I built my kernel using a relatively minimal kernel config,
let me know if you want it.  I could also post the
two patches which port the mfd and opregion drivers, but
they are straightforward copies of intel_soc_pmic_crc.c
and intel_pmic_crc.c from 4.10.0-rc6+ with code copy
from the ProductionKernelQuilts patches and s/crc/dc_ti/ etc.,
except I scamped the thermal handler to skip the ADC driver port for now.


Maybe I should've used intel_pmic_xpower.c instead of
intel_pmic_crc.c, since as I write this I see there
is a no-op intel_xpower_pmic_gpio_handler() registered.
This is the trick that fixes this:
\_SB.PCI0.I2C7.PMI2.AVBG Integer  8be7b74d9be0 01 = 0001

But now it generates ACPI errors about thermal zone
and "acpi -V" usually hangs it up.

[5.500927] ACPI Exception: AE_ERROR, Returned by Handler for [UserDefinedRegion] (20160930/evregion-300)
[5.503842] No Local Variables are initialized for method [TMPR]
[5.506703] No Arguments are initialized for method [TMPR]
[5.509557] ACPI Error: Method parse/execution failed [\_SB.ATKD.TMPR] (Node 8a7d374e87f8), AE_ERROR (20160930/psparse-543)
[5.512481] ACPI Error: Method parse/execution failed [\_SB.ATKD.WMNB] (Node 8a7d374e7ed8), AE_ERROR (20160930/psparse-543)
[6.545403] i2c_designware 808622C1:06: controller timed out
[6.550763] ACPI Exception: AE_ERROR, Returned by Handler for [UserDefinedRegion] (20160930/evregion-300)
[6.555783] No Local Variables are initialized for method [TMPR]
[6.558769] No Arguments are initialized for method [TMPR]
[6.561571] ACPI Error: Method parse/execution failed [\_SB.ATKD.TMPR] (Node 8a7d374e87f8), AE_ERROR (20160930/psparse-543)
[6.564487] ACPI Error: Method parse/execution failed [\_SB.ATKD.WMNB] (Node 8a7d374e7ed8), AE_ERROR (20160930/psparse-543)

(I knew my thermal opregion code was preliminary but I didn't expect it
to error.)


Thanks,
Johannes


Re: Cherryview wake up events

2017-02-02 Thread Johannes Stezenbach
On Thu, Feb 02, 2017 at 05:02:12PM +0200, Mika Westerberg wrote:
> OK, I guess it is easier if I just order one of those machines here and
> figure out how to get the PMIC driver working.

Oh, I assumed the bottleneck is developer time, not lack of hardware...

> Is the model Asus E200HA? Something like this (sorry the information is
> in Finnish but the machine should look the same):
> 
> https://www.karkkainen.com/verkkokauppa/asus-e200ha-fd0005ts-11-6--hd-kannettava-tietokone

Looks right, mine is E200HA-FD0004TS but I think that just means
dark blue color.

> Also can you remind me what exactly is not working so we can prioritize?

There are reports hibernate isn't working, presumably because
the storage is 32GB eMMC.  I've never tried.  It doesn't
support ACPI S3 (suspend-to-RAM).  So currently one has
to boot+shutdown every time (or keep it running).

1. There seems to be no way to wake it up after "echo freeze >/sys/power/state".
   That is the reason for wanting the power button to wake it up.
   Whether the PB creates an input event is secondary.
   (the LID also doesn't wake it up, but it creates an input event)
2. I've no idea what would be the power consumption in freeze state,
   so I guess support for the S0ix states is needed
3. It randomly hangs at boot, often with a message related to i2c timeout.
   I tried Hans de Goede's patches but it didn't work for me
   (question is if the semaphore address is the same for
   AXP288 and TI DCove; the DSDT has the _SEM method so the
   semaphore is needed).
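
Regarding item 1: before writing "freeze" to /sys/power/state it is worth checking that the kernel lists it there. A minimal sketch of that check, assuming the file content is a whitespace-separated list of state names such as "freeze mem disk". The function name sleep_state_listed() is made up for this example.

```c
/* Hypothetical helper: checks whether a sleep state name appears as a
 * whole word in a string as read from /sys/power/state. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

bool sleep_state_listed(const char *states, const char *wanted)
{
    assert(states && wanted);

    size_t len = strlen(wanted);
    if (len == 0)
        return false;

    const char *p = states;
    while (*p) {
        p += strspn(p, " \n");            /* skip separators */
        size_t tok = strcspn(p, " \n");   /* length of the current word */
        if (tok == len && strncmp(p, wanted, len) == 0)
            return true;
        p += tok;
    }
    return false;
}
```

A caller would read one line from /sys/power/state with fopen() and fgets() and pass it in together with "freeze".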

Everything else is secondary.  E.g. there is an ADC driver used for thermal
used by the opregion driver, I didn't port it and just implemented
like for AXP288 by simple register reads (which might not work for TI).
We could fix this later.


Thanks,
Johannes


Re: Cherryview wake up events

2017-02-02 Thread Johannes Stezenbach
On Thu, Feb 02, 2017 at 04:26:18PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 02:52:57PM +0100, Johannes Stezenbach wrote:
> > Looking at 0002-GPIO-Adding-AXP288-PMIC-GPIO-driver.patch from
> > ProductionKernelQuilts, it doesn't seem hard to do the same for the TI
> > PMIC, but it needs information from the PMIC datasheet for irq and gpio
> > control registers.
> > Hopefully you have a patch or at least could provide the information.
> 
> That patch looks like a GPIO driver for DCOVE. Did you try it already?

Hell, no.  Without datasheets I can't compare if registers
are compatible between AXP288 and TI DCOVE (SND9039).
Couldn't it damage the hardware if I mess up charger
and voltage regulator related registers?

And current Linus' tree doesn't have the AXP288 GPIO,
and ProductionKernelQuilts doesn't use AXP288 GPIO
for TI DCOVE.

Thanks,
Johannes


Re: Cherryview wake up events

2017-02-02 Thread Johannes Stezenbach
On Thu, Feb 02, 2017 at 02:16:39PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 01:35:08PM +0200, Mika Westerberg wrote:
> > On Thu, Feb 02, 2017 at 12:12:22PM +0100, Johannes Stezenbach wrote:
> > > On Thu, Feb 02, 2017 at 12:31:22PM +0200, Mika Westerberg wrote:
> > > > On Thu, Feb 02, 2017 at 10:52:00AM +0100, Johannes Stezenbach wrote:
> > > > > OperationRegion (GPOP, GeneralPurposeIo, Zero, 0x0100)
> > > > > Field (GPOP, ByteAcc, NoLock, Preserve)
> > > > > {
> > > > > Connection (
> > > > > GpioIo (Exclusive, PullDefault, 0x, 0x, IoRestrictionOutputOnly,
> > > > > "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, ,
> > > > > )
> > > > > {   // Pin list
> > > > > 0x0020
> > > > > }
> > > > > ), 
> > > > > GMP0,   1, 
> > > > > ...
> > > > > (repeat for many more pins)
> > > > > 
> > > > > I guess it means it uses chv_gpio pins and can't work
> > > > > if the GPIO opregion is not registered?
> > > > 
> > > > That is using GPIO pins of the PMI2 device - the PMIC GPIO driver, I
> > > > suppose.
> > > > 
> > > > So in addition to the PMIC MFD driver, you need to have a GPIO driver
> > > > for Dollar Cove (I guess the quilt patch series included that as well?).
> > > 
> > > Nope, I see it for AXP288 but didn't find it for TI DCove.  And in
> > > current Linus' tree axp288_cells[] doesn't include gpio so
> > > I concluded it's not needed... what am I missing?
> > 
> > So reading your DSDT there is that GPIO button array device \_SB.TBAD
> > which has one GpioInt() referencing \_SB.PCI0.I2C7.PMI2. I suppose that
> > is the power button GPIO.
> > 
> > In order to use that there needs to be a GPIO driver exposing those
> > GPIOs to other drivers. So it is definitely needed.
> 
> Actually, looking again the patches you found:
> 
> https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/mfd-intel_soc_pmic-add-TI-variant-of-dollar-cove.patch
> https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/PWRBTN-add-driver-for-TI-PMIC.patch
> 
> Did you try them both? The latter seems to handle the power button
> by talking directly with the PMIC (instead of using a GPIO).

Nope, as I've written earlier:
> In ProductionKernelQuilts I found
> DC-TI-PMIC-disable-power-button-support.patch so I guess it
> might not be needed because it's probably handled by ACPI.

[  +0.000338] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[  +0.000127] ACPI: Power Button [PWRB]
...
[  +0.000248] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[  +0.000116] ACPI: Power Button [PWRF]

And I also have:

[  +0.04] soc_button_array INTCFD9:00: GPIO lookup for consumer soc_button_array
[  +0.02] soc_button_array INTCFD9:00: using ACPI for GPIO lookup
[  +0.03] acpi INTCFD9:00: GPIO: looking up soc_button_array-gpios
[  +0.04] acpi INTCFD9:00: GPIO: looking up soc_button_array-gpio
[  +0.03] acpi INTCFD9:00: GPIO: looking up 0 in _CRS
[  +0.000610] soc_button_array INTCFD9:00: lookup for GPIO soc_button_array failed
(repeats for 5 buttons, one of them should succeed)
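
The lookup order visible in the log above can be sketched as follows.
This is only an illustration of the sequence the debug messages show
(two property-name suffixes, then a fallback to an index in _CRS); the
function name lookup_candidates() is hypothetical and not a kernel API.

```python
# Suffixes tried by the lookup, in the order seen in the log above.
GPIO_SUFFIXES = ("gpios", "gpio")


def lookup_candidates(con_id, index):
    """List what the ACPI GPIO lookup tries for one consumer, in order."""
    names = [f"{con_id}-{suffix}" for suffix in GPIO_SUFFIXES]
    # Fallback seen in the log: resource number `index` in _CRS.
    names.append(f"{index} in _CRS")
    return names


for step in lookup_candidates("soc_button_array", 0):
    print("looking up", step)
```

If all three steps fail for a button, the "lookup for GPIO ... failed"
line is what remains in the log.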

> Let's include the original author (Ramakrishna) as well if we could get
> some information from him.

Looking at 0002-GPIO-Adding-AXP288-PMIC-GPIO-driver.patch from
ProductionKernelQuilts, it doesn't seem hard to do the same for the
TI PMIC, but it needs information from the PMIC datasheet for the
irq and gpio control registers.
Hopefully you have a patch or at least could provide the information.


Thanks,
Johannes


Re: Cherryview wake up events

2017-02-02 Thread Johannes Stezenbach
On Thu, Feb 02, 2017 at 12:31:22PM +0200, Mika Westerberg wrote:
> On Thu, Feb 02, 2017 at 10:52:00AM +0100, Johannes Stezenbach wrote:
> > OperationRegion (GPOP, GeneralPurposeIo, Zero, 0x0100)
> > Field (GPOP, ByteAcc, NoLock, Preserve)
> > {
> > Connection (
> > GpioIo (Exclusive, PullDefault, 0x, 0x, IoRestrictionOutputOnly,
> > "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, ,
> > )
> > {   // Pin list
> > 0x0020
> > }
> > ), 
> > GMP0,   1, 
> > ...
> > (repeat for many more pins)
> > 
> > I guess it means it uses chv_gpio pins and can't work
> > if the GPIO opregion is not registered?
> 
> That is using GPIO pins of the PMI2 device - the PMIC GPIO driver, I
> suppose.
> 
> So in addition to the PMIC MFD driver, you need to have a GPIO driver
> for Dollar Cove (I guess the quilt patch series included that as well?).

Nope, I see it for AXP288 but didn't find it for TI DCove.  And in
current Linus' tree axp288_cells[] doesn't include gpio so
I concluded it's not needed... what am I missing?


Thanks,
Johannes


Re: Cherryview wake up events

2017-02-02 Thread Johannes Stezenbach
Hi Mika,

On Tue, Jan 31, 2017 at 03:37:40PM +0100, Johannes Stezenbach wrote:
> - Powerbutton driver seems simple enough, the only specialty
>   of the TI dcove PB driver is the workaround for lost button
>   press event after resume.  However, I still don't see how
>   the PB would cause thermal event irqs on E200HA and how the
>   PMIC driver would change it?

In ProductionKernelQuilts I found
DC-TI-PMIC-disable-power-button-support.patch so I guess it
might not be needed because it's probably handled by ACPI.

> I think the mfd driver would be similar to intel_soc_pmic_crc.c,
> the dollar_cove_ti_powerbtn.c I would keep instead of merging
> it into intel_mid_powerbtn.c.  I guess what we need is in
> drivers/acpi/pmic/ something similar to intel_pmic_crc.c,
> the ProductionKernelQuilts has 
> 0001-ACPI-Adding-support-for-TI-pmic-opregion.patch.

I have preliminary versions of the mfd and opregion drivers.
While testing I found that the GPIO opregion is not registered:

Excerpt from DSDT:
https://linuxtv.org/~js/e200ha/dsdt.dsl

Device (PMI2)
{
    Name (_ADR, Zero)  // _ADR: Address
    Name (_HID, "INT33F5" /* TI PMIC Controller */)  // _HID: Hardware ID
    Name (_CID, "INT33F5" /* TI PMIC Controller */)  // _CID: Compatible ID
    Name (_DDN, "TI PMIC Controller")  // _DDN: DOS Device Name
    Name (_HRV, 0x03)  // _HRV: Hardware Revision
    Name (_UID, One)  // _UID: Unique ID
    Name (_DEP, Package (0x02)  // _DEP: Dependencies
    {
        I2C7,
        GPO1
    })
    Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
    {
        Name (SBUF, ResourceTemplate ()
        {
            I2cSerialBusV2 (0x005E, ControllerInitiated, 0x000F4240,
                AddressingMode7Bit, "\\_SB.PCI0.I2C7",
                0x00, ResourceConsumer, , Exclusive,
                )
            GpioInt (Level, ActiveHigh, Shared, PullDefault, 0x,
                "\\_SB.GPO1", 0x00, ResourceConsumer, ,
                )
                {   // Pin list
                    0x000F
                }
        })
        Return (SBUF) /* \_SB_.PCI0.I2C7.PMI2._CRS.SBUF */
    }
    ...
    Name (AVBL, Zero)
    Name (AVBD, Zero)
    Name (AVBG, Zero)
    Method (_REG, 2, NotSerialized)  // _REG: Region Availability
    {
        If (Arg0 == 0x08)
        {
            AVBG = Arg1
        }

        If (Arg0 == 0x8D)
        {
            AVBL = Arg1
        }

        If (Arg0 == 0x8C)
        {
            AVBD = Arg1
        }
    }


acpidbg:
\_SB.PCI0.I2C7.PMI2.AVBL Integer  8be7b74d97a8 01 = 0001
\_SB.PCI0.I2C7.PMI2.AVBD Integer  8be7b74d94d8 01 = 0001
\_SB.PCI0.I2C7.PMI2.AVBG Integer  8be7b74d9be0 01 =


Any idea about it?
devm_gpiochip_add_data() in chv_gpio_probe() indirectly calls
acpi_gpiochip_add(), which should use _DEP to figure out that it
needs to call _REG, right?
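
To make the acpidbg readout easier to follow, here is a minimal sketch
that maps the three flags set by _REG to the address space each one
tracks.  Space ID 0x08 is GeneralPurposeIo in the ACPI specification.
The labels for 0x8C and 0x8D are an assumption: these are OEM-defined
IDs, named here after the thermal and power opregion IDs used in
Linux's drivers/acpi/pmic/intel_pmic.c.  The blank AVBG value in the
output above is taken as zero, which is also an assumption.

```python
# Space IDs handled by the _REG method in the DSDT excerpt.
# 0x08 is GeneralPurposeIo per the ACPI spec; the 0x8C/0x8D labels are
# assumed (OEM-defined IDs, named as in Linux's intel_pmic.c).
REG_FLAGS = {
    0x08: ("AVBG", "GeneralPurposeIo (GPIO opregion)"),
    0x8C: ("AVBD", "PMIC thermal opregion (assumed)"),
    0x8D: ("AVBL", "PMIC power opregion (assumed)"),
}


def describe_reg_state(values):
    """Return one status line per flag, given {flag_name: integer}."""
    lines = []
    for space_id, (flag, label) in sorted(REG_FLAGS.items()):
        state = "connected" if values.get(flag, 0) == 1 else "not connected"
        lines.append(f"{flag} (space 0x{space_id:02X}, {label}): {state}")
    return lines


# Values as read in acpidbg, with the blank AVBG value taken as zero.
observed = {"AVBL": 1, "AVBD": 1, "AVBG": 0}
for line in describe_reg_state(observed):
    print(line)
```

Read this way, _REG was called for the two PMIC opregions but not for
the GeneralPurposeIo one, which matches the observation above.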

Also PMI2 has

OperationRegion (GPOP, GeneralPurposeIo, Zero, 0x0100)
Field (GPOP, ByteAcc, NoLock, Preserve)
{
    Connection (
        GpioIo (Exclusive, PullDefault, 0x, 0x, IoRestrictionOutputOnly,
            "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, ,
            )
            {   // Pin list
                0x0020
            }
    ),
    GMP0,   1,
    ...
    (repeat for many more pins)

I guess it means it uses chv_gpio pins and can't work
if the GPIO opregion is not registered?


FWIW, with the mfd driver, /proc/interrupts has

 180:  0  0  0  0  chv-gpio9  TI Dollar Cove

I guess the 9 refers to the 10th pin in north_pins[] which is pin 0x000F, right?
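
The guess above amounts to an index-to-pin translation through a sparse
pin table.  A minimal sketch, assuming the table starts with nine
consecutive pins followed by a gap; the list below is a hypothetical,
abbreviated stand-in for north_pins[], not a copy of the driver's
table, and offset_to_pin() is an illustrative name.

```python
# Hypothetical, abbreviated stand-in for the start of north_pins[]:
# nine consecutive pins, then a gap.  The numbers are an assumption.
NORTH_PIN_NUMBERS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 15]


def offset_to_pin(offset):
    """Translate a zero-based table index into a hardware pin number."""
    return NORTH_PIN_NUMBERS[offset]


# Index 9 is the 10th entry of the table.
print(hex(offset_to_pin(9)))
```

Under that assumption index 9 is pin 0x0F, the pin listed in the
GpioInt() resource of PMI2.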
I boot with "dyndbg=file gpiolib* +p" and get

[  +0.012798] acpi INT33F5:00: GPIO: looking up 0 in _CRS
[  +0.000214] intel_soc_pmic_i2c i2c-INT33F5:00: GPIO lookup for consumer intel_soc_pmic
[  +0.03] intel_soc_pmic_i2c i2c-INT33F5:00: using ACPI for GPIO lookup
[  +0.05] acpi INT33F5:00: GPIO: looking up intel_soc_pmic-gpios
[  +0.05] acpi INT33F5:00: GPIO: looking up intel_soc_pmic-gpio
[  +0.05] acpi INT33F5:00: GPIO: looking up 0 in _

Re: Cherryview wake up events

2017-01-31 Thread Johannes Stezenbach
Hi Andy and Mika,

On Tue, Jan 31, 2017 at 12:05:07AM +0200, Andy Shevchenko wrote:
> On Mon, Jan 30, 2017 at 10:57 PM, Johannes Stezenbach <j...@sig21.net> wrote:
> >
> > I checked the reference source code, my impression is the
> > TI Dollar Cove and AXP288 are completely different hardware.
> 
> Thanks for checking.
> 
> Yes, due to not obvious communication to PMIC. I suppose that the IP
> core is quite similar in all of them, the difference is just how OS
> and other MCUs in SoC communicate with it.
> 
> So, basically what it means is that I2C direct communication is prohibited here.

Not sure about that, but I guess this is needed:
https://lists.freedesktop.org/archives/intel-gfx/2017-January/117696.html

> >> > This is known: http://www.spinics.net/lists/intel-gfx/msg117738.html
> >
> > Interestingly via this link I found Intel also published
> > the TI DCove source in a patch series against an unspecified kernel:
> > https://github.com/01org/ProductionKernelQuilts
> > specifically
> > https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/mfd-intel_soc_pmic-add-TI-variant-of-dollar-cove.patch
> > https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/PWRBTN-add-driver-for-TI-PMIC.patch
> > and some more (the series is quite messy).

FWIW, now I came across yet another source for this driver:
https://android.googlesource.com/kernel/x86/+/android-x86-grant-3.10-marshmallow-mr1-wear-release/drivers/external_drivers/drivers/mfd/intel_pmic/
(but seems to be older)

> > For the Asus E200HA I'm not sure if the charger and coulomb
> > counter drivers are needed since charging just works and
> > the battery status is reported via ACPI.  It seems these
> > drivers are only for tablets without ACPI support, right?
> 
> Have no idea.
> 
> What that code reminds me of is the MID family of devices. So, power button
> support is (reasonably) easy to get in that case.
> Look into drivers/platform/x86/intel_mid_powerbtn.c. I recently
> updated it to support Basin Cove on Intel Edison.

You seem to suggest I should try and tackle it myself,
which I would do, but for one I don't want to step on
Mika's toes, secondly ISTR you indicated you have newer,
better source than what is available publicly?
If you want me to take it, please let me know which tree
to work against and any other suggestions you have.

Some more questions:
- Powerbutton driver seems simple enough, the only specialty
  of the TI dcove PB driver is the workaround for lost button
  press event after resume.  However, I still don't see how
  the PB would cause thermal event irqs on E200HA and how the
  PMIC driver would change it?
- Wakeup from freeze state (E200HA doesn't support suspend / ACPI S3)
  is only step 1, to make it usable we need S0ix support.
  Any hints about that?

I think the mfd driver would be similar to intel_soc_pmic_crc.c, and
I would keep dollar_cove_ti_powerbtn.c instead of merging it into
intel_mid_powerbtn.c.  I guess what we need in drivers/acpi/pmic/ is
something similar to intel_pmic_crc.c; the ProductionKernelQuilts has
0001-ACPI-Adding-support-for-TI-pmic-opregion.patch.


Thanks,
Johannes


Re: Cherryview wake up events

2017-01-30 Thread Johannes Stezenbach
On Fri, Jan 27, 2017 at 02:30:58PM +0100, Johannes Stezenbach wrote:
> On Fri, Jan 27, 2017 at 03:21:22PM +0200, Andy Shevchenko wrote:
> > On Fri, Jan 27, 2017 at 1:38 PM, Johannes Stezenbach <j...@sig21.net> wrote:
> > > On Fri, Jan 27, 2017 at 12:56:53AM +0200, Andy Shevchenko wrote:
> > 
> > >> Had you tried to add ID to axp20x-i2c.c ?
> > >
> > > Nope, since I have no idea if the axp and TI hardware is similar.
> > 
> > I think you should give it a try.
> 
> I'll check it.

I checked the reference source code, my impression is the
TI Dollar Cove and AXP288 are completely different hardware.

> > > [5.331709] i2c_designware 808622C1:06: controller timed out
> > >
> > This is known: http://www.spinics.net/lists/intel-gfx/msg117738.html

Interestingly via this link I found Intel also published
the TI DCove source in a patch series against an unspecified kernel:
https://github.com/01org/ProductionKernelQuilts
specifically
https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/mfd-intel_soc_pmic-add-TI-variant-of-dollar-cove.patch
https://github.com/01org/ProductionKernelQuilts/blob/master/uefi/cht-m1stable/patches/PWRBTN-add-driver-for-TI-PMIC.patch
and some more (the series is quite messy).

For the Asus E200HA I'm not sure if the charger and coulomb
counter drivers are needed since charging just works and
the battery status is reported via ACPI.  It seems these
drivers are only for tablets without ACPI support, right?


Thanks,
Johannes


Re: Cherryview wake up events

2017-01-27 Thread Johannes Stezenbach
On Fri, Jan 27, 2017 at 03:21:22PM +0200, Andy Shevchenko wrote:
> On Fri, Jan 27, 2017 at 1:38 PM, Johannes Stezenbach <j...@sig21.net> wrote:
> > On Fri, Jan 27, 2017 at 12:56:53AM +0200, Andy Shevchenko wrote:
> 
> > And the same info is also in sysfs:
> >
> > # cat /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/808622C1\:06/INT33F4\:00/status
> > 0
> > # cat /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/808622C1\:06/INT33F5\:00/status
> > 15
> >
> > The DSDT is still at https://linuxtv.org/~js/e200ha/
> >
> >> Had you tried to add ID to axp20x-i2c.c ?
> >
> > Nope, since I have no idea if the axp and TI hardware is similar.
> 
> I think you should give it a try.

I'll check it.

> > There might be more issues, currently the machine hangs often
> > during bootup at random points.  I built i915 as a module and
> > blacklisted it for autoloading so I can read the last message
> > on the console.  All I can say is that it is more likely to
> > hang when the loglevel is high, i.e. it almost never succeeds
> > with "debug" on kernel command line.  Sometimes there are
> > timeout errors from I2C:
> > [4.307189] i2c_designware 808622C1:06: controller timed out
> > [5.331709] i2c_designware 808622C1:06: controller timed out
> >
> > Once it has booted it is running stable.
> 
> This is known: http://www.spinics.net/lists/intel-gfx/msg117738.html

Not sure, because it happens with the i915 module not loaded
(currently I load it manually after boot has completed).
But thanks for the link.

Johannes


Re: Cherryview wake up events

2017-01-27 Thread Johannes Stezenbach
On Fri, Jan 27, 2017 at 12:56:53AM +0200, Andy Shevchenko wrote:
> 
> I'm reading your long thread about the issue.

Thanks for taking the time!

> > but excluded CONFIG_MFD_AXP20X based on \_SB.PCI0.I2C7.PMI1._STA returning
> > 0 in acpidbg,
> > but \_SB.PCI0.I2C7.PMI1._STA returns 0xf
> 
> Did you mean PMI2 in the second sentence?

Yes, sorry for the copy & paste mistake.  I just repeated it to confirm.
In acpidbg:

- execute \_SB.PCI0.I2C7.PMI1._STA
Evaluating \_SB.PCI0.I2C7.PMI1._STA
Evaluation of \_SB.PCI0.I2C7.PMI1._STA returned object a14a6742, external buffer length 18
 [Integer] = 

- execute \_SB.PCI0.I2C7.PMI2._STA
Evaluating \_SB.PCI0.I2C7.PMI2._STA
Evaluation of \_SB.PCI0.I2C7.PMI2._STA returned object a14a6742, external buffer length 18
 [Integer] = 000F

And the same info is also in sysfs:

# cat /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/808622C1\:06/INT33F4\:00/status
0
# cat /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/PNP0A08\:00/808622C1\:06/INT33F5\:00/status
15
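
For reference, the two status values can be decoded bit by bit.  This
is a small sketch using the _STA bit meanings from the ACPI
specification; decode_sta() is just an illustrative helper name.

```python
# _STA bit meanings as defined by the ACPI specification.
STA_BITS = {
    0: "present",
    1: "enabled",
    2: "shown in UI",
    3: "functioning",
    4: "battery present",
}


def decode_sta(value):
    """Return the names of all _STA bits set in `value`."""
    return [name for bit, name in STA_BITS.items() if value & (1 << bit)]


print("INT33F4:", decode_sta(0))
print("INT33F5:", decode_sta(15))
```

A status of 0 means the device is not present at all, while 15 (0xF)
means present, enabled, shown in the UI and functioning.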

The DSDT is still at https://linuxtv.org/~js/e200ha/

> Had you tried to add ID to axp20x-i2c.c ?

Nope, since I have no idea if the axp and TI hardware is similar.

There might be more issues: currently the machine often hangs
at random points during bootup.  I built i915 as a module and
blacklisted it for autoloading so I can read the last message
on the console.  All I can say is that it is more likely to
hang when the loglevel is high, i.e. it almost never succeeds
with "debug" on the kernel command line.  Sometimes there are
timeout errors from I2C:
[4.307189] i2c_designware 808622C1:06: controller timed out
[5.331709] i2c_designware 808622C1:06: controller timed out

Once it has booted it runs stably.


Thanks,
Johannes


Re: Cherryview wake up events

2017-01-24 Thread Johannes Stezenbach
On Tue, Jan 24, 2017 at 04:28:29PM +0200, Andy Shevchenko wrote:
> They probably release just almost all One Big Ugly patch from official
> BSP, which by some reason, includes all Intel MID SoCs, Baytrail.
> I think I know how Dollar Cove related code ended up there. But that
> all stuff is a total mess.

I agree.  Probably it was a mistake to bring up this code here.
Let me try to go back two steps:  Could you please let me
know if there is any progress in mainlining the TI Dollar Cove PMIC
and related drivers?  Is there a schedule?
After waiting for four months I'm actually getting impatient
because by now the Cherryview based hardware seems to be going out
of production and I fear the mainlining might never happen.

Thanks,
Johannes


Re: Cherryview wake up events

2017-01-24 Thread Johannes Stezenbach
On Tue, Jan 24, 2017 at 01:14:16PM +0200, Andy Shevchenko wrote:
> On Tue, Jan 24, 2017 at 11:41 AM, Johannes Stezenbach <j...@sig21.net> wrote:
> > Meanwhile I found out the TI PMIC and power button drivers
> > have been published as part of the Asus ZenFone Zoom (ZX551ML)
> > Android kernel code drop (based on linux-3.10.x):
> >
> > https://www.asus.com/support/Download/39/1/0/26/BXbNqJplzZiLmk6G/32/
> >
> > Please let me know if there is anything I could do
> > to help get it mainlined soon.
> 
> AFAIK ASuS Zenfone 2 (Intel based) series uses Intel Moorefield. It
> has ShadyCove PMIC.

So Asus released more than they needed.  I confirmed their
source drop contains the TI Dollar Cove driver (dollar_cove_ti.c).
Previously I searched for Android devices using CherryView but the
only one I could find is the Xiaomi MiPad 2, and its released
kernel source doesn't contain the driver.

Anyway, let me know if I can help to get it into mainline soon.


Johannes


Re: Cherryview wake up events

2017-01-24 Thread Johannes Stezenbach
Hi,

On Mon, Dec 05, 2016 at 01:06:08PM +0200, Mika Westerberg wrote:
> On Sun, Dec 04, 2016 at 07:52:19PM +0100, Johannes Stezenbach wrote:
> > On Wed, Oct 05, 2016 at 04:05:11PM +0300, Mika Westerberg wrote:
> > > On Wed, Oct 05, 2016 at 02:46:48PM +0200, Johannes Stezenbach wrote:
> > > > On Fri, Sep 23, 2016 at 11:19:04AM +0300, Mika Westerberg wrote:
> > > > > David (CC'd) is working on getting the Dollar Cove PMIC driver
> > > > > upstreamed to the mainline kernel.
> > > > 
> > > > May I ask when to expect a patch?  I'm ready if you
> > > > have something to test, even if it's not in
> > > > shape for mainline yet.
> > > 
> > > It typically takes quite some time to get all the legal stuff done
> > > before the code can be published. And if people are busy with other
> > > things it takes even more time.
> > > 
> > > So please be patient, it will happen sooner or later ;-)
> > 
> > I don't want to nag, but just so it doesn't drop off
> > the TODO list due to "lack of interest":  What's the
> > status?  Will Santa bring the TI Dollar Cove PMIC driver?
> 
> David, do you have any estimate?


Meanwhile I found out that the TI PMIC and power button drivers
have been published as part of the Asus ZenFone Zoom (ZX551ML)
Android kernel code drop (based on linux-3.10.x):

https://www.asus.com/support/Download/39/1/0/26/BXbNqJplzZiLmk6G/32/

Please let me know if there is anything I could do
to help get it mainlined soon.


Thanks,
Johannes


Re: Cherryview wake up events

2016-12-04 Thread Johannes Stezenbach
Hi,

On Wed, Oct 05, 2016 at 04:05:11PM +0300, Mika Westerberg wrote:
> On Wed, Oct 05, 2016 at 02:46:48PM +0200, Johannes Stezenbach wrote:
> > On Fri, Sep 23, 2016 at 11:19:04AM +0300, Mika Westerberg wrote:
> > > David (CC'd) is working on getting the Dollar Cove PMIC driver
> > > upstreamed to the mainline kernel.
> > 
> > May I ask when to expect a patch?  I'm ready if you
> > have something to test, even if it's not in
> > shape for mainline yet.
> 
> It typically takes quite some time to get all the legal stuff done
> before the code can be published. And if people are busy with other
> things it takes even more time.
> 
> So please be patient, it will happen sooner or later ;-)

I don't want to nag, but just so it doesn't drop off
the TODO list due to "lack of interest":  What's the
status?  Will Santa bring the TI Dollar Cove PMIC driver?

While I'm at it, I also have questions about S0ix support
in Linux which I didn't find answers to by web search.
Does S0ix depend on the PMIC driver?  And will it be
used during run time or only in "sleep" state
(which would mean "echo freeze >/sys/power/state"
since ACPI S3 isn't supported)?
So far all I know is that it doesn't seem to be used (running 4.9.0-rc7+):

/sys/kernel/debug/pmc_atom/sleep_state:S0IR Residency:  0us
/sys/kernel/debug/pmc_atom/sleep_state:S0I1 Residency:  0us
/sys/kernel/debug/pmc_atom/sleep_state:S0I2 Residency:  0us
/sys/kernel/debug/pmc_atom/sleep_state:S0I3 Residency:  0us
/sys/kernel/debug/pmc_atom/sleep_state:S0   Residency: 160934496us
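
A minimal sketch of how such output could be checked from a program,
assuming each line has the form "<state> Residency: <n>us" as shown above
(with the file name prefix already stripped).  The helpers
parse_residency_us() and is_s0ix_used() are made up for this example,
they are not a kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line of the pmc_atom sleep_state
 * output, e.g. "S0I3 Residency:  0us".  Stores the state name in
 * name (of size name_len) and the residency in *usec.
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_residency_us(const char *line, char *name, size_t name_len,
			      unsigned long long *usec)
{
	char state[16];
	unsigned long long val;

	if (sscanf(line, "%15s Residency: %lluus", state, &val) != 2)
		return -1;
	if (strlen(state) >= name_len)
		return -1;
	strcpy(name, state);
	*usec = val;
	return 0;
}

/* Returns 1 if the line reports a non-zero residency in an S0ix state. */
static int is_s0ix_used(const char *line)
{
	char name[16];
	unsigned long long usec;

	if (parse_residency_us(line, name, sizeof(name), &usec) < 0)
		return 0;
	/* plain "S0" is the active state, "S0I*" are the idle states */
	return strncmp(name, "S0I", 3) == 0 && usec > 0;
}
```

With the numbers above, is_s0ix_used() would return 0 for every line,
which matches the observation that S0ix is not entered at all.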


Thanks,
Johannes


Re: [PATCH v2 18/31] gp8psk: don't do DMA on stack

2016-11-07 Thread Johannes Stezenbach
On Sun, Nov 06, 2016 at 11:51:14AM -0800, VDR User wrote:
> I applied this patch to the 4.8.4 kernel driver (that I'm currently
> running) and it caused nothing but "frontend 0/0 timed out while
> tuning". Is there another patch that should be used in conjunction
> with this? If not, this patch breaks the gp8psk driver.
> 
> Thanks.

Thanks for testing.  "If it's not tested it's broken"...

> On Tue, Oct 11, 2016 at 3:09 AM, Mauro Carvalho Chehab
>  wrote:

> > index 5d0384dd45b5..fa215ad37f7b 100644
> > --- a/drivers/media/usb/dvb-usb/gp8psk.c
> > +++ b/drivers/media/usb/dvb-usb/gp8psk.c

> >  int gp8psk_usb_in_op(struct dvb_usb_device *d, u8 req, u16 value, u16 
> > index, u8 *b, int blen)
> >  {
> > +   struct gp8psk_state *st = d->priv;
> > int ret = 0,try = 0;
> >
> > if ((ret = mutex_lock_interruptible(&d->usb_mutex)))
> > return ret;
> >
> > while (ret >= 0 && ret != blen && try < 3) {
> > +   memcpy(st->data, b, blen);
> > ret = usb_control_msg(d->udev,
> > usb_rcvctrlpipe(d->udev,0),
> > req,
> > USB_TYPE_VENDOR | USB_DIR_IN,
> > -   value,index,b,blen,
> > +   value, index, st->data, blen,
> > 2000);

I guess for usb_in the memcpy should be after the usb_control_msg
and from st->data to b.
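
To illustrate the ordering, here is a small userspace sketch, not the
actual driver code.  fake_control_in() stands in for usb_control_msg()
with USB_DIR_IN, and struct fake_state and BOUNCE_SIZE stand in for the
driver state; all of these names are made up for this example.

```c
#include <string.h>

#define BOUNCE_SIZE 80	/* arbitrary size for this sketch */

/* Hypothetical stand-in for the driver state holding the DMA-safe buffer. */
struct fake_state {
	unsigned char data[BOUNCE_SIZE];
};

/*
 * Hypothetical stand-in for usb_control_msg() with USB_DIR_IN:
 * the "device" fills buf with blen bytes and returns the count.
 */
static int fake_control_in(unsigned char *buf, int blen)
{
	int i;

	for (i = 0; i < blen; i++)
		buf[i] = (unsigned char)(0xA0 + i);
	return blen;
}

/*
 * IN operation: transfer into the bounce buffer first, then copy the
 * received bytes out to the caller's buffer b.
 */
static int fake_usb_in_op(struct fake_state *st, unsigned char *b, int blen)
{
	int ret;

	if (blen < 0 || blen > BOUNCE_SIZE)
		return -1;

	ret = fake_control_in(st->data, blen);
	if (ret > 0)
		memcpy(b, st->data, ret);	/* after the transfer: st->data -> b */
	return ret;
}
```

Copying from b to st->data before the transfer, as in the quoted hunk,
only makes sense for the OUT direction; for IN the caller's buffer
never sees the received data.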

Johannes


Re: [PATCH v2 02/31] cinergyT2-core: don't do DMA on stack

2016-10-15 Thread Johannes Stezenbach
On Tue, Oct 11, 2016 at 07:09:17AM -0300, Mauro Carvalho Chehab wrote:
> --- a/drivers/media/usb/dvb-usb/cinergyT2-core.c
> +++ b/drivers/media/usb/dvb-usb/cinergyT2-core.c
> @@ -41,6 +41,8 @@ DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nr);
>  
>  struct cinergyt2_state {
>   u8 rc_counter;
> + unsigned char data[64];
> + struct mutex data_mutex;
>  };

Sometimes my thinking is slow but it just occurred to me
that this creates a potential issue with cache line sharing.
On an architecture which manages cache coherence in software
(ARM, MIPS etc.) a write to e.g. rc_counter in this example
would dirty the cache line, and a later writeback from the
cache could overwrite parts of data[] which was received via DMA.
In contrast, if the DMA buffer is allocated separately via
kmalloc it is guaranteed to be safe wrt cache line sharing.
(see bottom of Documentation/DMA-API-HOWTO.txt).
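
A rough userspace sketch of the layout concern, assuming a 64-byte cache
line (the real size depends on the CPU).  struct state_like mirrors the
struct from the patch with the mutex left out, and shares_cache_line()
is a helper made up for this example.

```c
#include <stddef.h>

#define CACHE_LINE 64	/* assumed line size, differs between CPUs */

/* Mirrors the layout from the patch (mutex left out for simplicity). */
struct state_like {
	unsigned char rc_counter;
	unsigned char data[64];
};

/*
 * Returns 1 if the bytes at offsets a and b fall into the same cache
 * line, assuming the containing object starts on a line boundary.
 */
static int shares_cache_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}

/* rc_counter and the start of data[] end up in the same line. */
static int counter_shares_line_with_data(void)
{
	return shares_cache_line(offsetof(struct state_like, rc_counter),
				 offsetof(struct state_like, data));
}
```

Here counter_shares_line_with_data() returns 1, i.e. a CPU write to
rc_counter touches the same line as the first bytes of data[].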

But of course DMA on stack also had the same issue
and no one ever noticed so it's apparently not critical...


Johannes


Re: Problem with VMAP_STACK=y

2016-10-05 Thread Johannes Stezenbach
On Wed, Oct 05, 2016 at 06:04:50AM -0300, Mauro Carvalho Chehab wrote:
>  static int cinergyt2_frontend_attach(struct dvb_usb_adapter *adap)
>  {
> - char query[] = { CINERGYT2_EP1_GET_FIRMWARE_VERSION };
> - char state[3];
> + struct dvb_usb_device *d = adap->dev;
> + struct cinergyt2_state *st = d->priv;
>   int ret;
>  
>   adap->fe_adap[0].fe = cinergyt2_fe_attach(adap->dev);
>  
> - ret = dvb_usb_generic_rw(adap->dev, query, sizeof(query), state,
> - sizeof(state), 0);

it seems to miss this:

st->data[0] = CINERGYT2_EP1_GET_FIRMWARE_VERSION;

> + ret = dvb_usb_generic_rw(d, st->data, 1, st->data, 3, 0);
>   if (ret < 0) {
>   deb_rc("cinergyt2_power_ctrl() Failed to retrieve sleep "
>   "state info\n");
> @@ -141,13 +147,14 @@ static int repeatable_keys[] = {
>  static int cinergyt2_rc_query(struct dvb_usb_device *d, u32 *event, int 
> *state)
>  {
>   struct cinergyt2_state *st = d->priv;
> - u8 key[5] = {0, 0, 0, 0, 0}, cmd = CINERGYT2_EP1_GET_RC_EVENTS;
>   int i;
>  
>   *state = REMOTE_NO_KEY_PRESSED;
>  
> - dvb_usb_generic_rw(d, &cmd, 1, key, sizeof(key), 0);
> - if (key[4] == 0xff) {
> + st->data[0] = CINERGYT2_EP1_SLEEP_MODE;

should probably be

st->data[0] = CINERGYT2_EP1_GET_RC_EVENTS;

> +
> + dvb_usb_generic_rw(d, st->data, 1, st->data, 5, 0);
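
A small userspace sketch of why the command byte matters once one buffer
is used for both directions.  fake_generic_rw() stands in for
dvb_usb_generic_rw(), and the FAKE_* command codes are made-up values,
not the real CINERGYT2_EP1_* constants.

```c
#include <string.h>

/* Made-up command codes for this sketch. */
enum { FAKE_GET_FIRMWARE_VERSION = 1, FAKE_GET_RC_EVENTS = 2 };

/*
 * Hypothetical stand-in for dvb_usb_generic_rw(): reads the command
 * from wbuf[0] and writes a reply of rlen bytes into rbuf.  wbuf and
 * rbuf may be the same buffer, as in the patch.  Returns 0 on success,
 * -1 on an unknown command or bad length.
 */
static int fake_generic_rw(const unsigned char *wbuf, unsigned char *rbuf,
			   int rlen)
{
	unsigned char cmd = wbuf[0];	/* latch the command before overwriting */

	if (rlen < 1)
		return -1;
	if (cmd != FAKE_GET_FIRMWARE_VERSION && cmd != FAKE_GET_RC_EVENTS)
		return -1;
	memset(rbuf, 0, rlen);
	rbuf[0] = cmd;			/* echo the command in the reply */
	return 0;
}

/* The caller has to store the command in data[0] before every call. */
static int query(unsigned char *data, unsigned char cmd, int rlen)
{
	data[0] = cmd;
	return fake_generic_rw(data, data, rlen);
}
```

Without the assignment to data[0] the device gets whatever the previous
reply left in the buffer, which is the problem with both hunks above.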


HTH,
Johannes


Re: Cherryview wake up events

2016-10-05 Thread Johannes Stezenbach
On Fri, Sep 23, 2016 at 11:19:04AM +0300, Mika Westerberg wrote:
> David (CC'd) is working on getting the Dollar Cove PMIC driver
> upstreamed to the mainline kernel.

May I ask when to expect a patch?  I'm ready if you
have something to test, even if it's not in
shape for mainline yet.

Thanks,
Johannes


Re: Cherryview wake up events

2016-09-23 Thread Johannes Stezenbach
On Fri, Sep 23, 2016 at 11:19:04AM +0300, Mika Westerberg wrote:
> On Wed, Sep 21, 2016 at 11:16:35AM +0200, Johannes Stezenbach wrote:
> > There is an ADBG ("TI_DCOVE") in PMI2._STA, so Dollar Cove
> > sounds like a good guess.
> 
> David (CC'd) is working on getting the Dollar Cove PMIC driver
> upstreamed to the mainline kernel.

Excellent news!  Repeating essential info to avoid any
confusion, the PMIC is

Device (PMI2)
{
    Name (_ADR, Zero)  // _ADR: Address
    Name (_HID, "INT33F5" /* TI PMIC Controller */)  // _HID: Hardware ID
    Name (_CID, "INT33F5" /* TI PMIC Controller */)  // _CID: Compatible ID
    Name (_DDN, "TI PMIC Controller")  // _DDN: DOS Device Name

(I'm repeating this because the INT33F4 XPOWER PMIC also has
ADBG ("XPWR_DCOVE"), so I'm not sure "Dollar Cove" is a unique name.)


I put the Asus E200HA DSDT at
https://linuxtv.org/~js/e200ha/


Thanks,
Johannes


Re: Cherryview wake up events

2016-09-21 Thread Johannes Stezenbach
On Wed, Sep 21, 2016 at 12:06:14PM +0300, Mika Westerberg wrote:
> On Tue, Sep 20, 2016 at 11:11:53PM +0200, Johannes Stezenbach wrote:
> > Or it is because the PNP0C40 device depends on GpioInt from PMIC
> > which isn't available...
> > 
> > Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource 
> > Settings
> > {
> > Name (CBUF, ResourceTemplate ()
> > {
> > GpioInt (Edge, ActiveBoth, ExclusiveAndWake, PullUp, 
> > 0x0BB8,
> > "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, ,
> > )
> > {   // Pin list
> > 0x0016
> > }
> > })
> > Return (CBUF) /* \_SB_.TBAD._CRS.CBUF */
> > }
> 
> Most likely this is the reason. I'll try to find if we have an existing
> driver for this PMIC somewhere. I guess this is the Dollar Cove which is
> successor of Crystal Cove IIRC which is already supported by the
> mainline kernel.

There is an ADBG ("TI_DCOVE") in PMI2._STA, so Dollar Cove
sounds like a good guess.


Thanks,
Johannes


Re: Cherryview wake up events

2016-09-20 Thread Johannes Stezenbach
On Tue, Sep 20, 2016 at 05:59:43PM +0200, Johannes Stezenbach wrote:
> On Tue, Sep 20, 2016 at 01:40:14PM +0300, Mika Westerberg wrote:
> > If yes, it probably does not have the normal Fixed power button but
> > instead it has something called "Windows button array device" with
> > _HID/_CID of PNP0C40. Looking at your dsdt.dsl, this looks to be the
> > case.
> > 
> > That device is driven by soc_button_array.c driver which can be enabled
> > with CONFIG_KEYBOARD_GPIO=y and CONFIG_INPUT_SOC_BUTTON_ARRAY=y. Can you
> > check if you have that enabled already?
> > 
> > You should actually see it in /proc/interrupts with names like "power"
> > and so on.
> 
> I added CONFIG_INPUT_SOC_BUTTON_ARRAY=y, but no joy.
> Maybe because the _HID is INTCFD9, only _CID is PNP0C40?
> It also has a _DSM with UUID dfbcf3c5-e7a5-44e6-9c1f-29c76f6e059c.

Or it is because the PNP0C40 device depends on GpioInt from PMIC
which isn't available...

Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
{
    Name (CBUF, ResourceTemplate ()
    {
        GpioInt (Edge, ActiveBoth, ExclusiveAndWake, PullUp, 0x0BB8,
            "\\_SB.PCI0.I2C7.PMI2", 0x00, ResourceConsumer, ,
            )
            {   // Pin list
                0x0016
            }
    })
    Return (CBUF) /* \_SB_.TBAD._CRS.CBUF */
}

Thanks,
Johannes


Re: Cherryview wake up events

2016-09-20 Thread Johannes Stezenbach
On Tue, Sep 20, 2016 at 01:40:14PM +0300, Mika Westerberg wrote:
> Can you check if you have:
> 
>   Hardware Reduced (V5) : 1
> 
> in that FADT table?

Nope, it is "Hardware Reduced (V5) : 0".  Now the FADT is also at
https://linuxtv.org/~js/e200ha/

> If yes, it probably does not have the normal Fixed power button but
> instead it has something called "Windows button array device" with
> _HID/_CID of PNP0C40. Looking at your dsdt.dsl, this looks to be the
> case.
> 
> That device is driven by soc_button_array.c driver which can be enabled
> with CONFIG_KEYBOARD_GPIO=y and CONFIG_INPUT_SOC_BUTTON_ARRAY=y. Can you
> check if you have that enabled already?
> 
> You should actually see it in /proc/interrupts with names like "power"
> and so on.

I added CONFIG_INPUT_SOC_BUTTON_ARRAY=y, but no joy.
Maybe because the _HID is INTCFD9, only _CID is PNP0C40?
It also has a _DSM with UUID dfbcf3c5-e7a5-44e6-9c1f-29c76f6e059c.

BTW, lsinput already lists two "Power Button" devices,
   phys: "PNP0C0C/button/input0"
   phys: "LNXPWRBN/button/input0"

None of them generates events in input-events.


Thanks,
Johannes


Re: Cherryview wake up events

2016-09-20 Thread Johannes Stezenbach
On Tue, Sep 20, 2016 at 12:18:40PM +0300, Mika Westerberg wrote:
> On Mon, Sep 19, 2016 at 10:36:22PM +0200, Johannes Stezenbach wrote:
> > Now my question is, is this pin 0x004E the same as this
> > in /proc/interrupts which fires on LID event?
> > 
> >  158:  2  0  0  0  chv-gpio   43 ACPI:Event
> 
> Yes, it is that one and it triggers \_SB.GPO0._E4E() method to be called
> whenever low edge is detected on the GPIO line. This method then handles
> many things depending on what the AML code reads from ^^PCI0.I2C1.ENID
> notifying the power button device (PWRB) among other things.

Thanks for the confirmation, but it circles back to the question
of how to map the numbers.  Since the document that describes it
is not public, it would be useful if you could add comments
to pinctrl-cherryview.c that describe it.
Or did I just miss something?

> I suppose you already have CONFIG_ACPI_I2C_OPREGION=y in your .config?
> That allows the AML code to access the I2C bus using the I2C driver.
> 
> > The FADT has
> > Control Method Power Button (V1) : 0
> > Control Method Sleep Button (V1) : 1
> > 
> > PWRBTN_EN in PM1 is set.  But PWRBTN press causes thermal irq.
> 
> Yeah, it uses control method power button (PNP0C0C) and ACPI GPIO event
> to trigger changes in that.

I'm confused again because I thought "Control Method Power Button (V1) : 0"
means it is a fixed power button, however the DSDT also has

Device (PWRB)
{
    Name (_HID, EisaId ("PNP0C0C") /* Power Button Device */)  // _HID: Hardware ID
}

Device (SLPB)
{
    Name (_HID, EisaId ("PNP0C0E") /* Sleep Button Device */)  // _HID: Hardware ID
}


> > No SCI (irq 9) is ever generated, except by writing to the
> > BIOS_RLS bit in SMI_EN register (IO port 0x430).
> > 
> > GPE block addresses in FADT are 0.  GPE0a_EN register (IO 0x428)
> > is set to 0x6000 (TCO_EN + PME_B0_EN, but none of the GPIO enables).
> > 
> > Any advice how to continue?
> 
> Please check that you have that CONFIG_ACPI_I2C_OPREGION=y and
> CONFIG_MFD_AXP20X=y.
> 
> You should see the ACPI:Event interrupt count increasing in
> /proc/interrups when you press power button. When that works then we can
> start thinking about adding wake support :)

I had CONFIG_ACPI_I2C_OPREGION=y but excluded CONFIG_MFD_AXP20X
based on \_SB.PCI0.I2C7.PMI1._STA returning 0 in acpidbg,

Device (PMI1)
{
    Name (_ADR, Zero)  // _ADR: Address
    Name (_HID, "INT33F4" /* XPOWER PMIC Controller */)  // _HID: Hardware ID
    Name (_CID, "INT33F4" /* XPOWER PMIC Controller */)  // _CID: Compatible ID
    Name (_DDN, "XPOWER PMIC Controller")  // _DDN: DOS Device Name

but \_SB.PCI0.I2C7.PMI2._STA returns 0xf

Device (PMI2)
{
    Name (_ADR, Zero)  // _ADR: Address
    Name (_HID, "INT33F5" /* TI PMIC Controller */)  // _HID: Hardware ID
    Name (_CID, "INT33F5" /* TI PMIC Controller */)  // _CID: Compatible ID
    Name (_DDN, "TI PMIC Controller")  // _DDN: DOS Device Name

So I tried CONFIG_MFD_AXP20X=y anyway, but as expected: no change.

Since TI doesn't even have a product page for the SND9039
(only a few references in TI support forum can be found),
I'm not sure what can be done.  So maybe a better short term
goal would be to get wakeup by LID working.

However, I still wonder why the power button can trigger
a thermal irq, is it related to the PMIC?  I couldn't
find out where the thermal irq is routed.


Thanks,
Johannes


Re: Cherryview wake up events

2016-09-19 Thread Johannes Stezenbach
On Mon, Sep 19, 2016 at 02:56:19PM +0300, Mika Westerberg wrote:
> On Mon, Sep 19, 2016 at 01:21:17PM +0200, Johannes Stezenbach wrote:
> > 
> > The LID causes a gpio irq:
> >  158:  2  0  0  0  chv-gpio   43 ACPI:Event
> > 
> > However, neither LID nor power button can wake up the
> > device from "echo freeze >/sys/power/state".  :-(
> 
> The cherryview pinctrl driver does not (yet) support wake up events. It
> currently just sets IRQCHIP_SKIP_SET_WAKE for the irqchip.

OK, but shouldn't the wakeup usually be handled by ACPI?
Clearly I don't understand this.  I mean on the non-ACPI
embedded ARM systems I'm used to I need to enable specific
irqs as wakeup sources, but on ACPI, isn't SCI the implicit
wakeup irq?  Probably I'm just totally confused, so let
me ask another way, below.

> I can make you a test patch which adds support for wakes for the pinctrl
> driver if you like to test it out. However, that will happen most likely
> near end of the week as I have other things right now.

That would be great!


I found in the DSDT:

Scope (_SB.GPO0)
{
Name (EVBF, Buffer (0x03) {})
CreateByteField (EVBF, Zero, EVST)
CreateByteField (EVBF, One, ELEN)
CreateByteField (EVBF, 0x02, ENVT)
Name (LIDZ, One)
Method (_E4E, 0, Serialized)  // _Exx: Edge-Triggered GPE
{
Name (_T_0, Zero)  // _T_x: Emitted by ASL Compiler
If (^^PCI0.I2C1.AVBL != One)
{
Return (Zero)
}

EVBF = ^^PCI0.I2C1.ENID /* \_SB_.PCI0.I2C1.ENID */
...
_T_0 = ENVT /* \_SB_.GPO0.ENVT */
...
ElseIf (_T_0 == 0xA9)
{
Notify (PWRB, 0x80) // Status Change
Break
}

and
Device (GPO0)
{
...
Method (_AEI, 0, NotSerialized)  // _AEI: ACPI Event Interrupts
{
Name (WBUF, ResourceTemplate ()
{
GpioInt (Edge, ActiveLow, ExclusiveAndWake, PullUp, 0x,
"\\_SB.GPO0", 0x00, ResourceConsumer, ,
)
{   // Pin list
0x004E
}
})
If (OSID == One)
{
Return (WBUF) /* \_SB_.GPO0._AEI.WBUF */
}
}

and OSID is a field in 
OperationRegion (GNVS, SystemMemory, 0x7A158000, 0x0362)
which is inside what /proc/iomem lists as "ACPI Non-volatile Storage".
OSID is read a lot in the DSDT but never written to.
But calling \_SB.GPO0._AEI in acpidbg returns a buffer of size 25.

Now my question is, is this pin 0x004E the same as this
in /proc/interrupts which fires on LID event?

 158:  2  0  0  0  chv-gpio   43 ACPI:Event
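
To watch which counter moves on a LID or power button event, a line like
the one above can be split into its fields.  This is only a sketch: the
helper name parse_irq_line is invented here, and it assumes the per-CPU
counters are the leading run of numeric columns after the label.  It does
not answer whether the number after the chip name corresponds to the
GpioInt pin.

```python
def parse_irq_line(line):
    """Split one /proc/interrupts line into label, counters and the rest."""
    label, rest = line.split(":", 1)
    tokens = rest.split()
    # Assumption: per-CPU counters are the leading purely numeric tokens.
    n = 0
    while n < len(tokens) and tokens[n].isdigit():
        n += 1
    return {
        "irq": label.strip(),
        "counts": [int(t) for t in tokens[:n]],
        "description": tokens[n:],  # chip name, number within chip, action
    }


sample = " 158:  2  0  0  0  chv-gpio   43 ACPI:Event"
info = parse_irq_line(sample)
print(info["irq"], sum(info["counts"]), info["description"])
```

Comparing the summed counts before and after an event shows which line
reacted, without relying on any interpretation of the trailing columns.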



The FADT has
Control Method Power Button (V1) : 0
Control Method Sleep Button (V1) : 1

PWRBTN_EN in PM1 is set.  But PWRBTN press causes thermal irq.

No SCI (irq 9) is ever generated, except by writing to the
BIOS_RLS bit in SMI_EN register (IO port 0x430).

GPE block addresses in FADT are 0.  GPE0a_EN register (IO 0x428)
is set to 0x6000 (TCO_EN + PME_B0_EN, but none of the GPIO enables).
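
As a quick cross-check of the register value, the set bit positions can
be listed like this.  The sketch only does the arithmetic; which position
is TCO_EN and which is PME_B0_EN has to be taken from the datasheet and
is not asserted here.  set_bits is a hypothetical helper name.

```python
def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(set_bits(0x6000))  # prints [13, 14]
```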

Any advice how to continue?


Thanks,
Johannes


Cherryview wake up events

2016-09-19 Thread Johannes Stezenbach
Hi,

Mika, I've been reading the thread about pinctrl-cherryview
interrupts, but I have some basic questions in understanding
the hardware and the relationship between ACPI and Linux drivers,
so I decided to start a new thread.
https://lkml.kernel.org/g/20160909085832.gk15...@lahna.fi.intel.com

I have one Asus E200HA (Atom x5-Z8300) where the power button
doesn't generate any ACPI events (no SCI), instead it causes
a Thermal Event irq:

 TRM:  3  3  3  4   Thermal event interrupts

[   51.825488] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
[   51.826933] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[   51.826965] mce: [Hardware Error]: Machine check events logged
[   51.841180] mce: [Hardware Error]: Machine check events logged

(These events are logged only sometimes, usually a power button
press only increments the TRM count.)

I would like to understand how this is possible, when I boot
with apic=debug I can't see anything claiming vector 0xfa.

The LID causes a gpio irq:
 158:  2  0  0  0  chv-gpio   43 ACPI:Event

However, neither LID nor power button can wake up the
device from "echo freeze >/sys/power/state".  :-(

"grep . /sys/firmware/acpi/interrupts/*" shows only zeros.

I put the DSDT and some other tables at:
https://linuxtv.org/~js/e200ha/

During the last weeks I read what I could about the hardware
and ACPI, and poked at it with acpidbg, devmem, ioport
and in kernel source, but to no avail.
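
The check of the counters under /sys/firmware/acpi/interrupts mentioned
above can be repeated after each experiment with a small script.  This is
a sketch under the assumption that every file there starts with the
event count, possibly followed by a status word; the function name
nonzero_acpi_counters is made up for illustration.

```python
import os


def nonzero_acpi_counters(directory="/sys/firmware/acpi/interrupts"):
    """Map counter file name to its count, for counts greater than zero."""
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            with open(path) as f:
                fields = f.read().split()
        except OSError:
            continue  # unreadable entry, skip it
        # Assumption: the first field of each file is the event count.
        if fields and fields[0].isdigit() and int(fields[0]) > 0:
            result[name] = int(fields[0])
    return result


if __name__ == "__main__":
    if os.path.isdir("/sys/firmware/acpi/interrupts"):
        print(nonzero_acpi_counters())
```

An empty result corresponds to the "only zeros" observation; any entry
that appears after a button press or LID event would be worth a closer look.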

On Thu, Sep 15, 2016 at 06:52:10PM +0300, Mika Westerberg wrote:
> It turns out that for north and southwest communities, they can only
> generate GPIO interrupts for lower 8 interrupts (IntSel value). The upper
> part (8-15) can only generate GPEs (General Purpose Events).

I got the Atom Z8000 series datasheet from
http://www.intel.com/content/www/us/en/processors/atom/atom-technical-resources.html
and tried to find the source for this.  The closest I
could find is the GPIO_ROUT PMC register?
However, the datasheet doesn't tell about the other
interrupts not covered by GPIO_ROUT, if they are fixed
IRQ or SCI or "no effect".

I also don't get the mapping from intsel irq to IO-APIC pin
number.  And also not the mapping between the pin numbers used
on DSDT GpioInt to the pin numbers in pinctrl-cherryview.c.
Could you shed some light on this?  Or point out where I can
find information?

It seems to imply BIOS sets up IntSel.  I'm generally confused
about the responsibility of BIOS vs. drivers making use of the
information from DSDT, e.g. Device (GPO1) has a list of
GpioIo Connections, other devices like PMI2 use GpioInt
from GPO1.  My E200HA has the INT33F5 TI PMIC
Controller, which according to Windows driver strings
seems to be the SND9039.
Does it mean I need a PMIC driver that reads the _CRS and
configures the GPIO?

BTW, the datasheet talks about 4 seconds for power button
override, but it takes 10 seconds.  Maybe it means the
power button is connected to the TI PMIC, not to the
Cherryview SoC?

Another question is about the virtual GPIO device that exists
in hardware and is used by DSDT.  How does that work and
why does pinctrl-cherryview.c exclude it?

Sorry for so many questions, any info is appreciated,
and any suggestion what to try to get the thing to
wake up from freeze.

I was totally unfamiliar with ACPI until now, but I think
the DSDT has some nasty surprises in several _REG methods
that use OEM-defined operation region IDs to set some availability
flags that are tested in other methods.  So it means if the
Windows drivers aren't loaded, those methods won't do anything,
right? Does anyone have suggestions or even examples how to deal with this?


Thanks,
Johannes


Re: [PATCH 4.7 000/143] 4.7.3-stable review

2016-09-08 Thread Johannes Stezenbach
On Thu, Sep 08, 2016 at 08:52:32AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Sep 07, 2016 at 04:59:37PM -0400, Levin, Alexander wrote:
> > Hey Greg,
> > 
> > For reference, I've generated a list of <=4.8-rc4 commits that look to me 
> > like stable material but are not in 4.7.3:
> > 
> > 422eac3f7deae34dbaffd08e03e27f37a5394a56 (v4.8-rc1) tpm_crb: fix mapping of the buffers
> > a36aa80f3cb2540fb1dbad6240852de4365a2e82 (v4.8-rc1) intel_th: Fix a deadlock in modprobing
> > 7a1a47ce35821b40f5b2ce46379ba14393bc3873 (v4.8-rc1) intel_th: pci: Add Kaby Lake PCH-H support
> > fa95986095e39205ea2fb5b5dafe271bca7eb8d1 (v4.8-rc1) drm/i915: Set legacy properties when using legacy gamma set IOCTL. (v2)
> > 78f4f7c2341f1cf510152ad494108850fec1ae39 (v4.8-rc1) ALSA: hda/realtek - ALC891 headset mode for Dell
> > 9b51fe3efe4c270005e34f55a97e5a84ad68e581 (v4.8-rc1) ALSA: hda - On-board speaker fixup on ACER Veriton
> > 7d9595d848cdff5c7939f68eec39e0c5d36a1d67 (v4.8-rc1) dm rq: fix the starting and stopping of blk-mq queues
> > 3b2c1710fac7fb278b760d1545e637cbb5ea5b5b (v4.8-rc2) drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
> > c518189567eaf42b2ec50a4d982484c8e38799f8 (v4.8-rc3) net: macb: Correct CAPS mask
> > 80788a0fbbdfbb125e3fd45a640cddb582160bc7 (v4.8-rc1) drm/i915/fbc: sanitize i915.enable_fbc during FBC init
> > 0a491b96aa59a7232f6c1a81414aa57fb8de8594 (v4.8-rc3) drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
> > 3e103a65514c2947e53f3171b21255fbde8b60c6 (v4.8-rc4) ASoC: atmel_ssc_dai: Don't unconditionally reset SSC on stream startup
> > 1b856086813be9371929b6cc62045f9fd470f5a0 (v4.8-rc4) block: Fix race triggered by blk_set_queue_dying()
> > ae5b80d2b68eac945b124227dea34462118a6f01 (v4.8-rc4) drm/radeon: only apply the SS fractional workaround to RS[78]80
> > d9dc1702b297ec4a6bb9c0326a70641b322ba886 (v4.8-rc4) bcache: register_bcache(): call blkdev_put() when cache_alloc() fails
> > acc9cf8c66c66b2cbbdb4a375537edee72be64df (v4.8-rc4) bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.
> > 13f479b9df4e2bbf2d16e7e1b02f3f55f70e2455 (v4.8-rc4) drm/radeon: fix radeon_move_blit on 32bit systems
> > d77976c414ed7f521b9c79b2a9dde0147a3cf754 (v4.8-rc4) ARC: export kmap
> > c57653dc94d0db7bf63067433ceaa97bdcd0a312 (v4.8-rc4) ARC: export __udivdi3 for modules
> > 6f00975c619064a18c23fd3aced325ae165a73b9 (v4.8-rc4) drm: Reject page_flip for !DRIVER_MODESET
> > e9e5e3fae8da7e237049e00e0bfc9e32fd808fe8 (v4.8-rc4) bdev: fix NULL pointer dereference
> > 6a33fa2b87513fee44cb8f0cd17b1acd6316bc6b (v4.8-rc4) irqchip/mips-gic: Cleanup chip and handler setup
> > 2564970a381651865364974ea414384b569cb9c0 (v4.8-rc4) irqchip/mips-gic: Implement activate op for device domain
> > c62fb260a86dde3df5b2905432caa0e9f6898434 (v4.8-rc4) IB/hfi1,IB/qib: Fix qp_stats sleep with rcu read lock held
> > a77ec83a57890240c546df00ca5df1cdeedb1cc3 (v4.8-rc4) vhost/scsi: fix reuse of >iov[out] in response
> > c0082e985fdf77b02fc9e0dac3b58504dcf11b7a (v4.8-rc4) ubifs: Fix assertion in layout_in_gaps()
> > 17ce1eb0b64eb27d4f9180daae7495fa022c7b0d (v4.8-rc4) ubifs: Fix xattr generic handler usage
> > 27727df240c7cc84f2ba6047c6f18d5addfd25ef (v4.8-rc4) timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
> > a4f8f6667f099036c88f231dcad4cf233652c824 (v4.8-rc4) timekeeping: Cap array access in timekeeping_debug
> > 2e63ad4bd5dd583871e6602f9d398b9322d358d9 (v4.8-rc4) x86/apic: Do not init irq remapping if ioapic is disabled
> > 9b47f77a680447e0132b2cf7fb82374e014bec1c (v4.8-rc4) nvme: Fix nvme_get/set_features() with a NULL result pointer
> > 4d70dca4eadf2f95abe389116ac02b8439c2d16c (v4.8-rc4) block: make sure a big bio is split into at most 256 bvecs
> > 9a035a40f7f3f6708b79224b86c5777a3334f7ea (v4.8-rc4) xenbus: don't look up transaction IDs for ordinary writes
> > 299f6230bc6d0ccd5f95bb0fb865d80a9c7d5ccc (v4.8-rc4) dm flakey: fix reads to be issued if drop_writes configured
> > b53e7d000d9e6e9fd2c6eb6b82d2783c67fd599e (v4.8-rc4) clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe function
> > add1fa75101263ab4d74240f93000998d4325624 (v4.8-rc4) drm/atomic: Don't potentially reset color_mgmt_changed on successive property updates.
> > 
> 
> Thanks for these, I'll look at them after I get through the other
> "properly tagged" patches in my queue.  I also have a long list of stuff
> like this that I need to look at closer...

And another one:
b47820edd1634dc1208f9212b7ecfb4230610a23 ext4: avoid modifying checksum fields directly during checksum verification

Sorry for the noise if you have it already, but there was
no response to two pings in
https://lkml.kernel.org/r/20160901164016.gb25...@birch.djwong.org


Thanks,
Johannes


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-17 Thread Johannes Stezenbach
On Fri, Aug 05, 2016 at 08:11:36PM +0200, Johannes Stezenbach wrote:
> On Fri, Aug 05, 2016 at 10:02:28AM -0700, Darrick J. Wong wrote:
> > 
> > When you're back on 4.7, can you apply this patch[1] to see if it fixes
> > the problem?  I speculate that the new parallel dir lookup code enables
> > multiple threads to be verifying the same directory block buffer at the
> > same time.
> > 
> > [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ext4/inode.c?id=b47820edd1634dc1208f9212b7ecfb4230610a23
> 
> I added the patch, rebuilt and rebooted.  It will take some time
> before I'll report back since the issue is so hard to reproduce.

FWIW, so far the issue didn't appear again after I applied
the patch to 4.7.0, and I stressed it a bit with repo syncs,
AOSP builds, rsync backups etc.

Johannes


4.7.0: RCU stall in nf_conntrack

2016-08-09 Thread Johannes Stezenbach
Hi,

I just experienced a network hangup with 4.7.0; it happened shortly
after resume from hibernate:

[201988.443552] INFO: rcu_preempt detected stalls on CPUs/tasks:
[201988.443556] Tasks blocked on level-0 rcu_node (CPUs 0-3): P14563
[201988.443557] (detected by 3, t=18002 jiffies, g=7365154, c=7365153, q=15274)
[201988.443560] client_socket_t R  running task0 14563  1 0x
[201988.443563]  8800c427a900 e1b77832 880217603da0 810bf66a
[201988.443565]  810bf5d1 8800c427a900 81e566c0 880217603dd0
[201988.443567]  8119a3cf 8802177d80c0 81e566c0 81f89ae0
[201988.443569] Call Trace:
[201988.443571][] sched_show_task+0xfa/0x160
[201988.443585]  [] ? sched_show_task+0x61/0x160
[201988.443587]  [] rcu_print_detail_task_stall_rnp+0x52/0x76
[201988.443590]  [] rcu_check_callbacks+0x866/0x9e0
[201988.443592]  [] update_process_times+0x39/0x60
[201988.443594]  [] tick_sched_handle.isra.5+0x21/0x60
[201988.443596]  [] tick_sched_timer+0x42/0x70
[201988.443598]  [] __hrtimer_run_queues+0x140/0x3c0
[201988.443599]  [] ? tick_sched_handle.isra.5+0x60/0x60
[201988.443601]  [] hrtimer_interrupt+0xb3/0x1c0
[201988.443603]  [] local_apic_timer_interrupt+0x36/0x60
[201988.443606]  [] smp_apic_timer_interrupt+0x3d/0x50
[201988.443607]  [] apic_timer_interrupt+0x8c/0xa0
[201988.443608][] ? __nf_conntrack_find_get+0x285/0x420
[201988.443611]  [] ? nf_conntrack_in+0x1d1/0x8d0
[201988.443612]  [] nf_conntrack_in+0x1d1/0x8d0
[201988.443615]  [] ipv4_conntrack_local+0x45/0x50
[201988.443616]  [] nf_iterate+0x62/0x80
[201988.443618]  [] nf_hook_slow+0xa0/0x110
[201988.443620]  [] ? nf_hook_slow+0x5/0x110
[201988.443622]  [] __ip_local_out+0xd8/0x120
[201988.443624]  [] ? ip_forward_options+0x1f0/0x1f0
[201988.443625]  [] ip_local_out+0x1c/0x70
[201988.443627]  [] ip_queue_xmit+0x18f/0x450
[201988.443628]  [] ? ip_queue_xmit+0x5/0x450
[201988.443630]  [] tcp_transmit_skb+0x48b/0x8e0
[201988.443632]  [] tcp_connect+0x629/0x830
[201988.443634]  [] ? secure_tcp_sequence_number+0x7f/0xe0
[201988.443636]  [] tcp_v4_connect+0x2b9/0x460
[201988.443638]  [] __inet_stream_connect+0xb2/0x310
[201988.443640]  [] ? preempt_count_sub+0xa1/0x100
[201988.443642]  [] ? lock_sock_nested+0x31/0x90
[201988.443644]  [] ? __local_bh_enable_ip+0x6f/0xd0
[201988.443646]  [] inet_stream_connect+0x38/0x50
[201988.443647]  [] SyS_connect+0x7b/0xf0
[201988.443649]  [] ? sock_alloc_file+0xa5/0x140
[201988.443651]  [] ? trace_hardirqs_on_thunk+0x1a/0x1c
[201988.443652]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
[201988.443654] client_socket_t R  running task0 14563  1 0x
[201988.443656]  8800c427a900 e1b77832 880217603da0 810bf66a
[201988.443658]  810bf5d1 8800c427a900 81e566c0 880217603dd0
[201988.443660]  8119a3cf 8802177d80c0 81e566c0 81f89ae0
[201988.443662] Call Trace:
[201988.443663][] sched_show_task+0xfa/0x160
[201988.443665]  [] ? sched_show_task+0x61/0x160
[201988.443666]  [] rcu_print_detail_task_stall_rnp+0x52/0x76
[201988.443668]  [] rcu_check_callbacks+0x89f/0x9e0
[201988.443669]  [] update_process_times+0x39/0x60
[201988.443671]  [] tick_sched_handle.isra.5+0x21/0x60
[201988.443672]  [] tick_sched_timer+0x42/0x70
[201988.443674]  [] __hrtimer_run_queues+0x140/0x3c0
[201988.443675]  [] ? tick_sched_handle.isra.5+0x60/0x60
[201988.443677]  [] hrtimer_interrupt+0xb3/0x1c0
[201988.443679]  [] local_apic_timer_interrupt+0x36/0x60
[201988.443680]  [] smp_apic_timer_interrupt+0x3d/0x50
[201988.443682]  [] apic_timer_interrupt+0x8c/0xa0
[201988.443682]  [] ? __nf_conntrack_find_get+0x285/0x420
[201988.443685]  [] ? nf_conntrack_in+0x1d1/0x8d0
[201988.443686]  [] nf_conntrack_in+0x1d1/0x8d0
[201988.443688]  [] ipv4_conntrack_local+0x45/0x50
[201988.443689]  [] nf_iterate+0x62/0x80
[201988.443691]  [] nf_hook_slow+0xa0/0x110
[201988.443692]  [] ? nf_hook_slow+0x5/0x110
[201988.443694]  [] __ip_local_out+0xd8/0x120
[201988.443696]  [] ? ip_forward_options+0x1f0/0x1f0
[201988.443697]  [] ip_local_out+0x1c/0x70
[201988.443699]  [] ip_queue_xmit+0x18f/0x450
[201988.443700]  [] ? ip_queue_xmit+0x5/0x450
[201988.443702]  [] tcp_transmit_skb+0x48b/0x8e0
[201988.443703]  [] tcp_connect+0x629/0x830
[201988.443705]  [] ? secure_tcp_sequence_number+0x7f/0xe0
[201988.443706]  [] tcp_v4_connect+0x2b9/0x460
[201988.443708]  [] __inet_stream_connect+0xb2/0x310
[201988.443710]  [] ? preempt_count_sub+0xa1/0x100
[201988.443711]  [] ? lock_sock_nested+0x31/0x90
[201988.443713]  [] ? __local_bh_enable_ip+0x6f/0xd0
[201988.443715]  [] inet_stream_connect+0x38/0x50
[201988.443716]  [] SyS_connect+0x7b/0xf0
[201988.443718]  [] ? sock_alloc_file+0xa5/0x140
[201988.443719]  [] ? trace_hardirqs_on_thunk+0x1a/0x1c
[201988.443720]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
[202168.442569] INFO: rcu_preempt detected stalls on CPUs/tasks:
[202168.442572] Tasks 

Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-05 Thread Johannes Stezenbach
On Fri, Aug 05, 2016 at 10:02:28AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 05, 2016 at 12:35:44PM +0200, Johannes Stezenbach wrote:
> > On Wed, Aug 03, 2016 at 05:50:26PM +0300, Török Edwin wrote:
> > > I have just encountered a similar problem after I've recently upgraded to 
> > > 4.7.0:
> > > [Wed Aug  3 11:08:57 2016] EXT4-fs error (device dm-1): dx_probe:740: 
> > > inode #13295: comm python: Directory index failed checksum
> > > [Wed Aug  3 11:08:57 2016] Aborting journal on device dm-1-8.
> > > [Wed Aug  3 11:08:57 2016] EXT4-fs (dm-1): Remounting filesystem read-only
> > > [Wed Aug  3 11:08:57 2016] EXT4-fs error (device dm-1): 
> > > ext4_journal_check_start:56: Detected aborted journal
> > 
> > It just happened again to me, this time hitting /usr/sbin/
> > on root fs.  Meanwhile I ran memtest86 7.0 for two nights,
> > it didn't find anything.  I'm using hibernate regularly
> > > and I think this only happened after a few hibernate/resume
> > cycles, but no idea if that means anything.
> > Now I'm back at 4.4.16 to see if it reproduces.
> 
> When you're back on 4.7, can you apply this patch[1] to see if it fixes
> the problem?  I speculate that the new parallel dir lookup code enables
> multiple threads to be verifying the same directory block buffer at the
> same time.
> 
> [1] 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ext4/inode.c?id=b47820edd1634dc1208f9212b7ecfb4230610a23

I added the patch, rebuilt and rebooted.  It will take some time
before I'll report back since the issue is so hard to reproduce.

Thanks,
Johannes


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-05 Thread Johannes Stezenbach
On Wed, Aug 03, 2016 at 05:50:26PM +0300, Török Edwin wrote:
> I have just encountered a similar problem after I've recently upgraded to 
> 4.7.0:
> [Wed Aug  3 11:08:57 2016] EXT4-fs error (device dm-1): dx_probe:740: inode 
> #13295: comm python: Directory index failed checksum
> [Wed Aug  3 11:08:57 2016] Aborting journal on device dm-1-8.
> [Wed Aug  3 11:08:57 2016] EXT4-fs (dm-1): Remounting filesystem read-only
> [Wed Aug  3 11:08:57 2016] EXT4-fs error (device dm-1): 
> ext4_journal_check_start:56: Detected aborted journal
> 
> I've rebooted in single-user mode, fsck fixed the filesystem, and rebooted, 
> filesystem is rw again now.
> 
> inode #13295 seems to be this and I can list it now:
> stat /usr/lib64/python3.4/site-packages
>   File: '/usr/lib64/python3.4/site-packages'
>   Size: 12288 Blocks: 24 IO Block: 4096   directory
> Device: fd01h/64769d  Inode: 13295   Links: 180
> Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)
> Access: 2016-05-09 11:29:44.056661988 +0300
> Modify: 2016-08-01 00:34:24.029779875 +0300
> Change: 2016-08-01 00:34:24.029779875 +0300
>  Birth: -
> 
> The filesystem was /, I only noticed it was readonly after several hours when 
> I tried to install something:
> /dev/mapper/vg--ssd-root on / type ext4 
> (rw,noatime,errors=remount-ro,data=ordered)
> 
> $ uname -a
> Linux bolt 4.7.0-gentoo-rr #1 SMP Thu Jul 28 11:28:56 EEST 2016 x86_64 AMD 
> FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux
> 
> FWIW I've been using ext4 for years and this is the first time I see this 
> message.
> Prior to 4.7 I was on 4.6.1 -> 4.6.2 -> 4.6.3 -> 4.6.4.
> 
> The kernel is from gentoo-sources + a patch for enabling AMD LWP (I had that 
> patch since 4.6.3 and it's not related to I/O).
> 
> If I see this message again what should I do to obtain more information to 
> trace down the root cause?

It just happened again to me, this time hitting /usr/sbin/
on root fs.  Meanwhile I ran memtest86 7.0 for two nights,
it didn't find anything.  I'm using hibernate regularly
and I think this only happened after a few hibernate/resume
cycles, but no idea if that means anything.
Now I'm back at 4.4.16 to see if it reproduces.

Johannes


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-07-27 Thread Johannes Stezenbach
On Mon, Jul 18, 2016 at 04:17:23PM +0200, Johannes Stezenbach wrote:
> On Mon, Jul 18, 2016 at 09:38:43AM -0400, Theodore Ts'o wrote:
> > On Mon, Jul 18, 2016 at 12:57:07PM +0200, Johannes Stezenbach wrote:
> > > 
> > > I'm running 4.7.0-rc7 with ext4 on lvm on dm-crypt on SSD
> > > and out of the blue on an idle machine the following error
> > > message appeared:
> > > 
> > > [373851.683131] EXT4-fs (dm-3): error count since last fsck: 1
> > > [373851.683151] EXT4-fs (dm-3): initial error at time 1468438194: 
> > > dx_probe:740: inode 22288562
> > > [373851.683158] EXT4-fs (dm-3): last error at time 1468438194: 
> > > dx_probe:740: inode 22288562
> > > 
> > > inode 22288562 is a directory with ~800 small files in it,
> > > but AFAICT nothing was accessing it, no cron job running etc.
> > > No further error message was logged.  Accessing the directory
> > > and the files in it also gives no further errors.

FWIW, now with 4.7.0 and errors=remount-ro it just happened again
during git update (actually "repo sync -ld" of AOSP/cm
repository).  Again a directory with 321 small files.
ls on ro fs after the error listed the directory without problems.
Fsck fixed wrong inode and wrong free block count.
ls after fsck still listed the directory and "git status"
reported it as clean.

[72173.126740] EXT4-fs error (device dm-3): dx_probe:740: inode #12327817: comm 
git: Directory index failed checksum
[72173.131346] Aborting journal on device dm-3-8.
[72173.135884] EXT4-fs (dm-3): Remounting filesystem read-only

Since I upgraded the RAM from 4G to 8G not long ago I
suspect it could be the root of the issue, although
this RAM was taken from another machine (which I had
upgraded from 4G to 12G and now downgraded to 8G) where it
worked for ~2 years, also with AOSP stuff.  Sigh...


Johannes


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-07-18 Thread Johannes Stezenbach
On Mon, Jul 18, 2016 at 09:38:43AM -0400, Theodore Ts'o wrote:
> On Mon, Jul 18, 2016 at 12:57:07PM +0200, Johannes Stezenbach wrote:
> > 
> > I'm running 4.7.0-rc7 with ext4 on lvm on dm-crypt on SSD
> > and out of the blue on an idle machine the following error
> > message appeared:
> > 
> > [373851.683131] EXT4-fs (dm-3): error count since last fsck: 1
> > [373851.683151] EXT4-fs (dm-3): initial error at time 1468438194: 
> > dx_probe:740: inode 22288562
> > [373851.683158] EXT4-fs (dm-3): last error at time 1468438194: 
> > dx_probe:740: inode 22288562
> > 
> > inode 22288562 is a directory with ~800 small files in it,
> > but AFAICT nothing was accessing it, no cron job running etc.
> > No further error message was logged.  Accessing the directory
> > and the files in it also gives no further errors.
> 
> Yes, these messages get printed once a day in case there was a file
> system corruption detected earlier.  The problem is people
> unfortunately run with their file systems set to errors=continue,
> which I sometimes refer to as the "don't worry, be happy" option.  The
[snip]

I've not willingly done this, but I recently upgraded to a bigger
SSD and so created a new file system, and the mount option for errors=
isn't specified so it uses the default from superblock, and
mkfs.ext4 has defaulted to "Errors behavior: Continue"
according to dumpe2fs -h.  I'm using Debian sid FWIW, just checked
the source of e2fsprogs-1.43.1 and found:

#define EXT2_ERRORS_DEFAULT EXT2_ERRORS_CONTINUE
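
For anyone who wants to check this on their own file system, here is a
minimal Python sketch that pulls the "Errors behavior" field out of
"dumpe2fs -h" output.  The helper name errors_behavior() and the sample
text are made up for illustration; only the "Errors behavior: Continue"
wording comes from the output quoted above.

```python
def errors_behavior(dumpe2fs_header):
    """Return the 'Errors behavior' value from `dumpe2fs -h` text, or None."""
    for line in dumpe2fs_header.splitlines():
        # Header lines look like "Key:   value"; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Errors behavior":
            return value.strip()
    return None


# Hypothetical excerpt: only the "Errors behavior" line is taken from the mail.
SAMPLE = """\
Filesystem state:         clean
Errors behavior:          Continue
"""

print(errors_behavior(SAMPLE))
```

If the result is "Continue" and that is not what you want, the superblock
default can be changed with "tune2fs -e remount-ro <device>", or overridden
per mount with the errors= mount option.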


During reboot after crash I saw the usual "Clearing orphaned inode"
messages scroll by, however they did not make it into systemd journal.
So I suspect if there were any other fsck errors during boot
they were lost, too, thanks to systemd-fsck.

Thanks for your detailed reply.


Johannes


4.7.0-rc7 ext4 error in dx_probe

2016-07-18 Thread Johannes Stezenbach
Hi,

I'm running 4.7.0-rc7 with ext4 on lvm on dm-crypt on SSD
and out of the blue on an idle machine the following error
message appeared:

[373851.683131] EXT4-fs (dm-3): error count since last fsck: 1
[373851.683151] EXT4-fs (dm-3): initial error at time 1468438194: dx_probe:740: 
inode 22288562
[373851.683158] EXT4-fs (dm-3): last error at time 1468438194: dx_probe:740: 
inode 22288562

inode 22288562 is a directory with ~800 small files in it,
but AFAICT nothing was accessing it, no cron job running etc.
No further error message was logged.  Accessing the directory
and the files in it also gives no further errors.

Searching back in the log at date -d @1468438194 I found:

Jul 13 21:29:54 foo kernel: EXT4-fs error (device dm-3): dx_probe:740: inode 
#22288562: comm git: Directory index failed checksum
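
The mapping from the epoch value in the "initial error at time" line to the
syslog timestamp can be sketched in Python as follows.  The UTC+2 offset is
an assumption based on the +0200 in the mail headers of this thread; the
kernel message itself carries no time zone, and syslog_stamp() is just an
illustrative helper.

```python
from datetime import datetime, timedelta, timezone

# Epoch seconds from the "initial error at time ..." line.
ERROR_TIME = 1468438194

# Assumption: local time on the machine was CEST (UTC+2).
LOCAL_TZ = timezone(timedelta(hours=2))

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def syslog_stamp(epoch, tz):
    """Format epoch seconds like a classic syslog prefix, e.g. 'Jul 13 21:29:54'."""
    when = datetime.fromtimestamp(epoch, tz=tz)
    # Fixed month names keep the result independent of the current locale.
    return f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"


print(syslog_stamp(ERROR_TIME, LOCAL_TZ))
```

With that offset the result matches the "Jul 13 21:29:54" prefix of the
log line found above.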


Time to run fsck?  Is it the consequence of a previous crash
(I had many recently)?


Johannes


Re: 4.6.2 frequent crashes under memory + IO pressure

2016-06-26 Thread Johannes Stezenbach
(adding back Cc:, just dropped it to send the logs)

On Mon, Jun 27, 2016 at 01:35:14AM +0900, Tetsuo Handa wrote:
> 
> It seems to me that GFP_NOIO allocation requests are depleting memory reserves
> because they are passing ALLOC_NO_WATERMARKS to get_page_from_freelist().
> But I'm not familiar with block layer / swap I/O operation. So, will you post
> to linux-mm ML for somebody else to help you?

Frankly I don't care that much about 4.6.y when 4.7 is fixed.
Or, maybe the root issue is not fixed but the new oom code
covers it.  Below I see both dm and kcryptd so there is no
surprise when using swap on lvm on dm-crypt triggers it.
Maybe it's not a new issue on 4.6 but just some random variation
that makes it trigger easier with my particular workload.

So, unless you would like to keep going at it I'd
like to put the issue to rest.

> kswapd0(766) 0x2201200
>  0x81167522 : get_page_from_freelist+0x0/0x82b [kernel]
>  0x81168127 : __alloc_pages_nodemask+0x3da/0x978 [kernel]
>  0x8119fb2a : new_slab+0xbc/0x3bb [kernel]
>  0x811a1acd : ___slab_alloc.constprop.22+0x2fb/0x37b [kernel]
>  0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
>  0x810c502d : put_lock_stats.isra.9+0xe/0x20 [kernel] (inexact)
>  0x811a1ba4 : __slab_alloc.isra.17.constprop.21+0x57/0x8b [kernel] 
> (inexact)
>  0x811a1ba4 : __slab_alloc.isra.17.constprop.21+0x57/0x8b [kernel] 
> (inexact)
>  0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
>  0x811a1c78 : kmem_cache_alloc+0xa0/0x1d6 [kernel] (inexact)
>  0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
>  0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
>  0x81162b7a : mempool_alloc+0x72/0x154 [kernel] (inexact)
>  0x810c6438 : __lock_acquire.isra.16+0x55e/0xb4c [kernel] (inexact)
>  0x8133fdc1 : bio_alloc_bioset+0xe8/0x1d7 [kernel] (inexact)
>  0x816342ea : alloc_tio+0x2d/0x47 [kernel] (inexact)
>  0x8163587e : __split_and_process_bio+0x310/0x3a3 [kernel] (inexact)
>  0x81635e15 : dm_make_request+0xb5/0xe2 [kernel] (inexact)
>  0x81347ae7 : generic_make_request+0xcc/0x180 [kernel] (inexact)
>  0x81347c98 : submit_bio+0xfd/0x145 [kernel] (inexact)
> 
> kswapd0(766) 0x2201200
>  0x81167522 : get_page_from_freelist+0x0/0x82b [kernel]
>  0x81168127 : __alloc_pages_nodemask+0x3da/0x978 [kernel]
>  0x8119fb2a : new_slab+0xbc/0x3bb [kernel]
>  0x811a1acd : ___slab_alloc.constprop.22+0x2fb/0x37b [kernel]
>  0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
>  0x81640e29 : kcryptd_queue_crypt+0x63/0x68 [kernel] (inexact)
>  0x811a1ba4 : __slab_alloc.isra.17.constprop.21+0x57/0x8b [kernel] 
> (inexact)
>  0x811a1ba4 : __slab_alloc.isra.17.constprop.21+0x57/0x8b [kernel] 
> (inexact)
>  0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
>  0x811a1c78 : kmem_cache_alloc+0xa0/0x1d6 [kernel] (inexact)
>  0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
>  0x81162a88 : mempool_alloc_slab+0x15/0x17 [kernel] (inexact)
>  0x81162b7a : mempool_alloc+0x72/0x154 [kernel] (inexact)
>  0x8101f5ba : sched_clock+0x9/0xd [kernel] (inexact)
>  0x810ae420 : local_clock+0x20/0x22 [kernel] (inexact)
>  0x8133fdc1 : bio_alloc_bioset+0xe8/0x1d7 [kernel] (inexact)
>  0x811983d0 : end_swap_bio_write+0x0/0x6a [kernel] (inexact)
>  0x8119854b : get_swap_bio+0x25/0x6c [kernel] (inexact)
>  0x811983d0 : end_swap_bio_write+0x0/0x6a [kernel] (inexact)
>  0x811988ef : __swap_writepage+0x1a9/0x225 [kernel] (inexact)
> 
> > 
> > > # ~/systemtap.tmp/bin/stap -e 'global traces_bt[65536];
> > > probe begin { printf("Probe start!\n"); }
> > > function dump_if_new(mask:long) {
> > >   bt = backtrace();
> > >   if (traces_bt[bt]++ == 0) {
> > > printf("%s(%u) 0x%lx\n", execname(), pid(), mask);
> > > print_backtrace();
> > > printf("\n");
> > >   }
> > > }
> > > probe kernel.function("get_page_from_freelist") { if ($alloc_flags & 0x4) 
> > > dump_if_new($gfp_mask); }
> > > probe kernel.function("gfp_pfmemalloc_allowed").return { if ($return != 
> > > 0) dump_if_new($gfp_mask); }
> > > probe end { delete traces_bt; }'
> > ...
> > > # addr2line -i -e /usr/src/linux-4.6.2/vmlinux 0x811b9c82
> > > /usr/src/linux-4.6.2/mm/memory.c:1162
> > > /usr/src/linux-4.6.2/mm/memory.c:1241
> > > /usr/src/linux-4.6.2/mm/memory.c:1262
> > > /usr/src/linux-4.6.2/mm/memory.c:1283
> > 
> > I'm attaching both the stap output and the serial console log,
> > not sure what you're looking for with addr2line.  Let me know.
> 
> I just meant how to find location in source code from addresses.

I meant the log is so large I wouldn't know which
addresses would be interesting to look up.
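
For readers skimming the quoted stap script: dump_if_new() keeps the output
manageable by printing each distinct backtrace only once while still
counting how often it occurs.  A rough Python equivalent of that idea, with
made-up names (seen, dump_if_new) and frames copied from the trace quoted
above, might look like this:

```python
from collections import Counter

# Number of times each distinct backtrace has been observed.
seen = Counter()


def dump_if_new(backtrace, mask):
    """Print a backtrace the first time it shows up; count every occurrence.

    Returns True if the backtrace was printed, False if it was a repeat.
    """
    key = tuple(backtrace)
    seen[key] += 1
    if seen[key] > 1:
        return False
    print(f"gfp_mask=0x{mask:x}")
    for frame in backtrace:
        print(f" {frame}")
    print()
    return True


# Frames taken from the kswapd0 trace in the quoted mail.
trace = ["get_page_from_freelist+0x0/0x82b",
         "__alloc_pages_nodemask+0x3da/0x978",
         "new_slab+0xbc/0x3bb"]

first = dump_if_new(trace, 0x2201200)   # printed
second = dump_if_new(trace, 0x2201200)  # suppressed, only counted
```

Sorting the counter by value afterwards would be one way to decide which
call paths are worth looking up first.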

Thanks,
Johannes


Re: 4.6.2 frequent crashes under memory + IO pressure

2016-06-25 Thread Johannes Stezenbach
On Sun, Jun 26, 2016 at 02:04:40AM +0900, Tetsuo Handa wrote:
> It seems to me that somebody is using ALLOC_NO_WATERMARKS (with possibly
> __GFP_NOWARN), but I don't know how to identify such callers. Maybe print
> backtrace from __alloc_pages_slowpath() when ALLOC_NO_WATERMARKS is used?

Wouldn't this create too much output for slow serial console?
Or is this case supposed to be triggered rarely?

This crash testing is pretty painful but I can try it tomorrow
if there is no better idea.

Johannes


Re: 4.6.2 frequent crashes under memory + IO pressure

2016-06-25 Thread Johannes Stezenbach
On Thu, Jun 23, 2016 at 08:26:35PM +0900, Tetsuo Handa wrote:
> 
> Since you think you saw OOM messages with the older kernels, I assume that
> the OOM killer was invoked on your 4.6.2 kernel. The OOM reaper in Linux 4.6
> and Linux 4.7 will not help if the OOM killed process was between
> down_write(&mm->mmap_sem) and up_write(&mm->mmap_sem).
> 
> I was not able to confirm whether the OOM killed process (I guess it was java)
> was holding mm->mmap_sem for write, for /proc/sys/kernel/hung_task_warnings
> dropped to 0 before traces of java threads are printed or console became
> unusable due to the "delayed: kcryptd_crypt, ..." line. Anyway, I think that
> kmallocwd will report it.
> 
> > > It is sad that we haven't merged kmallocwd which will report
> > > which memory allocations are stalling
> > >  ( 
> > > http://lkml.kernel.org/r/1462630604-23410-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
> > >  ).
> > 
> > Would you like me to try it?  It wouldn't prevent the hang, though,
> > just print better debug output to serial console, right?
> > Or would it OOM kill some process?
> 
> Yes, but for bisection purpose, please try commit 78ebc2f7146156f4 without
> applying kmallocwd. If that commit helps avoiding flood of the allocation
> failure warnings, we can consider backporting it. If that commit does not
> help, I think you are reporting a new location which we should not use
> memory reserves.
> 
> kmallocwd will not OOM kill some process. kmallocwd will not prevent the hang.
> kmallocwd just prints information of threads which are stalling inside memory
> allocation request.

First I tried today's git, linux-4.7-rc4-187-g086e3eb, and
the good news is that the oom killer seems to work very
well and reliably killed the offending task (java).
It happened a few times, the AOSP build broke and I restarted
it until it completed.  E.g.:

[ 2083.604374] Purging GPU memory, 0 pages freed, 4508 pages still pinned.
[ 2083.611000] 96 and 0 pages still available in the bound and unbound GPU page lists.
[ 2083.618815] make invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[ 2083.629257] make cpuset=/ mems_allowed=0
...
[ 2084.688753] Out of memory: Kill process 10431 (java) score 378 or sacrifice child
[ 2084.696593] Killed process 10431 (java) total-vm:5200964kB, anon-rss:2521764kB, file-rss:0kB, shmem-rss:0kB
[ 2084.938058] oom_reaper: reaped process 10431 (java), now anon-rss:0kB, file-rss:8kB, shmem-rss:0kB

Next I tried 4.6.2 with 78ebc2f7146156f4, then with kmallocwd (needed one
manual fixup), then with both patches.  It still livelocked in all cases.
The log spew looked a bit different with 78ebc2f7146156f4 applied but still
continued endlessly.  kmallocwd alone didn't trigger; with both patches
applied kmallocwd triggered, but:

[  363.815595] MemAlloc-Info: stalling=33 dying=0 exiting=42 victim=0 oom_count=0
[  363.815601] MemAlloc: kworker/0:0(4) flags=0x4208860 switches=212 seq=1 gfp=0x26012c0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_NOTRACK) order=0 delay=17984
** 1402 printk messages dropped ** [  363.818816]  [] __do_page_cache_readahead+0x144/0x29d
** 501 printk messages dropped **
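
To get numbers out of a captured log like this one, a small Python sketch
along the following lines might help.  It only assumes the line shapes
visible above ("key=value" pairs with decimal values on the MemAlloc-Info
line, and "** N printk messages dropped **" markers); the function names
are made up for illustration and are not part of kmallocwd.

```python
import re


def parse_fields(line):
    """Return the decimal key=value fields of one log line as a dict.

    Hex values such as flags=0x4208860 are skipped on purpose, because
    the trailing word boundary rejects a partial match of the leading 0.
    """
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)\b", line)}


def dropped_messages(log):
    """Sum up all '** N printk messages dropped **' markers in a log."""
    return sum(int(n) for n in re.findall(r"\*\* (\d+) printk messages dropped \*\*", log))


info = "[  363.815595] MemAlloc-Info: stalling=33 dying=0 exiting=42 victim=0 oom_count=0"
markers = "** 1402 printk messages dropped ** [  363.818816]\n** 501 printk messages dropped **"

print(parse_fields(info))
print(dropped_messages(markers))
```

The dropped-message total gives a feeling for how much of the report never
reached the serial console.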

I'll zip up the logs and send them off-list.


Thanks,
Johannes


Re: 4.6.2 frequent crashes under memory + IO pressure

2016-06-23 Thread Johannes Stezenbach
On Tue, Jun 21, 2016 at 08:47:51PM +0900, Tetsuo Handa wrote:
> Johannes Stezenbach wrote:
> > 
> > a man's got to have a hobby, thus I'm running Android AOSP
> > builds on my home PC which has 4GB of RAM, 4GB swap.
> > Apparently it is not really adequate for the job but used to
> > work with a 4.4.10 kernel.  Now I upgraded to 4.6.2
> > and it crashes usually within 30mins during compilation.
> 
> Such reproducer is welcomed.
> You might be hitting OOM livelock using innocent workload.
> 
> > The crash is a hard hang, mouse doesn't move, no reaction
> > to keyboard, nothing in logs (systemd journal) after reboot.
> 
> Yes, it seems to me that your system is OOM livelocked.

I got from my crash log that X is hanging in
i915_gem_object_get_pages_gtt, and network is dead
due to order 0 allocation errors causing a series of
"ath9k_htc: RX memory allocation error", which is
what makes the issue so unpleasant.

The particular command which triggers it seems to be
Jill from the Android Java toolchain
(http://tools.android.com/tech-docs/jackandjill),
which runs as "java -Xmx3500m -jar $(JILL_JAR)", i.e.
potentially eating all my available RAM when linking
the Android framework.

Meanwhile I found some RAM and linux-4.6.2 runs stable
with 8GB for this workload.  The build time (for the
partial AOSP rebuild that fairly reliably triggered the hangup)
dropped from ~20min to ~17min (so it wasn't thrashing too
badly), swap usage dropped from ~50% (of 4GB) to <5%.

> It is sad that we haven't merged kmallocwd which will report
> which memory allocations are stalling
>  ( 
> http://lkml.kernel.org/r/1462630604-23410-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
>  ).

Would you like me to try it?  It wouldn't prevent the hang, though,
just print better debug output to serial console, right?
Or would it OOM kill some process?

> > Then I tried 4.5.7, it seems to be stable so far.
> > 
> > I'm using dm-crypt + lvm + ext4 (swap also in lvm).
> > 
> > Now I hooked up a laptop to the serial port and captured
> > some logs of the crash which seems to be repeating
> > 
> > [ 2240.842567] swapper/3: page allocation failure: order:0, mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK)
> > or
> > [ 2241.167986] SLUB: Unable to allocate memory on node -1, gfp=0x2080020(GFP_ATOMIC)
> > 
> > over and over.  Based on the backtraces in the log I decided
> > to hot-unplug USB devices, and twice the kernel came
> > back to life, but on the 3rd crash it was dead for good.
> 
> The values
> 
>   DMA free:12kB min:32kB
>   DMA32 free:2268kB min:6724kB
>   Normal free:84kB min:928kB 
> 
> suggest that memory reserves are spent for pointless purpose. Maybe your
> system is falling into situation which was mitigated by commit
> 78ebc2f7146156f4 ("mm,writeback: don't use memory reserves for
> wb_start_writeback"). Thus, applying that commit to your 4.6.2 kernel might
> help avoiding flood of these allocation failure messages.

I could try.  Could you let me know if booting with mem=4G
is equivalent, or do I need to use memmap= or physically remove
the RAM (which is not so easy since the CPU fan is in the way).
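
The point about the zone values quoted above can be checked mechanically.
The following is a minimal Python sketch, assuming only the
"<zone> free:<N>kB min:<N>kB" shape seen in the log; the function name is
hypothetical.

```python
import re

# Matches the start of a per-zone line in the Mem-Info dump.
ZONE_RE = re.compile(r"(\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(text):
    """Return {zone: shortfall_kB} for every zone whose free is below min."""
    shortfall = {}
    for name, free, minimum in ZONE_RE.findall(text):
        free, minimum = int(free), int(minimum)
        if free < minimum:
            shortfall[name] = minimum - free
    return shortfall


sample = """
  DMA free:12kB min:32kB
  DMA32 free:2268kB min:6724kB
  Normal free:84kB min:928kB
"""

print(zones_below_min(sample))
```

All three zones are below their min watermark here, which is only possible
for allocations that are allowed to dip into the reserves.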

> > Before I pressed the reset button I used SysRq-W.  At the bottom
> > is a "BUG: workqueue lockup", it could be the result of
> > the log spew on serial console taking so long but it looks
> > like some IO is never completing.
> 
> But even after you apply that commit, I guess you will still see silent hang
> up because the page allocator would think there is still reclaimable memory.
> So, is it possible to also try current linux.git kernels? I'd like to know
> whether "OOM detection rework" (which went to 4.7) helps giving up reclaiming
> and invoking the OOM killer with your workload.
> 
> Maybe __GFP_FS allocations start invoking the OOM killer. But maybe __GFP_FS
> allocations still remain stuck waiting for !__GFP_FS allocations whereas
> !__GFP_FS allocations gives up without invoking the OOM killer (i.e.
> effectively no "give up").

I could also try.  Same question about mem= though.

What is your opinion about older kernels (4.4, 4.5) working?
I think I've seen some OOM messages with the older kernels,
Jill was killed and I restarted the build to complete it.
A full bisect would take more than a day, I don't think
I have the time for it.
Since I use dm-crypt + lvm, should we add more Cc or do
you think it is an mm issue?


> > Below I'm pasting some log snippets, let me know if you like
> > it so much you want more of it ;-/  The total log is about 1.7MB.
> 
> Yes, I'd like to browse it. Could you send it to me?

Did you get any additional insights from it?


Thanks,
Johannes


4.6.2 frequent crashes under memory + IO pressure

2016-06-16 Thread Johannes Stezenbach
Hi,

a man's got to have a hobby, thus I'm running Android AOSP
builds on my home PC which has 4GB of RAM, 4GB swap.
Apparently it is not really adequate for the job but used to
work with a 4.4.10 kernel.  Now I upgraded to 4.6.2
and it crashes usually within 30mins during compilation.
The crash is a hard hang, mouse doesn't move, no reaction
to keyboard, nothing in logs (systemd journal) after reboot.

Then I tried 4.5.7, it seems to be stable so far.

I'm using dm-crypt + lvm + ext4 (swap also in lvm).

Now I hooked up a laptop to the serial port and captured
some logs of the crash which seems to be repeating

[ 2240.842567] swapper/3: page allocation failure: order:0, mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK)
or
[ 2241.167986] SLUB: Unable to allocate memory on node -1, gfp=0x2080020(GFP_ATOMIC)

over and over.  Based on the backtraces in the log I decided
to hot-unplug USB devices, and twice the kernel came
back to life, but on the 3rd crash it was dead for good.
Before I pressed the reset button I used SysRq-W.  At the bottom
is a "BUG: workqueue lockup", it could be the result of
the log spew on serial console taking so long but it looks
like some IO is never completing.
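
For reading the mode/gfp values in these messages, a small decoder sketch
in Python may be handy.  The bit values below are an assumption: they are
what I recall from include/linux/gfp.h around 4.6, cross-checked only
against the symbolic names the kernel printed in this thread.  They change
between kernel versions, so verify them against the tree you run.

```python
# Assumed bit values (roughly Linux 4.6); not authoritative.
GFP_BITS = {
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x200: "__GFP_NOWARN",
    0x1000: "__GFP_NORETRY",
    0x80000: "__GFP_ATOMIC",
    0x100000: "__GFP_ACCOUNT",
    0x200000: "__GFP_NOTRACK",
    0x400000: "__GFP_DIRECT_RECLAIM",
    0x2000000: "__GFP_KSWAPD_RECLAIM",
}


def decode_gfp(mask):
    """Split a gfp mask into known flag names plus any leftover bits."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]
    known = sum(bit for bit in GFP_BITS if mask & bit)
    return names, mask & ~known


for mask in (0x2200020, 0x2080020):
    names, unknown = decode_gfp(mask)
    print(f"{mask:#x}: {'|'.join(names)} (unknown bits: {unknown:#x})")
```

With these values neither mask contains __GFP_DIRECT_RECLAIM, i.e. both
failing allocations come from contexts that cannot sleep and reclaim.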

Below I'm pasting some log snippets, let me know if you like
it so much you want more of it ;-/  The total log is about 1.7MB.


Thanks,
Johannes


[ 2240.837431] warn_alloc_failed: 13 callbacks suppressed
[ 2240.842567] swapper/3: page allocation failure: order:0, mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK)
[ 2240.852384] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.6.2 #2
[ 2240.858215] Hardware name: System manufacturer System Product Name/P8H77-V, BIOS 1905 10/27/2014
[ 2240.866985]  0086 8d325b5c895ad90b 88011b603a90 81368f0c
[ 2240.874437]    88011b603b30 811659de
[ 2240.881907]  88011b603b40 02200021 88011b603b18 81f58240
[ 2240.889396] Call Trace:
[ 2240.891839][] dump_stack+0x85/0xbe
[ 2240.897611]  [] warn_alloc_failed+0x134/0x15c
[ 2240.903531]  [] __alloc_pages_nodemask+0x7bd/0x978
[ 2240.909884]  [] new_slab+0x129/0x3bb
[ 2240.915030]  [] ___slab_alloc.constprop.22+0x2fb/0x37b
[ 2240.921730]  [] ? __alloc_skb+0x55/0x1b4
[ 2240.927224]  [] ? skb_release_data+0xc0/0xd0
[ 2240.933046]  [] ? kfree+0x1c0/0x216
[ 2240.938089]  [] __slab_alloc.isra.17.constprop.21+0x57/0x8b
[ 2240.945214]  [] ? __slab_alloc.isra.17.constprop.21+0x57/0x8b
[ 2240.952520]  [] ? __alloc_skb+0x55/0x1b4
[ 2240.957997]  [] kmem_cache_alloc+0xa0/0x1d6
[ 2240.963734]  [] ? __alloc_skb+0x55/0x1b4
[ 2240.969210]  [] __alloc_skb+0x55/0x1b4
[ 2240.974524]  [] ath9k_hif_usb_reg_in_cb+0xd4/0x181 [ath9k_htc]
[ 2240.981925]  [] __usb_hcd_giveback_urb+0xa6/0x10b
[ 2240.988215]  [] usb_giveback_urb_bh+0x9a/0xe4
[ 2240.994134]  [] tasklet_hi_action+0x10c/0x11b
[ 2241.63]  [] __do_softirq+0x182/0x377
[ 2241.005548]  [] irq_exit+0x54/0xa8
[ 2241.010521]  [] do_IRQ+0xc7/0xdf
[ 2241.015321]  [] common_interrupt+0x8c/0x8c
[ 2241.020981][] ? cpuidle_enter_state+0x1ae/0x251
[ 2241.027888]  [] cpuidle_enter+0x17/0x19
[ 2241.033280]  [] call_cpuidle+0x44/0x46
[ 2241.038600]  [] cpu_startup_entry+0x2a7/0x378
[ 2241.044524]  [] start_secondary+0x17c/0x192
[ 2241.050265] Mem-Info:
[ 2241.052543] active_anon:654174 inactive_anon:208849 isolated_anon:64
[ 2241.052543]  active_file:4782 inactive_file:3878 isolated_file:0
[ 2241.052543]  unevictable:1156 dirty:8 writeback:28052 unstable:0
[ 2241.052543]  slab_reclaimable:13827 slab_unreclaimable:25768
[ 2241.052543]  mapped:6794 shmem:3939 pagetables:5299 bounce:0
[ 2241.052543]  free:424 free_pcp:39 free_cma:0
[ 2241.086414] DMA free:12kB min:32kB low:44kB high:56kB active_anon:28kB inactive_anon:84kB active_file:68kB inactive_file:40kB unevictable:124kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:124kB dirty:0kB writeback:0kB mapped:228kB shmem:36kB slab_reclaimable:552kB slab_unreclaimable:14656kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2241.128265] lowmem_reserve[]: 0 3156 3592 3592
[ 2241.132792] DMA32 free:2120kB min:6724kB low:9956kB high:13188kB active_anon:2414116kB inactive_anon:629228kB active_file:15184kB inactive_file:13336kB unevictable:3624kB isolated(anon):256kB isolated(file):0kB present:3334492kB managed:3243420kB mlocked:3624kB dirty:24kB writeback:104760kB mapped:21988kB shmem:13936kB slab_reclaimable:46356kB slab_unreclaimable:74196kB kernel_stack:4144kB pagetables:17708kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:92 all_unreclaimable? no
[ 2241.167769] kworker/u8:3: page allocation failure: order:0, mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK)
[ 2241.167771] CPU: 2 PID: 1470 Comm: kworker/u8:3 Not tainted 4.6.2 #2
[ 2241.167772] Hardware name: System 

uas: order 7 page allocation failure in init_tag_map()

2016-04-23 Thread Johannes Stezenbach
Hi,

I bought a new backup disk which turned out to be UAS capable,
but when I plugged it in I got an order 7 page allocation failure.
My hunch is that the .can_queue = 65536 in drivers/usb/storage/uas.c
is much too large.  Maybe 256 would be a practical value that matches
the capabilities of existing hardware?
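
The arithmetic behind that hunch, as a small Python sketch.  It assumes
that init_tag_map() allocates one pointer per queue slot with a single
kmalloc(), and that pointers are 8 bytes and pages 4 KiB (x86_64); the
helper names are made up for illustration.

```python
def alloc_order(num_bytes, page_size=4096):
    """Smallest order such that 2**order pages hold num_bytes."""
    pages = -(-num_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def tag_index_order(can_queue, ptr_size=8):
    """Order of a contiguous array with one pointer per queue slot (assumed layout)."""
    return alloc_order(can_queue * ptr_size)


for depth in (65536, 256):
    size = depth * 8
    print(f"can_queue={depth}: {size} bytes, order {tag_index_order(depth)}")
```

With can_queue = 65536 the array is 512 KiB, i.e. 128 contiguous pages,
which matches the order:7 in the failure below; with 256 it would fit in a
single page.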


[1859683.261465] usb 4-2: new SuperSpeed USB device number 8 using xhci_hcd
[1859683.281986] scsi host18: uas
[1859683.282003] kworker/0:2: page allocation failure: order:7, mode:0x208c020
[1859683.282008] CPU: 0 PID: 6888 Comm: kworker/0:2 Not tainted 4.4.6 #1
[1859683.282011] Hardware name: System manufacturer System Product Name/P8H77-V, BIOS 1905 10/27/2014
[1859683.282017] Workqueue: usb_hub_wq hub_event
[1859683.282021]  0286 d38f5999 8800751674d0 813527de
[1859683.282026]   0208c020 880075167570 81157c56
[1859683.282031]  880075167580 880075167508 81f43840 00f438b8
[1859683.282036] Call Trace:
[1859683.282045]  [] dump_stack+0x85/0xbe
[1859683.282050]  [] warn_alloc_failed+0x12c/0x156
[1859683.282055]  [] __alloc_pages_nodemask+0x73a/0x8f1
[1859683.282060]  [] ? dev_vprintk_emit+0x1cb/0x1f1
[1859683.282065]  [] alloc_kmem_pages+0x22/0x8a
[1859683.282069]  [] kmalloc_order+0x18/0x46
[1859683.282072]  [] kmalloc_order_trace+0x21/0xe9
[1859683.282077]  [] __kmalloc+0x38/0x22f
[1859683.282081]  [] ? __blk_queue_init_tags+0x2f/0x73
[1859683.282085]  [] init_tag_map+0x54/0xa3
[1859683.282088]  [] __blk_queue_init_tags+0x45/0x73
[1859683.282092]  [] blk_init_tags+0x14/0x16
[1859683.282096]  [] scsi_add_host_with_dma+0xc8/0x2a0
[1859683.282102]  [] uas_probe+0x3aa/0x420 [uas]
[1859683.282107]  [] usb_probe_interface+0x1a6/0x22d
[1859683.282112]  [] driver_probe_device+0x173/0x3a6
[1859683.282116]  [] __device_attach_driver+0x71/0x78
[1859683.282120]  [] ? driver_allows_async_probing+0x31/0x31
[1859683.282124]  [] bus_for_each_drv+0x8a/0xad
[1859683.282128]  [] __device_attach+0xba/0x14f
[1859683.282132]  [] device_initial_probe+0x13/0x15
[1859683.282136]  [] bus_probe_device+0x33/0x9e
[1859683.282140]  [] device_add+0x2e4/0x56e
[1859683.282144]  [] usb_set_configuration+0x689/0x6d9
[1859683.282148]  [] ? debug_smp_processor_id+0x17/0x19
[1859683.282152]  [] generic_probe+0x43/0x73
[1859683.282156]  [] usb_probe_device+0x53/0x66
[1859683.282159]  [] driver_probe_device+0x173/0x3a6
[1859683.282163]  [] __device_attach_driver+0x71/0x78
[1859683.282167]  [] ? driver_allows_async_probing+0x31/0x31
[1859683.282171]  [] bus_for_each_drv+0x8a/0xad
[1859683.282175]  [] __device_attach+0xba/0x14f
[1859683.282179]  [] device_initial_probe+0x13/0x15
[1859683.282183]  [] bus_probe_device+0x33/0x9e
[1859683.282186]  [] device_add+0x2e4/0x56e
[1859683.282191]  [] usb_new_device+0x241/0x38a
[1859683.282194]  [] hub_event+0xcb9/0x10f2
[1859683.282201]  [] process_one_work+0x27f/0x4d7
[1859683.282206]  [] ? put_lock_stats.isra.9+0xe/0x20
[1859683.282209]  [] worker_thread+0x273/0x35b
[1859683.282214]  [] ? rescuer_thread+0x2a7/0x2a7
[1859683.282217]  [] kthread+0xff/0x107
[1859683.28]  [] ? kthread_create_on_node+0x1ea/0x1ea
[1859683.282228]  [] ret_from_fork+0x3f/0x70
[1859683.282231]  [] ? kthread_create_on_node+0x1ea/0x1ea
[1859683.282234] Mem-Info:
[1859683.282241] active_anon:21278 inactive_anon:69854 isolated_anon:0
  active_file:212300 inactive_file:194346 isolated_file:0
  unevictable:2018 dirty:87 writeback:0 unstable:0
  slab_reclaimable:127644 slab_unreclaimable:12137
  mapped:11526 shmem:13394 pagetables:5007 bounce:0
  free:270678 free_pcp:1027 free_cma:0
[1859683.282252] DMA free:14412kB min:32kB low:40kB high:48kB active_anon:180kB inactive_anon:468kB active_file:268kB inactive_file:92kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:4kB writeback:0kB mapped:172kB shmem:328kB slab_reclaimable:208kB slab_unreclaimable:92kB kernel_stack:0kB pagetables:56kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1859683.282255] lowmem_reserve[]: 0 3162 3597 3597
[1859683.282267] DMA32 free:904468kB min:6728kB low:8408kB high:10092kB active_anon:66188kB inactive_anon:237164kB active_file:803244kB inactive_file:704168kB unevictable:7024kB isolated(anon):0kB isolated(file):0kB present:3334492kB managed:3243208kB mlocked:7024kB dirty:280kB writeback:0kB mapped:37116kB shmem:40212kB slab_reclaimable:435236kB slab_unreclaimable:37848kB kernel_stack:3968kB pagetables:16696kB unstable:0kB bounce:0kB free_pcp:2008kB local_pcp:632kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1859683.282270] lowmem_reserve[]: 0 0 435 435
[1859683.282281] Normal free:163832kB min:924kB low:1152kB high:1384kB active_anon:18744kB inactive_anon:41784kB active_file:45688kB inactive_file:73124kB 

uas: order 7 page allocation failure in init_tag_map()

2016-04-23 Thread Johannes Stezenbach
Hi,

I bought a new backup disk which turned out to be UAS capable,
but when I plugged it in I got an order 7 page allocation failure.
My hunch is that the .can_queue = 65536 in drivers/usb/storage/uas.c
is much too large.  Maybe 256 would be a pratical value that matches
the capabilities of existing hardware?


[1859683.261465] usb 4-2: new SuperSpeed USB device number 8 using xhci_hcd
[1859683.281986] scsi host18: uas
[1859683.282003] kworker/0:2: page allocation failure: order:7, mode:0x208c020
[1859683.282008] CPU: 0 PID: 6888 Comm: kworker/0:2 Not tainted 4.4.6 #1
[1859683.282011] Hardware name: System manufacturer System Product 
Name/P8H77-V, BIOS 1905 10/27/2014
[1859683.282017] Workqueue: usb_hub_wq hub_event
[1859683.282021]  0286 d38f5999 8800751674d0 
813527de
[1859683.282026]   0208c020 880075167570 
81157c56
[1859683.282031]  880075167580 880075167508 81f43840 
00f438b8
[1859683.282036] Call Trace:
[1859683.282045]  [] dump_stack+0x85/0xbe
[1859683.282050]  [] warn_alloc_failed+0x12c/0x156
[1859683.282055]  [] __alloc_pages_nodemask+0x73a/0x8f1
[1859683.282060]  [] ? dev_vprintk_emit+0x1cb/0x1f1
[1859683.282065]  [] alloc_kmem_pages+0x22/0x8a
[1859683.282069]  [] kmalloc_order+0x18/0x46
[1859683.282072]  [] kmalloc_order_trace+0x21/0xe9
[1859683.282077]  [] __kmalloc+0x38/0x22f
[1859683.282081]  [] ? __blk_queue_init_tags+0x2f/0x73
[1859683.282085]  [] init_tag_map+0x54/0xa3
[1859683.282088]  [] __blk_queue_init_tags+0x45/0x73
[1859683.282092]  [] blk_init_tags+0x14/0x16
[1859683.282096]  [] scsi_add_host_with_dma+0xc8/0x2a0
[1859683.282102]  [] uas_probe+0x3aa/0x420 [uas]
[1859683.282107]  [] usb_probe_interface+0x1a6/0x22d
[1859683.282112]  [] driver_probe_device+0x173/0x3a6
[1859683.282116]  [] __device_attach_driver+0x71/0x78
[1859683.282120]  [] ? driver_allows_async_probing+0x31/0x31
[1859683.282124]  [] bus_for_each_drv+0x8a/0xad
[1859683.282128]  [] __device_attach+0xba/0x14f
[1859683.282132]  [] device_initial_probe+0x13/0x15
[1859683.282136]  [] bus_probe_device+0x33/0x9e
[1859683.282140]  [] device_add+0x2e4/0x56e
[1859683.282144]  [] usb_set_configuration+0x689/0x6d9
[1859683.282148]  [] ? debug_smp_processor_id+0x17/0x19
[1859683.282152]  [] generic_probe+0x43/0x73
[1859683.282156]  [] usb_probe_device+0x53/0x66
[1859683.282159]  [] driver_probe_device+0x173/0x3a6
[1859683.282163]  [] __device_attach_driver+0x71/0x78
[1859683.282167]  [] ? driver_allows_async_probing+0x31/0x31
[1859683.282171]  [] bus_for_each_drv+0x8a/0xad
[1859683.282175]  [] __device_attach+0xba/0x14f
[1859683.282179]  [] device_initial_probe+0x13/0x15
[1859683.282183]  [] bus_probe_device+0x33/0x9e
[1859683.282186]  [] device_add+0x2e4/0x56e
[1859683.282191]  [] usb_new_device+0x241/0x38a
[1859683.282194]  [] hub_event+0xcb9/0x10f2
[1859683.282201]  [] process_one_work+0x27f/0x4d7
[1859683.282206]  [] ? put_lock_stats.isra.9+0xe/0x20
[1859683.282209]  [] worker_thread+0x273/0x35b
[1859683.282214]  [] ? rescuer_thread+0x2a7/0x2a7
[1859683.282217]  [] kthread+0xff/0x107
[1859683.28]  [] ? kthread_create_on_node+0x1ea/0x1ea
[1859683.282228]  [] ret_from_fork+0x3f/0x70
[1859683.282231]  [] ? kthread_create_on_node+0x1ea/0x1ea
[1859683.282234] Mem-Info:
[1859683.282241] active_anon:21278 inactive_anon:69854 isolated_anon:0
  active_file:212300 inactive_file:194346 isolated_file:0
  unevictable:2018 dirty:87 writeback:0 unstable:0
  slab_reclaimable:127644 slab_unreclaimable:12137
  mapped:11526 shmem:13394 pagetables:5007 bounce:0
  free:270678 free_pcp:1027 free_cma:0
[1859683.282252] DMA free:14412kB min:32kB low:40kB high:48kB active_anon:180kB inactive_anon:468kB active_file:268kB inactive_file:92kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:4kB writeback:0kB mapped:172kB shmem:328kB slab_reclaimable:208kB slab_unreclaimable:92kB kernel_stack:0kB pagetables:56kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1859683.282255] lowmem_reserve[]: 0 3162 3597 3597
[1859683.282267] DMA32 free:904468kB min:6728kB low:8408kB high:10092kB active_anon:66188kB inactive_anon:237164kB active_file:803244kB inactive_file:704168kB unevictable:7024kB isolated(anon):0kB isolated(file):0kB present:3334492kB managed:3243208kB mlocked:7024kB dirty:280kB writeback:0kB mapped:37116kB shmem:40212kB slab_reclaimable:435236kB slab_unreclaimable:37848kB kernel_stack:3968kB pagetables:16696kB unstable:0kB bounce:0kB free_pcp:2008kB local_pcp:632kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1859683.282270] lowmem_reserve[]: 0 0 435 435
[1859683.282281] Normal free:163832kB min:924kB low:1152kB high:1384kB active_anon:18744kB inactive_anon:41784kB active_file:45688kB inactive_file:73124kB

Re: Kernel docs: muddying the waters a bit

2016-03-07 Thread Johannes Stezenbach
On Mon, Mar 07, 2016 at 12:29:08AM +0100, Johannes Stezenbach wrote:
> On Sat, Mar 05, 2016 at 11:29:37PM -0300, Mauro Carvalho Chehab wrote:
> > 
> > I converted one of the big tables to CSV. At least now it recognized
> > it as a table. Yet, the table was very badly formatted:
> > 
> > https://mchehab.fedorapeople.org/media-kabi-docs-test/rst_tests/packed-rgb.html
> > 
> > This is how this table should look:
> > https://linuxtv.org/downloads/v4l-dvb-apis/packed-rgb.html
> > 
> > Also, this table has merged cells in the legend, and I have no idea how
> > to tell Sphinx to do that in CSV format.
> > 
> > The RST files are on this git tree:
> > https://git.linuxtv.org/mchehab/v4l2-docs-poc.git/
> 
> Yeah, seems it can't do merged cells in csv.  Attached patch converts it
> back to grid table format and fixes the table definition.
> The html output looks usable, but clearly it is no fun to
> work with tables in Sphinx.
> 
> Sphinx' latex writer can't handle nested tables, though.
> Python's docutils rst2latex can, but that doesn't help here.
> rst2pdf also supports it.  But I have doubts such a large
> table would render OK in pdf without using landscape orientation.
> I have not tried because I used python3-sphinx but rst2pdf
> is only available for Python2 in Debian so it does not integrate
> with Sphinx.

Just a quick idea:
Perhaps one alternative would be to use Graphviz to render
the problematic tables; it supports an HTML-like syntax
and can be embedded in Sphinx documents:

http://www.sphinx-doc.org/en/stable/ext/graphviz.html
http://www.graphviz.org/content/node-shapes#html
http://stackoverflow.com/questions/13890568/graphviz-html-nested-tables


Johannes


Re: Kernel docs: muddying the waters a bit

2016-03-06 Thread Johannes Stezenbach
On Sat, Mar 05, 2016 at 11:29:37PM -0300, Mauro Carvalho Chehab wrote:
> 
> I converted one of the big tables to CSV. At least now it recognized
> it as a table. Yet, the table was very badly formatted:
>   https://mchehab.fedorapeople.org/media-kabi-docs-test/rst_tests/packed-rgb.html
> 
> This is how this table should look:
>   https://linuxtv.org/downloads/v4l-dvb-apis/packed-rgb.html
> 
> Also, this table has merged cells in the legend, and I have no idea how
> to tell Sphinx to do that in CSV format.
> 
> The RST files are on this git tree:
>   https://git.linuxtv.org/mchehab/v4l2-docs-poc.git/

Yeah, seems it can't do merged cells in csv.  Attached patch converts it
back to grid table format and fixes the table definition.
The html output looks usable, but clearly it is no fun to
work with tables in Sphinx.

Sphinx' latex writer can't handle nested tables, though.
Python's docutils rst2latex can, but that doesn't help here.
rst2pdf also supports it.  But I have doubts such a large
table would render OK in pdf without using landscape orientation.
I have not tried because I used python3-sphinx but rst2pdf
is only available for Python2 in Debian so it does not integrate
with Sphinx.


Johannes
From 61674b398e778bd5ff644ffd493d5ff1cfaca0ef Mon Sep 17 00:00:00 2001
From: Johannes Stezenbach <j...@sig21.net>
Date: Sun, 6 Mar 2016 23:55:19 +0100
Subject: [PATCH] some progress for html output

---
 _static/borderless.css     |  3 --
 _static/v4l2tables.css     |  9 +
 _templates/layout.html     |  9 +
 packed-rgb.rst             | 88 +-
 pixfmt-yuyv.rst            |  2 +-
 v4l-table-within-table.rst | 72 +++--
 6 files changed, 105 insertions(+), 78 deletions(-)
 delete mode 100644 _static/borderless.css
 create mode 100644 _static/v4l2tables.css

diff --git a/_static/borderless.css b/_static/borderless.css
deleted file mode 100644
index bfd4b01..0000000
--- a/_static/borderless.css
+++ /dev/null
@@ -1,3 +0,0 @@
-table#table-borderless {
-border: 1px solid black;
-}
diff --git a/_static/v4l2tables.css b/_static/v4l2tables.css
new file mode 100644
index 0000000..c045e45
--- /dev/null
+++ b/_static/v4l2tables.css
@@ -0,0 +1,9 @@
+table.noborder {
+border: 1px solid black;
+background: white;
+white-space: nowrap;
+}
+
+table.noborder td, table.noborder th {
+padding: 0px;
+}
diff --git a/_templates/layout.html b/_templates/layout.html
index b6bf12b..637332d 100644
--- a/_templates/layout.html
+++ b/_templates/layout.html
@@ -1,9 +1,2 @@
 {% extends "!layout.html" %}
-{% block tables %}
-
-table#table-borderless {
-border: 1px solid black;
-}
-
-{{ super() }}
-{% endblock %}
+{% set css_files = css_files + ["_static/v4l2tables.css"] %}
diff --git a/packed-rgb.rst b/packed-rgb.rst
index 352b91c..b4fcf3e 100644
--- a/packed-rgb.rst
+++ b/packed-rgb.rst
@@ -9,25 +9,46 @@ graphics frame buffers. They occupy 8, 16, 24 or 32 bits per pixel.
 These are all packed-pixel formats, meaning all the data for a pixel lie
 next to each other in memory.
 
-.. csv-table:: Table: Packed RGB Image Formats
-  :header: Identifier,Code, ,Byte 0 in memory,Byte 1,Byte 2,Byte 3
+.. table:: Packed RGB Image Formats
+   :class: noborder
 
-  ``V4L2_PIX_FMT_RGB332``,'RGB1',,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`1`,b\ :sub:`0`
-  ``V4L2_PIX_FMT_ARGB444``,'AR12',,g\ :sub:`3`,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,a\ :sub:`3`,a\ :sub:`2`,a\ :sub:`1`,a\ :sub:`0`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`
-  ``V4L2_PIX_FMT_XRGB444``,'XR12',,g\ :sub:`3`,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,-,-,-,-,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`
-  ``V4L2_PIX_FMT_ARGB555``,'AR15',,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,a,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`4`,g\ :sub:`3`
-  ``V4L2_PIX_FMT_XRGB555``,'XR15',,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,-,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`4`,g\ :sub:`3`
-  ``V4L2_PIX_FMT_RGB565``,'RGBP',,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`,,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`5`,g\ :sub:`4`,g\ :sub:`3`
-  ``V4L2_PIX_FMT_ARGB555X``,'AR15' | (1<<31),,a,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`4`,g\ :sub:`3`,,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4`,b\ :sub:`3`,b\ :sub:`2`,b\ :sub:`1`,b\ :sub:`0`
-  ``V4L2_PIX_FMT_XRGB555X``,'XR15' | (1<<31),,-,r\ :sub:`4`,r\ :sub:`3`,r\ :sub:`2`,r\ :sub:`1`,r\ :sub:`0`,g\ :sub:`4`,g\ :sub:`3`,,g\ :sub:`2`,g\ :sub:`1`,g\ :sub:`0`,b\ :sub:`4

Re: Kernel docs: muddying the waters a bit

2016-03-04 Thread Johannes Stezenbach
On Fri, Mar 04, 2016 at 09:59:50AM -0300, Mauro Carvalho Chehab wrote:
> 
> 3) I tried to use a .. cssclass, as Johannes suggested, but
> I was not able to include the CSS file. I suspect that this is
> easy to fix, but I want to see if the cssclass will also work for
> the pdf output as well.

"cssclass" was (I think) a custom role defined in the example,
so unless you have also defined such a role, you can use plain "class".
I have not looked deeper into the theming and template stuff.

> 4) It seems that it can't produce nested tables in pdf:
> 
> Markup is unsupported in LaTeX:
> v4l-table-within-table:: nested tables are not yet implemented.
> Makefile:115: recipe for target 'latexpdf' failed

This:
http://www.sphinx-doc.org/en/stable/markup/misc.html#tables

suggests you need to add the tabularcolumns directive
for complex tables.

BTW, as an alternative to the ASCII-art input
there is also support for CSV and list tables:
http://docutils.sourceforge.net/docs/ref/rst/directives.html#table


Johannes


Re: Kernel docs: muddying the waters a bit

2016-03-04 Thread Johannes Stezenbach
On Fri, Mar 04, 2016 at 10:29:08AM +0200, Jani Nikula wrote:
> On Fri, 04 Mar 2016, Mauro Carvalho Chehab wrote:
> >
> > If, on the other hand, we decide to use RST, we'll very likely need to
> > patch it to fulfill our needs in order to add proper table support.
> > I've no idea how easy/difficult would be to do that, nor if Sphinx
> > upstream would accept such changes.
> >
> > So, at the end of the day, we may end up having to carry our own
> > version of Sphinx inside our tree, which doesn't sound good, especially
> > since it is not just a script, but a package with hundreds of
> > files.
> 
> If we end up having to modify Sphinx, it has a powerful extension
> mechanism for this. We wouldn't have to worry about getting it merged to
> Sphinx upstream, and we wouldn't have to carry a local version of all of
> Sphinx. (In fact, the extension mechanism provides a future path for
> doing kernel-doc within Sphinx instead of as a preprocessing step.)
> 
> I know none of this alleviates your concerns with table support right
> now. I'll try to have a look at that a bit more.

FWIW, I think table formatting in Sphinx works via style sheets.
The mechanism is documented in the Python docutils docs that
Sphinx is built upon.
Basically you use the "class" or "role" directive and define
the corresponding CSS or LaTeX (or rst2pdf) style.

Here is one example (using a custom "cssclass" role):
https://pythonhosted.org/sphinxjp.themes.basicstrap/sample.html

Directives (especially role and class):
http://www.sphinx-doc.org/en/stable/rest.html#directives

LaTeX styling:
http://docutils.readthedocs.org/en/sphinx-docs/user/latex.html#custom-interpreted-text-roles


HTH,
Johannes


Re: [PATCH 5/6] n_tty: Fix stuck write wakeup

2015-12-13 Thread Johannes Stezenbach
On Sun, Dec 13, 2015 at 10:38:02AM -0800, Peter Hurley wrote:
> On 12/13/2015 07:18 AM, Johannes Stezenbach wrote:
> > 
> > There is a related bug that I meant to send a patch for, but I
> > never got around to it because the issue was found with proprietary
> > userspace and an ancient kernel.  Maybe you could take care of it?
> > The patch might not apply cleanly after your recent changes
> > or might even be invalid now, please check.
> 
> Thanks for the patch, Johannes!
> 
> Yes, the patch below is still required to prevent excessive SIGIO
> (and to prevent missed SIGIO when the amount actually copied just
> happens to be exactly the amount left to be copied).
> 
> I made some comments in the patch; can you re-submit with those
> changes and the patch title in the subject? Or I'd be happy to re-work
> it and send it to Greg if you'd prefer; just let me know.

Please rework it, currently I'm in lazy bum mode ;-)

> > @@ -1991,7 +1992,7 @@ static ssize_t n_tty_write(struct tty_st
> >  break_out:
> > __set_current_state(TASK_RUNNING);
> > remove_wait_queue(&tty->write_wait, &wait);
> > -   if (b - buf != nr && tty->fasync)
> > +   if (b - buf != count && tty->fasync)
> 
> ... this can be
> 
>   if (nr && tty->fasync)
>   set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);

Yeah, that's way better.

Thanks,
Johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/6] n_tty: Fix stuck write wakeup

2015-12-13 Thread Johannes Stezenbach
Hi Peter,

On Sat, Dec 12, 2015 at 02:16:38PM -0800, Peter Hurley wrote:
> If signal-driven i/o is disabled while write wakeup is pending (ie.,
> n_tty_write() has set TTY_DO_WRITE_WAKEUP but then signal-driven i/o
> is disabled), the TTY_DO_WRITE_WAKEUP bit will never be cleared and
> will cause tty_wakeup() to always call n_tty_write_wakeup.
> 
> Unconditionally clear the write wakeup, and since kill_fasync()
> already checks if the fasync ptr is null, call kill_fasync()
> unconditionally as well.
...
> @@ -230,8 +230,8 @@ static ssize_t chars_in_buffer(struct tty_struct *tty)
>  
>  static void n_tty_write_wakeup(struct tty_struct *tty)
>  {
> - if (tty->fasync && test_and_clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags))
> - kill_fasync(&tty->fasync, SIGIO, POLL_OUT);
> + clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
> + kill_fasync(&tty->fasync, SIGIO, POLL_OUT);
>  }

There is a related bug that I meant to send a patch for, but I
never got around to it because the issue was found with proprietary
userspace and an ancient kernel.  Maybe you could take care of it?
The patch might not apply cleanly after your recent changes
or might even be invalid now, please check.

Thanks,
Johannes


---
tty: n_tty: fix SIGIO for output

According to fcntl(2), "a SIGIO signal is sent whenever input
or output becomes possible on that file descriptor", i.e.
after the output buffer was full and now has space for new data.
But in fact SIGIO is sent after every write.

n_tty_write() should set TTY_DO_WRITE_WAKEUP only when
not all data could be written to the buffer.

Signed-off-by: Johannes Stezenbach 

--- drivers/char/n_tty.c.orig   2015-11-02 22:26:04.124227148 +0100
+++ drivers/char/n_tty.c   2015-11-02 22:26:10.644212115 +0100
@@ -1925,6 +1925,7 @@ static ssize_t n_tty_write(struct tty_st
DECLARE_WAITQUEUE(wait, current);
int c;
ssize_t retval = 0;
+   size_t count = nr;

/* Job control check -- must be done at start (POSIX.1 7.1.1.4). */
if (L_TOSTOP(tty) && file->f_op->write != redirected_tty_write) {
@@ -1991,7 +1992,7 @@ static ssize_t n_tty_write(struct tty_st
 break_out:
__set_current_state(TASK_RUNNING);
remove_wait_queue(&tty->write_wait, &wait);
-   if (b - buf != nr && tty->fasync)
+   if (b - buf != count && tty->fasync)
set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
return (b - buf) ? b - buf : retval;
 }
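
To illustrate the intended behaviour, here is a small user-space model
(not kernel code): the wakeup is armed only when the write came up
short.  struct out_buf and model_write() are made-up names for this
sketch, and the output buffer is reduced to a free-space counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up model of an output buffer: only the free space is tracked. */
struct out_buf {
	size_t space;
};

/*
 * Accept as many of the nr bytes as fit.  *arm_wakeup is set only if
 * some bytes were left over and signal-driven I/O is enabled, i.e. the
 * writer actually has something to wait for.
 */
static size_t model_write(struct out_buf *ob, size_t nr, bool fasync,
			  bool *arm_wakeup)
{
	size_t written = nr < ob->space ? nr : ob->space;

	assert(arm_wakeup != NULL);
	ob->space -= written;
	nr -= written;			/* bytes that did not fit */
	*arm_wakeup = nr != 0 && fasync;
	return written;
}
```

In this model a write that fits completely leaves the flag clear, so no
SIGIO would follow it; only a short write asks to be told when space
becomes available again.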



Re: [PATCH 1/6] n_tty: Always wake up read()/poll() if new input

2015-12-13 Thread Johannes Stezenbach
Hi Peter,

On Sat, Dec 12, 2015 at 02:16:34PM -0800, Peter Hurley wrote:
> A read() in non-canonical mode when VMIN > 0 and VTIME == 0 does not
> complete until at least VMIN chars have been read (or the user buffer is
> full). In this infrequent read mode, n_tty_read() attempts to reduce
> wakeups by computing the amount of data still necessary to complete the
> read (minimum_to_wake) and only waking the read()/poll() when that much
> unread data has been processed. This is the only read mode for which
> new data does not necessarily generate a wakeup.
> 
> However, this optimization is broken and commonly leads to hung reads
> even though the necessary amount of data has been received. Since the
> optimization is of marginal value anyway, just remove the whole
> thing. This also remedies a race between a concurrent poll() and
> read() in this mode, where the poll() can reset the minimum_to_wake
> of the read() (and vice versa).
...
> @@ -1632,7 +1631,7 @@ static void __receive_buf(struct tty_struct *tty, const unsigned char *cp,
>   /* publish read_head to consumer */
>   smp_store_release(&ldata->commit_head, ldata->read_head);
>  
> - if ((read_cnt(ldata) >= ldata->minimum_to_wake) || L_EXTPROC(tty)) {
> + if (read_cnt(ldata)) {
>   kill_fasync(&tty->fasync, SIGIO, POLL_IN);
>   wake_up_interruptible_poll(&tty->read_wait, POLLIN);
>   }

Your patch looks fine, I just want to mention that there was
some undocumented behaviour for async IO to take VMIN
into account for deciding when to send SIGIO, but it was
implemented incorrectly because minimum_to_wake was
only updated in read() and poll(), not directly by the
tcsetattr() ioctl.  I think your change does the right
thing to fix this case, too.  I had to debug some
proprietary code which dynamically changed VMIN based on
expected message size and thus sometimes wasn't woken up;
in the end we decided to keep VMIN=1 to solve it.
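
For reference, a minimal sketch of such a VMIN=1 setup from user space.
It uses only the standard termios calls; set_vmin1() and apply_vmin1()
are names invented for this example, and error handling is reduced to
returning -1.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Invented helper: non-canonical mode, read() may return after 1 byte. */
static void set_vmin1(struct termios *t)
{
	assert(t != NULL);
	t->c_lflag &= ~(tcflag_t)ICANON;
	t->c_cc[VMIN] = 1;	/* at least one byte completes a read() */
	t->c_cc[VTIME] = 0;	/* no inter-byte timer */
}

/* Invented helper: apply the setting to an open tty; 0 or -1. */
static int apply_vmin1(int fd)
{
	struct termios t;

	if (tcgetattr(fd, &t) < 0)
		return -1;
	set_vmin1(&t);
	return tcsetattr(fd, TCSANOW, &t);
}
```

With VMIN=1 and VTIME=0 a read() can complete as soon as a single byte
is available, so the application has to assemble full messages itself.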


Thanks,
Johannes

