Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-12 Thread Tom Rini
On Mon, Sep 12, 2016 at 03:46:23PM +0200, Matthijs van Duin wrote:
> On 12 September 2016 at 15:35, Tom Rini  wrote:
> > What do you mean by "you can't put it to good use" ?  Is that the case
> > of stuff that's say exposed via a header and could be used but isn't (ie
> > the cape/hat/chip/etc case) or the IP block is still OK but just not
> > exposed at all?
> >
> > What we're trying to address here is the case of "don't even try to
> > use the peripheral, bad things will happen.  But please properly idle
> > the IP block!".
> 
> I'm pretty sure IP blocks in the "don't even try" category are
> necessarily idled. There's no reason they wouldn't be.

OK.  But this is specifically about the blocks that we are being told by
the vendor are _not_ idled and need to be.

-- 
Tom


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-12 Thread Matthijs van Duin
On 12 September 2016 at 15:35, Tom Rini  wrote:
> What do you mean by "you can't put it to good use" ?  Is that the case
> of stuff that's say exposed via a header and could be used but isn't (ie
> the cape/hat/chip/etc case) or the IP block is still OK but just not
> exposed at all?
>
> What we're trying to address here is the case of "don't even try to
> use the peripheral, bad things will happen.  But please properly idle
> the IP block!".

I'm pretty sure IP blocks in the "don't even try" category are
necessarily idled. There's no reason they wouldn't be.

The hwmod infrastructure generally tries to access the sysconfig
register located in the IP block address space itself. If you try to
do that with a peripheral that's efused out you'll get a bus error.

Matthijs


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-12 Thread Tom Rini
On Sat, Sep 10, 2016 at 03:11:17AM +0200, Matthijs van Duin wrote:

[snip]
> On 8 September 2016 at 16:20, Nishanth Menon  wrote:
> > Minor point here
> 
> It's not minor, it's quite crucial.
> 
> > maintaining dts per paper spin is just too impossible to maintain
> 
> Even if the per-spin dtsi just consists of including the common dtsi
> and then applying /delete-node/ per peripheral that is disabled in
> efuse? (or through firewalling, e.g. on HS devices)
> 
> > the variations if maintained as seperate dts might infact end up being
> > larger in number than all the dts we have in arch/arm/boot/dts
> 
> There are 813 dts files there. Even if there were a dra7xx and am57xx
> for every value of x (and there really isn't) you wouldn't get
> anywhere near there.
> 
> But, fair enough, efuse bits are definitely an excellent way to get
> combinatorial explosion.
> 
> Afaik those feature bits are readable through the control module on TI
> SoCs though. Assuming such a thing is the norm in SoC world maybe a
> "delete-if" property referencing some sort of test on register bits of
> a referenced syscon node might do the trick?
> 
> On the cortex-A8 doing auto-detection would be a feasible alternative,
> by reusing the existing exception mechanism to trap synchronous
> aborts, but e.g. the Cortex-A15 seems to use async aborts for *every*
> bus error, which makes things just very awful.

I think this still misses one of the keys which is that for the IP block
that has been efused (or whatever) into being unusable it's also still
true that the IP block needs to be properly turned off.  The IP in
question wasn't removed from the SoC but rather an ice pick was jammed
into a critical path meaning you can't use it and now it's in zombie
state.  The kernel needs to know about it so that it can finish the job.

-- 
Tom


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-12 Thread Tom Rini
On Sat, Sep 10, 2016 at 03:11:17AM +0200, Matthijs van Duin wrote:
> On 8 September 2016 at 15:38, Rob Herring  wrote:
> > Yes, in theory a device can go from disabled to okay, but that's
> > generally never been supported. Linux takes the simple approach of
> > "disabled" means ignore it. I think we'll see that change with
> > overlays.
> 
> No need for future tense there, overlays are being used on a daily
> basis on the BeagleBone and have already been for years.
> 
> > I don't agree. Generally, disabled means the h/w is there, but don't
> > use it. There may be some cases where the hardware doesn't exist for
> > the convenience of having a single dts, but that's the exception.
> 
> Yes and no. What matters is whether "don't use it" means "you can't
> put it to good use" or "don't even try to reach the peripheral, bad
> things will happen". Right now it's used for both cases.
> 
> Ideally the latter case would be removed from the kernel's view entirely.

What do you mean by "you can't put it to good use" ?  Is that the case
of stuff that's say exposed via a header and could be used but isn't (ie
the cape/hat/chip/etc case) or the IP block is still OK but just not
exposed at all?

What we're trying to address here is the case of "don't even try to
use the peripheral, bad things will happen.  But please properly idle
the IP block!".

-- 
Tom


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-09 Thread Matthijs van Duin
On 8 September 2016 at 15:38, Rob Herring  wrote:
> Yes, in theory a device can go from disabled to okay, but that's
> generally never been supported. Linux takes the simple approach of
> "disabled" means ignore it. I think we'll see that change with
> overlays.

No need for future tense there, overlays are being used on a daily
basis on the BeagleBone and have already been for years.

> I don't agree. Generally, disabled means the h/w is there, but don't
> use it. There may be some cases where the hardware doesn't exist for
> the convenience of having a single dts, but that's the exception.

Yes and no. What matters is whether "don't use it" means "you can't
put it to good use" or "don't even try to reach the peripheral, bad
things will happen". Right now it's used for both cases.

Ideally the latter case would be removed from the kernel's view entirely.

On 8 September 2016 at 16:20, Nishanth Menon  wrote:
> Minor point here

It's not minor, it's quite crucial.

> maintaining dts per paper spin is just too impossible to maintain

Even if the per-spin dtsi just consists of including the common dtsi
and then applying /delete-node/ per peripheral that is disabled in
efuse? (or through firewalling, e.g. on HS devices)

> the variations if maintained as seperate dts might infact end up being
> larger in number than all the dts we have in arch/arm/boot/dts

There are 813 dts files there. Even if there were a dra7xx and am57xx
for every value of x (and there really isn't) you wouldn't get
anywhere near there.

But, fair enough, efuse bits are definitely an excellent way to get
combinatorial explosion.

Afaik those feature bits are readable through the control module on TI
SoCs though. Assuming such a thing is the norm in SoC world maybe a
"delete-if" property referencing some sort of test on register bits of
a referenced syscon node might do the trick?

On the cortex-A8 doing auto-detection would be a feasible alternative,
by reusing the existing exception mechanism to trap synchronous
aborts, but e.g. the Cortex-A15 seems to use async aborts for *every*
bus error, which makes things just very awful.

Matthijs


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-09 Thread Tom Rini
On Thu, Sep 08, 2016 at 09:43:03PM -0500, Rob Herring wrote:
> On Thu, Sep 8, 2016 at 2:05 PM, Frank Rowand  wrote:
> > On 09/08/16 06:38, Rob Herring wrote:
> >> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren  wrote:
> >>> * Frank Rowand  [160831 13:51]:
[snip]
>  Why not just create a new property that describes the hardware?
>  Perhaps something like:
> 
> incomplete = "pins_output", "buggy_dma";
> >>>
> >>> New property for incomplete works for me. Rob, got any comments here?
> >>
> >> Pins not muxed out or connected on the board has to be the #1 reason
> >> for disabled status. I don't think we need or want another way to
> >> express that.
> >
> > How is that expressed now?
> 
> status = "disabled";

It might be worth repeating that the problem that needs to be solved is
that we have an IP block that exists in some sense but needs to be shut
down (or some similar case the driver can handle) rather than ignored
like today.  This is not the same as having 5 UARTs available from the
SoC but only one is actually in some useful physical state, so list all
5 in the dtsi, status disabled, and then in the board dts file enable
the exposed ones.

-- 
Tom


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-08 Thread Rob Herring
On Thu, Sep 8, 2016 at 2:05 PM, Frank Rowand  wrote:
> On 09/08/16 06:38, Rob Herring wrote:
>> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren  wrote:
>>> * Frank Rowand  [160831 13:51]:
 On 08/29/16 15:35, Tony Lindgren wrote:
> if (of_device_is_incomplete(pdev->dev.of_node, status)) {
> if (!strcmp("hw-incomplete-pins", status)) {
> dev_info(&pdev->dev,
>  "Unusable hardware: Not pinned out\n");
> err = -ENODEV;
> goto out;
> }
> if (!strcmp("hw-missing-daughter-card")) {
> err = -EPROBE_DEFER;
> goto out;
> }
> if (!strcmp("hw-buggy-dma")) {
> dev_warn(&pdev->dev,
>  "Replace hardware for working DMA\n");
> }
> }

 What if the device has two issues to be reported?  You can not
 specify two different values for the status property.
>>>
>>> That's a good point.
>>>
 What if the firmware wants to report that the hardware failed
 self-test (thus status = "fail-sss") but is already using
 status to describe the hardware?
>>>
>>> Yeah that's true. Do you know what the "sss" stands for here?
>>> Status Self teSt, or Side Scan Sonar? :)
>>
>> String String String!!!?
>>
>> No clue for me.
>>
>>>
> - Make more generic as suggested by Frank but stick with
>   "operational status of a device" approch most people seem
>   to prefer that

 I am still opposed to using the status property for this purpose.

 The status property is intended to report an operational problem with
 a device or a device that the kernel can cause to be operational (such
 as a quiescent cpu being enabled).  It is the only property I am aware
 of to report _state_.
>>
>> Yes, in theory a device can go from disabled to okay, but that's
>> generally never been supported. Linux takes the simple approach of
>> "disabled" means ignore it. I think we'll see that change with
>> overlays.
>>
 It is unfortunate that Linux has adopted the practice of overloading status
 to determine whether a piece of hardware exists or does not exist.  This
 is extremely useful for the way we structure the .dts and .dtsi files but
 should have used a new property name.  We are stuck with that choice of
 using the status property for two purposes, first the state of a device,
 and secondly the hardware description of existing or not existing.
>>
>> I don't agree. Generally, disabled means the h/w is there, but don't
>> use it. There may be some cases where the hardware doesn't exist for
>> the convenience of having a single dts, but that's the exception.
>
> That it is not an exception, but instead a frequent pattern for .dtsi files.
> A quick look in arm:
>
>   $ grep status *.dtsi | wc -l
>   4842
>
>   $ grep status *.dtsi | grep '"disabled"' | wc -l
>   3431

That's not what I mean. You can't grep for this. Yes, marking blocks
as disabled by default in the SoC dtsi file and then enabling in the
board file is standard practice. If it is left disabled that doesn't
mean it doesn't exist is what I was getting at.

And yes, there are other cases where things get disabled with efuses
or don't have pins. Sometimes there is no difference in parts at all
and it's just a given block is not tested in factory test.

 Why not just create a new property that describes the hardware?
 Perhaps something like:

incomplete = "pins_output", "buggy_dma";
>>>
>>> New property for incomplete works for me. Rob, got any comments here?
>>
>> Pins not muxed out or connected on the board has to be the #1 reason
>> for disabled status. I don't think we need or want another way to
>> express that.
>
> How is that expressed now?

status = "disabled";

Rob


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-08 Thread Tony Lindgren
* Frank Rowand  [160908 12:18]:
> > On 09/08/16 08:58, Tony Lindgren wrote:
> >> Just to consider other ways of doing it, we could use the compatible
> >> flag to tag devices that need to be just idled on probe, but that does
> >> not seem like generic solution to me.
> > 
> > Yuck.  Again overloading a property to convey multiple pieces of
> > information.
> 
> I should have been more explicit in that statement.
> 
> If the hardware device does not have "wires" routed to a connector or
> bus then it is still the same device.  Thus the compatible should be
> the same.
> 
> The difference is the way the device is used in the SOC or board.

That is correct. The device is exactly the same.

Regards,

Tony


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-08 Thread Frank Rowand
On 09/08/16 12:09, Frank Rowand wrote:
> On 09/08/16 08:58, Tony Lindgren wrote:
>> * Rob Herring  [160908 06:38]:
>>> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren  wrote:
 * Frank Rowand  [160831 13:51]:
> I am still opposed to using the status property for this purpose.
>
> The status property is intended to report an operational problem with
> a device or a device that the kernel can cause to be operational (such
> as a quiescent cpu being enabled).  It is the only property I am aware
> of to report _state_.
>>>
>>> Yes, in theory a device can go from disabled to okay, but that's
>>> generally never been supported. Linux takes the simple approach of
>>> "disabled" means ignore it. I think we'll see that change with
>>> overlays.
>>
>> Yeah I think we have to assume that.
>>
> It is unfortunate that Linux has adopted the practice of overloading 
> status
> to determine whether a piece of hardware exists or does not exist.  This
> is extremely useful for the way we structure the .dts and .dtsi files but
> should have used a new property name.  We are stuck with that choice of
> using the status property for two purposes, first the state of a device,
> and secondly the hardware description of existing or not existing.
>>>
>>> I don't agree. Generally, disabled means the h/w is there, but don't
>>> use it. There may be some cases where the hardware doesn't exist for
>>> the convenience of having a single dts, but that's the exception.
>>>
> Why not just create a new property that describes the hardware?
> Perhaps something like:
>
>incomplete = "pins_output", "buggy_dma";

 New property for incomplete works for me. Rob, got any comments here?
>>>
>>> Pins not muxed out or connected on the board has to be the #1 reason
>>> for disabled status. I don't think we need or want another way to
>>> express that.
>>
>> Both status and and a separate property work for me.
>>
>> If no other considerations, we should probably pick something with a
>> a limited set of states to avoid it getting out of control and being
>> misused for something weird like driver probe order..
> 
> I do not want to add another property that conveys state.  The property
> that I was proposing does not convey state -- it describes the hardware.
> 
> 
>> For example, just status = "fail" would be enough for the cases I've
>> seen. That would still allow probe the device, then PM runtime idle
>> it and bail out with -ENODEV.
>>
>> For whatever warnings or errors the driver needs to show, the driver
>> could probably figure it out. I don't know if we want to or need to
>> pass any informational messages with the incomplete status or
>> property :)
>>
>>> We may have discussed this, but why can't the driver that checks fail
>>> state just check whatever was used to set the device to fail in the
>>> first place?
>>
>> Well there may be no way to check if something is pinned out based on
>> the hardware. The same SoC can be packaged with different pins. In that
>> case only the die id or serial number for each produced chip is
>> different, not the revision numbers. So for cases like that the dtb
>> file or kernel cmdline is the only information available. And we can't
>> assume pinctrl is required as it's perfectly fine to do that only in
>> the bootloader for the static cases to save memory.
>>
>> Just to consider other ways of doing it, we could use the compatible
>> flag to tag devices that need to be just idled on probe, but that does
>> not seem like generic solution to me.
> 
> Yuck.  Again overloading a property to convey multiple pieces of
> information.

I should have been more explicit in that statement.

If the hardware device does not have "wires" routed to a connector or
bus then it is still the same device.  Thus the compatible should be
the same.

The difference is the way the device is used in the SOC or board.

>>
>> Regards,
>>
>> Tony
>>
> 
> 



Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-08 Thread Frank Rowand
On 09/08/16 08:58, Tony Lindgren wrote:
> * Rob Herring  [160908 06:38]:
>> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren  wrote:
>>> * Frank Rowand  [160831 13:51]:
 I am still opposed to using the status property for this purpose.

 The status property is intended to report an operational problem with
 a device or a device that the kernel can cause to be operational (such
 as a quiescent cpu being enabled).  It is the only property I am aware
 of to report _state_.
>>
>> Yes, in theory a device can go from disabled to okay, but that's
>> generally never been supported. Linux takes the simple approach of
>> "disabled" means ignore it. I think we'll see that change with
>> overlays.
> 
> Yeah I think we have to assume that.
> 
 It is unfortunate that Linux has adopted the practice of overloading status
 to determine whether a piece of hardware exists or does not exist.  This
 is extremely useful for the way we structure the .dts and .dtsi files but
 should have used a new property name.  We are stuck with that choice of
 using the status property for two purposes, first the state of a device,
 and secondly the hardware description of existing or not existing.
>>
>> I don't agree. Generally, disabled means the h/w is there, but don't
>> use it. There may be some cases where the hardware doesn't exist for
>> the convenience of having a single dts, but that's the exception.
>>
 Why not just create a new property that describes the hardware?
 Perhaps something like:

incomplete = "pins_output", "buggy_dma";
>>>
>>> New property for incomplete works for me. Rob, got any comments here?
>>
>> Pins not muxed out or connected on the board has to be the #1 reason
>> for disabled status. I don't think we need or want another way to
>> express that.
> 
> Both status and and a separate property work for me.
> 
> If no other considerations, we should probably pick something with a
> a limited set of states to avoid it getting out of control and being
> misused for something weird like driver probe order..

I do not want to add another property that conveys state.  The property
that I was proposing does not convey state -- it describes the hardware.


> For example, just status = "fail" would be enough for the cases I've
> seen. That would still allow probe the device, then PM runtime idle
> it and bail out with -ENODEV.
> 
> For whatever warnings or errors the driver needs to show, the driver
> could probably figure it out. I don't know if we want to or need to
> pass any informational messages with the incomplete status or
> property :)
> 
>> We may have discussed this, but why can't the driver that checks fail
>> state just check whatever was used to set the device to fail in the
>> first place?
> 
> Well there may be no way to check if something is pinned out based on
> the hardware. The same SoC can be packaged with different pins. In that
> case only the die id or serial number for each produced chip is
> different, not the revision numbers. So for cases like that the dtb
> file or kernel cmdline is the only information available. And we can't
> assume pinctrl is required as it's perfectly fine to do that only in
> the bootloader for the static cases to save memory.
> 
> Just to consider other ways of doing it, we could use the compatible
> flag to tag devices that need to be just idled on probe, but that does
> not seem like generic solution to me.

Yuck.  Again overloading a property to convey multiple pieces of
information.


> 
> Regards,
> 
> Tony
> 



Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-08 Thread Frank Rowand
On 09/08/16 06:38, Rob Herring wrote:
> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren  wrote:
>> * Frank Rowand  [160831 13:51]:
>>> On 08/29/16 15:35, Tony Lindgren wrote:
 if (of_device_is_incomplete(pdev->dev.of_node, status)) {
 if (!strcmp("hw-incomplete-pins", status)) {
 dev_info(&pdev->dev,
  "Unusable hardware: Not pinned out\n");
 err = -ENODEV;
 goto out;
 }
 if (!strcmp("hw-missing-daughter-card")) {
 err = -EPROBE_DEFER;
 goto out;
 }
 if (!strcmp("hw-buggy-dma")) {
 dev_warn(&pdev->dev,
  "Replace hardware for working DMA\n");
 }
 }
>>>
>>> What if the device has two issues to be reported?  You can not
>>> specify two different values for the status property.
>>
>> That's a good point.
>>
>>> What if the firmware wants to report that the hardware failed
>>> self-test (thus status = "fail-sss") but is already using
>>> status to describe the hardware?
>>
>> Yeah that's true. Do you know what the "sss" stands for here?
>> Status Self teSt, or Side Scan Sonar? :)
> 
> String String String!!!?
> 
> No clue for me.
> 
>>
 - Make more generic as suggested by Frank but stick with
   "operational status of a device" approch most people seem
   to prefer that
>>>
>>> I am still opposed to using the status property for this purpose.
>>>
>>> The status property is intended to report an operational problem with
>>> a device or a device that the kernel can cause to be operational (such
>>> as a quiescent cpu being enabled).  It is the only property I am aware
>>> of to report _state_.
> 
> Yes, in theory a device can go from disabled to okay, but that's
> generally never been supported. Linux takes the simple approach of
> "disabled" means ignore it. I think we'll see that change with
> overlays.
> 
>>> It is unfortunate that Linux has adopted the practice of overloading status
>>> to determine whether a piece of hardware exists or does not exist.  This
>>> is extremely useful for the way we structure the .dts and .dtsi files but
>>> should have used a new property name.  We are stuck with that choice of
>>> using the status property for two purposes, first the state of a device,
>>> and secondly the hardware description of existing or not existing.
> 
> I don't agree. Generally, disabled means the h/w is there, but don't
> use it. There may be some cases where the hardware doesn't exist for
> the convenience of having a single dts, but that's the exception.

That it is not an exception, but instead a frequent pattern for .dtsi files.
A quick look in arm:

  $ grep status *.dtsi | wc -l
  4842

  $ grep status *.dtsi | grep '"disabled"' | wc -l
  3431


>>> Why not just create a new property that describes the hardware?
>>> Perhaps something like:
>>>
>>>incomplete = "pins_output", "buggy_dma";
>>
>> New property for incomplete works for me. Rob, got any comments here?
> 
> Pins not muxed out or connected on the board has to be the #1 reason
> for disabled status. I don't think we need or want another way to
> express that.

How is that expressed now?


> We may have discussed this, but why can't the driver that checks fail
> state just check whatever was used to set the device to fail in the
> first place?
> 
> Rob
> 



Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-08 Thread Tony Lindgren
* Rob Herring  [160908 06:38]:
> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren  wrote:
> > * Frank Rowand  [160831 13:51]:
> >> I am still opposed to using the status property for this purpose.
> >>
> >> The status property is intended to report an operational problem with
> >> a device or a device that the kernel can cause to be operational (such
> >> as a quiescent cpu being enabled).  It is the only property I am aware
> >> of to report _state_.
> 
> Yes, in theory a device can go from disabled to okay, but that's
> generally never been supported. Linux takes the simple approach of
> "disabled" means ignore it. I think we'll see that change with
> overlays.

Yeah I think we have to assume that.

> >> It is unfortunate that Linux has adopted the practice of overloading status
> >> to determine whether a piece of hardware exists or does not exist.  This
> >> is extremely useful for the way we structure the .dts and .dtsi files but
> >> should have used a new property name.  We are stuck with that choice of
> >> using the status property for two purposes, first the state of a device,
> >> and secondly the hardware description of existing or not existing.
> 
> I don't agree. Generally, disabled means the h/w is there, but don't
> use it. There may be some cases where the hardware doesn't exist for
> the convenience of having a single dts, but that's the exception.
> 
> >> Why not just create a new property that describes the hardware?
> >> Perhaps something like:
> >>
> >>incomplete = "pins_output", "buggy_dma";
> >
> > New property for incomplete works for me. Rob, got any comments here?
> 
> Pins not muxed out or connected on the board has to be the #1 reason
> for disabled status. I don't think we need or want another way to
> express that.

Both status and and a separate property work for me.

If no other considerations, we should probably pick something with a
a limited set of states to avoid it getting out of control and being
misused for something weird like driver probe order..

For example, just status = "fail" would be enough for the cases I've
seen. That would still allow probe the device, then PM runtime idle
it and bail out with -ENODEV.

For whatever warnings or errors the driver needs to show, the driver
could probably figure it out. I don't know if we want to or need to
pass any informational messages with the incomplete status or
property :)

> We may have discussed this, but why can't the driver that checks fail
> state just check whatever was used to set the device to fail in the
> first place?

Well there may be no way to check if something is pinned out based on
the hardware. The same SoC can be packaged with different pins. In that
case only the die id or serial number for each produced chip is
different, not the revision numbers. So for cases like that the dtb
file or kernel cmdline is the only information available. And we can't
assume pinctrl is required as it's perfectly fine to do that only in
the bootloader for the static cases to save memory.

Just to consider other ways of doing it, we could use the compatible
flag to tag devices that need to be just idled on probe, but that does
not seem like generic solution to me.

Regards,

Tony


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-08 Thread Nishanth Menon

On 09/08/2016 08:38 AM, Rob Herring wrote:

On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren  wrote:

[...]

It is unfortunate that Linux has adopted the practice of overloading status
to determine whether a piece of hardware exists or does not exist.  This
is extremely useful for the way we structure the .dts and .dtsi files but
should have used a new property name.  We are stuck with that choice of
using the status property for two purposes, first the state of a device,
and secondly the hardware description of existing or not existing.


I don't agree. Generally, disabled means the h/w is there, but don't
use it. There may be some cases where the hardware doesn't exist for
the convenience of having a single dts, but that's the exception.



Minor point here: when SoCs are manufactured, even though the silicon 
die may have a hardware block, it is completely efused out based on 
paper spin. in effect such a hardware block "does not exist". The 
number of such paper spins are not exception cases, but rather 
standard for SoC vendors - maintaining dts per paper spin is just too 
impossible to maintain (DRA7 as an example maintains a single 
dra7.dtsi as the root for all paper spins.. the variations if 
maintained as seperate dts might infact end up being larger in number 
than all the dts we have in arch/arm/boot/dts) - typically as an soc 
vendor pushes a specific SoC to multiple markets, this tends to be a norm.


--
Regards,
Nishanth Menon


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-09-08 Thread Rob Herring
On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren  wrote:
> * Frank Rowand  [160831 13:51]:
>> On 08/29/16 15:35, Tony Lindgren wrote:
>> > if (of_device_is_incomplete(pdev->dev.of_node, status)) {
>> > if (!strcmp("hw-incomplete-pins", status)) {
>> > dev_info(&pdev->dev,
>> >  "Unusable hardware: Not pinned out\n");
>> > err = -ENODEV;
>> > goto out;
>> > }
>> > if (!strcmp("hw-missing-daughter-card")) {
>> > err = -EPROBE_DEFER;
>> > goto out;
>> > }
>> > if (!strcmp("hw-buggy-dma")) {
>> > dev_warn(&pdev->dev,
>> >  "Replace hardware for working DMA\n");
>> > }
>> > }
>>
>> What if the device has two issues to be reported?  You can not
>> specify two different values for the status property.
>
> That's a good point.
>
>> What if the firmware wants to report that the hardware failed
>> self-test (thus status = "fail-sss") but is already using
>> status to describe the hardware?
>
> Yeah that's true. Do you know what the "sss" stands for here?
> Status Self teSt, or Side Scan Sonar? :)

String String String!!!?

No clue for me.

>
>> > - Make more generic as suggested by Frank but stick with
>> >   "operational status of a device" approch most people seem
>> >   to prefer that
>>
>> I am still opposed to using the status property for this purpose.
>>
>> The status property is intended to report an operational problem with
>> a device or a device that the kernel can cause to be operational (such
>> as a quiescent cpu being enabled).  It is the only property I am aware
>> of to report _state_.

Yes, in theory a device can go from disabled to okay, but that's
generally never been supported. Linux takes the simple approach of
"disabled" means ignore it. I think we'll see that change with
overlays.

>> It is unfortunate that Linux has adopted the practice of overloading status
>> to determine whether a piece of hardware exists or does not exist.  This
>> is extremely useful for the way we structure the .dts and .dtsi files but
>> should have used a new property name.  We are stuck with that choice of
>> using the status property for two purposes, first the state of a device,
>> and secondly the hardware description of existing or not existing.

I don't agree. Generally, disabled means the h/w is there, but don't
use it. There may be some cases where the hardware doesn't exist for
the convenience of having a single dts, but that's the exception.

>> Why not just create a new property that describes the hardware?
>> Perhaps something like:
>>
>>incomplete = "pins_output", "buggy_dma";
>
> New property for incomplete works for me. Rob, got any comments here?

Pins not muxed out or connected on the board has to be the #1 reason
for disabled status. I don't think we need or want another way to
express that.

We may have discussed this, but why can't the driver that checks fail
state just check whatever was used to set the device to fail in the
first place?

Rob


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-08-31 Thread Tony Lindgren
* Frank Rowand  [160831 13:51]:
> On 08/29/16 15:35, Tony Lindgren wrote:
> > if (of_device_is_incomplete(pdev->dev.of_node, status)) {
> > if (!strcmp("hw-incomplete-pins", status)) {
> > dev_info(&pdev->dev,
> >  "Unusable hardware: Not pinned out\n");
> > err = -ENODEV;
> > goto out;
> > }
> > if (!strcmp("hw-missing-daughter-card")) {
> > err = -EPROBE_DEFER;
> > goto out;
> > }
> > if (!strcmp("hw-buggy-dma")) {
> > dev_warn(&pdev->dev,
> >  "Replace hardware for working DMA\n");
> > }
> > }
> 
> What if the device has two issues to be reported?  You can not
> specify two different values for the status property.

That's a good point.

> What if the firmware wants to report that the hardware failed
> self-test (thus status = "fail-sss") but is already using
> status to describe the hardware?

Yeah that's true. Do you know what the "sss" stands for here?
Status Self teSt, or Side Scan Sonar? :)

> > - Make more generic as suggested by Frank but stick with
> >   "operational status of a device" approch most people seem
> >   to prefer that
> 
> I am still opposed to using the status property for this purpose.
> 
> The status property is intended to report an operational problem with
> a device or a device that the kernel can cause to be operational (such
> as a quiescent cpu being enabled).  It is the only property I am aware
> of to report _state_.
> 
> It is unfortunate that Linux has adopted the practice of overloading status
> to determine whether a piece of hardware exists or does not exist.  This
> is extremely useful for the way we structure the .dts and .dtsi files but
> should have used a new property name.  We are stuck with that choice of
> using the status property for two purposes, first the state of a device,
> and secondly the hardware description of existing or not existing.
> 
> Why not just create a new property that describes the hardware?
> Perhaps something like:
> 
>incomplete = "pins_output", "buggy_dma";

New property for incomplete works for me. Rob, got any comments here?

> > + *  __of_device_is_incomplete - check if a device is incomplete
> 
> It is not checking if a device is incomplete.  It is checking whether the
> device is operational _or_ incomplete.
> 
> This is conflating concepts and likely to be confusing.  This is the problem
> with overloading the status property for yet another purpose.

Sure that's a valid point.

Regards,

Tony


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-08-31 Thread Frank Rowand
Hi Tony,

On 08/29/16 15:35, Tony Lindgren wrote:
> We have devices that are in incomplete state, but still need to be
> probed to allow properly idling them for PM. Some examples are
> devices that are not pinned out on certain packages, or otherwise
> unusable on some SoCs.
> 
> Setting status = "disabled" cannot be used for this case. Setting
> "disabled" makes Linux ignore these devices so they are not probed
> and can never get idled properly by Linux PM runtime.
> 
> The proper way deal with the incomplete devices is to probe them,
> then allow the driver to check the state, and then disable or idle
> the device using PM runtime. To do this, we need to often enable
> the device and run some device specific code to idle the device.
> 
> Also boards may have needs to disable and idle unused devices.
> This may be needed for example if all resources are not wired
> for a certain board variant.
> 
> It seems we can use the ePAPR 1.1 status fail-sss to do this.
> Quoting "Table 2-4 Values for status property" we have "fail-sss":
> 
> "Indicates that the device is not operational. A serious error was
>  detected in the device and it is unlikely to become operational
>  without repair. The sss portion of the value is specific to the
>  device and indicates the error condition detected."
> 
> We can handle these fail states can be handled in a generic
> way. So let's introduce a generic status = "fail-" that
> describes a situation where a device driver probe should just
> shut down or idle the device if possible and then bail out.
> This allows the SoC variant and board specific dts files to set
> the status as needed.
> 
> The suggested usage in a device driver probe is:
> 
> static int foo_probe(struct platform_device *pdev)
> {
>   int err;
>   const char *status;
>   ...
> 
> pm_runtime_enable(&pdev->dev);
> pm_runtime_get_sync(&pdev->dev);
> pm_runtime_use_autosuspend(&pdev->dev);
>   ...
> 
>   /* Configure device, load firmware, idle device */
>   ...
> 
>   if (of_device_is_incomplete(pdev->dev.of_node, status)) {
>   if (!strcmp("hw-incomplete-pins", status)) {
>   dev_info(&pdev->dev,
>"Unusable hardware: Not pinned out\n");
>   err = -ENODEV;
>   goto out;
>   }
>   if (!strcmp("hw-missing-daughter-card")) {
>   err = -EPROBE_DEFER;
>   goto out;
>   }
>   if (!strcmp("hw-buggy-dma")) {
>   dev_warn(&pdev->dev,
>"Replace hardware for working DMA\n");
>   }
>   }

What if the device has two issues to be reported?  You can not
specify two different values for the status property.

What if the firmware wants to report that the hardware failed
self-test (thus status = "fail-sss") but is already using
status to describe the hardware?


> 
>   /* Continue normal device probe */
>   ...
> 
>   return 0;
> 
> out:
>   pm_runtime_dont_use_autosuspend(&pdev->dev);
>   pm_runtime_put_sync(&pdev->dev);
>   pm_runtime_disable(&pdev->dev);
> 
>   return err;
> }
> 
> Cc: Nishanth Menon 
> Cc: Tero Kristo 
> Cc: Tom Rini 
> Signed-off-by: Tony Lindgren 
> ---
> Changes since v1 ([PATCH] of: Add generic handling for hardware incomplete
> fail state):
> 
> - Make more generic as suggested by Frank but stick with
>   "operational status of a device" approch most people seem
>   to prefer that

I am still opposed to using the status property for this purpose.

The status property is intended to report an operational problem with
a device or a device that the kernel can cause to be operational (such
as a quiescent cpu being enabled).  It is the only property I am aware
of to report _state_.

It is unfortunate that Linux has adopted the practice of overloading status
to determine whether a piece of hardware exists or does not exist.  This
is extremely useful for the way we structure the .dts and .dtsi files but
should have used a new property name.  We are stuck with that choice of
using the status property for two purposes, first the state of a device,
and secondly the hardware description of existing or not existing.

Why not just create a new property that describes the hardware?
Perhaps something like:

   incomplete = "pins_output", "buggy_dma";


> - Pass the failure status back to caller so the caller may
>   attempt to handle the failure status
> 
> - Improved the description with more examples
> 
>  drivers/of/base.c  | 64 
> +++---
>  include/linux/of.h |  8 +++
>  2 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index ff37f6d..4d39857 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -550,8 +550,8 @@ EXPORT_SYMBOL(of_machine_is_compatible);
>   *
>  

Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-08-31 Thread Tony Lindgren
* Tony Lindgren  [160829 17:40]:
> * Rob Herring  [160829 17:24]:
> > On Mon, Aug 29, 2016 at 5:35 PM, Tony Lindgren  wrote:
> > > if (!strcmp("hw-incomplete-pins", status)) {
> > > dev_info(&pdev->dev,
> > >  "Unusable hardware: Not pinned out\n");
> > > err = -ENODEV;
> > > goto out;
> > > }
> > > if (!strcmp("hw-missing-daughter-card")) {
> > > err = -EPROBE_DEFER;
> > 
> > This implies we're going to change this on the fly? I guess
> > disabled->okay can already happen.
> 
> Well let's assume the bootloader sets some i2c controlled daughter
> board with "fail-hw-missing-daughter-card", then in theory kernel
> could probe it if it pops up on the i2c bus later on. Not sure if
> we want to do this, but it seems we could..

I've dropped that part, if there ever is need we can revisit it.
Probably disabled->okay change is enough here.

> > > +static bool __of_device_is_incomplete(const struct device_node *device,
> > > + const char **status)
> > > +{
> > > +   const char *s, *m = "fail-";
> > > +   int slen, mlen;
> > > +
> > > +   if (!device)
> > > +   return false;
> > > +
> > > +   s = __of_get_property(device, "status", &slen);
> > 
> > You can use the string helper function here (or is the lock a problem?).
> 
> I'll check.

We have of_property_read_string calls of_find_property which takes
devtree_lock. So sticking with __of_get_property.

Updated patch below with the &status fixed in the description
and speculative use case example left out.

Regards,

Tony

8< --
From: Tony Lindgren 
Date: Tue, 12 Apr 2016 11:37:55 -0700
Subject: [PATCHv3] of: Add generic handling for ePAPR 1.1 fail-sss states

We have devices that are in incomplete state, but still need to be
probed to allow properly idling them for PM. Some examples are
devices that are not pinned out on certain packages, or otherwise
unusable on some SoCs.

Setting status = "disabled" cannot be used for this case. Setting
"disabled" makes Linux ignore these devices so they are not probed
and can never get idled properly by Linux PM runtime.

The proper way deal with the incomplete devices is to probe them,
then allow the driver to check the state, and then disable or idle
the device using PM runtime. To do this, we need to often enable
the device and run some device specific code to idle the device.

Also boards may have needs to disable and idle unused devices.
This may be needed for example if all resources are not wired
for a certain board variant.

It seems we can use the ePAPR 1.1 status fail-sss to do this.
Quoting "Table 2-4 Values for status property" we have "fail-sss":

"Indicates that the device is not operational. A serious error was
 detected in the device and it is unlikely to become operational
 without repair. The sss portion of the value is specific to the
 device and indicates the error condition detected."

We can handle these fail states can be handled in a generic
way. So let's introduce a generic status = "fail-" that
describes a situation where a device driver probe should just
shut down or idle the device if possible and then bail out.
This allows the SoC variant and board specific dts files to set
the status as needed.

The suggested usage in a device driver probe is:

static int foo_probe(struct platform_device *pdev)
{
int err;
const char *status;
...

pm_runtime_enable(&pdev->dev);
pm_runtime_get_sync(&pdev->dev);
pm_runtime_use_autosuspend(&pdev->dev);
...

/* Configure device, load firmware, idle device */
...

if (of_device_is_incomplete(pdev->dev.of_node, &status)) {
if (!strcmp("hw-incomplete-pins", status)) {
dev_info(&pdev->dev,
 "Unusable hardware: Not pinned out\n");
err = -ENODEV;
goto out;
}
if (!strcmp("hw-buggy-dma")) {
dev_warn(&pdev->dev,
 "Replace hardware for working DMA\n");
}
}

/* Continue normal device probe */
...

return 0;

out:
pm_runtime_dont_use_autosuspend(&pdev->dev);
pm_runtime_put_sync(&pdev->dev);
pm_runtime_disable(&pdev->dev);

return err;
}

Cc: Nishanth Menon 
Cc: Tero Kristo 
Cc: Tom Rini 
Signed-off-by: Tony Lindgren 
---

Changes since v2:

- Fixed description example for &status and left out deferred
  probe example

 drivers/of/base.c  | 64 +++---
 include/linux/of.h |  8 +++
 2 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
--- a/drivers/of/base.c
+++ b/drivers/

Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-08-29 Thread Tony Lindgren
* Rob Herring  [160829 17:24]:
> On Mon, Aug 29, 2016 at 5:35 PM, Tony Lindgren  wrote:
> > It seems we can use the ePAPR 1.1 status fail-sss to do this.
> > Quoting "Table 2-4 Values for status property" we have "fail-sss":
> >
> > "Indicates that the device is not operational. A serious error was
> >  detected in the device and it is unlikely to become operational
> >  without repair. The sss portion of the value is specific to the
> >  device and indicates the error condition detected."
> 
> I had read this as 'sss' is just 3 characters, but I guess that doesn't 
> matter.

Yeah I'd assume it does not matter for the length.

> > We can handle these fail states can be handled in a generic
> > way. So let's introduce a generic status = "fail-" that
> > describes a situation where a device driver probe should just
> > shut down or idle the device if possible and then bail out.
> > This allows the SoC variant and board specific dts files to set
> > the status as needed.
> >
> > The suggested usage in a device driver probe is:
> >
> > static int foo_probe(struct platform_device *pdev)
> > {
> > int err;
> > const char *status;
> > ...
> >
> > pm_runtime_enable(&pdev->dev);
> > pm_runtime_get_sync(&pdev->dev);
> > pm_runtime_use_autosuspend(&pdev->dev);
> > ...
> >
> > /* Configure device, load firmware, idle device */
> > ...
> >
> > if (of_device_is_incomplete(pdev->dev.of_node, status)) {
> 
> &status

Oops, I guess I should compile test the example.

> > if (!strcmp("hw-incomplete-pins", status)) {
> > dev_info(&pdev->dev,
> >  "Unusable hardware: Not pinned out\n");
> > err = -ENODEV;
> > goto out;
> > }
> > if (!strcmp("hw-missing-daughter-card")) {
> > err = -EPROBE_DEFER;
> 
> This implies we're going to change this on the fly? I guess
> disabled->okay can already happen.

Well let's assume the bootloader sets some i2c controlled daughter
board with "fail-hw-missing-daughter-card", then in theory kernel
could probe it if it pops up on the i2c bus later on. Not sure if
we want to do this, but it seems we could..

> > +static bool __of_device_is_incomplete(const struct device_node *device,
> > + const char **status)
> > +{
> > +   const char *s, *m = "fail-";
> > +   int slen, mlen;
> > +
> > +   if (!device)
> > +   return false;
> > +
> > +   s = __of_get_property(device, "status", &slen);
> 
> You can use the string helper function here (or is the lock a problem?).

I'll check.

> Overall, seems okay to me.

OK thanks for looking.

Regards,

Tony


Re: [PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

2016-08-29 Thread Rob Herring
On Mon, Aug 29, 2016 at 5:35 PM, Tony Lindgren  wrote:
> We have devices that are in incomplete state, but still need to be
> probed to allow properly idling them for PM. Some examples are
> devices that are not pinned out on certain packages, or otherwise
> unusable on some SoCs.
>
> Setting status = "disabled" cannot be used for this case. Setting
> "disabled" makes Linux ignore these devices so they are not probed
> and can never get idled properly by Linux PM runtime.
>
> The proper way deal with the incomplete devices is to probe them,
> then allow the driver to check the state, and then disable or idle
> the device using PM runtime. To do this, we need to often enable
> the device and run some device specific code to idle the device.
>
> Also boards may have needs to disable and idle unused devices.
> This may be needed for example if all resources are not wired
> for a certain board variant.
>
> It seems we can use the ePAPR 1.1 status fail-sss to do this.
> Quoting "Table 2-4 Values for status property" we have "fail-sss":
>
> "Indicates that the device is not operational. A serious error was
>  detected in the device and it is unlikely to become operational
>  without repair. The sss portion of the value is specific to the
>  device and indicates the error condition detected."

I had read this as 'sss' is just 3 characters, but I guess that doesn't matter.

> We can handle these fail states can be handled in a generic
> way. So let's introduce a generic status = "fail-" that
> describes a situation where a device driver probe should just
> shut down or idle the device if possible and then bail out.
> This allows the SoC variant and board specific dts files to set
> the status as needed.
>
> The suggested usage in a device driver probe is:
>
> static int foo_probe(struct platform_device *pdev)
> {
> int err;
> const char *status;
> ...
>
> pm_runtime_enable(&pdev->dev);
> pm_runtime_get_sync(&pdev->dev);
> pm_runtime_use_autosuspend(&pdev->dev);
> ...
>
> /* Configure device, load firmware, idle device */
> ...
>
> if (of_device_is_incomplete(pdev->dev.of_node, status)) {

&status

> if (!strcmp("hw-incomplete-pins", status)) {
> dev_info(&pdev->dev,
>  "Unusable hardware: Not pinned out\n");
> err = -ENODEV;
> goto out;
> }
> if (!strcmp("hw-missing-daughter-card")) {
> err = -EPROBE_DEFER;

This implies we're going to change this on the fly? I guess
disabled->okay can already happen.

> goto out;
> }
> if (!strcmp("hw-buggy-dma")) {
> dev_warn(&pdev->dev,
>  "Replace hardware for working DMA\n");
> }
> }
>
> /* Continue normal device probe */
> ...
>
> return 0;
>
> out:
> pm_runtime_dont_use_autosuspend(&pdev->dev);
> pm_runtime_put_sync(&pdev->dev);
> pm_runtime_disable(&pdev->dev);
>
> return err;
> }
>
> Cc: Nishanth Menon 
> Cc: Tero Kristo 
> Cc: Tom Rini 
> Signed-off-by: Tony Lindgren 
> ---
> Changes since v1 ([PATCH] of: Add generic handling for hardware incomplete
> fail state):
>
> - Make more generic as suggested by Frank but stick with
>   "operational status of a device" approch most people seem
>   to prefer that
>
> - Pass the failure status back to caller so the caller may
>   attempt to handle the failure status
>
> - Improved the description with more examples
>
>  drivers/of/base.c  | 64 
> +++---
>  include/linux/of.h |  8 +++
>  2 files changed, 69 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index ff37f6d..4d39857 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -550,8 +550,8 @@ EXPORT_SYMBOL(of_machine_is_compatible);
>   *
>   *  @device: Node to check for availability, with locks already held
>   *
> - *  Returns true if the status property is absent or set to "okay" or "ok",
> - *  false otherwise
> + *  Returns true if the status property is absent or set to "okay", "ok",
> + *  or starting with "fail-", false otherwise
>   */
>  static bool __of_device_is_available(const struct device_node *device)
>  {
> @@ -566,7 +566,8 @@ static bool __of_device_is_available(const struct 
> device_node *device)
> return true;
>
> if (statlen > 0) {
> -   if (!strcmp(status, "okay") || !strcmp(status, "ok"))
> +   if (!strcmp(status, "okay") || !strcmp(status, "ok") ||
> +   !strncmp(status, "fail-", 5))
> return true;
> }
>
> @@ -595,6 +596,63 @@ bool of_device_is_available(const struct device_node 
> *device)
>  EXPORT_SYMBOL(of_device_is_