[PATCH v2 1/3] dt-bindings: iio: vadc: Fix documentation of 'reg'

2018-09-06 Thread Matthias Kaehlcke
The documentation of Qualcomm's SPMI PMIC voltage ADC claims that the
'reg' property consists of two values, the SPMI address and the length
of the controller's registers. However the SPMI bus to which it is added
specifies "#size-cells = <0>;". Remove the controller register length
from the documentation of the field and the example.

Signed-off-by: Matthias Kaehlcke 
Reviewed-by: Douglas Anderson 
Reviewed-by: Rob Herring 
Signed-off-by: Matthias Kaehlcke 
--
Changes in v2:
- none
---
 Documentation/devicetree/bindings/iio/adc/qcom,spmi-vadc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/iio/adc/qcom,spmi-vadc.txt 
b/Documentation/devicetree/bindings/iio/adc/qcom,spmi-vadc.txt
index 0fb46137f936..d0c188e5c922 100644
--- a/Documentation/devicetree/bindings/iio/adc/qcom,spmi-vadc.txt
+++ b/Documentation/devicetree/bindings/iio/adc/qcom,spmi-vadc.txt
@@ -13,7 +13,7 @@ VADC node:
 - reg:
 Usage: required
 Value type: 
-Definition: VADC base address and length in the SPMI PMIC register map.
+Definition: VADC base address in the SPMI PMIC register map.
 
 - #address-cells:
 Usage: required
@@ -104,7 +104,7 @@ Example:
/* VADC node */
pmic_vadc: vadc@3100 {
compatible = "qcom,spmi-vadc";
-   reg = <0x3100 0x100>;
+   reg = <0x3100>;
interrupts = <0x0 0x31 0x0 IRQ_TYPE_EDGE_RISING>;
#address-cells = <1>;
#size-cells = <0>;
-- 
2.19.0.rc2.392.g5ba43deb5a-goog



[PATCH v2 0/3] arm64: dts: qcom: pm8998: Add ADC node and die temperature channel

2018-09-06 Thread Matthias Kaehlcke
This series adds the DT node for the QCOM SPMI PMIC5 ADC and a channel
for the die temperature.

The die temperature is going to be used by the temperature alarm driver
(https://lore.kernel.org/patchwork/project/lkml/list/?series=361416).

My understanding is that some of the ADC channels are/can be universally
useful on devices with a pm8998, while the use of others is device specific
(e.g. AMUX). Siddartha / QCA folks, are there other generally useful ADC
channels that should be added here?

The driver for the QCOM SPMI PMIC5 ADC landed in the 'testing' branch of the
'iio' tree:

https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git/commit/?h=testing=e13d757279bbc59776c8435fb94e54b5a58bdd0b

Matthias Kaehlcke (3):
  dt-bindings: iio: vadc: Fix documentation of 'reg'
  arm64: dts: qcom: pm8998: Add adc node
  arm64: dts: qcom: pm8998: Add die temperature channel node to the ADC

 .../bindings/iio/adc/qcom,spmi-vadc.txt |  4 ++--
 arch/arm64/boot/dts/qcom/pm8998.dtsi| 17 -
 2 files changed, 18 insertions(+), 3 deletions(-)

-- 
2.19.0.rc2.392.g5ba43deb5a-goog



[PATCH v2 2/3] arm64: dts: qcom: pm8998: Add adc node

2018-09-06 Thread Matthias Kaehlcke
This adds the adc node to pm8998 based on the examples in the
bindings. It also fixes the order of the included headers.

Signed-off-by: Matthias Kaehlcke 
--
Changes in v2:
- removed io-channel-ranges attribute
---
 arch/arm64/boot/dts/qcom/pm8998.dtsi | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
b/arch/arm64/boot/dts/qcom/pm8998.dtsi
index 92bed1e7d4bb..41593ebbea2c 100644
--- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
@@ -1,8 +1,9 @@
 // SPDX-License-Identifier: (GPL-2.0+ OR MIT)
 /* Copyright 2018 Google LLC. */
 
-#include 
+#include 
 #include 
+#include 
 
 _bus {
pm8998_lsid0: pmic@0 {
@@ -11,6 +12,15 @@
#address-cells = <1>;
#size-cells = <0>;
 
+   pm8998_adc: adc@3100 {
+   compatible = "qcom,spmi-adc-rev2";
+   reg = <0x3100>;
+   interrupts = <0x0 0x31 0x0 IRQ_TYPE_EDGE_RISING>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   #io-channel-cells = <1>;
+   };
+
pm8998_gpio: gpios@c000 {
compatible = "qcom,pm8998-gpio", "qcom,spmi-gpio";
reg = <0xc000>;
-- 
2.19.0.rc2.392.g5ba43deb5a-goog



[PATCH v2 2/3] arm64: dts: qcom: pm8998: Add adc node

2018-09-06 Thread Matthias Kaehlcke
This adds the adc node to pm8998 based on the examples in the
bindings. It also fixes the order of the included headers.

Signed-off-by: Matthias Kaehlcke 
--
Changes in v2:
- removed io-channel-ranges attribute
---
 arch/arm64/boot/dts/qcom/pm8998.dtsi | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
b/arch/arm64/boot/dts/qcom/pm8998.dtsi
index 92bed1e7d4bb..41593ebbea2c 100644
--- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
@@ -1,8 +1,9 @@
 // SPDX-License-Identifier: (GPL-2.0+ OR MIT)
 /* Copyright 2018 Google LLC. */
 
-#include 
+#include 
 #include 
+#include 
 
 _bus {
pm8998_lsid0: pmic@0 {
@@ -11,6 +12,15 @@
#address-cells = <1>;
#size-cells = <0>;
 
+   pm8998_adc: adc@3100 {
+   compatible = "qcom,spmi-adc-rev2";
+   reg = <0x3100>;
+   interrupts = <0x0 0x31 0x0 IRQ_TYPE_EDGE_RISING>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   #io-channel-cells = <1>;
+   };
+
pm8998_gpio: gpios@c000 {
compatible = "qcom,pm8998-gpio", "qcom,spmi-gpio";
reg = <0xc000>;
-- 
2.19.0.rc2.392.g5ba43deb5a-goog



[PATCH v2 3/3] arm64: dts: qcom: pm8998: Add die temperature channel node to the ADC

2018-09-06 Thread Matthias Kaehlcke
Add a channel node for the die temperature to the ADC.

Signed-off-by: Matthias Kaehlcke 
Reviewed-by: Douglas Anderson 
Signed-off-by: Matthias Kaehlcke 
--
Changes in v2:
- none
---
 arch/arm64/boot/dts/qcom/pm8998.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
b/arch/arm64/boot/dts/qcom/pm8998.dtsi
index 41593ebbea2c..d18d4f260fbe 100644
--- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
@@ -19,6 +19,11 @@
#address-cells = <1>;
#size-cells = <0>;
#io-channel-cells = <1>;
+
+   die-temp {
+   reg = ;
+   label = "die_temp";
+   };
};
 
pm8998_gpio: gpios@c000 {
-- 
2.19.0.rc2.392.g5ba43deb5a-goog



[PATCH v2 3/3] arm64: dts: qcom: pm8998: Add die temperature channel node to the ADC

2018-09-06 Thread Matthias Kaehlcke
Add a channel node for the die temperature to the ADC.

Signed-off-by: Matthias Kaehlcke 
Reviewed-by: Douglas Anderson 
Signed-off-by: Matthias Kaehlcke 
--
Changes in v2:
- none
---
 arch/arm64/boot/dts/qcom/pm8998.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
b/arch/arm64/boot/dts/qcom/pm8998.dtsi
index 41593ebbea2c..d18d4f260fbe 100644
--- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
@@ -19,6 +19,11 @@
#address-cells = <1>;
#size-cells = <0>;
#io-channel-cells = <1>;
+
+   die-temp {
+   reg = ;
+   label = "die_temp";
+   };
};
 
pm8998_gpio: gpios@c000 {
-- 
2.19.0.rc2.392.g5ba43deb5a-goog



Re: [PATCH 1/3] of/fdt: Scan the root node properties earlier

2018-09-06 Thread Frank Rowand
Hi Rob,

On 09/05/18 14:31, Rob Herring wrote:
> On Wed, Sep 5, 2018 at 4:10 PM Frank Rowand  wrote:
>>
>> On 09/05/18 13:06, Rob Herring wrote:
>>> On Wed, Sep 5, 2018 at 1:19 PM Frank Rowand  wrote:

 On 09/05/18 04:51, Rob Herring wrote:
> On Tue, Sep 4, 2018 at 8:49 PM Frank Rowand  
> wrote:
>>
>> On 08/30/18 12:05, Rob Herring wrote:
>>> Scan the root node properties (#{size,address}-cells) earlier,
>>
>> ^^^
>>  before mdesc->dt_fixup() is called
>>
>>> so that
>>> the dt_root_addr_cells and dt_root_size_cells variables are initialized
>>> and can be used.
>>  by mdesc->dt_fixup()
>
> That's an ARM specific detail. Granted, ARM is the only caller.

 The dt_root_addr_cells and dt_root_size_cells variables are being
 initialized earlier in this patch series so that of_fdt_limit_memory()
 can use them.  The only caller of of_fdt_limit_memory() is
 exynos_dt_fixup(), which is an mdesc->dt_fixup() function.


>
>>>
>>> Cc: Frank Rowand 
>>> Signed-off-by: Rob Herring 
>>> ---
>>>  drivers/of/fdt.c | 7 ---
>>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>> Moving early_init_dt_scan_root() to inside early_init_dt_verify()
>> puts something that has nothing to do with verifying the fdt
>> into a function whose purpose is the verify.  It hides the side
>> effect of initializing the dt_root_addr_cells and dt_root_size_cells
>> variables.
>
> It already has the side effect of setting initial_boot_params which
> every subsequent function needs.

 And that side effect should probably also be moved.
>>>
>>> So 2 functions? One to set the blob and one to verify it. Then we can
>>
>> No, I would not add yet another function.  All of these side effects are
>> an argument in favor of a single setup_machine_fdt(), as I suggested below.
>> Then all of these side effects could be in setup_machine_fdt() instead
>> of hiding them in sub-functions that are called by all of the different
>> architectures.
>>
>>
>>> just let arches decide if they want to do any verification or not.
>>>
>>> Perhaps it should be called fdt_init(blob) and then it is vague enough
>>> I can do whatever I want.
>>>
>> I suggest creating a new function early_init_dt_scan_init_pre_dt_fixup(),
>> move the chunk of code there instead of to early_init_dt_scan_nodes(),
>> and call the new function from setup_machine_fdt(), just before
>> calling  mdesc->dt_fixup().  This would be a little bit more code,
>> but more clearly showing the intent.
>
> I'm trying to reduce the number of functions arches call

 I like that goal.


> and renaming
> would need a bunch of arch changes. This change will also let me make
> early_init_dt_scan_root private as powerpc is the only user. I need to
> dust off a patch for that.
>
> I'd be more inclined to push exynos to remove this altogether. After

 Not a bad idea.

> all, if they claim their bindings are unstable, they can't really
> claim their bootloader is stable/fixed.

 It seems that this series is showing us that maybe the three architecture
 specific (arc, arm, arm64) versions of setup_machine_fdt() should be
 consolidated so that we have consistent behavior for FDT.

 If we had a single setup_machine_fdt() then some of he hidden side
 effects of functions called by setup_machine_fdt() could instead
 be hoisted into setup_machine_fdt().
>>>
>>> Those functions are all quite a bit different. ARM matches the machine
>>> desc while arm64 doesn't have any such thing. How the DTB gets mapped
>>> into virtual space also varies.
>>
>> I argue that they _should be_ made to be more alike than different.  You
>> have only pointed out two differences.  Of those, the mapping could be
>> cleanly handled by an mdesc-> callback.  (I would have to look at the
>> match to see if that could be handled easily, but I would expect so.)
> 
> The machine desc is in no way common and only used on a few arches
> (and not even common across those arches). So there's no way the core
> DT code can just call a mdesc callback without addressing making that
> common first. And callbacks are just another way to call arch specific
> functions which are another thing I'm trying to remove.
> 
>> On the other hand, in a previous reply you considered removing
>> of_fdt_limit_memory(), which is only used for an exynos fixup.  If
>> you do that, then patch 1 disappears, and we can continue to
>> sweep under the rug the side effects that you reminded me of
>> with patch 1.
> 
> I'm inclined to just drop the patch. Seemed like a simple clean-up and
> I'm not interested in doing more right now (did you look at the stack
> of stuff in dt/testing branch). 

Re: [PATCH 1/3] of/fdt: Scan the root node properties earlier

2018-09-06 Thread Frank Rowand
Hi Rob,

On 09/05/18 14:31, Rob Herring wrote:
> On Wed, Sep 5, 2018 at 4:10 PM Frank Rowand  wrote:
>>
>> On 09/05/18 13:06, Rob Herring wrote:
>>> On Wed, Sep 5, 2018 at 1:19 PM Frank Rowand  wrote:

 On 09/05/18 04:51, Rob Herring wrote:
> On Tue, Sep 4, 2018 at 8:49 PM Frank Rowand  
> wrote:
>>
>> On 08/30/18 12:05, Rob Herring wrote:
>>> Scan the root node properties (#{size,address}-cells) earlier,
>>
>> ^^^
>>  before mdesc->dt_fixup() is called
>>
>>> so that
>>> the dt_root_addr_cells and dt_root_size_cells variables are initialized
>>> and can be used.
>>  by mdesc->dt_fixup()
>
> That's an ARM specific detail. Granted, ARM is the only caller.

 The dt_root_addr_cells and dt_root_size_cells variables are being
 initialized earlier in this patch series so that of_fdt_limit_memory()
 can use them.  The only caller of of_fdt_limit_memory() is
 exynos_dt_fixup(), which is an mdesc->dt_fixup() function.


>
>>>
>>> Cc: Frank Rowand 
>>> Signed-off-by: Rob Herring 
>>> ---
>>>  drivers/of/fdt.c | 7 ---
>>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>> Moving early_init_dt_scan_root() to inside early_init_dt_verify()
>> puts something that has nothing to do with verifying the fdt
>> into a function whose purpose is the verify.  It hides the side
>> effect of initializing the dt_root_addr_cells and dt_root_size_cells
>> variables.
>
> It already has the side effect of setting initial_boot_params which
> every subsequent function needs.

 And that side effect should probably also be moved.
>>>
>>> So 2 functions? One to set the blob and one to verify it. Then we can
>>
>> No, I would not add yet another function.  All of these side effects are
>> an argument in favor of a single setup_machine_fdt(), as I suggested below.
>> Then all of these side effects could be in setup_machine_fdt() instead
>> of hiding them in sub-functions that are called by all of the different
>> architectures.
>>
>>
>>> just let arches decide if they want to do any verification or not.
>>>
>>> Perhaps it should be called fdt_init(blob) and then it is vague enough
>>> I can do whatever I want.
>>>
>> I suggest creating a new function early_init_dt_scan_init_pre_dt_fixup(),
>> move the chunk of code there instead of to early_init_dt_scan_nodes(),
>> and call the new function from setup_machine_fdt(), just before
>> calling  mdesc->dt_fixup().  This would be a little bit more code,
>> but more clearly showing the intent.
>
> I'm trying to reduce the number of functions arches call

 I like that goal.


> and renaming
> would need a bunch of arch changes. This change will also let me make
> early_init_dt_scan_root private as powerpc is the only user. I need to
> dust off a patch for that.
>
> I'd be more inclined to push exynos to remove this altogether. After

 Not a bad idea.

> all, if they claim their bindings are unstable, they can't really
> claim their bootloader is stable/fixed.

 It seems that this series is showing us that maybe the three architecture
 specific (arc, arm, arm64) versions of setup_machine_fdt() should be
 consolidated so that we have consistent behavior for FDT.

 If we had a single setup_machine_fdt() then some of he hidden side
 effects of functions called by setup_machine_fdt() could instead
 be hoisted into setup_machine_fdt().
>>>
>>> Those functions are all quite a bit different. ARM matches the machine
>>> desc while arm64 doesn't have any such thing. How the DTB gets mapped
>>> into virtual space also varies.
>>
>> I argue that they _should be_ made to be more alike than different.  You
>> have only pointed out two differences.  Of those, the mapping could be
>> cleanly handled by an mdesc-> callback.  (I would have to look at the
>> match to see if that could be handled easily, but I would expect so.)
> 
> The machine desc is in no way common and only used on a few arches
> (and not even common across those arches). So there's no way the core
> DT code can just call a mdesc callback without addressing making that
> common first. And callbacks are just another way to call arch specific
> functions which are another thing I'm trying to remove.
> 
>> On the other hand, in a previous reply you considered removing
>> of_fdt_limit_memory(), which is only used for an exynos fixup.  If
>> you do that, then patch 1 disappears, and we can continue to
>> sweep under the rug the side effects that you reminded me of
>> with patch 1.
> 
> I'm inclined to just drop the patch. Seemed like a simple clean-up and
> I'm not interested in doing more right now (did you look at the stack
> of stuff in dt/testing branch). 

Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Nadav Amit
at 1:25 PM, Peter Zijlstra  wrote:

> On Thu, Sep 06, 2018 at 07:58:40PM +, Nadav Amit wrote:
>>> With that CR3 trickery, we can rid ourselves of the text_mutex
>>> requirement, since concurrent text_poke is 'safe'. That would clean up
>>> the kgdb code quite a bit.
>> 
>> I don’t know. I’m somewhat worried with multiple mechanisms potentially
>> changing the same code at the same time - and maybe ending up with some
>> mess.
> 
> kgdb only pokes INT3, that should be pretty safe.

Maybe I misunderstand your point. If you want me to get rid of text_mutex
completely, I am afraid it will be able to cause mess by changing the same
piece of code through kprobes and the static-keys mechanism.

I doubt it would work today without failing, but getting rid of text_mutex
is likely to make it even worse.



Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Nadav Amit
at 1:25 PM, Peter Zijlstra  wrote:

> On Thu, Sep 06, 2018 at 07:58:40PM +, Nadav Amit wrote:
>>> With that CR3 trickery, we can rid ourselves of the text_mutex
>>> requirement, since concurrent text_poke is 'safe'. That would clean up
>>> the kgdb code quite a bit.
>> 
>> I don’t know. I’m somewhat worried with multiple mechanisms potentially
>> changing the same code at the same time - and maybe ending up with some
>> mess.
> 
> kgdb only pokes INT3, that should be pretty safe.

Maybe I misunderstand your point. If you want me to get rid of text_mutex
completely, I am afraid it will be able to cause mess by changing the same
piece of code through kprobes and the static-keys mechanism.

I doubt it would work today without failing, but getting rid of text_mutex
is likely to make it even worse.



Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-09-06 Thread Davidlohr Bueso

On Thu, 06 Sep 2018, Peter Zijlstra wrote:


On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:

On Fri, 20 Jul 2018, Andrew Morton wrote:



>I'm surprised.  Is spin_lock_irqsave() significantly more expensive
>than spin_lock_irq()?  Relative to all the other stuff those functions
>are doing?  If so, how come?  Some architectural thing makes
>local_irq_save() much more costly than local_irq_disable()?

For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
the cost of Xen is much higher.


Xen is a moot argument. IIRC the point is that POPF (as used by
*irqrestore()) is a very expensive operation because it changes all
flags and thus has very 'difficult' instruction dependencies, killing
the front end reorder and generating a giant bubble in the pipeline.

Similarly, I suppose PUSHF is an expensive instruction because it needs
all the flags 'stable' and thus needs to wait for a fair number of prior
instructions to retire before it can get on with it.

Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
because the latter only has dependencies on instructions that muck about
with IF -- not that many.


ack.

In fact it turns out that my Xen numbers for this patch were actually
native (popf), and not the xen_restore_fl() as it was using hvm and
not paravirt.

Thanks,
Davidlohr


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-09-06 Thread Davidlohr Bueso

On Thu, 06 Sep 2018, Peter Zijlstra wrote:


On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:

On Fri, 20 Jul 2018, Andrew Morton wrote:



>I'm surprised.  Is spin_lock_irqsave() significantly more expensive
>than spin_lock_irq()?  Relative to all the other stuff those functions
>are doing?  If so, how come?  Some architectural thing makes
>local_irq_save() much more costly than local_irq_disable()?

For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
the cost of Xen is much higher.


Xen is a moot argument. IIRC the point is that POPF (as used by
*irqrestore()) is a very expensive operation because it changes all
flags and thus has very 'difficult' instruction dependencies, killing
the front end reorder and generating a giant bubble in the pipeline.

Similarly, I suppose PUSHF is an expensive instruction because it needs
all the flags 'stable' and thus needs to wait for a fair number of prior
instructions to retire before it can get on with it.

Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
because the latter only has dependencies on instructions that muck about
with IF -- not that many.


ack.

In fact it turns out that my Xen numbers for this patch were actually
native (popf), and not the xen_restore_fl() as it was using hvm and
not paravirt.

Thanks,
Davidlohr


Re: [PATCH 2/2] mtd: nand: esmt: retrieve ecc requirements from 5th id byte

2018-09-06 Thread Miquel Raynal
Hi Marcel,

Boris Brezillon  wrote on Thu, 6 Sep 2018
22:44:22 +0200:

> On Thu,  6 Sep 2018 10:49:22 +0200
> Marcel Ziswiler  wrote:
> 
> > From: Marcel Ziswiler 
> > 
> > This patch enables support to read the ECC level from the NAND flash
> > using ESMT SLC NAND ID byte 5 information as documented e.g. in the
> > following data sheet:
> > 
> > https://www.esmt.com.tw/upload/pdf/ESMT/datasheets/F59L1G81LA(2Y).pdf
> > 
> > Signed-off-by: Marcel Ziswiler 
> > 
> > ---
> > 
> >  drivers/mtd/nand/raw/nand_esmt.c | 46 
> > 
> >  1 file changed, 46 insertions(+)
> >  create mode 100644 drivers/mtd/nand/raw/nand_esmt.c
> > 
> > diff --git a/drivers/mtd/nand/raw/nand_esmt.c 
> > b/drivers/mtd/nand/raw/nand_esmt.c
> > new file mode 100644
> > index ..360d351ac043
> > --- /dev/null
> > +++ b/drivers/mtd/nand/raw/nand_esmt.c
> > @@ -0,0 +1,46 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2018 Toradex AG
> > + *
> > + * Author: Marcel Ziswiler 
> > + */
> > +
> > +#include 
> > +
> > +static void esmt_nand_decode_id(struct nand_chip *chip)
> > +{
> > +   nand_decode_ext_id(chip);
> > +
> > +   /* Extract ECC requirements from 5th id byte. */
> > +   if (chip->id.len >= 5 && nand_is_slc(chip)) {
> > +   chip->ecc_step_ds = 512;
> > +   switch (chip->id.data[4] & 0x3) {
> > +   case 0x0:
> > +   chip->ecc_strength_ds = 4;
> > +   break;
> > +   case 0x1:
> > +   chip->ecc_strength_ds = 2;
> > +   break;
> > +   case 0x2:
> > +   chip->ecc_strength_ds = 1;
> > +   break;
> > +   default:
> > +   WARN(1, "Could not get ECC info");
> > +   chip->ecc_step_ds = 0;
> > +   break;
> > +   }
> > +   }
> > +}
> > +
> > +static int esmt_nand_init(struct nand_chip *chip)
> > +{
> > +   if (nand_is_slc(chip))
> > +   chip->bbt_options |= NAND_BBT_SCAN2NDPAGE;
> > +
> > +   return 0;
> > +}
> > +
> > +const struct nand_manufacturer_ops esmt_nand_manuf_ops = {
> > +   .detect = esmt_nand_decode_id,
> > +   .init = esmt_nand_init,
> > +};  
> 
> Looks like you forgot to hook the new esmt_nand_manuf_ops to the ESMT
> entry (in the nand_manufacturer table), so as is, the patch is not
> exactly adding support for ECC req parsing ;-).

I missed that.

Will drop both patches from nand/next, waiting for a v2 (please also
reorder the macros in patch 1 as suggested).

Thanks,
Miquèl


Re: [PATCH 2/2] mtd: nand: esmt: retrieve ecc requirements from 5th id byte

2018-09-06 Thread Miquel Raynal
Hi Marcel,

Boris Brezillon  wrote on Thu, 6 Sep 2018
22:44:22 +0200:

> On Thu,  6 Sep 2018 10:49:22 +0200
> Marcel Ziswiler  wrote:
> 
> > From: Marcel Ziswiler 
> > 
> > This patch enables support to read the ECC level from the NAND flash
> > using ESMT SLC NAND ID byte 5 information as documented e.g. in the
> > following data sheet:
> > 
> > https://www.esmt.com.tw/upload/pdf/ESMT/datasheets/F59L1G81LA(2Y).pdf
> > 
> > Signed-off-by: Marcel Ziswiler 
> > 
> > ---
> > 
> >  drivers/mtd/nand/raw/nand_esmt.c | 46 
> > 
> >  1 file changed, 46 insertions(+)
> >  create mode 100644 drivers/mtd/nand/raw/nand_esmt.c
> > 
> > diff --git a/drivers/mtd/nand/raw/nand_esmt.c 
> > b/drivers/mtd/nand/raw/nand_esmt.c
> > new file mode 100644
> > index ..360d351ac043
> > --- /dev/null
> > +++ b/drivers/mtd/nand/raw/nand_esmt.c
> > @@ -0,0 +1,46 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2018 Toradex AG
> > + *
> > + * Author: Marcel Ziswiler 
> > + */
> > +
> > +#include 
> > +
> > +static void esmt_nand_decode_id(struct nand_chip *chip)
> > +{
> > +   nand_decode_ext_id(chip);
> > +
> > +   /* Extract ECC requirements from 5th id byte. */
> > +   if (chip->id.len >= 5 && nand_is_slc(chip)) {
> > +   chip->ecc_step_ds = 512;
> > +   switch (chip->id.data[4] & 0x3) {
> > +   case 0x0:
> > +   chip->ecc_strength_ds = 4;
> > +   break;
> > +   case 0x1:
> > +   chip->ecc_strength_ds = 2;
> > +   break;
> > +   case 0x2:
> > +   chip->ecc_strength_ds = 1;
> > +   break;
> > +   default:
> > +   WARN(1, "Could not get ECC info");
> > +   chip->ecc_step_ds = 0;
> > +   break;
> > +   }
> > +   }
> > +}
> > +
> > +static int esmt_nand_init(struct nand_chip *chip)
> > +{
> > +   if (nand_is_slc(chip))
> > +   chip->bbt_options |= NAND_BBT_SCAN2NDPAGE;
> > +
> > +   return 0;
> > +}
> > +
> > +const struct nand_manufacturer_ops esmt_nand_manuf_ops = {
> > +   .detect = esmt_nand_decode_id,
> > +   .init = esmt_nand_init,
> > +};  
> 
> Looks like you forgot to hook the new esmt_nand_manuf_ops to the ESMT
> entry (in the nand_manufacturer table), so as is, the patch is not
> exactly adding support for ECC req parsing ;-).

I missed that.

Will drop both patches from nand/next, waiting for a v2 (please also
reorder the macros in patch 1 as suggested).

Thanks,
Miquèl


Re: [PATCH 2/2] mtd: nand: esmt: retrieve ecc requirements from 5th id byte

2018-09-06 Thread Boris Brezillon
On Thu,  6 Sep 2018 10:49:22 +0200
Marcel Ziswiler  wrote:

> From: Marcel Ziswiler 
> 
> This patch enables support to read the ECC level from the NAND flash
> using ESMT SLC NAND ID byte 5 information as documented e.g. in the
> following data sheet:
> 
> https://www.esmt.com.tw/upload/pdf/ESMT/datasheets/F59L1G81LA(2Y).pdf
> 
> Signed-off-by: Marcel Ziswiler 
> 
> ---
> 
>  drivers/mtd/nand/raw/nand_esmt.c | 46 
> 
>  1 file changed, 46 insertions(+)
>  create mode 100644 drivers/mtd/nand/raw/nand_esmt.c
> 
> diff --git a/drivers/mtd/nand/raw/nand_esmt.c 
> b/drivers/mtd/nand/raw/nand_esmt.c
> new file mode 100644
> index ..360d351ac043
> --- /dev/null
> +++ b/drivers/mtd/nand/raw/nand_esmt.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2018 Toradex AG
> + *
> + * Author: Marcel Ziswiler 
> + */
> +
> +#include 
> +
> +static void esmt_nand_decode_id(struct nand_chip *chip)
> +{
> + nand_decode_ext_id(chip);
> +
> + /* Extract ECC requirements from 5th id byte. */
> + if (chip->id.len >= 5 && nand_is_slc(chip)) {
> + chip->ecc_step_ds = 512;
> + switch (chip->id.data[4] & 0x3) {
> + case 0x0:
> + chip->ecc_strength_ds = 4;
> + break;
> + case 0x1:
> + chip->ecc_strength_ds = 2;
> + break;
> + case 0x2:
> + chip->ecc_strength_ds = 1;
> + break;
> + default:
> + WARN(1, "Could not get ECC info");
> + chip->ecc_step_ds = 0;
> + break;
> + }
> + }
> +}
> +
> +static int esmt_nand_init(struct nand_chip *chip)
> +{
> + if (nand_is_slc(chip))
> + chip->bbt_options |= NAND_BBT_SCAN2NDPAGE;
> +
> + return 0;
> +}
> +
> +const struct nand_manufacturer_ops esmt_nand_manuf_ops = {
> + .detect = esmt_nand_decode_id,
> + .init = esmt_nand_init,
> +};

Looks like you forgot to hook the new esmt_nand_manuf_ops to the ESMT
entry (in the nand_manufacturer table), so as is, the patch is not
exactly adding support for ECC req parsing ;-).


Re: [PATCH 2/2] mtd: nand: esmt: retrieve ecc requirements from 5th id byte

2018-09-06 Thread Boris Brezillon
On Thu,  6 Sep 2018 10:49:22 +0200
Marcel Ziswiler  wrote:

> From: Marcel Ziswiler 
> 
> This patch enables support to read the ECC level from the NAND flash
> using ESMT SLC NAND ID byte 5 information as documented e.g. in the
> following data sheet:
> 
> https://www.esmt.com.tw/upload/pdf/ESMT/datasheets/F59L1G81LA(2Y).pdf
> 
> Signed-off-by: Marcel Ziswiler 
> 
> ---
> 
>  drivers/mtd/nand/raw/nand_esmt.c | 46 
> 
>  1 file changed, 46 insertions(+)
>  create mode 100644 drivers/mtd/nand/raw/nand_esmt.c
> 
> diff --git a/drivers/mtd/nand/raw/nand_esmt.c 
> b/drivers/mtd/nand/raw/nand_esmt.c
> new file mode 100644
> index ..360d351ac043
> --- /dev/null
> +++ b/drivers/mtd/nand/raw/nand_esmt.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2018 Toradex AG
> + *
> + * Author: Marcel Ziswiler 
> + */
> +
> +#include 
> +
> +static void esmt_nand_decode_id(struct nand_chip *chip)
> +{
> + nand_decode_ext_id(chip);
> +
> + /* Extract ECC requirements from 5th id byte. */
> + if (chip->id.len >= 5 && nand_is_slc(chip)) {
> + chip->ecc_step_ds = 512;
> + switch (chip->id.data[4] & 0x3) {
> + case 0x0:
> + chip->ecc_strength_ds = 4;
> + break;
> + case 0x1:
> + chip->ecc_strength_ds = 2;
> + break;
> + case 0x2:
> + chip->ecc_strength_ds = 1;
> + break;
> + default:
> + WARN(1, "Could not get ECC info");
> + chip->ecc_step_ds = 0;
> + break;
> + }
> + }
> +}
> +
> +static int esmt_nand_init(struct nand_chip *chip)
> +{
> + if (nand_is_slc(chip))
> + chip->bbt_options |= NAND_BBT_SCAN2NDPAGE;
> +
> + return 0;
> +}
> +
> +const struct nand_manufacturer_ops esmt_nand_manuf_ops = {
> + .detect = esmt_nand_decode_id,
> + .init = esmt_nand_init,
> +};

Looks like you forgot to hook the new esmt_nand_manuf_ops to the ESMT
entry (in the nand_manufacturer table), so as is, the patch is not
exactly adding support for ECC req parsing ;-).


Re: [PATCH 1/2] mtd: nand: reorder nand manufacturer ids

2018-09-06 Thread Boris Brezillon
On Thu,  6 Sep 2018 10:49:21 +0200
Marcel Ziswiler  wrote:

> From: Marcel Ziswiler 
> 
> Reorder NAND manufacturer ids for clarity.
> 
> Signed-off-by: Marcel Ziswiler 
> 
> ---
> 
>  drivers/mtd/nand/raw/nand_ids.c | 20 ++--
>  include/linux/mtd/rawnand.h |  8 
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/nand_ids.c b/drivers/mtd/nand/raw/nand_ids.c
> index 5423c3bb388e..0851cd86bf19 100644
> --- a/drivers/mtd/nand/raw/nand_ids.c
> +++ b/drivers/mtd/nand/raw/nand_ids.c
> @@ -169,21 +169,21 @@ struct nand_flash_dev nand_flash_ids[] = {
>  
>  /* Manufacturer IDs */
>  static const struct nand_manufacturer nand_manufacturers[] = {
> - {NAND_MFR_TOSHIBA, "Toshiba", _nand_manuf_ops},
> + {NAND_MFR_AMD, "AMD/Spansion", _nand_manuf_ops},
> + {NAND_MFR_ATO, "ATO"},
> + {NAND_MFR_EON, "Eon"},
>   {NAND_MFR_ESMT, "ESMT"},
> - {NAND_MFR_SAMSUNG, "Samsung", _nand_manuf_ops},
>   {NAND_MFR_FUJITSU, "Fujitsu"},
> - {NAND_MFR_NATIONAL, "National"},
> - {NAND_MFR_RENESAS, "Renesas"},
> - {NAND_MFR_STMICRO, "ST Micro"},
>   {NAND_MFR_HYNIX, "Hynix", _nand_manuf_ops},
> - {NAND_MFR_MICRON, "Micron", _nand_manuf_ops},
> - {NAND_MFR_AMD, "AMD/Spansion", _nand_manuf_ops},
> + {NAND_MFR_INTEL, "Intel"},
>   {NAND_MFR_MACRONIX, "Macronix", _nand_manuf_ops},
> - {NAND_MFR_EON, "Eon"},
> + {NAND_MFR_MICRON, "Micron", _nand_manuf_ops},
> + {NAND_MFR_NATIONAL, "National"},
> + {NAND_MFR_RENESAS, "Renesas"},
> + {NAND_MFR_SAMSUNG, "Samsung", _nand_manuf_ops},
>   {NAND_MFR_SANDISK, "SanDisk"},
> - {NAND_MFR_INTEL, "Intel"},
> - {NAND_MFR_ATO, "ATO"},
> + {NAND_MFR_STMICRO, "ST Micro"},
> + {NAND_MFR_TOSHIBA, "Toshiba", _nand_manuf_ops},
>   {NAND_MFR_WINBOND, "Winbond"},
>  };
>  
> diff --git a/include/linux/mtd/rawnand.h b/include/linux/mtd/rawnand.h
> index efb2345359bb..ca134851d211 100644
> --- a/include/linux/mtd/rawnand.h
> +++ b/include/linux/mtd/rawnand.h
> @@ -1541,12 +1541,12 @@ nand_manufacturer_name(const struct nand_manufacturer 
> *manufacturer)
>  
>  extern struct nand_flash_dev nand_flash_ids[];
>  
> -extern const struct nand_manufacturer_ops toshiba_nand_manuf_ops;
> -extern const struct nand_manufacturer_ops samsung_nand_manuf_ops;
> -extern const struct nand_manufacturer_ops hynix_nand_manuf_ops;
> -extern const struct nand_manufacturer_ops micron_nand_manuf_ops;
>  extern const struct nand_manufacturer_ops amd_nand_manuf_ops;
> +extern const struct nand_manufacturer_ops hynix_nand_manuf_ops;
>  extern const struct nand_manufacturer_ops macronix_nand_manuf_ops;
> +extern const struct nand_manufacturer_ops micron_nand_manuf_ops;
> +extern const struct nand_manufacturer_ops samsung_nand_manuf_ops;
> +extern const struct nand_manufacturer_ops toshiba_nand_manuf_ops;

I'm okay with this kind of alphabetic reordering, but why not
reordering the macro definitions too?


Re: [PATCH 1/2] mtd: nand: reorder nand manufacturer ids

2018-09-06 Thread Boris Brezillon
On Thu,  6 Sep 2018 10:49:21 +0200
Marcel Ziswiler  wrote:

> From: Marcel Ziswiler 
> 
> Reorder NAND manufacturer ids for clarity.
> 
> Signed-off-by: Marcel Ziswiler 
> 
> ---
> 
>  drivers/mtd/nand/raw/nand_ids.c | 20 ++--
>  include/linux/mtd/rawnand.h |  8 
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/nand_ids.c b/drivers/mtd/nand/raw/nand_ids.c
> index 5423c3bb388e..0851cd86bf19 100644
> --- a/drivers/mtd/nand/raw/nand_ids.c
> +++ b/drivers/mtd/nand/raw/nand_ids.c
> @@ -169,21 +169,21 @@ struct nand_flash_dev nand_flash_ids[] = {
>  
>  /* Manufacturer IDs */
>  static const struct nand_manufacturer nand_manufacturers[] = {
> - {NAND_MFR_TOSHIBA, "Toshiba", _nand_manuf_ops},
> + {NAND_MFR_AMD, "AMD/Spansion", _nand_manuf_ops},
> + {NAND_MFR_ATO, "ATO"},
> + {NAND_MFR_EON, "Eon"},
>   {NAND_MFR_ESMT, "ESMT"},
> - {NAND_MFR_SAMSUNG, "Samsung", _nand_manuf_ops},
>   {NAND_MFR_FUJITSU, "Fujitsu"},
> - {NAND_MFR_NATIONAL, "National"},
> - {NAND_MFR_RENESAS, "Renesas"},
> - {NAND_MFR_STMICRO, "ST Micro"},
>   {NAND_MFR_HYNIX, "Hynix", _nand_manuf_ops},
> - {NAND_MFR_MICRON, "Micron", _nand_manuf_ops},
> - {NAND_MFR_AMD, "AMD/Spansion", _nand_manuf_ops},
> + {NAND_MFR_INTEL, "Intel"},
>   {NAND_MFR_MACRONIX, "Macronix", _nand_manuf_ops},
> - {NAND_MFR_EON, "Eon"},
> + {NAND_MFR_MICRON, "Micron", _nand_manuf_ops},
> + {NAND_MFR_NATIONAL, "National"},
> + {NAND_MFR_RENESAS, "Renesas"},
> + {NAND_MFR_SAMSUNG, "Samsung", _nand_manuf_ops},
>   {NAND_MFR_SANDISK, "SanDisk"},
> - {NAND_MFR_INTEL, "Intel"},
> - {NAND_MFR_ATO, "ATO"},
> + {NAND_MFR_STMICRO, "ST Micro"},
> + {NAND_MFR_TOSHIBA, "Toshiba", _nand_manuf_ops},
>   {NAND_MFR_WINBOND, "Winbond"},
>  };
>  
> diff --git a/include/linux/mtd/rawnand.h b/include/linux/mtd/rawnand.h
> index efb2345359bb..ca134851d211 100644
> --- a/include/linux/mtd/rawnand.h
> +++ b/include/linux/mtd/rawnand.h
> @@ -1541,12 +1541,12 @@ nand_manufacturer_name(const struct nand_manufacturer 
> *manufacturer)
>  
>  extern struct nand_flash_dev nand_flash_ids[];
>  
> -extern const struct nand_manufacturer_ops toshiba_nand_manuf_ops;
> -extern const struct nand_manufacturer_ops samsung_nand_manuf_ops;
> -extern const struct nand_manufacturer_ops hynix_nand_manuf_ops;
> -extern const struct nand_manufacturer_ops micron_nand_manuf_ops;
>  extern const struct nand_manufacturer_ops amd_nand_manuf_ops;
> +extern const struct nand_manufacturer_ops hynix_nand_manuf_ops;
>  extern const struct nand_manufacturer_ops macronix_nand_manuf_ops;
> +extern const struct nand_manufacturer_ops micron_nand_manuf_ops;
> +extern const struct nand_manufacturer_ops samsung_nand_manuf_ops;
> +extern const struct nand_manufacturer_ops toshiba_nand_manuf_ops;

I'm okay with this kind of alphabetic reordering, but why not
reordering the macro definitions too?


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 03:20:46PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 02:47 PM, Sean Christopherson wrote:
> ...
> 
> >>
> >>Yes, the auxiliary array will dumped into the regular .bss when
> >>CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
> >>sure if its worth complicating the code to save those extra memory.
> >>Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.
> >
> >I just realized that we'll try to create a bogus array if 'NR_CPUS <=
> >HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
> >and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:
> >
> >#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
> >#define HVC_AUX_ARRAY_SIZE  \
> > PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
> >sizeof(struct pvclock_vsyscall_time_info))
> >static struct pvclock_vsyscall_time_info
> > hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
> >#endif
> >
> 
> The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
> the sizeof operators are not allowed in '#if'. Anyway, I will try to see
> if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
> check.

Hmm, we'll need something otherwise 'NR_CPUS - HVC_BOOT_ARRAY_SIZE'
will wrap and cause build errors.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 03:20:46PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 02:47 PM, Sean Christopherson wrote:
> ...
> 
> >>
> >>Yes, the auxiliary array will dumped into the regular .bss when
> >>CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
> >>sure if its worth complicating the code to save those extra memory.
> >>Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.
> >
> >I just realized that we'll try to create a bogus array if 'NR_CPUS <=
> >HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
> >and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:
> >
> >#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
> >#define HVC_AUX_ARRAY_SIZE  \
> > PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
> >sizeof(struct pvclock_vsyscall_time_info))
> >static struct pvclock_vsyscall_time_info
> > hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
> >#endif
> >
> 
> The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
> the sizeof operators are not allowed in '#if'. Anyway, I will try to see
> if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
> check.

Hmm, we'll need something otherwise 'NR_CPUS - HVC_BOOT_ARRAY_SIZE'
will wrap and cause build errors.


Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Reinette Chatre
On 9/6/2018 1:29 PM, Peter Zijlstra wrote:
> On Thu, Sep 06, 2018 at 01:05:05PM -0700, Reinette Chatre wrote:
>> When I separate the above into the two functions it just becomes either:
>>rdpmcl(l2_hit_pmcnum, l2_hits_after);
>>rdpmcl(l2_miss_pmcnum, l2_miss_after);
>> or:
>>rdpmcl(l3_hit_pmcnum, l3_hits_after);
>>rdpmcl(l3_miss_pmcnum, l3_miss_after);
>>
> 
> Right, which is the exact _same_ code, so you only need a single
> function.
> 

>From my understanding it is not this code specifically that is causing
the cache misses but instead the code and variables used to decide
whether to run them or not. These would still be needed when I extract
the above into inline functions.

Reinette


Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Reinette Chatre
On 9/6/2018 1:29 PM, Peter Zijlstra wrote:
> On Thu, Sep 06, 2018 at 01:05:05PM -0700, Reinette Chatre wrote:
>> When I separate the above into the two functions it just becomes either:
>>rdpmcl(l2_hit_pmcnum, l2_hits_after);
>>rdpmcl(l2_miss_pmcnum, l2_miss_after);
>> or:
>>rdpmcl(l3_hit_pmcnum, l3_hits_after);
>>rdpmcl(l3_miss_pmcnum, l3_miss_after);
>>
> 
> Right, which is the exact _same_ code, so you only need a single
> function.
> 

>From my understanding it is not this code specifically that is causing
the cache misses but instead the code and variables used to decide
whether to run them or not. These would still be needed when I extract
the above into inline functions.

Reinette


Re: [PATCH 2/3] arm64: dts: qcom: pm8998: Add adc node

2018-09-06 Thread Matthias Kaehlcke
On Thu, Sep 06, 2018 at 11:34:32AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Aug 27, 2018 at 10:10 AM, Matthias Kaehlcke  wrote:
> > On Fri, Aug 10, 2018 at 05:09:17PM -0700, Doug Anderson wrote:
> >> Hi,
> >>
> >> On Wed, Aug 8, 2018 at 12:13 PM, Matthias Kaehlcke  
> >> wrote:
> >> > This adds the adc node to pm8998 based on the examples in the
> >> > bindings. It also fixes the order of the included headers.
> >> >
> >> > Signed-off-by: Matthias Kaehlcke 
> >> > ---
> >> >  arch/arm64/boot/dts/qcom/pm8998.dtsi | 13 -
> >> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
> >> > b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> >> > index 92bed1e7d4bb..f70f6101bceb 100644
> >> > --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
> >> > +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> >> > @@ -1,8 +1,9 @@
> >> >  // SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> >> >  /* Copyright 2018 Google LLC. */
> >> >
> >> > -#include 
> >> > +#include 
> >> >  #include 
> >> > +#include 
> >> >
> >> >  _bus {
> >> > pm8998_lsid0: pmic@0 {
> >> > @@ -11,6 +12,16 @@
> >> > #address-cells = <1>;
> >> > #size-cells = <0>;
> >> >
> >> > +   pm8998_adc: adc@3100 {
> >> > +   compatible = "qcom,spmi-adc-rev2";
> >> > +   reg = <0x3100>;
> >> > +   interrupts = <0x0 0x31 0x0 IRQ_TYPE_EDGE_RISING>;
> >> > +   #address-cells = <1>;
> >> > +   #size-cells = <0>;
> >> > +   #io-channel-cells = <1>;
> >> > +   io-channel-ranges;
> >> > +   };
> >>
> >> I'm a little confused about what the "io-channel-ranges" does here.
> >> The documentation isn't clear at all to me for it.  If I'm reading it
> >> right it's also supposed to be for iio-consumers, but you're using it
> >> in a provider.  I see you copied this from the example.  Maybe the
> >> example is wrong?  ...or I'm just confused...
> >
> > Yes, I copied it from the example, its use here is also not clear to
> > me, other ADC providers like adc@126c in exynos3250.dtsi or
> > adc@180a6000 in bcm-cygnus.dtsi also specify it ...
> >
> > Siddartha/Jonathan, could you help to clarify if "io-channel-ranges"
> > should really be specified here as the DT example suggests?
> 
> Does everything work if you just remove the "io-channel-ranges"?

Yes, the ADC channels are still available within the kernel and
through sysfs and provide reasonable values.

> We could remove it and always add it back in later if someone could
> explain what it's for or if we find a reason why it was needed?  ...or
> we have any other ideas for how to get this resolved?  :(

Sounds good, I'll respin with 'io-channel-ranges' removed.


Re: [PATCH 2/3] arm64: dts: qcom: pm8998: Add adc node

2018-09-06 Thread Matthias Kaehlcke
On Thu, Sep 06, 2018 at 11:34:32AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Aug 27, 2018 at 10:10 AM, Matthias Kaehlcke  wrote:
> > On Fri, Aug 10, 2018 at 05:09:17PM -0700, Doug Anderson wrote:
> >> Hi,
> >>
> >> On Wed, Aug 8, 2018 at 12:13 PM, Matthias Kaehlcke  
> >> wrote:
> >> > This adds the adc node to pm8998 based on the examples in the
> >> > bindings. It also fixes the order of the included headers.
> >> >
> >> > Signed-off-by: Matthias Kaehlcke 
> >> > ---
> >> >  arch/arm64/boot/dts/qcom/pm8998.dtsi | 13 -
> >> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
> >> > b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> >> > index 92bed1e7d4bb..f70f6101bceb 100644
> >> > --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
> >> > +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> >> > @@ -1,8 +1,9 @@
> >> >  // SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> >> >  /* Copyright 2018 Google LLC. */
> >> >
> >> > -#include 
> >> > +#include 
> >> >  #include 
> >> > +#include 
> >> >
> >> >  _bus {
> >> > pm8998_lsid0: pmic@0 {
> >> > @@ -11,6 +12,16 @@
> >> > #address-cells = <1>;
> >> > #size-cells = <0>;
> >> >
> >> > +   pm8998_adc: adc@3100 {
> >> > +   compatible = "qcom,spmi-adc-rev2";
> >> > +   reg = <0x3100>;
> >> > +   interrupts = <0x0 0x31 0x0 IRQ_TYPE_EDGE_RISING>;
> >> > +   #address-cells = <1>;
> >> > +   #size-cells = <0>;
> >> > +   #io-channel-cells = <1>;
> >> > +   io-channel-ranges;
> >> > +   };
> >>
> >> I'm a little confused about what the "io-channel-ranges" does here.
> >> The documentation isn't clear at all to me for it.  If I'm reading it
> >> right it's also supposed to be for iio-consumers, but you're using it
> >> in a provider.  I see you copied this from the example.  Maybe the
> >> example is wrong?  ...or I'm just confused...
> >
> > Yes, I copied it from the example, its use here is also not clear to
> > me, other ADC providers like adc@126c in exynos3250.dtsi or
> > adc@180a6000 in bcm-cygnus.dtsi also specify it ...
> >
> > Siddartha/Jonathan, could you help to clarify if "io-channel-ranges"
> > should really be specified here as the DT example suggests?
> 
> Does everything work if you just remove the "io-channel-ranges"?

Yes, the ADC channels are still available within the kernel and
through sysfs and provide reasonable values.

> We could remove it and always add it back in later if someone could
> explain what it's for or if we find a reason why it was needed?  ...or
> we have any other ideas for how to get this resolved?  :(

Sounds good, I'll respin with 'io-channel-ranges' removed.


[PATCH v2] ARM: s3c24xx: Correct SD card write protect detection on Mini2440

2018-09-06 Thread Cedric Roux
The mini2440 computer uses "active high" to signal that the "write protect"
of the inserted MMC is set. The current code uses the opposite, leading to
a wrong detection of write protection. The solution is simply to use
".wprotect_invert = 1" in the description of the MMC.

Signed-off-by: Cedric Roux 
---
 arch/arm/mach-s3c24xx/mach-mini2440.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-s3c24xx/mach-mini2440.c 
b/arch/arm/mach-s3c24xx/mach-mini2440.c
index 95753e0..4cd8d45 100644
--- a/arch/arm/mach-s3c24xx/mach-mini2440.c
+++ b/arch/arm/mach-s3c24xx/mach-mini2440.c
@@ -232,10 +232,11 @@ static struct s3c2410fb_mach_info mini2440_fb_info 
__initdata = {
 /* MMC/SD  */
 
 static struct s3c24xx_mci_pdata mini2440_mmc_cfg __initdata = {
-   .gpio_detect   = S3C2410_GPG(8),
-   .gpio_wprotect = S3C2410_GPH(8),
-   .set_power = NULL,
-   .ocr_avail = MMC_VDD_32_33|MMC_VDD_33_34,
+   .wprotect_invert = 1,
+   .gpio_detect   = S3C2410_GPG(8),
+   .gpio_wprotect = S3C2410_GPH(8),
+   .set_power = NULL,
+   .ocr_avail = MMC_VDD_32_33|MMC_VDD_33_34,
 };
 
 /* NAND Flash on MINI2440 board */
-- 
2.18.0


[PATCH v2] ARM: s3c24xx: Correct SD card write protect detection on Mini2440

2018-09-06 Thread Cedric Roux
The mini2440 computer uses "active high" to signal that the "write protect"
of the inserted MMC is set. The current code uses the opposite, leading to
a wrong detection of write protection. The solution is simply to use
".wprotect_invert = 1" in the description of the MMC.

Signed-off-by: Cedric Roux 
---
 arch/arm/mach-s3c24xx/mach-mini2440.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-s3c24xx/mach-mini2440.c 
b/arch/arm/mach-s3c24xx/mach-mini2440.c
index 95753e0..4cd8d45 100644
--- a/arch/arm/mach-s3c24xx/mach-mini2440.c
+++ b/arch/arm/mach-s3c24xx/mach-mini2440.c
@@ -232,10 +232,11 @@ static struct s3c2410fb_mach_info mini2440_fb_info 
__initdata = {
 /* MMC/SD  */
 
 static struct s3c24xx_mci_pdata mini2440_mmc_cfg __initdata = {
-   .gpio_detect   = S3C2410_GPG(8),
-   .gpio_wprotect = S3C2410_GPH(8),
-   .set_power = NULL,
-   .ocr_avail = MMC_VDD_32_33|MMC_VDD_33_34,
+   .wprotect_invert = 1,
+   .gpio_detect   = S3C2410_GPG(8),
+   .gpio_wprotect = S3C2410_GPH(8),
+   .set_power = NULL,
+   .ocr_avail = MMC_VDD_32_33|MMC_VDD_33_34,
 };
 
 /* NAND Flash on MINI2440 board */
-- 
2.18.0


Re: [PATCH v1] x86/intel_rdt: Switch to bitmap_zalloc()

2018-09-06 Thread Fenghua Yu
On Thu, Aug 30, 2018 at 02:50:39PM +0300, Andy Shevchenko wrote:
> Switch to bitmap_zalloc() to show clearly what we are allocating.
> Besides that it returns pointer of bitmap type instead of opaque void *.
> 
> Signed-off-by: Andy Shevchenko 
> ---
>  arch/x86/kernel/cpu/intel_rdt.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 

Acked-by: Fenghua Yu 


Re: [PATCH] seccomp: remove unnecessary unlikely()

2018-09-06 Thread James Morris
On Wed, 5 Sep 2018, Igor Stoppa wrote:

> WARN_ON() already contains an unlikely(), so it's not necessary to wrap it
> into another.
> 
> Signed-off-by: Igor Stoppa 
> Acked-by: Kees Cook 
> Cc: linux-security-mod...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
next-general

and next-testing.


Thanks!

-- 
James Morris




Re: [PATCH v1] x86/intel_rdt: Switch to bitmap_zalloc()

2018-09-06 Thread Fenghua Yu
On Thu, Aug 30, 2018 at 02:50:39PM +0300, Andy Shevchenko wrote:
> Switch to bitmap_zalloc() to show clearly what we are allocating.
> Besides that it returns pointer of bitmap type instead of opaque void *.
> 
> Signed-off-by: Andy Shevchenko 
> ---
>  arch/x86/kernel/cpu/intel_rdt.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 

Acked-by: Fenghua Yu 


Re: [PATCH] seccomp: remove unnecessary unlikely()

2018-09-06 Thread James Morris
On Wed, 5 Sep 2018, Igor Stoppa wrote:

> WARN_ON() already contains an unlikely(), so it's not necessary to wrap it
> into another.
> 
> Signed-off-by: Igor Stoppa 
> Acked-by: Kees Cook 
> Cc: linux-security-mod...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
next-general

and next-testing.


Thanks!

-- 
James Morris




Re: [RFC PATCH 1/2] mm: move tlb_table_flush to tlb_flush_mmu_free

2018-09-06 Thread Rik van Riel
On Thu, 2018-08-23 at 18:47 +1000, Nicholas Piggin wrote:
> There is no need to call this from tlb_flush_mmu_tlbonly, it
> logically belongs with tlb_flush_mmu_free. This allows some
> code consolidation with a subsequent fix.
> 
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Rik van Riel 

This patch also fixes an infinite recursion bug
with CONFIG_HAVE_RCU_TABLE_FREE enabled, which
has this call trace:

tlb_table_flush
  -> tlb_table_invalidate
 -> tlb_flush_mmu_tlbonly
-> tlb_table_flush
   -> ... (infinite recursion)

This should probably be applied sooner rather than
later.

-- 
All Rights Reversed.


signature.asc
Description: This is a digitally signed message part


Re: [RFC PATCH 1/2] mm: move tlb_table_flush to tlb_flush_mmu_free

2018-09-06 Thread Rik van Riel
On Thu, 2018-08-23 at 18:47 +1000, Nicholas Piggin wrote:
> There is no need to call this from tlb_flush_mmu_tlbonly, it
> logically belongs with tlb_flush_mmu_free. This allows some
> code consolidation with a subsequent fix.
> 
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Rik van Riel 

This patch also fixes an infinite recursion bug
with CONFIG_HAVE_RCU_TABLE_FREE enabled, which
has this call trace:

tlb_table_flush
  -> tlb_table_invalidate
 -> tlb_flush_mmu_tlbonly
-> tlb_table_flush
   -> ... (infinite recursion)

This should probably be applied sooner rather than
later.

-- 
All Rights Reversed.


signature.asc
Description: This is a digitally signed message part


Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 01:05:05PM -0700, Reinette Chatre wrote:
> When I separate the above into the two functions it just becomes either:
>rdpmcl(l2_hit_pmcnum, l2_hits_after);
>rdpmcl(l2_miss_pmcnum, l2_miss_after);
> or:
>rdpmcl(l3_hit_pmcnum, l3_hits_after);
>rdpmcl(l3_miss_pmcnum, l3_miss_after);
> 

Right, which is the exact _same_ code, so you only need a single
function.


Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 01:05:05PM -0700, Reinette Chatre wrote:
> When I separate the above into the two functions it just becomes either:
>rdpmcl(l2_hit_pmcnum, l2_hits_after);
>rdpmcl(l2_miss_pmcnum, l2_miss_after);
> or:
>rdpmcl(l3_hit_pmcnum, l3_hits_after);
>rdpmcl(l3_miss_pmcnum, l3_miss_after);
> 

Right, which is the exact _same_ code, so you only need a single
function.


Re: [PATCH] x86/mm/pat: Fix L1TF stable backport for CPA

2018-09-06 Thread Andi Kleen
> And what about populate_pud?
> set_pud(pud, pud_mkhuge(pfn_pud(cpa->pfn,
>canon_pgprot(pud_pgprot;
> 
> start += PUD_SIZE;
> cpa->pfn  += PUD_SIZE;

Yes you're right. That case needs to be fixed too.

Are you sending a patch, or should I?

-Andi


Re: [PATCH] x86/mm/pat: Fix L1TF stable backport for CPA

2018-09-06 Thread Andi Kleen
> And what about populate_pud?
> set_pud(pud, pud_mkhuge(pfn_pud(cpa->pfn,
>canon_pgprot(pud_pgprot;
> 
> start += PUD_SIZE;
> cpa->pfn  += PUD_SIZE;

Yes you're right. That case needs to be fixed too.

Are you sending a patch, or should I?

-Andi


Re: [PATCH] x86: use WRITE_ONCE() when setting PTEs

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 01:12:14PM -0700, Nadav Amit wrote:
> at 12:57 PM, Peter Zijlstra  wrote:
> 
> > On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
> >> When page-table entries are set, the compiler might optimize their
> >> assignment by using multiple instructions to set the PTE. This might
> >> turn into a security hazard if the user somehow manages to use the
> >> interim PTE. L1TF does not make our lives easier, making even an interim
> >> non-present PTE a security hazard.
> >> 
> >> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
> >> security hazard.
> >> 
> >> I skimmed the differences in the binary with and without this patch. The
> >> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
> >> code optimizations are possible. For better and worse, the impact on the
> >> binary with this patch is pretty small. Skimming the code did not cause
> >> anything to jump out as a security hazard, but it seems that at least
> >> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
> > 
> > Acked-by: Peter Zijlstra (Intel) 
> > 
> > Also, its corollary would also make sense/be required, use READ_ONCE()
> > when reading these.
> 
> I don’t know. This would obviously be much more intrusive. I can add a
> get_pte() and write a Coccinelle script to use it instead of reading the
> PTE, but in most cases, I presume, it would be an overkill.
> 
> The reason for that is that the PTEs are supposed to be accessed while
> holding the page-table lock, and the hardware can only change dirty & access
> bits. I think that any code that assumes that these bits do not change while
> holding the lock is already broken in more ways.

There are lockless readers, but I just checked, mm/gup.c already uses
READ_ONCE(), so that should be fine.


Re: [PATCH] x86: use WRITE_ONCE() when setting PTEs

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 01:12:14PM -0700, Nadav Amit wrote:
> at 12:57 PM, Peter Zijlstra  wrote:
> 
> > On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
> >> When page-table entries are set, the compiler might optimize their
> >> assignment by using multiple instructions to set the PTE. This might
> >> turn into a security hazard if the user somehow manages to use the
> >> interim PTE. L1TF does not make our lives easier, making even an interim
> >> non-present PTE a security hazard.
> >> 
> >> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
> >> security hazard.
> >> 
> >> I skimmed the differences in the binary with and without this patch. The
> >> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
> >> code optimizations are possible. For better and worse, the impact on the
> >> binary with this patch is pretty small. Skimming the code did not cause
> >> anything to jump out as a security hazard, but it seems that at least
> >> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
> > 
> > Acked-by: Peter Zijlstra (Intel) 
> > 
> > Also, its corollary would also make sense/be required, use READ_ONCE()
> > when reading these.
> 
> I don’t know. This would obviously be much more intrusive. I can add a
> get_pte() and write a Coccinelle script to use it instead of reading the
> PTE, but in most cases, I presume, it would be an overkill.
> 
> The reason for that is that the PTEs are supposed to be accessed while
> holding the page-table lock, and the hardware can only change dirty & access
> bits. I think that any code that assumes that these bits do not change while
> holding the lock is already broken in more ways.

There are lockless readers, but I just checked, mm/gup.c already uses
READ_ONCE(), so that should be fine.


Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 07:58:40PM +, Nadav Amit wrote:
> > With that CR3 trickery, we can rid ourselves of the text_mutex
> > requirement, since concurrent text_poke is 'safe'. That would clean up
> > the kgdb code quite a bit.
> 
> I don’t know. I’m somewhat worried with multiple mechanisms potentially
> changing the same code at the same time - and maybe ending up with some
> mess.

kgdb only pokes INT3, that should be pretty safe.


Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 07:58:40PM +, Nadav Amit wrote:
> > With that CR3 trickery, we can rid ourselves of the text_mutex
> > requirement, since concurrent text_poke is 'safe'. That would clean up
> > the kgdb code quite a bit.
> 
> I don’t know. I’m somewhat worried with multiple mechanisms potentially
> changing the same code at the same time - and maybe ending up with some
> mess.

kgdb only pokes INT3, that should be pretty safe.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 02:47 PM, Sean Christopherson wrote:
...



Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
the sizeof operators are not allowed in '#if'. Anyway, I will try to see
if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
check.

-Brijesh


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 02:47 PM, Sean Christopherson wrote:
...



Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



The HVC_BOOT_ARRAY_SIZE macro uses sizeof(..) and to my understanding
the sizeof operators are not allowed in '#if'. Anyway, I will try to see
if it can be used, if not then I will stick to CONFIG_AMD_MEM_ENCRYPT
check.

-Brijesh


Re: [PATCH] arm64: dts: qcom: sdm845-mtp: pm8998 and pmi8998 regulators

2018-09-06 Thread Doug Anderson
Hi,

On Thu, Sep 6, 2018 at 1:12 PM, Bjorn Andersson
 wrote:
>> * Seems to have a few rails named differently LDO22 is named
>> "vreg_l22a_2p95" in your DTS but "vreg_l22a_2p85" in mine as one
>> example.  The schematics I have (from Dec 5, 2017) show it as 2p85.
>>
>
> Looks like a typo on my part, yours matches the schematics I have as
> well.

OK.  That was just one example.  Maybe do a diff of the two and see if
you find any where you believe your patch was right and mine was
wrong.


>> * Is lacking many alternate names for rails.  We can debate this also
>> if you want.
>>
>
> Afaict names such as "vdda_pcie_1p2" is the name of the pin on the
> SDM845, while the thing that comes out of the regulator is named
> vreg_l26a_1p2.
>
> So I believe this name should be used in the pcie node as:
>
>   vdda_pcie_1p2-supply = <_l26a_1p2>;

Adding labels for things like "vdda_pcie_1p2" is based on a suggestion
Stephen Boyd made to me that he felt it would be nice to be able to
refer to things by the name of the supply pin on the SoC.  This could
possibly enable better sharing of device tree snippets between
different boards using the same SoC and the idea seemed sane to me, so
I implemented it.  I described it in my commit message like this:

- Regulators that are hooked up to supply pins on the SoC are given
  an alias matching the name of that pin (pin name comes from the
  Qualcomm SoC "device specification" doc).

If folks don't like it, though, it's easy to remove.

>> * Have a few voltage values different.  If you have better info than
>> me we should update to yours.  Diffing against yours does make me
>> believe that perhaps LDO14 should be listed as 1.88 V in my patch.
>>
>
> Downstream it's listed as min: 1.8V max: 1.88V init: 1.8V.

I can change mine to that.  It's a very strange specification though.
Is it really 1.80 - 1.88 V?  Does something out there actually need
the voltage to be variable in this small range?


>> * Is lacking pm8005.
>>
>
> I didn't need this, yet...

OK.  I can certainly remove it as needed--the downstream patch I
started with had it so I figured I'd keep it in.


-Doug


Re: [PATCH] arm64: dts: qcom: sdm845-mtp: pm8998 and pmi8998 regulators

2018-09-06 Thread Doug Anderson
Hi,

On Thu, Sep 6, 2018 at 1:12 PM, Bjorn Andersson
 wrote:
>> * Seems to have a few rails named differently LDO22 is named
>> "vreg_l22a_2p95" in your DTS but "vreg_l22a_2p85" in mine as one
>> example.  The schematics I have (from Dec 5, 2017) show it as 2p85.
>>
>
> Looks like a typo on my part, yours matches the schematics I have as
> well.

OK.  That was just one example.  Maybe do a diff of the two and see if
you find any where you believe your patch was right and mine was
wrong.


>> * Is lacking many alternate names for rails.  We can debate this also
>> if you want.
>>
>
> Afaict names such as "vdda_pcie_1p2" is the name of the pin on the
> SDM845, while the thing that comes out of the regulator is named
> vreg_l26a_1p2.
>
> So I believe this name should be used in the pcie node as:
>
>   vdda_pcie_1p2-supply = <_l26a_1p2>;

Adding labels for things like "vdda_pcie_1p2" is based on a suggestion
Stephen Boyd made to me that he felt it would be nice to be able to
refer to things by the name of the supply pin on the SoC.  This could
possibly enable better sharing of device tree snippets between
different boards using the same SoC and the idea seemed sane to me, so
I implemented it.  I described it in my commit message like this:

- Regulators that are hooked up to supply pins on the SoC are given
  an alias matching the name of that pin (pin name comes from the
  Qualcomm SoC "device specification" doc).

If folks don't like it, though, it's easy to remove.

>> * Have a few voltage values different.  If you have better info than
>> me we should update to yours.  Diffing against yours does make me
>> believe that perhaps LDO14 should be listed as 1.88 V in my patch.
>>
>
> Downstream it's listed as min: 1.8V max: 1.88V init: 1.8V.

I can change mine to that.  It's a very strange specification though.
Is it really 1.80 - 1.88 V?  Does something out there actually need
the voltage to be variable in this small range?


>> * Is lacking pm8005.
>>
>
> I didn't need this, yet...

OK.  I can certainly remove it as needed--the downstream patch I
started with had it so I figured I'd keep it in.


-Doug


[PATCH] locking/rwsem: Make owner store task pointer of last owning reader

2018-09-06 Thread Waiman Long
Currently, when a reader acquires a lock, it only sets the
RWSEM_READER_OWNED bit in the owner field. The other bits are simply
not used. When debugging hanging cases involving rwsems and readers,
the owner value does not provide much useful information at all.

This patch modifies the current behavior to always store the task_struct
pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This
may be useful in debugging rwsem hanging cases especially if only one
reader is involved. However, the task in the owner field may not the
real owner or one of the real owners at all when the owner value is
examined, for example, in a crash dump. So it is just an additional
hint about the past history.

If CONFIG_DEBUG_RWSEMS is enabled, the owner field will be checked at
unlock time too to make sure the task pointer value is valid. That does
have a slight performance cost and so is only enabled as part of that
debug option.

>From the performance point of view, it is expected that the changes
shouldn't have any noticeable performance impact. A rwsem microbenchmark
(with 48 worker threads and 1:1 reader/writer ratio) was ran on a
2-socket 24-core 48-thread Haswell system.  The locking rates on a
4.19-rc1 based kernel were as follows:

  1) Unpatched kernel:  543.3 kops/s
  2) Patched kernel:549.2 kops/s
  3) Patched kernel (CONFIG_DEBUG_RWSEMS on):   546.6 kops/s

There was actually a slight increase in performance (1.1%) in this
particular case. Maybe it was caused by the elimination of a branch or
just a testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also
had less than the expected impact on performance.

The least significant 2 bits of the owner value are now used to designate
the rwsem is readers owned and the owners are anonymous.

Signed-off-by: Waiman Long 
---
 include/linux/rwsem.h   |  4 +-
 kernel/locking/rwsem-xadd.c |  2 +-
 kernel/locking/rwsem.c  |  7 ++--
 kernel/locking/rwsem.h  | 95 +
 4 files changed, 78 insertions(+), 30 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index ab93b6e..67dbb57 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -45,10 +45,10 @@ struct rw_semaphore {
 };
 
 /*
- * Setting bit 0 of the owner field with other non-zero bits will indicate
+ * Setting bit 1 of the owner field but not bit 0 will indicate
  * that the rwsem is writer-owned with an unknown owner.
  */
-#define RWSEM_OWNER_UNKNOWN((struct task_struct *)-1L)
+#define RWSEM_OWNER_UNKNOWN((struct task_struct *)-2L)
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_read_failed_killable(struct 
rw_semaphore *sem);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 01fcb80..09b1800 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -180,7 +180,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 * but it gives the spinners an early indication that the
 * readers now have the lock.
 */
-   rwsem_set_reader_owned(sem);
+   __rwsem_set_reader_owned(sem, waiter->task);
}
 
/*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 776308d..e586f0d 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -117,8 +117,9 @@ int down_write_trylock(struct rw_semaphore *sem)
 void up_read(struct rw_semaphore *sem)
 {
rwsem_release(>dep_map, 1, _RET_IP_);
-   DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_READER_OWNED);
+   DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED));
 
+   rwsem_clear_reader_owned(sem);
__up_read(sem);
 }
 
@@ -181,7 +182,7 @@ void down_read_non_owner(struct rw_semaphore *sem)
might_sleep();
 
__down_read(sem);
-   rwsem_set_reader_owned(sem);
+   __rwsem_set_reader_owned(sem, NULL);
 }
 
 EXPORT_SYMBOL(down_read_non_owner);
@@ -215,7 +216,7 @@ int __sched down_write_killable_nested(struct rw_semaphore 
*sem, int subclass)
 
 void up_read_non_owner(struct rw_semaphore *sem)
 {
-   DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_READER_OWNED);
+   DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED));
__up_read(sem);
 }
 
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index b9d0e72..bad2bca 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,24 +1,30 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READER_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
+ * The least significant 2 bits of the owner value has the following
+ * meanings when set.
+ *  - RWSEM_READER_OWNED 

[PATCH] locking/rwsem: Make owner store task pointer of last owning reader

2018-09-06 Thread Waiman Long
Currently, when a reader acquires a lock, it only sets the
RWSEM_READER_OWNED bit in the owner field. The other bits are simply
not used. When debugging hanging cases involving rwsems and readers,
the owner value does not provide much useful information at all.

This patch modifies the current behavior to always store the task_struct
pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This
may be useful in debugging rwsem hanging cases especially if only one
reader is involved. However, the task in the owner field may not the
real owner or one of the real owners at all when the owner value is
examined, for example, in a crash dump. So it is just an additional
hint about the past history.

If CONFIG_DEBUG_RWSEMS is enabled, the owner field will be checked at
unlock time too to make sure the task pointer value is valid. That does
have a slight performance cost and so is only enabled as part of that
debug option.

>From the performance point of view, it is expected that the changes
shouldn't have any noticeable performance impact. A rwsem microbenchmark
(with 48 worker threads and 1:1 reader/writer ratio) was ran on a
2-socket 24-core 48-thread Haswell system.  The locking rates on a
4.19-rc1 based kernel were as follows:

  1) Unpatched kernel:  543.3 kops/s
  2) Patched kernel:549.2 kops/s
  3) Patched kernel (CONFIG_DEBUG_RWSEMS on):   546.6 kops/s

There was actually a slight increase in performance (1.1%) in this
particular case. Maybe it was caused by the elimination of a branch or
just a testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also
had less than the expected impact on performance.

The least significant 2 bits of the owner value are now used to designate
the rwsem is readers owned and the owners are anonymous.

Signed-off-by: Waiman Long 
---
 include/linux/rwsem.h   |  4 +-
 kernel/locking/rwsem-xadd.c |  2 +-
 kernel/locking/rwsem.c  |  7 ++--
 kernel/locking/rwsem.h  | 95 +
 4 files changed, 78 insertions(+), 30 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index ab93b6e..67dbb57 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -45,10 +45,10 @@ struct rw_semaphore {
 };
 
 /*
- * Setting bit 0 of the owner field with other non-zero bits will indicate
+ * Setting bit 1 of the owner field but not bit 0 will indicate
  * that the rwsem is writer-owned with an unknown owner.
  */
-#define RWSEM_OWNER_UNKNOWN((struct task_struct *)-1L)
+#define RWSEM_OWNER_UNKNOWN((struct task_struct *)-2L)
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_read_failed_killable(struct 
rw_semaphore *sem);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 01fcb80..09b1800 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -180,7 +180,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 * but it gives the spinners an early indication that the
 * readers now have the lock.
 */
-   rwsem_set_reader_owned(sem);
+   __rwsem_set_reader_owned(sem, waiter->task);
}
 
/*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 776308d..e586f0d 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -117,8 +117,9 @@ int down_write_trylock(struct rw_semaphore *sem)
 void up_read(struct rw_semaphore *sem)
 {
rwsem_release(>dep_map, 1, _RET_IP_);
-   DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_READER_OWNED);
+   DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED));
 
+   rwsem_clear_reader_owned(sem);
__up_read(sem);
 }
 
@@ -181,7 +182,7 @@ void down_read_non_owner(struct rw_semaphore *sem)
might_sleep();
 
__down_read(sem);
-   rwsem_set_reader_owned(sem);
+   __rwsem_set_reader_owned(sem, NULL);
 }
 
 EXPORT_SYMBOL(down_read_non_owner);
@@ -215,7 +216,7 @@ int __sched down_write_killable_nested(struct rw_semaphore 
*sem, int subclass)
 
 void up_read_non_owner(struct rw_semaphore *sem)
 {
-   DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_READER_OWNED);
+   DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED));
__up_read(sem);
 }
 
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index b9d0e72..bad2bca 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,24 +1,30 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READER_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
+ * The least significant 2 bits of the owner value has the following
+ * meanings when set.
+ *  - RWSEM_READER_OWNED 

Re: [PATCH v10 2/2] leds: sc27xx: Add pattern_set/clear interfaces for LED controller

2018-09-06 Thread Jacek Anaszewski
Hi Baolin,

Thank you for the patch. I really appreciate your effort.

I see one more thing I forgot to mention in the previous
review. Please take a look at my comment to pattern_set().

On 09/06/2018 04:37 AM, Baolin Wang wrote:
> This patch implements the 'pattern_set'and 'pattern_clear'
> interfaces to support SC27XX LED breathing mode.
> 
> Signed-off-by: Baolin Wang 
> ---
> Changes from v9:
>  - Optimize the ABI documentation file.
>  - Update the brightness value in hardware pattern mode.
> 
> Changes from v8:
>  - Optimize the ABI documentation file.
> 
> Changes from v7:
>  - Add its own ABI documentation file.
> 
> Changes from v6:
>  - None.
> 
> Changes from v5:
>  - None.
> 
> Changes from v4:
>  - None.
> 
> Changes from v3:
>  - None.
> 
> Changes from v2:
>  - None.
> 
> Changes from v1:
>  - Remove pattern_get interface.
> ---
>  .../ABI/testing/sysfs-class-led-driver-sc27xx  |   22 +
>  drivers/leds/leds-sc27xx-bltc.c|  103 
> 
>  2 files changed, 125 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx 
> b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
> new file mode 100644
> index 000..d8056d5
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
> @@ -0,0 +1,22 @@
> +What:/sys/class/leds//hw_pattern
> +Date:September 2018
> +KernelVersion:   4.20
> +Description:
> + Specify a hardware pattern for the SC27XX LED. For the SC27XX
> + LED controller, it only supports 4 stages to make a single
> + hardware pattern, which is used to configure the rise time,
> + high time, fall time and low time for the breathing mode.
> +
> + For the breathing mode, the SC27XX LED only expects one 
> brightness
> + for the high stage. To be compatible with the hardware pattern
> + format, we should set brightness as 0 for rise stage, fall
> + stage and low stage.
> +
> + Min stage duration: 125 ms
> + Max stage duration: 31875 ms
> +
> + Since the stage duration step is 125 ms, the duration must be
> + a multiplier of 125, like 125ms, 250ms, 375ms, 500ms ... 
> 31875ms.
> +
> + Thus the format of the hardware pattern values should be:
> + "0 rise_duration brightness high_duration 0 fall_duration 0 
> low_duration".
> diff --git a/drivers/leds/leds-sc27xx-bltc.c b/drivers/leds/leds-sc27xx-bltc.c
> index 9d9b7aa..558a325 100644
> --- a/drivers/leds/leds-sc27xx-bltc.c
> +++ b/drivers/leds/leds-sc27xx-bltc.c
> @@ -32,8 +32,15 @@
>  #define SC27XX_DUTY_MASK GENMASK(15, 0)
>  #define SC27XX_MOD_MASK  GENMASK(7, 0)
>  
> +#define SC27XX_CURVE_SHIFT   8
> +#define SC27XX_CURVE_L_MASK  GENMASK(7, 0)
> +#define SC27XX_CURVE_H_MASK  GENMASK(15, 8)
> +
>  #define SC27XX_LEDS_OFFSET   0x10
>  #define SC27XX_LEDS_MAX  3
> +#define SC27XX_LEDS_PATTERN_CNT  4
> +/* Stage duration step, in milliseconds */
> +#define SC27XX_LEDS_STEP 125
>  
>  struct sc27xx_led {
>   char name[LED_MAX_NAME_SIZE];
> @@ -122,6 +129,98 @@ static int sc27xx_led_set(struct led_classdev *ldev, 
> enum led_brightness value)
>   return err;
>  }
>  
> +static int sc27xx_led_pattern_clear(struct led_classdev *ldev)
> +{
> + struct sc27xx_led *leds = to_sc27xx_led(ldev);
> + struct regmap *regmap = leds->priv->regmap;
> + u32 base = sc27xx_led_get_offset(leds);
> + u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
> + u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
> + int err;
> +
> + mutex_lock(>priv->lock);
> +
> + /* Reset the rise, high, fall and low time to zero. */
> + regmap_write(regmap, base + SC27XX_LEDS_CURVE0, 0);
> + regmap_write(regmap, base + SC27XX_LEDS_CURVE1, 0);
> +
> + err = regmap_update_bits(regmap, ctrl_base,
> + (SC27XX_LED_RUN | SC27XX_LED_TYPE) << ctrl_shift, 0);
> +
> + ldev->brightness = LED_OFF;
> +
> + mutex_unlock(>priv->lock);
> +
> + return err;
> +}
> +
> +static int sc27xx_led_pattern_set(struct led_classdev *ldev,
> +   struct led_pattern *pattern,
> +   int len, u32 repeat)
> +{
> + struct sc27xx_led *leds = to_sc27xx_led(ldev);
> + u32 base = sc27xx_led_get_offset(leds);
> + u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
> + u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
> + struct regmap *regmap = leds->priv->regmap;
> + int err;
> +
> + /*
> +  * Must contain 4 tuples to configure the rise time, high time, fall
> +  * time and low time to enable the breathing mode.
> +  */
> + if (len != SC27XX_LEDS_PATTERN_CNT)
> + return -EINVAL;
> +
> + mutex_lock(>priv->lock);


Re: [PATCH v10 2/2] leds: sc27xx: Add pattern_set/clear interfaces for LED controller

2018-09-06 Thread Jacek Anaszewski
Hi Baolin,

Thank you for the patch. I really appreciate your effort.

I see one more thing I forgot to mention in the previous
review. Please take a look at my comment to pattern_set().

On 09/06/2018 04:37 AM, Baolin Wang wrote:
> This patch implements the 'pattern_set'and 'pattern_clear'
> interfaces to support SC27XX LED breathing mode.
> 
> Signed-off-by: Baolin Wang 
> ---
> Changes from v9:
>  - Optimize the ABI documentation file.
>  - Update the brightness value in hardware pattern mode.
> 
> Changes from v8:
>  - Optimize the ABI documentation file.
> 
> Changes from v7:
>  - Add its own ABI documentation file.
> 
> Changes from v6:
>  - None.
> 
> Changes from v5:
>  - None.
> 
> Changes from v4:
>  - None.
> 
> Changes from v3:
>  - None.
> 
> Changes from v2:
>  - None.
> 
> Changes from v1:
>  - Remove pattern_get interface.
> ---
>  .../ABI/testing/sysfs-class-led-driver-sc27xx  |   22 +
>  drivers/leds/leds-sc27xx-bltc.c|  103 
> 
>  2 files changed, 125 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx 
> b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
> new file mode 100644
> index 000..d8056d5
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
> @@ -0,0 +1,22 @@
> +What:/sys/class/leds//hw_pattern
> +Date:September 2018
> +KernelVersion:   4.20
> +Description:
> + Specify a hardware pattern for the SC27XX LED. For the SC27XX
> + LED controller, it only supports 4 stages to make a single
> + hardware pattern, which is used to configure the rise time,
> + high time, fall time and low time for the breathing mode.
> +
> + For the breathing mode, the SC27XX LED only expects one 
> brightness
> + for the high stage. To be compatible with the hardware pattern
> + format, we should set brightness as 0 for rise stage, fall
> + stage and low stage.
> +
> + Min stage duration: 125 ms
> + Max stage duration: 31875 ms
> +
> + Since the stage duration step is 125 ms, the duration must be
> + a multiplier of 125, like 125ms, 250ms, 375ms, 500ms ... 
> 31875ms.
> +
> + Thus the format of the hardware pattern values should be:
> + "0 rise_duration brightness high_duration 0 fall_duration 0 
> low_duration".
> diff --git a/drivers/leds/leds-sc27xx-bltc.c b/drivers/leds/leds-sc27xx-bltc.c
> index 9d9b7aa..558a325 100644
> --- a/drivers/leds/leds-sc27xx-bltc.c
> +++ b/drivers/leds/leds-sc27xx-bltc.c
> @@ -32,8 +32,15 @@
>  #define SC27XX_DUTY_MASK GENMASK(15, 0)
>  #define SC27XX_MOD_MASK  GENMASK(7, 0)
>  
> +#define SC27XX_CURVE_SHIFT   8
> +#define SC27XX_CURVE_L_MASK  GENMASK(7, 0)
> +#define SC27XX_CURVE_H_MASK  GENMASK(15, 8)
> +
>  #define SC27XX_LEDS_OFFSET   0x10
>  #define SC27XX_LEDS_MAX  3
> +#define SC27XX_LEDS_PATTERN_CNT  4
> +/* Stage duration step, in milliseconds */
> +#define SC27XX_LEDS_STEP 125
>  
>  struct sc27xx_led {
>   char name[LED_MAX_NAME_SIZE];
> @@ -122,6 +129,98 @@ static int sc27xx_led_set(struct led_classdev *ldev, 
> enum led_brightness value)
>   return err;
>  }
>  
> +static int sc27xx_led_pattern_clear(struct led_classdev *ldev)
> +{
> + struct sc27xx_led *leds = to_sc27xx_led(ldev);
> + struct regmap *regmap = leds->priv->regmap;
> + u32 base = sc27xx_led_get_offset(leds);
> + u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
> + u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
> + int err;
> +
> + mutex_lock(>priv->lock);
> +
> + /* Reset the rise, high, fall and low time to zero. */
> + regmap_write(regmap, base + SC27XX_LEDS_CURVE0, 0);
> + regmap_write(regmap, base + SC27XX_LEDS_CURVE1, 0);
> +
> + err = regmap_update_bits(regmap, ctrl_base,
> + (SC27XX_LED_RUN | SC27XX_LED_TYPE) << ctrl_shift, 0);
> +
> + ldev->brightness = LED_OFF;
> +
> + mutex_unlock(>priv->lock);
> +
> + return err;
> +}
> +
> +static int sc27xx_led_pattern_set(struct led_classdev *ldev,
> +   struct led_pattern *pattern,
> +   int len, u32 repeat)
> +{
> + struct sc27xx_led *leds = to_sc27xx_led(ldev);
> + u32 base = sc27xx_led_get_offset(leds);
> + u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
> + u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
> + struct regmap *regmap = leds->priv->regmap;
> + int err;
> +
> + /*
> +  * Must contain 4 tuples to configure the rise time, high time, fall
> +  * time and low time to enable the breathing mode.
> +  */
> + if (len != SC27XX_LEDS_PATTERN_CNT)
> + return -EINVAL;
> +
> + mutex_lock(>priv->lock);


Re: [PATCH 1/2] mtd: nand: reorder nand manufacturer ids

2018-09-06 Thread Miquel Raynal
Hi Marcel,

Marcel Ziswiler  wrote on Thu,  6 Sep 2018
10:49:21 +0200:

> From: Marcel Ziswiler 
> 
> Reorder NAND manufacturer ids for clarity.
> 
> Signed-off-by: Marcel Ziswiler 
> 
> ---

Both patches applied to nand/next with subject prefix s/nand:/rawnand:/
as well as upper-case: s/nand/NAND/ and s/id/ID/ in the commit log.

Thanks,
Miquèl


Re: [PATCH 1/2] mtd: nand: reorder nand manufacturer ids

2018-09-06 Thread Miquel Raynal
Hi Marcel,

Marcel Ziswiler  wrote on Thu,  6 Sep 2018
10:49:21 +0200:

> From: Marcel Ziswiler 
> 
> Reorder NAND manufacturer ids for clarity.
> 
> Signed-off-by: Marcel Ziswiler 
> 
> ---

Both patches applied to nand/next with subject prefix s/nand:/rawnand:/
as well as upper-case: s/nand/NAND/ and s/id/ID/ in the commit log.

Thanks,
Miquèl


Re: [PATCH] x86: use WRITE_ONCE() when setting PTEs

2018-09-06 Thread Nadav Amit
at 12:57 PM, Peter Zijlstra  wrote:

> On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
>> When page-table entries are set, the compiler might optimize their
>> assignment by using multiple instructions to set the PTE. This might
>> turn into a security hazard if the user somehow manages to use the
>> interim PTE. L1TF does not make our lives easier, making even an interim
>> non-present PTE a security hazard.
>> 
>> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
>> security hazard.
>> 
>> I skimmed the differences in the binary with and without this patch. The
>> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
>> code optimizations are possible. For better and worse, the impact on the
>> binary with this patch is pretty small. Skimming the code did not cause
>> anything to jump out as a security hazard, but it seems that at least
>> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
> 
> Acked-by: Peter Zijlstra (Intel) 
> 
> Also, its corollary would also make sense/be required, use READ_ONCE()
> when reading these.

I don’t know. This would obviously be much more intrusive. I can add a
get_pte() and write a Coccinelle script to use it instead of reading the
PTE, but in most cases, I presume, it would be an overkill.

The reason for that is that the PTEs are supposed to be accessed while
holding the page-table lock, and the hardware can only change dirty & access
bits. I think that any code that assumes that these bits do not change while
holding the lock is already broken in more ways.



Re: [PATCH] x86: use WRITE_ONCE() when setting PTEs

2018-09-06 Thread Nadav Amit
at 12:57 PM, Peter Zijlstra  wrote:

> On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
>> When page-table entries are set, the compiler might optimize their
>> assignment by using multiple instructions to set the PTE. This might
>> turn into a security hazard if the user somehow manages to use the
>> interim PTE. L1TF does not make our lives easier, making even an interim
>> non-present PTE a security hazard.
>> 
>> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
>> security hazard.
>> 
>> I skimmed the differences in the binary with and without this patch. The
>> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
>> code optimizations are possible. For better and worse, the impact on the
>> binary with this patch is pretty small. Skimming the code did not cause
>> anything to jump out as a security hazard, but it seems that at least
>> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
> 
> Acked-by: Peter Zijlstra (Intel) 
> 
> Also, its corollary would also make sense/be required, use READ_ONCE()
> when reading these.

I don’t know. This would obviously be much more intrusive. I can add a
get_pte() and write a Coccinelle script to use it instead of reading the
PTE, but in most cases, I presume, it would be an overkill.

The reason for that is that the PTEs are supposed to be accessed while
holding the page-table lock, and the hardware can only change dirty & access
bits. I think that any code that assumes that these bits do not change while
holding the lock is already broken in more ways.



Re: [PATCH] arm64: dts: qcom: sdm845-mtp: pm8998 and pmi8998 regulators

2018-09-06 Thread Bjorn Andersson
On Thu 06 Sep 10:52 PDT 2018, Doug Anderson wrote:

> Bjorn,
> 
> On Sat, Sep 1, 2018 at 3:19 PM, Bjorn Andersson
>  wrote:
> > Add regulator definitions for pm8998 and pmi8998 regulators on the MTP.
> >
> > Signed-off-by: Bjorn Andersson 
> > ---
> >  arch/arm64/boot/dts/qcom/sdm845-mtp.dts | 216 
> >  1 file changed, 216 insertions(+)
> 
> I'm curious why you chose to post this instead of reviewing and/or
> building upon the patch I sent up at
> .

Sorry about that, my search skills failed me.

> Compared to mine, yours:
> 
> * Seems to have a few rails named differently LDO22 is named
> "vreg_l22a_2p95" in your DTS but "vreg_l22a_2p85" in mine as one
> example.  The schematics I have (from Dec 5, 2017) show it as 2p85.
> 

Looks like a typo on my part, yours matches the schematics I have as
well.

> * Is missing "regulator-initial-mode".  We can debate this if you want.
> 

Either we start by making them all HPM or we tune as we debug, I'm fine
with your suggestion.

> * Is lacking many alternate names for rails.  We can debate this also
> if you want.
> 

Afaict names such as "vdda_pcie_1p2" is the name of the pin on the
SDM845, while the thing that comes out of the regulator is named
vreg_l26a_1p2.

So I believe this name should be used in the pcie node as:

  vdda_pcie_1p2-supply = <_l26a_1p2>;

> * Have a few voltage values different.  If you have better info than
> me we should update to yours.  Diffing against yours does make me
> believe that perhaps LDO14 should be listed as 1.88 V in my patch.
> 

Downstream it's listed as min: 1.8V max: 1.88V init: 1.8V.

> * Is lacking pm8005.
> 

I didn't need this, yet...

> Anyway, let me know.  If you provide a review of my patch I'm happy to
> spin it with your feedback.
> 

I will review your patches.

Regards,
Bjorn


Re: [PATCH] arm64: dts: qcom: sdm845-mtp: pm8998 and pmi8998 regulators

2018-09-06 Thread Bjorn Andersson
On Thu 06 Sep 10:52 PDT 2018, Doug Anderson wrote:

> Bjorn,
> 
> On Sat, Sep 1, 2018 at 3:19 PM, Bjorn Andersson
>  wrote:
> > Add regulator definitions for pm8998 and pmi8998 regulators on the MTP.
> >
> > Signed-off-by: Bjorn Andersson 
> > ---
> >  arch/arm64/boot/dts/qcom/sdm845-mtp.dts | 216 
> >  1 file changed, 216 insertions(+)
> 
> I'm curious why you chose to post this instead of reviewing and/or
> building upon the patch I sent up at
> .

Sorry about that, my search skills failed me.

> Compared to mine, yours:
> 
> * Seems to have a few rails named differently LDO22 is named
> "vreg_l22a_2p95" in your DTS but "vreg_l22a_2p85" in mine as one
> example.  The schematics I have (from Dec 5, 2017) show it as 2p85.
> 

Looks like a typo on my part, yours matches the schematics I have as
well.

> * Is missing "regulator-initial-mode".  We can debate this if you want.
> 

Either we start by making them all HPM or we tune as we debug, I'm fine
with your suggestion.

> * Is lacking many alternate names for rails.  We can debate this also
> if you want.
> 

Afaict names such as "vdda_pcie_1p2" is the name of the pin on the
SDM845, while the thing that comes out of the regulator is named
vreg_l26a_1p2.

So I believe this name should be used in the pcie node as:

  vdda_pcie_1p2-supply = <_l26a_1p2>;

> * Have a few voltage values different.  If you have better info than
> me we should update to yours.  Diffing against yours does make me
> believe that perhaps LDO14 should be listed as 1.88 V in my patch.
> 

Downstream it's listed as min: 1.8V max: 1.88V init: 1.8V.

> * Is lacking pm8005.
> 

I didn't need this, yet...

> Anyway, let me know.  If you provide a review of my patch I'm happy to
> spin it with your feedback.
> 

I will review your patches.

Regards,
Bjorn


RE: [PATCH v3] ata: Disable AHCI ALPM feature for Ampere Computing eMAG SATA

2018-09-06 Thread Suman Tripathi
Hi Hans,


With regards,
Suman



-Original Message-
From: Hans de Goede  
Sent: Thursday, September 6, 2018 1:06 PM
To: Suman Tripathi ; ax...@kernel.dk; 
t...@kernel.org; linux-...@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org; 
j...@perches.com; a...@arndb.de; gre...@linuxfoundation.org
Cc: Open Source Submission ; Rameshwar Sahu 

Subject: Re: [PATCH v3] ata: Disable AHCI ALPM feature for Ampere Computing 
eMAG SATA

[NOTICE: This email originated from an external sender. Please be mindful of 
safe email handling and proprietary information protection practices.] 


Hi,

On 06-09-18 20:05, Suman Tripathi wrote:
> Due to hardware errata, Ampere Computing eMAG SATA can't support AHCI 
> ALPM feature. This patch disables the AHCI ALPM feature for eMAG SATA.
>
> Signed-off-by: Suman Trpathi 
> Signed-off-by: Rameshwar Prasad Sahu 
> 

Sorry, but the patch is still coming through mangled. Did you use git 
send-email ?
[Suman Tripathi] yes. I rechecked before sending twice. Sorry for the 
inconvenience. Will figure the issue out

Regards,

Hans




>
> ---
> Changes for v3:
>
> * Fix the indentation and whitespace warnings.
>
> Changes for v2:
>
> * Introduce the new ata_port_info object which includes ATA_FLAG_NO_LPM.
> * Include this object for eMAG SATA inside the acpi match table.
> * Retrieve the ata_port_info from the acpi match table.
>
> ---
>   drivers/ata/ahci_platform.c | 15 ++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/ata/ahci_platform.c b/drivers/ata/ahci_platform.c 
> index 99f9a89..9ba283f 100644
> --- a/drivers/ata/ahci_platform.c
> +++ b/drivers/ata/ahci_platform.c
> @@ -33,6 +33,13 @@ static const struct ata_port_info ahci_port_info = {
>   .port_ops   = _platform_ops,
>   };
>
> +static const struct ata_port_info ahci_port_info_nolpm = {
> + .flags  = AHCI_FLAG_COMMON | ATA_FLAG_NO_LPM,
> + .pio_mask   = ATA_PIO4,
> + .udma_mask  = ATA_UDMA6,
> + .port_ops   = _platform_ops,
> +};
> +
>   static struct scsi_host_template ahci_platform_sht = {
>   AHCI_SHT(DRV_NAME),
>   };
> @@ -41,6 +48,7 @@ static int ahci_probe(struct platform_device *pdev)
>   {
>   struct device *dev = >dev;
>   struct ahci_host_priv *hpriv;
> + const struct ata_port_info *port;
>   int rc;
>
>   hpriv = ahci_platform_get_resources(pdev);
> @@ -57,7 +65,11 @@ static int ahci_probe(struct platform_device *pdev)
>   if (of_device_is_compatible(dev->of_node, "hisilicon,hisi-ahci"))
>   hpriv->flags |= AHCI_HFLAG_NO_FBS | AHCI_HFLAG_NO_NCQ;
>
> - rc = ahci_platform_init_host(pdev, hpriv, _port_info,
> + port = acpi_device_get_match_data(dev);
> + if (!port)
> + port = _port_info;
> +
> + rc = ahci_platform_init_host(pdev, hpriv, port,
>_platform_sht);
>   if (rc)
>   goto disable_resources;
> @@ -85,6 +97,7 @@ static const struct of_device_id ahci_of_match[] = {
>   MODULE_DEVICE_TABLE(of, ahci_of_match);
>
>   static const struct acpi_device_id ahci_acpi_match[] = {
> + { "APMC0D33", (unsigned long)_port_info_nolpm },
>   { ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xff) },
>   {},
>   };
> --
> 2.7.4
>


RE: [PATCH v3] ata: Disable AHCI ALPM feature for Ampere Computing eMAG SATA

2018-09-06 Thread Suman Tripathi
Hi Hans,


With regards,
Suman



-Original Message-
From: Hans de Goede  
Sent: Thursday, September 6, 2018 1:06 PM
To: Suman Tripathi ; ax...@kernel.dk; 
t...@kernel.org; linux-...@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org; 
j...@perches.com; a...@arndb.de; gre...@linuxfoundation.org
Cc: Open Source Submission ; Rameshwar Sahu 

Subject: Re: [PATCH v3] ata: Disable AHCI ALPM feature for Ampere Computing 
eMAG SATA

[NOTICE: This email originated from an external sender. Please be mindful of 
safe email handling and proprietary information protection practices.] 


Hi,

On 06-09-18 20:05, Suman Tripathi wrote:
> Due to hardware errata, Ampere Computing eMAG SATA can't support AHCI 
> ALPM feature. This patch disables the AHCI ALPM feature for eMAG SATA.
>
> Signed-off-by: Suman Trpathi 
> Signed-off-by: Rameshwar Prasad Sahu 
> 

Sorry, but the patch is still coming through mangled. Did you use git 
send-email ?
[Suman Tripathi] yes. I rechecked before sending twice. Sorry for the 
inconvenience. Will figure the issue out

Regards,

Hans




>
> ---
> Changes for v3:
>
> * Fix the indentation and whitespace warnings.
>
> Changes for v2:
>
> * Introduce the new ata_port_info object which includes ATA_FLAG_NO_LPM.
> * Include this object for eMAG SATA inside the acpi match table.
> * Retrieve the ata_port_info from the acpi match table.
>
> ---
>   drivers/ata/ahci_platform.c | 15 ++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/ata/ahci_platform.c b/drivers/ata/ahci_platform.c 
> index 99f9a89..9ba283f 100644
> --- a/drivers/ata/ahci_platform.c
> +++ b/drivers/ata/ahci_platform.c
> @@ -33,6 +33,13 @@ static const struct ata_port_info ahci_port_info = {
>   .port_ops   = _platform_ops,
>   };
>
> +static const struct ata_port_info ahci_port_info_nolpm = {
> + .flags  = AHCI_FLAG_COMMON | ATA_FLAG_NO_LPM,
> + .pio_mask   = ATA_PIO4,
> + .udma_mask  = ATA_UDMA6,
> + .port_ops   = _platform_ops,
> +};
> +
>   static struct scsi_host_template ahci_platform_sht = {
>   AHCI_SHT(DRV_NAME),
>   };
> @@ -41,6 +48,7 @@ static int ahci_probe(struct platform_device *pdev)
>   {
>   struct device *dev = >dev;
>   struct ahci_host_priv *hpriv;
> + const struct ata_port_info *port;
>   int rc;
>
>   hpriv = ahci_platform_get_resources(pdev);
> @@ -57,7 +65,11 @@ static int ahci_probe(struct platform_device *pdev)
>   if (of_device_is_compatible(dev->of_node, "hisilicon,hisi-ahci"))
>   hpriv->flags |= AHCI_HFLAG_NO_FBS | AHCI_HFLAG_NO_NCQ;
>
> - rc = ahci_platform_init_host(pdev, hpriv, _port_info,
> + port = acpi_device_get_match_data(dev);
> + if (!port)
> + port = _port_info;
> +
> + rc = ahci_platform_init_host(pdev, hpriv, port,
>_platform_sht);
>   if (rc)
>   goto disable_resources;
> @@ -85,6 +97,7 @@ static const struct of_device_id ahci_of_match[] = {
>   MODULE_DEVICE_TABLE(of, ahci_of_match);
>
>   static const struct acpi_device_id ahci_acpi_match[] = {
> + { "APMC0D33", (unsigned long)_port_info_nolpm },
>   { ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xff) },
>   {},
>   };
> --
> 2.7.4
>


Re: [PATCH v3] ata: Disable AHCI ALPM feature for Ampere Computing eMAG SATA

2018-09-06 Thread Hans de Goede

Hi,

On 06-09-18 20:05, Suman Tripathi wrote:

Due to hardware errata, Ampere Computing eMAG SATA can't support
AHCI ALPM feature. This patch disables the AHCI ALPM feature for
eMAG SATA.

Signed-off-by: Suman Trpathi 
Signed-off-by: Rameshwar Prasad Sahu 


Sorry, but the patch is still coming through mangled. Did you use
git send-email ?

Regards,

Hans






---
Changes for v3:

* Fix the indentation and whitespace warnings.

Changes for v2:

* Introduce the new ata_port_info object which includes ATA_FLAG_NO_LPM.
* Include this object for eMAG SATA inside the acpi match table.
* Retrieve the ata_port_info from the acpi match table.

---
  drivers/ata/ahci_platform.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/ahci_platform.c b/drivers/ata/ahci_platform.c
index 99f9a89..9ba283f 100644
--- a/drivers/ata/ahci_platform.c
+++ b/drivers/ata/ahci_platform.c
@@ -33,6 +33,13 @@ static const struct ata_port_info ahci_port_info = {
.port_ops   = _platform_ops,
  };

+static const struct ata_port_info ahci_port_info_nolpm = {
+   .flags  = AHCI_FLAG_COMMON | ATA_FLAG_NO_LPM,
+   .pio_mask   = ATA_PIO4,
+   .udma_mask  = ATA_UDMA6,
+   .port_ops   = _platform_ops,
+};
+
  static struct scsi_host_template ahci_platform_sht = {
AHCI_SHT(DRV_NAME),
  };
@@ -41,6 +48,7 @@ static int ahci_probe(struct platform_device *pdev)
  {
struct device *dev = >dev;
struct ahci_host_priv *hpriv;
+   const struct ata_port_info *port;
int rc;

hpriv = ahci_platform_get_resources(pdev);
@@ -57,7 +65,11 @@ static int ahci_probe(struct platform_device *pdev)
if (of_device_is_compatible(dev->of_node, "hisilicon,hisi-ahci"))
hpriv->flags |= AHCI_HFLAG_NO_FBS | AHCI_HFLAG_NO_NCQ;

-   rc = ahci_platform_init_host(pdev, hpriv, _port_info,
+   port = acpi_device_get_match_data(dev);
+   if (!port)
+   port = _port_info;
+
+   rc = ahci_platform_init_host(pdev, hpriv, port,
 _platform_sht);
if (rc)
goto disable_resources;
@@ -85,6 +97,7 @@ static const struct of_device_id ahci_of_match[] = {
  MODULE_DEVICE_TABLE(of, ahci_of_match);

  static const struct acpi_device_id ahci_acpi_match[] = {
+   { "APMC0D33", (unsigned long)_port_info_nolpm },
{ ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xff) },
{},
  };
--
2.7.4



Re: [PATCH v3] ata: Disable AHCI ALPM feature for Ampere Computing eMAG SATA

2018-09-06 Thread Hans de Goede

Hi,

On 06-09-18 20:05, Suman Tripathi wrote:

Due to hardware errata, Ampere Computing eMAG SATA can't support
AHCI ALPM feature. This patch disables the AHCI ALPM feature for
eMAG SATA.

Signed-off-by: Suman Trpathi 
Signed-off-by: Rameshwar Prasad Sahu 


Sorry, but the patch is still coming through mangled. Did you use
git send-email ?

Regards,

Hans






---
Changes for v3:

* Fix the indentation and whitespace warnings.

Changes for v2:

* Introduce the new ata_port_info object which includes ATA_FLAG_NO_LPM.
* Include this object for eMAG SATA inside the acpi match table.
* Retrieve the ata_port_info from the acpi match table.

---
  drivers/ata/ahci_platform.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/ahci_platform.c b/drivers/ata/ahci_platform.c
index 99f9a89..9ba283f 100644
--- a/drivers/ata/ahci_platform.c
+++ b/drivers/ata/ahci_platform.c
@@ -33,6 +33,13 @@ static const struct ata_port_info ahci_port_info = {
.port_ops   = _platform_ops,
  };

+static const struct ata_port_info ahci_port_info_nolpm = {
+   .flags  = AHCI_FLAG_COMMON | ATA_FLAG_NO_LPM,
+   .pio_mask   = ATA_PIO4,
+   .udma_mask  = ATA_UDMA6,
+   .port_ops   = _platform_ops,
+};
+
  static struct scsi_host_template ahci_platform_sht = {
AHCI_SHT(DRV_NAME),
  };
@@ -41,6 +48,7 @@ static int ahci_probe(struct platform_device *pdev)
  {
struct device *dev = >dev;
struct ahci_host_priv *hpriv;
+   const struct ata_port_info *port;
int rc;

hpriv = ahci_platform_get_resources(pdev);
@@ -57,7 +65,11 @@ static int ahci_probe(struct platform_device *pdev)
if (of_device_is_compatible(dev->of_node, "hisilicon,hisi-ahci"))
hpriv->flags |= AHCI_HFLAG_NO_FBS | AHCI_HFLAG_NO_NCQ;

-   rc = ahci_platform_init_host(pdev, hpriv, _port_info,
+   port = acpi_device_get_match_data(dev);
+   if (!port)
+   port = _port_info;
+
+   rc = ahci_platform_init_host(pdev, hpriv, port,
 _platform_sht);
if (rc)
goto disable_resources;
@@ -85,6 +97,7 @@ static const struct of_device_id ahci_of_match[] = {
  MODULE_DEVICE_TABLE(of, ahci_of_match);

  static const struct acpi_device_id ahci_acpi_match[] = {
+   { "APMC0D33", (unsigned long)_port_info_nolpm },
{ ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xff) },
{},
  };
--
2.7.4



Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Reinette Chatre
Hi Peter,

On 9/6/2018 12:44 PM, Peter Zijlstra wrote:
> On Thu, Sep 06, 2018 at 12:21:59PM -0700, Reinette Chatre wrote:
>> If you do have suggestions on how I can improve the implementation while
>> maintaining (or improving) the accuracy of the measurements I would
>> greatly appreciate it.
> 
> You can reduce the code duplication by using __always_inline functions
> with const function arguments.
> 

Could you please elaborate? I am unable to see how that would help in,
for example:

  if (need_l2) {
   rdpmcl(l2_hit_pmcnum, l2_hits_after);
   rdpmcl(l2_miss_pmcnum, l2_miss_after);
  }
  if (need_l3) {
   rdpmcl(l3_hit_pmcnum, l3_hits_after);
   rdpmcl(l3_miss_pmcnum, l3_miss_after);
  }

Two issues with the above are:
- the measurements captured in l2_hits_after and l2_miss_after would
include the hits/misses associated with the "if (need_l2)" test
- the measurements captured in l3_hits_after and l3_miss_after would
include the hits/misses associated with both the "if (need_l2)" and "if
(need_l3)" tests.

When I separate the above into the two functions it just becomes either:
   rdpmcl(l2_hit_pmcnum, l2_hits_after);
   rdpmcl(l2_miss_pmcnum, l2_miss_after);
or:
   rdpmcl(l3_hit_pmcnum, l3_hits_after);
   rdpmcl(l3_miss_pmcnum, l3_miss_after);

Reinette


Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Reinette Chatre
Hi Peter,

On 9/6/2018 12:44 PM, Peter Zijlstra wrote:
> On Thu, Sep 06, 2018 at 12:21:59PM -0700, Reinette Chatre wrote:
>> If you do have suggestions on how I can improve the implementation while
>> maintaining (or improving) the accuracy of the measurements I would
>> greatly appreciate it.
> 
> You can reduce the code duplication by using __always_inline functions
> with const function arguments.
> 

Could you please elaborate? I am unable to see how that would help in,
for example:

  if (need_l2) {
   rdpmcl(l2_hit_pmcnum, l2_hits_after);
   rdpmcl(l2_miss_pmcnum, l2_miss_after);
  }
  if (need_l3) {
   rdpmcl(l3_hit_pmcnum, l3_hits_after);
   rdpmcl(l3_miss_pmcnum, l3_miss_after);
  }

Two issues with the above are:
- the measurements captured in l2_hits_after and l2_miss_after would
include the hits/misses associated with the "if (need_l2)" test
- the measurements captured in l3_hits_after and l3_miss_after would
include the hits/misses associated with both the "if (need_l2)" and "if
(need_l3)" tests.

When I separate the above into the two functions it just becomes either:
   rdpmcl(l2_hit_pmcnum, l2_hits_after);
   rdpmcl(l2_miss_pmcnum, l2_miss_after);
or:
   rdpmcl(l3_hit_pmcnum, l3_hits_after);
   rdpmcl(l3_miss_pmcnum, l3_miss_after);

Reinette


Re: [PATCH v2 7/9] net: stmmac: dwmac-sun8i: fix OF child-node lookup

2018-09-06 Thread Corentin Labbe
On Mon, Aug 27, 2018 at 10:21:51AM +0200, Johan Hovold wrote:
> Use the new of_get_compatible_child() helper to lookup the mdio-internal
> child node instead of using of_find_compatible_node(), which searches
> the entire tree from a given start node and thus can return an unrelated
> (i.e. non-child) node.
> 
> This also addresses a potential use-after-free (e.g. after probe
> deferral) as the tree-wide helper drops a reference to its first
> argument (i.e. the mdio-mux node). Fortunately, this was inadvertently
> balanced by a failure to drop the mdio-mux reference after lookup.
> 
> While at it, also fix the related mdio-internal- and phy-node reference
> leaks.
> 
> Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external 
> MDIOs")
> Cc: Corentin Labbe 
> Cc: Andrew Lunn 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> Cc: Jose Abreu 
> Cc: David S. Miller 
> Signed-off-by: Johan Hovold 
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c 
> b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> index f9a61f90cfbc..0f660af01a4b 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> @@ -714,8 +714,9 @@ static int get_ephy_nodes(struct stmmac_priv *priv)
>   return -ENODEV;
>   }
>  
> - mdio_internal = of_find_compatible_node(mdio_mux, NULL,
> + mdio_internal = of_get_compatible_child(mdio_mux,
>   
> "allwinner,sun8i-h3-mdio-internal");
> + of_node_put(mdio_mux);
>   if (!mdio_internal) {
>   dev_err(priv->device, "Cannot get internal_mdio node\n");
>   return -ENODEV;
> @@ -729,13 +730,20 @@ static int get_ephy_nodes(struct stmmac_priv *priv)
>   gmac->rst_ephy = of_reset_control_get_exclusive(iphynode, NULL);
>   if (IS_ERR(gmac->rst_ephy)) {
>   ret = PTR_ERR(gmac->rst_ephy);
> - if (ret == -EPROBE_DEFER)
> + if (ret == -EPROBE_DEFER) {
> + of_node_put(iphynode);
> + of_node_put(mdio_internal);
>   return ret;
> + }
>   continue;
>   }
>   dev_info(priv->device, "Found internal PHY node\n");
> + of_node_put(iphynode);
> + of_node_put(mdio_internal);
>   return 0;
>   }
> +
> + of_node_put(mdio_internal);
>   return -ENODEV;
>  }
>  
> -- 
> 2.18.0
> 

Sorry for the delay
Tested-by: Corentin Labbe 


Re: [PATCH v2 7/9] net: stmmac: dwmac-sun8i: fix OF child-node lookup

2018-09-06 Thread Corentin Labbe
On Mon, Aug 27, 2018 at 10:21:51AM +0200, Johan Hovold wrote:
> Use the new of_get_compatible_child() helper to lookup the mdio-internal
> child node instead of using of_find_compatible_node(), which searches
> the entire tree from a given start node and thus can return an unrelated
> (i.e. non-child) node.
> 
> This also addresses a potential use-after-free (e.g. after probe
> deferral) as the tree-wide helper drops a reference to its first
> argument (i.e. the mdio-mux node). Fortunately, this was inadvertently
> balanced by a failure to drop the mdio-mux reference after lookup.
> 
> While at it, also fix the related mdio-internal- and phy-node reference
> leaks.
> 
> Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external 
> MDIOs")
> Cc: Corentin Labbe 
> Cc: Andrew Lunn 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> Cc: Jose Abreu 
> Cc: David S. Miller 
> Signed-off-by: Johan Hovold 
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c 
> b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> index f9a61f90cfbc..0f660af01a4b 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> @@ -714,8 +714,9 @@ static int get_ephy_nodes(struct stmmac_priv *priv)
>   return -ENODEV;
>   }
>  
> - mdio_internal = of_find_compatible_node(mdio_mux, NULL,
> + mdio_internal = of_get_compatible_child(mdio_mux,
>   
> "allwinner,sun8i-h3-mdio-internal");
> + of_node_put(mdio_mux);
>   if (!mdio_internal) {
>   dev_err(priv->device, "Cannot get internal_mdio node\n");
>   return -ENODEV;
> @@ -729,13 +730,20 @@ static int get_ephy_nodes(struct stmmac_priv *priv)
>   gmac->rst_ephy = of_reset_control_get_exclusive(iphynode, NULL);
>   if (IS_ERR(gmac->rst_ephy)) {
>   ret = PTR_ERR(gmac->rst_ephy);
> - if (ret == -EPROBE_DEFER)
> + if (ret == -EPROBE_DEFER) {
> + of_node_put(iphynode);
> + of_node_put(mdio_internal);
>   return ret;
> + }
>   continue;
>   }
>   dev_info(priv->device, "Found internal PHY node\n");
> + of_node_put(iphynode);
> + of_node_put(mdio_internal);
>   return 0;
>   }
> +
> + of_node_put(mdio_internal);
>   return -ENODEV;
>  }
>  
> -- 
> 2.18.0
> 

Sorry for the delay
Tested-by: Corentin Labbe 


Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Nadav Amit
at 12:53 PM, Peter Zijlstra  wrote:

> On Thu, Sep 06, 2018 at 07:42:14PM +, Nadav Amit wrote:
>> at 12:40 PM, Peter Zijlstra  wrote:
>> 
>>> On Sun, Sep 02, 2018 at 10:32:19AM -0700, Nadav Amit wrote:
 text_mutex is expected to be held before text_poke() is called, but we
 cannot add a lockdep assertion since kgdb does not take it, and instead
 *supposedly* ensures the lock is not taken and will not be acquired by
 any other core while text_poke() is running.
 
 The reason for the "supposedly" comment is that it is not entirely clear
 that this would be the case if gdb_do_roundup is zero.
>>> 
>>> Argh, that's pretty shit code...
>>> 
>>> Not only is that text_mutex abuse ugly, so too is the fixmap usage from
>>> IRQ context. I suppose this really does require your alternative mm
>>> patches for text_poke().
>> 
>> Right, I forgot about that…
> 
> With that CR3 trickery, we can rid ourselves of the text_mutex
> requirement, since concurrent text_poke is 'safe'. That would clean up
> the kgdb code quite a bit.

I don’t know. I’m somewhat worried with multiple mechanisms potentially
changing the same code at the same time - and maybe ending up with some
mess.



Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Nadav Amit
at 12:53 PM, Peter Zijlstra  wrote:

> On Thu, Sep 06, 2018 at 07:42:14PM +, Nadav Amit wrote:
>> at 12:40 PM, Peter Zijlstra  wrote:
>> 
>>> On Sun, Sep 02, 2018 at 10:32:19AM -0700, Nadav Amit wrote:
 text_mutex is expected to be held before text_poke() is called, but we
 cannot add a lockdep assertion since kgdb does not take it, and instead
 *supposedly* ensures the lock is not taken and will not be acquired by
 any other core while text_poke() is running.
 
 The reason for the "supposedly" comment is that it is not entirely clear
 that this would be the case if gdb_do_roundup is zero.
>>> 
>>> Argh, that's pretty shit code...
>>> 
>>> Not only is that text_mutex abuse ugly, so too is the fixmap usage from
>>> IRQ context. I suppose this really does require your alternative mm
>>> patches for text_poke().
>> 
>> Right, I forgot about that…
> 
> With that CR3 trickery, we can rid ourselves of the text_mutex
> requirement, since concurrent text_poke is 'safe'. That would clean up
> the kgdb code quite a bit.

I don’t know. I’m somewhat worried with multiple mechanisms potentially
changing the same code at the same time - and maybe ending up with some
mess.



Re: [PATCH] x86: use WRITE_ONCE() when setting PTEs

2018-09-06 Thread Peter Zijlstra
On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
> When page-table entries are set, the compiler might optimize their
> assignment by using multiple instructions to set the PTE. This might
> turn into a security hazard if the user somehow manages to use the
> interim PTE. L1TF does not make our lives easier, making even an interim
> non-present PTE a security hazard.
> 
> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
> security hazard.
> 
> I skimmed the differences in the binary with and without this patch. The
> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
> code optimizations are possible. For better and worse, the impact on the
> binary with this patch is pretty small. Skimming the code did not cause
> anything to jump out as a security hazard, but it seems that at least
> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.

Acked-by: Peter Zijlstra (Intel) 

Also, its corollary would also make sense/be required, use READ_ONCE()
when reading these.



Re: [PATCH] x86: use WRITE_ONCE() when setting PTEs

2018-09-06 Thread Peter Zijlstra
On Sun, Sep 02, 2018 at 11:14:50AM -0700, Nadav Amit wrote:
> When page-table entries are set, the compiler might optimize their
> assignment by using multiple instructions to set the PTE. This might
> turn into a security hazard if the user somehow manages to use the
> interim PTE. L1TF does not make our lives easier, making even an interim
> non-present PTE a security hazard.
> 
> Using WRITE_ONCE() to set PTEs and friends should prevent this potential
> security hazard.
> 
> I skimmed the differences in the binary with and without this patch. The
> differences are (obviously) greater when CONFIG_PARAVIRT=n as more
> code optimizations are possible. For better and worse, the impact on the
> binary with this patch is pretty small. Skimming the code did not cause
> anything to jump out as a security hazard, but it seems that at least
> move_soft_dirty_pte() caused set_pte_at() to use multiple writes.

Acked-by: Peter Zijlstra (Intel) 

Also, its corollary would also make sense/be required, use READ_ONCE()
when reading these.



Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 07:42:14PM +, Nadav Amit wrote:
> at 12:40 PM, Peter Zijlstra  wrote:
> 
> > On Sun, Sep 02, 2018 at 10:32:19AM -0700, Nadav Amit wrote:
> >> text_mutex is expected to be held before text_poke() is called, but we
> >> cannot add a lockdep assertion since kgdb does not take it, and instead
> >> *supposedly* ensures the lock is not taken and will not be acquired by
> >> any other core while text_poke() is running.
> >> 
> >> The reason for the "supposedly" comment is that it is not entirely clear
> >> that this would be the case if gdb_do_roundup is zero.
> > 
> > Argh, that's pretty shit code...
> > 
> > Not only is that text_mutex abuse ugly, so too is the fixmap usage from
> > IRQ context. I suppose this really does require your alternative mm
> > patches for text_poke().
> 
> Right, I forgot about that…

With that CR3 trickery, we can rid ourselves of the text_mutex
requirement, since concurrent text_poke is 'safe'. That would clean up
the kgdb code quite a bit.


Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 07:42:14PM +, Nadav Amit wrote:
> at 12:40 PM, Peter Zijlstra  wrote:
> 
> > On Sun, Sep 02, 2018 at 10:32:19AM -0700, Nadav Amit wrote:
> >> text_mutex is expected to be held before text_poke() is called, but we
> >> cannot add a lockdep assertion since kgdb does not take it, and instead
> >> *supposedly* ensures the lock is not taken and will not be acquired by
> >> any other core while text_poke() is running.
> >> 
> >> The reason for the "supposedly" comment is that it is not entirely clear
> >> that this would be the case if gdb_do_roundup is zero.
> > 
> > Argh, that's pretty shit code...
> > 
> > Not only is that text_mutex abuse ugly, so too is the fixmap usage from
> > IRQ context. I suppose this really does require your alternative mm
> > patches for text_poke().
> 
> Right, I forgot about that…

With that CR3 trickery, we can rid ourselves of the text_mutex
requirement, since concurrent text_poke is 'safe'. That would clean up
the kgdb code quite a bit.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 02:24 PM, Brijesh Singh wrote:

...



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?




Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


We can use #ifdef CONFIG_AMD_MEM_ENCRYPT around hv_clock_aux definition
so that it gets defined

Something like this:

#ifdef CONFIG_AMD_MEM_ENCRYPT
/* The auxilary array will be used when SEV is active */
#define HVC_AUX_ARRAY_SIZE \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted_aux;
#endif



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 02:24 PM, Brijesh Singh wrote:

...



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?




Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


We can use #ifdef CONFIG_AMD_MEM_ENCRYPT around hv_clock_aux definition
so that it gets defined

Something like this:

#ifdef CONFIG_AMD_MEM_ENCRYPT
/* The auxilary array will be used when SEV is active */
#define HVC_AUX_ARRAY_SIZE \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted_aux;
#endif



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 02:24:32PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 01:47 PM, Sean Christopherson wrote:
> >On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:
> >>
> >>
> >>On 09/06/2018 09:18 AM, Sean Christopherson wrote:
> >>
> >>
> >
> >So are we going to be defining a decrypted section for every piece of
> >machinery now?
> >
> >That's a bit too much in my book.
> >
> >Why can't you simply free everything in .data..decrypted on !SVE guests?
> 
> That would prevent adding __decrypted to existing declarations, e.g.
> hv_clock_boot, which would be ugly in its own right.  A more generic
> solution would be to add something like __decrypted_exclusive to mark
> data that is used if and only if SEV is active, and then free the
> SEV-only data when SEV is disabled.
> >>>
> >>>Oh, and we'd need to make sure __decrypted_exclusive is freed when
> >>>!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
> >>>array is used only if SEV is active.  This patch unconditionally
> >>>defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
> >>>!mem_encrypt_active().
> >>>
> >>
> >>Again we have to consider the bare metal scenario while doing this. The
> >>aux array you proposed will be added in decrypted section only when
> >>CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
> >>gets put in .data.decrypted section. At the runtime, if memory
> >>encryption is active then .data.decrypted_hvclock will contains useful
> >>data.
> >>
> >>The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.
> >
> >Right, but won't the data get dumped into the regular .bss in that
> >case, i.e. needs to be freed?
> >
> 
> 
> Yes, the auxiliary array will dumped into the regular .bss when
> CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
> sure if its worth complicating the code to save those extra memory.
> Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.

I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Sean Christopherson
On Thu, Sep 06, 2018 at 02:24:32PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/06/2018 01:47 PM, Sean Christopherson wrote:
> >On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:
> >>
> >>
> >>On 09/06/2018 09:18 AM, Sean Christopherson wrote:
> >>
> >>
> >
> >So are we going to be defining a decrypted section for every piece of
> >machinery now?
> >
> >That's a bit too much in my book.
> >
> >Why can't you simply free everything in .data..decrypted on !SVE guests?
> 
> That would prevent adding __decrypted to existing declarations, e.g.
> hv_clock_boot, which would be ugly in its own right.  A more generic
> solution would be to add something like __decrypted_exclusive to mark
> data that is used if and only if SEV is active, and then free the
> SEV-only data when SEV is disabled.
> >>>
> >>>Oh, and we'd need to make sure __decrypted_exclusive is freed when
> >>>!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
> >>>array is used only if SEV is active.  This patch unconditionally
> >>>defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
> >>>!mem_encrypt_active().
> >>>
> >>
> >>Again we have to consider the bare metal scenario while doing this. The
> >>aux array you proposed will be added in decrypted section only when
> >>CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
> >>gets put in .data.decrypted section. At the runtime, if memory
> >>encryption is active then .data.decrypted_hvclock will contains useful
> >>data.
> >>
> >>The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.
> >
> >Right, but won't the data get dumped into the regular .bss in that
> >case, i.e. needs to be freed?
> >
> 
> 
> Yes, the auxiliary array will dumped into the regular .bss when
> CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
> sure if its worth complicating the code to save those extra memory.
> Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.

I just realized that we'll try to create a bogus array if 'NR_CPUS <=
HVC_BOOT_ARRAY_SIZE'.  A bit ugly, but we could #ifdef away both that
and CONFIG_AMD_MEM_ENCRYPT=n in a single shot, e.g.:

#if defined(CONFIG_AMD_MEM_ENCRYPT) && NR_CPUS > HVC_BOOT_ARRAY_SIZE
#define HVC_AUX_ARRAY_SIZE  \
PAGE_ALIGN((NR_CPUS - HVC_BOOT_ARRAY_SIZE) * \
   sizeof(struct pvclock_vsyscall_time_info))
static struct pvclock_vsyscall_time_info
hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
#endif



Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 12:21:59PM -0700, Reinette Chatre wrote:
> If you do have suggestions on how I can improve the implementation while
> maintaining (or improving) the accuracy of the measurements I would
> greatly appreciate it.

You can reduce the code duplication by using __always_inline functions
with const function arguments.


Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 12:21:59PM -0700, Reinette Chatre wrote:
> If you do have suggestions on how I can improve the implementation while
> maintaining (or improving) the accuracy of the measurements I would
> greatly appreciate it.

You can reduce the code duplication by using __always_inline functions
with const function arguments.


Re: Widespread crashes in next-20180906

2018-09-06 Thread Theodore Y. Ts'o
On Thu, Sep 06, 2018 at 03:13:23PM +0100, Matt Hart wrote:
> On 6 September 2018 at 15:04, Theodore Y. Ts'o  wrote:
> > On Thu, Sep 06, 2018 at 06:45:15AM -0700, Guenter Roeck wrote:
> >> Build results:
> >>   total: 134 pass: 133 fail: 1
> 
> Do you build arm64? Because KernelCI is seeing build failures in arm64
> defconfig for next-20180906
> Clearly it's a module build problem for sunxi but I'm not sure who to
> CC about this.
> 
> https://kernelci.org/build/next/branch/master/kernel/next-20180906/
> https://storage.kernelci.org/next/master/next-20180906/arm64/defconfig/build.log

At a guess, from the MAINTAINERS file

ARM/Allwinner sunXi SoC support
M:  Maxime Ripard 
M:  Chen-Yu Tsai 

- Ted


Re: Widespread crashes in next-20180906

2018-09-06 Thread Theodore Y. Ts'o
On Thu, Sep 06, 2018 at 03:13:23PM +0100, Matt Hart wrote:
> On 6 September 2018 at 15:04, Theodore Y. Ts'o  wrote:
> > On Thu, Sep 06, 2018 at 06:45:15AM -0700, Guenter Roeck wrote:
> >> Build results:
> >>   total: 134 pass: 133 fail: 1
> 
> Do you build arm64? Because KernelCI is seeing build failures in arm64
> defconfig for next-20180906
> Clearly it's a module build problem for sunxi but I'm not sure who to
> CC about this.
> 
> https://kernelci.org/build/next/branch/master/kernel/next-20180906/
> https://storage.kernelci.org/next/master/next-20180906/arm64/defconfig/build.log

At a guess, from the MAINTAINERS file

ARM/Allwinner sunXi SoC support
M:  Maxime Ripard 
M:  Chen-Yu Tsai 

- Ted


Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Nadav Amit
at 12:40 PM, Peter Zijlstra  wrote:

> On Sun, Sep 02, 2018 at 10:32:19AM -0700, Nadav Amit wrote:
>> text_mutex is expected to be held before text_poke() is called, but we
>> cannot add a lockdep assertion since kgdb does not take it, and instead
>> *supposedly* ensures the lock is not taken and will not be acquired by
>> any other core while text_poke() is running.
>> 
>> The reason for the "supposedly" comment is that it is not entirely clear
>> that this would be the case if gdb_do_roundup is zero.
> 
> Argh, that's pretty shit code...
> 
> Not only is that text_mutex abuse ugly, so too is the fixmap usage from
> IRQ context. I suppose this really does require your alternative mm
> patches for text_poke().

Right, I forgot about that…



Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Nadav Amit
at 12:40 PM, Peter Zijlstra  wrote:

> On Sun, Sep 02, 2018 at 10:32:19AM -0700, Nadav Amit wrote:
>> text_mutex is expected to be held before text_poke() is called, but we
>> cannot add a lockdep assertion since kgdb does not take it, and instead
>> *supposedly* ensures the lock is not taken and will not be acquired by
>> any other core while text_poke() is running.
>> 
>> The reason for the "supposedly" comment is that it is not entirely clear
>> that this would be the case if gdb_do_roundup is zero.
> 
> Argh, that's pretty shit code...
> 
> Not only is that text_mutex abuse ugly, so too is the fixmap usage from
> IRQ context. I suppose this really does require your alternative mm
> patches for text_poke().

Right, I forgot about that…



Re: possible deadlock in ext4_evict_inode

2018-09-06 Thread Dmitry Vyukov
On Thu, Sep 6, 2018 at 9:38 PM, Theodore Y. Ts'o  wrote:
> On Thu, Sep 06, 2018 at 09:41:04AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:b36fdc6853a3 Merge tag 'gpio-v4.19-2' of git://git.kernel...
>> git tree:   upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1716bc9e40
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=6c9564cd177daf0c
>> dashboard link: https://syzkaller.appspot.com/bug?extid=0eefc1e06a77d327a056
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13db48be40
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+0eefc1e06a77d327a...@syzkaller.appspotmail.com
>
> This looks like it's a Smack issue?


Looks like it.

Happens only on smack instance:
https://syzkaller.appspot.com/bug?extid=0eefc1e06a77d327a056

Let's add +smack maintainers.


>> 8021q: adding VLAN 0 to HW filter on device team0
>> 8021q: adding VLAN 0 to HW filter on device team0
>> syz-executor0 (11193) used greatest stack depth: 15352 bytes left
>>
>> ==
>> WARNING: possible circular locking dependency detected
>> 4.19.0-rc2+ #1 Not tainted
>> --
>> syz-executor3/11182 is trying to acquire lock:
>> c157b042 (sb_internal){.+.+}, at: sb_start_intwrite
>> include/linux/fs.h:1613 [inline]
>> c157b042 (sb_internal){.+.+}, at: ext4_evict_inode+0x588/0x19b0
>> fs/ext4/inode.c:250
>>
>> but task is already holding lock:
>> 128cdd3b (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.98+0x0/0x30
>> mm/page_alloc.c:463
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #3 (fs_reclaim){+.+.}:
>>__fs_reclaim_acquire mm/page_alloc.c:3728 [inline]
>>fs_reclaim_acquire.part.98+0x24/0x30 mm/page_alloc.c:3739
>>fs_reclaim_acquire+0x14/0x20 mm/page_alloc.c:3740
>>slab_pre_alloc_hook mm/slab.h:418 [inline]
>>slab_alloc mm/slab.c:3378 [inline]
>>kmem_cache_alloc_trace+0x2d/0x730 mm/slab.c:3618
>>kmalloc include/linux/slab.h:513 [inline]
>>kzalloc include/linux/slab.h:707 [inline]
>>smk_fetch.part.24+0x5a/0xf0 security/smack/smack_lsm.c:273
>>smk_fetch security/smack/smack_lsm.c:3548 [inline]
>>smack_d_instantiate+0x946/0xea0 security/smack/smack_lsm.c:3502
>>security_d_instantiate+0x5c/0xf0 security/security.c:1287
>>d_instantiate+0x5e/0xa0 fs/dcache.c:1870
>>shmem_mknod+0x189/0x1f0 mm/shmem.c:2812
>>vfs_mknod+0x447/0x800 fs/namei.c:3719
>>handle_create+0x1ff/0x7c0 drivers/base/devtmpfs.c:211
>>handle drivers/base/devtmpfs.c:374 [inline]
>>devtmpfsd+0x27f/0x4c0 drivers/base/devtmpfs.c:400
>>kthread+0x35a/0x420 kernel/kthread.c:246
>>ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
>>
>> -> #2 (>smk_lock){+.+.}:
>>__mutex_lock_common kernel/locking/mutex.c:925 [inline]
>>__mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>>smack_d_instantiate+0x130/0xea0 security/smack/smack_lsm.c:3369
>>security_d_instantiate+0x5c/0xf0 security/security.c:1287
>>d_instantiate_new+0x7e/0x160 fs/dcache.c:1889
>>ext4_add_nondir+0x81/0x90 fs/ext4/namei.c:2415
>>ext4_symlink+0x761/0x1170 fs/ext4/namei.c:3162
>>vfs_symlink+0x37a/0x5d0 fs/namei.c:4127
>>do_symlinkat+0x242/0x2d0 fs/namei.c:4154
>>__do_sys_symlink fs/namei.c:4173 [inline]
>>__se_sys_symlink fs/namei.c:4171 [inline]
>>__x64_sys_symlink+0x59/0x80 fs/namei.c:4171
>>do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> -> #1 (jbd2_handle){}:
>>start_this_handle+0x5c0/0x1260 fs/jbd2/transaction.c:385
>>jbd2__journal_start+0x3c9/0x9f0 fs/jbd2/transaction.c:439
>>__ext4_journal_start_sb+0x18d/0x590 fs/ext4/ext4_jbd2.c:81
>>ext4_sample_last_mounted fs/ext4/file.c:414 [inline]
>>ext4_file_open+0x552/0x7b0 fs/ext4/file.c:439
>>do_dentry_open+0x499/0x1250 fs/open.c:771
>>vfs_open+0xa0/0xd0 fs/open.c:880
>>do_last fs/namei.c:3418 [inline]
>>path_openat+0x130f/0x5340 fs/namei.c:3534
>>do_filp_open+0x255/0x380 fs/namei.c:3564
>>do_open_execat+0x221/0x8e0 fs/exec.c:853
>>__do_execve_file.isra.35+0x1707/0x2460 fs/exec.c:1755
>>do_execveat_common fs/exec.c:1866 [inline]
>>do_execve fs/exec.c:1883 [inline]
>>__do_sys_execve fs/exec.c:1964 [inline]
>>__se_sys_execve fs/exec.c:1959 [inline]
>>__x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
>>do_syscall_64+0x1b9/0x820 

Re: possible deadlock in ext4_evict_inode

2018-09-06 Thread Dmitry Vyukov
On Thu, Sep 6, 2018 at 9:38 PM, Theodore Y. Ts'o  wrote:
> On Thu, Sep 06, 2018 at 09:41:04AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:b36fdc6853a3 Merge tag 'gpio-v4.19-2' of git://git.kernel...
>> git tree:   upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1716bc9e40
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=6c9564cd177daf0c
>> dashboard link: https://syzkaller.appspot.com/bug?extid=0eefc1e06a77d327a056
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13db48be40
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+0eefc1e06a77d327a...@syzkaller.appspotmail.com
>
> This looks like it's a Smack issue?


Looks like it.

Happens only on smack instance:
https://syzkaller.appspot.com/bug?extid=0eefc1e06a77d327a056

Let's add +smack maintainers.


>> 8021q: adding VLAN 0 to HW filter on device team0
>> 8021q: adding VLAN 0 to HW filter on device team0
>> syz-executor0 (11193) used greatest stack depth: 15352 bytes left
>>
>> ==
>> WARNING: possible circular locking dependency detected
>> 4.19.0-rc2+ #1 Not tainted
>> --
>> syz-executor3/11182 is trying to acquire lock:
>> c157b042 (sb_internal){.+.+}, at: sb_start_intwrite
>> include/linux/fs.h:1613 [inline]
>> c157b042 (sb_internal){.+.+}, at: ext4_evict_inode+0x588/0x19b0
>> fs/ext4/inode.c:250
>>
>> but task is already holding lock:
>> 128cdd3b (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.98+0x0/0x30
>> mm/page_alloc.c:463
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #3 (fs_reclaim){+.+.}:
>>__fs_reclaim_acquire mm/page_alloc.c:3728 [inline]
>>fs_reclaim_acquire.part.98+0x24/0x30 mm/page_alloc.c:3739
>>fs_reclaim_acquire+0x14/0x20 mm/page_alloc.c:3740
>>slab_pre_alloc_hook mm/slab.h:418 [inline]
>>slab_alloc mm/slab.c:3378 [inline]
>>kmem_cache_alloc_trace+0x2d/0x730 mm/slab.c:3618
>>kmalloc include/linux/slab.h:513 [inline]
>>kzalloc include/linux/slab.h:707 [inline]
>>smk_fetch.part.24+0x5a/0xf0 security/smack/smack_lsm.c:273
>>smk_fetch security/smack/smack_lsm.c:3548 [inline]
>>smack_d_instantiate+0x946/0xea0 security/smack/smack_lsm.c:3502
>>security_d_instantiate+0x5c/0xf0 security/security.c:1287
>>d_instantiate+0x5e/0xa0 fs/dcache.c:1870
>>shmem_mknod+0x189/0x1f0 mm/shmem.c:2812
>>vfs_mknod+0x447/0x800 fs/namei.c:3719
>>handle_create+0x1ff/0x7c0 drivers/base/devtmpfs.c:211
>>handle drivers/base/devtmpfs.c:374 [inline]
>>devtmpfsd+0x27f/0x4c0 drivers/base/devtmpfs.c:400
>>kthread+0x35a/0x420 kernel/kthread.c:246
>>ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
>>
>> -> #2 (>smk_lock){+.+.}:
>>__mutex_lock_common kernel/locking/mutex.c:925 [inline]
>>__mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>>smack_d_instantiate+0x130/0xea0 security/smack/smack_lsm.c:3369
>>security_d_instantiate+0x5c/0xf0 security/security.c:1287
>>d_instantiate_new+0x7e/0x160 fs/dcache.c:1889
>>ext4_add_nondir+0x81/0x90 fs/ext4/namei.c:2415
>>ext4_symlink+0x761/0x1170 fs/ext4/namei.c:3162
>>vfs_symlink+0x37a/0x5d0 fs/namei.c:4127
>>do_symlinkat+0x242/0x2d0 fs/namei.c:4154
>>__do_sys_symlink fs/namei.c:4173 [inline]
>>__se_sys_symlink fs/namei.c:4171 [inline]
>>__x64_sys_symlink+0x59/0x80 fs/namei.c:4171
>>do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> -> #1 (jbd2_handle){}:
>>start_this_handle+0x5c0/0x1260 fs/jbd2/transaction.c:385
>>jbd2__journal_start+0x3c9/0x9f0 fs/jbd2/transaction.c:439
>>__ext4_journal_start_sb+0x18d/0x590 fs/ext4/ext4_jbd2.c:81
>>ext4_sample_last_mounted fs/ext4/file.c:414 [inline]
>>ext4_file_open+0x552/0x7b0 fs/ext4/file.c:439
>>do_dentry_open+0x499/0x1250 fs/open.c:771
>>vfs_open+0xa0/0xd0 fs/open.c:880
>>do_last fs/namei.c:3418 [inline]
>>path_openat+0x130f/0x5340 fs/namei.c:3534
>>do_filp_open+0x255/0x380 fs/namei.c:3564
>>do_open_execat+0x221/0x8e0 fs/exec.c:853
>>__do_execve_file.isra.35+0x1707/0x2460 fs/exec.c:1755
>>do_execveat_common fs/exec.c:1866 [inline]
>>do_execve fs/exec.c:1883 [inline]
>>__do_sys_execve fs/exec.c:1964 [inline]
>>__se_sys_execve fs/exec.c:1959 [inline]
>>__x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
>>do_syscall_64+0x1b9/0x820 

Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Peter Zijlstra
On Sun, Sep 02, 2018 at 10:32:19AM -0700, Nadav Amit wrote:
> text_mutex is expected to be held before text_poke() is called, but we
> cannot add a lockdep assertion since kgdb does not take it, and instead
> *supposedly* ensures the lock is not taken and will not be acquired by
> any other core while text_poke() is running.
> 
> The reason for the "supposedly" comment is that it is not entirely clear
> that this would be the case if gdb_do_roundup is zero.

Argh, that's pretty shit code...

Not only is that text_mutex abuse ugly, so too is the fixmap usage from
IRQ context. I suppose this really does require your alternative mm
patches for text_poke().


Re: [PATCH v2 1/6] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-09-06 Thread Peter Zijlstra
On Sun, Sep 02, 2018 at 10:32:19AM -0700, Nadav Amit wrote:
> text_mutex is expected to be held before text_poke() is called, but we
> cannot add a lockdep assertion since kgdb does not take it, and instead
> *supposedly* ensures the lock is not taken and will not be acquired by
> any other core while text_poke() is running.
> 
> The reason for the "supposedly" comment is that it is not entirely clear
> that this would be the case if gdb_do_roundup is zero.

Argh, that's pretty shit code...

Not only is that text_mutex abuse ugly, so too is the fixmap usage from
IRQ context. I suppose this really does require your alternative mm
patches for text_poke().


Re: possible deadlock in ext4_evict_inode

2018-09-06 Thread Theodore Y. Ts'o
On Thu, Sep 06, 2018 at 09:41:04AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:b36fdc6853a3 Merge tag 'gpio-v4.19-2' of git://git.kernel...
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1716bc9e40
> kernel config:  https://syzkaller.appspot.com/x/.config?x=6c9564cd177daf0c
> dashboard link: https://syzkaller.appspot.com/bug?extid=0eefc1e06a77d327a056
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13db48be40
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0eefc1e06a77d327a...@syzkaller.appspotmail.com

This looks like it's a Smack issue?

- Ted

> 
> 8021q: adding VLAN 0 to HW filter on device team0
> 8021q: adding VLAN 0 to HW filter on device team0
> syz-executor0 (11193) used greatest stack depth: 15352 bytes left
> 
> ==
> WARNING: possible circular locking dependency detected
> 4.19.0-rc2+ #1 Not tainted
> --
> syz-executor3/11182 is trying to acquire lock:
> c157b042 (sb_internal){.+.+}, at: sb_start_intwrite
> include/linux/fs.h:1613 [inline]
> c157b042 (sb_internal){.+.+}, at: ext4_evict_inode+0x588/0x19b0
> fs/ext4/inode.c:250
> 
> but task is already holding lock:
> 128cdd3b (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.98+0x0/0x30
> mm/page_alloc.c:463
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #3 (fs_reclaim){+.+.}:
>__fs_reclaim_acquire mm/page_alloc.c:3728 [inline]
>fs_reclaim_acquire.part.98+0x24/0x30 mm/page_alloc.c:3739
>fs_reclaim_acquire+0x14/0x20 mm/page_alloc.c:3740
>slab_pre_alloc_hook mm/slab.h:418 [inline]
>slab_alloc mm/slab.c:3378 [inline]
>kmem_cache_alloc_trace+0x2d/0x730 mm/slab.c:3618
>kmalloc include/linux/slab.h:513 [inline]
>kzalloc include/linux/slab.h:707 [inline]
>smk_fetch.part.24+0x5a/0xf0 security/smack/smack_lsm.c:273
>smk_fetch security/smack/smack_lsm.c:3548 [inline]
>smack_d_instantiate+0x946/0xea0 security/smack/smack_lsm.c:3502
>security_d_instantiate+0x5c/0xf0 security/security.c:1287
>d_instantiate+0x5e/0xa0 fs/dcache.c:1870
>shmem_mknod+0x189/0x1f0 mm/shmem.c:2812
>vfs_mknod+0x447/0x800 fs/namei.c:3719
>handle_create+0x1ff/0x7c0 drivers/base/devtmpfs.c:211
>handle drivers/base/devtmpfs.c:374 [inline]
>devtmpfsd+0x27f/0x4c0 drivers/base/devtmpfs.c:400
>kthread+0x35a/0x420 kernel/kthread.c:246
>ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
> 
> -> #2 (>smk_lock){+.+.}:
>__mutex_lock_common kernel/locking/mutex.c:925 [inline]
>__mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>smack_d_instantiate+0x130/0xea0 security/smack/smack_lsm.c:3369
>security_d_instantiate+0x5c/0xf0 security/security.c:1287
>d_instantiate_new+0x7e/0x160 fs/dcache.c:1889
>ext4_add_nondir+0x81/0x90 fs/ext4/namei.c:2415
>ext4_symlink+0x761/0x1170 fs/ext4/namei.c:3162
>vfs_symlink+0x37a/0x5d0 fs/namei.c:4127
>do_symlinkat+0x242/0x2d0 fs/namei.c:4154
>__do_sys_symlink fs/namei.c:4173 [inline]
>__se_sys_symlink fs/namei.c:4171 [inline]
>__x64_sys_symlink+0x59/0x80 fs/namei.c:4171
>do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> -> #1 (jbd2_handle){}:
>start_this_handle+0x5c0/0x1260 fs/jbd2/transaction.c:385
>jbd2__journal_start+0x3c9/0x9f0 fs/jbd2/transaction.c:439
>__ext4_journal_start_sb+0x18d/0x590 fs/ext4/ext4_jbd2.c:81
>ext4_sample_last_mounted fs/ext4/file.c:414 [inline]
>ext4_file_open+0x552/0x7b0 fs/ext4/file.c:439
>do_dentry_open+0x499/0x1250 fs/open.c:771
>vfs_open+0xa0/0xd0 fs/open.c:880
>do_last fs/namei.c:3418 [inline]
>path_openat+0x130f/0x5340 fs/namei.c:3534
>do_filp_open+0x255/0x380 fs/namei.c:3564
>do_open_execat+0x221/0x8e0 fs/exec.c:853
>__do_execve_file.isra.35+0x1707/0x2460 fs/exec.c:1755
>do_execveat_common fs/exec.c:1866 [inline]
>do_execve fs/exec.c:1883 [inline]
>__do_sys_execve fs/exec.c:1964 [inline]
>__se_sys_execve fs/exec.c:1959 [inline]
>__x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
>do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> -> #0 (sb_internal){.+.+}:
>lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
>percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36
> [inline]
>   

Re: possible deadlock in ext4_evict_inode

2018-09-06 Thread Theodore Y. Ts'o
On Thu, Sep 06, 2018 at 09:41:04AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:b36fdc6853a3 Merge tag 'gpio-v4.19-2' of git://git.kernel...
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1716bc9e40
> kernel config:  https://syzkaller.appspot.com/x/.config?x=6c9564cd177daf0c
> dashboard link: https://syzkaller.appspot.com/bug?extid=0eefc1e06a77d327a056
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13db48be40
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0eefc1e06a77d327a...@syzkaller.appspotmail.com

This looks like it's a Smack issue?

- Ted

> 
> 8021q: adding VLAN 0 to HW filter on device team0
> 8021q: adding VLAN 0 to HW filter on device team0
> syz-executor0 (11193) used greatest stack depth: 15352 bytes left
> 
> ==
> WARNING: possible circular locking dependency detected
> 4.19.0-rc2+ #1 Not tainted
> --
> syz-executor3/11182 is trying to acquire lock:
> c157b042 (sb_internal){.+.+}, at: sb_start_intwrite
> include/linux/fs.h:1613 [inline]
> c157b042 (sb_internal){.+.+}, at: ext4_evict_inode+0x588/0x19b0
> fs/ext4/inode.c:250
> 
> but task is already holding lock:
> 128cdd3b (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.98+0x0/0x30
> mm/page_alloc.c:463
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #3 (fs_reclaim){+.+.}:
>__fs_reclaim_acquire mm/page_alloc.c:3728 [inline]
>fs_reclaim_acquire.part.98+0x24/0x30 mm/page_alloc.c:3739
>fs_reclaim_acquire+0x14/0x20 mm/page_alloc.c:3740
>slab_pre_alloc_hook mm/slab.h:418 [inline]
>slab_alloc mm/slab.c:3378 [inline]
>kmem_cache_alloc_trace+0x2d/0x730 mm/slab.c:3618
>kmalloc include/linux/slab.h:513 [inline]
>kzalloc include/linux/slab.h:707 [inline]
>smk_fetch.part.24+0x5a/0xf0 security/smack/smack_lsm.c:273
>smk_fetch security/smack/smack_lsm.c:3548 [inline]
>smack_d_instantiate+0x946/0xea0 security/smack/smack_lsm.c:3502
>security_d_instantiate+0x5c/0xf0 security/security.c:1287
>d_instantiate+0x5e/0xa0 fs/dcache.c:1870
>shmem_mknod+0x189/0x1f0 mm/shmem.c:2812
>vfs_mknod+0x447/0x800 fs/namei.c:3719
>handle_create+0x1ff/0x7c0 drivers/base/devtmpfs.c:211
>handle drivers/base/devtmpfs.c:374 [inline]
>devtmpfsd+0x27f/0x4c0 drivers/base/devtmpfs.c:400
>kthread+0x35a/0x420 kernel/kthread.c:246
>ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
> 
> -> #2 (>smk_lock){+.+.}:
>__mutex_lock_common kernel/locking/mutex.c:925 [inline]
>__mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>smack_d_instantiate+0x130/0xea0 security/smack/smack_lsm.c:3369
>security_d_instantiate+0x5c/0xf0 security/security.c:1287
>d_instantiate_new+0x7e/0x160 fs/dcache.c:1889
>ext4_add_nondir+0x81/0x90 fs/ext4/namei.c:2415
>ext4_symlink+0x761/0x1170 fs/ext4/namei.c:3162
>vfs_symlink+0x37a/0x5d0 fs/namei.c:4127
>do_symlinkat+0x242/0x2d0 fs/namei.c:4154
>__do_sys_symlink fs/namei.c:4173 [inline]
>__se_sys_symlink fs/namei.c:4171 [inline]
>__x64_sys_symlink+0x59/0x80 fs/namei.c:4171
>do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> -> #1 (jbd2_handle){}:
>start_this_handle+0x5c0/0x1260 fs/jbd2/transaction.c:385
>jbd2__journal_start+0x3c9/0x9f0 fs/jbd2/transaction.c:439
>__ext4_journal_start_sb+0x18d/0x590 fs/ext4/ext4_jbd2.c:81
>ext4_sample_last_mounted fs/ext4/file.c:414 [inline]
>ext4_file_open+0x552/0x7b0 fs/ext4/file.c:439
>do_dentry_open+0x499/0x1250 fs/open.c:771
>vfs_open+0xa0/0xd0 fs/open.c:880
>do_last fs/namei.c:3418 [inline]
>path_openat+0x130f/0x5340 fs/namei.c:3534
>do_filp_open+0x255/0x380 fs/namei.c:3564
>do_open_execat+0x221/0x8e0 fs/exec.c:853
>__do_execve_file.isra.35+0x1707/0x2460 fs/exec.c:1755
>do_execveat_common fs/exec.c:1866 [inline]
>do_execve fs/exec.c:1883 [inline]
>__do_sys_execve fs/exec.c:1964 [inline]
>__se_sys_execve fs/exec.c:1959 [inline]
>__x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
>do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> -> #0 (sb_internal){.+.+}:
>lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
>percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:36
> [inline]
>   

Re: [PATCH RT 00/22] Linux 4.14.63-rt41-rc1

2018-09-06 Thread Sebastian Andrzej Siewior
On 2018-09-06 12:43:49 [-0400], Steven Rostedt wrote:
> > Your tree (v4.14.63) has
> >   80d20d35af1ed ("nohz: Fix local_timer_softirq_pending()")
> >   0a0e0829f9901 ("nohz: Fix missing tick reprogram when interrupting an
> >  inline softirq")
> > 
> > which means that the patch "nohz: Prevent erroneous tick stop
> > invocations" can be reverted / dropped. It might have clashed during the
> > emerge of the stable tree.
> >
> 
> My tree also has this:
> 
> commit 5536f5491a2e098 
> 
> softirq: keep the 'softirq pending' check RT-only
> 
> The patch "nohz: Prevent erroneous tick stop invocations" was merged
> differently upstream. The original issue where a slow box could lock up
> with a pending timer remained. I currently assume that this is a RT only
> issue and keep the patch as RT only.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> 
> 
> What needs to be done?

This commit can be removed / reverted. The two commits (80d20d35af1ed
and 0a0e0829f9901) fix the issue and this duct-tape commit
(5536f5491a2e098) is no longer required.

> -- Steve

Sebastian


Re: [PATCH RT 00/22] Linux 4.14.63-rt41-rc1

2018-09-06 Thread Sebastian Andrzej Siewior
On 2018-09-06 12:43:49 [-0400], Steven Rostedt wrote:
> > Your tree (v4.14.63) has
> >   80d20d35af1ed ("nohz: Fix local_timer_softirq_pending()")
> >   0a0e0829f9901 ("nohz: Fix missing tick reprogram when interrupting an
> >  inline softirq")
> > 
> > which means that the patch "nohz: Prevent erroneous tick stop
> > invocations" can be reverted / dropped. It might have clashed during the
> > emerge of the stable tree.
> >
> 
> My tree also has this:
> 
> commit 5536f5491a2e098 
> 
> softirq: keep the 'softirq pending' check RT-only
> 
> The patch "nohz: Prevent erroneous tick stop invocations" was merged
> differently upstream. The original issue where a slow box could lock up
> with a pending timer remained. I currently assume that this is a RT only
> issue and keep the patch as RT only.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> 
> 
> What needs to be done?

This commit can be removed / reverted. The two commits (80d20d35af1ed
and 0a0e0829f9901) fix the issue and this duct-tape commit
(5536f5491a2e098) is no longer required.

> -- Steve

Sebastian


Re: [PATCH] x86: use WRITE_ONCE() when setting PTEs

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 10:21:38AM -0700, Dave Hansen wrote:
> There's probably a massive number of things that would break if we
> assumed sane 64-bit writes can be observed piecemeal.

Do assume, it has been observed. WRITE_ONCE()/READ_ONCE() are _required_
if you cannot have load/store tearing.



Re: [PATCH] x86: use WRITE_ONCE() when setting PTEs

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 10:21:38AM -0700, Dave Hansen wrote:
> There's probably a massive number of things that would break if we
> assumed sane 64-bit writes can be observed piecemeal.

Do assume, it has been observed. WRITE_ONCE()/READ_ONCE() are _required_
if you cannot have load/store tearing.



Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 01:47 PM, Sean Christopherson wrote:

On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:



On 09/06/2018 09:18 AM, Sean Christopherson wrote:




So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?


That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.


Oh, and we'd need to make sure __decrypted_exclusive is freed when
!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
array is used only if SEV is active.  This patch unconditionally
defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
!mem_encrypt_active().



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?




Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


Re: [PATCH v5 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

2018-09-06 Thread Brijesh Singh




On 09/06/2018 01:47 PM, Sean Christopherson wrote:

On Thu, Sep 06, 2018 at 01:37:50PM -0500, Brijesh Singh wrote:



On 09/06/2018 09:18 AM, Sean Christopherson wrote:




So are we going to be defining a decrypted section for every piece of
machinery now?

That's a bit too much in my book.

Why can't you simply free everything in .data..decrypted on !SVE guests?


That would prevent adding __decrypted to existing declarations, e.g.
hv_clock_boot, which would be ugly in its own right.  A more generic
solution would be to add something like __decrypted_exclusive to mark
data that is used if and only if SEV is active, and then free the
SEV-only data when SEV is disabled.


Oh, and we'd need to make sure __decrypted_exclusive is freed when
!CONFIG_AMD_MEM_ENCRYPT, and preferably !sev_active() since the big
array is used only if SEV is active.  This patch unconditionally
defines hv_clock_dec but only frees it if CONFIG_AMD_MEM_ENCRYPT=y &&
!mem_encrypt_active().



Again we have to consider the bare metal scenario while doing this. The
aux array you proposed will be added in decrypted section only when
CONFIG_AMD_MEM_ENCRYPT=y.  If CONFIG_AMD_MEM_ENCRYPT=n then nothng
gets put in .data.decrypted section. At the runtime, if memory
encryption is active then .data.decrypted_hvclock will contains useful
data.

The __decrypted attribute in "" when CONFIG_AMD_MEM_ENCRYPT=n.


Right, but won't the data get dumped into the regular .bss in that
case, i.e. needs to be freed?




Yes, the auxiliary array will dumped into the regular .bss when
CONFIG_AMD_MEM_ENCRYPT=n. Typically it will be few k, I am not
sure if its worth complicating the code to save those extra memory.
Most of the distro's have CONFIG_AMD_MEM_ENCRYPT=y anyways.


Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Reinette Chatre
Hi Peter,

On 9/6/2018 7:15 AM, Peter Zijlstra wrote:
> On Thu, Aug 16, 2018 at 01:16:08PM -0700, Reinette Chatre wrote:
>> +l2_miss_event = perf_event_create_kernel_counter(_miss_attr,
>> + plr->cpu,
>> + NULL, NULL, NULL);
>> +if (IS_ERR(l2_miss_event))
>> +goto out;
>> +
>> +l2_hit_event = perf_event_create_kernel_counter(_hit_attr,
>> +plr->cpu,
>> +NULL, NULL, NULL);
>> +if (IS_ERR(l2_hit_event))
>> +goto out_l2_miss;
>> +
>> +local_irq_disable();
>> +/*
>> + * Check any possible error state of events used by performing
>> + * one local read.
>> + */
>> +if (perf_event_read_local(l2_miss_event, , NULL, NULL)) {
>> +local_irq_enable();
>> +goto out_l2_hit;
>> +}
>> +if (perf_event_read_local(l2_hit_event, , NULL, NULL)) {
>> +local_irq_enable();
>> +goto out_l2_hit;
>> +}
>> +
>> +/*
>> + * Disable hardware prefetchers.
>>   *
>> + * Call wrmsr direcly to avoid the local register variables from
>> + * being overwritten due to reordering of their assignment with
>> + * the wrmsr calls.
>> + */
>> +__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
>> +
>> +/* Initialize rest of local variables */
>> +/*
>> + * Performance event has been validated right before this with
>> + * interrupts disabled - it is thus safe to read the counter index.
>> + */
>> +l2_miss_pmcnum = x86_perf_rdpmc_ctr_get(l2_miss_event);
>> +l2_hit_pmcnum = x86_perf_rdpmc_ctr_get(l2_hit_event);
>> +line_size = plr->line_size;
>> +mem_r = plr->kmem;
>> +size = plr->size;
>> +
>> +/*
>> + * Read counter variables twice - first to load the instructions
>> + * used in L1 cache, second to capture accurate value that does not
>> + * include cache misses incurred because of instruction loads.
>> + */
>> +rdpmcl(l2_hit_pmcnum, l2_hits_before);
>> +rdpmcl(l2_miss_pmcnum, l2_miss_before);
>> +/*
>> + * From SDM: Performing back-to-back fast reads are not guaranteed
>> + * to be monotonic. To guarantee monotonicity on back-toback reads,
>> + * a serializing instruction must be placed between the two
>> + * RDPMC instructions
>> + */
>> +rmb();
>> +rdpmcl(l2_hit_pmcnum, l2_hits_before);
>> +rdpmcl(l2_miss_pmcnum, l2_miss_before);
>> +/*
>> + * rdpmc is not a serializing instruction. Add barrier to prevent
>> + * instructions that follow to begin executing before reading the
>> + * counter value.
>> + */
>> +rmb();
>> +for (i = 0; i < size; i += line_size) {
>> +/*
>> + * Add a barrier to prevent speculative execution of this
>> + * loop reading beyond the end of the buffer.
>> + */
>> +rmb();
>> +asm volatile("mov (%0,%1,1), %%eax\n\t"
>> + :
>> + : "r" (mem_r), "r" (i)
>> + : "%eax", "memory");
>> +}
>> +rdpmcl(l2_hit_pmcnum, l2_hits_after);
>> +rdpmcl(l2_miss_pmcnum, l2_miss_after);
>> +/*
>> + * rdpmc is not a serializing instruction. Add barrier to ensure
>> + * events measured have completed and prevent instructions that
>> + * follow to begin executing before reading the counter value.
>> + */
>> +rmb();
>> +/* Re-enable hardware prefetchers */
>> +wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
>> +local_irq_enable();
>> +trace_pseudo_lock_l2(l2_hits_after - l2_hits_before,
>> + l2_miss_after - l2_miss_before);
>> +
>> +out_l2_hit:
>> +perf_event_release_kernel(l2_hit_event);
>> +out_l2_miss:
>> +perf_event_release_kernel(l2_miss_event);
>> +out:
>> +plr->thread_done = 1;
>> +wake_up_interruptible(>lock_thread_wq);
>> +return 0;
>> +}
>> +
> 
> The above, looks a _LOT_ like the below. And while C does suck a little,
> I'm sure there's something we can do about this.

You are correct, the L2 and L3 cache measurements are very similar.
Indeed, the current implementation does have them together in one
function but I was not able to obtain the same accuracy in measurements
as presented in my previous emails that only did L2 measurements. When
combining the L2 and L3 measurements in one function it is required to
do something like:

  if (need_l2) {
  rdpmcl(l2_hit_pmcnum, l2_hits_before);
  rdpmcl(l2_miss_pmcnum, l2_miss_before);
  }
  if (need_l3) {
  rdpmcl(l3_hit_pmcnum, l3_hits_before);
  rdpmcl(l3_miss_pmcnum, l3_miss_before);
  }
  rmb();
  if (need_l2) {
  rdpmcl(l2_hit_pmcnum, 

Re: [PATCH V2 5/6] x86/intel_rdt: Use perf infrastructure for measurements

2018-09-06 Thread Reinette Chatre
Hi Peter,

On 9/6/2018 7:15 AM, Peter Zijlstra wrote:
> On Thu, Aug 16, 2018 at 01:16:08PM -0700, Reinette Chatre wrote:
>> +l2_miss_event = perf_event_create_kernel_counter(_miss_attr,
>> + plr->cpu,
>> + NULL, NULL, NULL);
>> +if (IS_ERR(l2_miss_event))
>> +goto out;
>> +
>> +l2_hit_event = perf_event_create_kernel_counter(_hit_attr,
>> +plr->cpu,
>> +NULL, NULL, NULL);
>> +if (IS_ERR(l2_hit_event))
>> +goto out_l2_miss;
>> +
>> +local_irq_disable();
>> +/*
>> + * Check any possible error state of events used by performing
>> + * one local read.
>> + */
>> +if (perf_event_read_local(l2_miss_event, , NULL, NULL)) {
>> +local_irq_enable();
>> +goto out_l2_hit;
>> +}
>> +if (perf_event_read_local(l2_hit_event, , NULL, NULL)) {
>> +local_irq_enable();
>> +goto out_l2_hit;
>> +}
>> +
>> +/*
>> + * Disable hardware prefetchers.
>>   *
>> + * Call wrmsr direcly to avoid the local register variables from
>> + * being overwritten due to reordering of their assignment with
>> + * the wrmsr calls.
>> + */
>> +__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
>> +
>> +/* Initialize rest of local variables */
>> +/*
>> + * Performance event has been validated right before this with
>> + * interrupts disabled - it is thus safe to read the counter index.
>> + */
>> +l2_miss_pmcnum = x86_perf_rdpmc_ctr_get(l2_miss_event);
>> +l2_hit_pmcnum = x86_perf_rdpmc_ctr_get(l2_hit_event);
>> +line_size = plr->line_size;
>> +mem_r = plr->kmem;
>> +size = plr->size;
>> +
>> +/*
>> + * Read counter variables twice - first to load the instructions
>> + * used in L1 cache, second to capture accurate value that does not
>> + * include cache misses incurred because of instruction loads.
>> + */
>> +rdpmcl(l2_hit_pmcnum, l2_hits_before);
>> +rdpmcl(l2_miss_pmcnum, l2_miss_before);
>> +/*
>> + * From SDM: Performing back-to-back fast reads are not guaranteed
>> + * to be monotonic. To guarantee monotonicity on back-toback reads,
>> + * a serializing instruction must be placed between the two
>> + * RDPMC instructions
>> + */
>> +rmb();
>> +rdpmcl(l2_hit_pmcnum, l2_hits_before);
>> +rdpmcl(l2_miss_pmcnum, l2_miss_before);
>> +/*
>> + * rdpmc is not a serializing instruction. Add barrier to prevent
>> + * instructions that follow to begin executing before reading the
>> + * counter value.
>> + */
>> +rmb();
>> +for (i = 0; i < size; i += line_size) {
>> +/*
>> + * Add a barrier to prevent speculative execution of this
>> + * loop reading beyond the end of the buffer.
>> + */
>> +rmb();
>> +asm volatile("mov (%0,%1,1), %%eax\n\t"
>> + :
>> + : "r" (mem_r), "r" (i)
>> + : "%eax", "memory");
>> +}
>> +rdpmcl(l2_hit_pmcnum, l2_hits_after);
>> +rdpmcl(l2_miss_pmcnum, l2_miss_after);
>> +/*
>> + * rdpmc is not a serializing instruction. Add barrier to ensure
>> + * events measured have completed and prevent instructions that
>> + * follow to begin executing before reading the counter value.
>> + */
>> +rmb();
>> +/* Re-enable hardware prefetchers */
>> +wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
>> +local_irq_enable();
>> +trace_pseudo_lock_l2(l2_hits_after - l2_hits_before,
>> + l2_miss_after - l2_miss_before);
>> +
>> +out_l2_hit:
>> +perf_event_release_kernel(l2_hit_event);
>> +out_l2_miss:
>> +perf_event_release_kernel(l2_miss_event);
>> +out:
>> +plr->thread_done = 1;
>> +wake_up_interruptible(>lock_thread_wq);
>> +return 0;
>> +}
>> +
> 
> The above, looks a _LOT_ like the below. And while C does suck a little,
> I'm sure there's something we can do about this.

You are correct, the L2 and L3 cache measurements are very similar.
Indeed, the current implementation does have them together in one
function but I was not able to obtain the same accuracy in measurements
as presented in my previous emails that only did L2 measurements. When
combining the L2 and L3 measurements in one function it is required to
do something like:

  if (need_l2) {
  rdpmcl(l2_hit_pmcnum, l2_hits_before);
  rdpmcl(l2_miss_pmcnum, l2_miss_before);
  }
  if (need_l3) {
  rdpmcl(l3_hit_pmcnum, l3_hits_before);
  rdpmcl(l3_miss_pmcnum, l3_miss_before);
  }
  rmb();
  if (need_l2) {
  rdpmcl(l2_hit_pmcnum, 

Re: Widespread crashes in next-20180906

2018-09-06 Thread Matt Hart
On 6 September 2018 at 16:23, Guenter Roeck  wrote:
> On Thu, Sep 06, 2018 at 03:13:23PM +0100, Matt Hart wrote:
>> On 6 September 2018 at 15:04, Theodore Y. Ts'o  wrote:
>> > On Thu, Sep 06, 2018 at 06:45:15AM -0700, Guenter Roeck wrote:
>> >> Build results:
>> >>   total: 134 pass: 133 fail: 1
>>
>> Do you build arm64? Because KernelCI is seeing build failures in arm64
>> defconfig for next-20180906
>> Clearly it's a module build problem for sunxi but I'm not sure who to
>> CC about this.
>>
> I do, but as part of the boot tests, not as explicit build test,
> to save a bit of time. Maybe I should run a build test as well.
>
>> https://kernelci.org/build/next/branch/master/kernel/next-20180906/
>> https://storage.kernelci.org/next/master/next-20180906/arm64/defconfig/build.log
>>
>> ERROR: "sun8i_tcon_top_de_config"
>> [drivers/gpu/drm/sun4i/sun4i-tcon.ko] undefined!
>> ERROR: "sun8i_tcon_top_set_hdmi_src"
>> [drivers/gpu/drm/sun4i/sun4i-tcon.ko] undefined!
>> ERROR: "sun8i_tcon_top_of_table" [drivers/gpu/drm/sun4i/sun4i-tcon.ko]
>> undefined!
>
> Same error here, in the boot tests. Sorry, there are just too many
> failures there, and I didn't go through all the logs.
>
> Do you have bisect results ? Otherwise I can run bisect later today;
> that should hopefully identify the culprit.

Bisecting it now, though it might finish after I'm done for the night.

>
> Thanks,
> Guenter


Re: Widespread crashes in next-20180906

2018-09-06 Thread Matt Hart
On 6 September 2018 at 16:23, Guenter Roeck  wrote:
> On Thu, Sep 06, 2018 at 03:13:23PM +0100, Matt Hart wrote:
>> On 6 September 2018 at 15:04, Theodore Y. Ts'o  wrote:
>> > On Thu, Sep 06, 2018 at 06:45:15AM -0700, Guenter Roeck wrote:
>> >> Build results:
>> >>   total: 134 pass: 133 fail: 1
>>
>> Do you build arm64? Because KernelCI is seeing build failures in arm64
>> defconfig for next-20180906
>> Clearly it's a module build problem for sunxi but I'm not sure who to
>> CC about this.
>>
> I do, but as part of the boot tests, not as explicit build test,
> to save a bit of time. Maybe I should run a build test as well.
>
>> https://kernelci.org/build/next/branch/master/kernel/next-20180906/
>> https://storage.kernelci.org/next/master/next-20180906/arm64/defconfig/build.log
>>
>> ERROR: "sun8i_tcon_top_de_config"
>> [drivers/gpu/drm/sun4i/sun4i-tcon.ko] undefined!
>> ERROR: "sun8i_tcon_top_set_hdmi_src"
>> [drivers/gpu/drm/sun4i/sun4i-tcon.ko] undefined!
>> ERROR: "sun8i_tcon_top_of_table" [drivers/gpu/drm/sun4i/sun4i-tcon.ko]
>> undefined!
>
> Same error here, in the boot tests. Sorry, there are just too many
> failures there, and I didn't go through all the logs.
>
> Do you have bisect results ? Otherwise I can run bisect later today;
> that should hopefully identify the culprit.

Bisecting it now, though it might finish after I'm done for the night.

>
> Thanks,
> Guenter


Re: [PATCH v2 0/6] x86/alternatives: text_poke() fixes

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 06:38:30PM +, Nadav Amit wrote:
> Note that patch 1/6 is still needed to fix false lockdep shoutouts due to a
> recent patch.

For some reason I do not appear to have 1/6 in my inbox. Let me dig
through lkml.


Re: [PATCH v2 0/6] x86/alternatives: text_poke() fixes

2018-09-06 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 06:38:30PM +, Nadav Amit wrote:
> Note that patch 1/6 is still needed to fix false lockdep shoutouts due to a
> recent patch.

For some reason I do not appear to have 1/6 in my inbox. Let me dig
through lkml.


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-09-06 Thread Peter Zijlstra
On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:

> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()?  Relative to all the other stuff those functions
> >are doing?  If so, how come?  Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
> 
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.

Xen is a moot argument. IIRC the point is that POPF (as used by
*irqrestore()) is a very expensive operation because it changes all
flags and thus has very 'difficult' instruction dependencies, killing
the front end reorder and generating a giant bubble in the pipeline.

Similarly, I suppose PUSHF is an expensive instruction because it needs
all the flags 'stable' and thus needs to wait for a fair number of prior
instructions to retire before it can get on with it.

Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
because the latter only has dependencies on instructions that muck about
with IF -- not that many.


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-09-06 Thread Peter Zijlstra
On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:

> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()?  Relative to all the other stuff those functions
> >are doing?  If so, how come?  Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
> 
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.

Xen is a moot argument. IIRC the point is that POPF (as used by
*irqrestore()) is a very expensive operation because it changes all
flags and thus has very 'difficult' instruction dependencies, killing
the front end reorder and generating a giant bubble in the pipeline.

Similarly, I suppose PUSHF is an expensive instruction because it needs
all the flags 'stable' and thus needs to wait for a fair number of prior
instructions to retire before it can get on with it.

Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
because the latter only has dependencies on instructions that muck about
with IF -- not that many.


Re: [PATCH] pci: fix pci.c kernel-doc parameter warning

2018-09-06 Thread Bjorn Helgaas
On Sun, Sep 02, 2018 at 07:32:50PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Fix kernel-doc warning:
> 
> ../drivers/pci/pci.c:218: warning: Excess function parameter 'p' description 
> in 'pci_dev_str_match_path'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org

Applied to pci/misc for v4.20, thanks!

> ---
>  drivers/pci/pci.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- lnx-419-rc2.orig/drivers/pci/pci.c
> +++ lnx-419-rc2/drivers/pci/pci.c
> @@ -196,7 +196,7 @@ EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
>  /**
>   * pci_dev_str_match_path - test if a path string matches a device
>   * @dev:the PCI device to test
> - * @p:  string to match the device against
> + * @path:   string to match the device against
>   * @endptr: pointer to the string after the match
>   *
>   * Test if a string (typically from a kernel parameter) formatted as a
> 


Re: [PATCH] pci: fix pci.c kernel-doc parameter warning

2018-09-06 Thread Bjorn Helgaas
On Sun, Sep 02, 2018 at 07:32:50PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Fix kernel-doc warning:
> 
> ../drivers/pci/pci.c:218: warning: Excess function parameter 'p' description 
> in 'pci_dev_str_match_path'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org

Applied to pci/misc for v4.20, thanks!

> ---
>  drivers/pci/pci.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- lnx-419-rc2.orig/drivers/pci/pci.c
> +++ lnx-419-rc2/drivers/pci/pci.c
> @@ -196,7 +196,7 @@ EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
>  /**
>   * pci_dev_str_match_path - test if a path string matches a device
>   * @dev:the PCI device to test
> - * @p:  string to match the device against
> + * @path:   string to match the device against
>   * @endptr: pointer to the string after the match
>   *
>   * Test if a string (typically from a kernel parameter) formatted as a
> 


<    3   4   5   6   7   8   9   10   11   12   >