Re: [PATCH v2 2/5] mfd: lochnagar: Add support for the Cirrus Logic Lochnagar

2018-11-01 Thread Mark Brown
On Thu, Nov 01, 2018 at 10:28:28AM +, Charles Keepax wrote:

>  1.2) Read all the registers from the device on boot
>   + Uses less memory
>   - Potentially very slow boot time
>  1.3) Only read values as you touch registers
>   + Uses less memory
>   + Usually no significant impact on boot time
>   - Can't do read or read/write/modify operations on
> previously untouched registers whilst chip is off

Both of these options also require that you be able to reset the device
which isn't possible with all devices and systems.

> I think this is really the crux of my concerns here about
> updating the regmap API: it feels like almost any path we take
> from here results in worse runtime performance and probably
> equal or worse memory usage. In return we gain a reduction of
> some, admittedly large but very simple, data tables in the
> driver, which is a trade it feels hard to get invested in.

Right, I don't see an urgent problem either - if the tables were
complicated to think about it'd be a bit different, but they aren't
particularly, and like you say there are tradeoffs both at runtime and
at development time in trying to make the source code smaller.

Equally, if there are useful things we can add to the API for some use
cases I'm all for that, so long as they don't get in the way of other
users; most of the extensions to the API have come from someone having
a problem and coming up with a fix that meets their needs.


signature.asc
Description: PGP signature


Re: [PATCH 00/10] steal tasks to improve CPU utilization

2018-11-01 Thread Steven Sistare
On 10/22/2018 10:59 AM, Steve Sistare wrote:
> When a CPU has no more CFS tasks to run, and idle_balance() fails to
> find a task, then attempt to steal a task from an overloaded CPU in the
> same LLC. Maintain and use a bitmap of overloaded CPUs to efficiently
> identify candidates.  To minimize search time, steal the first migratable
> task that is found when the bitmap is traversed.  For fairness, search
> for migratable tasks on an overloaded CPU in order of next to run.
> [...] 
> Steve Sistare (10):
>   sched: Provide sparsemask, a reduced contention bitmap
>   sched/topology: Provide hooks to allocate data shared per LLC
>   sched/topology: Provide cfs_overload_cpus bitmap
>   sched/fair: Dynamically update cfs_overload_cpus
>   sched/fair: Hoist idle_stamp up from idle_balance
>   sched/fair: Generalize the detach_task interface
>   sched/fair: Provide can_migrate_task_llc
>   sched/fair: Steal work from an overloaded CPU when CPU goes idle
>   sched/fair: disable stealing if too many NUMA nodes
>   sched/fair: Provide idle search schedstats

(resend, reformatted)

Thanks very much to everyone who has commented on my patch series.
Here are the issues to be addressed in V2 of the series, along with
the person who suggested each change or raised the issue that led to it.

Changes for V2:
  * Remove stray patch 10 hunk from patch 5 (Valentin)
  * Fix "warning: label out defined but not used" for !CONFIG_SCHED_SMT
(Valentin)
  * Set SCHED_STEAL_NODE_LIMIT_DEFAULT to 2 (Steve)
  * Call try_steal iff avg_idle exceeds some small threshold (Steve, Valentin)

Possible future work:
  * Use sparsemask and stealing for RT (Steve, Peter)
  * Remove the core and socket levels from idle_balance() and let stealing
handle those levels (Steve, Peter)
  * Delete idle_balance() and use stealing exclusively for handling new idle
(Steve, Peter)
  * Test specjbb multi-warehouse on 8-node systems when stealing for
large NUMA systems is revisited (Peter)
  * Enhance stealing to handle misfits (Valentin)
  * Lower time threshold for task_hot within LLC (Valentin)

Dropped:
  * Skip try_steal() if we bail out of idle_balance() because
    !this_rq->rd->overload (Valentin)
    I tried it and saw no difference.  Dropped for simplicity.
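
As a rough illustration (not the series' actual sparsemask code), the "steal
the first migratable task found when the bitmap is traversed" search from the
cover letter amounts to scanning the overload bitmap outward from the newly
idle CPU; the function name and flat 64-bit mask are assumptions for the
sketch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model of the overload-bitmap search: scan the other CPUs
 * in the LLC, starting just after the newly idle CPU and wrapping, and
 * return the first overloaded candidate (or -1 if none is found).
 */
static int pick_overloaded_cpu(uint64_t overload_mask, int this_cpu, int nr_cpus)
{
	int i;

	for (i = 1; i < nr_cpus; i++) {
		int cpu = (this_cpu + i) % nr_cpus;

		if (overload_mask & (1ULL << cpu))
			return cpu;
	}
	return -1;
}
```

Taking the first hit rather than the best hit is what keeps the idle-path
search time minimal, per the cover letter.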

Does anyone else plan to review the code?  Please tell me now, even if your
review will be delayed.  If yes, I will wait for all comments before producing
V2.  The code changes so far are small.

- Steve


Re: [PATCH] [PATCH V7] watchdog/core: Add watchdog_thresh command line parameter

2018-11-01 Thread Thomas Gleixner
Laurence,

On Tue, 30 Oct 2018, Laurence Oberman wrote:

This looks much better.

But please send your patches first to yourself. Your subject ended up with
'[PATCH] [PATCH V7]' instead of just '[PATCH V7]'

> Both graphics and serial consoles are exposed to hard lockups
> when handling a large amount of messaging. The kernel watchdog_thresh
> parameter up to now has not been available to be set on the kernel line for

Kernel command line, please. "kernel line" does not mean anything.

> early boot.
> This patch allows the setting of watchdog_thresh to be increased

Again:

 git grep 'This patch' Documentation/process/

This leads you to:

 "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour."

Documentation is there for a reason.

> Both graphics and serial consoles are exposed to hard lockups
> when handling a large amount of messaging. The kernel watchdog_thresh
> parameter up to now has not been available to be set on the kernel line for
> early boot.
> This patch allows the setting of watchdog_thresh to be increased
> when needed to extend the hard lockup timer in the console code.
> Note that this also affects the soft lockup detector

Just for nitpicking's sake: the changelog is slightly confusing. Let me
give you a suggestion:

  The hard and soft lockup detector threshold has a default value of 10
  seconds which can only be changed via sysctl.

  During early boot lockup detection can trigger when noisy debugging emits
  a large amount of messages to the console, but there is no way to set a
  larger threshold on the kernel command line. The detector can only be
  completely disabled.

  Add a new watchdog_thresh= command line parameter to allow boot time
  control over the threshold. It works in the same way as the sysctl and
  affects both the soft and the hard lockup detectors.

Hmm?
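
For context, a userspace sketch of what validating such a parameter involves;
this assumes the same 0..60 second range the watchdog_thresh sysctl enforces,
and the helper name is made up:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch of watchdog_thresh= validation.  Assumes the sysctl's 0..60
 * second range; 0 disables both lockup detectors, and the soft lockup
 * threshold is derived as twice the hard lockup threshold.
 */
static int parse_watchdog_thresh(const char *arg, int *hard, int *soft)
{
	char *end;
	long v = strtol(arg, &end, 10);

	if (end == arg || *end != '\0' || v < 0 || v > 60)
		return -1;
	*hard = (int)v;
	*soft = (int)(v * 2);
	return 0;
}
```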

> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4957,6 +4957,14 @@
>   or other driver-specific files in the
>   Documentation/watchdog/ directory.
>  
> +watchdog_thresh=

Please use tabs instead of spaces

> +[KNL]
> +Set the hard lockup detector stall duration
> +threshold in seconds. The soft lockup detector
> +threshold is set to twice the value. A value of 0
> +disables both lockup detectors. Default is 10
> +seconds.

Same here.

Thanks,

tglx



Re: [PATCH V2 1/2] dt-bindings: mmc: sdhci-msm: Add new compatible string for sdcdc10 DLL

2018-11-01 Thread Veerabhadrarao Badiganti




On 10/13/2018 2:18 AM, Rob Herring wrote:

On Mon, Oct 08, 2018 at 06:55:37PM +0530, Veerabhadrarao Badiganti wrote:

On the SDHC-MSM controllers which make use of sdcdc10 variant DLLs,
the DLL configuration needs to be restored whenever the controller
clocks are gated. This new compatible string denotes the sdhc-msm
controller variant which uses the sdcdc10 DLL.

What is sdcdc10 DLL?

Seems like a work-around to not having SoC specific compatible strings.


Will update this to an SoC-specific compatible string.


Signed-off-by: Veerabhadrarao Badiganti 
---
  Documentation/devicetree/bindings/mmc/sdhci-msm.txt | 4 
  1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/mmc/sdhci-msm.txt 
b/Documentation/devicetree/bindings/mmc/sdhci-msm.txt
index 3720385..49b0a43 100644
--- a/Documentation/devicetree/bindings/mmc/sdhci-msm.txt
+++ b/Documentation/devicetree/bindings/mmc/sdhci-msm.txt
@@ -6,7 +6,11 @@ and the properties used by the sdhci-msm driver.
  Required properties:
  - compatible: Should contain:
"qcom,sdhci-msm-v4" for sdcc versions less than 5.0
+   "qcom,sdhci-msm-v4-sdcdc10" for sdcc versions < 5.0 and
+   which makes use of sdcdc10 variant DLLs.
"qcom,sdhci-msm-v5" for sdcc versions >= 5.0
+   "qcom,sdhci-msm-v5-sdcdc10" for sdcc versions >= 5.0 and
+   which makes use of sdcdc10 variant DLLs.
For SDCC version 5.0.0, MCI registers are removed from SDCC
interface and some registers are moved to HC. New compatible
string is added to support this change - "qcom,sdhci-msm-v5".
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Thanks
Veera


Re: [PATCH 17/28] tools include uapi: Update asound.h copy

2018-11-01 Thread Takashi Sakamoto

Hi Arnaldo,

On Nov 1 2018 4:29, Arnaldo Carvalho de Melo wrote:

Wouldn't a symlink be simpler?


That would be equivalent to using it directly, see my response to
Takashi for a few of the reasons this is done this way.


I plan to add changes to 'include/uapi/sound/asound.h' during the
development period for Linux kernel v4.21[1]. Should I add you to the
Cc list of the future patch to notify you of the change, or will
things go well without such special care on your side?

[1] 
https://github.com/takaswie/presentations/blob/master/20181021/contents.md



Regards

Takashi Sakamoto


RE: [RFC PATCH] Implement /proc/pid/kill

2018-11-01 Thread David Laight
From: Sent: 31 October 2018 13:28
...
> * I actually have a local variant of the patch that would have you
> open "/proc/$PID/kill/$SIGNO" instead, since different signal numbers
> have different permission checks.

I think you'd need the open() to specify some specific unusual
open modes.
Otherwise it'll be far too easy for scripts (and people) to
accidentally send every signal to every process.

Also think of the memory footprint.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


[PATCH] ASoC: qdsp6: q6afe: Fix wrong MI2S SD line mask

2018-11-01 Thread Rohit kumar
SD line masks for MI2S start from BIT 0 instead of BIT 1.
Fix all bit masks for the MI2S SD lines.

Signed-off-by: Rohit kumar 
---
 sound/soc/qcom/qdsp6/q6afe.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/sound/soc/qcom/qdsp6/q6afe.c b/sound/soc/qcom/qdsp6/q6afe.c
index 000775b..829b5e987 100644
--- a/sound/soc/qcom/qdsp6/q6afe.c
+++ b/sound/soc/qcom/qdsp6/q6afe.c
@@ -49,14 +49,14 @@
 #define AFE_PORT_I2S_SD1   0x2
 #define AFE_PORT_I2S_SD2   0x3
 #define AFE_PORT_I2S_SD3   0x4
-#define AFE_PORT_I2S_SD0_MASK  BIT(0x1)
-#define AFE_PORT_I2S_SD1_MASK  BIT(0x2)
-#define AFE_PORT_I2S_SD2_MASK  BIT(0x3)
-#define AFE_PORT_I2S_SD3_MASK  BIT(0x4)
-#define AFE_PORT_I2S_SD0_1_MASKGENMASK(2, 1)
-#define AFE_PORT_I2S_SD2_3_MASKGENMASK(4, 3)
-#define AFE_PORT_I2S_SD0_1_2_MASK  GENMASK(3, 1)
-#define AFE_PORT_I2S_SD0_1_2_3_MASKGENMASK(4, 1)
+#define AFE_PORT_I2S_SD0_MASK  BIT(0x0)
+#define AFE_PORT_I2S_SD1_MASK  BIT(0x1)
+#define AFE_PORT_I2S_SD2_MASK  BIT(0x2)
+#define AFE_PORT_I2S_SD3_MASK  BIT(0x3)
+#define AFE_PORT_I2S_SD0_1_MASKGENMASK(1, 0)
+#define AFE_PORT_I2S_SD2_3_MASKGENMASK(3, 2)
+#define AFE_PORT_I2S_SD0_1_2_MASK  GENMASK(2, 0)
+#define AFE_PORT_I2S_SD0_1_2_3_MASKGENMASK(3, 0)
 #define AFE_PORT_I2S_QUAD010x5
 #define AFE_PORT_I2S_QUAD230x6
 #define AFE_PORT_I2S_6CHS  0x7
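
The off-by-one the patch fixes is easy to check with userspace copies of the
kernel helpers (GENMASK is simplified here and only valid for small widths):

```c
#include <assert.h>

/* Userspace approximations of the kernel macros used in the patch. */
#define BIT(n)		(1UL << (n))
#define GENMASK(h, l)	(((1UL << ((h) - (l) + 1)) - 1) << (l))
```

With SD line n mapping to bit n, SD0 must be BIT(0) == 0x1; the old
BIT(0x1) == 0x2 selected SD1 instead, and every multi-line GENMASK was
shifted up by one bit in the same way.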
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.



Hello My Dear Friend,

2018-11-01 Thread Mr Marc Joseph Hebert



I am Mr Marc Joseph Hebert and I work in the Finance Risk Control/Accounts Broker 
Unit of a prestigious bank in London. Under varying state laws in the United 
Kingdom, financial institutions and other companies are required to turn over 
any funds considered "abandoned," including uncashed paychecks, forgotten bank 
account balances, unclaimed refunds, insurance payouts and contents of safe 
deposit boxes. I have the official duty to process and release unclaimed funds 
in the bank to the government treasury.

Recently, there are multiple abandoned accounts in the bank which I have 
transferred some to the government treasury. Some of these funds are what I 
want to transfer (10.6m GBP) out of the bank to a sincere and

trustworthy person for either investment purpose or sharing between us.  Can 
you handle this with confidentiality, sincerity and seriousness?

Please indicate your interest by simply replying to this email with your full 
personal details below.

(1) Your Full Name:
(2) Full Residential Address:
(3) Phone And Fax Number:
(4) Occupation:
(5) Whatsapp Number:

I anticipate your urgent response to this financial deal.

Your response should be forwarded to my private email below.

marc.joseph.hebe...@gmail.com

Sincerely,

Mr Marc Joseph Hebert
Finance Risk control/Accounts Broker Unit.


[tip:irq/urgent] irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function

2018-11-01 Thread tip-bot for Dan Carpenter
Commit-ID:  3424243e39e8ec138486926949e3668e7553125d
Gitweb: https://git.kernel.org/tip/3424243e39e8ec138486926949e3668e7553125d
Author: Dan Carpenter 
AuthorDate: Sat, 13 Oct 2018 13:22:46 +0300
Committer:  Thomas Gleixner 
CommitDate: Thu, 1 Nov 2018 12:38:48 +0100

irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function

The devm_ioremap_resource() function never returns NULL, it returns
error pointers.

Fixes: 61ce8d8d8a81 ("irqchip/irq-mvebu-sei: Add new driver for Marvell SEI")
Signed-off-by: Dan Carpenter 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Miquel Raynal 
Cc: Jason Cooper 
Cc: Andrew Lunn 
Cc: Gregory Clement 
Cc: Sebastian Hesselbarth 
Cc: Marc Zyngier 
Cc: linux-arm-ker...@lists.infradead.org
Cc: kernel-janit...@vger.kernel.org
Link: https://lkml.kernel.org/r/20181013102246.GD16086@mwanda

---
 drivers/irqchip/irq-mvebu-sei.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-mvebu-sei.c b/drivers/irqchip/irq-mvebu-sei.c
index 566d69a2edbc..add4c9c934c8 100644
--- a/drivers/irqchip/irq-mvebu-sei.c
+++ b/drivers/irqchip/irq-mvebu-sei.c
@@ -384,9 +384,9 @@ static int mvebu_sei_probe(struct platform_device *pdev)
 
sei->res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
sei->base = devm_ioremap_resource(sei->dev, sei->res);
-   if (!sei->base) {
+   if (IS_ERR(sei->base)) {
dev_err(sei->dev, "Failed to remap SEI resource\n");
-   return -ENODEV;
+   return PTR_ERR(sei->base);
}
 
/* Retrieve the SEI capabilities with the interrupt ranges */
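
The convention behind the fix can be modelled in userspace:
devm_ioremap_resource() returns either a valid pointer or an encoded -errno
packed into the top 4095 values of the address space, never NULL, so only an
IS_ERR() check catches the failure:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the kernel's ERR_PTR convention (include/linux/err.h). */
#define MAX_ERRNO	4095

#define ERR_PTR(err)	((void *)(intptr_t)(err))
#define PTR_ERR(ptr)	((long)(intptr_t)(ptr))
#define IS_ERR(ptr)	((uintptr_t)(ptr) >= (uintptr_t)-MAX_ERRNO)
```

This is also why the patch returns PTR_ERR(sei->base) instead of a hard-coded
-ENODEV: the encoded pointer already carries the real error code.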


RE: [PATCH] pinctrl: generic: Avoid several implicit enum conversions

2018-11-01 Thread David Laight
From: Nathan Chancellor
> Sent: 01 November 2018 00:04
...
> > A slightly lesser evil variant is to add a few PIN_CONFIG_CUSTOM_1
> > PIN_CONFIG_CUSTOM_2 etc at the end of the enum and just
> > #define MY_CONFIG PIN_CONFIG_CUSTOM_1
> > in all drivers that use these.
> >
> 
> Some drivers actually just define their pin config params like:
> 
> #define VAR (PIN_CONFIG_END + 1)

You probably want to add 'custom' definitions after PIN_CONFIG_END
so that a future compiler version doesn't generate an error
because (PIN_CONFIG_END + 1) isn't a valid value for the enum.
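
A minimal model of the pattern under discussion; the enumerator names and the
0x7f value are illustrative, not the actual pinconf-generic definitions:

```c
#include <assert.h>

/* Generic parameters live below PIN_CONFIG_END... */
enum pin_config_param {
	PIN_CONFIG_BIAS_DISABLE,
	PIN_CONFIG_BIAS_PULL_UP,
	/* ... */
	PIN_CONFIG_END = 0x7f,
};

/*
 * ...and driver-private ones above it.  Defined as a plain macro, the
 * value is an int rather than an out-of-range enum value; reserving
 * PIN_CONFIG_CUSTOM_1 etc. inside the enum instead would avoid the
 * implicit conversion entirely, at the cost of shared slots.
 */
#define MY_CONFIG_PARAM		(PIN_CONFIG_END + 1)
```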

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: [PATCH AUTOSEL 4.19 022/146] cpupower: Fix coredump on VMWare

2018-11-01 Thread Prarit Bhargava



On 10/31/2018 07:03 PM, Sasha Levin wrote:
> From: Prarit Bhargava 
> 
> [ Upstream commit f69ffc5d3db8f1f03fd6d1df5930f9a1fbd787b6 ]
> 
> cpupower crashes on VMWare guests.  The guests have the AMD PStateDef MSR
> (0xC0010064 + state number) set to zero.  As a result fid and did are zero
> and the crash occurs because of a divide by zero (cof = fid/did).  This
> can be prevented by checking the enable bit in the PStateDef MSR before
> calculating cof.  By doing this the value of pstate[i] remains zero and
> the value can be tested before displaying the active Pstates.
> 
> Check the enable bit in the PstateDef register for all supported families
> and only print out enabled Pstates.
> 

Hi Sasha,

This patch, f69ffc5d3db8, depends on 8c22e2f69592 ("cpupower: Fix AMD Family
0x17 msr_pstate size").  Without 8c22e2f69592 the patch below will always read a
value of "0" and not output the correct data.

8c22e2f69592 must be applied to any stable branch that f69ffc5d3db8 is applied 
to.

P.


> Signed-off-by: Prarit Bhargava 
> Cc: Shuah Khan 
> Cc: Stafford Horne 
> Signed-off-by: Shuah Khan (Samsung OSG) 
> Signed-off-by: Sasha Levin 
> ---
>  tools/power/cpupower/utils/cpufreq-info.c | 2 ++
>  tools/power/cpupower/utils/helpers/amd.c  | 5 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/tools/power/cpupower/utils/cpufreq-info.c 
> b/tools/power/cpupower/utils/cpufreq-info.c
> index df43cd45d810..ccd08dd00996 100644
> --- a/tools/power/cpupower/utils/cpufreq-info.c
> +++ b/tools/power/cpupower/utils/cpufreq-info.c
> @@ -200,6 +200,8 @@ static int get_boost_mode(unsigned int cpu)
>   printf(_("Boost States: %d\n"), b_states);
>   printf(_("Total States: %d\n"), pstate_no);
>   for (i = 0; i < pstate_no; i++) {
> + if (!pstates[i])
> + continue;
>   if (i < b_states)
>   printf(_("Pstate-Pb%d: %luMHz (boost state)"
>"\n"), i, pstates[i]);
> diff --git a/tools/power/cpupower/utils/helpers/amd.c 
> b/tools/power/cpupower/utils/helpers/amd.c
> index bb41cdd0df6b..58d23997424d 100644
> --- a/tools/power/cpupower/utils/helpers/amd.c
> +++ b/tools/power/cpupower/utils/helpers/amd.c
> @@ -119,6 +119,11 @@ int decode_pstates(unsigned int cpu, unsigned int 
> cpu_family,
>   }
>   if (read_msr(cpu, MSR_AMD_PSTATE + i, &pstate))
>   return -1;
> + if ((cpu_family == 0x17) && (!pstate.fam17h_bits.en))
> + continue;
> + else if (!pstate.bits.en)
> + continue;
> +
>   pstates[i] = get_cof(cpu_family, pstate);
>   }
>   *no = i;
> 


Re: [PATCH v2 2/5] mfd: lochnagar: Add support for the Cirrus Logic Lochnagar

2018-11-01 Thread Richard Fitzgerald

On 01/11/18 10:28, Charles Keepax wrote:

On Mon, Oct 29, 2018 at 11:04:39AM +, Lee Jones wrote:

On Fri, 26 Oct 2018, Mark Brown wrote:

On Fri, Oct 26, 2018 at 09:00:51AM +0100, Lee Jones wrote:

On Thu, 25 Oct 2018, Richard Fitzgerald wrote:


I have re-ordered some of the quotes here for the benefit
of clarity. I will start with the Lochnagar focused bits
and then cover some of the more general regmap discussion.
Apologies for the wall of text towards the end but it seemed
wise to explore some of the why for parts of the current regmap
implementation and some of the implications for changing it.


I think from the perspective of Richard and Charles who are just trying
to get their driver merged this is something of an abstract distinction.
If the driver were merged and this discussion were happening separately
their perspective would most likely be different.


Charles has already mentioned that he'd take a look at the current
*use* (not changing the API, but the way in which Lochnagar
*uses/consumes* it).  Actions which would be most welcomed.


I kind of said this above but just to be clear this driver seems to me
like an idiomatic user of the regmap API as it is today.  My guess is
that we could probably lose the defaults tables and not suffer too much
but equally well they don't hurt from a regmap point of view.


Perhaps Charles could elaborate on whether this is possible or not?


So on this front I have had a look through and we can drop the
defaults tables for Lochnagar, although as I shall cover later
Lochnagar is the exceptional case with respect to our normal
devices.


Utilising range support here would certainly help IMHO.



I am rather hesitant to switch to the range API. Whilst it is
significantly clearer in certain situations, such as, say,
mapping out the memory for a DSP, where there exist large
amorphous blobs of identical-function registers, I am really
not convinced it suits something like the register map
that controls Lochnagar. You have an intertwined mix of
various-purpose registers, each with a clearly defined name
and potentially with quite fine-grained properties.

Certainly when working with the driver I want to be able to
see fairly quickly, by name, what registers are marked as. The
callbacks are very simple and I don't need to look up where
registers are in the regmap to be able to check their attributes.
But perhaps I have just got too used to seeing these callbacks;
things do have a way of becoming normal after being exposed to
them for a while.

What I will try for the next spin is to use as much:

   case REG1...REG2:

as I can without totally sacrificing grepability/clarity, and
hopefully that will get us to a compromise we can all live
with, at least for now. Lochnagar is a fairly small part, so it
feels like this should be doable.


+   ret = devm_of_platform_populate(dev);
+   if (ret < 0) {
+   dev_err(dev, "Failed to populate child nodes: %d\n", ret);
+   return ret;
+   }


Please do not mix OF and MFD device registration
strategies.

Pick one and register all devices through your chosen
method.


Hmmm we use this to do things like register some fixed regulators
and clocks that don't need any control but do need to be associated
with the device. I could do that through the MFD although it is in
direct conflict with the feedback on the clock patches I received
to move the fixed clocks into devicetree rather than registering
them manually (see v2 of the patch chain).


Then I suggest moving everything to DT.


So pulling this out from earlier discussions in this thread,
it seems I can happily move all the child device registration
into device tree. I will also try this for the next version of
the patch, unless anyone wants to object? But it does change
the DT binding quite a lot as the individual sub drivers now
each require their own node rather than one single unified
Lochnagar node.



We went through this discussion with the Madera MFD patches. I had
originally implemented it using DT to register the child drivers and
it was nice in some ways each driver having its own node. But Mark
and Rob didn't like it so I went back to non-DT child registration with
all sharing the parent MFD node. It would be nice if we could stick to
one way of doing it so that Cirrus drivers don't flip-flop between
different styles of DT binding.

With the Madera MFD, the bindings are now fixed around non-DT
registration of children. If there were ever a reason to register
DT-defined children (and there are potential uses, like adding
platform-specific virtual regulators treated as a child), we would
end up with mixed non-DT and DT registration.





Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-11-01 Thread John Garry


Agreed. slit_valid() on the ACPI parsing path currently enforces that before
acpi_numa_slit_init(), which calls into numa_set_distance(). Hence the arch
NUMA code's numa_set_distance() never had the opportunity to do the sanity
checks, as ACPI's slit_valid() has already rejected the entire table.

Unlike the ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
checks on the distance values parsed from the "distance-matrix" property,
so all the checks fall directly on numa_set_distance(). This needs to
be fixed in line with ACPI:

* If (to == from) ---> distance = LOCAL_DISTANCE
* If (to != from) ---> distance > LOCAL_DISTANCE

At the same time it's okay to just enhance numa_set_distance() test coverage
to include this new test. If we trusted firmware parsing all the
way, the existing basic checks on node range and distance would not have
been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
part, we should include this new check there as well.


I don't see what we gain by duplicating the check. In fact, it has a few
downsides:

  (1) It confuses the semantics of the API, because it is no longer clear
  who "owns" the check

  (2) It duplicates code in each architecture

  (3) Some clever-cloggs will remove at least some of the duplication in
  future

I'm not willing to accept the check in the arm64 code if we update the
OF code.

I think the way forward here is for John to fix the crash he reported by
adding the check to the OF code.


I was planning on doing that.

If somebody wants to follow up with

subsequent patches to move more of the checking out of the arch code, then
we can review that as a separate series.


Cheers,
John



Will

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

.






ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!

2018-11-01 Thread kbuild test robot
Hi Matias,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   5b7449810ae6d652629c550d3974c8453836d229
commit: 73569e11032fc5a9b314b6351632cfca7793afd5 lightnvm: remove dependencies on BLK_DEV_NVME and PCI
date:   3 weeks ago
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 73569e11032fc5a9b314b6351632cfca7793afd5
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   ERROR: "ia64_delay_loop" [drivers/spi/spi-thunderx.ko] undefined!
>> ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!
   ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] 
undefined!
   ERROR: "ia64_delay_loop" [drivers/net/phy/mdio-cavium.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] ARM:kexec:offline panic_smp_self_stop CPU

2018-11-01 Thread Russell King - ARM Linux
On Thu, Nov 01, 2018 at 07:20:49PM +0800, Wang Yufen wrote:
> From: Yufen Wang 
> 
> In case panic() is called at the same time on two different CPUs.
> For example:
> CPU 0:
>   panic()
>  __crash_kexec
>machine_crash_shutdown
>  crash_smp_send_stop
>machine_kexec
>  BUG_ON(num_online_cpus() > 1);
> 
> CPU 1:
>   panic()
> local_irq_disable
> panic_smp_self_stop
> 
> If CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop(), kdump
> fails: CPU1 can't receive the IPI IRQ, so CPU1 will always stay online.
> I changed BUG_ON to WARN in kexec crash as arm64 does, but kdump still
> fails, because with num_online_cpus() > 1 we can't disable the L2 in
> _soft_restart. To fix this problem, this patch splits out
> panic_smp_self_stop() and adds set_cpu_online(smp_processor_id(), false).

Thanks.

I think this may as well go into arch/arm/kernel/smp.c - it won't be
required for single-CPU systems, since there aren't "other" CPUs.

It's probably also worth a comment above the function as to why we
have this.

> 
> Signed-off-by: Yufen Wang 
> ---
>  arch/arm/kernel/setup.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index 31940bd..151861f 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -602,6 +602,16 @@ static void __init smp_build_mpidr_hash(void)
>  }
>  #endif
>  
> +void panic_smp_self_stop(void)
> +{
> + printk(KERN_DEBUG "CPU %u will stop doing anything useful since another CPU has paniced\n",
> + smp_processor_id());
> + set_cpu_online(smp_processor_id(), false);
> + while (1)
> + cpu_relax();
> +
> +}
> +
>  static void __init setup_processor(void)
>  {
>   struct proc_info_list *list;
> -- 
> 2.7.4
> 
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


RE: [PATCH v3] Implement /proc/pid/kill

2018-11-01 Thread David Laight
From: Daniel Colascione
> Sent: 31 October 2018 19:33
...
> You can't do it today with kill. The idea that keeping an open file
> descriptor to a /proc/pid or a file within it prevents PID reuse is
> widespread, but incorrect.

Is there a really good reason why that shouldn't be the case?
i.e. holding a reference on the 'struct pid' being enough to stop reuse.

A patch to do that would be more generally useful.

David



Re: [RFC PATCH v2] Minimal non-child process exit notification support

2018-11-01 Thread Daniel Colascione
On Thu, Nov 1, 2018 at 10:47 AM, Aleksa Sarai  wrote:
> On 2018-11-01, Daniel Colascione  wrote:
>> On Thu, Nov 1, 2018 at 7:00 AM, Aleksa Sarai  wrote:
>> > On 2018-10-29, Daniel Colascione  wrote:
>> >> This patch adds a new file under /proc/pid, /proc/pid/exithand.
>> >> Attempting to read from an exithand file will block until the
>> >> corresponding process exits, at which point the read will successfully
>> >> complete with EOF.  The file descriptor supports both blocking
>> >> operations and poll(2). It's intended to be a minimal interface for
>> >> allowing a program to wait for the exit of a process that is not one
>> >> of its children.
>> >>
>> >> Why might we want this interface? Android's lmkd kills processes in
>> >> order to free memory in response to various memory pressure
>> >> signals. It's desirable to wait until a killed process actually exits
>> >> before moving on (if needed) to killing the next process. Since the
>> >> processes that lmkd kills are not lmkd's children, lmkd currently
>> >> lacks a way to wait for a process to actually die after being sent
>> >> SIGKILL; today, lmkd resorts to polling the proc filesystem pid
>> >> entry. This interface allows lmkd to give up polling and instead block
>> >> and wait for process death.
>> >
>> > I agree with the need for this interface (with a few caveats), but there
>> > are a few points I'd like to make:
>> >
>> >  * I don't think that making a new procfile is necessary. When you open
>> >/proc/$pid you already have a handle for the underlying process, and
>> >you can already poll to check whether the process has died (fstatat
>> >fails for instance). What if we just used an inotify event to tell
>> >userspace that the process has died -- to avoid userspace doing a
>> >poll loop?
>>
>> I'm trying to make a simple interface. The basic unix data access
>> model is that a userspace application wants information (e.g., next
>> bunch of bytes in a file, next packet from a socket, next signal from
>> a signal FD, etc.), and tells the kernel so by making a system call on
>> a file descriptor. Ordinarily, the kernel returns to userspace with
>> the requested information when it's available, potentially after
>> blocking until the information is available. Sometimes userspace
>> doesn't want to block, so it adds O_NONBLOCK to the open file mode,
>> and in this mode, the kernel can tell the userspace requestor "try
>> again later", but the source of truth is still that
>> ordinarily-blocking system call. How does userspace know when to try
>> again in the "try again later" case? By using
>> select/poll/epoll/whatever, which suggests a good time for that "try
>> again later" retry, but is not dispositive about it, since that
>> ordinarily-blocking system call is still the sole source of truth, and
>> that poll is allowed to report spurious readabilty.
>
> inotify gives you an event if a file or directory is deleted. A pid
> dying semantically is similar to the idea of a /proc/$pid being deleted.
> I don't see how a blocking read on a new procfile is simpler than using
> the existing notification-on-file-events infrastructure -- not to
> mention that the idea of "this file blocks until the thing we are
> indirectly referencing by this file is gone" seems to me to be a really
> strange interface.

There's another subtlety: we don't want to wait until a process's proc
entry is *gone*. We want to wait until the process is *dead* (since
that's when its resources are released). If a process becomes a zombie
instead of going TASK_DEAD immediately, then it dies before its
/proc/pid directory disappears, so any API that talks about /proc/pid
directory presence will do the wrong thing. inotify gets this subtlety
wrong because a process is an object, not a file, and not a directory.
Using filesystem monitoring APIs to monitor process state is simply
the wrong approach, because you're mixing up some interface label for
an object with the object itself. Confusing objects and labels is how
we got the F_SETLK mess, among other things.

In other words, a process has an entire lifecycle in its own right
independent of whatever procfs is doing. Procfs is just an interface
for learning things about processes and doesn't control process
lifetime, so basing the "source of truth" for process notifications on
whatever is happening over in procfs is going to cause problems sooner
or later. We care about the process. (That's not the same as struct
task for various reasons, but logically, a process is a coherent
entity.)

> Sure, it uses read(2) -- but is that the only constraint on designing
> simple interfaces?

No, but if an interface requires some kind of setup procedure,
listener registration, event queue draining, switching on event queue
notification types, and scan-on-queue-overflow behavior, chances are
that it's not a simple interface.

>> The event file I'm proposing is so ordinary, in fact, that it works
>> from the shell. Without some 

Re: [GIT PULL] Addition of the I3C subsystem in 4.20 (5.0)?

2018-11-01 Thread Greg Kroah-Hartman
On Thu, Nov 01, 2018 at 06:35:23AM +0100, Boris Brezillon wrote:
> Hello Greg, Linus,
> 
> Greg, I didn't get your feedback on v10 of the i3c patchset [1] where I
> was asking if you'd agree to have this framework merged in 4.20 (I know
> you were busy with the 4.19 release and after that the Kernel Summit, so
> that's not a surprise), so I'm sending this PR. It's really not an attempt
> to bypass you, more like a ping that, if positively acknowledged, might
> lead to inclusion of the I3C framework in 4.20 (5.0?) :-).
> 
> Please let me know if you're opposed to the idea.
> 
> Regards,
> 
> Boris
> 
> [1] https://lkml.org/lkml/2018/10/26/552

The last set of patches was sent way too late for this -rc cycle so I
was going to review them again once 4.20-rc1 was out.

Also you do not have an ack from the DT maintainers yet, which would be
required as well, right?

So I don't think this can be merged just yet, sorry.

greg k-h


Re: [RFC PATCH v2] Minimal non-child process exit notification support

2018-11-01 Thread Daniel Colascione
On Thu, Nov 1, 2018 at 10:47 AM, Aleksa Sarai  wrote:
> On 2018-11-01, Daniel Colascione  wrote:
>> On Thu, Nov 1, 2018 at 7:00 AM, Aleksa Sarai  wrote:
>> > On 2018-10-29, Daniel Colascione  wrote:
>> >> This patch adds a new file under /proc/pid, /proc/pid/exithand.
>> >> Attempting to read from an exithand file will block until the
>> >> corresponding process exits, at which point the read will successfully
>> >> complete with EOF.  The file descriptor supports both blocking
>> >> operations and poll(2). It's intended to be a minimal interface for
>> >> allowing a program to wait for the exit of a process that is not one
>> >> of its children.
>> >>
>> >> Why might we want this interface? Android's lmkd kills processes in
>> >> order to free memory in response to various memory pressure
>> >> signals. It's desirable to wait until a killed process actually exits
>> >> before moving on (if needed) to killing the next process. Since the
>> >> processes that lmkd kills are not lmkd's children, lmkd currently
>> >> lacks a way to wait for a process to actually die after being sent
>> >> SIGKILL; today, lmkd resorts to polling the proc filesystem pid
>> entry. This interface allows lmkd to give up polling and instead block
>> >> and wait for process death.
>> >
>> > I agree with the need for this interface (with a few caveats), but there
>> > are a few points I'd like to make:
>> >
>> >  * I don't think that making a new procfile is necessary. When you open
>> >/proc/$pid you already have a handle for the underlying process, and
>> >you can already poll to check whether the process has died (fstatat
>> >fails for instance). What if we just used an inotify event to tell
>> >userspace that the process has died -- to avoid userspace doing a
>> >poll loop?
>>
>> I'm trying to make a simple interface. The basic unix data access
>> model is that a userspace application wants information (e.g., next
>> bunch of bytes in a file, next packet from a socket, next signal from
>> a signal FD, etc.), and tells the kernel so by making a system call on
>> a file descriptor. Ordinarily, the kernel returns to userspace with
>> the requested information when it's available, potentially after
>> blocking until the information is available. Sometimes userspace
>> doesn't want to block, so it adds O_NONBLOCK to the open file mode,
>> and in this mode, the kernel can tell the userspace requestor "try
>> again later", but the source of truth is still that
>> ordinarily-blocking system call. How does userspace know when to try
>> again in the "try again later" case? By using
>> select/poll/epoll/whatever, which suggests a good time for that "try
>> again later" retry, but is not dispositive about it, since that
>> ordinarily-blocking system call is still the sole source of truth, and
>> that poll is allowed to report spurious readability.
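The blocking-read-with-O_NONBLOCK model described in the quoted paragraph can be demonstrated with plain POSIX file descriptors, independent of the proposed interface; a minimal sketch on a pipe:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* With O_NONBLOCK set, a read that finds no data returns -1 with errno
 * set to EAGAIN ("try again later") instead of blocking; the read call
 * itself remains the source of truth, exactly as described above. */
static ssize_t try_read_nonblocking(int fd, char *buf, size_t len)
{
    int flags = fcntl(fd, F_GETFL);

    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    return read(fd, buf, len);
}
```

poll(2) on the same descriptor then merely hints when a retry is likely to succeed.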
>
> inotify gives you an event if a file or directory is deleted. A pid
> dying semantically is similar to the idea of a /proc/$pid being deleted.
> I don't see how a blocking read on a new procfile is simpler than using
> the existing notification-on-file-events infrastructure -- not to
> mention that the idea of "this file blocks until the thing we are
> indirectly referencing by this file is gone" seems to me to be a really
> strange interface.

There's another subtlety: we don't want to wait until a process's proc
entry is *gone*. We want to wait until the process is *dead* (since
that's when its resources are released). If a process becomes a zombie
instead of going TASK_DEAD immediately, then it dies before its
/proc/pid directory disappears, so any API that talks about /proc/pid
directory presence will do the wrong thing. inotify gets this subtlety
wrong because a process is an object, not a file, and not a directory.
Using filesystem monitoring APIs to monitor process state is simply
the wrong approach, because you're mixing up some interface label for
an object with the object itself. Confusing objects and labels is how
we got the F_SETLK mess, among other things.

In other words, a process has an entire lifecycle in its own right
independent of whatever procfs is doing. Procfs is just an interface
for learning things about processes and doesn't control process
lifetime, so basing the "source of truth" for process notifications on
whatever is happening over in procfs is going to cause problems sooner
or later. We care about the process. (That's not the same as struct
task for various reasons, but logically, a process is a coherent
entity.)

> Sure, it uses read(2) -- but is that the only constraint on designing
> simple interfaces?

No, but if an interface requires some kind of setup procedure,
listener registration, event queue draining, switching on event queue
notification types, and scan-on-queue-overflow behavior, chances are
that it's not a simple interface.
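For illustration, the usage model argued for here would look roughly like this in a client such as lmkd. Note that /proc/<pid>/exithand is the interface proposed in this thread, not a merged kernel API, so on current kernels the open() simply fails:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch only: the path and EOF-on-exit semantics come from the patch
 * description in this thread. Returns 0 once the process has exited
 * (the read hits EOF), -1 if the file cannot be opened. */
static int wait_for_exit(pid_t pid)
{
    char path[64];
    char buf[1];
    ssize_t n;
    int fd;

    snprintf(path, sizeof(path), "/proc/%d/exithand", (int)pid);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;              /* process already gone, or no kernel support */
    n = read(fd, buf, sizeof(buf));  /* blocks until the process dies */
    close(fd);
    return n == 0 ? 0 : -1;     /* EOF signals process death */
}
```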

>> The event file I'm proposing is so ordinary, in fact, that it works
>> from the shell. Without some 

Re: [PATCH 2/2] arm64: acpi: Prepare for longer MADTs

2018-11-01 Thread Lorenzo Pieralisi
On Fri, Oct 12, 2018 at 02:29:37PM -0500, Jeremy Linton wrote:
> The BAD_MADT_GICC_ENTRY check is a little too strict because
> it rejects MADT entries that don't match the currently known
> lengths. We should remove this restriction to avoid problems
> if the table length changes. Future code which might depend on
> additional fields should be written to validate those fields
> before using them, rather than trying to globally check
> known MADT version lengths.
> 
> Signed-off-by: Jeremy Linton 
> ---
>  arch/arm64/include/asm/acpi.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Hi Jeremy,

I assume there is no [PATCH 1/2] (to make sure I have not missed
anything).

> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index 709208dfdc8b..4d0946bd485a 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -22,12 +22,12 @@
>  #include 
>  
>  /* Macros for consistency checks of the GICC subtable of MADT */
> -#define ACPI_MADT_GICC_LENGTH\
> - (acpi_gbl_FADT.header.revision < 6 ? 76 : 80)
> +#define ACPI_MADT_GICC_MIN_LENGTH   ACPI_OFFSET(  \
> + struct acpi_madt_generic_interrupt, efficiency_class)

I would add a comment explaining why the efficiency_class offset
corresponds to the min length, I will do it myself before sending it
upstream.

I will take this patch and send it to Will/Catalin with ARM64 ACPI
material for v4.21.

Thanks,
Lorenzo

>  
>  #define BAD_MADT_GICC_ENTRY(entry, end) \
> - (!(entry) || (entry)->header.length != ACPI_MADT_GICC_LENGTH || \
> - (unsigned long)(entry) + ACPI_MADT_GICC_LENGTH > (end))
> + (!(entry) || (entry)->header.length < ACPI_MADT_GICC_MIN_LENGTH || \
> + (unsigned long)(entry) + (entry)->header.length > (end))
>  
>  /* Basic configuration for ACPI */
>  #ifdef   CONFIG_ACPI
> -- 
> 2.14.3
> 
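The relaxed check in the quoted diff can be restated as a standalone predicate: an entry is acceptable if it is non-NULL, at least as long as the smallest known GICC layout, and fully contained in the table. gicc_entry_ok() is an illustrative name, not a kernel symbol:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative restatement of the relaxed BAD_MADT_GICC_ENTRY logic:
 * compare against a minimum length instead of an exact per-revision
 * length, and reject entries that overrun the end of the table. */
static bool gicc_entry_ok(unsigned long entry, unsigned int len,
                          unsigned int min_len, unsigned long table_end)
{
    return entry != 0 && len >= min_len && entry + len <= table_end;
}
```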


Re: [tip:locking/core 5/6] /bin/bash: scripts/atomic/check-atomics.sh: No such file or directory

2018-11-01 Thread Mark Rutland
On Thu, Nov 01, 2018 at 06:16:22PM +0800, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core
> head:   30bc7baa9de81efc0584b9290ce8040a1130f156
> commit: 85f8507192fbfb4ad2ac01de879cb50045f4247f [5/6] locking/atomics: Check generated headers are up-to-date
> config: i386-tinyconfig (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
> git checkout 85f8507192fbfb4ad2ac01de879cb50045f4247f
> # save the attached .config to linux build tree
> make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
> >> /bin/bash: scripts/atomic/check-atomics.sh: No such file or directory
>make[2]: *** [old-atomics] Error 127
>make[2]: Target '__build' not remade because of errors.
>make[1]: *** [prepare0] Error 2
>make[1]: Target 'prepare' not remade because of errors.
>make: *** [sub-make] Error 2

It looks like we accidentally dropped execute permissions from the
scripts when picking them from the list. Locally I get a slightly
different failure:

  CALL    scripts/atomic/check-atomics.sh
  scripts/atomic/check-atomics.sh: line 16: scripts/atomic/gen-atomic-instrumented.sh: Permission denied
  warning: include/asm-generic/atomic-instrumented.h is out-of-date.
  scripts/atomic/check-atomics.sh: line 16: scripts/atomic/gen-atomic-long.sh: Permission denied
  warning: include/asm-generic/atomic-long.h is out-of-date.
  scripts/atomic/check-atomics.sh: line 16: scripts/atomic/gen-atomic-fallback.sh: Permission denied
  warning: include/linux/atomic-fallback.h is out-of-date.

I've sent a patch [1] to fix this.

Thanks,
Mark.

[1] https://lkml.kernel.org/r/20181101112654.55898-1-mark.rutl...@arm.com
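Until the fix propagates, a local workaround is simply to restore the execute bits by hand; a minimal sketch, demonstrated on a scratch directory so it is safe to run anywhere:

```shell
# Recreate the relevant layout in a scratch directory, then restore the
# execute bits with chmod. In a real kernel tree you would run the chmod
# from the source root instead.
dir=$(mktemp -d)
mkdir -p "$dir/scripts/atomic"
touch "$dir/scripts/atomic/gen-atomic-long.sh"
cd "$dir"
chmod +x scripts/atomic/*.sh    # restore execute permission
ls -l scripts/atomic/gen-atomic-long.sh   # now shows the execute bits
```

In a real git checkout, `git update-index --chmod=+x scripts/atomic/*.sh` additionally records the mode change so it survives a fresh clone.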


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-11-01 Thread Will Deacon
[ Nit: Please wrap your lines when replying -- I've done it for you here ]

On Tue, Oct 30, 2018 at 08:16:21AM +0530, Anshuman Khandual wrote:
> On 10/29/2018 08:14 PM, John Garry wrote:
> >> I think we should either factor out the sanity check into a core
> >> helper or make the core code robust to these funny configurations.
> >
> > OK, so to me it would make sense to factor out a sanity check into a
> > core helper.
> 
>  That, or have the OF code perform the same validation that slit_valid()
>  is doing for ACPI. I'm just trying to avoid other architectures running
>  into this problem down the line.
> 
> >>>
> >>> Right, OF code should do this validation job if ACPI is doing it
> >>> (especially since the DT bindings actually specify the distance
> >>> rules), and not rely on the arch NUMA code to accept/reject
> >>> numa_set_distance() combinations.
> >>
> >> I would say this particular condition checking still falls under the
> >> arch NUMA init code's sanity checks, like the other basic tests that
> >> numa_set_distance() already does, but it should not be a necessity
> >> for the OF driver to check these.
> > 
> > The checks in the arch NUMA code mean that invalid inter-node distance
> > combinations are ignored.
> 
> Right, and should not this new test (from != to && distance ==
> LOCAL_DISTANCE) be one of them as well? numa_set_distance() updates the
> table or just throws some warnings while skipping entries it deems
> invalid. It would be okay to have this new check there in addition to
> the others, as this patch suggests.

If we're changing numa_set_distance(), I'd actually be inclined to do the
opposite of what you're suggesting and move as much of this boilerplate
checking into the core code. Perhaps we could add something like
check_fw_numa_topology() and have both ACPI and OF call that so that the
arch doesn't need to worry about malformed tables at all.

> > However, if any entries in the table are invalid, then the whole table
> > can be discarded as none of it can be believed, i.e. it's better to
> > validate the table.
> >
> 
> Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
> acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
> NUMA code numa_set_distance() never had the opportunity to do the sanity
> checks as ACPI slit_valid() has completely invalidated the table.
> 
> Unlike the ACPI path, of_numa_parse_distance_map_v1() does not do any
> sanity checks on the distance values parsed from the "distance-matrix"
> property, and all the checks fall directly on numa_set_distance(). This
> needs to be fixed in line with ACPI:
> 
> * If (to == from) ---> distance = LOCAL_DISTANCE
> * If (to != from) ---> distance > LOCAL_DISTANCE
> 
> At the same time it's okay to just extend numa_set_distance()'s test
> coverage to include this new check. If we trusted firmware parsing all
> the way, the existing basic checks on node range and distance would not
> have been in numa_set_distance() at all. Hence IMHO even if we fix the
> OF driver part, we should include this new check there as well.

I don't see what we gain by duplicating the check. In fact, it has a few
downsides:

  (1) It confuses the semantics of the API, because it is no longer clear
  who "owns" the check

  (2) It duplicates code in each architecture

  (3) Some clever-cloggs will remove at least some of the duplication in
  future

I'm not willing to accept the check in the arm64 code if we update the
OF code.

I think the way forward here is for John to fix the crash he reported by
adding the check to the OF code. If somebody wants to follow up with
subsequent patches to move more of the checking out of the arch code, then
we can review that as a separate series.

Will
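The distance rules quoted in this thread (self distance equal to LOCAL_DISTANCE, remote distances strictly greater) can be sketched as a single whole-table validator, roughly the shape a shared check_fw_numa_topology() helper might take. The function name and layout here are illustrative only, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>

#define LOCAL_DISTANCE 10   /* same value the kernel uses */

/* Minimal sketch of whole-table validation under the rules discussed
 * above; dist is an nr x nr matrix in row-major order. If any entry is
 * invalid, the caller should discard the whole table rather than skip
 * individual entries, since none of it can then be believed. */
static bool fw_numa_distances_valid(int nr, const int *dist)
{
    for (int from = 0; from < nr; from++) {
        for (int to = 0; to < nr; to++) {
            int d = dist[from * nr + to];

            if (from == to && d != LOCAL_DISTANCE)
                return false;   /* self distance must be LOCAL_DISTANCE */
            if (from != to && d <= LOCAL_DISTANCE)
                return false;   /* remote distance must exceed it */
        }
    }
    return true;
}
```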


[PATCH] locking/atomics: restore execute permission to scripts

2018-11-01 Thread Mark Rutland
We accidentally dropped execute permissions from the atomics scripts,
which leads to the check-atomics.sh script failing at build time:

  CALL    scripts/atomic/check-atomics.sh
  scripts/atomic/check-atomics.sh: line 16: scripts/atomic/gen-atomic-instrumented.sh: Permission denied
  warning: include/asm-generic/atomic-instrumented.h is out-of-date.
  scripts/atomic/check-atomics.sh: line 16: scripts/atomic/gen-atomic-long.sh: Permission denied
  warning: include/asm-generic/atomic-long.h is out-of-date.
  scripts/atomic/check-atomics.sh: line 16: scripts/atomic/gen-atomic-fallback.sh: Permission denied
  warning: include/linux/atomic-fallback.h is out-of-date.

Fix this by restoring execute permissions to all the atomics scripts.

Fixes: ace9bad4df2684f3 ("locking/atomics: Add common header generation files")
Signed-off-by: Mark Rutland 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
---
 scripts/atomic/atomic-tbl.sh  | 0
 scripts/atomic/check-atomics.sh   | 0
 scripts/atomic/gen-atomic-fallback.sh | 0
 scripts/atomic/gen-atomic-instrumented.sh | 0
 scripts/atomic/gen-atomic-long.sh | 0
 5 files changed, 0 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 scripts/atomic/atomic-tbl.sh
 mode change 100644 => 100755 scripts/atomic/check-atomics.sh
 mode change 100644 => 100755 scripts/atomic/gen-atomic-fallback.sh
 mode change 100644 => 100755 scripts/atomic/gen-atomic-instrumented.sh
 mode change 100644 => 100755 scripts/atomic/gen-atomic-long.sh

diff --git a/scripts/atomic/atomic-tbl.sh b/scripts/atomic/atomic-tbl.sh
old mode 100644
new mode 100755
diff --git a/scripts/atomic/check-atomics.sh b/scripts/atomic/check-atomics.sh
old mode 100644
new mode 100755
diff --git a/scripts/atomic/gen-atomic-fallback.sh 
b/scripts/atomic/gen-atomic-fallback.sh
old mode 100644
new mode 100755
diff --git a/scripts/atomic/gen-atomic-instrumented.sh 
b/scripts/atomic/gen-atomic-instrumented.sh
old mode 100644
new mode 100755
diff --git a/scripts/atomic/gen-atomic-long.sh 
b/scripts/atomic/gen-atomic-long.sh
old mode 100644
new mode 100755
-- 
2.11.0



Re: [PATCH 3/3] kobject: fix warnings use pr_* to replace printk

2018-11-01 Thread YU Bo

On Wed, Oct 31, 2018 at 09:48:15AM -0700, Joe Perches wrote:

On Wed, 2018-10-31 at 09:41 -0400, YU Bo wrote:

Hi,
On Tue, Oct 30, 2018 at 08:01:50AM -0700, Joe Perches wrote:
> On Tue, 2018-10-30 at 08:01 -0400, Bo YU wrote:
> > Fix warning from checkpatch.pl use pr_* to replace printk
>
> If you look at msg, it can be unterminated with newline.
>
> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> []
> > @@ -224,7 +224,7 @@ int kobject_synth_uevent(struct kobject *kobj, const char *buf, size_t count)
> >  out:
> >   if (r) {
> >   devpath = kobject_get_path(kobj, GFP_KERNEL);
> > - printk(KERN_WARNING "synth uevent: %s: %s",
> > + pr_warn("synth uevent: %s: %s",
> >  devpath ?: "unknown device",
> >  msg ?: "failed to send uevent");
> >   kfree(devpath);
>
> Perhaps this block should be:
>
> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> index 63d0816ab23b..0ba1197f366e 100644
> --- a/lib/kobject_uevent.c
> +++ b/lib/kobject_uevent.c
> @@ -195,12 +195,12 @@ int kobject_synth_uevent(struct kobject *kobj, const char *buf, size_t count)
>enum kobject_action action;
>const char *action_args;
>struct kobj_uevent_env *env;
> -  const char *msg = NULL, *devpath;
> +  const char *msg = NULL;
>int r;
>
>r = kobject_action_type(buf, count, &action, &action_args);
>if (r) {
> -  msg = "unknown uevent action string\n";
> +  msg = "unknown uevent action string";
>goto out;
>}
>
> @@ -212,7 +212,7 @@ int kobject_synth_uevent(struct kobject *kobj, const char *buf, size_t count)
>r = kobject_action_args(action_args,
>count - (action_args - buf), &msg);
>if (r == -EINVAL) {
> -  msg = "incorrect uevent action arguments\n";
> +  msg = "incorrect uevent action arguments";
>goto out;
>}
>
> @@ -223,10 +223,11 @@ int kobject_synth_uevent(struct kobject *kobj, const char *buf, size_t count)
>kfree(env);
> out:
>if (r) {
> -  devpath = kobject_get_path(kobj, GFP_KERNEL);
> -  printk(KERN_WARNING "synth uevent: %s: %s",
> - devpath ?: "unknown device",
> - msg ?: "failed to send uevent");
> +  char *devpath = kobject_get_path(kobj, GFP_KERNEL);
> +
> +  pr_warn("synth uevent: %s: %s\n",
> +  devpath ?: "unknown device",
> +  msg ?: "failed to send uevent");
>kfree(devpath);
>}
>return r;
Sorry, but I have two stupid questions to annoy you:
Q1: If I agree with your patch, what should I do? Add an Acked-by tag,
something else, or nothing?


Send V2 with content and explanation of what
it does and why.


Q2: In fact, I do not know how to test the patch. Just cat /sys/bus/pci/*
or something?


That wouldn't work as these go to dmesg only on
certain conditions.

It's a trivial patch, obviously correct.

Thank you, Joe, I will do that.





[PATCH] ARM:kexec:offline panic_smp_self_stop CPU

2018-11-01 Thread Wang Yufen
From: Yufen Wang 

If panic() is called at the same time on different CPUs.
For example:
CPU 0:
  panic()
    __crash_kexec()
      machine_crash_shutdown()
        crash_smp_send_stop()
      machine_kexec()
        BUG_ON(num_online_cpus() > 1);

CPU 1:
  panic()
    local_irq_disable()
    panic_smp_self_stop()

If CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop(), kdump
fails: CPU 1 can no longer receive the stop IPI and stays online forever.
Changing the BUG_ON to a WARN, as arm64 does, does not help either;
because num_online_cpus() > 1, the L2 cannot be disabled in _soft_restart.
To fix this, this patch provides an ARM-specific panic_smp_self_stop()
that also calls set_cpu_online(smp_processor_id(), false).

Signed-off-by: Yufen Wang 
---
 arch/arm/kernel/setup.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 31940bd..151861f 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -602,6 +602,16 @@ static void __init smp_build_mpidr_hash(void)
 }
 #endif
 
+void panic_smp_self_stop(void)
+{
+   printk(KERN_DEBUG "CPU %u will stop doing anything useful since another CPU has paniced\n",
+   smp_processor_id());
+   set_cpu_online(smp_processor_id(), false);
+   while (1)
+   cpu_relax();
+
+}
+
 static void __init setup_processor(void)
 {
struct proc_info_list *list;
-- 
2.7.4







Re: [PATCH 07/10] sched/fair: Provide can_migrate_task_llc

2018-11-01 Thread Valentin Schneider
On 31/10/2018 19:14, Peter Zijlstra wrote:
> On Mon, Oct 29, 2018 at 07:34:50PM +, Valentin Schneider wrote:
>> On a sidenote, I find it a bit odd that the exec_start threshold depends on
>> sysctl_sched_migration_cost, which to me is more about idle_balance() cost
>> than "how long does it take for a previously run task to go cache cold".
> 
> A long long time ago (think 2.6.20) there was code that did boot-time
> measurement of the various cache topology costs and that threshold was
> related to that.
> 
> That code got killed because of boot to boot variance and dubious
> benefits.
> 
> The migration cost is what it was replaced with as a single measure.
> 
> migration cost was then later abused in the newidle balance because it
> was over eager.


Thanks! I tried following the blame rabbit hole but got a bit lost along
the way.

> Ideally we'd get rid of it there, because we've now got
> that much more elaborate accounting, but Rohit tried and found some
> regression because of that.
> 
> Maybe we should remove it from newidle anyway.
> 

Well, if we ditch idle_balance() in favor of stealing, that would "solve"
it...



Re: [PATCH] of: Fix cpu node iterator to not ignore disabled cpu nodes

2018-11-01 Thread Michael Ellerman
Rob Herring  writes:

> In most cases, nodes with 'status = "disabled";' are treated as if the
> node is not present though it is a common bug to forget to check that.
> However, cpu nodes are different in that "disabled" simply means offline
> and the OS can bring the CPU core online. Commit f1f207e43b8a ("of: Add
> cpu node iterator for_each_of_cpu_node()") followed the common behavior
> of ignoring disabled cpu nodes. This breaks some powerpc systems (at
> least NXP P50XX/e5500). Fix this by dropping the status check.
>
> Fixes: 651d44f9679c ("of: use for_each_of_cpu_node iterator")
> Fixes: f1f207e43b8a ("of: Add cpu node iterator for_each_of_cpu_node()")
> Reported-by: Michael Ellerman 
> Cc: Christian Zigotzky 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  drivers/of/base.c | 2 --
>  1 file changed, 2 deletions(-)

This fixes my machine, thanks.

Tested-by: Michael Ellerman 

It was actually originally reported to me by Christian, so also:

Reported-by: Christian Zigotzky 


cheers

> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index cc62da278663..e47c5ce6cd58 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -776,8 +776,6 @@ struct device_node *of_get_next_cpu_node(struct 
> device_node *prev)
>   if (!(of_node_name_eq(next, "cpu") ||
> (next->type && !of_node_cmp(next->type, "cpu"))))
>   continue;
> - if (!__of_device_is_available(next))
> - continue;
>   if (of_node_get(next))
>   break;
>   }
> -- 
> 2.19.1



Re: INFO: task hung in fuse_sb_destroy

2018-11-01 Thread Dmitry Vyukov
On Thu, Nov 1, 2018 at 11:49 AM, syzbot
 wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:59fc453b21f7 Merge branch 'akpm' (patches from Andrew)
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15fb244740
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ea045471e4c756e8
> dashboard link: https://syzkaller.appspot.com/bug?extid=6339eda9cb4ebbc4c37b
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=178a105d40
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1665113340


I can easily reproduce this.

The repro gives me a task hung at:

# cat /proc/7563/task/*/stack
[<0>] fuse_wait_aborted+0x20b/0x320
[<0>] fuse_sb_destroy+0xe2/0x1d0
[<0>] fuse_kill_sb_anon+0x15/0x20
[<0>] deactivate_locked_super+0x97/0x100
[<0>] deactivate_super+0x2bb/0x320
[<0>] cleanup_mnt+0xbf/0x160
[<0>] __cleanup_mnt+0x16/0x20
[<0>] task_work_run+0x1e8/0x2a0
[<0>] exit_to_usermode_loop+0x318/0x380
[<0>] do_syscall_64+0x6be/0x820
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0x

I double checked that writing to /sys/fs/fuse/connections/44/abort did
not help (the only entry in fuse/connections). I wrote to it multiple times
and tried to kill the task; nothing helps.



> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+6339eda9cb4ebbc4c...@syzkaller.appspotmail.com
>
> INFO: task syz-executor221:17414 blocked for more than 140 seconds.
>   Not tainted 4.19.0+ #313
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor221 D23048 17414   5652 0x0004
> Call Trace:
>  context_switch kernel/sched/core.c:2831 [inline]
>  __schedule+0x8cf/0x21d0 kernel/sched/core.c:3472
>  schedule+0xfe/0x460 kernel/sched/core.c:3516
>  fuse_wait_aborted+0x20b/0x320 fs/fuse/dev.c:2155
>  fuse_sb_destroy+0xe2/0x1d0 fs/fuse/inode.c:1224
>  fuse_kill_sb_anon+0x15/0x20 fs/fuse/inode.c:1234
>  deactivate_locked_super+0x97/0x100 fs/super.c:329
>  deactivate_super+0x2bb/0x320 fs/super.c:360
>  cleanup_mnt+0xbf/0x160 fs/namespace.c:1098
>  __cleanup_mnt+0x16/0x20 fs/namespace.c:1105
>  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
>  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
>  exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
>  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x446689
> Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
> 0f 83 7b 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7f621e979da8 EFLAGS: 0293 ORIG_RAX: 00a6
> RAX:  RBX: 006dbc28 RCX: 00446689
> RDX: 00446689 RSI: 000a RDI: 2180
> RBP: 006dbc20 R08:  R09: 
> R10:  R11: 0293 R12: 006dbc2c
> R13: 0030656c69662f2e R14: 65646f6d746f6f72 R15: 
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1009:
>  #0: 95618e4f (rcu_read_lock){}, at:
> debug_show_all_locks+0xd0/0x424 kernel/locking/lockdep.c:4379
> 5 locks held by rsyslogd/5530:
>  #0: 28bad575 (>f_pos_lock){+.+.}, at: __fdget_pos+0x1bb/0x200
> fs/file.c:766
>  #1: e31592cd (>lock){-.-.}, at: rq_lock
> kernel/sched/sched.h:1126 [inline]
>  #1: e31592cd (>lock){-.-.}, at: __schedule+0x236/0x21d0
> kernel/sched/core.c:3410
>  #2: 95618e4f (rcu_read_lock){}, at: trace_sched_stat_runtime
> include/trace/events/sched.h:418 [inline]
>  #2: 95618e4f (rcu_read_lock){}, at: update_curr+0x383/0xbd0
> kernel/sched/fair.c:830
>  #3: e31592cd (>lock){-.-.}, at: rq_lock
> kernel/sched/sched.h:1126 [inline]
>  #3: e31592cd (>lock){-.-.}, at: ttwu_queue
> kernel/sched/core.c:1845 [inline]
>  #3: e31592cd (>lock){-.-.}, at: try_to_wake_up+0x9f6/0x1490
> kernel/sched/core.c:2057
>  #4: 95618e4f (rcu_read_lock){}, at: trace_sched_stat_runtime
> include/trace/events/sched.h:418 [inline]
>  #4: 95618e4f (rcu_read_lock){}, at: update_curr+0x383/0xbd0
> kernel/sched/fair.c:830
> 2 locks held by getty/5620:
>  #0: 492d5ad8 (>ldisc_sem){}, at: ldsem_down_read+0x32/0x40
> drivers/tty/tty_ldsem.c:353
>  #1: 88c4d769 (>atomic_read_lock){+.+.}, at:
> n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
> 2 locks held by getty/5621:
>  #0: ed56cf3c (>ldisc_sem){}, at: ldsem_down_read+0x32/0x40
> drivers/tty/tty_ldsem.c:353
>  #1: a8112d49 (>atomic_read_lock){+.+.}, at:
> n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
> 2 locks held by 


Re: NXP P50XX/e5500 secondary CPUs not onlined with current mainline (was [PATCH 20/21] of: use for_each_of_cpu_node iterator)

2018-11-01 Thread Michael Ellerman
Rob Herring  writes:
> On Wed, Oct 31, 2018 at 7:46 AM Michael Ellerman  wrote:
>> Rob Herring  writes:
>> > Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This
>> > has the side effect of defaulting to iterating using "cpu" node names in
>> > preference to the deprecated (for FDT) device_type == "cpu".
>> >
>> > Cc: Frank Rowand 
>> > Cc: devicet...@vger.kernel.org
>> > Signed-off-by: Rob Herring 
>> > ---
>> > Please ack and I will take via the DT tree. This is dependent on the
>> > first 2 patches.
>> >
>> >  drivers/of/base.c|  2 +-
>> >  drivers/of/of_numa.c | 15 ++-
>> >  2 files changed, 3 insertions(+), 14 deletions(-)
>> >
>> > diff --git a/drivers/of/base.c b/drivers/of/base.c
>> > index 6389aeb2f48c..8285c07cab44 100644
>> > --- a/drivers/of/base.c
>> > +++ b/drivers/of/base.c
>> > @@ -389,7 +389,7 @@ struct device_node *of_get_cpu_node(int cpu, unsigned 
>> > int *thread)
>> >  {
>> >   struct device_node *cpun;
>> >
>> > - for_each_node_by_type(cpun, "cpu") {
>> > + for_each_of_cpu_node(cpun) {
>> >   if (arch_find_n_match_cpu_physical_id(cpun, cpu, thread))
>> >   return cpun;
>> >   }
>>
>> Previously we just looked for any node with a type of "cpu", but now
>> we're using for_each_of_cpu_node(), which does:
>>
>> for (; next; next = next->sibling) {
>> if (!(of_node_name_eq(next, "cpu") ||
>>   (next->type && !of_node_cmp(next->type, "cpu"))))
>> continue;
>>
>> if (!__of_device_is_available(next))
>> continue;
>> 
>>
>> if (of_node_get(next))
>> break;
>> }
>>
>>
>> ie. the available check is new.
>>
>> On this machine the 2nd CPU is not marked as available:
>>
>>   root@p5020ds:/proc/device-tree/cpus/PowerPC,e5500@1# lsprop status
>>   status   "disabled"
>>
>> This has the effect of preventing the SMP code from finding the 2nd CPU
>> in order to bring it up (in smp_85xx_start_cpu()). And so only the boot
>> CPU is onlined.
>>
>> The device tree is built from a dts:
>>
>>   arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
>>
>> But we don't set the status in there, so presumably u-boot is changing
>> the status during boot? (not a u-boot expert).
>
> Ah, status for cpus is a bit different. For most nodes, it should be
> equivalent to the node not being present, but for cpus it means
> offline if disabled. Though ARM platforms have never used it in that
> way.

Aha. We don't use it like that on server CPUs either, so perhaps it's
just a u-boot thing.

>> We could work around this in the platform code presumably, but I'm
>> worried this might break other things as well. You didn't mention the
>> addition of the available check in the change log so I wonder if it was
>> deliberate or just seemed like a good idea?
>
> Just seemed like a good idea...

Yeah fair enough.

> I'll send a patch now dropping those 2 lines.

Awesome, thanks.

cheers




Re: [git pull] mount API series

2018-11-01 Thread Steven Whitehouse

Hi,


On 31/10/18 16:18, Linus Torvalds wrote:

On Tue, Oct 30, 2018 at 10:34 PM Al Viro  wrote:

 mount API series from David Howells.  Last cycle's objections
had been of the "I'd do it differently" variety and with no such
differently done variants having ever materialized over several
cycles...

Having just lurked on the discussions about the bugs in here, I don't
think this is ready. Maybe they got fixed, but if so, it was recent
and it was pretty fundamental.

The stated aim of the series is to make the mount API _better_, not
worse. And right now it looks like a "two steps back, one theoretical
step forwards" kind of better.

   Linus


The design of the new mount API has been under discussion for some time, 
both on the mailing lists, and also at LSF/MM too. Al and David (and 
others) have put a lot of work into getting to the current position, and 
have specifically requested input from Eric about his concerns over past 
cycles.


When I look at the discussions I'm seeing two main issues (please 
correct me if you think I'm wrong about this) which are (a) whether the 
design is correct and (b) whether there are still bugs in the current 
patch set.


Which of these are you most concerned about? It seems to me that there 
is not a lot of point in spending a large amount of time in additional 
review/testing of the current patch set if the overall design is set to 
be rejected. If your concerns are only with the robustness/stability of 
the patch set, then that provides a clear route to resolving the current 
impasse, at least assuming that Eric is able to enumerate the issues 
that he has discovered.


It looks like David has already provided a fix for one of the issues 
which Eric mentioned in his recent email. Eric, it would be good if you 
could confirm that this addresses that particular concern.


Steve.




[PULL REQUEST] i2c for 4.20

2018-11-01 Thread Wolfram Sang
Linus,

I2C has a core bugfix & cleanup as well as an ID addition and
MAINTAINERS update for you.

Please pull.

Thanks,

   Wolfram


The following changes since commit 9bb9d4fdce9e6b351b7b905f150745a0fc06:

  Merge branch 'for-linus-4.20-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml (2018-10-31 15:46:16 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-next

for you to fetch changes up to 012ebc3b7801fcf424d0ebb4689c98f90a8593e0:

  MAINTAINERS: add maintainer for IMX LPI2C driver (2018-10-31 23:37:34 +)


A.s. Dong (2):
  dt-bindings: i2c: i2c-imx-lpi2c: add imx8qxp compatible string
  MAINTAINERS: add maintainer for IMX LPI2C driver

Charles Keepax (2):
  i2c: Remove unnecessary call to irq_find_mapping
  i2c: Clear client->irq in i2c_device_remove


with much appreciated quality assurance from

Benjamin Tissoires (2):
  (Rev.) i2c: Clear client->irq in i2c_device_remove
  (Rev.) i2c: Remove unnecessary call to irq_find_mapping

Rob Herring (1):
  (Rev.) dt-bindings: i2c: i2c-imx-lpi2c: add imx8qxp compatible string

 Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.txt | 1 +
 MAINTAINERS | 8 
 drivers/i2c/i2c-core-base.c | 7 +++
 3 files changed, 12 insertions(+), 4 deletions(-)






Re: [PATCH] kretprobe: produce sane stack traces

2018-11-01 Thread Aleksa Sarai
On 2018-11-01, Masami Hiramatsu  wrote:
> > > > Anyway, until that merge happens, this patch looks good to avoid
> > > > this issue for generic solution (e.g. for the arch which doesn't
> > > > supports retstack).
> > > 
> > > I think its time to come up with an algorithm that makes function graph
> > > work with multiple users, and have kretprobes be able to hook into it
> > > just like kprobes hooks into function tracer.
> > > 
> > > I have some ideas on how to get this done, and will try to have an RFC
> > > patch set ready by plumbers.
> > 
> > Should I continue working on this patchset?
> 
> Yes, until we finally introduce Steven's algorithm on all arch (yeah, we still
> have some archs which don't support graph-tracer but supports kprobes),
> I think your patch is the best fix for this issue.

Thanks, I just sent a v3.

Though, even with Steven's hooking of kprobes I think you'd still need
to stash away the stack trace somewhere (or am I misunderstanding the
proposal?).

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH







Re: [PATCH v3 0/7] Standardize onboard LED support for 96Boards

2018-11-01 Thread Heiko Stübner
Am Mittwoch, 31. Oktober 2018, 22:17:29 CET schrieb Linus Walleij:
> On Wed, Oct 31, 2018 at 4:47 PM Michal Simek  wrote:
> > No doubt about it that this is good. If this is there from day 1 all will
> > be good. I am just saying that we are all the time
> > saying that we shouldn't break userspace. Right now if there is single
> > application which uses existing names we are breaking it.
> 
> Yeah that's the problem, the approach is usually that if a tree falls
> in the forest and noone is there to hear it, then it doesn't make a
> sound.
> 
> But we can't assume someone is there as the safe default either.
> 
> I'd say apply it and see what happens, people are supposed to test.

At least for the Rockchip boards I'm somewhat confident that they should
be new enough that nobody has built kludges around them - especially as I think
LEDs might not be the most interesting device on such boards :-) .

So while I can apply the Rockchip patches, I'm still hoping for some
kind of consensus beforehand, lest we roll it back after the fact ;-) .

Heiko




Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE

2018-11-01 Thread Konstantin Khlebnikov




On 01.11.2018 13:24, Michal Hocko wrote:

On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:

Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.


I would go on and say that allocations with sizes too large can actually
trigger a warning (one you have posted in the previous version outside
of the changelog area) because that might be interesting to people -
there are deployments that panic on warning, and then a warning is much
more important.


It seems that the warning isn't completely valid.


__alloc_pages_slowpath() handles this more gracefully:

/*
 * In the slowpath, we sanity check order to avoid ever trying to
 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
 * be using allocators in order of preference for an area that is
 * too large.
 */
if (order >= MAX_ORDER) {
WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
return NULL;
}


The fast path is ready for order >= MAX_ORDER.


The problem is in node_reclaim(), which is called earlier than
__alloc_pages_slowpath() from a surprising place - get_page_from_freelist().


Probably node_reclaim() simply needs something like this:

if (order >= MAX_ORDER)
return NODE_RECLAIM_NOSCAN;





Signed-off-by: Konstantin Khlebnikov 


Acked-by: Michal Hocko 

Thanks!


---
 mm/util.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/util.c b/mm/util.c
index 8bf08b5b5760..f5f04fa22814 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	gfp_t kmalloc_flags = flags;
 	void *ret;
 
+	if (size > KMALLOC_MAX_SIZE)
+		goto fallback;
+
 	/*
 	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
 	 * so the given set of flags has to be compatible.
@@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
+fallback:
 	return __vmalloc_node_flags_caller(size, node, flags,
 			__builtin_return_address(0));
 }





INFO: task hung in fuse_sb_destroy

2018-11-01 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:    59fc453b21f7 Merge branch 'akpm' (patches from Andrew)
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15fb244740
kernel config:  https://syzkaller.appspot.com/x/.config?x=ea045471e4c756e8
dashboard link: https://syzkaller.appspot.com/bug?extid=6339eda9cb4ebbc4c37b
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=178a105d40
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1665113340

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6339eda9cb4ebbc4c...@syzkaller.appspotmail.com

INFO: task syz-executor221:17414 blocked for more than 140 seconds.
  Not tainted 4.19.0+ #313
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor221 D23048 17414   5652 0x0004
Call Trace:
 context_switch kernel/sched/core.c:2831 [inline]
 __schedule+0x8cf/0x21d0 kernel/sched/core.c:3472
 schedule+0xfe/0x460 kernel/sched/core.c:3516
 fuse_wait_aborted+0x20b/0x320 fs/fuse/dev.c:2155
 fuse_sb_destroy+0xe2/0x1d0 fs/fuse/inode.c:1224
 fuse_kill_sb_anon+0x15/0x20 fs/fuse/inode.c:1234
 deactivate_locked_super+0x97/0x100 fs/super.c:329
 deactivate_super+0x2bb/0x320 fs/super.c:360
 cleanup_mnt+0xbf/0x160 fs/namespace.c:1098
 __cleanup_mnt+0x16/0x20 fs/namespace.c:1105
 task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
 do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446689
Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 7b 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00

RSP: 002b:7f621e979da8 EFLAGS: 0293 ORIG_RAX: 00a6
RAX:  RBX: 006dbc28 RCX: 00446689
RDX: 00446689 RSI: 000a RDI: 2180
RBP: 006dbc20 R08:  R09: 
R10:  R11: 0293 R12: 006dbc2c
R13: 0030656c69662f2e R14: 65646f6d746f6f72 R15: 

Showing all locks held in the system:
1 lock held by khungtaskd/1009:
 #0: 95618e4f (rcu_read_lock){}, at:  
debug_show_all_locks+0xd0/0x424 kernel/locking/lockdep.c:4379

5 locks held by rsyslogd/5530:
 #0: 28bad575 (>f_pos_lock){+.+.}, at: __fdget_pos+0x1bb/0x200  
fs/file.c:766
 #1: e31592cd (>lock){-.-.}, at: rq_lock  
kernel/sched/sched.h:1126 [inline]
 #1: e31592cd (>lock){-.-.}, at: __schedule+0x236/0x21d0  
kernel/sched/core.c:3410
 #2: 95618e4f (rcu_read_lock){}, at: trace_sched_stat_runtime  
include/trace/events/sched.h:418 [inline]
 #2: 95618e4f (rcu_read_lock){}, at: update_curr+0x383/0xbd0  
kernel/sched/fair.c:830
 #3: e31592cd (>lock){-.-.}, at: rq_lock  
kernel/sched/sched.h:1126 [inline]
 #3: e31592cd (>lock){-.-.}, at: ttwu_queue  
kernel/sched/core.c:1845 [inline]
 #3: e31592cd (>lock){-.-.}, at: try_to_wake_up+0x9f6/0x1490  
kernel/sched/core.c:2057
 #4: 95618e4f (rcu_read_lock){}, at: trace_sched_stat_runtime  
include/trace/events/sched.h:418 [inline]
 #4: 95618e4f (rcu_read_lock){}, at: update_curr+0x383/0xbd0  
kernel/sched/fair.c:830

2 locks held by getty/5620:
 #0: 492d5ad8 (>ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: 88c4d769 (>atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/5621:
 #0: ed56cf3c (>ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: a8112d49 (>atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/5622:
 #0: 858703c2 (>ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: b20ff0f8 (>atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/5623:
 #0: a0163126 (>ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: cb4be99e (>atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/5624:
 #0: 6eab39a0 (>ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: 341e7ea5 (>atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/5625:
 #0: e1bb9e75 (>ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: e6c38e03 (>atomic_read_lock){+.+.}, at:  

Re: [RFC PATCH v2] Minimal non-child process exit notification support

2018-11-01 Thread Aleksa Sarai
On 2018-11-01, Daniel Colascione  wrote:
> On Thu, Nov 1, 2018 at 7:00 AM, Aleksa Sarai  wrote:
> > On 2018-10-29, Daniel Colascione  wrote:
> >> This patch adds a new file under /proc/pid, /proc/pid/exithand.
> >> Attempting to read from an exithand file will block until the
> >> corresponding process exits, at which point the read will successfully
> >> complete with EOF.  The file descriptor supports both blocking
> >> operations and poll(2). It's intended to be a minimal interface for
> >> allowing a program to wait for the exit of a process that is not one
> >> of its children.
> >>
> >> Why might we want this interface? Android's lmkd kills processes in
> >> order to free memory in response to various memory pressure
> >> signals. It's desirable to wait until a killed process actually exits
> >> before moving on (if needed) to killing the next process. Since the
> >> processes that lmkd kills are not lmkd's children, lmkd currently
> >> lacks a way to wait for a process to actually die after being sent
> >> SIGKILL; today, lmkd resorts to polling the proc filesystem pid
> entry. This interface allows lmkd to give up polling and instead block
> >> and wait for process death.
> >
> > I agree with the need for this interface (with a few caveats), but there
> > are a few points I'd like to make:
> >
> >  * I don't think that making a new procfile is necessary. When you open
> >/proc/$pid you already have a handle for the underlying process, and
> >you can already poll to check whether the process has died (fstatat
> >fails for instance). What if we just used an inotify event to tell
> >userspace that the process has died -- to avoid userspace doing a
> >poll loop?
> 
> I'm trying to make a simple interface. The basic unix data access
> model is that a userspace application wants information (e.g., next
> bunch of bytes in a file, next packet from a socket, next signal from
> a signal FD, etc.), and tells the kernel so by making a system call on
> a file descriptor. Ordinarily, the kernel returns to userspace with
> the requested information when it's available, potentially after
> blocking until the information is available. Sometimes userspace
> doesn't want to block, so it adds O_NONBLOCK to the open file mode,
> and in this mode, the kernel can tell the userspace requestor "try
> again later", but the source of truth is still that
> ordinarily-blocking system call. How does userspace know when to try
> again in the "try again later" case? By using
> select/poll/epoll/whatever, which suggests a good time for that "try
> again later" retry, but is not dispositive about it, since that
> ordinarily-blocking system call is still the sole source of truth, and
> that poll is allowed to report spurious readability.

inotify gives you an event if a file or directory is deleted. A pid
dying semantically is similar to the idea of a /proc/$pid being deleted.
I don't see how a blocking read on a new procfile is simpler than using
the existing notification-on-file-events infrastructure -- not to
mention that the idea of "this file blocks until the thing we are
indirectly referencing by this file is gone" seems to me to be a really
strange interface.

Sure, it uses read(2) -- but is that the only constraint on designing
simple interfaces?

> The event file I'm proposing is so ordinary, in fact, that it works
> from the shell. Without some specific technical reason to do something
> different, we shouldn't do something unusual.

inotify-tools are available on effectively every distribution.

> Given that we *can*, cheaply, provide a clean and consistent API to
> userspace, why would we instead want to inflict some exotic and
> hard-to-use interface on userspace instead? Asking that userspace poll
> on a directory file descriptor and, when poll returns, check by
> looking for certain errors (we'd have to spec which ones) from fstatat
> is awkward. /proc/pid is a directory. In what other context does the
> kernel ask userspace to use a directory this way?

I'm not sure you understood my proposal. I said that we need an
interface to do this, and I was trying to explain (by noting what the
current way of doing it would be) what I think the interface should be.

To reiterate, I believe that having an inotify event (IN_DELETE_SELF on
/proc/$pid) would be in keeping with the current way of doing things but
allowing userspace to avoid all of the annoyances you just mentioned and
I was alluding to.

I *don't* think that the current scheme of looping on fstatat is the way
it should be left. And there is an argument that inotify is not
sufficient to 

> > I'm really not a huge fan of the "blocking read" semantic (though if we
> > have to have it, can we at least provide as much information as you get
> > from proc_connector -- such as the exit status?).
> [...]
> The exit status in /proc/pid/stat is zeroed out for readers that fail
> do_task_stat's ptrace_may_access call. (Falsifying the exit status in
> 

RE: [PATCH] x86/build: Build VSMP support only if selected

2018-11-01 Thread Shai Fultheim (s...@scalemp.com)
On 01/11/18 11:37, Thomas Gleixner wrote:

> VSMP support is built even if CONFIG_X86_VSMP is not set. This leads to a 
> build
> breakage when CONFIG_PCI is disabled as well.
> 
> Build VSMP code only when selected.

This patch disables detect_vsmp_box() on systems without CONFIG_X86_VSMP, due to
the recent commit 6da63eb241a05b0e676d68975e793c0521387141.  This is a significant
regression that will affect a significant number of deployments.

We will reply shortly with an updated patch that fixes the dependency on pv_irq_ops
and reverts to CONFIG_PARAVIRT, with proper protection for CONFIG_PCI.



[PATCH v1 2/8] perf/x86/intel: add pmi callback support

2018-11-01 Thread Wei Wang
This patch adds PMI callback support for the counters reserved by
components outside the perf core. For example, a hypervisor may register
such a callback to have the guest notified when a PMI is received.

The host PMI handling requires active_events to be non-zero, so we
need active_events to be at least 1 in x86_perf_mask_perf_counters when
there are counters used by the guest.

Signed-off-by: Wei Wang 
Cc: Peter Zijlstra 
Cc: Andi Kleen 
Cc: Paolo Bonzini 
---
 arch/x86/events/core.c            | 10 ++++++++++
 arch/x86/events/intel/core.c      | 27 +++++++++++++++++++++++++++
 arch/x86/events/perf_event.h      |  4 ++++
 arch/x86/include/asm/perf_event.h |  5 +++++
 4 files changed, 46 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e73135a..504d102 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2408,6 +2408,16 @@ void x86_perf_mask_perf_counters(u64 mask)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
 	/*
+	 * Increment active_events by 1 if there are counters reserved for
+	 * the guest to use, and no need to do more increments if there are
+	 * already counters taken by the guest.
+	 */
+	if (!cpuc->intel_ctrl_guest_mask && mask)
+		atomic_inc(&active_events);
+	else if (cpuc->intel_ctrl_guest_mask && !mask)
+		atomic_dec(&active_events);
+
+	/*
 	 * If the counter happens to be used by a host event, take it back
 	 * first, and then restart the pmu after mask that counter as being
 	 * reserved.
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 0fb8659..f5e5191 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2283,6 +2283,13 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	 */
 	status |= cpuc->intel_cp_status;
 
+	if (status & cpuc->intel_ctrl_guest_mask) {
+		cpuc->pmi_callback(cpuc->pmi_opaque,
+				   status & cpuc->intel_ctrl_guest_mask);
+		status &= ~cpuc->intel_ctrl_guest_mask;
+		handled++;
+	}
+
 	for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
 		struct perf_event *event = cpuc->events[bit];
 
@@ -3162,6 +3169,26 @@ struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
 }
 EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
 
+void x86_perf_register_pmi_callback(pmi_callback_t callback, void *opaque)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	cpuc->pmi_callback = callback;
+	cpuc->pmi_opaque = opaque;
+
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+}
+EXPORT_SYMBOL_GPL(x86_perf_register_pmi_callback);
+
+void x86_perf_unregister_pmi_callback(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	cpuc->pmi_callback = NULL;
+	cpuc->pmi_opaque = NULL;
+}
+EXPORT_SYMBOL_GPL(x86_perf_unregister_pmi_callback);
+
 static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index adae087..06404eb 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -197,6 +197,10 @@ struct cpu_hw_events {
 	unsigned int		txn_flags;
 	int			is_fake;
 
+	/* PMI related fields */
+	pmi_callback_t		pmi_callback;
+	void			*pmi_opaque;
+
 	/*
 	 * Intel DebugStore bits
 	 */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 5b4463e..fd33688 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -275,6 +275,8 @@ struct perf_guest_switch_msr {
 	u64 host, guest;
 };
 
+typedef void (*pmi_callback_t)(void *opaque, u64 status);
+
 extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
 extern void perf_check_microcode(void);
@@ -298,6 +300,9 @@ static inline void perf_check_microcode(void) { }
 #ifdef CONFIG_CPU_SUP_INTEL
  extern void intel_pt_handle_vmx(int on);
 extern void x86_perf_mask_perf_counters(u64 mask);
+extern void x86_perf_register_pmi_callback(pmi_callback_t callback,
+					   void *opaque);
+extern void x86_perf_unregister_pmi_callback(void);
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
-- 
2.7.4



[PATCH v1 1/8] perf/x86: add support to mask counters from host

2018-11-01 Thread Wei Wang
Add x86_perf_mask_perf_counters to reserve counters from the host perf
subsystem. The masked counters will not be assigned to any host perf
events. This can be used by the hypervisor to reserve perf counters for
a guest to use.

This function is currently supported on Intel CPUs only, but it is put in the
x86 perf core because the counter assignment is implemented here, and we need
to re-enable the pmu (which is defined in the x86 perf core) in the case
that a counter to be masked happens to be used by the host.

Signed-off-by: Wei Wang 
Cc: Peter Zijlstra 
Cc: Andi Kleen 
Cc: Paolo Bonzini 
---
 arch/x86/events/core.c            | 37 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/perf_event.h |  1 +
 2 files changed, 38 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 106911b..e73135a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -716,6 +716,7 @@ struct perf_sched {
 static void perf_sched_init(struct perf_sched *sched, struct event_constraint **constraints,
int num, int wmin, int wmax, int gpmax)
 {
+   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
int idx;
 
memset(sched, 0, sizeof(*sched));
@@ -723,6 +724,9 @@ static void perf_sched_init(struct perf_sched *sched, 
struct event_constraint **
sched->max_weight   = wmax;
sched->max_gp   = gpmax;
sched->constraints  = constraints;
+#ifdef CONFIG_CPU_SUP_INTEL
+   sched->state.used[0]= cpuc->intel_ctrl_guest_mask;
+#endif
 
for (idx = 0; idx < num; idx++) {
if (constraints[idx]->weight == wmin)
@@ -2386,6 +2390,39 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *re
}
 }
 
+#ifdef CONFIG_CPU_SUP_INTEL
+/**
+ * x86_perf_mask_perf_counters - mask perf counters
+ * @mask: the bitmask of counters
+ *
+ * Mask the perf counters that are not available to be used by the perf core.
+ * If the counter to be masked has been assigned, it will be taken back and
+ * then the perf core will re-assign usable counters to its events.
+ *
+ * This can be used by a component outside the perf core to reserve counters.
+ * For example, a hypervisor uses it to reserve counters for a guest to use,
+ * and later return the counters by another call with the related bits cleared.
+ */
+void x86_perf_mask_perf_counters(u64 mask)
+{
+   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+   /*
+* If the counter happens to be used by a host event, take it back
+* first, and then restart the pmu after mask that counter as being
+* reserved.
+*/
+   if (mask & cpuc->intel_ctrl_host_mask) {
+   perf_pmu_disable(&pmu);
+   cpuc->intel_ctrl_guest_mask = mask;
+   perf_pmu_enable(&pmu);
+   } else {
+   cpuc->intel_ctrl_guest_mask = mask;
+   }
+}
+EXPORT_SYMBOL_GPL(x86_perf_mask_perf_counters);
+#endif
+
 static inline int
 valid_user_frame(const void __user *fp, unsigned long size)
 {
diff --git a/arch/x86/include/asm/perf_event.h 
b/arch/x86/include/asm/perf_event.h
index 8bdf749..5b4463e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -297,6 +297,7 @@ static inline void perf_check_microcode(void) { }
 
 #ifdef CONFIG_CPU_SUP_INTEL
  extern void intel_pt_handle_vmx(int on);
+extern void x86_perf_mask_perf_counters(u64 mask);
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
-- 
2.7.4



[PATCH v1 2/8] perf/x86/intel: add pmi callback support

2018-11-01 Thread Wei Wang
This patch adds PMI callback support for the counters reserved by
components outside the perf core. For example, a hypervisor may register
such a callback to have the guest notified when a PMI arrives.

The host PMI handler requires active_events to be non-zero, so
x86_perf_mask_perf_counters needs to keep active_events at least 1
while any counters are reserved for the guest.

Signed-off-by: Wei Wang 
Cc: Peter Zijlstra 
Cc: Andi Kleen 
Cc: Paolo Bonzini 
---
 arch/x86/events/core.c| 10 ++
 arch/x86/events/intel/core.c  | 27 +++
 arch/x86/events/perf_event.h  |  4 
 arch/x86/include/asm/perf_event.h |  5 +
 4 files changed, 46 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e73135a..504d102 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2408,6 +2408,16 @@ void x86_perf_mask_perf_counters(u64 mask)
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
/*
+* Increment active_events by 1 if there are counters reserved for
+* the guest to use, and no need to do more increments if there are
+* already counters taken by the guest.
+*/
+   if (!cpuc->intel_ctrl_guest_mask && mask)
+   atomic_inc(&active_events);
+   else if (cpuc->intel_ctrl_guest_mask && !mask)
+   atomic_dec(&active_events);
+
+   /*
 * If the counter happens to be used by a host event, take it back
 * first, and then restart the pmu after mask that counter as being
 * reserved.
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 0fb8659..f5e5191 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2283,6 +2283,13 @@ static int handle_pmi_common(struct pt_regs *regs, u64 
status)
 */
status |= cpuc->intel_cp_status;
 
+   if (status & cpuc->intel_ctrl_guest_mask) {
+   cpuc->pmi_callback(cpuc->pmi_opaque,
+  status & cpuc->intel_ctrl_guest_mask);
+   status &= ~cpuc->intel_ctrl_guest_mask;
+   handled++;
+   }
+
	for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
 
@@ -3162,6 +3169,26 @@ struct perf_guest_switch_msr *perf_guest_get_msrs(int 
*nr)
 }
 EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
 
+void x86_perf_register_pmi_callback(pmi_callback_t callback, void *opaque)
+{
+   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+   cpuc->pmi_callback = callback;
+   cpuc->pmi_opaque = opaque;
+
+   apic_write(APIC_LVTPC, APIC_DM_NMI);
+}
+EXPORT_SYMBOL_GPL(x86_perf_register_pmi_callback);
+
+void x86_perf_unregister_pmi_callback(void)
+{
+   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+   cpuc->pmi_callback = NULL;
+   cpuc->pmi_opaque = NULL;
+}
+EXPORT_SYMBOL_GPL(x86_perf_unregister_pmi_callback);
+
 static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
 {
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index adae087..06404eb 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -197,6 +197,10 @@ struct cpu_hw_events {
unsigned inttxn_flags;
int is_fake;
 
+   /* PMI related fields */
+   pmi_callback_t  pmi_callback;
+   void*pmi_opaque;
+
/*
 * Intel DebugStore bits
 */
diff --git a/arch/x86/include/asm/perf_event.h 
b/arch/x86/include/asm/perf_event.h
index 5b4463e..fd33688 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -275,6 +275,8 @@ struct perf_guest_switch_msr {
u64 host, guest;
 };
 
+typedef void (*pmi_callback_t)(void *opaque, u64 status);
+
 extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
 extern void perf_check_microcode(void);
@@ -298,6 +300,9 @@ static inline void perf_check_microcode(void) { }
 #ifdef CONFIG_CPU_SUP_INTEL
  extern void intel_pt_handle_vmx(int on);
 extern void x86_perf_mask_perf_counters(u64 mask);
+extern void x86_perf_register_pmi_callback(pmi_callback_t callback,
+  void *opaque);
+extern void x86_perf_unregister_pmi_callback(void);
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
-- 
2.7.4



[PATCH v1 8/8] KVM/x86/vPMU: return the counters to host if guest is torn down

2018-11-01 Thread Wei Wang
Return the assigned counters to the host in the case that the guest is
torn down unexpectedly.

Signed-off-by: Wei Wang 
Cc: Andi Kleen 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
---
 arch/x86/kvm/pmu_intel.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 9eb5230..30fa8eb 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -410,7 +410,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
 {
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
-   int i;
+   u32 i, bit;
 
for (i = 0; i < INTEL_PMC_MAX_GENERIC; i++) {
	pmc = &pmu->gp_counters[i];
@@ -422,6 +422,10 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
pmc->counter = 0;
}
 
+   for_each_set_bit(bit, (unsigned long *)&pmu->assigned_pmc_bitmap,
+X86_PMC_IDX_MAX)
+   intel_pmu_put_pmc(pmu, bit);
+
pmu->fixed_ctr_ctrl = 0;
pmu->global_ctrl = 0;
pmu->global_status = 0;
-- 
2.7.4



Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

2018-11-01 Thread Dean Wallace
On 31-10-18, Pierre-Louis Bossart wrote:
> 
> > Just thought it worth mentioning, this new patch that fixes sound
> > again, seems to have resurrected an old issue with PLL unlock.  I'm
> > seeing journal entries after fresh boot ..
> > 
> > ```
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard systemd[462]: Started Sound Service.
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > picard kernel: max98090_pll_work: 141 callbacks suppressed
> > picard kernel: max98090 i2c-193C9890:00: PLL unlocked
> > ```
> > 
> > sound is ok, but sometimes plugging in headphones spams journal with
> > those PLL messages, and sound turns into "daleks", and I have to
> > remove/insert headphones few times or stop/start audio to fix it.
> > It's a very old issue, maybe you'd know more about it.
> 
> I noticed this error on my Orco device used for tests many moons ago, but I
> could never find out what led to this error case, it wasn't deterministic
> and didn't impact the audio quality. All I could do is rate_limit it... If
> we have an A vs. B situation it'd be really helpful to diagnose further.
> 
> Is there really a causality between the changes from Hans and this PLL
> unlock error? Are you 100% sure this was not present in the previous install
> you used (4.18.14 as mentioned earlier in the thread)?
> 
> Thanks
> 
> -Pierre
> 
Well, numerous boots, kernels, headphone inserting - no PLL or
'Daleks'.  My laptop must have been haunted that day (halloween).
I'll put it to bed.

-- 
Dean


[PATCH v1 3/8] KVM/x86/vPMU: optimize intel vPMU

2018-11-01 Thread Wei Wang
The current vPMU relies on the host perf software stack to apply guest
changes to the perf counter MSRs. The whole process includes releasing the
old perf event and re-creating a new one. This results in around 250ns
overhead to update a perf counter control MSR.

This can be avoided by having the vPMU layer sit directly on the hardware
perf counters; in this case, guest accesses to the virtual perf counters
can be applied directly to the related hardware counter by the vPMU.
The guest used counters are taken from the host perf core via
x86_perf_mask_perf_counters, which in most cases is a bit-setting of the
guest mask.

This patch implements the handling of guest accesses to the perf counter
MSRs. A host perf counter is assigned to the guest when the guest has the
vPMC enabled and returned to the host when the vPMC gets disabled.

Signed-off-by: Wei Wang 
Cc: Andi Kleen 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/pmu_intel.c| 257 
 2 files changed, 209 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55e51ff..f8bc46d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -463,6 +463,7 @@ struct kvm_pmu {
u64 global_ovf_ctrl;
u64 counter_bitmask[2];
u64 global_ctrl_mask;
+   u64 assigned_pmc_bitmap;
u64 reserved_bits;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 5ab4a36..8c2d37f 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -185,7 +185,11 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 
msr, u64 *data)
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_fixed_pmc(pmu, msr))) {
-   *data = pmc_read_counter(pmc);
+   if (test_bit(pmc->idx,
+   (unsigned long *)&pmu->assigned_pmc_bitmap))
+   rdmsrl(msr, *data);
+   else
+   *data = pmc->counter;
return 0;
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
*data = pmc->eventsel;
@@ -196,59 +200,210 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 
msr, u64 *data)
return 1;
 }
 
+static void intel_pmu_update_fixed_ctrl_msr(u64 new_ctrl, u8 idx)
+{
+   u64 host_ctrl, mask;
+
+   rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, host_ctrl);
+   mask = 0xfULL << (idx * 4);
+   host_ctrl &= ~mask;
+   new_ctrl <<= (idx * 4);
+   host_ctrl |= new_ctrl;
+   wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, host_ctrl);
+}
+
+static void intel_pmu_save_pmc_counters(struct kvm_pmu *pmu, u32 idx)
+{
+   struct kvm_pmc *pmc;
+
+   pmc = intel_pmc_idx_to_pmc(pmu, idx);
+   /*
+* The control MSRs (pmc->eventsel and pmu->fixed_ctr_ctrl) always
+* store the updated value, so we only need to save the counter value.
+*/
+   if (pmc->type == KVM_PMC_GP)
+   rdmsrl(MSR_IA32_PERFCTR0 + idx, pmc->counter);
+   else
+   rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + idx - INTEL_PMC_IDX_FIXED,
+  pmc->counter);
+}
+
+static void intel_pmu_restore_pmc_counters(struct kvm_pmu *pmu, u32 idx)
+{
+   struct kvm_pmc *pmc;
+
+   pmc = intel_pmc_idx_to_pmc(pmu, idx);
+
+   if (pmc->type == KVM_PMC_GP) {
+   wrmsrl(MSR_IA32_PERFCTR0 + idx, pmc->counter);
+   wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + idx, pmc->eventsel);
+   } else {
+   u8 ctrl;
+
+   idx -= INTEL_PMC_IDX_FIXED;
+   ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
+
+   wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + idx, pmc->counter);
+   intel_pmu_update_fixed_ctrl_msr(ctrl, idx);
+   }
+}
+
+/* Get the physical PMC from host and restore the vPMC states. */
+static inline void intel_pmu_get_pmc(struct kvm_pmu *pmu, unsigned int idx)
+{
+   /* Already assigned? */
+   if (test_bit(idx, (unsigned long *)&pmu->assigned_pmc_bitmap))
+   return;
+
+   set_bit(idx, (unsigned long *)&pmu->assigned_pmc_bitmap);
+   x86_perf_mask_perf_counters(pmu->assigned_pmc_bitmap);
+   intel_pmu_restore_pmc_counters(pmu, idx);
+}
+
+/* Save the physical PMC state and return it to host. */
+static inline void intel_pmu_put_pmc(struct kvm_pmu *pmu, unsigned int idx)
+{
+   /* Already returned? */
+   if (!test_bit(idx, (unsigned long *)&pmu->assigned_pmc_bitmap))
+   return;
+
+   intel_pmu_save_pmc_counters(pmu, idx);
+   clear_bit(idx, (unsigned long *)&pmu->assigned_pmc_bitmap);
+   x86_perf_mask_perf_counters(pmu->assigned_pmc_bitmap);
+}
+
+static int 


[PATCH v1 7/8] KVM/x86/vPMU: save/restore guest perf counters on vCPU switching

2018-11-01 Thread Wei Wang
When the vCPU is scheduled in, restore the assigned perf counter states
and register the guest PMI callback. When the vCPU is scheduled out,
save the assigned perf counter states and unregister the guest PMI
callback.

Signed-off-by: Wei Wang 
Cc: Andi Kleen 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.c  | 12 
 arch/x86/kvm/pmu.h  |  4 
 arch/x86/kvm/pmu_intel.c| 38 ++
 arch/x86/kvm/x86.c  |  6 ++
 include/linux/kvm_host.h|  1 +
 virt/kvm/kvm_main.c |  3 +++
 7 files changed, 65 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b66d164..cb1c0bf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -474,6 +474,7 @@ struct kvm_pmu {
u64 counter_bitmask[2];
u64 global_ctrl_mask;
u64 assigned_pmc_bitmap;
+   u64 restore_pmc_bitmap;
u64 reserved_bits;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7f2f63e..4448a88 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -340,6 +340,18 @@ void kvm_pmu_reset(struct kvm_vcpu *vcpu)
kvm_x86_ops->pmu_ops->reset(vcpu);
 }
 
+void kvm_pmu_sched_in(struct kvm_vcpu *vcpu)
+{
+   if (kvm_x86_ops->pmu_ops->sched_in)
+   kvm_x86_ops->pmu_ops->sched_in(vcpu);
+}
+
+void kvm_pmu_sched_out(struct kvm_vcpu *vcpu)
+{
+   if (kvm_x86_ops->pmu_ops->sched_out)
+   kvm_x86_ops->pmu_ops->sched_out(vcpu);
+}
+
 void kvm_pmu_init(struct kvm_vcpu *vcpu)
 {
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7ab85bf..77fc973 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -34,6 +34,8 @@ struct kvm_pmu_ops {
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
+   void (*sched_in)(struct kvm_vcpu *vcpu);
+   void (*sched_out)(struct kvm_vcpu *vcpu);
 };
 
 static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
@@ -118,6 +120,8 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
 void kvm_pmu_reset(struct kvm_vcpu *vcpu);
 void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
+void kvm_pmu_sched_in(struct kvm_vcpu *vcpu);
+void kvm_pmu_sched_out(struct kvm_vcpu *vcpu);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 struct kvm_perf_switch_msr *intel_pmu_get_switch_msrs(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 66d7c09..9eb5230 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -427,6 +427,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
pmu->global_status = 0;
pmu->global_ovf_ctrl = 0;
pmu->assigned_pmc_bitmap = 0;
+   pmu->restore_pmc_bitmap = 0;
 }
 
 struct kvm_perf_switch_msr *intel_pmu_get_switch_msrs(struct kvm_vcpu *vcpu,
@@ -448,6 +449,41 @@ struct kvm_perf_switch_msr 
*intel_pmu_get_switch_msrs(struct kvm_vcpu *vcpu,
return arr;
 }
 
+static void intel_pmu_inject_guest_pmi(void *opaque, u64 status)
+{
+   struct kvm_vcpu *vcpu = opaque;
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+   pmu->global_status |= status;
+   kvm_make_request(KVM_REQ_PMI, vcpu);
+}
+
+static void intel_pmu_sched_out(struct kvm_vcpu *vcpu)
+{
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   u32 bit;
+
+   pmu->restore_pmc_bitmap = pmu->assigned_pmc_bitmap;
+   for_each_set_bit(bit, (unsigned long *)&pmu->restore_pmc_bitmap,
+X86_PMC_IDX_MAX)
+   intel_pmu_put_pmc(pmu, bit);
+
+   x86_perf_unregister_pmi_callback();
+}
+
+static void intel_pmu_sched_in(struct kvm_vcpu *vcpu)
+{
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   u32 bit;
+
+   for_each_set_bit(bit, (unsigned long *)&pmu->restore_pmc_bitmap,
+X86_PMC_IDX_MAX)
+   intel_pmu_get_pmc(pmu, bit);
+   pmu->restore_pmc_bitmap = 0;
+
+   x86_perf_register_pmi_callback(intel_pmu_inject_guest_pmi, vcpu);
+}
+
 struct kvm_pmu_ops intel_pmu_ops = {
.is_valid_msr_idx = intel_is_valid_msr_idx,
.is_valid_msr = intel_is_valid_msr,
@@ -457,4 +493,6 @@ struct kvm_pmu_ops intel_pmu_ops = {
.refresh = intel_pmu_refresh,
.init = intel_pmu_init,
.reset = intel_pmu_reset,
+   .sched_out = intel_pmu_sched_out,
+   .sched_in = intel_pmu_sched_in,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 66d66d7..47308bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8986,9 +8986,15 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
vcpu->arch.l1tf_flush_l1d = true;
+   kvm_pmu_sched_in(vcpu);
   

[PATCH v1 4/8] KVM/x86/vPMU: support msr switch on vmx transitions

2018-11-01 Thread Wei Wang
This patch adds support to intel vPMU to switch msrs on vmx transitions.
Currently only 1 msr (global ctrl) is switched. The number can be
increased on demand in the future (e.g. pebs enable). The old method from
the host perf subsystem is also removed.

Signed-off-by: Wei Wang 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Andi Kleen 
---
 arch/x86/events/intel/core.c  | 60 ---
 arch/x86/events/perf_event.h  |  6 
 arch/x86/include/asm/kvm_host.h   | 11 +++
 arch/x86/include/asm/perf_event.h | 12 
 arch/x86/kvm/pmu.h|  2 ++
 arch/x86/kvm/pmu_intel.c  | 19 +
 arch/x86/kvm/vmx.c|  6 ++--
 7 files changed, 35 insertions(+), 81 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f5e5191..33c156f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3160,15 +3160,6 @@ static int intel_pmu_hw_config(struct perf_event *event)
return 0;
 }
 
-struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
-{
-   if (x86_pmu.guest_get_msrs)
-   return x86_pmu.guest_get_msrs(nr);
-   *nr = 0;
-   return NULL;
-}
-EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
-
 void x86_perf_register_pmi_callback(pmi_callback_t callback, void *opaque)
 {
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -3189,55 +3180,6 @@ void x86_perf_unregister_pmi_callback(void)
 }
 EXPORT_SYMBOL_GPL(x86_perf_unregister_pmi_callback);
 
-static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
-{
-   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-   struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
-
-   arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
-   arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
-   arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-   /*
-* If PMU counter has PEBS enabled it is not enough to disable counter
-* on a guest entry since PEBS memory write can overshoot guest entry
-* and corrupt guest memory. Disabling PEBS solves the problem.
-*/
-   arr[1].msr = MSR_IA32_PEBS_ENABLE;
-   arr[1].host = cpuc->pebs_enabled;
-   arr[1].guest = 0;
-
-   *nr = 2;
-   return arr;
-}
-
-static struct perf_guest_switch_msr *core_guest_get_msrs(int *nr)
-{
-   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-   struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
-   int idx;
-
-   for (idx = 0; idx < x86_pmu.num_counters; idx++)  {
-   struct perf_event *event = cpuc->events[idx];
-
-   arr[idx].msr = x86_pmu_config_addr(idx);
-   arr[idx].host = arr[idx].guest = 0;
-
-   if (!test_bit(idx, cpuc->active_mask))
-   continue;
-
-   arr[idx].host = arr[idx].guest =
-   event->hw.config | ARCH_PERFMON_EVENTSEL_ENABLE;
-
-   if (event->attr.exclude_host)
-   arr[idx].host &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
-   else if (event->attr.exclude_guest)
-   arr[idx].guest &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
-   }
-
-   *nr = x86_pmu.num_counters;
-   return arr;
-}
-
 static void core_pmu_enable_event(struct perf_event *event)
 {
if (!event->attr.exclude_host)
@@ -3641,7 +3583,6 @@ static __initconst const struct x86_pmu core_pmu = {
.get_event_constraints  = intel_get_event_constraints,
.put_event_constraints  = intel_put_event_constraints,
.event_constraints  = intel_core_event_constraints,
-   .guest_get_msrs = core_guest_get_msrs,
.format_attrs   = intel_arch_formats_attr,
.events_sysfs_show  = intel_event_sysfs_show,
 
@@ -3694,7 +3635,6 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_prepare= intel_pmu_cpu_prepare,
.cpu_starting   = intel_pmu_cpu_starting,
.cpu_dying  = intel_pmu_cpu_dying,
-   .guest_get_msrs = intel_guest_get_msrs,
.sched_task = intel_pmu_sched_task,
 };
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 06404eb..9f818d5 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -227,7 +227,6 @@ struct cpu_hw_events {
 */
u64 intel_ctrl_guest_mask;
u64 intel_ctrl_host_mask;
-   struct perf_guest_switch_msrguest_switch_msrs[X86_PMC_IDX_MAX];
 
/*
 * Intel checkpoint mask
@@ -645,11 +644,6 @@ struct x86_pmu {
 */
struct extra_reg *extra_regs;
unsigned int flags;
-
-   /*
-* Intel host/guest support (KVM)
-*/
-   struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
 };
 
 struct x86_perf_task_context {
diff 

[PATCH v1 7/8] KVM/x86/vPMU: save/restore guest perf counters on vCPU switching

2018-11-01 Thread Wei Wang
When the vCPU is scheduled in, restore the assigned perf counter states
and register the guest PMI callback. When the vCPU is scheduled out,
save the assigned perf counter states and unregister the guest PMI
callback.

Signed-off-by: Wei Wang 
Cc: Andi Kleen 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.c  | 12 
 arch/x86/kvm/pmu.h  |  4 
 arch/x86/kvm/pmu_intel.c| 38 ++
 arch/x86/kvm/x86.c  |  6 ++
 include/linux/kvm_host.h|  1 +
 virt/kvm/kvm_main.c |  3 +++
 7 files changed, 65 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b66d164..cb1c0bf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -474,6 +474,7 @@ struct kvm_pmu {
u64 counter_bitmask[2];
u64 global_ctrl_mask;
u64 assigned_pmc_bitmap;
+   u64 restore_pmc_bitmap;
u64 reserved_bits;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7f2f63e..4448a88 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -340,6 +340,18 @@ void kvm_pmu_reset(struct kvm_vcpu *vcpu)
kvm_x86_ops->pmu_ops->reset(vcpu);
 }
 
+void kvm_pmu_sched_in(struct kvm_vcpu *vcpu)
+{
+   if (kvm_x86_ops->pmu_ops->sched_in)
+   kvm_x86_ops->pmu_ops->sched_in(vcpu);
+}
+
+void kvm_pmu_sched_out(struct kvm_vcpu *vcpu)
+{
+   if (kvm_x86_ops->pmu_ops->sched_out)
+   kvm_x86_ops->pmu_ops->sched_out(vcpu);
+}
+
 void kvm_pmu_init(struct kvm_vcpu *vcpu)
 {
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7ab85bf..77fc973 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -34,6 +34,8 @@ struct kvm_pmu_ops {
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
+   void (*sched_in)(struct kvm_vcpu *vcpu);
+   void (*sched_out)(struct kvm_vcpu *vcpu);
 };
 
 static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
@@ -118,6 +120,8 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
 void kvm_pmu_reset(struct kvm_vcpu *vcpu);
 void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
+void kvm_pmu_sched_in(struct kvm_vcpu *vcpu);
+void kvm_pmu_sched_out(struct kvm_vcpu *vcpu);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 struct kvm_perf_switch_msr *intel_pmu_get_switch_msrs(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 66d7c09..9eb5230 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -427,6 +427,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
pmu->global_status = 0;
pmu->global_ovf_ctrl = 0;
pmu->assigned_pmc_bitmap = 0;
+   pmu->restore_pmc_bitmap = 0;
 }
 
 struct kvm_perf_switch_msr *intel_pmu_get_switch_msrs(struct kvm_vcpu *vcpu,
@@ -448,6 +449,41 @@ struct kvm_perf_switch_msr *intel_pmu_get_switch_msrs(struct kvm_vcpu *vcpu,
return arr;
 }
 
+static void intel_pmu_inject_guest_pmi(void *opaque, u64 status)
+{
+   struct kvm_vcpu *vcpu = opaque;
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+   pmu->global_status |= status;
+   kvm_make_request(KVM_REQ_PMI, vcpu);
+}
+
+static void intel_pmu_sched_out(struct kvm_vcpu *vcpu)
+{
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   u32 bit;
+
+   pmu->restore_pmc_bitmap = pmu->assigned_pmc_bitmap;
+   for_each_set_bit(bit, (unsigned long *)&pmu->restore_pmc_bitmap,
+X86_PMC_IDX_MAX)
+   intel_pmu_put_pmc(pmu, bit);
+
+   x86_perf_unregister_pmi_callback();
+}
+
+static void intel_pmu_sched_in(struct kvm_vcpu *vcpu)
+{
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   u32 bit;
+
+   for_each_set_bit(bit, (unsigned long *)&pmu->restore_pmc_bitmap,
+X86_PMC_IDX_MAX)
+   intel_pmu_get_pmc(pmu, bit);
+   pmu->restore_pmc_bitmap = 0;
+
+   x86_perf_register_pmi_callback(intel_pmu_inject_guest_pmi, vcpu);
+}
+
 struct kvm_pmu_ops intel_pmu_ops = {
.is_valid_msr_idx = intel_is_valid_msr_idx,
.is_valid_msr = intel_is_valid_msr,
@@ -457,4 +493,6 @@ struct kvm_pmu_ops intel_pmu_ops = {
.refresh = intel_pmu_refresh,
.init = intel_pmu_init,
.reset = intel_pmu_reset,
+   .sched_out = intel_pmu_sched_out,
+   .sched_in = intel_pmu_sched_in,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 66d66d7..47308bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8986,9 +8986,15 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
vcpu->arch.l1tf_flush_l1d = true;
+   kvm_pmu_sched_in(vcpu);
   

[PATCH v1 4/8] KVM/x86/vPMU: support msr switch on vmx transitions

2018-11-01 Thread Wei Wang
This patch adds support to the Intel vPMU to switch MSRs on VMX
transitions. Currently only one MSR (global ctrl) is switched; the
number can be increased on demand in the future (e.g. PEBS enable). The
old method from the host perf subsystem is also removed.

Signed-off-by: Wei Wang 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Andi Kleen 
---
 arch/x86/events/intel/core.c  | 60 ---
 arch/x86/events/perf_event.h  |  6 
 arch/x86/include/asm/kvm_host.h   | 11 +++
 arch/x86/include/asm/perf_event.h | 12 
 arch/x86/kvm/pmu.h|  2 ++
 arch/x86/kvm/pmu_intel.c  | 19 +
 arch/x86/kvm/vmx.c|  6 ++--
 7 files changed, 35 insertions(+), 81 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f5e5191..33c156f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3160,15 +3160,6 @@ static int intel_pmu_hw_config(struct perf_event *event)
return 0;
 }
 
-struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
-{
-   if (x86_pmu.guest_get_msrs)
-   return x86_pmu.guest_get_msrs(nr);
-   *nr = 0;
-   return NULL;
-}
-EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
-
 void x86_perf_register_pmi_callback(pmi_callback_t callback, void *opaque)
 {
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -3189,55 +3180,6 @@ void x86_perf_unregister_pmi_callback(void)
 }
 EXPORT_SYMBOL_GPL(x86_perf_unregister_pmi_callback);
 
-static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
-{
-   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-   struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
-
-   arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
-   arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
-   arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-   /*
-* If PMU counter has PEBS enabled it is not enough to disable counter
-* on a guest entry since PEBS memory write can overshoot guest entry
-* and corrupt guest memory. Disabling PEBS solves the problem.
-*/
-   arr[1].msr = MSR_IA32_PEBS_ENABLE;
-   arr[1].host = cpuc->pebs_enabled;
-   arr[1].guest = 0;
-
-   *nr = 2;
-   return arr;
-}
-
-static struct perf_guest_switch_msr *core_guest_get_msrs(int *nr)
-{
-   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-   struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
-   int idx;
-
-   for (idx = 0; idx < x86_pmu.num_counters; idx++)  {
-   struct perf_event *event = cpuc->events[idx];
-
-   arr[idx].msr = x86_pmu_config_addr(idx);
-   arr[idx].host = arr[idx].guest = 0;
-
-   if (!test_bit(idx, cpuc->active_mask))
-   continue;
-
-   arr[idx].host = arr[idx].guest =
-   event->hw.config | ARCH_PERFMON_EVENTSEL_ENABLE;
-
-   if (event->attr.exclude_host)
-   arr[idx].host &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
-   else if (event->attr.exclude_guest)
-   arr[idx].guest &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
-   }
-
-   *nr = x86_pmu.num_counters;
-   return arr;
-}
-
 static void core_pmu_enable_event(struct perf_event *event)
 {
if (!event->attr.exclude_host)
@@ -3641,7 +3583,6 @@ static __initconst const struct x86_pmu core_pmu = {
.get_event_constraints  = intel_get_event_constraints,
.put_event_constraints  = intel_put_event_constraints,
.event_constraints  = intel_core_event_constraints,
-   .guest_get_msrs = core_guest_get_msrs,
.format_attrs   = intel_arch_formats_attr,
.events_sysfs_show  = intel_event_sysfs_show,
 
@@ -3694,7 +3635,6 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_prepare= intel_pmu_cpu_prepare,
.cpu_starting   = intel_pmu_cpu_starting,
.cpu_dying  = intel_pmu_cpu_dying,
-   .guest_get_msrs = intel_guest_get_msrs,
.sched_task = intel_pmu_sched_task,
 };
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 06404eb..9f818d5 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -227,7 +227,6 @@ struct cpu_hw_events {
 */
u64 intel_ctrl_guest_mask;
u64 intel_ctrl_host_mask;
-   struct perf_guest_switch_msrguest_switch_msrs[X86_PMC_IDX_MAX];
 
/*
 * Intel checkpoint mask
@@ -645,11 +644,6 @@ struct x86_pmu {
 */
struct extra_reg *extra_regs;
unsigned int flags;
-
-   /*
-* Intel host/guest support (KVM)
-*/
-   struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
 };
 
 struct x86_perf_task_context {
diff 

[PATCH v1 6/8] KVM/x86/vPMU: remove some unused functions

2018-11-01 Thread Wei Wang
These functions were used by the old Intel vPMU implementation and are
no longer needed.

Signed-off-by: Wei Wang 
Cc: Paolo Bonzini 
Cc: Andi Kleen 
Cc: Peter Zijlstra 
---
 arch/x86/kvm/pmu_intel.c | 103 ---
 1 file changed, 103 deletions(-)

diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 4c6183b..66d7c09 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -23,87 +23,6 @@
 #define INTEL_PMU_RDPMC_FIXED_PMC (1ULL << 30)
 #define INTEL_PMU_RDPMC_COUNTER_MASK (INTEL_PMU_RDPMC_FIXED_PMC - 1)
 
-static struct kvm_event_hw_type_mapping intel_arch_events[] = {
-   /* Index must match CPUID 0x0A.EBX bit vector */
-   [0] = { 0x3c, 0x00, PERF_COUNT_HW_CPU_CYCLES },
-   [1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
-   [2] = { 0x3c, 0x01, PERF_COUNT_HW_BUS_CYCLES  },
-   [3] = { 0x2e, 0x4f, PERF_COUNT_HW_CACHE_REFERENCES },
-   [4] = { 0x2e, 0x41, PERF_COUNT_HW_CACHE_MISSES },
-   [5] = { 0xc4, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
-   [6] = { 0xc5, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
-   [7] = { 0x00, 0x30, PERF_COUNT_HW_REF_CPU_CYCLES },
-};
-
-/* mapping between fixed pmc index and intel_arch_events array */
-static int fixed_pmc_events[] = {1, 0, 7};
-
-static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
-{
-   int i;
-
-   for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
-   u8 new_ctrl = fixed_ctrl_field(data, i);
-   u8 old_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, i);
-   struct kvm_pmc *pmc;
-
-   pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);
-
-   if (old_ctrl == new_ctrl)
-   continue;
-
-   reprogram_fixed_counter(pmc, new_ctrl, i);
-   }
-
-   pmu->fixed_ctr_ctrl = data;
-}
-
-/* function is called when global control register has been updated. */
-static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
-{
-   int bit;
-   u64 diff = pmu->global_ctrl ^ data;
-
-   pmu->global_ctrl = data;
-
-   for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
-   reprogram_counter(pmu, bit);
-}
-
-static unsigned intel_find_arch_event(struct kvm_pmu *pmu,
- u8 event_select,
- u8 unit_mask)
-{
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++)
-   if (intel_arch_events[i].eventsel == event_select
-   && intel_arch_events[i].unit_mask == unit_mask
-   && (pmu->available_event_types & (1 << i)))
-   break;
-
-   if (i == ARRAY_SIZE(intel_arch_events))
-   return PERF_COUNT_HW_MAX;
-
-   return intel_arch_events[i].event_type;
-}
-
-static unsigned intel_find_fixed_event(int idx)
-{
-   if (idx >= ARRAY_SIZE(fixed_pmc_events))
-   return PERF_COUNT_HW_MAX;
-
-   return intel_arch_events[fixed_pmc_events[idx]].event_type;
-}
-
-/* check if a PMC is enabled by comparing it with globl_ctrl bits. */
-static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
-{
-   struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-
-   return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
-}
-
 static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
 {
if (pmc_idx < INTEL_PMC_IDX_FIXED)
@@ -128,23 +47,6 @@ static int intel_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
(fixed && idx >= pmu->nr_arch_fixed_counters);
 }
 
-static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu,
-   unsigned idx)
-{
-   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-   bool fixed = idx & (1u << 30);
-   struct kvm_pmc *counters;
-
-   idx &= ~(3u << 30);
-   if (!fixed && idx >= pmu->nr_arch_gp_counters)
-   return NULL;
-   if (fixed && idx >= pmu->nr_arch_fixed_counters)
-   return NULL;
-   counters = fixed ? pmu->fixed_counters : pmu->gp_counters;
-
-   return &counters[idx];
-}
-
 static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 {
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -547,11 +449,6 @@ struct kvm_perf_switch_msr *intel_pmu_get_switch_msrs(struct kvm_vcpu *vcpu,
 }
 
 struct kvm_pmu_ops intel_pmu_ops = {
-   .find_arch_event = intel_find_arch_event,
-   .find_fixed_event = intel_find_fixed_event,
-   .pmc_is_enabled = intel_pmc_is_enabled,
-   .pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
-   .msr_idx_to_pmc = intel_msr_idx_to_pmc,
.is_valid_msr_idx = intel_is_valid_msr_idx,
.is_valid_msr = intel_is_valid_msr,
.get_msr = intel_pmu_get_msr,
-- 
2.7.4



[PATCH v1 5/8] KVM/x86/vPMU: intel_pmu_read_pmc

2018-11-01 Thread Wei Wang
This patch adds handling of guest rdpmc. If the physical counter has
been assigned, the value is read directly from the hardware; otherwise
the virtual counter value is returned to the guest.

Signed-off-by: Wei Wang 
Cc: Paolo Bonzini 
Cc: Andi Kleen 
Cc: Peter Zijlstra 
---
 arch/x86/kvm/pmu.c   |  3 +++
 arch/x86/kvm/pmu.h   |  1 +
 arch/x86/kvm/pmu_intel.c | 27 +++
 3 files changed, 31 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 58ead7d..7f2f63e 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -284,6 +284,9 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
struct kvm_pmc *pmc;
u64 ctr_val;
 
+   if (kvm_x86_ops->pmu_ops->read_pmc)
+   return kvm_x86_ops->pmu_ops->read_pmc(vcpu, idx, data);
+
if (is_vmware_backdoor_pmc(idx))
return kvm_pmu_rdpmc_vmware(vcpu, idx, data);
 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index b3b0238..7ab85bf 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -30,6 +30,7 @@ struct kvm_pmu_ops {
bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+   int (*read_pmc)(struct kvm_vcpu *vcpu, unsigned int idx, u64 *data);
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 2be31ad..4c6183b 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -20,6 +20,9 @@
 #include "lapic.h"
 #include "pmu.h"
 
+#define INTEL_PMU_RDPMC_FIXED_PMC (1ULL << 30)
+#define INTEL_PMU_RDPMC_COUNTER_MASK (INTEL_PMU_RDPMC_FIXED_PMC - 1)
+
 static struct kvm_event_hw_type_mapping intel_arch_events[] = {
/* Index must match CPUID 0x0A.EBX bit vector */
[0] = { 0x3c, 0x00, PERF_COUNT_HW_CPU_CYCLES },
@@ -409,6 +412,29 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
 }
 
+static int intel_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned int idx,
+ u64 *data)
+{
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   unsigned int pmc_idx;
+
+   if (idx & INTEL_PMU_RDPMC_FIXED_PMC)
+   pmc_idx = INTEL_PMC_IDX_FIXED +
+ (idx & INTEL_PMU_RDPMC_COUNTER_MASK);
+   else
+   pmc_idx = idx & INTEL_PMU_RDPMC_COUNTER_MASK;
+
+   if (test_bit(pmc_idx,
+   (unsigned long *)&pmu->assigned_pmc_bitmap))
+   rdpmcl(idx, *data);
+   else {
+   struct kvm_pmc *pmc = intel_pmc_idx_to_pmc(pmu, pmc_idx);
+   *data = pmc->counter;
+   }
+
+   return 0;
+}
+
 static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 {
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -530,6 +556,7 @@ struct kvm_pmu_ops intel_pmu_ops = {
.is_valid_msr = intel_is_valid_msr,
.get_msr = intel_pmu_get_msr,
.set_msr = intel_pmu_set_msr,
+   .read_pmc = intel_pmu_read_pmc,
.refresh = intel_pmu_refresh,
.init = intel_pmu_init,
.reset = intel_pmu_reset,
-- 
2.7.4



[PATCH v1 0/8] Intel Virtual PMU Optimization

2018-11-01 Thread Wei Wang
This patch series optimizes the Intel PMU virtualization by reducing the
PMU virtualization overhead and providing guests with accurate PMU
statistics.

The differences between the traditional approach and the optimized
approach are depicted in the figures here:
https://github.com/weiwangwork/vPMU/blob/master/vPMU%20Optimization.pdf

The traditional approach of PMU virtualization is host perf event oriented.
The KVM vPMU layer sits on top of the host perf subsystem and each guest's
update to the vPMU is translated into a new host perf event, which needs
to go through the host perf software stack (e.g. releasing the old perf
event, re-creating a new one and getting it rescheduled to a hardware perf
counter) for a reconfiguration.

With this optimization, we intend to make the virtualization layer
register oriented. The KVM vPMU layer is moved down to sit directly on
the hardware perf counters, so guest accesses to the vPMU registers can
be applied directly to the related hardware counter by the vPMU. It can
reduce the virtualization overhead from around 250ns to 400ns.
(Tested on Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, and added host
booting parameter "nowatchdog" to avoid the noise from watchdog_hld)

We still need the vPMU to take ownership of the physical perf counters
from the host perf core. The counters used by the guest are taken from
the host perf core via x86_perf_mask_perf_counters, which in most cases
amounts to setting the bits of the guest mask.
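
To illustrate, masking counters for the guest in most cases reduces to a
couple of bit operations on an availability mask. Below is a minimal
userspace sketch of that bookkeeping; the names host_avail_mask,
mask_perf_counters and unmask_perf_counters are illustrative, not the
kernel's (the real x86_perf_mask_perf_counters operates on the host PMU
state in arch/x86/events/core.c):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit N set means GP counter N is free on the host. */
static uint64_t host_avail_mask = 0xfull;	/* e.g. 4 free GP counters */

/* Take the counters in guest_mask away from the host scheduler. */
static void mask_perf_counters(uint64_t guest_mask)
{
	host_avail_mask &= ~guest_mask;
}

/* Give the counters in guest_mask back to the host scheduler. */
static void unmask_perf_counters(uint64_t guest_mask)
{
	host_avail_mask |= guest_mask;
}
```

The cheap path is exactly this pair of mask updates; the expensive
reprogramming of host events only happens when a masked counter was
already in use.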

This series currently covers the basic perf counter virtualization. Other
features, such as pebs and lbr, will come after this series.

Wei Wang (8):
  perf/x86: add support to mask counters from host
  perf/x86/intel: add pmi callback support
  KVM/x86/vPMU: optimize intel vPMU
  KVM/x86/vPMU: support msr switch on vmx transitions
  KVM/x86/vPMU: intel_pmu_read_pmc
  KVM/x86/vPMU: remove some unused functions
  KVM/x86/vPMU: save/restore guest perf counters on vCPU switching
  KVM/x86/vPMU: return the counters to host if guest is torn down

 arch/x86/events/core.c|  47 
 arch/x86/events/intel/core.c  |  65 ++
 arch/x86/events/perf_event.h  |  10 +-
 arch/x86/include/asm/kvm_host.h   |  13 ++
 arch/x86/include/asm/perf_event.h |  16 +-
 arch/x86/kvm/pmu.c|  15 ++
 arch/x86/kvm/pmu.h|   7 +
 arch/x86/kvm/pmu_intel.c  | 448 +-
 arch/x86/kvm/vmx.c|   6 +-
 arch/x86/kvm/x86.c|   6 +
 include/linux/kvm_host.h  |   1 +
 virt/kvm/kvm_main.c   |   3 +
 12 files changed, 416 insertions(+), 221 deletions(-)

-- 
2.7.4



Re: [PATCH v15 1/2] Reorganize the oom report in dump_header

2018-11-01 Thread Michal Hocko
On Thu 01-11-18 18:09:39, 禹舟键 wrote:
> Hi Michal
> The null pointer is possible when calling the dump_header, this bug was
> detected by LKP. Below is the context 3 months ago.

Yeah, I remember it was a 0day report but I couldn't find it in my email
archive. Do you happen to have a message-id?

Anyway
if (__ratelimit(&oom_rs))
dump_header(oc, p);
+   if (oc)
+   dump_oom_summary(oc, victim);

Clearly cannot solve any NULL ptr because oc is never NULL unless I am
missing something terribly.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 2/5] mfd: lochnagar: Add support for the Cirrus Logic Lochnagar

2018-11-01 Thread Charles Keepax
On Mon, Oct 29, 2018 at 11:04:39AM +, Lee Jones wrote:
> On Fri, 26 Oct 2018, Mark Brown wrote:
> > On Fri, Oct 26, 2018 at 09:00:51AM +0100, Lee Jones wrote:
> > > On Thu, 25 Oct 2018, Richard Fitzgerald wrote:

I have re-ordered some of the quotes here for the benefit
of clarity. I will start with the Lochnagar focused bits
and then cover some of the more general regmap discussion.
Apologies for the wall of text towards the end but it seemed
wise to explore some of the why for parts of the current regmap
implementation and some of the implications for changing it.

> > I think from the perspective of Richard and Charles who are just trying
> > to get their driver merged this is something of an abstract distinction.
> > If the driver were merged and this discussion were happening separately
> > their perspective would most likely be different.
>
> Charles has already mentioned that he'd take a look at the current
> *use* (not changing the API, but the way in which Lochnagar
> *uses/consumes* it).  Actions which would be most welcomed.
>
> > I kind of said this above but just to be clear this driver seems to me
> > like an idiomatic user of the regmap API as it is today.  My guess is
> > that we could probably loose the defaults tables and not suffer too much
> > but equally well they don't hurt from a regmap point of view.
>
> Perhaps Charles could elaborate on whether this is possible or not?

So on this front I have had a look through and we can drop the
defaults tables for Lochnagar, although as I shall cover later
Lochnagar is the exceptional case with respect to our normal
devices.

> Utilising range support here would certainly help IMHO.
>

I am rather hesitant to switch to the range API. Whilst it is
significantly clearer in certain situations, such as say
mapping out the memory for a DSP, where there exist large
amorphous blobs of identical-function registers, I am really
not convinced it suits something like the register map that
controls Lochnagar. You have an intertwined mix of
various-purpose registers, each with a clearly defined name
and potentially with quite fine-grained properties.

Certainly when working with the driver I want to be able to
see fairly quickly, by name, how registers are marked. The
callbacks are very simple and I don't need to look up where
registers are in the regmap to be able to check their
attributes. But perhaps I have just got too used to seeing
these callbacks; things do have a way of becoming normal
after being exposed to them for a while.

What I will try for the next spin is to use as much:

  case REG1...REG2:

as I can without totally sacrificing grepability/clarity, and
hopefully that will get us to a compromise we can all live
with, at least for now. Lochnagar is a fairly small part so it
feels like this should be doable.
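As a concrete sketch of that style (the register names below are made up, not Lochnagar's real map), a regmap-style readable-register callback using case ranges might look like:

```c
#include <stdbool.h>

/* Hypothetical register addresses -- not the real Lochnagar map. */
#define DEV_GPIO_CTRL1 0x10
#define DEV_GPIO_CTRL8 0x17
#define DEV_CLK_SELECT 0x20
#define DEV_STATUS     0x30

/*
 * Sketch of a regmap-style readable_reg callback: case ranges
 * (a GCC/Clang extension, used widely in the kernel) group a
 * contiguous run of registers under its named endpoints.
 */
static bool dev_readable_reg(unsigned int reg)
{
	switch (reg) {
	case DEV_GPIO_CTRL1 ... DEV_GPIO_CTRL8:
	case DEV_CLK_SELECT:
	case DEV_STATUS:
		return true;
	default:
		return false;
	}
}
```

Each contiguous block collapses into a single labelled case, so the range stays greppable by its endpoint names rather than by raw addresses.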

> > > > +   ret = devm_of_platform_populate(dev);
> > > > +   if (ret < 0) {
> > > > +   dev_err(dev, "Failed to populate child nodes: %d\n", ret);
> > > > +   return ret;
> > > > +   }
> > >
> > > Please do not mix OF and MFD device registration
> > > strategies.
> > >
> > > Pick one and register all devices through your chosen
> > > method.
> >
> > Hmmm we use this to do things like register some fixed
> > regulators
> > and clocks that don't need any control but do need to be
> > associated 
> > with the device. I could do that through the MFD although it
> > is in 
> > direct conflict with the feedback on the clock patches I
> > received
> > to move the fixed clocks into devicetree rather than
> > registering
> > them manually (see v2 of the patch chain).
>
> Then I suggest moving everything to DT.

So pulling this out from earlier discussions in this thread,
it seems I can happily move all the child device registration
into device tree. I will also try this for the next version of
the patch, unless anyone wants to object? But it does change
the DT binding quite a lot as the individual sub drivers now
each require their own node rather than one single unified
Lochnagar node.

> > > Precisely my point.  Lochnagar is a small device yet it's required to
> > > submit hundreds of lines of Regmap tables.  Imagine what that would
> > > look like for a large device.
> >
> > There's no *requirement* to provide the data even if you're using the
> > cache (and the cache support is entirely optional), there's just costs
> > to not providing it in terms of what features you can get from the
> > regmap API and the performance of the system.  Not every device is going
> > to be bothered by those costs, many devices don't provide all of the
> > data they could.
>
> So what do we do in the case where, due to the size of the device, the
> amount of lines required by these tables go from crazy to grotesque?
>

Ultimately, I guess I have always just viewed it as data
tables. It's a lot of lines of source but it's not complicated,
and complexity has really always been the thing I try 


Re: [PATCH v3] mm/page_owner: use kvmalloc instead of kmalloc

2018-11-01 Thread Michal Hocko
On Thu 01-11-18 18:00:12, Miles Chen wrote:
[...]
> I did a test today, the only code change is to clamp the read
> count to PAGE_SIZE and it worked well. Maybe we can solve this
> issue by just clamping the read count.
> 
> count = count > PAGE_SIZE ? PAGE_SIZE : count;

This is what Matthew was proposing, AFAIR. At least as a
stop-gap solution. Maybe we want to extend this to a more
standard implementation later on (e.g. seq_file).
-- 
Michal Hocko
SUSE Labs
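To make the proposed stop-gap concrete (the function name is illustrative, not page_owner's actual code), the clamp simply caps any single read at one page:

```c
#include <stddef.h>

/* Illustrative page size; the kernel's PAGE_SIZE is arch-defined. */
#define PAGE_SIZE 4096UL

/*
 * Sketch of the stop-gap: cap the requested read size at
 * PAGE_SIZE so the kernel-side scratch buffer never needs to
 * be larger than one page, whatever count userspace passes.
 */
static size_t clamp_read_count(size_t count)
{
	return count > PAGE_SIZE ? PAGE_SIZE : count;
}
```

Userspace read() loops already cope with short reads, so returning fewer bytes than requested is safe here.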


Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE

2018-11-01 Thread Michal Hocko
On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.

I would go on and say that allocations with sizes too large
can actually trigger a warning (as you posted in the previous
version, outside of the changelog area), because that might be
interesting to people - there are deployments that panic on
warning, and then a warning is much more important.

> Signed-off-by: Konstantin Khlebnikov 

Acked-by: Michal Hocko 

Thanks!

> ---
>  mm/util.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 8bf08b5b5760..f5f04fa22814 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>   gfp_t kmalloc_flags = flags;
>   void *ret;
>  
> + if (size > KMALLOC_MAX_SIZE)
> + goto fallback;
> +
>   /*
>* vmalloc uses GFP_KERNEL for some internal allocations (e.g page 
> tables)
>* so the given set of flags has to be compatible.
> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>   if (ret || size <= PAGE_SIZE)
>   return ret;
>  
> +fallback:
>   return __vmalloc_node_flags_caller(size, node, flags,
>   __builtin_return_address(0));
>  }
> 

-- 
Michal Hocko
SUSE Labs
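The shape of the fix can be sketched outside the kernel (the names below are stand-ins, not the kernel's APIs): sizes above the fast allocator's hard limit skip it entirely and go straight to the fallback, mirroring the `goto fallback` in the patch:

```c
#include <stdlib.h>

/* Illustrative limit standing in for KMALLOC_MAX_SIZE. */
#define FAST_ALLOC_MAX (4UL * 1024 * 1024)

/* Stand-ins for the two allocators kvmalloc chooses between. */
static void *fast_alloc(size_t size)     { return malloc(size); }
static void *fallback_alloc(size_t size) { return calloc(1, size); }

/*
 * Mirror of the patch's logic: a size beyond the fast path's
 * hard limit jumps straight to the fallback instead of
 * failing (or warning) inside the fast allocator.
 */
static void *flex_alloc(size_t size)
{
	void *p;

	if (size > FAST_ALLOC_MAX)
		goto fallback;

	p = fast_alloc(size);
	if (p)
		return p;

fallback:
	return fallback_alloc(size);
}
```

The early check is cheap and keeps the oversized case from ever reaching the allocator that cannot serve it.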

