[PATCH v2] MAINTAINERS: mark simple firmware interface (SFI) obsolete

2019-04-12 Thread Lukas Bulwahn
Len Brown has not been active in this part since around 2010. The recent
activity suggests that Thomas Gleixner and Jiang Lui were maintaining
this part of the kernel sources. Jiang Lui has not been active in the
kernel sources since beginning 2016. So, the maintainer's role seems to
be now with Thomas.

The referenced git tree does not exist. Instead, I found an sfi branch
in Len's kernel git repository, but that has not been updated since 2014;
so that is not worth to be mentioned in MAINTAINERS now anymore either.

Len Brown expects no further systems to be shipped with SFI, so we can
mark it obsolete and schedule it for deletion.

This change was motivated after I found that I could not send any mails
to the sfi-devel mailing list, and that the mailing list does not exist
anymore.

Signed-off-by: Lukas Bulwahn 
---
Thomas, please pick this patch.

v1: https://lore.kernel.org/patchwork/patch/1033696/ 
  - got Acked-by: Len Brown 

v2:
  - also change status to Obsolete

 MAINTAINERS | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2359e12e4c41..07bf81c52fe6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14150,11 +14150,9 @@ F: drivers/video/fbdev/sm712*
 F: Documentation/fb/sm712fb.txt
 
 SIMPLE FIRMWARE INTERFACE (SFI)
-M: Len Brown 
-L: sfi-de...@simplefirmware.org
+M: Thomas Gleixner 
 W: http://simplefirmware.org/
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6.git
-S: Supported
+S: Obsolete
 F: arch/x86/platform/sfi/
 F: drivers/sfi/
 F: include/linux/sfi*.h
-- 
2.17.1



Re: [PATCH 4/4] locking/lockdep: Test all incompatible scenario at once in check_irq_usage()

2019-04-12 Thread Yuyang Du
On Wed, 3 Apr 2019 at 00:05, Frederic Weisbecker  wrote:
>
> check_prev_add_irq() tests all incompatible scenarios one after the
> other while adding a lock (@next) to a tree dependency (@prev):
>
> LOCK_USED_IN_HARDIRQ  vs LOCK_ENABLED_HARDIRQ
> LOCK_USED_IN_HARDIRQ_READ vs LOCK_ENABLED_HARDIRQ
> LOCK_USED_IN_SOFTIRQ  vs LOCK_ENABLED_SOFTIRQ
> LOCK_USED_IN_SOFTIRQ_READ vs LOCK_ENABLED_SOFTIRQ

May I ask why

LOCK_USED_IN_*IRQ vs. LOCK_ENABLED_*IRQ_READ

is not tested?

Thanks,
Yuyang


RE: perf tools:Is there any tools to found out the max latency by irq or cpu idle

2019-04-12 Thread Linhaifeng
Sorry, the value 131081408 is just for example. Actually the result is like 
this:
 sqrt 2019-04-10 23:53:50: 43968
 sqrt 2019-04-10 23:53:51: 44060
 sqrt 2019-04-10 23:53:52: 49012
 sqrt 2019-04-10 23:53:53: 38172
 sqrt 2019-04-10 23:53:54: 131081408
 sqrt 2019-04-10 23:53:55: 43600
 sqrt 2019-04-10 23:53:56: 46704
 sqrt 2019-04-10 23:53:57: 46880
 sqrt 2019-04-10 23:53:58: 44332
 ……
 sqrt 2019-04-10 02:17:15: 136345876
 ……
 sqrt 2019-04-10 04:40:35: 136335644
 ……

-Original Message-
From: William Cohen [mailto:wco...@redhat.com] 
Sent: 2019年4月12日 22:06
To: Linhaifeng ; linux-perf-us...@vger.kernel.org; 
linux-kernel@vger.kernel.org
Subject: Re: perf tools:Is there any tools to found out the max latency by irq 
or cpu idle

On 4/11/19 8:57 PM, Linhaifeng wrote:
> Hi,
> I have a single thread application like this:
> 
> While (1) {
>     start = rdtsc();
>     sqrt (1024);  
>   end = rdtsc();
>   cycles = end – start;
>   printf("cycles: %d-%02d-%02d %02d:%02d:%02d: %lu\n", 
>   1900+timeinfo->tm_year, 1+timeinfo->tm_mon, timeinfo->tm_mday, 
> timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec,
>   cycles);
> }
> It print the cycles of sqrt every second and run with taskset –c 1 ./sqrt.
> The result of test is:
> 
> sqrt 2019-04-10 23:53:50: 43968
> sqrt 2019-04-10 23:53:51: 44060
> sqrt 2019-04-10 23:53:52: 49012
> sqrt 2019-04-10 23:53:53: 38172
> sqrt 2019-04-10 23:53:54: 131081408
> sqrt 2019-04-10 23:53:55: 43600
> sqrt 2019-04-10 23:53:56: 46704
> sqrt 2019-04-10 23:53:57: 46880
> sqrt 2019-04-10 23:53:58: 44332
> ……
> sqrt 2019-04-10 02:17:15: 131081408
> ……
> sqrt 2019-04-10 04:40:35: 131081408
> ……
> 
> Every 2hour23min there would be a large cycles. I use perf sched not found 
> any sched_switch events.

Hi,

The fact that it is the same value 131081408 every 2 hours 23 minutes looks 
suspect. One would expect some variation in the counts.  It looks like there is 
some rollover or overflow issue.  It would be helpful to print out the start 
and end values to see what is going on with the raw rdstc values.  Maybe the 
thread is being moved between processors and the TSC are out of sync.  What 
particular processor model was this running on? Was this running on physical 
hardware or inside a kvm guest? 

According to the Intel 64 and IA-32 Architecture Software Devloper's Manual 
Volume 3 (325384-sdm-vol-3abcd.pdf):

The RDTSC instruction reads the time-stamp counter and is guaranteed to return 
a monotonically increasing unique value whenever executed, except for a 64-bit 
counter wraparound. Intel guarantees that the time-stamp counter will not 
wraparound within 10 years after being reset. The period for counter wrap is 
longer for Pentium 4, Intel Xeon, P6 family, and Pentium processors.

-Will Cohen


> 
> L2GW_2680:/home/fsp/zn # perf sched record -C 6-11 -o perf.sched ^C[ 
> perf record: Woken up 64 times to write data ] [ perf record: Captured 
> and wrote 204.878 MB perf.sched (1911189 samples) ]
> 
> L2GW_2680:/home/fsp/zn # perf sched latency -i perf.sched
> 
> --
> ---
>   Task  |   Runtime ms  | Switches | Average delay ms 
> | Maximum delay ms | Maximum delay at   |
> --
> ---
> --
> ---
>   TOTAL:    |  0.000 ms |    0 |
> ---
> 
> 
> 
> Is there any other tools of perf to found out the max latency by irq or cpu 
> idle ?
> 
> 



[PATCH v2] MAINTAINERS: fix style in KEYS-TRUSTED entry

2019-04-12 Thread Lukas Bulwahn
Mimi Zohar used spaces instead of a tab when adding Jarkko Sakkinen as
further maintainer to the KEYS-TRUSTED section entry.

In fact, ./scripts/checkpatch.pl -f MAINTAINERS complains:

  WARNING: MAINTAINERS entries use one tab after TYPE:
  #8581: FILE: MAINTAINERS:8581:
  +M:  Jarkko Sakkinen 

The issue was detected when writing a script that parses MAINTAINERS.

Fixes: 34bccd61b139 ("MAINTAINERS: add Jarkko as maintainer for trusted keys")
Signed-off-by: Lukas Bulwahn 
---
v1: https://lore.kernel.org/patchwork/patch/1040412/
  - Joe Perches clarified that this is a style defect. 

v2: applies cleanly to v5.1-rc4 and next-20190412

 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2359e12e4c41..75bb276aed1c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8578,7 +8578,7 @@ F:security/keys/encrypted-keys/
 
 KEYS-TRUSTED
 M: James Bottomley 
-M:  Jarkko Sakkinen 
+M: Jarkko Sakkinen 
 M: Mimi Zohar 
 L: linux-integr...@vger.kernel.org
 L: keyri...@vger.kernel.org
-- 
2.17.1



Re: [PATCH 2/3] rtc: ds1685: use correct device struct to get platform device struct

2019-04-12 Thread Joshua Kinard
On 4/12/2019 07:44, Thomas Bogendoerfer wrote:
> On Fri, 12 Apr 2019 12:11:06 +0200
> Alexandre Belloni  wrote:
> 
>> Every patch need a commit message. Maybe you could indicate that this
>> never gave any issue because parent is the first member of struct
>> device.
> 
> I'll update the commit message, I get a nice stacktrace because of that
> bug, so the path from work_queue calling ds1685_rtc_poweroff never worked.
> 
> Thomas.

I'll wager that's why the thing stopped powering off my Octane.  It *used*
to work when I wrote the driver, but stopped after some unidentified point,
and I never found the time to try and track it down.

Which machine are you testing on, out of curiosity?

-- 
Joshua Kinard
Gentoo/MIPS
ku...@gentoo.org
rsa6144/5C63F4E3F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


Zdravstvujte Vas interesuyut klientskie bazy dannyh?

2019-04-12 Thread cona_1997
Zdravstvujte Vas interesuyut klientskie bazy dannyh?




RE,

2019-04-12 Thread Sharifah Ahmad Mustahfa
As-Salamu Alaykum, I need your help to transfer out the some amount of money, 
accumulated as undeclared excess profit made by me from the INVESTMENTS CAPITAL 
under my management in my bank. You will get 40% of the funds for your 
participation. Reply for more details thanks


Re: [GIT PULL] x86 fixes

2019-04-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Apr 2019 15:10:42 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6d0a598489ca687e1ebac37eef546526eca41347

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] IRQ fixes

2019-04-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Apr 2019 13:52:19 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6a022984c3feda8cc7d2d8c028b429715d0af531

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] timer fix

2019-04-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Apr 2019 15:09:19 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> timers-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/122c215bfae884f10a189e6754d9603a06b981c3

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] scheduler fix

2019-04-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Apr 2019 15:08:11 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> sched-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5e6f1fee60a3d80582146835ac01d9808748434f

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] core/urgent fixes

2019-04-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Apr 2019 12:16:30 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> core-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/54c63a75581f4b8d5d4b5660424f926510cfd98c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] locking fix

2019-04-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Apr 2019 13:53:43 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> locking-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/26e2b81977bb1ad70ff9b2acb4d4cb13c23facfd

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] perf fixes

2019-04-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Apr 2019 15:06:42 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> perf-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/73fdb2c908c64a32e11c72c029d636f556859c0d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[PATCH] sched/fair: Use 'unsigned long' for group_shares,group_runnable

2019-04-12 Thread Cheng Jian
group_share and group_runnable are tracked as 'unsigned long',
however some functions using them as 'long' which is ultimately
assigned back to 'unsigned long' variables in reweight_entity.

Since there is not scope on using a different and signed type,
this change improves code consistency and avoids further type
conversions. More important, to prevent undefined behavior
caused by overflow.

Using them as 'long' resulted in the following stack trace (on top
of v4.19.34)

==
UBSAN: Undefined behaviour in kernel/sched/fair.c:3055:9
signed integer overflow:
1048576 * 9144968455305 cannot be represented in type 'long int'
dump_backtrace+0x0/0x338
show_stack+0x28/0x38
dump_stack+0xc8/0x100
ubsan_epilogue+0x18/0x6c
handle_overflow+0x170/0x1c0
__ubsan_handle_mul_overflow+0x34/0x44
update_cfs_group+0x244/0x248
dequeue_entity+0x478/0x12c0
dequeue_task_fair+0x6c/0xd98
__sched_setscheduler+0x320/0xdf0
_sched_setscheduler+0xf4/0x158
do_sched_setscheduler+0x118/0x1a0
__arm64_sys_sched_setscheduler+0x50/0x70
el0_svc_common+0xf4/0x258
el0_svc_handler+0x50/0xa8

==

UBSAN: Undefined behaviour in kernel/sched/fair.c:3111:11
signed integer overflow:
97833896519391 * 952504 cannot be represented in type 'long int'
Call trace:
dump_backtrace+0x0/0x338
show_stack+0x28/0x38
dump_stack+0xc8/0x100
ubsan_epilogue+0x18/0x6c
handle_overflow+0x170/0x1c0
__ubsan_handle_mul_overflow+0x34/0x44
update_cfs_group+0x210/0x248
enqueue_entity+0x7b4/0x1868
enqueue_task_fair+0x12c/0xe70
__sched_setscheduler+0x4cc/0xdf0
_sched_setscheduler+0xf4/0x158
do_sched_setscheduler+0x118/0x1a0
__arm64_sys_sched_setscheduler+0x50/0x70
el0_svc_common+0xf4/0x258
el0_svc_handler+0x50/0xa8
el0_svc+0x8/0xc
==

Cc: sta...@vger.kernel.org 
Signed-off-by: Cheng Jian 
---
 kernel/sched/fair.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fdab7eb6f351..cf003a31c220 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2920,9 +2920,9 @@ void reweight_task(struct task_struct *p, int prio)
  *
  * hence icky!
  */
-static long calc_group_shares(struct cfs_rq *cfs_rq)
+static unsigned long calc_group_shares(struct cfs_rq *cfs_rq)
 {
-   long tg_weight, tg_shares, load, shares;
+   unsigned long tg_weight, tg_shares, load, shares;
struct task_group *tg = cfs_rq->tg;
 
tg_shares = READ_ONCE(tg->shares);
@@ -2951,7 +2951,7 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
 * case no task is runnable on a CPU MIN_SHARES=2 should be returned
 * instead of 0.
 */
-   return clamp_t(long, shares, MIN_SHARES, tg_shares);
+   return clamp_t(unsigned long, shares, MIN_SHARES, tg_shares);
 }
 
 /*
@@ -2981,9 +2981,9 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
  * Where these max() serve both to use the 'instant' values to fix the slow
  * from-idle and avoid the /0 on to-idle, similar to (6).
  */
-static long calc_group_runnable(struct cfs_rq *cfs_rq, long shares)
+static unsigned long calc_group_runnable(struct cfs_rq *cfs_rq, long shares)
 {
-   long runnable, load_avg;
+   unsigned long runnable, load_avg;
 
load_avg = max(cfs_rq->avg.load_avg,
   scale_load_down(cfs_rq->load.weight));
@@ -2995,7 +2995,7 @@ static long calc_group_runnable(struct cfs_rq *cfs_rq, 
long shares)
if (load_avg)
runnable /= load_avg;
 
-   return clamp_t(long, runnable, MIN_SHARES, shares);
+   return clamp_t(unsigned long, runnable, MIN_SHARES, shares);
 }
 #endif /* CONFIG_SMP */
 
@@ -3008,7 +3008,7 @@ static inline int throttled_hierarchy(struct cfs_rq 
*cfs_rq);
 static void update_cfs_group(struct sched_entity *se)
 {
struct cfs_rq *gcfs_rq = group_cfs_rq(se);
-   long shares, runnable;
+   unsigned long shares, runnable;
 
if (!gcfs_rq)
return;
-- 
2.17.1



[PATCH] regulator: tps65218: Convert to use regulator_get_current_limit_regmap

2019-04-12 Thread Axel Lin
Use regulator_get_current_limit_regmap helper to save some code.

Signed-off-by: Axel Lin 
---
 drivers/regulator/tps65218-regulator.c | 48 ++
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/drivers/regulator/tps65218-regulator.c 
b/drivers/regulator/tps65218-regulator.c
index 7aed3fbffecc..b72035610013 100644
--- a/drivers/regulator/tps65218-regulator.c
+++ b/drivers/regulator/tps65218-regulator.c
@@ -29,7 +29,8 @@
 #include 
 
 #define TPS65218_REGULATOR(_name, _of, _id, _type, _ops, _n, _vr, _vm, _er, \
-  _em, _cr, _cm, _lr, _nlr, _delay, _fuv, _sr, _sm) \
+  _em, _cr, _cm, _lr, _nlr, _delay, _fuv, _sr, _sm, \
+  _ct, _ncl) \
{   \
.name   = _name,\
.of_match   = _of,  \
@@ -42,6 +43,8 @@
.vsel_mask  = _vm,  \
.csel_reg   = _cr,  \
.csel_mask  = _cm,  \
+   .curr_table = _ct,  \
+   .n_current_limits   = _ncl, \
.enable_reg = _er,  \
.enable_mask= _em,  \
.volt_table = NULL, \
@@ -188,8 +191,7 @@ static const struct regulator_ops tps65218_ldo1_dcdc34_ops 
= {
.set_suspend_disable= tps65218_pmic_set_suspend_disable,
 };
 
-static const int ls3_currents[] = { 10, 20, 50, 100 };
-
+static const unsigned int ls3_currents[] = { 10, 20, 50, 100 };
 
 static int tps65218_pmic_set_input_current_lim(struct regulator_dev *dev,
   int lim_uA)
@@ -229,29 +231,13 @@ static int tps65218_pmic_set_current_limit(struct 
regulator_dev *dev,
 TPS65218_PROTECT_L1);
 }
 
-static int tps65218_pmic_get_current_limit(struct regulator_dev *dev)
-{
-   int retval;
-   unsigned int index;
-   struct tps65218 *tps = rdev_get_drvdata(dev);
-
-   retval = regmap_read(tps->regmap, dev->desc->csel_reg, &index);
-   if (retval < 0)
-   return retval;
-
-   index = (index & dev->desc->csel_mask) >>
-__builtin_ctz(dev->desc->csel_mask);
-
-   return ls3_currents[index];
-}
-
 static const struct regulator_ops tps65218_ls23_ops = {
.is_enabled = regulator_is_enabled_regmap,
.enable = tps65218_pmic_enable,
.disable= tps65218_pmic_disable,
.set_input_current_limit = tps65218_pmic_set_input_current_lim,
.set_current_limit  = tps65218_pmic_set_current_limit,
-   .get_current_limit  = tps65218_pmic_get_current_limit,
+   .get_current_limit  = regulator_get_current_limit_regmap,
 };
 
 /* Operations permitted on DCDC5, DCDC6 */
@@ -270,53 +256,57 @@ static const struct regulator_desc regulators[] = {
   TPS65218_CONTROL_DCDC1_MASK, TPS65218_REG_ENABLE1,
   TPS65218_ENABLE1_DC1_EN, 0, 0, dcdc1_dcdc2_ranges,
   2, 4000, 0, TPS65218_REG_SEQ3,
-  TPS65218_SEQ3_DC1_SEQ_MASK),
+  TPS65218_SEQ3_DC1_SEQ_MASK, NULL, 0),
TPS65218_REGULATOR("DCDC2", "regulator-dcdc2", TPS65218_DCDC_2,
   REGULATOR_VOLTAGE, tps65218_dcdc12_ops, 64,
   TPS65218_REG_CONTROL_DCDC2,
   TPS65218_CONTROL_DCDC2_MASK, TPS65218_REG_ENABLE1,
   TPS65218_ENABLE1_DC2_EN, 0, 0, dcdc1_dcdc2_ranges,
   2, 4000, 0, TPS65218_REG_SEQ3,
-  TPS65218_SEQ3_DC2_SEQ_MASK),
+  TPS65218_SEQ3_DC2_SEQ_MASK, NULL, 0),
TPS65218_REGULATOR("DCDC3", "regulator-dcdc3", TPS65218_DCDC_3,
   REGULATOR_VOLTAGE, tps65218_ldo1_dcdc34_ops, 64,
   TPS65218_REG_CONTROL_DCDC3,
   TPS65218_CONTROL_DCDC3_MASK, TPS65218_REG_ENABLE1,
   TPS65218_ENABLE1_DC3_EN, 0, 0, ldo1_dcdc3_ranges, 2,
-  0, 0, TPS65218_REG_SEQ4, TPS65218_SEQ4_DC3_SEQ_MASK),
+  0, 0, TPS65218_REG_SEQ4, TPS65218_SEQ4_DC3_SEQ_MASK,
+  NULL, 0),
TPS65218_REGULATOR("DCDC4", "regulator-dcdc4", TPS65218_DCDC_4,
   REGULATOR_VOLTAGE, tps65218_ldo1_dcdc34_ops, 53,
   TPS65218_REG_CONTROL_DCDC4,
   TPS65218_CONTROL_DCDC4_MASK, TPS65218_REG_ENABLE1,
   TPS65218_ENABLE1_DC4_EN, 0, 0, dcdc4_ranges, 2,
-  

Re: [PATCH-tip v3 02/14] locking/rwsem: Make owner available even if !CONFIG_RWSEM_SPIN_ON_OWNER

2019-04-12 Thread Waiman Long
On 04/12/2019 02:05 PM, Waiman Long wrote:
> On 04/12/2019 12:41 PM, Ingo Molnar wrote:
>>
>> So beyond the primary constraint of PeterZ OK-ing it all, there's also 
>> these two scalability regression reports from the ktest bot:
>>
>>  [locking/rwsem] 1b94536f2d: stress-ng.bad-altstack.ops_per_sec -32.7% 
>> regression
> A regression due to the lock handoff patch is kind of expected, but I
> will into why there is such a large drop.

I don't have a high core count system on hand. I run the stress-ng test
on a 2-socket 40-core 80-thread Skylake system:


Kernels: 1) Before lock handoff patch
 2) After lock handoff patch
 3) After wake all reader patch
 4) After reader spin on writer patch
 5) After writer spin on reader patch

    Tests K1  K2  K3  K4  K5
    - --  --  --  --  --
  bad-altstack   39928   35807   36422   40062   40747
  stackmmap    187 365 435 255 198
  vm    309589  296097  262045  281974  310439
  vm-segv   113776  114058  112318  115422  110550
 
Here, the bad-altstack dropped 10% after the lock handoff patch. However,
the performance is recovered with later patches. The stackmmap results
don't look quite right as the numbers are much smaller than the numbers
in the report.

I will rerun the tests again when I acquire a high core count system.

Anyway, the lock handoff patch is expected to reduce throughput under
heavy contention.

>>  [locking/rwsem] adc32e8877: will-it-scale.per_thread_ops -21.0% regression
> Will look into that also.

I can reproduce the regression on the same skylake system.

The results of the page_fault1 will-it-scale test are as follows:

 Threads   K2  K3   K4   K5
 ---   --  --   --   --
    20  5549772  5550332  5463961  5400064
    40  9540445 10286071  9705062  7706082
    60  8187245  8212307  247  6647705
    89  8390758  9619271  9019454  7124407

So the wake-all-reader patch is good for this benchmark. The performance
was reduced a bit with the reader-spin-on-writer patch. It got even worse
with the writer-spin-on-reader patch.

I looked at the perf output, rwsem contention accounted for less than
1% of the total cpu cycles. So I believe the regression was caused by
the behavior change introduced by the two reader optimistic spinning
patches. These patch will make writer less preferred than before. I
think the performance of this microbenchmark may be more dependent on
writer performance.

Looking at the lock event counts for K5:

 rwsem_opt_fail=253647
 rwsem_opt_nospin=8776
 rwsem_opt_rlock=259941
 rwsem_opt_wlock=2543
 rwsem_rlock=237747
 rwsem_rlock_fail=0
 rwsem_rlock_fast=0
 rwsem_rlock_handoff=0
 rwsem_sleep_reader=237747
 rwsem_sleep_writer=23098
 rwsem_wake_reader=6033
 rwsem_wake_writer=47032
 rwsem_wlock=15890
 rwsem_wlock_fail=10
 rwsem_wlock_handoff=3991

For K4, it was

 rwsem_opt_fail=479626
 rwsem_opt_rlock=8877
 rwsem_opt_wlock=114
 rwsem_rlock=453874
 rwsem_rlock_fail=0
 rwsem_rlock_fast=1234
 rwsem_rlock_handoff=0
 rwsem_sleep_reader=453058
 rwsem_sleep_writer=25836
 rwsem_wake_reader=11054
 rwsem_wake_writer=71568
 rwsem_wlock=24515
 rwsem_wlock_fail=3
 rwsem_wlock_handoff=5245

It can be seen that a lot more readers got the lock via optimistic
spinning.  One possibility is that reader optimistic spinning causes
readers to spread out into more lock acquisition groups than without. The
K3 results show that grouping more readers into one lock acquisition
group help to improve performance for this microbenchmark. I will need
to run more tests to find out the root cause of this regression. It is
not an easy problem to solve.

In the mean time, I am going to send out an updated patchset tomorrow so
that Peter can review the patch again when he is available.

Cheers,
Longman



RE: [PATCH] vmbus: Remove the undesired put_cpu_ptr() in hv_synic_cleanup()

2019-04-12 Thread Michael Kelley
From: Dexuan Cui   Sent: Friday, April 12, 2019 4:35 PM
> 
> With CONFIG_DEBUG_PREEMPT=y, the put_cpu_ptr() triggiers an underflow
> warning in preempt_count_sub().
> 
> Fixes: 37cdd991fac8 ("vmbus: put related per-cpu variable together")
> Cc: sta...@vger.kernel.org
> Cc: Stephen Hemminger 
> Signed-off-by: Dexuan Cui 

Reviewed-by: Michael Kelley 


[PATCH] printk: tie printk_once / printk_deferred_once into .data.once for reset

2019-04-12 Thread Paul Gortmaker
In commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE")
we got the opportunity to reset state on the one shot messages,
without having to reboot.

However printk_once (printk_deferred_once) live in a different file
and didn't get the same kind of update/conversion, so they remain
unconditionally one shot, until the system is rebooted.

For example, we currently have:

  sched/rt.c: printk_deferred_once("sched: RT throttling activated\n");

..which could reasonably be tripped as someone is testing and tuning
a new system/workload and their task placements.  For consistency, and
to avoid reboots in the same vein as the original commit, we make these
two instances of _once the same as the WARN*_ONCE instances are.

Cc: Andi Kleen 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: Andrew Morton 
Signed-off-by: Paul Gortmaker 
---
 Documentation/clearing-warn-once.txt | 2 +-
 include/linux/printk.h   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/clearing-warn-once.txt 
b/Documentation/clearing-warn-once.txt
index 5b1f5d547be1..c68598b31428 100644
--- a/Documentation/clearing-warn-once.txt
+++ b/Documentation/clearing-warn-once.txt
@@ -1,5 +1,5 @@
 
-WARN_ONCE / WARN_ON_ONCE only print a warning once.
+WARN_ONCE / WARN_ON_ONCE / printk_once only emit a message once.
 
 echo 1 > /sys/kernel/debug/clear_warn_once
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index d7c77ed1a4cb..84ea4d094af3 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -347,7 +347,7 @@ extern int kptr_restrict;
 #ifdef CONFIG_PRINTK
 #define printk_once(fmt, ...)  \
 ({ \
-   static bool __print_once __read_mostly; \
+   static bool __section(.data.once) __print_once; \
bool __ret_print_once = !__print_once;  \
\
if (!__print_once) {\
@@ -358,7 +358,7 @@ extern int kptr_restrict;
 })
 #define printk_deferred_once(fmt, ...) \
 ({ \
-   static bool __print_once __read_mostly; \
+   static bool __section(.data.once) __print_once; \
bool __ret_print_once = !__print_once;  \
\
if (!__print_once) {\
-- 
2.7.4



[PATCH v5 2/2] tty: serial: add driver for the SiFive UART

2019-04-12 Thread Paul Walmsley
Add a serial driver for the SiFive UART, found on SiFive FU540 devices
(among others).

The underlying serial IP block is relatively basic, and currently does
not support serial break detection.  Further information on the IP
block can be found in the documentation and Chisel sources:

https://static.dev.sifive.com/FU540-C000-v1.0.pdf


https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/uart

This driver was written in collaboration with Wesley Terpstra
.

Tested on a SiFive HiFive Unleashed A00 board, using BBL and the open-
source FSBL (using a DT file based on what's targeted for mainline).

This revision incorporates changes based on comments by Julia Lawall
, Emil Renner Berthing , and
Andreas Schwab .  Thanks also to Andreas for testing
the driver with his userspace and reporting a bug with the
set_termios implementation.

Signed-off-by: Paul Walmsley 
Signed-off-by: Paul Walmsley 
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: Palmer Dabbelt 
Cc: Wesley Terpstra 
Cc: linux-ser...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Julia Lawall 
Cc: Emil Renner Berthing 
Cc: Andreas Schwab 
---
 drivers/tty/serial/Kconfig   |   24 +
 drivers/tty/serial/Makefile  |1 +
 drivers/tty/serial/sifive.c  | 1056 ++
 include/uapi/linux/serial_core.h |3 +
 4 files changed, 1084 insertions(+)
 create mode 100644 drivers/tty/serial/sifive.c

diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index 72966bc0ac76..561e053b690a 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -1095,6 +1095,30 @@ config SERIAL_OMAP_CONSOLE
  your boot loader about how to pass options to the kernel at
  boot time.)
 
+config SERIAL_SIFIVE
+   tristate "SiFive UART support"
+   depends on OF
+   select SERIAL_CORE
+   help
+ Select this option if you are building a kernel for a device that
+ contains a SiFive UART IP block.  This type of UART is present on
+ SiFive FU540 SoCs, among others.
+
+config SERIAL_SIFIVE_CONSOLE
+   bool "Console on SiFive UART"
+   depends on SERIAL_SIFIVE=y
+   select SERIAL_CORE_CONSOLE
+   help
+ Select this option if you would like to use a SiFive UART as the
+ system console.
+
+ Even if you say Y here, the currently visible virtual console
+ (/dev/tty0) will still be used as the system console by default, but
+ you can alter that using a kernel command line option such as
+ "console=ttySIFx". (Try "man bootparam" or see the documentation of
+ your boot loader about how to pass options to the kernel at
+ boot time.)
+
 config SERIAL_LANTIQ
bool "Lantiq serial driver"
depends on LANTIQ
diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
index 40b702aaa85e..2aff1d07d08b 100644
--- a/drivers/tty/serial/Makefile
+++ b/drivers/tty/serial/Makefile
@@ -92,6 +92,7 @@ obj-$(CONFIG_SERIAL_PIC32)+= pic32_uart.o
 obj-$(CONFIG_SERIAL_MPS2_UART) += mps2-uart.o
 obj-$(CONFIG_SERIAL_OWL)   += owl-uart.o
 obj-$(CONFIG_SERIAL_RDA)   += rda-uart.o
+obj-$(CONFIG_SERIAL_SIFIVE)+= sifive.o
 
 # GPIOLIB helpers for modem control lines
 obj-$(CONFIG_SERIAL_MCTRL_GPIO)+= serial_mctrl_gpio.o
diff --git a/drivers/tty/serial/sifive.c b/drivers/tty/serial/sifive.c
new file mode 100644
index ..be4687814353
--- /dev/null
+++ b/drivers/tty/serial/sifive.c
@@ -0,0 +1,1056 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * SiFive UART driver
+ * Copyright (C) 2018 Paul Walmsley 
+ * Copyright (C) 2018-2019 SiFive
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Based partially on:
+ * - drivers/tty/serial/pxa.c
+ * - drivers/tty/serial/amba-pl011.c
+ * - drivers/tty/serial/uartlite.c
+ * - drivers/tty/serial/omap-serial.c
+ * - drivers/pwm/pwm-sifive.c
+ *
+ * See the following sources for further documentation:
+ * - Chapter 19 "Universal Asynchronous Receiver/Transmitter (UART)" of
+ *   SiFive FE310-G000 v2p3
+ * - The tree/master/src/main/scala/devices/uart directory of
+ *   https://github.com/sifive/sifive-blocks/
+ *
+ * The SiFive UART design is not 8250-compatible.  The following common
+ * features are not supported:
+ * - Word lengths other than 8 bits
+ * - Break handling
+ * - Parity
+ * - Flow control
+ * - Modem signals (DSR, RI, etc.)
+ * On the other hand, the design is free from the baggage of the 8

[PATCH v5 0/2] tty: serial: add DT bindings and serial driver for the SiFive FU540 UART

2019-04-12 Thread Paul Walmsley
This series adds a serial driver, with console support, for the
UART IP block present on the SiFive FU540 SoC.  The programming
model is straightforward, but unique.

Boot-tested on a SiFive FU540 HiFive-U board, using BBL and the
open-source FSBL (with appropriate patches to the DT data).

This fifth version fixes a bug in the set_termios handler,
found by Andreas Schwab .

The patches in this series can also be found, with the PRCI patches,
DT patches, and DT prerequisite patch, at:

https://github.com/sifive/riscv-linux/tree/dev/paulw/serial-v5.1-rc4


- Paul

Paul Walmsley (2):
  dt-bindings: serial: add documentation for the SiFive UART driver
  tty: serial: add driver for the SiFive UART

 .../bindings/serial/sifive-serial.txt |   33 +
 drivers/tty/serial/Kconfig|   24 +
 drivers/tty/serial/Makefile   |1 +
 drivers/tty/serial/sifive.c   | 1056 +
 include/uapi/linux/serial_core.h  |3 +
 5 files changed, 1117 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/serial/sifive-serial.txt
 create mode 100644 drivers/tty/serial/sifive.c

-- 
2.20.1



[PATCH v5 1/2] dt-bindings: serial: add documentation for the SiFive UART driver

2019-04-12 Thread Paul Walmsley
Add DT binding documentation for the Linux driver for the SiFive
asynchronous serial IP block.

This revision incorporates changes based on feedback from Rob
Herring .

Signed-off-by: Paul Walmsley 
Signed-off-by: Paul Walmsley 
Cc: linux-ser...@vger.kernel.org
Cc: devicet...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: Palmer Dabbelt 
---
 .../bindings/serial/sifive-serial.txt | 33 +++
 1 file changed, 33 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/serial/sifive-serial.txt

diff --git a/Documentation/devicetree/bindings/serial/sifive-serial.txt 
b/Documentation/devicetree/bindings/serial/sifive-serial.txt
new file mode 100644
index ..c86b1e524159
--- /dev/null
+++ b/Documentation/devicetree/bindings/serial/sifive-serial.txt
@@ -0,0 +1,33 @@
+SiFive asynchronous serial interface (UART)
+
+Required properties:
+
+- compatible: should be something similar to
+ "sifive,-uart" for the UART as integrated
+ on a particular chip, and "sifive,uart" for the
+ general UART IP block programming model.  Supported
+ compatible strings as of the date of this writing are:
+ "sifive,fu540-c000-uart" for the SiFive UART v0 as
+ integrated onto the SiFive FU540 chip, or "sifive,uart0"
+ for the SiFive UART v0 IP block with no chip integration
+ tweaks (if any)
+- reg: address and length of the register space
+- interrupts: Should contain the UART interrupt identifier
+- clocks: Should contain a clock identifier for the UART's parent clock
+
+
+UART HDL that corresponds to the IP block version numbers can be found
+here:
+
+https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/uart
+
+
+Example:
+
+uart0: serial@1001 {
+   compatible = "sifive,fu540-c000-uart", "sifive,uart0";
+   interrupt-parent = <&plic0>;
+   interrupts = <80>;
+   reg = <0x0 0x1001 0x0 0x1000>;
+   clocks = <&prci PRCI_CLK_TLCLK>;
+};
-- 
2.20.1



Re: [RFC] what's struct ipmi_file_private ->file for?

2019-04-12 Thread Corey Minyard
On Sat, Apr 13, 2019 at 01:47:57AM +0100, Al Viro wrote:
>   AFAICS, nothing has ever used that; am I, by any
> chance, missing some nasty macros in there?  Just curious...

I'm not sure why it's there, but it's been there since 2003 when
the driver was first added.  I've queued a patch to remove it.

Thanks,

-corey


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-12 Thread Daniel Colascione
[Resending due to accidental HTML. I need to take Joel's advice and
switch to a real email client]

On Fri, Apr 12, 2019 at 5:54 PM Daniel Colascione  wrote:
>
> On Fri, Apr 12, 2019 at 5:09 PM Joel Fernandes  wrote:
>>
>> Hi Andy!
>>
>> On Fri, Apr 12, 2019 at 02:32:53PM -0700, Andy Lutomirski wrote:
>> > On Thu, Apr 11, 2019 at 10:51 AM Joel Fernandes (Google)
>> >  wrote:
>> > >
>> > > pidfd are /proc/pid directory file descriptors referring to a task group
>> > > leader. Android low memory killer (LMK) needs pidfd polling support to
>> > > replace code that currently checks for existence of /proc/pid for
>> > > knowing a process that is signalled to be killed has died, which is both
>> > > racy and slow. The pidfd poll approach is race-free, and also allows the
>> > > LMK to do other things (such as by polling on other fds) while awaiting
>> > > the process being killed to die.
>> > >
>> > > It prevents a situation where a PID is reused between when LMK sends a
>> > > kill signal and checks for existence of the PID, since the wrong PID is
>> > > now possibly checked for existence.
>> > >
>> > > In this patch, we follow the same mechanism used uhen the parent of the
>> > > task group is to be notified, that is when the tasks waiting on a poll
>> > > of pidfd are also awakened.
>> > >
>> > > We have decided to include the waitqueue in struct pid for the following
>> > > reasons:
>> > > 1. The wait queue has to survive for the lifetime of the poll. Including
>> > > it in task_struct would not be option in this case because the task can
>> > > be reaped and destroyed before the poll returns.
>> >
>> > Are you sure?  I admit I'm not all that familiar with the innards of
>> > poll() on Linux, but I thought that the waitqueue only had to survive
>> > long enough to kick the polling thread and did *not* have to survive
>> > until poll() actually returned.
>>
>> I am not sure now. I thought epoll(2) was based on the wait_event APIs,
>> however more closely looking at the eventpoll code, it looks like there are 2
>> waitqueues involved, one that we pass and the other that is a part of the
>> eventpoll session itself, so you could be right about that. Daniel Colascione
>> may have some more thoughts about it since he brought up the possiblity of a
>> wq life-time issue. Daniel?   We were just playing it safe.

I think you (Joel) and Andy are talking about different meanings of
poll(). Joel is talking about the VFS method; Andy is talking about
the system call. ISTM that the lifetime of wait queue we give to
poll_wait needs to last through the poll. Normally the wait queue gets
pinned by the struct file that we give to poll_wait (which takes a
reference on the struct file), but the pidfd struct file doesn't pin
the struct task, so we can't use a wait queue in struct task.
(remove_wait_queue, which poll implementations call to undo wait queue
additions, takes the wait queue head we pass to poll_wait, and we
don't want to pass a dangling pointer to remove_wait_queue.) If the
lifetime requirements for the queue aren't this strict, I don't see it
documented anywhere. Besides: if we don't actually need to pin the
waitqueue lifetime for the duration of the poll, why bother taking a
reference on the polled struct file?


[RFC] what's struct ipmi_file_private ->file for?

2019-04-12 Thread Al Viro
AFAICS, nothing has ever used that; am I, by any
chance, missing some nasty macros in there?  Just curious...


Re: [PATCH 4/4] locking/lockdep: Test all incompatible scenario at once in check_irq_usage()

2019-04-12 Thread Frederic Weisbecker
On Thu, Apr 11, 2019 at 12:46:32PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 10, 2019 at 04:28:48AM +0200, Frederic Weisbecker wrote:
> > Should I re-issue the set or you do the changes?
> 
> I've made it like so; does that work? In particular, do the comments
> make sense? It is always hard to write these things down.

It's especially hard because we describe both the bits of the usage mask
and the bits of the bit numbers of the usage mask. The resulting description
can only be naughty :-)

That said I can probably clarify that in lockdep_internals.h on further
patches.

A few comments though:

> +/*
> + * The bit number is encoded like:
> + *
> + *  bit0: 0 exclusive, 1 read lock
> + *  bit1: 0 used in irq, 1 irq enabled
> + *  bit2-n: state
> + */
>  static int exclusive_bit(int new_bit)
>  {
>   int state = new_bit & LOCK_USAGE_STATE_MASK;
> @@ -1988,45 +1968,158 @@ static int exclusive_bit(int new_bit)
>   return state | (dir ^ LOCK_USAGE_DIR_MASK);
>  }
>  
> +/*
> + * Observe that when given a bitmask where each bitnr is encoded as above, a
> + * right shift of the mask transforms the individual bitnrs as -1.
> + *
> + * So for all bits where bitnr1 == 1, we can create the mask where bitnr1 == > 0

So by bitnr1 you're referring to direction, right?

> + * by subtracting 2, or shifting the mask right by 2.

In which case we can perhaps reformulate:

  So for all bits whose number have LOCK_ENABLED_* set (bitnr1 == 1), we can 
create the
  mask with those bit numbers using LOCK_USED_IN_* (bitnr1 == 0) instead by
  subtracting the bit number by 2, or shifting the mask right by 2.

And same would go for below.

> + *
> + * Similarly, bitnr1 == 0 becomes bitnr1 == 1 by adding 2, or shifting left 
> 2.
> + *
> + * So split the mask (note that LOCKF_ENABLED_IRQ_ALL|LOCKF_USED_IN_IRQ_ALL 
> is
> + * all bits set) and recompose with bitnr1 flipped.
> + */
> +static unsigned long invert_dir_mask(unsigned long mask)
> +{
> + unsigned long excl = 0;
> +
> + /* Invert dir */
> + excl |= (mask & LOCKF_ENABLED_IRQ_ALL) >> LOCK_USAGE_DIR_MASK;
> + excl |= (mask & LOCKF_USED_IN_IRQ_ALL) << LOCK_USAGE_DIR_MASK;
> +
> + return excl;
> +}
> +
> +/*
> + * As above, we clear bitnr0 with bitmask ops. First, for all bits with 
> bitnr1
> + * set, add those with bitnr1 cleared. And then mask out all bitnr1.
> + */

Same here:

  As above, we clear bitnr0 (LOCK_*_READ off) with bitmask ops. First, for all 
bits
  with bitnr1 set (LOCK_ENABLED_*) , add those with bitnr1 cleared 
(LOCK_USED_IN_*).
  And then mask out all bitnr1.

> +static unsigned long exclusive_mask(unsigned long mask)
> +{
> + unsigned long excl = invert_dir_mask(mask);
> +
> + /* Strip read */
> + excl |= (excl & LOCKF_IRQ_READ) >> LOCK_USAGE_READ_MASK;
> + excl &= ~LOCKF_IRQ_READ;
> +
> + return excl;
> +}

Not sure I'm making things clearer but at least that's some more pointers
to enum lock_usage_bit (defined on headers where I should add more layout
explanations, especially to make it clear we play with bit number bits :-s  )

Thanks.


[PATCH v3] mm/hotplug: treat CMA pages as unmovable

2019-04-12 Thread Qian Cai
has_unmovable_pages() is used by allocating CMA and gigantic pages as
well as the memory hotplug. The later doesn't know how to offline CMA
pool properly now, but if an unused (free) CMA page is encountered, then
has_unmovable_pages() happily considers it as a free memory and
propagates this up the call chain. Memory offlining code then frees the
page without a proper CMA tear down which leads to an accounting issues.
Moreover if the same memory range is onlined again then the memory never
gets back to the CMA pool.

State after memory offline:
 # grep cma /proc/vmstat
 nr_free_cma 205824

 # cat /sys/kernel/debug/cma/cma-kvm_cma/count
 209920

Also, kmemleak still think those memory address are reserved but have
already been used by the buddy allocator after onlining.

Offlined Pages 4096
kmemleak: Cannot insert 0xc000201f7d040008 into the object search tree
(overlaps existing)
Call Trace:
[c0003dc2faf0] [c0884b2c] dump_stack+0xb0/0xf4 (unreliable)
[c0003dc2fb30] [c0424fb4] create_object+0x344/0x380
[c0003dc2fbf0] [c03d178c] __kmalloc_node+0x3ec/0x860
[c0003dc2fc90] [c0319078] kvmalloc_node+0x58/0x110
[c0003dc2fcd0] [c0484d9c] seq_read+0x41c/0x620
[c0003dc2fd60] [c04472bc] __vfs_read+0x3c/0x70
[c0003dc2fd80] [c04473ac] vfs_read+0xbc/0x1a0
[c0003dc2fdd0] [c044783c] ksys_read+0x7c/0x140
[c0003dc2fe20] [c000b108] system_call+0x5c/0x70
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xc000201cc800 (size 13757317120):
kmemleak:   comm "swapper/0", pid 0, jiffies 4294937297
kmemleak:   min_count = -1
kmemleak:   count = 0
kmemleak:   flags = 0x5
kmemleak:   checksum = 0
kmemleak:   backtrace:
 cma_declare_contiguous+0x2a4/0x3b0
 kvm_cma_reserve+0x11c/0x134
 setup_arch+0x300/0x3f8
 start_kernel+0x9c/0x6e8
 start_here_common+0x1c/0x4b0
kmemleak: Automatic memory scanning thread ended

Acked-by: Michal Hocko 
Signed-off-by: Qian Cai 
---

v3: Use a string pointer instead of an array per Michal.
v2: Borrow some commit log texts.
Call dump_page() in the error path.

 mm/page_alloc.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d96ca5bc555b..40029b29fb88 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8005,7 +8005,10 @@ void *__init alloc_large_system_hash(const char 
*tablename,
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 int migratetype, int flags)
 {
-   unsigned long pfn, iter, found;
+   unsigned long found;
+   unsigned long iter = 0;
+   unsigned long pfn = page_to_pfn(page);
+   const char *reason = "unmovable page";
 
/*
 * TODO we could make this much more efficient by not checking every
@@ -8015,17 +8018,20 @@ bool has_unmovable_pages(struct zone *zone, struct page 
*page, int count,
 * can still lead to having bootmem allocations in zone_movable.
 */
 
-   /*
-* CMA allocations (alloc_contig_range) really need to mark isolate
-* CMA pageblocks even when they are not movable in fact so consider
-* them movable here.
-*/
-   if (is_migrate_cma(migratetype) &&
-   is_migrate_cma(get_pageblock_migratetype(page)))
-   return false;
+   if (is_migrate_cma(get_pageblock_migratetype(page))) {
+   /*
+* CMA allocations (alloc_contig_range) really need to mark
+* isolate CMA pageblocks even when they are not movable in fact
+* so consider them movable here.
+*/
+   if (is_migrate_cma(migratetype))
+   return false;
+
+   reason = "CMA page";
+   goto unmovable;
+   }
 
-   pfn = page_to_pfn(page);
-   for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
+   for (found = 0; iter < pageblock_nr_pages; iter++) {
unsigned long check = pfn + iter;
 
if (!pfn_valid_within(check))
@@ -8105,7 +8111,7 @@ bool has_unmovable_pages(struct zone *zone, struct page 
*page, int count,
 unmovable:
WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
if (flags & REPORT_FAILURE)
-   dump_page(pfn_to_page(pfn+iter), "unmovable page");
+   dump_page(pfn_to_page(pfn + iter), reason);
return true;
 }
 
-- 
2.20.1 (Apple Git-117)



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-12 Thread Joel Fernandes
Hi Andy!

On Fri, Apr 12, 2019 at 02:32:53PM -0700, Andy Lutomirski wrote:
> On Thu, Apr 11, 2019 at 10:51 AM Joel Fernandes (Google)
>  wrote:
> >
> > pidfd are /proc/pid directory file descriptors referring to a task group
> > leader. Android low memory killer (LMK) needs pidfd polling support to
> > replace code that currently checks for existence of /proc/pid for
> > knowing a process that is signalled to be killed has died, which is both
> > racy and slow. The pidfd poll approach is race-free, and also allows the
> > LMK to do other things (such as by polling on other fds) while awaiting
> > the process being killed to die.
> >
> > It prevents a situation where a PID is reused between when LMK sends a
> > kill signal and checks for existence of the PID, since the wrong PID is
> > now possibly checked for existence.
> >
> > In this patch, we follow the same mechanism used uhen the parent of the
> > task group is to be notified, that is when the tasks waiting on a poll
> > of pidfd are also awakened.
> >
> > We have decided to include the waitqueue in struct pid for the following
> > reasons:
> > 1. The wait queue has to survive for the lifetime of the poll. Including
> > it in task_struct would not be option in this case because the task can
> > be reaped and destroyed before the poll returns.
> 
> Are you sure?  I admit I'm not all that familiar with the innards of
> poll() on Linux, but I thought that the waitqueue only had to survive
> long enough to kick the polling thread and did *not* have to survive
> until poll() actually returned.

I am not sure now. I thought epoll(2) was based on the wait_event APIs,
however more closely looking at the eventpoll code, it looks like there are 2
waitqueues involved, one that we pass and the other that is a part of the
eventpoll session itself, so you could be right about that. Daniel Colascione
may have some more thoughts about it since he brought up the possiblity of a
wq life-time issue. Daniel?   We were just playing it safe.

Either way the waitqueue in struct pid has the advantage mentioned below:

> > 2. By including the struct pid for the waitqueue means that during
> > de_exec, the thread doing de_thread() automatically gets the new
> > waitqueue/pid even though its task_struct is different.
> 
> I didn't follow this.  Can you clarify?

Sure. de_thread() can called when all threads of a thread group need to die
when any thread in the group does an execve. The thread doing the execve will
become the new thread leader. In this case, the thread that did the exec gets
the pid of the new leader. The semantics of wait(2) are such that the wait
should not return (unblock) in the above scenario because the group is
non-empty even though the task_struct of the group leader died. IOW, we
should not wake up any pidfd pollers in this cases.

So basically what I was trying to say in point 2 above is that because of
putting the waitqueue in struct pid, the change_pid() in de_thread()
automatically carries the waiting tasks to the new task_struct leader,
because the pid gets transferred to the new leader. If we put it in
task_struct, then that wouldn't work since the leader's task_struct would get
destroyed and we would have to handle the case in some other way. At least
that is the theory.  Anyway we specifically test for this case in patch 2/2
and also tested that not handling this case fails the test.

> Also, please don't call your new helper wake_up_pidfd_pollers().  One

I will call it wake_up_pollers() then, if that's Ok.

> of the goals of my patch was to make it generically possible for
> kernel code to wait for a task to exit.  There are other cases besides
> pidfd for which this would be useful.  Ahem, kthread.  (The kthread
> implementation currently does some seriously awful things to detect
> when kthreads die.)  Also, some hypothetical future vastly improved
> debugging API (to supercede ptrace for new applications) might want
> this.

Ah I see :-) Nice to know we can use this to improve the kthread code.

thanks,

 - Joel



[PATCH] smp: Do not warn if smp_call_function_single() is doing a self call.

2019-04-12 Thread Dexuan Cui
If smp_call_function_single() is calling the function for itself, it's safe
to run with irqs_disabled() == true.

I hit the warning because I'm in the below path in the .suspend callback of
a "syscore_ops" to support hibernation for a VM running on Hyper-V:

  hv_synic_cleanup() ->
clockevents_unbind_device() ->
  clockevents_unbind() ->
smp_call_function_single().

When the .suspend callback runs, only CPU0 is online and irqs_disabled() is
true.

Signed-off-by: Dexuan Cui 
---
 kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f4cf1b0bb3b8..4fdf6a378def 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -288,7 +288,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, 
void *info,
 * can't happen.
 */
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
-&& !oops_in_progress);
+&& cpu != smp_processor_id() && !oops_in_progress);
 
csd = &csd_stack;
if (!wait) {
-- 
2.19.1



[PATCH] vmbus: Remove the undesired put_cpu_ptr() in hv_synic_cleanup()

2019-04-12 Thread Dexuan Cui
With CONFIG_DEBUG_PREEMPT=y, the put_cpu_ptr() triggiers an underflow
warning in preempt_count_sub().

Fixes: 37cdd991fac8 ("vmbus: put related per-cpu variable together")
Cc: sta...@vger.kernel.org
Cc: Stephen Hemminger 
Signed-off-by: Dexuan Cui 
---
 drivers/hv/hv.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 632d25674e7f..45653029ee18 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -408,7 +408,6 @@ int hv_synic_cleanup(unsigned int cpu)
 
clockevents_unbind_device(hv_cpu->clk_evt, cpu);
hv_ce_shutdown(hv_cpu->clk_evt);
-   put_cpu_ptr(hv_cpu);
}
 
hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
-- 
2.19.1



[GIT PULL] clk fixes for v5.1-rc4

2019-04-12 Thread Stephen Boyd
The following changes since commit 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b:

  Linux 5.1-rc1 (2019-03-17 14:22:26 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git 
tags/clk-fixes-for-linus

for you to fetch changes up to f89b9e1be7da8bb0aac667a0206a00975cefe6d3:

  clk: imx: Fix PLL_1416X not rounding rates (2019-04-12 14:21:43 -0700)


Here's more than a handful of clk driver fixes for changes that came in
during the merge window:

 - Fix the AT91 sama5d2 programmable clk prescaler formula

 - A bunch of Amlogic meson clk driver fixes for the VPU clks

 - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark
   pmc clks as critical only when really needed

 - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate
   implementation

 - Use the right structure to test for a frequency table in i.MX's
   PLL_1416x driver


David Müller (1):
  clk: x86: Add system specific quirk to mark clocks as critical

Leonard Crestez (1):
  clk: imx: Fix PLL_1416X not rounding rates

Martin Blumenstingl (1):
  clk: meson: pll: fix rounding and setting a rate that matches precisely

Matthias Wieloch (1):
  clk: at91: fix programmable clock for sama5d2

Maxime Jourdan (2):
  clk: meson-gxbb: round the vdec dividers to closest
  clk: meson: g12a: fix VPU clock muxes mask

Neil Armstrong (2):
  clk: meson-g12a: fix VPU clock parents
  clk: meson: vid-pll-div: remove warning and return 0 on invalid config

Stephen Boyd (3):
  Merge tag 'meson-clk-fixes-for-5.1' of 
https://github.com/BayLibre/clk-meson into clk-fixes
  Merge tag 'meson-clk-fixes-for-5.1-v2' of 
https://github.com/BayLibre/clk-meson into clk-fixes
  platform/x86: pmc_atom: Drop __initconst on dmi table

Weiyi Lu (1):
  clk: mediatek: fix clk-gate flag setting

 drivers/clk/at91/clk-programmable.c| 57 +++---
 drivers/clk/at91/pmc.h |  2 +
 drivers/clk/at91/sama5d2.c | 10 -
 drivers/clk/imx/clk-pll14xx.c  |  2 +-
 drivers/clk/mediatek/clk-gate.c|  3 +-
 drivers/clk/meson/clk-pll.c|  2 +-
 drivers/clk/meson/g12a.c   |  6 +--
 drivers/clk/meson/gxbb.c   |  2 +
 drivers/clk/meson/vid-pll-div.c|  4 +-
 drivers/clk/x86/clk-pmc-atom.c | 14 +--
 drivers/platform/x86/pmc_atom.c| 21 ++
 include/linux/platform_data/x86/clk-pmc-atom.h |  3 ++
 12 files changed, 99 insertions(+), 27 deletions(-)

-- 
Sent by a computer through tubes


Re: [PATCH] i2c: iproc: Change driver to use 'BIT' macro

2019-04-12 Thread Peter Rosin
On 2019-04-03 23:05, Ray Jui wrote:
> Change the iProc I2C driver to use the 'BIT' macro from all '1 << XXX'
> bit operations to get rid of compiler warning and improve readability of
> the code

All? I see lots more '1 << XXX_SHIFT' matches. I might be behind though?
Anyway, if you are cleaning up, I'm just flagging that BIT(XXX_SHIFT) looks
a bit clunky to me. You might consider renaming all those single-bit
XXX_SHIFT macros to simple be

#define XXX BIT()

instead of

#define XXX_SHIFT 

but that triggers more churn, so is obviously more error prone. You might
not dare it?

Cheers,
Peter

> Signed-off-by: Ray Jui 
> ---
>  drivers/i2c/busses/i2c-bcm-iproc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-bcm-iproc.c 
> b/drivers/i2c/busses/i2c-bcm-iproc.c
> index 562942d0c05c..a845b8decac8 100644
> --- a/drivers/i2c/busses/i2c-bcm-iproc.c
> +++ b/drivers/i2c/busses/i2c-bcm-iproc.c
> @@ -717,7 +717,7 @@ static int bcm_iproc_i2c_xfer_single_msg(struct 
> bcm_iproc_i2c_dev *iproc_i2c,
>  
>   /* mark the last byte */
>   if (i == msg->len - 1)
> - val |= 1 << M_TX_WR_STATUS_SHIFT;
> + val |= BIT(M_TX_WR_STATUS_SHIFT);
>  
>   iproc_i2c_wr_reg(iproc_i2c, M_TX_OFFSET, val);
>   }
> @@ -844,7 +844,7 @@ static int bcm_iproc_i2c_cfg_speed(struct 
> bcm_iproc_i2c_dev *iproc_i2c)
>  
>   iproc_i2c->bus_speed = bus_speed;
>   val = iproc_i2c_rd_reg(iproc_i2c, TIM_CFG_OFFSET);
> - val &= ~(1 << TIM_CFG_MODE_400_SHIFT);
> + val &= ~BIT(TIM_CFG_MODE_400_SHIFT);
>   val |= (bus_speed == 40) << TIM_CFG_MODE_400_SHIFT;
>   iproc_i2c_wr_reg(iproc_i2c, TIM_CFG_OFFSET, val);
>  
> @@ -995,7 +995,7 @@ static int bcm_iproc_i2c_resume(struct device *dev)
>  
>   /* configure to the desired bus speed */
>   val = iproc_i2c_rd_reg(iproc_i2c, TIM_CFG_OFFSET);
> - val &= ~(1 << TIM_CFG_MODE_400_SHIFT);
> + val &= ~BIT(TIM_CFG_MODE_400_SHIFT);
>   val |= (iproc_i2c->bus_speed == 40) << TIM_CFG_MODE_400_SHIFT;
>   iproc_i2c_wr_reg(iproc_i2c, TIM_CFG_OFFSET, val);
>  
> 



[PATCH] platform/chrome: wilco_ec: Add telemetry data char device interface

2019-04-12 Thread Nick Crews
The Wilco Embedded Controller is able to send telemetry data
which is useful for enterprise applications. A daemon running on
the OS sends a command to the EC via write() to char device,
and can read the response with a read() request.
Both the request and the response are in an opaque
binary format so that information which is proprietary to the
enterprise service provider is secure.

The character device will appear as /dev/wilco_telemN, where N is some
small non-negative integer, starting with 0. Only one process may have
the file descriptor open at a time. The calling userspace program needs
to keep the device file descriptor open between the calls to write() and
read() in order to preserve the response. Either 32 or 256 bytes of data
are expected for arguments, and the same number of bytes will be
returned.

For testing purposes, if the EC receives a byte sequence beginning with
0, it will return an inverted copy of the input sequence. For an
example, run the simple python script from
https://gist.github.com/52ab07c8519b56c0ec671d3338760516

Signed-off-by: Nick Crews 
---
 drivers/platform/chrome/wilco_ec/Kconfig |   7 +
 drivers/platform/chrome/wilco_ec/Makefile|   2 +
 drivers/platform/chrome/wilco_ec/core.c  |  13 +
 drivers/platform/chrome/wilco_ec/telemetry.c | 312 +++
 include/linux/platform_data/wilco-ec.h   |   2 +
 5 files changed, 336 insertions(+)
 create mode 100644 drivers/platform/chrome/wilco_ec/telemetry.c

diff --git a/drivers/platform/chrome/wilco_ec/Kconfig 
b/drivers/platform/chrome/wilco_ec/Kconfig
index e09e4cebe9b4..7846f6146b78 100644
--- a/drivers/platform/chrome/wilco_ec/Kconfig
+++ b/drivers/platform/chrome/wilco_ec/Kconfig
@@ -18,3 +18,10 @@ config WILCO_EC_DEBUGFS
  manipulation and allow for testing arbitrary commands.  This
  interface is intended for debug only and will not be present
  on production devices.
+
+config WILCO_EC_TELEMETRY
+   tristate "Enable querying telemetry data from EC"
+   depends on WILCO_EC
+   help
+ If you say Y here, you get support to query opaque binary
+ telemetrydata from /dev/wilco_telem using write() and then read().
diff --git a/drivers/platform/chrome/wilco_ec/Makefile 
b/drivers/platform/chrome/wilco_ec/Makefile
index 063e7fb4ea17..b4aa6d26a3df 100644
--- a/drivers/platform/chrome/wilco_ec/Makefile
+++ b/drivers/platform/chrome/wilco_ec/Makefile
@@ -4,3 +4,5 @@ wilco_ec-objs   := core.o mailbox.o
 obj-$(CONFIG_WILCO_EC) += wilco_ec.o
 wilco_ec_debugfs-objs  := debugfs.o
 obj-$(CONFIG_WILCO_EC_DEBUGFS) += wilco_ec_debugfs.o
+wilco_ec_telem-objs:= telemetry.o
+obj-$(CONFIG_WILCO_EC_TELEMETRY)   += wilco_ec_telem.o
diff --git a/drivers/platform/chrome/wilco_ec/core.c 
b/drivers/platform/chrome/wilco_ec/core.c
index 05e1e2be1c91..517d9627fecc 100644
--- a/drivers/platform/chrome/wilco_ec/core.c
+++ b/drivers/platform/chrome/wilco_ec/core.c
@@ -89,8 +89,20 @@ static int wilco_ec_probe(struct platform_device *pdev)
goto unregister_debugfs;
}
 
+   /* Register child device that will be found by the telemetry driver. */
+   ec->telem_pdev = platform_device_register_data(dev, "wilco_telem",
+  PLATFORM_DEVID_AUTO,
+  NULL, 0);
+   if (IS_ERR(ec->telem_pdev)) {
+   dev_err(dev, "Failed to create telemetry platform device\n");
+   ret = PTR_ERR(ec->telem_pdev);
+   goto unregister_rtc;
+   }
+
return 0;
 
+unregister_rtc:
+   platform_device_unregister(ec->rtc_pdev);
 unregister_debugfs:
if (ec->debugfs_pdev)
platform_device_unregister(ec->debugfs_pdev);
@@ -102,6 +114,7 @@ static int wilco_ec_remove(struct platform_device *pdev)
 {
struct wilco_ec_device *ec = platform_get_drvdata(pdev);
 
+   platform_device_unregister(ec->telem_pdev);
platform_device_unregister(ec->rtc_pdev);
if (ec->debugfs_pdev)
platform_device_unregister(ec->debugfs_pdev);
diff --git a/drivers/platform/chrome/wilco_ec/telemetry.c 
b/drivers/platform/chrome/wilco_ec/telemetry.c
new file mode 100644
index ..6b3dc84bd0f2
--- /dev/null
+++ b/drivers/platform/chrome/wilco_ec/telemetry.c
@@ -0,0 +1,312 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Telemetry communication for Wilco EC
+ *
+ * Copyright 2019 Google LLC
+ *
+ * The Wilco Embedded Controller is able to send telemetry data
+ * which is useful for enterprise applications. A daemon running on
+ * the OS sends a command to the EC via a write() to a char device, and
+ * can read the response with a read() request. Both the request
+ * and the response are in an opaque binary format so that information
+ * which is proprietary to the enterprise service provider is secure.
+ *
+ * The 

Re: [PATCH 0/4] mm: memcontrol: memory.stat cost & correctness

2019-04-12 Thread Roman Gushchin
On Fri, Apr 12, 2019 at 11:15:03AM -0400, Johannes Weiner wrote:
> The cgroup memory.stat file holds recursive statistics for the entire
> subtree. The current implementation does this tree walk on-demand
> whenever the file is read. This is giving us problems in production.
> 
> 1. The cost of aggregating the statistics on-demand is high. A lot of
> system service cgroups are mostly idle and their stats don't change
> between reads, yet we always have to check them. There are also always
> some lazily-dying cgroups sitting around that are pinned by a handful
> of remaining page cache; the same applies to them.
> 
> In an application that periodically monitors memory.stat in our fleet,
> we have seen the aggregation consume up to 5% CPU time.
> 
> 2. When cgroups die and disappear from the cgroup tree, so do their
> accumulated vm events. The result is that the event counters at
> higher-level cgroups can go backwards and confuse some of our
> automation, let alone people looking at the graphs over time.
> 
> To address both issues, this patch series changes the stat
> implementation to spill counts upwards when the counters change.
> 
> The upward spilling is batched using the existing per-cpu cache. In a
> sparse file stress test with 5 level cgroup nesting, the additional
> cost of the flushing was negligible (a little under 1% of CPU at 100%
> CPU utilization, compared to the 5% of reading memory.stat during
> regular operation).
> 
>  include/linux/memcontrol.h |  96 +++---
>  mm/memcontrol.c| 290 +++
>  mm/vmscan.c|   4 +-
>  mm/workingset.c|   7 +-
>  4 files changed, 234 insertions(+), 163 deletions(-)
> 
> 

For the series:
Reviewed-by: Roman Gushchin 

Thanks!


Re: [PATCH v2 2/7] dt: bindings: Add multicolor class dt bindings documention

2019-04-12 Thread Dan Murphy
All

On 4/12/19 2:10 PM, Jacek Anaszewski wrote:
> Dan,
> 
> On 4/12/19 1:50 PM, Dan Murphy wrote:
>> Marek
>>
>> On 4/11/19 5:07 PM, Marek Behun wrote:
>>> Hi Dan,
>>> this probaly was discussed, but I did not follow brightness model
>>> discussions:
>>> what will happen if I set yellow by writing into yellow mode
>>> brightness, and then orange by writing orange model brightness?
>>> Will the resulting color be a mix of yellow and orange, or will the
>>> orange overwrite the yellow setting?
>>>
>>
>> This was not discussed and is a good question.  If you write the yellow mode 
>> for a group of
>> LEDs then yellow would be produced for the brightness requested.
>>
>> If orange is then requested then orange should be displayed at the 
>> brightness level requested.
>> So yes the orange will over write the yellow.
> 
> Yes, and individual color brightness levels should correspond
> to the color components of the brightness-model level currently set.
> 
>> The next question is if the absolute colors are written does it produce the 
>> same behavior?
>>
>> So if you have yellow and write to the red should the red LED brightness be 
>> modified or should the
>> color switch to red?
>> And if the red LED is on and the blue LED is written should the color switch 
>> to blue or should the blue and red LEDs be mixed together?
> 
> Now, if any of the color brightness files is altered it should update
> the hardware with this new setting, but brightness-model and main
> brightness level should not be changed. The thing that is missing in our
> proposal is lack of the way to check if brightness-model is up to date
> (i.e. if it reflects what is written to the hardware).
> 
> How about utilizing the sync file from the new colors directory?
> It could return 1 on read when brightness levels of all colors
> match exactly the ones assigned to the brightness model level currently
> set.
> 

There maybe a more pragmatic way to set the color brightness as opposed to 
defining each level. By defining the monochrome LEDs base maximum brightness 
for the brightness model and calculating a percentage for each LED from max   I 
shared this with Jacek earlier and he did not go for it but lets see if others 
agree or disagree.

With this method we don't have to define a slew of levels. This in theory may 
work but not sure it will work in practice.

For instance

lp5024_model_yellow: brightness-models {
led-brightness-model;
model@0 {
model_name = "yellow";
layout = ;
max_level = <255 247 196>;  //max level of the 
monochrome LEDs to obtain the color model.
};
};

Assumption made that a linear slope for each color model is expected.

Then in the code when a new brightness level is requested for the color the 
brightness of the monochrome LEDs would be based off the percentage from the 
max_brightness of the max in the max_level array

The hue is presented as just another color in the colors dir with brightness 
and max_brightness files.  So it "appears" as a single LED.

max_brightness = 255 This is the max of the LEDs in the array and this would 
also be presented to the user in the max_brightness of the hue. 

Red = (255/255)  = 100%
Green = (247/255) = 97%
Blue = (196/255) = 77%

brightness_val 128

Red = (128 * 100) / 100 = 128
Green = (128 * 97) / 100 = 124
Blue = (128 * 77) / 100 = 98

brightness_val 80

Red = (80 * 100) / 100 = 80
Green = (80 * 97) / 100 = 77
Blue = (80 * 77) / 100 = 61

brightness_val 10

Red = (10 * 100) / 100 = 10
Green = (10 * 97) / 100 = 9
Blue = (10 * 77) / 100 = 7

This method maintains a rough percentage delta between the LEDs.  Where in the 
MAX brightness case (255) blue is 77% dimmer in intensity then Red and with a 
brightness that is 1/4 max the delta percentage is 75% so the delta is within 
5% tolerance.

This would keep the directory structures clean and the user space would just 
need to know colors, brightness and max_brightness for each color.

Also with this simplistic model we can simplify the DT nodes to
lp5024_model_yellow: brightness-models {
led-brightness-model;
model@0 {
model_name = "yellow";
layout = ;
max_level = <255 247 196>;
};
};

lp5024_model_orange: brightness-models {
led-brightness-model;
model@1 {
model_name = "orange";
layout = ;
max_level = <236 140 16>;
};
}; 

For orange
max_brightness = 236 This is the max of the LEDs in the array and this would 
also be presented to the user in the max_brightness of the hue. 

Red = (236/236)  = 100%
Green = (140/236) = 59%
Blue = (16/236) = 6%



After all of this most likely someone will say this will not work.
Like I indicate

[PATCH] binder: take read mode of mmap_sem in binder_alloc_free_page()

2019-04-12 Thread Tyler Hicks
Restore the behavior of locking mmap_sem for reading in
binder_alloc_free_page(), as was first done in commit 3013bf62b67a
("binder: reduce mmap_sem write-side lock"). That change was
inadvertently reverted by commit 5cec2d2e5839 ("binder: fix race between
munmap() and direct reclaim").

In addition, change the name of the label for the error path to
accurately reflect that we're taking the lock for reading.

Backporting note: This fix is only needed when *both* of the commits
mentioned above are applied. That's an unlikely situation since they
both landed during the development of v5.1 but only one of them is
targeted for stable.

Fixes: 5cec2d2e5839 ("binder: fix race between munmap() and direct reclaim")
Signed-off-by: Tyler Hicks 
---

Hello - I was backporting 5cec2d2e5839 and didn't think it looked quite
right. The patch description doesn't specifically mention a need to acquire
mmap_sem for writing and the two commits (5cec2d2e5839 and 3013bf62b67a) landed
relatively close in time to each other so I have a feeling that a mistake could
have happened while resolving conflicts.

I also noticed that commit 3013bf62b67a was slightly incomplete because
it didn't rename the err_down_write_mmap_sem_failed label to
err_down_read_mmap_sem_failed.

I've tested this change by enabling CONFIG_ANDROID_BINDER_IPC_SELFTEST
and verified that the binderfs_test selftest passes.

Tyler

 drivers/android/binder_alloc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 195f120c4e8c..bb929eb87116 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -931,8 +931,8 @@ enum lru_status binder_alloc_free_page(struct list_head 
*item,
mm = alloc->vma_vm_mm;
if (!mmget_not_zero(mm))
goto err_mmget;
-   if (!down_write_trylock(&mm->mmap_sem))
-   goto err_down_write_mmap_sem_failed;
+   if (!down_read_trylock(&mm->mmap_sem))
+   goto err_down_read_mmap_sem_failed;
vma = binder_alloc_get_vma(alloc);
 
list_lru_isolate(lru, item);
@@ -945,7 +945,7 @@ enum lru_status binder_alloc_free_page(struct list_head 
*item,
 
trace_binder_unmap_user_end(alloc, index);
}
-   up_write(&mm->mmap_sem);
+   up_read(&mm->mmap_sem);
mmput(mm);
 
trace_binder_unmap_kernel_start(alloc, index);
@@ -959,7 +959,7 @@ enum lru_status binder_alloc_free_page(struct list_head 
*item,
mutex_unlock(&alloc->mutex);
return LRU_REMOVED_RETRY;
 
-err_down_write_mmap_sem_failed:
+err_down_read_mmap_sem_failed:
mmput_async(mm);
 err_mmget:
 err_page_already_freed:
-- 
2.7.4



[PATCH v1 1/1] Add support for IPMB driver

2019-04-12 Thread Asmaa Mnebhi
Support receiving IPMB requests on a Satellite MC from the BMC.
Once a response is ready, this driver will send back a response
to the BMC via the IPMB channel.

Signed-off-by: Asmaa Mnebhi 
---
 drivers/char/ipmi/Kconfig|   8 +
 drivers/char/ipmi/Makefile   |   1 +
 drivers/char/ipmi/ipmb_dev_int.c | 418 +++
 3 files changed, 427 insertions(+)
 create mode 100644 drivers/char/ipmi/ipmb_dev_int.c

diff --git a/drivers/char/ipmi/Kconfig b/drivers/char/ipmi/Kconfig
index 94719fc..784bc84 100644
--- a/drivers/char/ipmi/Kconfig
+++ b/drivers/char/ipmi/Kconfig
@@ -74,6 +74,14 @@ config IPMI_SSIF
 have a driver that must be accessed over an I2C bus instead of a
 standard interface.  This module requires I2C support.
 
+config IPMB_DEVICE_INTERFACE
+   tristate 'IPMB Interface handler'
+   select I2C
+   help
+ Provides a driver for a device (Satellite MC) to
+ receive requests and send responses back to the BMC via
+ the IPMB interface. This module requires I2C support.
+
 config IPMI_POWERNV
depends on PPC_POWERNV
tristate 'POWERNV (OPAL firmware) IPMI interface'
diff --git a/drivers/char/ipmi/Makefile b/drivers/char/ipmi/Makefile
index 3f06b20..0822adc 100644
--- a/drivers/char/ipmi/Makefile
+++ b/drivers/char/ipmi/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_IPMI_KCS_BMC) += kcs_bmc.o
 obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += bt-bmc.o
 obj-$(CONFIG_ASPEED_KCS_IPMI_BMC) += kcs_bmc_aspeed.o
 obj-$(CONFIG_NPCM7XX_KCS_IPMI_BMC) += kcs_bmc_npcm7xx.o
+obj-$(CONFIG_IPMB_DEVICE_INTERFACE) += ipmb_dev_int.o
diff --git a/drivers/char/ipmi/ipmb_dev_int.c b/drivers/char/ipmi/ipmb_dev_int.c
new file mode 100644
index 000..a45a686
--- /dev/null
+++ b/drivers/char/ipmi/ipmb_dev_int.c
@@ -0,0 +1,418 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Mellanox IPMB driver to receive a request and send a response
+ *
+ * Copyright (C) 2018 Mellanox Techologies, Ltd.
+ *
+ * This was inspired by Brendan Higgins' ipmi-bmc-bt-i2c driver.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#definePFX "IPMB DEV INT: "
+
+#defineDEVICE_NAME "ipmb-dev"
+
+#defineMAX_MSG_LEN 128
+#defineIPMB_REQUEST_LEN_MIN7
+#defineNETFN_RSP_BIT_MASK  0x4
+#defineREQUEST_QUEUE_MAX_LEN   256
+
+#defineIPMB_MSG_LEN_IDX0
+#defineRQ_SA_8BIT_IDX  1
+#defineNETFN_LUN_IDX   2
+
+#defineIPMB_MSG_PAYLOAD_LEN_MAX \
+   (MAX_MSG_LEN - IPMB_REQUEST_LEN_MIN - 1)
+
+struct ipmb_msg {
+   u8 len;
+   u8 rs_sa;
+   u8 netfn_rs_lun;
+   u8 checksum1;
+   u8 rq_sa;
+   u8 rq_seq_rq_lun;
+   u8 cmd;
+   u8 payload[IPMB_MSG_PAYLOAD_LEN_MAX];
+   /* checksum2 is included in payload */
+} __packed;
+
+static u32 ipmb_msg_len(struct ipmb_msg *ipmb_msg)
+{
+   return ipmb_msg->len + 1;
+}
+
+struct ipmb_request_elem {
+   struct list_head list;
+   struct ipmb_msg request;
+};
+
+struct ipmb_dev {
+   struct i2c_client *client;
+   struct miscdevice miscdev;
+   struct ipmb_msg request;
+   struct list_head request_queue;
+   atomic_t request_queue_len;
+   struct ipmb_msg response;
+   size_t msg_idx;
+   spinlock_t lock;
+   wait_queue_head_t wait_queue;
+   struct mutex file_mutex;
+};
+
+static int receive_ipmb_request(struct ipmb_dev *ipmb_dev_p,
+   bool non_blocking,
+   struct ipmb_msg *ipmb_request)
+{
+   struct ipmb_request_elem *queue_elem;
+   unsigned long flags;
+   int res;
+
+   if (!non_blocking) {
+
+try_again:
+   res = wait_event_interruptible(ipmb_dev_p->wait_queue,
+   atomic_read(&ipmb_dev_p->request_queue_len));
+   if (res)
+   return res;
+   }
+
+   spin_lock_irqsave(&ipmb_dev_p->lock, flags);
+
+   if (!atomic_read(&ipmb_dev_p->request_queue_len)) {
+   spin_unlock_irqrestore(&ipmb_dev_p->lock, flags);
+   if (non_blocking)
+   return -EAGAIN;
+   goto try_again;
+   }
+
+   if (list_empty(&ipmb_dev_p->request_queue)) {
+   pr_err(PFX "request_queue is empty\n");
+   return -EIO;
+   }
+
+   queue_elem = list_first_entry(&ipmb_dev_p->request_queue,
+ 

[PATCH v1 0/1] Add support for IPMB driver

2019-04-12 Thread Asmaa Mnebhi
Support receiving IPMB requests on a Satellite MC from the BMC.
Once a response is ready, this driver will send back a response
to the BMC via the IPMB channel.

Asmaa Mnebhi (1):
  Add support for IPMB driver

 drivers/char/ipmi/Kconfig|   8 +
 drivers/char/ipmi/Makefile   |   1 +
 drivers/char/ipmi/ipmb_dev_int.c | 418 +++
 3 files changed, 427 insertions(+)
 create mode 100644 drivers/char/ipmi/ipmb_dev_int.c

-- 
2.1.2



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-12 Thread Andy Lutomirski
On Thu, Apr 11, 2019 at 10:51 AM Joel Fernandes (Google)
 wrote:
>
> pidfd are /proc/pid directory file descriptors referring to a task group
> leader. Android low memory killer (LMK) needs pidfd polling support to
> replace code that currently checks for existence of /proc/pid for
> knowing a process that is signalled to be killed has died, which is both
> racy and slow. The pidfd poll approach is race-free, and also allows the
> LMK to do other things (such as by polling on other fds) while awaiting
> the process being killed to die.
>
> It prevents a situation where a PID is reused between when LMK sends a
> kill signal and checks for existence of the PID, since the wrong PID is
> now possibly checked for existence.
>
> In this patch, we follow the same mechanism used uhen the parent of the
> task group is to be notified, that is when the tasks waiting on a poll
> of pidfd are also awakened.
>
> We have decided to include the waitqueue in struct pid for the following
> reasons:
> 1. The wait queue has to survive for the lifetime of the poll. Including
> it in task_struct would not be option in this case because the task can
> be reaped and destroyed before the poll returns.

Are you sure?  I admit I'm not all that familiar with the innards of
poll() on Linux, but I thought that the waitqueue only had to survive
long enough to kick the polling thread and did *not* have to survive
until poll() actually returned.

>
> 2. By including the struct pid for the waitqueue means that during
> de_exec, the thread doing de_thread() automatically gets the new
> waitqueue/pid even though its task_struct is different.

I didn't follow this.  Can you clarify?

Also, please don't call your new helper wake_up_pidfd_pollers().  One
of the goals of my patch was to make it generically possible for
kernel code to wait for a task to exit.  There are other cases besides
pidfd for which this would be useful.  Ahem, kthread.  (The kthread
implementation currently does some seriously awful things to detect
when kthreads die.)  Also, some hypothetical future vastly improved
debugging API (to supercede ptrace for new applications) might want
this.

--Andy


Re: [PATCH v2 1/1] rtc: ds3232: get SRAM access using NVMEM Framework

2019-04-12 Thread Alexandre Belloni
Hi,

On 08/04/2019 08:52:39+, Han Nandor wrote:
>  static int ds3232_probe(struct device *dev, struct regmap *regmap, int irq,
>   const char *name)
>  {
>   struct ds3232 *ds3232;
>   int ret;
> + struct nvmem_config nvmem_cfg;
>  
>   ds3232 = devm_kzalloc(dev, sizeof(*ds3232), GFP_KERNEL);
>   if (!ds3232)
> @@ -476,6 +499,15 @@ static int ds3232_probe(struct device *dev, struct 
> regmap *regmap, int irq,
>   ds3232->dev = dev;
>   dev_set_drvdata(dev, ds3232);
>  
> + nvmem_cfg.name = "ds3232_sram";
> + nvmem_cfg.stride = 1;
> + nvmem_cfg.size = DS3232_REG_SRAM_SIZE;
> + nvmem_cfg.word_size = 1;
> + nvmem_cfg.reg_read = ds3232_nvmem_read;
> + nvmem_cfg.reg_write = ds3232_nvmem_write;
> + nvmem_cfg.priv = ds3232;
> + nvmem_cfg.type = NVMEM_TYPE_BATTERY_BACKED;
> +

You have to either initialize at the declaration as done here:
https://elixir.bootlin.com/linux/v5.0/source/drivers/rtc/rtc-rv8803.c#L530
or memset the struct to 0. Else, this leaves members uninitialized.


-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [PATCH v3 0/9] klp-convert livepatch build tooling

2019-04-12 Thread Joe Lawrence
On Wed, Apr 10, 2019 at 11:50:49AM -0400, Joe Lawrence wrote:
> Hi folks,
> 
> This is the third installment of the klp-convert tool for generating and
> processing livepatch symbols for livepatch module builds.  For those
> following along at home, archive links to previous versions:
> 
> RFC:
>   https://lore.kernel.org/lkml/cover.1477578530.git.jpoim...@redhat.com/
> v2:
>   
> https://lore.kernel.org/lkml/f52d29f7-7d1b-ad3d-050b-a9fa8878f...@redhat.com/
> 
> (Note that I don't see v2 archived at lore, but that is a link to the
> most recent subthread that lore did catch.)
> 
> 
> Livepatches may use symbols which are not contained in its own scope,
> and, because of that, may end up compiled with relocations that will
> only be resolved during module load. Yet, when the referenced symbols are
> not exported, solving this relocation requires information on the object
> that holds the symbol (either vmlinux or modules) and its position inside
> the object, as an object may contain multiple symbols with the same name.
> Providing such information must be done accordingly to what is specified
> in Documentation/livepatch/module-elf-format.txt.
> 
> Currently, there is no trivial way to embed the required information as
> requested in the final livepatch elf object. klp-convert solves this
> problem in two different forms: (i) by relying on a symbol map, which is
> built during kernel compilation, to automatically infer the relocation
> targeted symbol, and, when such inference is not possible (ii) by using
> annotations in the elf object to convert the relocation accordingly to
> the specification, enabling it to be handled by the livepatch loader.
> 
> Given the above, add support for symbol mapping in the form of
> Symbols.list file; add klp-convert tool; integrate klp-convert tool into
> kbuild; make livepatch modules discernible during kernel compilation
> pipeline; add data-structure and macros to enable users to annotate
> livepatch source code; make modpost stage compatible with livepatches;
> update livepatch-sample and update documentation.
> 
> The patch was tested under three use-cases:
> 
> use-case 1: There is a relocation in the lp that can be automatically
> resolved by klp-convert.  For example. see the saved_command_line
> variable in lib/livepatch/test_klp_convert2.c.
> 
> use-case 2: There is a relocation in the lp that cannot be automatically
> resolved, as the name of the respective symbol appears in multiple
> objects. The livepatch contains an annotation to enable a correct
> relocation.  See the KLP_MODULE_RELOC / KLP_SYMPOS annotation sections
> in lib/livepatch/test_klp_convert{1,2}.c.
> 
> use-case 3: There is a relocation in the lp that cannot be automatically
> resolved similarly as 2, but no annotation was provided in the
> livepatch, triggering an error during compilation.  Reproducible by
> removing the KLP_MODULE_RELOC / KLP_SYMPOS annotation sections in
> lib/livepatch/test_klp_convert{1,2}.c.
> 
> 
> Branches
> 
> 
> Since I spent some time tinkering with v2 and accumulated a bunch of
> fixes, I collected them up and am posting this new version.  For review
> purposes, I posted two branches up to github:
> 
>   1 - an expanded branch that with changes separate from the original
>   https://github.com/joe-lawrence/linux/commits/klp-convert-v3-expanded
> 
>   2 - a squashed branch of (1) that comprises v3:
>   https://github.com/joe-lawrence/linux/commits/klp-convert-v3
> 
> Non-trivial commits in the expanded branch have some extra commentary
> and details for debugging in the commit message that were dropped when
> squashing into their respective parent commits.
> 
> 
> TODO
> 
> 
> There are a few outstanding items that I came across while reviewing v2,
> but as changes started accumulating, it made sense to defer those to a
> later version.  I'll reply to this thread individual topics to start
> discussion sub-threads for those.
> 
> 
> Changelogs
> --
> 
> The commit changelogs were getting long, so I've moved them here for
> posterity and review purposes:
> 
> livepatch: Create and include UAPI headers
>   v2:
>   - jmoreira: split up and changelog
>   v3:
>   - joe: convert to SPDX license tags
> 
> kbuild: Support for Symbols.list creation
>   v3:
>   - jmoreira: adjust for multiobject livepatch
>   - joe: add klpclean to PHONY
>   - joe: align KLP prefix
> 
> livepatch: Add klp-convert tool
>   v2:
>   - khlebnikov: use HOSTLOADLIBES_ instead of HOSTLDFLAGS: -lelf must be
> at the end
>   - jmoreira: add support to automatic relocation conversion in
> klp-convert.c, changelog
>   v3:
>   - joe: convert to SPDX license tags
>   - jmoreira: add rela symbol name to WARNs
>   - jmoreira: ignore relocations to .TOC and symbols with index 0
>   - joe: separate and fix valid_sympos() sympos=0 and sympos=1..n checks
>   - joe: fix symbol use-after-frees
>   - joe: fix remaining valgrind leak complaints (non-error paths only)
>   - joe: checkpatc

[ANNOUNCE] v5.0.7-rt5

2019-04-12 Thread Sebastian Andrzej Siewior
Dear RT folks!

I'm pleased to announce the v5.0.7-rt5 patch set. 

Changes since v5.0.7-rt4:

  - Update "x86: load FPU registers on return to userland" from v7 to
v9.

  - Update "clocksource: improve Atmel TCB timer driver" from v7 to
latest post by Alexandre Belloni. I hope this works, my HW refuses
to cooperate so I can't verify.

  - Avoid allocating a spin lock with disabled interrupts in i915.

Known issues
 - A warning triggered in "rcu_note_context_switch" originated from
   SyS_timer_gettime(). The issue was always there, it is now
   visible. Reported by Grygorii Strashko and Daniel Wagner.

 - rcutorture is currently broken on -RT. Reported by Juri Lelli.

The delta patch against v5.0.7-rt4 is appended below and can be found here:
 
 
https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.0/incr/patch-5.0.7-rt4-rt5.patch.xz

You can get this release via the git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git 
v5.0.7-rt5

The RT patch against v5.0.7 can be found here:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.0/older/patch-5.0.7-rt5.patch.xz

The split quilt queue is available at:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.0/older/patches-5.0.7-rt5.tar.xz

Sebastian

diff --git a/arch/arm/configs/at91_dt_defconfig 
b/arch/arm/configs/at91_dt_defconfig
index f4b253bd05ede..c8876d0ca41a8 100644
--- a/arch/arm/configs/at91_dt_defconfig
+++ b/arch/arm/configs/at91_dt_defconfig
@@ -65,6 +65,7 @@ CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_COUNT=4
 CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_ATMEL_TCLIB=y
 CONFIG_ATMEL_SSC=y
 CONFIG_SCSI=y
 CONFIG_BLK_DEV_SD=y
diff --git a/arch/arm/configs/sama5_defconfig b/arch/arm/configs/sama5_defconfig
index c2dc35dfb3215..10ebc9481f72c 100644
--- a/arch/arm/configs/sama5_defconfig
+++ b/arch/arm/configs/sama5_defconfig
@@ -76,6 +76,7 @@ CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_COUNT=4
 CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_ATMEL_TCLIB=y
 CONFIG_ATMEL_SSC=y
 CONFIG_EEPROM_AT24=y
 CONFIG_SCSI=y
diff --git a/arch/arm/mach-at91/Kconfig b/arch/arm/mach-at91/Kconfig
index fa493a86e2bb3..da1d97a06c53a 100644
--- a/arch/arm/mach-at91/Kconfig
+++ b/arch/arm/mach-at91/Kconfig
@@ -121,10 +121,8 @@ config ATMEL_CLOCKSOURCE_PIT
 
 config ATMEL_CLOCKSOURCE_TCB
bool "Timer Counter Blocks (TCB) support"
-   depends on SOC_AT91RM9200 || SOC_AT91SAM9 || SOC_SAMA5 || COMPILE_TEST
default SOC_AT91RM9200 || SOC_AT91SAM9 || SOC_SAMA5
-   depends on !ATMEL_TCLIB
-   select ATMEL_ARM_TCB_CLKSRC
+   select ATMEL_TCB_CLKSRC
help
  Select this to get a high precision clocksource based on a
  TC block with a 5+ MHz base clock rate.
diff --git a/arch/x86/include/asm/fpu/internal.h 
b/arch/x86/include/asm/fpu/internal.h
index 749ee389a1178..33e2294b5a675 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -120,7 +121,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
err;\
 })
 
-#define kernel_insn_norestore(insn, output, input...)  \
+#define kernel_insn_err(insn, output, input...)
\
 ({ \
int err;\
asm volatile("1:" #insn "\n\t"  \
@@ -141,6 +142,22 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)  \
 : output : input)
 
+static inline int copy_fregs_to_user(struct fregs_state __user *fx)
+{
+   return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
+}
+
+static inline int copy_fxregs_to_user(struct fxregs_state __user *fx)
+{
+   if (IS_ENABLED(CONFIG_X86_32))
+   return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
+   else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
+   return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
+
+   /* See comment in copy_fxregs_to_kernel() below. */
+   return user_insn(rex64/fxsave (%[fx]), "=m" (*fx), [fx] "R" (fx));
+}
+
 static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
 {
if (IS_ENABLED(CONFIG_X86_32)) {
@@ -155,15 +172,23 @@ static inline void copy_kernel_to_fxregs(struct 
fxregs_state *fx)
}
 }
 
-static inline int copy_users_to_fxregs(struct fxregs_state *fx)
+static inline int copy_kernel_to_fxregs_err(struct fxregs_state *fx)
 {
if (IS_ENABLED(CONFIG_X86_32))
-   return kernel_insn_norestore(fxrstor %[fx], "=m" (*fx), [fx] 
"m" (*fx));
+   return kernel_insn_err(fxrstor %[fx], "=m" (*fx), [fx] "m" 
(*fx));
+   else
+ 

[PATCH v4] chrome/platform: cros_ec_proto: Add trace event to trace EC commands

2019-04-12 Thread Raul E Rangel
This is useful to see which EC commands are being executed and when.

To enable:

echo 'cros_ec:*' >> /sys/kernel/debug/tracing/set_event

Example:

/* cros_ec_cmd: version: 0, command: EC_CMD_GET_VERSION */
/* cros_ec_cmd: version: 0, command: EC_CMD_GET_PROTOCOL_INFO */
/* cros_ec_cmd: version: 1, command: EC_CMD_GET_CMD_VERSIONS */
/* cros_ec_cmd: version: 1, command: EC_CMD_USB_PD_CONTROL */

Signed-off-by: Raul E Rangel 
Reviewed-by: Ross Zwisler 
---

Changes in v4:
- Use the full command name so go to definition continues to work in my
editor.

Changes in v3:
- Use a macro to avoid duplicating the ec command names.

Changes in v2:
- Changed comment style to match other cros_ec files.
- Fixed commit tag.

 drivers/platform/chrome/Makefile|   4 +-
 drivers/platform/chrome/cros_ec_proto.c |   4 +
 drivers/platform/chrome/cros_ec_trace.c | 162 
 drivers/platform/chrome/cros_ec_trace.h |  51 
 4 files changed, 220 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/chrome/cros_ec_trace.c
 create mode 100644 drivers/platform/chrome/cros_ec_trace.h

diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile
index 1e2f0029b597..e542268454a4 100644
--- a/drivers/platform/chrome/Makefile
+++ b/drivers/platform/chrome/Makefile
@@ -3,12 +3,14 @@
 obj-$(CONFIG_CHROMEOS_LAPTOP)  += chromeos_laptop.o
 obj-$(CONFIG_CHROMEOS_PSTORE)  += chromeos_pstore.o
 obj-$(CONFIG_CHROMEOS_TBMC)+= chromeos_tbmc.o
+# tell define_trace.h where to find the cros ec trace header
+CFLAGS_cros_ec_trace.o:=   -I$(src)
 obj-$(CONFIG_CROS_EC_I2C)  += cros_ec_i2c.o
 obj-$(CONFIG_CROS_EC_SPI)  += cros_ec_spi.o
 cros_ec_lpcs-objs  := cros_ec_lpc.o cros_ec_lpc_reg.o
 cros_ec_lpcs-$(CONFIG_CROS_EC_LPC_MEC) += cros_ec_lpc_mec.o
 obj-$(CONFIG_CROS_EC_LPC)  += cros_ec_lpcs.o
-obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o
+obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o cros_ec_trace.o
 obj-$(CONFIG_CROS_KBD_LED_BACKLIGHT)   += cros_kbd_led_backlight.o
 obj-$(CONFIG_CROS_EC_LIGHTBAR) += cros_ec_lightbar.o
 obj-$(CONFIG_CROS_EC_VBC)  += cros_ec_vbc.o
diff --git a/drivers/platform/chrome/cros_ec_proto.c 
b/drivers/platform/chrome/cros_ec_proto.c
index 97a068dff192..3d02c8259ac6 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include "cros_ec_trace.h"
+
 #define EC_COMMAND_RETRIES 50
 
 static int prepare_packet(struct cros_ec_device *ec_dev,
@@ -51,6 +53,8 @@ static int send_command(struct cros_ec_device *ec_dev,
int ret;
int (*xfer_fxn)(struct cros_ec_device *ec, struct cros_ec_command *msg);
 
+   trace_cros_ec_cmd(msg);
+
if (ec_dev->proto_version > 2)
xfer_fxn = ec_dev->pkt_xfer;
else
diff --git a/drivers/platform/chrome/cros_ec_trace.c 
b/drivers/platform/chrome/cros_ec_trace.c
new file mode 100644
index ..21c497e7a2d1
--- /dev/null
+++ b/drivers/platform/chrome/cros_ec_trace.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+// Trace events for the ChromeOS Embedded Controller
+//
+// Copyright 2019 Google LLC.
+
+#define TRACE_SYM(a) {a, #a}
+
+#define ec_cmds \
+   TRACE_SYM(EC_CMD_PROTO_VERSION), \
+   TRACE_SYM(EC_CMD_HELLO), \
+   TRACE_SYM(EC_CMD_GET_VERSION), \
+   TRACE_SYM(EC_CMD_READ_TEST), \
+   TRACE_SYM(EC_CMD_GET_BUILD_INFO), \
+   TRACE_SYM(EC_CMD_GET_CHIP_INFO), \
+   TRACE_SYM(EC_CMD_GET_BOARD_VERSION), \
+   TRACE_SYM(EC_CMD_READ_MEMMAP), \
+   TRACE_SYM(EC_CMD_GET_CMD_VERSIONS), \
+   TRACE_SYM(EC_CMD_GET_COMMS_STATUS), \
+   TRACE_SYM(EC_CMD_TEST_PROTOCOL), \
+   TRACE_SYM(EC_CMD_GET_PROTOCOL_INFO), \
+   TRACE_SYM(EC_CMD_GSV_PAUSE_IN_S5), \
+   TRACE_SYM(EC_CMD_GET_FEATURES), \
+   TRACE_SYM(EC_CMD_GET_SKU_ID), \
+   TRACE_SYM(EC_CMD_SET_SKU_ID), \
+   TRACE_SYM(EC_CMD_FLASH_INFO), \
+   TRACE_SYM(EC_CMD_FLASH_READ), \
+   TRACE_SYM(EC_CMD_FLASH_WRITE), \
+   TRACE_SYM(EC_CMD_FLASH_ERASE), \
+   TRACE_SYM(EC_CMD_FLASH_PROTECT), \
+   TRACE_SYM(EC_CMD_FLASH_REGION_INFO), \
+   TRACE_SYM(EC_CMD_VBNV_CONTEXT), \
+   TRACE_SYM(EC_CMD_FLASH_SPI_INFO), \
+   TRACE_SYM(EC_CMD_FLASH_SELECT), \
+   TRACE_SYM(EC_CMD_PWM_GET_FAN_TARGET_RPM), \
+   TRACE_SYM(EC_CMD_PWM_SET_FAN_TARGET_RPM), \
+   TRACE_SYM(EC_CMD_PWM_GET_KEYBOARD_BACKLIGHT), \
+   TRACE_SYM(EC_CMD_PWM_SET_KEYBOARD_BACKLIGHT), \
+   TRACE_SYM(EC_CMD_PWM_SET_FAN_DUTY), \
+   TRACE_SYM(EC_CMD_PWM_SET_DUTY), \
+   TRACE_SYM(EC_CMD_PWM_GET_DUTY), \
+   TRACE_SYM(EC_CMD_LIGHTBAR_CMD), \
+   TRACE_SYM(EC_CMD_LED_CONTROL), \
+   TRACE_SYM(EC_CMD_VBOOT_HASH), \
+   TRACE_SYM(EC_CMD_MOTION_SENSE_CMD), \
+   TRACE_SYM(EC_CMD_FORCE_LID_OPE

Re: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

2019-04-12 Thread Andy Lutomirski
On Thu, Apr 11, 2019 at 10:36 PM Reshetova, Elena
 wrote:
>
> > On Wed, Apr 10, 2019 at 3:24 AM Reshetova, Elena
> >  wrote:
> > >
> > >
> > > > > > On Mon, Apr 08, 2019 at 09:13:58AM +0300, Elena Reshetova wrote:
> > > > > > > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > > > > > > index 7bc105f47d21..38ddc213a5e9 100644
> > > > > > > --- a/arch/x86/entry/common.c
> > > > > > > +++ b/arch/x86/entry/common.c
> > > > > > > @@ -35,6 +35,12 @@
> > > > > > >  #define CREATE_TRACE_POINTS
> > > > > > >  #include 
> > > > > > >
> > > > > > > +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
> > > > > > > +#include 
> > > > > > > +
> > > > > > > +void *alloca(size_t size);
> > > > > > > +#endif
> > > > > > > +
> > > > > > >  #ifdef CONFIG_CONTEXT_TRACKING
> > > > > > >  /* Called on entry from user mode with IRQs off. */
> > > > > > >  __visible inline void enter_from_user_mode(void)
> > > > > > > @@ -273,6 +279,13 @@ __visible void do_syscall_64(unsigned long 
> > > > > > > nr,
> > struct
> > > > > pt_regs *regs)
> > > > > > >  {
> > > > > > > struct thread_info *ti;
> > > > > > >
> > > > > > > +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
> > > > > > > +   size_t offset = ((size_t)prandom_u32()) % 256;
> > > > > > > +   char *ptr = alloca(offset);
> > > > > > > +
> > > > > > > +   asm volatile("":"=m"(*ptr));
> > > > > > > +#endif
> > > > > > > +
> > > > > > > enter_from_user_mode();
> > > > > > > local_irq_enable();
> > > > > > > ti = current_thread_info();
> > > > > >
> > > > > > Would it make sense to also do this for the compat syscalls
> > > > > > (do_fast_syscall_32, do_int80_syscall_32)?
> > > > >
> > > > > Could someone please include the full patch, with justification and
> > > > > performance impact analysis etc.? Can only find the code part of the
> > > > > thread on lkml, which leaves out this context.
> > > > >
> > > >
> > > > Sorry, this is very weird, I cannot find it either from lkml, but it 
> > > > was sent there
> > > > to begin with (and as visible from reply-to headers).
> > > >
> > > > Do you want me to resent original version or with "do_fast_syscall_32,
> > > > do_int80_syscall_32" additions (I am finishing testing them now).
> > >
> > > I will resend the original x86_64 now since this is the one I tested and
> > > measured properly. The 32 bit changes seem to work fine inside my 32 bit 
> > > VM,
> > > but since I don't have any real 32 bit HW, I am hesitant to send them out 
> > > without
> > > real HW testing and measuring.
> > >
> > > This is the asm code for 32 bits (note it requires __builtin_alloca 
> > > definition and not
> > just alloca,
> > > so I will change the 64 bit version to use it also):
> > >
> > > #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
> > > size_t offset = ((size_t)prandom_u32()) % 256;
> > > 0xc10025b6 call   0xc146f7d0 
> > > 0xc10025bb movzbl %al,%eax
> > > char *ptr = __builtin_alloca(offset);
> > > 0xc10025be add$0x12,%eax
> > > 0xc10025c1 and$0x1fc,%eax
> > > 0xc10025c6 sub%eax,%esp
> > > 0xc10025c8 lea0x27(%esp),%eax
> > > 0xc10025cc and$0xfff0,%eax
> > >
> > > Also, the result is 47 different random offsets produced,
> > > which is slightly better than 33 offsets for x86_64.
> > >
> >
> > I would suggest that you macro-ify this thing:
> >
> > #ifdef WHATEVER
> > #define add_random_stack_offset() do { void *addr = ... } while (0)
> > #else
> > #define add_random_stack_offset() do {} while (0)
> > #endif
> >
> > since you'll end up with more than one call site.
>
> Sure, will do. So, you are ok for this to be also called from 
> do_fast_syscall_32
> and do_int80_syscall_32? I can send the resulting patch, just cannot test on 
> any
> real 32 bit HW, only VM.
>

Sounds good to me.


Re: [PATCH 15/17] fpga: dfl: fme: add power management support

2019-04-12 Thread Moritz Fischer
Hi Hao,

this looks suspiciously like a hwmon driver ;-)

https://www.kernel.org/doc/Documentation/hwmon/hwmon-kernel-api.txt

Cheers,
Moritz


On Thu, Apr 11, 2019 at 1:08 PM Alan Tull  wrote:
>
> On Sun, Mar 24, 2019 at 10:24 PM Wu Hao  wrote:
>
> Hi Hao,
>
> >
> > This patch adds support for power management private feature under
> > FPGA Management Engine (FME), sysfs interfaces are introduced for
> > different power management functions, users could use these sysfs
> > interface to get current number of consumed power, throttling
>
> How about
> s/number/measurement/
> ?
>
> > thresholds, threshold status and other information, and configure
> > different value for throttling thresholds too.
> >
> > Signed-off-by: Luwei Kang 
> > Signed-off-by: Xu Yilun 
> > Signed-off-by: Wu Hao 
> > ---
> >  Documentation/ABI/testing/sysfs-platform-dfl-fme |  56 +
> >  drivers/fpga/dfl-fme-main.c  | 257 
> > +++
> >  2 files changed, 313 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme 
> > b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > index d3aeb88..4b6448f 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > @@ -100,3 +100,59 @@ Description:   Read-only. Read this file to get 
> > the policy of temperature
> > threshold1. It only supports two value (policy):
> > 0 - AP2 state (90% throttling)
> > 1 - AP1 state (50% throttling)
> > +
> > +What:  /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed
> > +Date:  March 2019
> > +KernelVersion:  5.2
> > +Contact:   Wu Hao 
> > +Description:   Read-only. It returns current power consumed by FPGA.
>
> What are the units?
>
> > +
> > +What:  /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1
> > +Date:  March 2019
> > +KernelVersion:  5.2
> > +Contact:   Wu Hao 
> > +Description:   Read-Write. Read/Write this file to get/set current power
> > +   threshold1 in Watts.
>
> Perhaps document error codes here and for threshold2 below.
>
> > +
> > +What:  /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2
> > +Date:  March 2019
> > +KernelVersion:  5.2
> > +Contact:   Wu Hao 
> > +Description:   Read-Write. Read/Write this file to get/set current power
> > +   threshold2 in Watts.
> > +
> > +What:  
> > /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status
> > +Date:  March 2019
> > +KernelVersion:  5.2
> > +Contact:   Wu Hao 
> > +Description:   Read-only. It returns 1 if power consumption reaches the
> > +   threshold1, otherwise 0.
>
> I'm used to things like this requiring user to reset the status, so it
> may be worth making it explicit that it will return to zero if
> consumption drops below threshold if that's what's happening here.
> If it's correct, perhaps could just say something like 'returns 1 if
> power consumption is currently at or above threshold1, otherwise 0'
>
> > +
> > +What:  
> > /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status
> > +Date:  March 2019
> > +KernelVersion:  5.2
> > +Contact:   Wu Hao 
> > +Description:   Read-only. It returns 1 if power consumption reaches the
> > +   threshold2, otherwise 0.
>
> Same here.
>
> > +
> > +What:  /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr
> > +Date:  March 2019
> > +KernelVersion:  5.2
> > +Contact:   Wu Hao 
> > +Description:   Read-only. Read this file to get current Latency Tolerance
> > +   Reporting (ltr) value, it's only valid for integrated
> > +   solution as it blocks CPU on low power state.
>
> If we're not on the integrated solution, it returns a value but it is
> not really real?
>
> > +
> > +What:  /sys/bus/platform/devices/dfl-fme.0/power_mgmt/xeon_limit
> > +Date:  March 2019
> > +KernelVersion:  5.2
> > +Contact:   Wu Hao 
> > +Description:   Read-only. Read this file to get power limit for xeon, it
> > +   is only valid for integrated solution.
> > +
> > +What:  /sys/bus/platform/devices/dfl-fme.0/power_mgmt/fpga_limit
> > +Date:  March 2019
> > +KernelVersion:  5.2
> > +Contact:   Wu Hao 
> > +Description:   Read-only. Read this file to get power limit for fpga, it
> > +   is only valid for integrated solution.
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index 449a17d..dafa6580 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -415,6 +415,259 @@ static const struct dfl_feature_ops 
> > fme_thermal_mgmt_ops = {
> > .uinit = fme_thermal_mgmt_uinit,
> >  };
> >
> > +#define FME_PWR_STATUS 0x8
> > +#define FME_LATENCY_TOLERANCE  BIT_ULL(18)
> > +#define PWR_CONSUMED   GENMASK_ULL

Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing

2019-04-12 Thread Matthew Wilcox
On Thu, Apr 11, 2019 at 10:47:50AM -0700, Daniel Colascione wrote:
> On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox  wrote:
> > It's not a question of the kernel deciding what the right signal is.
> > The kernel knows whether a signal is fatal to a particular process or not.
> > The question is whether the killing process should do the work of reaping
> > the dying process's resources sometimes, always or never.  Currently,
> > that is never (the process reaps its own resources); Suren is suggesting
> > sometimes, and I'm asking "Why not always?"
> 
> FWIW, Suren's initial proposal is that the oom_reaper kthread do the
> reaping, not the process sending the kill. Are you suggesting that
> sending SIGKILL should spend a while in signal delivery reaping pages
> before returning? I thought about just doing it this way, but I didn't
> like the idea: it'd slow down mass-killing programs like killall(1).
> Programs expect sending SIGKILL to be a fast operation that returns
> immediately.

Suren said that the implementation in this proposal wasn't to be taken
literally.

You've raised some good points here though.  Do mass-killing programs
like kill with a pgid or killall expect the signal-sending process to
be fast, or do they not really care?

>From the kernel's point of view, the same work has to be done, no matter
whether the killer or the victim does it.  Should the work be accounted
to the killer or the victim?  It probably doesn't matter.  The victim
doing the work allows for parallelisation, but I'm not convinced that's
important either.

I see another advantage for the killer doing the work -- if the task
is currently blocking on I/O (eg to an NFS mount that has gone away),
the killer can get rid of the task's page tables.  If we have to wait
for the I/O to complete before the victim reaps its own page tables,
we may be waiting forever.


Re: [PATCH v2 2/2] PCI: Clean up resource_alignment parameter to not require static buffer

2019-04-12 Thread Logan Gunthorpe



On 2019-04-12 2:44 p.m., Bjorn Helgaas wrote:
> On Wed, Apr 10, 2019 at 03:05:32PM -0600, Logan Gunthorpe wrote:
>> Clean up the 'resource_alignment' parameter code to use kstrdup
>> in the initcall routine instead of a static buffer that wastes memory
>> regardless of whether the feature is used. This allows us to drop
>> 'COMMAND_LINE_SIZE' bytes (typically 256-4096 depending on architecture)
>> of static data.
>>
>> This is similar to what has been done for the 'disable_acs_redir'
>> parameter.
>>
>> This conversion also allows us to use RCU instead of the spinlock to
>> deal with the concurrency issue which further reduces memory usage.
> 
> I'm unconvinced about this part.  Spinlocks are CS 101 material and
> I'm a little hesitant to use a graduate-level technique like RCU in a
> case where it doesn't really buy us much -- we don't need the
> performance advantage and the size advantage seems minimal.  But I'm
> an RCU ignoramus and maybe need to be educated.

That's a reasonable point. I didn't think it was that difficult and the
kernel's RCU API is pretty straightforward. But I can resubmit later
keeping the spinlock. You're right that it's not that big of a gain.

>> As part of the clean up we also squash pci_get_resource_alignment_param()
>> into resource_alignment_show() and pci_set_resource_alignment_param()
>> into resource_alignment_store() seeing these functions only had one
>> caller and the show/store wrappers were needlessly thin.
> 
> Squashing makes sense and would be nice as a separate patch.

Ok, will do.

Logan


Re: [PATCH v2 10/11] platform/x86: asus-wmi: Switch fan boost mode

2019-04-12 Thread Yurii Pavlovskyi
On 12.04.19 10:03, Daniel Drake wrote:
> On Thu, Apr 11, 2019 at 1:47 PM Yurii Pavlovskyi
>  wrote:
>> * 0x00 - is normal,
>> * 0x01 - is obviously turbo by the amount of noise, might be useful to
>> avoid CPU frequency throttling on high load,
>> * 0x02 - the meaning is unknown at the time as modes are not named
>> in the vendor documentation, but it does look like a quiet mode as CPU
>> temperature does increase about 10 degrees on maximum load.
> 
> I'm curious which vendor documentation you're working with here.

That would be user manual for FX505 series, which is pretty minimalistic.
It says it has a fan mode switch key and that's it. Following up on your
comment, I've searched more and actually did found their names hidden in
marketing web page on the website. You're right, they call them balanced,
overboost and silent there.

> From the spec,
> 0 = normal
> 1 = overboost
> 2 = silent

> Also you can use DSTS on the 0x00110018 device to check the exact
> capabilities supported, which you should use to refine which modes can
> be cycled through.
> Bit 0 = overboost supported
> Bit 1 = silent supported
> 
> Thanks
> Daniel
> 

Thanks! I was guessing that the 3 in DSTS must've meant something.

Appreciate your comments! Will definitely implement them. I'm going to post
the v3 patch series approximately middle of next week.

Best regards,
Yurii


Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty

2019-04-12 Thread Shakeel Butt
On Fri, Apr 12, 2019 at 1:16 PM Roman Gushchin  wrote:
>
> On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote:
> > On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner  wrote:
> > >
> > > Right now, when somebody needs to know the recursive memory statistics
> > > and events of a cgroup subtree, they need to walk the entire subtree
> > > and sum up the counters manually.
> > >
> > > There are two issues with this:
> > >
> > > 1. When a cgroup gets deleted, its stats are lost. The state counters
> > > should all be 0 at that point, of course, but the events are not. When
> > > this happens, the event counters, which are supposed to be monotonic,
> > > can go backwards in the parent cgroups.
> > >
> >
> > We also faced this exact same issue as well and had the similar solution.
> >
> > > 2. During regular operation, we always have a certain number of lazily
> > > freed cgroups sitting around that have been deleted, have no tasks,
> > > but have a few cache pages remaining. These groups' statistics do not
> > > change until we eventually hit memory pressure, but somebody watching,
> > > say, memory.stat on an ancestor has to iterate those every time.
> > >
> > > This patch addresses both issues by introducing recursive counters at
> > > each level that are propagated from the write side when stats change.
> > >
> > > Upward propagation happens when the per-cpu caches spill over into the
> > > local atomic counter. This is the same thing we do during charge and
> > > uncharge, except that the latter uses atomic RMWs, which are more
> > > expensive; stat changes happen at around the same rate. In a sparse
> > > file test (page faults and reclaim at maximum CPU speed) with 5 cgroup
> > > nesting levels, perf shows __mod_memcg_page state at ~1%.
> > >
> >
> > (Unrelated to this patchset) I think there should also a way to get
> > the exact memcg stats. As the machines are getting bigger (more cpus
> > and larger basic page size) the accuracy of stats are getting worse.
> > Internally we have an additional interface memory.stat_exact for that.
> > However I am not sure in the upstream kernel will an additional
> > interface is better or something like /proc/sys/vm/stat_refresh which
> > sync all per-cpu stats.
>
> I was thinking about eventually consistent counters: sync them periodically
> from a worker thread. It should keep the cost of reading small, but
> should increase the accuracy. Will it work for you?

Worker thread based solution seems fine to me but Johannes said it
would be best to not traverse the whole tree every few seconds.


Re: [PATCH v2 2/2] PCI: Clean up resource_alignment parameter to not require static buffer

2019-04-12 Thread Bjorn Helgaas
On Wed, Apr 10, 2019 at 03:05:32PM -0600, Logan Gunthorpe wrote:
> Clean up the 'resource_alignment' parameter code to use kstrdup
> in the initcall routine instead of a static buffer that wastes memory
> regardless of whether the feature is used. This allows us to drop
> 'COMMAND_LINE_SIZE' bytes (typically 256-4096 depending on architecture)
> of static data.
> 
> This is similar to what has been done for the 'disable_acs_redir'
> parameter.
> 
> This conversion also allows us to use RCU instead of the spinlock to
> deal with the concurrency issue which further reduces memory usage.

I'm unconvinced about this part.  Spinlocks are CS 101 material and
I'm a little hesitant to use a graduate-level technique like RCU in a
case where it doesn't really buy us much -- we don't need the
performance advantage and the size advantage seems minimal.  But I'm
an RCU ignoramus and maybe need to be educated.

> As part of the clean up we also squash pci_get_resource_alignment_param()
> into resource_alignment_show() and pci_set_resource_alignment_param()
> into resource_alignment_store() seeing these functions only had one
> caller and the show/store wrappers were needlessly thin.

Squashing makes sense and would be nice as a separate patch.

> Signed-off-by: Logan Gunthorpe 
> Cc: Bjorn Helgaas 
> ---
>  drivers/pci/pci.c | 89 ---
>  1 file changed, 53 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 766f5779db92..13767c2409ae 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5896,9 +5896,7 @@ resource_size_t __weak pcibios_default_alignment(void)
>   return 0;
>  }
>  
> -#define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
> -static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
> -static DEFINE_SPINLOCK(resource_alignment_lock);
> +static const char __rcu *resource_alignment_param;
>  
>  /**
>   * pci_specified_resource_alignment - get resource alignment specified by 
> user.
> @@ -5916,9 +5914,9 @@ static resource_size_t 
> pci_specified_resource_alignment(struct pci_dev *dev,
>   const char *p;
>   int ret;
>  
> - spin_lock(&resource_alignment_lock);
> - p = resource_alignment_param;
> - if (!*p && !align)
> + rcu_read_lock();
> + p = rcu_dereference(resource_alignment_param);
> + if (!p)
>   goto out;
>   if (pci_has_flag(PCI_PROBE_ONLY)) {
>   align = 0;
> @@ -5956,7 +5954,7 @@ static resource_size_t 
> pci_specified_resource_alignment(struct pci_dev *dev,
>   p++;
>   }
>  out:
> - spin_unlock(&resource_alignment_lock);
> + rcu_read_unlock();
>   return align;
>  }
>  
> @@ -6082,35 +6080,48 @@ void pci_reassigndev_resource_alignment(struct 
> pci_dev *dev)
>   }
>  }
>  
> -static ssize_t pci_set_resource_alignment_param(const char *buf, size_t 
> count)
> +static ssize_t resource_alignment_show(struct bus_type *bus, char *buf)
>  {
> - if (count > RESOURCE_ALIGNMENT_PARAM_SIZE - 1)
> - count = RESOURCE_ALIGNMENT_PARAM_SIZE - 1;
> - spin_lock(&resource_alignment_lock);
> - strncpy(resource_alignment_param, buf, count);
> - resource_alignment_param[count] = '\0';
> - spin_unlock(&resource_alignment_lock);
> - return count;
> -}
> + const char *p;
> + size_t count = 0;
>  
> -static ssize_t pci_get_resource_alignment_param(char *buf, size_t size)
> -{
> - size_t count;
> - spin_lock(&resource_alignment_lock);
> - count = snprintf(buf, size, "%s", resource_alignment_param);
> - spin_unlock(&resource_alignment_lock);
> - return count;
> -}
> + rcu_read_lock();
> + p = rcu_dereference(resource_alignment_param);
> + if (!p)
> + goto out;
>  
> -static ssize_t resource_alignment_show(struct bus_type *bus, char *buf)
> -{
> - return pci_get_resource_alignment_param(buf, PAGE_SIZE);
> + count = snprintf(buf, PAGE_SIZE, "%s", p);
> +
> + /*
> +  * When set by the command line there will not be a
> +  * line feed, which is ugly. So conditionally add it here.
> +  */
> + if (buf[count - 2] != '\n' && count < PAGE_SIZE - 1) {
> + buf[count - 1] = '\n';
> + buf[count++] = 0;
> + }
> +
> +out:
> + rcu_read_unlock();
> +
> + return count;
>  }
>  
>  static ssize_t resource_alignment_store(struct bus_type *bus,
>   const char *buf, size_t count)
>  {
> - return pci_set_resource_alignment_param(buf, count);
> + const char *p, *old;
> +
> + p = kstrndup(buf, count, GFP_KERNEL);
> + if (!p)
> + return -ENOMEM;
> +
> + old = rcu_dereference_protected(resource_alignment_param, 1);
> + rcu_assign_pointer(resource_alignment_param, p);
> + synchronize_rcu();
> + kfree(old);
> +
> + return count;
>  }
>  
>  static BUS_ATTR_RW(resource_alignment);
> @@ -6238,8 +6249,8 @@ static 

Re: [PATCH 0/14] v2 multi-die/package topology support

2019-04-12 Thread Brice Goglin
Le 12/04/2019 à 21:52, Len Brown a écrit :
 I think I prefer 's/threads/cpus/g' on that. Threads makes me think SMT,
 and I don't think there's any guarantee the part in question will have
 SMT on.
>>> I think 'threads' is a bit confusing as well. We seem to be using 'cpu'
>>> everywhere for something we can schedule tasks on, including the sysfs
>>> /sys/devices/system/cpu/ subdirs for each SMT thread on SMT systems.
> I agree with Peter and Morten.
> "cpu" is more clear and consistent than "thread" here.
> I'll spin the series with that string changed.


Agreed, I should have used that suffix from the beginning.

Brice




Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty

2019-04-12 Thread Shakeel Butt
On Fri, Apr 12, 2019 at 1:10 PM Johannes Weiner  wrote:
>
> On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote:
> > We also faced this exact same issue as well and had the similar solution.
> >
> > > Signed-off-by: Johannes Weiner 
> >
> > Reviewed-by: Shakeel Butt 
>
> Thanks for the review!
>
> > (Unrelated to this patchset) I think there should also a way to get
> > the exact memcg stats. As the machines are getting bigger (more cpus
> > and larger basic page size) the accuracy of stats are getting worse.
> > Internally we have an additional interface memory.stat_exact for that.
> > However I am not sure in the upstream kernel will an additional
> > interface is better or something like /proc/sys/vm/stat_refresh which
> > sync all per-cpu stats.
>
> I was talking to Roman about this earlier as well and he mentioned it
> would be nice to have periodic flushing of the per-cpu caches. The
> global vmstat has something similar. We might be able to hook into
> those workers, but it would likely require some smarts so we don't
> walk the entire cgroup tree every couple of seconds.
>
> We haven't had any actual problems with the per-cpu fuzziness, mainly
> because the cgroups of interest also grow in size as the machines get
> bigger, and so the relative error doesn't increase.
>

Yes, this is very machine size dependent. We see this issue more often
on larger machines.

> Are your requirements that the error dissipates over time (waiting for
> a threshold convergence somewhere?) or do you have automation that
> gets decisions wrong due to the error at any given point in time?

Not sure about the first one but we do have the second case. The node
controller does make decisions in an online way based on the stats.
Also we do periodically collect and store stats for all jobs across
the fleet. This data is processed (offline) and is used in a lot of
ways. The inaccuracy in the stats do affect all that analysis
particularly for small jobs.


Re: [PATCH v2 1/2] PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored

2019-04-12 Thread Bjorn Helgaas
On Wed, Apr 10, 2019 at 03:05:31PM -0600, Logan Gunthorpe wrote:
> In most cases, kmalloc will not be available early in boot when
> pci_setup() is called. Thus, the kstrdup call that was added to fix the
> __initdata bug with the disable_acs_redir parameter usually returns
> NULL. Thus the parameter is discarded and it does not take into effect.
> 
> To fix this, we store the string that's in initdata until a initcall
> function can allocate the memory appropriately. This way we
> don't need any additional static memory.
> 
> Fixes: d2fd6e81912a ("PCI: Fix __initdata issue with "pci=disable_acs_redir" 
> parameter")
> Signed-off-by: Logan Gunthorpe 
> Cc: Bjorn Helgaas 

Applied to for-linus for v5.1, thanks, Logan!

> ---
>  drivers/pci/pci.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7c1b362f599a..766f5779db92 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6262,8 +6262,7 @@ static int __init pci_setup(char *str)
>   } else if (!strncmp(str, "pcie_scan_all", 13)) {
>   pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
>   } else if (!strncmp(str, "disable_acs_redir=", 18)) {
> - disable_acs_redir_param =
> - kstrdup(str + 18, GFP_KERNEL);
> + disable_acs_redir_param = str + 18;
>   } else {
>   printk(KERN_ERR "PCI: Unknown option `%s'\n",
>   str);
> @@ -6274,3 +6273,19 @@ static int __init pci_setup(char *str)
>   return 0;
>  }
>  early_param("pci", pci_setup);
> +
> +/*
> + * 'disable_acs_redir_param' is initialized in pci_setup(), above, to point
> + * to data in the __initdata section which will be freed after the init
> + * sequence is complete. We can't allocate memory in pci_setup() because some
> + * architectures do not have any memory allocation service available during
> + * an early_param() call. So we allocate memory and copy the variable here
> + * before the init section is freed.
> + */
> +static int __init pci_realloc_setup_params(void)
> +{
> + disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
> +
> + return 0;
> +}
> +pure_initcall(pci_realloc_setup_params);
> -- 
> 2.20.1
> 


Re: [PATCH v2 05/11] platform/x86: asus-wmi: Support queued WMI event codes

2019-04-12 Thread Yurii Pavlovskyi



On 12.04.19 09:48, Daniel Drake wrote:
> On Thu, Apr 11, 2019 at 1:44 PM Yurii Pavlovskyi
>  wrote:
>> Event codes are expected to be polled from a queue on at least some
>> models.
> 
> Maybe avoid the word "poll" since it suggests something else (at least to me).

Ok, will change this in the code as well.

>>> The fix flushes the old key codes out of the queue on load and after
>> receiving event the queue is read until either .. or 1 is encountered.
>>
>> It might be considered a minor issue and no normal user would likely to
>> observe this (there is little reason unloading the driver), but it does
>> significantly frustrate a developer who is unlucky enough to encounter
>> this.
>>
>> Introduce functionality for flushing and processing queued codes, which is
>> enabled via quirk flag for ASUS7000. It might be considered if it is
>> reasonable to enable it everywhere (might introduce regressions) or always
>> try to flush the queue on module load and try to detect if this quirk is
>> present in the future.
>>
>> This patch limits the effect to the specific hardware defined by ASUS7000
>> device that is used for driver detection by vendor driver of Fx505. The
>> fallback is also implemented in case initial flush fails.
> 
> Which vendor driver are you referring to here?

The ASUS Keyboard hotkeys Driver V2.0.5 which is available to download for
FX505 has this in his INF file:
[ATKP.ntamd64]

%DeviceDesc1% = NO_DRV64, ACPI\ASUS7000

But this driver is not generic. It is limited to very specific hardware,
I'd guess gaming series (for instance K54C does not have this device). It
was rather a way to avoid breaking anything. I think your suggestion below
is much better.

> 
> Researching more:
> In our database of 144 Asus models, 142 of them have GANQ.
> 
> Of those 142, 9 of them return One in the empty-queue case. The other
> 133 match your FX505GM device exactly. So it seems valid to interpret
> both 0x and 0x1 as queue-end markers.
> 
> The 2 devices that do not have GANQ are not laptops. They also do not
> have the _UID "ATK" WMI device, they only have "ASUSWMI" and their WMI
> _WED method does not use a queue.
> There are a few more units that have both ASUSWMI and ATK devices, and
> the ASUSWMI device appears to never be queued.
> Another observation is that the ASUSWMI device works with notify code
> 0xD2, and the ATK device works with 0xFF.
> 
> Nailing this down a bit further, I found a DSDT for EEEPC X101H: that
> one only has ASUSWMI and it is also not queued.
> 
> So I think you should make this queue behaviour applied more
> generically, but either avoid it when the WMI device _UID is not "ATK"
> (as discussed in the DCTS/DSTS thread), or avoid it when the notify
> code is not 0x>
> Thanks
> Daniel

Thanks a lot for your research, it's much appreciated! That seems to
confirm that these two quirks are actually connected with ATK device. I
guess it makes sense to combine the detection for both of them. Also to
flush the queue we need to know the notify code beforehand, because it is
checked in _WED so checking for ATK seems reasonable to me.

Best regards,
Yurii


Re: [PATCH v2] mm/hotplug: treat CMA pages as unmovable

2019-04-12 Thread Michal Hocko
On Fri 12-04-19 11:26:59, Qian Cai wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d96ca5bc555b..a9d2b0236167 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8005,7 +8005,10 @@ void *__init alloc_large_system_hash(const char 
> *tablename,
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>int migratetype, int flags)
>  {
> - unsigned long pfn, iter, found;
> + unsigned long found;
> + unsigned long iter = 0;
> + unsigned long pfn = page_to_pfn(page);
> + char reason[] = "unmovable page";

const char *reason = "unovable page";
>  
>   /*
>* TODO we could make this much more efficient by not checking every
> @@ -8015,17 +8018,20 @@ bool has_unmovable_pages(struct zone *zone, struct 
> page *page, int count,
>* can still lead to having bootmem allocations in zone_movable.
>*/
>  
> - /*
> -  * CMA allocations (alloc_contig_range) really need to mark isolate
> -  * CMA pageblocks even when they are not movable in fact so consider
> -  * them movable here.
> -  */
> - if (is_migrate_cma(migratetype) &&
> - is_migrate_cma(get_pageblock_migratetype(page)))
> - return false;
> + if (is_migrate_cma(get_pageblock_migratetype(page))) {
> + /*
> +  * CMA allocations (alloc_contig_range) really need to mark
> +  * isolate CMA pageblocks even when they are not movable in fact
> +  * so consider them movable here.
> +  */
> + if (is_migrate_cma(migratetype))
> + return false;
> +
> + strscpy(reason, "CMA page", 9);

reason = "CMA page";

> + goto unmovable;
> + }

Other than that looks good. After fixing the above, feel free to add
Acked-by: Michal Hocko 

-- 
Michal Hocko
SUSE Labs


Re: INFO: trying to register non-static key in vmk80xx_detach

2019-04-12 Thread syzbot

syzbot has found a reproducer for the following crash on:

HEAD commit:9a33b369 usb-fuzzer: main usb gadget fuzzer driver
git tree:   https://github.com/google/kasan/tree/usb-fuzzer
console output: https://syzkaller.appspot.com/x/log.txt?x=11bed6fd20
kernel config:  https://syzkaller.appspot.com/x/.config?x=23e37f59d94ddd15
dashboard link: https://syzkaller.appspot.com/bug?extid=54c2f58f15fe6876b6ad
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15fdc6bf20
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=156dc2d320

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+54c2f58f15fe6876b...@syzkaller.appspotmail.com

usb 1-1: config 0 has no interface number 0
usb 1-1: New USB device found, idVendor=10cf, idProduct=8068,  
bcdDevice=e6.8d

usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 1-1: config 0 descriptor??
vmk80xx 1-1:0.117: driver 'vmk80xx' failed to auto-configure device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xe8/0x16e lib/dump_stack.c:113
 assign_lock_key kernel/locking/lockdep.c:786 [inline]
 register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095
 __lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582
 lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x44/0x60 kernel/locking/spinlock.c:152
 down+0x12/0x80 kernel/locking/semaphore.c:58
 vmk80xx_detach+0x59/0x100 drivers/staging/comedi/drivers/vmk80xx.c:829
 comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204
 comedi_device_cleanup.part.0+0x68/0x140  
drivers/staging/comedi/comedi_fops.c:156

 comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline]
 comedi_free_board_dev.part.0+0x16/0x90  
drivers/staging/comedi/comedi_fops.c:190

 comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline]
 comedi_release_hardware_device+0x111/0x140  
drivers/staging/comedi/comedi_fops.c:2880

 comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068
 usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
 really_probe+0x2da/0xb10 drivers/base/dd.c:509
 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
 __device_attach+0x223/0x3a0 drivers/base/dd.c:844
 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
 device_add+0xad2/0x16e0 drivers/base/core.c:2106
 usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
 generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
 usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
 really_probe+0x2da/0xb10 drivers/base/dd.c:509
 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
 __device_attach+0x223/0x3a0 drivers/base/dd.c:844
 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
 device_add+0xad2/0x16e0 drivers/base/core.c:2106
 usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
 hub_port_connect drivers/usb/core/hub.c:5089 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
 port_event drivers/usb/core/hub.c:5350 [inline]
 hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
 process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
 worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
 kthread+0x313/0x420 kernel/kthread.c:253
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN PTI
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: usb_hub_wq hub_event
RIP: 0010:__list_add_valid+0x47/0xa0 lib/list_debug.c:26
Code: fa 48 c1 ea 03 80 3c 02 00 75 50 49 8b 54 24 08 48 39 f2 0f 85 59 01  
00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f2 48 c1 ea 03 <80> 3c 02 00 75  
3a 48 8b 16 4c 39 e2 0f 85 5e 01 00 00 48 39 f5 0f

RSP: 0018:8880a84a7110 EFLAGS: 00010046
RAX: dc00 RBX: 888097143850 RCX: 
RDX:  RSI:  RDI: 888097143898
RBP: 8880a84a7170 R08: 8880a8491880 R09: ed1015094e31
R10: ed1015094e30 R11: 0003 R12: 888097143890
R13:  R14: 888097143898 R15: 888097143890
FS:  (00

Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty

2019-04-12 Thread Roman Gushchin
On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote:
> On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner  wrote:
> >
> > Right now, when somebody needs to know the recursive memory statistics
> > and events of a cgroup subtree, they need to walk the entire subtree
> > and sum up the counters manually.
> >
> > There are two issues with this:
> >
> > 1. When a cgroup gets deleted, its stats are lost. The state counters
> > should all be 0 at that point, of course, but the events are not. When
> > this happens, the event counters, which are supposed to be monotonic,
> > can go backwards in the parent cgroups.
> >
> 
> We also faced this exact same issue as well and had the similar solution.
> 
> > 2. During regular operation, we always have a certain number of lazily
> > freed cgroups sitting around that have been deleted, have no tasks,
> > but have a few cache pages remaining. These groups' statistics do not
> > change until we eventually hit memory pressure, but somebody watching,
> > say, memory.stat on an ancestor has to iterate those every time.
> >
> > This patch addresses both issues by introducing recursive counters at
> > each level that are propagated from the write side when stats change.
> >
> > Upward propagation happens when the per-cpu caches spill over into the
> > local atomic counter. This is the same thing we do during charge and
> > uncharge, except that the latter uses atomic RMWs, which are more
> > expensive; stat changes happen at around the same rate. In a sparse
> > file test (page faults and reclaim at maximum CPU speed) with 5 cgroup
> > nesting levels, perf shows __mod_memcg_page state at ~1%.
> >
> 
> (Unrelated to this patchset) I think there should also a way to get
> the exact memcg stats. As the machines are getting bigger (more cpus
> and larger basic page size) the accuracy of stats are getting worse.
> Internally we have an additional interface memory.stat_exact for that.
> However I am not sure in the upstream kernel will an additional
> interface is better or something like /proc/sys/vm/stat_refresh which
> sync all per-cpu stats.

I was thinking about eventually consistent counters: sync them periodically
from a worker thread. It should keep the cost of reading small, but
should increase the accuracy. Will it work for you?


Re: [1/3] x86: Update DEBUG_TLBFLUSH options description.

2019-04-12 Thread Atish Patra

On 4/10/19 11:56 PM, Christoph Hellwig wrote:

Given that this option enables generic code (which you reuse for RISC-V
later in this series) please also move the config option to
mm/Kconfig, proabbly keyed off another ARCH_HAVE_DEBUG_TLBFLUSH
symbol that the architecture can select.



Sure.

Regards,
Atish

___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv





Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty

2019-04-12 Thread Johannes Weiner
On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote:
> We also faced this exact same issue as well and had the similar solution.
> 
> > Signed-off-by: Johannes Weiner 
> 
> Reviewed-by: Shakeel Butt 

Thanks for the review!

> (Unrelated to this patchset) I think there should also a way to get
> the exact memcg stats. As the machines are getting bigger (more cpus
> and larger basic page size) the accuracy of stats are getting worse.
> Internally we have an additional interface memory.stat_exact for that.
> However I am not sure in the upstream kernel will an additional
> interface is better or something like /proc/sys/vm/stat_refresh which
> sync all per-cpu stats.

I was talking to Roman about this earlier as well and he mentioned it
would be nice to have periodic flushing of the per-cpu caches. The
global vmstat has something similar. We might be able to hook into
those workers, but it would likely require some smarts so we don't
walk the entire cgroup tree every couple of seconds.

We haven't had any actual problems with the per-cpu fuzziness, mainly
because the cgroups of interest also grow in size as the machines get
bigger, and so the relative error doesn't increase.

Are your requirements that the error dissipates over time (waiting for
a threshold convergence somewhere?) or do you have automation that
gets decisions wrong due to the error at any given point in time?


Re: [PATCH 0/4] mm: memcontrol: memory.stat cost & correctness

2019-04-12 Thread Shakeel Butt
On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner  wrote:
>
> The cgroup memory.stat file holds recursive statistics for the entire
> subtree. The current implementation does this tree walk on-demand
> whenever the file is read. This is giving us problems in production.
>
> 1. The cost of aggregating the statistics on-demand is high. A lot of
> system service cgroups are mostly idle and their stats don't change
> between reads, yet we always have to check them. There are also always
> some lazily-dying cgroups sitting around that are pinned by a handful
> of remaining page cache; the same applies to them.
>
> In an application that periodically monitors memory.stat in our fleet,
> we have seen the aggregation consume up to 5% CPU time.
>
> 2. When cgroups die and disappear from the cgroup tree, so do their
> accumulated vm events. The result is that the event counters at
> higher-level cgroups can go backwards and confuse some of our
> automation, let alone people looking at the graphs over time.
>
> To address both issues, this patch series changes the stat
> implementation to spill counts upwards when the counters change.
>
> The upward spilling is batched using the existing per-cpu cache. In a
> sparse file stress test with 5 level cgroup nesting, the additional
> cost of the flushing was negligible (a little under 1% of CPU at 100%
> CPU utilization, compared to the 5% of reading memory.stat during
> regular operation).

For whole series:

Reviewed-by: Shakeel Butt 

>
>  include/linux/memcontrol.h |  96 +++---
>  mm/memcontrol.c| 290 +++
>  mm/vmscan.c|   4 +-
>  mm/workingset.c|   7 +-
>  4 files changed, 234 insertions(+), 163 deletions(-)
>
>


Re: [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures

2019-04-12 Thread Roman Gushchin
On Fri, Apr 12, 2019 at 03:14:18PM -0400, Johannes Weiner wrote:
> With the default overcommit==guess we occasionally run into mmap
> rejections despite plenty of memory that would get dropped under
> pressure but just isn't accounted reclaimable. One example of this is
> dying cgroups pinned by some page cache. A previous case was auxiliary
> path name memory associated with dentries; we have since annotated
> those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
> treat indirectly reclaimable memory as free in overcommit logic")).
> 
> But trying to classify all allocated memory reliably as reclaimable
> and unreclaimable is a bit of a fool's errand. There could be a myriad
> of dependencies that constantly change with kernel versions.
> 
> It becomes even more questionable of an effort when considering how
> this estimate of available memory is used: it's not compared to the
> system-wide allocated virtual memory in any way. It's not even
> compared to the allocating process's address space. It's compared to
> the single allocation request at hand!
> 
> So we have an elaborate left-hand side of the equation that tries to
> assess the exact breathing room the system has available down to a
> page - and then compare it to an isolated allocation request with no
> additional context. We could fail an allocation of N bytes, but for
> two allocations of N/2 bytes we'd do this elaborate dance twice in a
> row and then still let N bytes of virtual memory through. This doesn't
> make a whole lot of sense.
> 
> Let's take a step back and look at the actual goal of the
> heuristic. From the documentation:
> 
>Heuristic overcommit handling. Obvious overcommits of address
>space are refused. Used for a typical system. It ensures a
>seriously wild allocation fails while allowing overcommit to
>reduce swap usage.  root is allowed to allocate slightly more
>memory in this mode. This is the default.
> 
> If all we want to do is catch clearly bogus allocation requests
> irrespective of the general virtual memory situation, the physical
> memory counter-part doesn't need to be that complicated, either.
> 
> When in GUESS mode, catch wild allocations by comparing their request
> size to total amount of ram and swap in the system.
> 
> Signed-off-by: Johannes Weiner 

My 2c here: any kinds of percpu counters and percpu data is accounted
as unreclaimable and can alter the calculation significantly.

This is a special problem on hosts, which were idle for some time.
Without any memory pressure, kernel caches do occupy most of the memory,
so than a following attempt to start a workload fails.

With a big pleasure:
Acked-by: Roman Gushchin 

Thanks!


Re: [PATCH 04/11] platform/x86: asus-wmi: Add quirk to force DSTS WMI method detection

2019-04-12 Thread Yurii Pavlovskyi
Wow that is a great thing you done, thanks a lot for your time and your
kind words! Your suggestion indeed sounds good judging by your results.

Both of my devices have UID "ATK" (actually FX505 also has two other
PNP0C14 devices with HIDs SampleDev and TestDev, but I think they are
disabled by default).

It looks like the driver is loaded already only if ASUS_WMI_MGMT_GUID is
present, so I guess that could be used for wmi_driver_register. A little
obstacle is that ACPI device pointer is stored in wmi_block structure,
which is internal to the wmi.c. This means we would have to add a new
method to wmi.h / wmi.c for getting the UID of WMI ACPI device.

I guess it might be possible to somehow navigate the device tree back to
the platform driver of WMI module, but it would definitely be more obscure,
and searching for the device by HID again is not only slower but generally
would require a guarantee that it's the same device. So adding a new
function looks reasonable to me. It might also be useful to someone
implementing similar things for other vendors in the future.

I'm going to implement this in updated patch.

Thanks,
Yurii

On 11.04.19 12:55, Daniel Drake wrote:
> On Thu, Apr 11, 2019 at 4:28 AM Yurii Pavlovskyi
>  wrote:
>> The DSTS method detection fails, as nothing is returned if method is not
>> defined in WMNB. As a result the control of keyboard backlight is not
>> functional for TUF Gaming series laptops (at the time the only
>> functionality of the driver on this model implemented with WMI methods).
>>
>> Patch was tested on a newer TUF Gaming FX505GM and older K54C model.
>>
>> FX505GM:
>> Method (WMNB, 3, Serialized)
>> { ...
>> If ((Local0 == 0x53545344))
>> {
>> ...
>> Return (Zero)
>> }
>> ...
>> // No return
>> }
>>
>> K54C:
>> Method (WMNB, 3, Serialized)
>> { ...
>> If ((Local0 == 0x53545344))
>> {
>> ...
>> Return (0x02)
>> }
>> ...
>> Return (0xFFFE)
>> }
>>
>> The non-existing method ASUS_WMI_METHODID_DSTS=0x53544344 (actually it is
>> DCTS in little endian ASCII) is selected in asus->dsts.
>>
>> One way to fix this would be to call both for every known device ID until
>> some answers - this would increase module load time.
>>
>> Another option is to check some device that is known to exist on every
>> model - none known at the time.
>>
>> Last option, which is implemented, is to check for presence of the
>> ASUS7000 device in ACPI tree (it is a dummy device), which is the
>> condition used for loading the vendor driver for this model. This might
>> not fix every affected model ever produced, but it likely does not
>> introduce any regressions. The patch introduces a quirk that is enabled
>> when ASUS7000 is found.
>>
>> Scope (_SB)
>> {
>> Device (ATK)
>> {
>> Name (_HID, "ASUS7000")  // _HID: Hardware ID
>> }
>> }
> 
> Hmm, tricky! But about time we did something about the DSTS vs DCTS guessing.
> 
> While we don't have definitive knowledge of the right thing to do
> here, I think I have enough info available to solidify some
> assumptions we'd otherwise be afraid to make, and then we can
> implement a better approach here.
> 
> In our database of 144 Asus DSDTs, 14 of them respond to DCTS:
> 
> AS_D520MT
> D425MC
> D640SA
> D320SF-K
> D415MT
> D830MT
> G11DF
> M32CD4-K
> V221ID
> V272UN_SKU3
> Z240IE
> ZN220IC-K
> ZN241IC
> ZN270IE
> 
> Of those 14, they all additionally respond to DSTS, except for D415MT
> and AS_D520MT.
> 
> In all 14 cases, the DCTS handling is done inside a device with _UID
> ASUSWMI. None of the other 130 products have a device with that _UID.
> 
> Furthermore, we recently received access to the ASUS spec, which
> confirms that the Instance Name for EeePC is "ACPI\PNP0C14\ASUSWMI_0"
> whereas the Instance Name for other notebooks is "ACPI\PNP0C14\ATK_0".
> 
> The 12 devices that respond to both DCTS and DSTS, do it on separate
> different devices, albeit with the same _WDG UUID. The one with _UID
> ASUSWMI responds to DCTS, and the one with _UID ATK responds to DSTS.
> 
> So that seems fairly convincing. My thinking is that we can check the
> _UID of the underlying device, and use DCTS for ASUSWMI, DSTS
> otherwise. To do that, I think you'd have to rework the driver probing
> so that it happens through wmi_driver_register(). Then from the probe
> function onwards you will get a wmi_device, and hopefully you can get
> the _UID through that instance somehow.
> 
> There's a separate issue of what happens on those 12 machines where
> there are two WMI devs, with the same _WDG GUID.
> drivers/platform/x86/wmi.c drops duplicates, so one of them is being
> ignored in that case. We'd ideally find a way to ignore the ASUSWMI
> node and go with ATK in that situation. But I think this can be
> separated from your work here.
> 
> Thanks for these patches and welcome to the world of kernel development!
> 
> Daniel
> 


NFS won't finish

2019-04-12 Thread bob
Kernel: Linux freedom 5.1.0-rc4 #1 SMP Sun Apr 7 18:50:31 MDT 2019 
x86_64 x86_64 x86_64 GNU/Linux


Sending large files to NAS box via NFS failed (It looks like files are 
getting transferred, but NFS


gets stuck/won't finish.

Note:
1. NFS works ok with kernel 5.0.0

2. To test NFS, (using the5.1.0-rc4 kernel), I connected to my NAS box 
and moved a small log file

   (less that 20kB)  and it moved from computer to NAS without problem.
   I then deleted the small log file from the NAS box, again without 
problem.

   (connected via NFS).
   I then copied a database archive (24929 bytes) from NAS box to computer,
   again without fail.  But today I moved 3 large tarball/archives from
   computer to NAS box (TX:   cum:   8.50GB), and it seems
   to have moved OK, but NFS does not want to complete the transfer.
   I can't get a directory listing from where I started the copy, and
   the process queue shows:

root  6176  0.0  0.0  20824  2504 pts/5    D+   10:28   0:11 cp 
site.Apr-05-2019.tgz site.Apr-12-2019.tgz site.Mar-29-2019.tgz 
nas/website/BACKUP/2019.04.12


DMESG:

perf: interrupt took too long (2694 > 2500), lowering 
kernel.perf_event_max_sample_rate to 74250
[ 1644.507579] perf: interrupt took too long (3369 > 3367), lowering 
kernel.perf_event_max_sample_rate to 59250

[ 2349.205399] FS-Cache: Loaded
[ 2349.348842] FS-Cache: Netfs 'nfs' registered for caching
[ 2349.652305] NFS: Registering the id_resolver key type
[ 2349.652316] Key type id_resolver registered
[ 2349.652317] Key type id_legacy registered
[ 3595.680685] nfs: server 192.168.20.114 not responding, still trying
[ 3595.680692] nfs: server 192.168.20.114 not responding, still trying
[ 3595.680695] nfs: server 192.168.20.114 not responding, still trying
[ 3626.400412] INFO: task cp:6176 blocked for more than 120 seconds.
[ 3626.400418]   Tainted: G  I   5.1.0-rc4 #1
[ 3626.400419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.

[ 3626.400422] cp  D    0  6176   5890 0x
[ 3626.400425] Call Trace:
[ 3626.400434]  ? __schedule+0x2a1/0x870
[ 3626.400436]  schedule+0x28/0x70
[ 3626.400440]  io_schedule+0x12/0x40
[ 3626.400443]  wait_on_page_bit_common+0x10a/0x280
[ 3626.400447]  ? file_check_and_advance_wb_err+0xc0/0xc0
[ 3626.400450]  __filemap_fdatawait_range+0xd8/0x140
[ 3626.400453]  filemap_write_and_wait_range+0x3b/0x70
[ 3626.400471]  nfs_file_fsync+0x44/0x1f0 [nfs]
[ 3626.400474]  filp_close+0x2a/0x70
[ 3626.400475]  __x64_sys_close+0x1e/0x50
[ 3626.400477]  do_syscall_64+0x4f/0x100
[ 3626.400479]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3626.400481] RIP: 0033:0x7f6be90958d4
[ 3626.400485] Code: Bad RIP value.
[ 3626.400486] RSP: 002b:7ffe07ec4398 EFLAGS: 0246 ORIG_RAX: 
0003
[ 3626.400487] RAX: ffda RBX:  RCX: 
7f6be90958d4
[ 3626.400488] RDX: 0002 RSI: 55b82b141000 RDI: 
0004
[ 3626.400488] RBP: 7ffe07ec4770 R08: 0002 R09: 
0001
[ 3626.400489] R10: 0002 R11: 0246 R12: 
7ffe07ec4970
[ 3626.400490] R13:  R14: 7ffe07ec4900 R15: 
7ffe07ec52db

[ 3747.231331] INFO: task cp:6176 blocked for more than 241 seconds.
[ 3747.231337]   Tainted: G  I   5.1.0-rc4 #1
[ 3747.231339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.


According to NTOP, it looks like data was transferred:

TX: cum:   8.50GB   peak:    208b   rates: 0b  0b 18b

My Hardware:

04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E 
Gigabit Ethernet Controller (rev 12)

    Subsystem: ASUSTeK Computer Inc. Motherboard
    Flags: bus master, fast devsel, latency 0, IRQ 26
    Memory at fb9fc000 (64-bit, non-prefetchable) [size=16K]
    I/O ports at b800 [size=256]
    Expansion ROM at fb9c [disabled] [size=128K]
    Capabilities: 
    Kernel driver in use: sky2
    Kernel modules: sky2

06:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E 
Gigabit Ethernet Controller (rev 12)

    Subsystem: ASUSTeK Computer Inc. Motherboard
    Flags: bus master, fast devsel, latency 0, IRQ 25
    Memory at fbbfc000 (64-bit, non-prefetchable) [size=16K]
    I/O ports at d800 [size=256]
    Expansion ROM at fbbc [disabled] [size=128K]
    Capabilities: 
    Kernel driver in use: sky2
    Kernel modules: sky2

Drivers:

lsmod | grep nfs yields:
nfsv3  53248  1
nfsv4 692224  0
nfs   315392  3 nfsv4,nfsv3
fscache   380928  2 nfsv4,nfs
nfsd  413696  13
auth_rpcgss   102400  2 nfsd,rpcsec_gss_krb5
nfs_acl    16384  2 nfsd,nfsv3
lockd 106496  3 nfsd,nfsv3,nfs
grace  16384  2 nfsd,lockd
sunrpc    438272  28 
nfsd,nfsv4,auth_rpcgss,lockd,nfsv3,rpcsec_gss_krb5

Re: [PATCH] soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher

2019-04-12 Thread Matthias Brugger



On 20/03/2019 20:11, Nathan Chancellor wrote:
> On Thu, Mar 07, 2019 at 03:56:51PM -0700, Nathan Chancellor wrote:
>> When building with -Wsometimes-uninitialized, Clang warns:
>>
>> drivers/soc/mediatek/mtk-pmic-wrap.c:1358:6: error: variable 'rdata' is
>> used uninitialized whenever '||' condition is true
>> [-Werror,-Wsometimes-uninitialized]
>>
>> If pwrap_write returns non-zero, pwrap_read will not be called to
>> initialize rdata, meaning that we will use some random uninitialized
>> stack value in our print statement. Zero initialize rdata in case this
>> happens.
>>
>> Link: https://github.com/ClangBuiltLinux/linux/issues/401
>> Signed-off-by: Nathan Chancellor 
>> ---
>>
>> I don't know if this is better or to just restructure the if statement
>> below (I'm not an expert in this code so I'll leave that up to the
>> maintainers to decide).
>>
>>  drivers/soc/mediatek/mtk-pmic-wrap.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/soc/mediatek/mtk-pmic-wrap.c 
>> b/drivers/soc/mediatek/mtk-pmic-wrap.c
>> index 8236a6c87e19..2f632e8790f7 100644
>> --- a/drivers/soc/mediatek/mtk-pmic-wrap.c
>> +++ b/drivers/soc/mediatek/mtk-pmic-wrap.c
>> @@ -1281,7 +1281,7 @@ static bool pwrap_is_pmic_cipher_ready(struct 
>> pmic_wrapper *wrp)
>>  static int pwrap_init_cipher(struct pmic_wrapper *wrp)
>>  {
>>  int ret;
>> -u32 rdata;
>> +u32 rdata = 0;
>>  
>>  pwrap_writel(wrp, 0x1, PWRAP_CIPHER_SWRST);
>>  pwrap_writel(wrp, 0x0, PWRAP_CIPHER_SWRST);
>> -- 
>> 2.21.0
>>
> 
> Gentle ping (if there was a response to this, I didn't receive it). I
> know I sent it in the middle of a merge window so I get if it slipped
> through the cracks.
> 

applied now to v5.1-next/soc

thanks


Re: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable

2019-04-12 Thread Leon Romanovsky
On Fri, Apr 12, 2019 at 04:30:24PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 04, 2019 at 06:27:26AM +, Parav Pandit wrote:
>
> > I think we should use rdma_nl_register(RDMA_NL_RDMA_CM, cma_cb_table) which 
> > was removed as part of ID stats removal.
> > Because of below reasons.
> > 1. rdma netlink command auto loads the module
>
> This is probably the best argument to stay away from sysctl for module
> parameters.. It is tricky to make sure the rdma module is loaded
> before sysctl runs in boot, and I don't think sysctl autoloads missing
> modules, or somehow copes with dynamic module loading, does it?

As far as I remember, sysctl won't be available till module is loaded,
so autoload won't work because path doesn't exist.

>
> To that end we should probably have the entire sysctl configurable
> available in netlink
>
> Jason


signature.asc
Description: PGP signature


Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty

2019-04-12 Thread Shakeel Butt
On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner  wrote:
>
> Right now, when somebody needs to know the recursive memory statistics
> and events of a cgroup subtree, they need to walk the entire subtree
> and sum up the counters manually.
>
> There are two issues with this:
>
> 1. When a cgroup gets deleted, its stats are lost. The state counters
> should all be 0 at that point, of course, but the events are not. When
> this happens, the event counters, which are supposed to be monotonic,
> can go backwards in the parent cgroups.
>

We also faced this exact same issue as well and had the similar solution.

> 2. During regular operation, we always have a certain number of lazily
> freed cgroups sitting around that have been deleted, have no tasks,
> but have a few cache pages remaining. These groups' statistics do not
> change until we eventually hit memory pressure, but somebody watching,
> say, memory.stat on an ancestor has to iterate those every time.
>
> This patch addresses both issues by introducing recursive counters at
> each level that are propagated from the write side when stats change.
>
> Upward propagation happens when the per-cpu caches spill over into the
> local atomic counter. This is the same thing we do during charge and
> uncharge, except that the latter uses atomic RMWs, which are more
> expensive; stat changes happen at around the same rate. In a sparse
> file test (page faults and reclaim at maximum CPU speed) with 5 cgroup
> nesting levels, perf shows __mod_memcg_page state at ~1%.
>

(Unrelated to this patchset) I think there should also a way to get
the exact memcg stats. As the machines are getting bigger (more cpus
and larger basic page size) the accuracy of stats are getting worse.
Internally we have an additional interface memory.stat_exact for that.
However I am not sure in the upstream kernel will an additional
interface is better or something like /proc/sys/vm/stat_refresh which
sync all per-cpu stats.

> Signed-off-by: Johannes Weiner 

Reviewed-by: Shakeel Butt 

> ---
>  include/linux/memcontrol.h |  54 +-
>  mm/memcontrol.c| 205 ++---
>  2 files changed, 150 insertions(+), 109 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index cae7d1b11eea..36bdfe8e5965 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -128,6 +128,7 @@ struct mem_cgroup_per_node {
>
> struct lruvec_stat __percpu *lruvec_stat_cpu;
> atomic_long_t   lruvec_stat[NR_VM_NODE_STAT_ITEMS];
> +   atomic_long_t   lruvec_stat_local[NR_VM_NODE_STAT_ITEMS];
>
> unsigned long   lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
>
> @@ -279,8 +280,12 @@ struct mem_cgroup {
> MEMCG_PADDING(_pad2_);
>
> atomic_long_t   vmstats[MEMCG_NR_STAT];
> +   atomic_long_t   vmstats_local[MEMCG_NR_STAT];
> +
> atomic_long_t   vmevents[NR_VM_EVENT_ITEMS];
> -   atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
> +   atomic_long_t   vmevents_local[NR_VM_EVENT_ITEMS];
> +
> +   atomic_long_t   memory_events[MEMCG_NR_MEMORY_EVENTS];
>
> unsigned long   socket_pressure;
>
> @@ -565,6 +570,20 @@ struct mem_cgroup *lock_page_memcg(struct page *page);
>  void __unlock_page_memcg(struct mem_cgroup *memcg);
>  void unlock_page_memcg(struct page *page);
>
> +/*
> + * idx can be of type enum memcg_stat_item or node_stat_item.
> + * Keep in sync with memcg_exact_page_state().
> + */
> +static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int 
> idx)
> +{
> +   long x = atomic_long_read(&memcg->vmstats[idx]);
> +#ifdef CONFIG_SMP
> +   if (x < 0)
> +   x = 0;
> +#endif
> +   return x;
> +}
> +
>  /*
>   * idx can be of type enum memcg_stat_item or node_stat_item.
>   * Keep in sync with memcg_exact_page_state().
> @@ -572,7 +591,7 @@ void unlock_page_memcg(struct page *page);
>  static inline unsigned long memcg_page_state_local(struct mem_cgroup *memcg,
>int idx)
>  {
> -   long x = atomic_long_read(&memcg->vmstats[idx]);
> +   long x = atomic_long_read(&memcg->vmstats_local[idx]);
>  #ifdef CONFIG_SMP
> if (x < 0)
> x = 0;
> @@ -624,6 +643,24 @@ static inline void mod_memcg_page_state(struct page 
> *page,
> mod_memcg_state(page->mem_cgroup, idx, val);
>  }
>
> +static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
> + enum node_stat_item idx)
> +{
> +   struct mem_cgroup_per_node *pn;
> +   long x;
> +
> +   if (mem_cgroup_disabled())
> +   return node_page_state(lruvec_pgdat(lruvec), idx);
> +
> +   pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> +   x = atomic_long_read(&pn->lruvec_stat[idx]);
> +#ifdef CONFIG_SMP
> +   if (

Re: [PATCH 0/14] v2 multi-die/package topology support

2019-04-12 Thread Len Brown
> > > I think I prefer 's/threads/cpus/g' on that. Threads makes me think SMT,
> > > and I don't think there's any guarantee the part in question will have
> > > SMT on.
> >
> > I think 'threads' is a bit confusing as well. We seem to be using 'cpu'
> > everywhere for something we can schedule tasks on, including the sysfs
> > /sys/devices/system/cpu/ subdirs for each SMT thread on SMT systems.

I agree with Peter and Morten.
"cpu" is more clear and consistent than "thread" here.
I'll spin the series with that string changed.

> > Another thing that I find confusing is that with this series we a new
> > die id/mask which is totally unrelated to the DIE level in the
> > sched_domain hierarchy. We should rename DIE level to something that
> > reflects what it really is. If we can agree on that ;-)
> >
> > NODE level?

Cache topology and node interconnect topology impact performance, and so
we what we look at, when we decided to run something on this CPU or that one.

That logical topology lives within the physical package and die topology,
but doesn't necessarily match it.  For example, caches can be shared
or split into pieces inside a package or die.  Logical nodes may match
die boundaries, or there may be multiple logical nodes within a single
physical package or die.

We have mechanisms for explicitly enumerating the caches,
and for nodes.  This patch series does not touch those mechanisms.

The reason we need to know about physical packages and die is that
there are other things associated with them.
eg. power and temperature domains, and certain system registers
follow these physical boundaries.  Code that talks to those items
needs to be able to understand these physical boundaries.

I hope that helps.

thanks,
Len Brown, Intel Open Source Technology Center


[PATCH v3] chrome/platform: cros_ec_proto: Add trace event to trace EC commands

2019-04-12 Thread Raul E Rangel
This is useful to see which EC commands are being executed and when.

To enable:

echo 'cros_ec:*' >> /sys/kernel/debug/tracing/set_event

Example:

/* cros_ec_cmd: version: 0, command: GET_VERSION */
/* cros_ec_cmd: version: 0, command: GET_PROTOCOL_INFO */
/* cros_ec_cmd: version: 1, command: GET_CMD_VERSIONS */
/* cros_ec_cmd: version: 1, command: USB_PD_CONTROL */

Signed-off-by: Raul E Rangel 
---

Changes in v3:
- Use a macro to avoid duplicating the ec command names.

Changes in v2:
- Changed comment style to match other cros_ec files.
- Fixed commit tag.

 drivers/platform/chrome/Makefile|   4 +-
 drivers/platform/chrome/cros_ec_proto.c |   4 +
 drivers/platform/chrome/cros_ec_trace.c | 162 
 drivers/platform/chrome/cros_ec_trace.h |  51 
 4 files changed, 220 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/chrome/cros_ec_trace.c
 create mode 100644 drivers/platform/chrome/cros_ec_trace.h

diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile
index 1e2f0029b597..e542268454a4 100644
--- a/drivers/platform/chrome/Makefile
+++ b/drivers/platform/chrome/Makefile
@@ -3,12 +3,14 @@
 obj-$(CONFIG_CHROMEOS_LAPTOP)  += chromeos_laptop.o
 obj-$(CONFIG_CHROMEOS_PSTORE)  += chromeos_pstore.o
 obj-$(CONFIG_CHROMEOS_TBMC)+= chromeos_tbmc.o
+# tell define_trace.h where to find the cros ec trace header
+CFLAGS_cros_ec_trace.o:=   -I$(src)
 obj-$(CONFIG_CROS_EC_I2C)  += cros_ec_i2c.o
 obj-$(CONFIG_CROS_EC_SPI)  += cros_ec_spi.o
 cros_ec_lpcs-objs  := cros_ec_lpc.o cros_ec_lpc_reg.o
 cros_ec_lpcs-$(CONFIG_CROS_EC_LPC_MEC) += cros_ec_lpc_mec.o
 obj-$(CONFIG_CROS_EC_LPC)  += cros_ec_lpcs.o
-obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o
+obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o cros_ec_trace.o
 obj-$(CONFIG_CROS_KBD_LED_BACKLIGHT)   += cros_kbd_led_backlight.o
 obj-$(CONFIG_CROS_EC_LIGHTBAR) += cros_ec_lightbar.o
 obj-$(CONFIG_CROS_EC_VBC)  += cros_ec_vbc.o
diff --git a/drivers/platform/chrome/cros_ec_proto.c 
b/drivers/platform/chrome/cros_ec_proto.c
index 97a068dff192..3d02c8259ac6 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include "cros_ec_trace.h"
+
 #define EC_COMMAND_RETRIES 50
 
 static int prepare_packet(struct cros_ec_device *ec_dev,
@@ -51,6 +53,8 @@ static int send_command(struct cros_ec_device *ec_dev,
int ret;
int (*xfer_fxn)(struct cros_ec_device *ec, struct cros_ec_command *msg);
 
+   trace_cros_ec_cmd(msg);
+
if (ec_dev->proto_version > 2)
xfer_fxn = ec_dev->pkt_xfer;
else
diff --git a/drivers/platform/chrome/cros_ec_trace.c 
b/drivers/platform/chrome/cros_ec_trace.c
new file mode 100644
index ..1f817aa9d886
--- /dev/null
+++ b/drivers/platform/chrome/cros_ec_trace.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+// Trace events for the ChromeOS Embedded Controller
+//
+// Copyright 2019 Google LLC.
+
+#define EC_CMD(a) {EC_CMD_##a, #a}
+
+#define ec_cmds \
+   EC_CMD(PROTO_VERSION), \
+   EC_CMD(HELLO), \
+   EC_CMD(GET_VERSION), \
+   EC_CMD(READ_TEST), \
+   EC_CMD(GET_BUILD_INFO), \
+   EC_CMD(GET_CHIP_INFO), \
+   EC_CMD(GET_BOARD_VERSION), \
+   EC_CMD(READ_MEMMAP), \
+   EC_CMD(GET_CMD_VERSIONS), \
+   EC_CMD(GET_COMMS_STATUS), \
+   EC_CMD(TEST_PROTOCOL), \
+   EC_CMD(GET_PROTOCOL_INFO), \
+   EC_CMD(GSV_PAUSE_IN_S5), \
+   EC_CMD(GET_FEATURES), \
+   EC_CMD(GET_SKU_ID), \
+   EC_CMD(SET_SKU_ID), \
+   EC_CMD(FLASH_INFO), \
+   EC_CMD(FLASH_READ), \
+   EC_CMD(FLASH_WRITE), \
+   EC_CMD(FLASH_ERASE), \
+   EC_CMD(FLASH_PROTECT), \
+   EC_CMD(FLASH_REGION_INFO), \
+   EC_CMD(VBNV_CONTEXT), \
+   EC_CMD(FLASH_SPI_INFO), \
+   EC_CMD(FLASH_SELECT), \
+   EC_CMD(PWM_GET_FAN_TARGET_RPM), \
+   EC_CMD(PWM_SET_FAN_TARGET_RPM), \
+   EC_CMD(PWM_GET_KEYBOARD_BACKLIGHT), \
+   EC_CMD(PWM_SET_KEYBOARD_BACKLIGHT), \
+   EC_CMD(PWM_SET_FAN_DUTY), \
+   EC_CMD(PWM_SET_DUTY), \
+   EC_CMD(PWM_GET_DUTY), \
+   EC_CMD(LIGHTBAR_CMD), \
+   EC_CMD(LED_CONTROL), \
+   EC_CMD(VBOOT_HASH), \
+   EC_CMD(MOTION_SENSE_CMD), \
+   EC_CMD(FORCE_LID_OPEN), \
+   EC_CMD(CONFIG_POWER_BUTTON), \
+   EC_CMD(USB_CHARGE_SET_MODE), \
+   EC_CMD(PSTORE_INFO), \
+   EC_CMD(PSTORE_READ), \
+   EC_CMD(PSTORE_WRITE), \
+   EC_CMD(RTC_GET_VALUE), \
+   EC_CMD(RTC_GET_ALARM), \
+   EC_CMD(RTC_SET_VALUE), \
+   EC_CMD(RTC_SET_ALARM), \
+   EC_CMD(PORT80_READ), \
+   EC_CMD(VSTORE_INFO), \
+   EC_CMD(VSTORE_READ), \
+   EC_CMD(VSTORE_WRITE), \
+   EC_CMD(THERMAL_SET_THRESHOLD), \
+   EC_CMD(THERMAL_GET_THRESHOLD), 

Re: [PATCH v2] chrome/platform: cros_ec_proto:: Add trace event to trace EC commands

2019-04-12 Thread Raul Rangel
On Fri, Apr 12, 2019 at 03:04:38PM -0400, Steven Rostedt wrote:
> On Fri, 12 Apr 2019 12:49:44 -0600
> Raul E Rangel  wrote:
> 
> > +#define ec_cmds \
> > +   {EC_CMD_PROTO_VERSION, "PROTO_VERSION"}, \
> > +   {EC_CMD_HELLO, "HELLO"}, \
> > +   {EC_CMD_GET_VERSION, "GET_VERSION"}, \
> > +   {EC_CMD_READ_TEST, "READ_TEST"}, \
> > +   {EC_CMD_GET_BUILD_INFO, "GET_BUILD_INFO"}, \
> > +   {EC_CMD_GET_CHIP_INFO, "GET_CHIP_INFO"}, \
> > +   {EC_CMD_GET_BOARD_VERSION, "GET_BOARD_VERSION"}, \
> > +   {EC_CMD_READ_MEMMAP, "READ_MEMMAP"}, \
> > +   {EC_CMD_GET_CMD_VERSIONS, "GET_CMD_VERSIONS"}, \
> > +   {EC_CMD_GET_COMMS_STATUS, "GET_COMMS_STATUS"}, \
> > +   {EC_CMD_TEST_PROTOCOL, "TEST_PROTOCOL"}, \
> > +   {EC_CMD_GET_PROTOCOL_INFO, "GET_PROTOCOL_INFO"}, \
> > +   {EC_CMD_GSV_PAUSE_IN_S5, "GSV_PAUSE_IN_S5"}, \
> > +   {EC_CMD_GET_FEATURES, "GET_FEATURES"}, \
> 
> Usually, if I have something like this, I would do:
> 
> 
> #define ec_cmds \
>   EC(PROTO_VERSION), \
>   EC(HELLO), \
>   EC(GET_VERSION), \
>   EC(READ_TEST), \
>   [...]
> 
> Then:
> 
> #define EC(a) {EC_CMD_##a, #a}
> 
> and then ec_cmds ends up with the same result with much less typing and
> little risk for copy past errors.
I generally don't like doing that because it breaks my editors Goto
Definition, but I can send out a patch that uses the macro.

Thanks,
Raul
> 
> -- Steve


Re: kernel BUG at fs/inode.c:LINE!

2019-04-12 Thread Eric Biggers
Hi Dmitry,

On Fri, Apr 12, 2019 at 01:04:27PM +0200, 'Dmitry Vyukov' via syzkaller-bugs 
wrote:
> On Thu, Apr 11, 2019 at 4:23 AM Al Viro  wrote:
> >
> > On Thu, Apr 11, 2019 at 08:50:17AM +0800, Ian Kent wrote:
> > > On Wed, 2019-04-10 at 14:41 +0200, Dmitry Vyukov wrote:
> > > > On Wed, Apr 10, 2019 at 2:12 PM Al Viro  wrote:
> > > > >
> > > > > On Wed, Apr 10, 2019 at 08:07:15PM +0800, Ian Kent wrote:
> > > > >
> > > > > > > I'm unable to find a branch matching the line numbers.
> > > > > > >
> > > > > > > Given that, on the face of it, the scenario is impossible I'm
> > > > > > > seeking clarification on what linux-next to look at for the
> > > > > > > sake of accuracy.
> > > > > > >
> > > > > > > So I'm wondering if this testing done using the master branch
> > > > > > > or one of the daily branches one would use to check for conflicts
> > > > > > > before posting?
> > > > > >
> > > > > > Sorry those are tags not branches.
> > > > >
> > > > > FWIW, that's next-20181214; it is what master had been in mid-December
> > > > > and master is rebased every day.  Can it be reproduced with the 
> > > > > current
> > > > > tree?
> > > >
> > > > From the info on the dashboard we know that it happened only once on
> > > > d14b746c (the second one is result of reproducing the first one). So
> > > > it was either fixed or just hard to trigger.
> > >
> > > Looking at the source of tag next-20181214 in linux-next-history I see
> > > this is mistake I made due to incorrect error handling which I fixed
> > > soon after (there was in fact a double iput()).
> >
> > Right - "autofs: fix possible inode leak in autofs_fill_super()" had been
> > broken (and completely pointless), leading to double iput() in that failure
> > case.  And yes, double iput() can trigger that BUG_ON(), and with non-zero
> > odds do so with that stack trace.
> >
> > As far as I'm concerned, case closed - bug had been in a misguided "fix"
> > for inexistent leak (coming from misreading the calling conventions for
> > d_make_root()), introduced in -next at next-20181130 and kicked out of
> > there in next-20181219.  Dropped by Ian's request in
> > Message-ID: <66d497c00cffb3e4109ca0d5287c8277954d7132.ca...@themaw.net>
> > which has fixed that crap.  Moreover, that posting had been in reply to
> > that very syzcaller report, AFAICS.
> >
> > I don't know how to tell the bot to STFU and close the report in this
> > situation; up to you, folks.
> 
> 
> Please see the following for this:
> 
> > syzbot will keep track of this bug report. See:
> > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
> > syzbot.
> 
> There are just 3 operations: mark as fixed by a commit, mark as
> invalid, mark as duplicate.
> I won't be always around. Tracking statuses of bug reports is in the
> interests of kernel quality.
> 

As I suggested before, syzbot should automatically invalidate old bug reports,
especially on linux-next, that are unlikely to still be real problems.

And, syzbot should send remainders about bugs that are still occurring.

Instead, currently developers have to waste time debugging bugs caused by
patches that were in linux-next for a few days/weeks and then dropped months
ago, and have to argue with you every time about how to tell syzbot to close the
bug when it never made it into git history.  And meanwhile, no one is looking
into the bugs that are being hit on mainline every hour or even every 10 minutes
for the past year.

That's not a great strategy to actually get bugs fixed.

- Eric


Re: [PATCH] selftests/seccomp: Handle namespace failures gracefully

2019-04-12 Thread Tycho Andersen
On Fri, Apr 12, 2019 at 11:07:11AM -0600, shuah wrote:
> On 4/12/19 9:25 AM, Tycho Andersen wrote:
> > On Thu, Apr 11, 2019 at 04:56:31PM -0700, Kees Cook wrote:
> > > When running without USERNS or PIDNS the seccomp test would hang since
> > > it was waiting forever for the child to trigger the user notification
> > > since it seems the glibc() abort handler makes a call to getpid(),
> > > which would trap again. This changes the getpid filter to getppid, and
> > > makes sure ASSERTs execute to stop from spawning the listener.
> > > 
> > > Reported-by: Shuah Khan 
> > > Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
> > > Signed-off-by: Kees Cook 
> > 
> > Sorry for the delay, thanks for looking at this!
> > 
> > Reviewed-by: Tycho Andersen 
> > 
> 
> Thanks both. Should it go into stables. I will pull this and
> add stable if that is appropriate.

Yes, for 5.0+ that sounds good.

Thanks!

Tycho


Re: 1808d65b55 ("asm-generic/tlb: Remove arch_tlb*_mmu()"): BUG: KASAN: stack-out-of-bounds in __change_page_attr_set_clr

2019-04-12 Thread Nadav Amit
> On Apr 12, 2019, at 11:19 AM, Peter Zijlstra  wrote:
> 
> On Fri, Apr 12, 2019 at 03:11:22PM +, Nadav Amit wrote:
>>> On Apr 12, 2019, at 4:17 AM, Peter Zijlstra  wrote:
> 
>>> To clarify, 'that' is Nadav's patch:
>>> 
>>> 515ab7c41306 ("x86/mm: Align TLB invalidation info")
>>> 
>>> which turns out to be the real problem.
>> 
>> Sorry for that. I still think it should be aligned, especially with all the
>> effort the Intel puts around to avoid bus-locking on unaligned atomic
>> operations.
> 
> No atomics anywhere in sight, so that's not a concern.

You are right. I still think that at least TLB-wise it should be better to
have the argument off-stack. I’ll try to run some experiments, based on
your feedback, and send a patch on top of your revert.

Sorry for the mess, again.




Re: [PATCH 1/7] s390: zcrypt: driver callback to indicate resource in use

2019-04-12 Thread Tony Krowiak

On 4/12/19 5:43 AM, Cornelia Huck wrote:

On Fri, 12 Apr 2019 08:54:54 +0200
Harald Freudenberger  wrote:


On 11.04.19 23:03, Tony Krowiak wrote:

Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. This prevents
a root user from inadvertently taking a queue away from a guest and
giving it to the host, or vice versa. The callback will be invoked
whenever a change to the AP bus's apmask or aqmask sysfs interfaces may
result in one or more AP queues being removed from its driver. If the
callback responds in the affirmative for any driver queried, the change
to the apmask or aqmask will be rejected with a device in use error.

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver manages AP queues passed through to one or more
guests and we don't want to unexpectedly take AP resources away from
guests which are most likely independently administered.

Signed-off-by: Tony Krowiak 


Hello Tony

you are going out with your patches but ... what is the result of the 
discussions
we had in the past ? Do we have a common understanding that we want to have
it this way ? A driver which is able to claim resources and the bus code
has lower precedence ?


This is what Reinhard suggested and what we agreed to as a team quite
some time ago. I submitted three versions of this patch to our
internal mailing list, all of which you reviewed, so I'm not sure
why you are surprised by this now.



I don't know about previous discussions, but I'm curious how you
arrived at this design. A driver being able to override the bus code
seems odd. Restricting this to 'non-default' drivers looks even more
odd.

What is this trying to solve? Traditionally, root has been able to
shoot any appendages of their choice. I would rather expect that in a
supported setup you'd have some management software keeping track of
device usage and making sure that only unused queues can be unbound. Do
we need to export more information to user space so that management
software can make better choices?


Is there a reason other than tradition to prevent root from accidentally
shooting himself in the foot or any other appendage? At present,
sysfs is the only supported setup, so IMHO it makes sense to prevent as
much accidentally caused damage as possible in the kernel.






---
  drivers/s390/crypto/ap_bus.c | 138 +--
  drivers/s390/crypto/ap_bus.h |   3 +
  2 files changed, 135 insertions(+), 6 deletions(-)






Re: [PATCH v2 4/6] dt-bindings: ti-lmu: Modify dt bindings for the LM3697

2019-04-12 Thread Jacek Anaszewski

And one more:

[5] - LED naming patch set v3

On 4/11/19 9:48 PM, Jacek Anaszewski wrote:

Hi Rob,

I've collected references to the DT patches we're waiting
to be reviewed. This is just to make sure that none of them
will be missed - recent traffic on the linux-leds list
is a bit heavier.

[0] - draft of LED multi color class - we would especially like
   to get a feedback on the proposed brightness-model DT design
[1][2] - LM3697 backlight driver rework
[3][4] - LM36274 backlight driver with regulator support

On 4/5/19 4:28 PM, Dan Murphy wrote:

The LM3697 is a single function LED driver. The single function LED
driver needs to reside in the LED directory as a dedicated LED driver
and not as a MFD device.  The device does have common brightness and ramp
features and those can be accomodated by a TI LMU framework.

The LM3697 dt binding needs to be moved from the ti-lmu.txt and a 
dedicated
LED dt binding needs to be added.  The new LM3697 LED dt binding will 
then
reside in the Documentation/devicetree/bindings/leds directory and 
follow the

current LED and general bindings guidelines.

Signed-off-by: Dan Murphy 

[...]

[0] https://marc.info/?l=linux-kernel&m=155414072611538&w=2
[1] https://marc.info/?l=devicetree&m=155447454524411&w=2
[2] https://lore.kernel.org/patchwork/patch/1058762/
[3] https://marc.info/?l=devicetree&m=155447624324995&w=2
[4] https://lore.kernel.org/patchwork/patch/1058779/



[5] - https://marc.info/?l=linux-kernel&m=155405492231039&w=2

--
Best regards,
Jacek Anaszewski


Re: [PATCH 0/14] v2 multi-die/package topology support

2019-04-12 Thread Len Brown
On Tue, Feb 26, 2019 at 1:51 PM Peter Zijlstra  wrote:
>
> On Tue, Feb 26, 2019 at 01:19:58AM -0500, Len Brown wrote:
> > Restored sysfs core_siblings, core_siblings_list
> >
> >   v1 proposed re-defining this existing attribute to
> >   be the threads in a die, rather than in a package.
> >
> >   For compatibility, decided rather to keep this
> >   attribute unchanged, for now, even though
> >   its name makes little sense, and it makes
> >   no sense in a multi-die system.
>
> So why do things that make no sense?

>>> 7) /sys/devices/system/cpu/cpuX/topology/core_siblings:
>>>
>>> internal kernel map of cpuX's hardware threads within the same
>>> physical_package_id.

This definition tells you what cpus are in what package.
That is fine, it is useful, and it is in use.

What doesn't make sense is that it is called "core_siblings".
Who is to say that the map of CPUs inside a package has anything to do
with "cores"?
Sometimes it does, sometimes it doesn't...

> What breaks?

User space applications, such as lscpu and hwloc are using this attribute
per its definition, to figure out what cpus are in what packages.
If we change the definition to match the attribute's name, they break.
If we change the attribute name to match the definition, they break.

So the plan is to simply leave this attribute and its definition in place,
deprecate it, and move to the new attribute names that don't have
the word "siblings" in them -- which imply a known fixed topology.

We can schedule this attribute to be deleted some day,
but changing it and hoping you've updated all of user-space
would be a unnecessary pain.

Len Brown, Intel Open Source Technology Center


Re: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable

2019-04-12 Thread Jason Gunthorpe
On Mon, Mar 04, 2019 at 06:27:26AM +, Parav Pandit wrote:

> I think we should use rdma_nl_register(RDMA_NL_RDMA_CM, cma_cb_table) which 
> was removed as part of ID stats removal.
> Because of below reasons.
> 1. rdma netlink command auto loads the module

This is probably the best argument to stay away from sysctl for module
parameters.. It is tricky to make sure the rdma module is loaded
before sysctl runs in boot, and I don't think sysctl autoloads missing
modules, or somehow copes with dynamic module loading, does it?

To that end we should probably have the entire sysctl configurable
available in netlink

Jason


Re: [PATCH v2 2/7] dt: bindings: Add multicolor class dt bindings documention

2019-04-12 Thread Jacek Anaszewski

Dan,

On 4/12/19 8:46 PM, Dan Murphy wrote:

Jacek

On 4/12/19 1:14 PM, Jacek Anaszewski wrote:

Hi Marek,

On 4/12/19 12:07 AM, Marek Behun wrote:

Hi Dan,
this probaly was discussed, but I did not follow brightness model
discussions:
what will happen if I set yellow by writing into yellow mode
brightness, and then orange by writing orange model brightness?
Will the resulting color be a mix of yellow and orange, or will the
orange overwrite the yellow setting?


Orange will overwrite yellow settings. When color name is given
it should be treated as a hue. Then changing brightness level
should affect the lightness of a hue, similarly like changing
L component of HSL color model. This will be however entirely
up to DT brightness-model designer how they will design their models.
We are not going to verify that in the LED multi color class.

It implies that it will be possible to define arbitrary range
of color levels, not necessarily adhering to any established color
model. I think it could be useful to define brightness model
that allows to go from blue color (for cold) up to red (hot)
for representing a temperature for instance.

These ideas will need however more documentation. Generally
we aim to propose only a convention.



Ah but what about the issue of writing the monochrome LED color.  With your 
description
it implies that when we write the red LED, the red LED will come on and if we 
write the blue
LED then the red LED in theory should turn off and the blue come on.


Where did I write that? Probably the way I used word "color"
was not adequate in at least two cases, which may have confused
the reader.

Modifying monochrome color brightness levels (under colors dir)
will just modify corresponding IOUT brightness. Not affecting the
other. I'll try to rephrase what seems to may sound equivocal:

s/When color name is given/When brightness-model is given a color name/

s/arbitrary range of color levels/arbitrary range of brightness-model 
levels/




But these could be used to mix the colors to create some abstract violet that 
is not defined in the brightness
model.  Why should the brightness models and monochrome LEDs have two different 
operations.

Dan



--
Best regards,
Jacek Anaszewski


Re: kernel BUG at fs/inode.c:LINE!

2019-04-12 Thread Al Viro
On Fri, Apr 12, 2019 at 12:59:30PM +0200, Dmitry Vyukov wrote:

> Please don't ignore the bug status tracking part. Or we will only have
> options of not testing kernel or dropping most of the bug reports on
> the floor. Both are equally harmful for kernel quality.
> 
> Let's mark it as fixed by the next commit in series (which already
> made it into mainline):
> 
> #syz fix: autofs: simplify parse_options() function call
> 
> But I won't be around always.

You might want to add something like "commit discarded"...


[PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures

2019-04-12 Thread Johannes Weiner
With the default overcommit==guess we occasionally run into mmap
rejections despite plenty of memory that would get dropped under
pressure but just isn't accounted reclaimable. One example of this is
dying cgroups pinned by some page cache. A previous case was auxiliary
path name memory associated with dentries; we have since annotated
those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
treat indirectly reclaimable memory as free in overcommit logic")).

But trying to classify all allocated memory reliably as reclaimable
and unreclaimable is a bit of a fool's errand. There could be a myriad
of dependencies that constantly change with kernel versions.

It becomes even more questionable of an effort when considering how
this estimate of available memory is used: it's not compared to the
system-wide allocated virtual memory in any way. It's not even
compared to the allocating process's address space. It's compared to
the single allocation request at hand!

So we have an elaborate left-hand side of the equation that tries to
assess the exact breathing room the system has available down to a
page - and then compare it to an isolated allocation request with no
additional context. We could fail an allocation of N bytes, but for
two allocations of N/2 bytes we'd do this elaborate dance twice in a
row and then still let N bytes of virtual memory through. This doesn't
make a whole lot of sense.

Let's take a step back and look at the actual goal of the
heuristic. From the documentation:

   Heuristic overcommit handling. Obvious overcommits of address
   space are refused. Used for a typical system. It ensures a
   seriously wild allocation fails while allowing overcommit to
   reduce swap usage.  root is allowed to allocate slightly more
   memory in this mode. This is the default.

If all we want to do is catch clearly bogus allocation requests
irrespective of the general virtual memory situation, the physical
memory counter-part doesn't need to be that complicated, either.

When in GUESS mode, catch wild allocations by comparing their request
size to total amount of ram and swap in the system.

Signed-off-by: Johannes Weiner 
---
 mm/util.c | 51 +--
 1 file changed, 5 insertions(+), 46 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 05a464929b3e..e2e4f8c3fa12 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -652,7 +652,7 @@ EXPORT_SYMBOL_GPL(vm_memory_committed);
  */
 int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
-   long free, allowed, reserve;
+   long allowed;
 
VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
-(s64)vm_committed_as_batch * num_online_cpus(),
@@ -667,51 +667,9 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, 
int cap_sys_admin)
return 0;
 
if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
-   free = global_zone_page_state(NR_FREE_PAGES);
-   free += global_node_page_state(NR_FILE_PAGES);
-
-   /*
-* shmem pages shouldn't be counted as free in this
-* case, they can't be purged, only swapped out, and
-* that won't affect the overall amount of available
-* memory in the system.
-*/
-   free -= global_node_page_state(NR_SHMEM);
-
-   free += get_nr_swap_pages();
-
-   /*
-* Any slabs which are created with the
-* SLAB_RECLAIM_ACCOUNT flag claim to have contents
-* which are reclaimable, under pressure.  The dentry
-* cache and most inode caches should fall into this
-*/
-   free += global_node_page_state(NR_SLAB_RECLAIMABLE);
-
-   /*
-* Part of the kernel memory, which can be released
-* under memory pressure.
-*/
-   free += global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
-
-   /*
-* Leave reserved pages. The pages are not for anonymous pages.
-*/
-   if (free <= totalreserve_pages)
+   if (pages > totalram_pages() + total_swap_pages)
goto error;
-   else
-   free -= totalreserve_pages;
-
-   /*
-* Reserve some for root
-*/
-   if (!cap_sys_admin)
-   free -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 
10);
-
-   if (free > pages)
-   return 0;
-
-   goto error;
+   return 0;
}
 
allowed = vm_commit_limit();
@@ -725,7 +683,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, 
int cap_sys_admin)
 * Don't let a single process grow so big a user can't recover
 */
if (mm) {
-   reserve = sysctl

Re: [PATCH v2 2/7] dt: bindings: Add multicolor class dt bindings documention

2019-04-12 Thread Jacek Anaszewski

Dan,

On 4/12/19 1:50 PM, Dan Murphy wrote:

Marek

On 4/11/19 5:07 PM, Marek Behun wrote:

Hi Dan,
this probaly was discussed, but I did not follow brightness model
discussions:
what will happen if I set yellow by writing into yellow mode
brightness, and then orange by writing orange model brightness?
Will the resulting color be a mix of yellow and orange, or will the
orange overwrite the yellow setting?



This was not discussed and is a good question.  If you write the yellow mode 
for a group of
LEDs then yellow would be produced for the brightness requested.

If orange is then requested then orange should be displayed at the brightness 
level requested.
So yes the orange will over write the yellow.


Yes, and individual color brightness levels should correspond
to the color components of the brightness-model level currently set.


The next question is if the absolute colors are written does it produce the 
same behavior?

So if you have yellow and write to the red should the red LED brightness be 
modified or should the
color switch to red?
And if the red LED is on and the blue LED is written should the color switch to 
blue or should the blue and red LEDs be mixed together?


Now, if any of the color brightness files is altered it should update
the hardware with this new setting, but brightness-model and main
brightness level should not be changed. The thing that is missing in our
proposal is lack of the way to check if brightness-model is up to date
(i.e. if it reflects what is written to the hardware).

How about utilizing the sync file from the new colors directory?
It could return 1 on read when brightness levels of all colors
match exactly the ones assigned to the brightness model level currently
set.


This is tricky as the user space can write the individual absolute colors and 
mix the LEDs to produce varying
colors.  But the behavior writing the brightness models are different.

I would almost prefer that the user space reads the available absolute LED 
colors and creates the devices supported color palette and write the absolute 
LED colors only.  But this violates the requirements asked for.

Dan


Marek





--
Best regards,
Jacek Anaszewski


[PATCH] perf/x86: descriptive failure messages for PMU init

2019-04-12 Thread Bandan Das


There's a default warning message that gets printed, however,
there are various failure conditions:
 - a msr read can fail
 - a msr write can fail
 - a msr has an unexpected value
 - all msrs have unexpected values (disable PMU)

Also, commit commit 005bd0077a79 ("perf/x86: Modify error message in
virtualized environment") completely removed printing the msr in
question but these messages could be helpful for debugging vPMUs as
well. Add them back and change them to pr_debugs, this keeps the
behavior the same for baremetal.

Lastly, use %llx to silence checkpatch

Signed-off-by: Bandan Das 
---
 arch/x86/events/core.c | 66 --
 1 file changed, 50 insertions(+), 16 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e2b1447192a8..786e03893a0c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -192,9 +192,16 @@ static void release_pmc_hardware(void) {}
 
 static bool check_hw_exists(void)
 {
-   u64 val, val_fail = -1, val_new= ~0;
-   int i, reg, reg_fail = -1, ret = 0;
-   int bios_fail = 0;
+   u64 val = -1, val_fail = -1, val_new = ~0;
+   int i, reg = -1, reg_fail = -1, ret = 0;
+   bool virt = boot_cpu_has(X86_FEATURE_HYPERVISOR) ? true : false;
+   enum {
+ READ_FAIL=1,
+ WRITE_FAIL   =2,
+ PMU_FAIL =3,
+ BIOS_FAIL=4,
+   };
+   int status = 0;
int reg_safe = -1;
 
/*
@@ -204,10 +211,13 @@ static bool check_hw_exists(void)
for (i = 0; i < x86_pmu.num_counters; i++) {
reg = x86_pmu_config_addr(i);
ret = rdmsrl_safe(reg, &val);
-   if (ret)
+   if (ret) {
+   status = READ_FAIL;
goto msr_fail;
+   }
+
if (val & ARCH_PERFMON_EVENTSEL_ENABLE) {
-   bios_fail = 1;
+   status = BIOS_FAIL;
val_fail = val;
reg_fail = reg;
} else {
@@ -218,11 +228,13 @@ static bool check_hw_exists(void)
if (x86_pmu.num_counters_fixed) {
reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
ret = rdmsrl_safe(reg, &val);
-   if (ret)
+   if (ret) {
+   status = READ_FAIL;
goto msr_fail;
+   }
for (i = 0; i < x86_pmu.num_counters_fixed; i++) {
if (val & (0x03 << i*4)) {
-   bios_fail = 1;
+   status = BIOS_FAIL;
val_fail = val;
reg_fail = reg;
}
@@ -236,7 +248,7 @@ static bool check_hw_exists(void)
 */
 
if (reg_safe == -1) {
-   reg = reg_safe;
+   status = PMU_FAIL;
goto msr_fail;
}
 
@@ -246,18 +258,22 @@ static bool check_hw_exists(void)
 * (qemu/kvm) that don't trap on the MSR access and always return 0s.
 */
reg = x86_pmu_event_addr(reg_safe);
-   if (rdmsrl_safe(reg, &val))
+   if (rdmsrl_safe(reg, &val)) {
+   status = READ_FAIL;
goto msr_fail;
+   }
val ^= 0xUL;
ret = wrmsrl_safe(reg, val);
ret |= rdmsrl_safe(reg, &val_new);
-   if (ret || val != val_new)
+   if (ret || val != val_new) {
+   status = WRITE_FAIL;
goto msr_fail;
+   }
 
/*
 * We still allow the PMU driver to operate:
 */
-   if (bios_fail) {
+   if (status == BIOS_FAIL) {
pr_cont("Broken BIOS detected, complain to your hardware 
vendor.\n");
pr_err(FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x 
is %Lx)\n",
  reg_fail, val_fail);
@@ -266,12 +282,30 @@ static bool check_hw_exists(void)
return true;
 
 msr_fail:
-   if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+   if (virt)
pr_cont("PMU not available due to virtualization, using 
software events only.\n");
-   } else {
-   pr_cont("Broken PMU hardware detected, using software events 
only.\n");
-   pr_err("Failed to access perfctr msr (MSR %x is %Lx)\n",
-  reg, val_new);
+   switch (status) {
+   case READ_FAIL:
+   if (virt)
+   pr_debug("Failed to read perfctr msr (MSR %x)\n", reg);
+   else
+   pr_err("Failed to read perfctr msr (MSR %x)\n", reg);
+   break;
+   case WRITE_FAIL:
+   if (virt)
+   pr_debug("Failed to write perfctr msr (MSR %x, wrote: 
%llx, read: %llx)\n",
+reg, val, val_new);
+   else
+   pr_err("Failed to wr

Re: [PATCH] selftests: livepatch use TEST_PROGS for test shell scripts

2019-04-12 Thread shuah

On 4/12/19 12:51 PM, Joe Lawrence wrote:

On 4/12/19 1:05 PM, shuah wrote:

On 4/12/19 7:37 AM, Miroslav Benes wrote:

On Fri, 12 Apr 2019, shuah wrote:


On 4/12/19 1:03 AM, Miroslav Benes wrote:

On Thu, 11 Apr 2019, Shuah Khan wrote:


TEST_PROGS variable is for test shell scripts and common clean target
in lib.mk doesn't touch them. TEST_GEN_PROGS are removed by it.

Fix it to use TEST_PROGS for test shell scripts and 
TEST_PROGS_EXTENDED

for common functions.sh.

Signed-off-by: Shuah Khan 
---
    tools/testing/selftests/livepatch/Makefile | 3 ++-
    1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/livepatch/Makefile
b/tools/testing/selftests/livepatch/Makefile
index af4aee79bebb..fd405402c3ff 100644
--- a/tools/testing/selftests/livepatch/Makefile
+++ b/tools/testing/selftests/livepatch/Makefile
@@ -1,6 +1,7 @@
    # SPDX-License-Identifier: GPL-2.0
-TEST_GEN_PROGS := \
+TEST_PROGS_EXTENDED := functions.sh
+TEST_PROGS := \
 test-livepatch.sh \
 test-callbacks.sh \
 test-shadow-vars.sh


Hi Shuah,

thanks for the patch. We have already something similar queued in 
our git

tree. See
https://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching.git/commit/?h=for-5.1/upstream-fixes&id=abfe3c4560684864f66641438fee3075de098e89 



It is missing TEST_PROGS_EXTENDED though. Should we add it?



Please do. What that does is when selftests are installed, functions.h
gets installed as well so the the test can run from installed location.

Did I miss reviewing the original? I maintain the framework and try to
catch these if patch gets sent to me.


Unfortunately you did and it was our fault. You were not CCed, no one
noticed and we were a bit trigger happy. Sorry about that. It should not
have happened.

Would this work for you?


Looks good to me.



Hi Shuah,

Thanks for spotting this and apologies for missing your CC on my earlier 
patch post.  For future reference, do you prefer a direct CC, 
linux-kselftest, or both?


No worries. Happy to report the problem. Couldn't have missed it with
all the deleted lines showing up whenever I ran diff on my changes. :)

Direct to or cc to me and cc linux-kselftest list is preferred Same as
any other patch really, everybody getmaintainer.pl recommends.

thanks,
-- Shuah



Re: [PATCH v2] chrome/platform: cros_ec_proto:: Add trace event to trace EC commands

2019-04-12 Thread Steven Rostedt
On Fri, 12 Apr 2019 12:49:44 -0600
Raul E Rangel  wrote:

> +#define ec_cmds \
> + {EC_CMD_PROTO_VERSION, "PROTO_VERSION"}, \
> + {EC_CMD_HELLO, "HELLO"}, \
> + {EC_CMD_GET_VERSION, "GET_VERSION"}, \
> + {EC_CMD_READ_TEST, "READ_TEST"}, \
> + {EC_CMD_GET_BUILD_INFO, "GET_BUILD_INFO"}, \
> + {EC_CMD_GET_CHIP_INFO, "GET_CHIP_INFO"}, \
> + {EC_CMD_GET_BOARD_VERSION, "GET_BOARD_VERSION"}, \
> + {EC_CMD_READ_MEMMAP, "READ_MEMMAP"}, \
> + {EC_CMD_GET_CMD_VERSIONS, "GET_CMD_VERSIONS"}, \
> + {EC_CMD_GET_COMMS_STATUS, "GET_COMMS_STATUS"}, \
> + {EC_CMD_TEST_PROTOCOL, "TEST_PROTOCOL"}, \
> + {EC_CMD_GET_PROTOCOL_INFO, "GET_PROTOCOL_INFO"}, \
> + {EC_CMD_GSV_PAUSE_IN_S5, "GSV_PAUSE_IN_S5"}, \
> + {EC_CMD_GET_FEATURES, "GET_FEATURES"}, \

Usually, if I have something like this, I would do:


#define ec_cmds \
EC(PROTO_VERSION), \
EC(HELLO), \
EC(GET_VERSION), \
EC(READ_TEST), \
[...]

Then:

#define EC(a) {EC_CMD_##a, #a}

and then ec_cmds ends up with the same result with much less typing and
little risk for copy past errors.

-- Steve


Re: [PATCH][next] RDMA/mlx5: check for error return in flow_rule rather than err

2019-04-12 Thread Jason Gunthorpe
On Fri, Apr 12, 2019 at 11:40:17AM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> Currently when the call to create_flow_rule_vport_sq fails, the error
> check is being performed on err rather than on the return pointer
> flow_rule.  The return flow_rule maybe NULL (which is not considered
> an error) or an error code, so check for the error on flow_rule.
> 
> Addresses-Coverity: ("Uninitialized scalar variable")
> Fixes: d5ed8ac34cef ("RDMA/mlx5: Move default representors SQ steering to 
> rule to modify QP")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/infiniband/hw/mlx5/qp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied to for-next, thanks

Jason


Re: [PATCH] clk: rockchip: rk3288: Limit use of USB PHY clock to USB

2019-04-12 Thread Heiko Stübner
Am Freitag, 12. April 2019, 20:02:55 CEST schrieb Matthias Kaehlcke:
> On Fri, Apr 12, 2019 at 11:30:37AM +0200, Heiko Stübner wrote:
> > Am Freitag, 12. April 2019, 02:16:57 CEST schrieb Matthias Kaehlcke:
> > > Hi Heiko,
> > > 
> > > On Thu, Apr 11, 2019 at 09:03:07PM +0200, Heiko Stübner wrote:
> > > > Hi Matthias,
> > > > 
> > > > Am Donnerstag, 11. April 2019, 19:59:17 CEST schrieb Matthias Kaehlcke:
> > > > > The USB PHY clock can be configured as (grand) parent of uart0_sclk 
> > > > > and
> > > > > sclk_gpu. It has been observed that UART0 doesn't work reliably in 
> > > > > high
> > > > > speed mode with the PHY clock as input when certain USB devices are
> > > > > plugged to the USB HOST1 port (see https://crrev.com/c/320543).
> > > > > 
> > > > > Prefix the name of the PHY clock with a '.' in the non-USB muxes to
> > > > > effectively remove the clock as input from these muxes.
> > > > > 
> > > > > Signed-off-by: Matthias Kaehlcke 
> > > > > ---
> > > > >  drivers/clk/rockchip/clk-rk3288.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/clk/rockchip/clk-rk3288.c 
> > > > > b/drivers/clk/rockchip/clk-rk3288.c
> > > > > index 5a67b7869960..677bc5485201 100644
> > > > > --- a/drivers/clk/rockchip/clk-rk3288.c
> > > > > +++ b/drivers/clk/rockchip/clk-rk3288.c
> > > > > @@ -200,8 +200,8 @@ PNAME(mux_aclk_cpu_src_p) = { "cpll_aclk_cpu", 
> > > > > "gpll_aclk_cpu" };
> > > > >  PNAME(mux_pll_src_cpll_gpll_p)   = { "cpll", "gpll" };
> > > > >  PNAME(mux_pll_src_npll_cpll_gpll_p)  = { "npll", "cpll", "gpll" };
> > > > >  PNAME(mux_pll_src_cpll_gpll_npll_p)  = { "cpll", "gpll", "npll" };
> > > > > -PNAME(mux_pll_src_cpll_gpll_usb480m_p)   = { "cpll", "gpll", 
> > > > > "usbphy480m_src" };
> > > > > -PNAME(mux_pll_src_cpll_gll_usb_npll_p)   = { "cpll", "gpll", 
> > > > > "usbphy480m_src", "npll" };
> > > > > +PNAME(mux_pll_src_cpll_gpll_usb480m_p)   = { "cpll", "gpll", 
> > > > > ".usbphy480m_src" };
> > > > > +PNAME(mux_pll_src_cpll_gll_usb_npll_p)   = { "cpll", "gpll", 
> > > > > ".usbphy480m_src", "npll" };
> > > > 
> > > > In general I like to have things like the clock-tree described fully
> > > > and let the kernel handle correct sourcing ... but:
> > > > 
> > > > As you write this seems like a systemic problem when just connecting
> > > > random peripherals can create unstable clock source frequencies,
> > > > so I tend to agree here ... but:
> > > > 
> > > > Can we please find a more "talking" name for this ... because as with 
> > > > the
> > > > above someone will find the "." and submit a fix for it ;-) .
> > > > 
> > > > So just name it "unstable_dummy" or so?
> > > 
> > > I looked for some common pattern, but couldn't find one. I liked the
> > > '.' since it leaves the name of the clock mostly intact, just hiding
> > > it (similar to a leading '.' in a Linux file system). But I agree that
> > > it might not be expressive enough. I still like the idea to keep the
> > > clock name around for reference, maybe we could name it
> > > "unstable:usbphy480m_src" or similar. If you don't object I'll send a
> > > patch with this some time tomorrow.
> > 
> > I've just adapted the patch to use the new parent-name you suggested
> > and applied it for 5.2 So no need to resend :-) .
> 
> Pefect, thanks!
> 
> I don't see the patch in the git.kernel.org repo (nor
> https://lore.kernel.org/patchwork/patch/1060781/), looks like the push
> is still pending.

oops, I really had forgotten the push ... done now with your patch at
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/log/?h=v5.2-clk/next

Heiko




Re: [PATCH] selftests: livepatch use TEST_PROGS for test shell scripts

2019-04-12 Thread Joe Lawrence

On 4/12/19 1:05 PM, shuah wrote:

On 4/12/19 7:37 AM, Miroslav Benes wrote:

On Fri, 12 Apr 2019, shuah wrote:


On 4/12/19 1:03 AM, Miroslav Benes wrote:

On Thu, 11 Apr 2019, Shuah Khan wrote:


TEST_PROGS variable is for test shell scripts and common clean target
in lib.mk doesn't touch them. TEST_GEN_PROGS are removed by it.

Fix it to use TEST_PROGS for test shell scripts and TEST_PROGS_EXTENDED
for common functions.sh.

Signed-off-by: Shuah Khan 
---
tools/testing/selftests/livepatch/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/livepatch/Makefile
b/tools/testing/selftests/livepatch/Makefile
index af4aee79bebb..fd405402c3ff 100644
--- a/tools/testing/selftests/livepatch/Makefile
+++ b/tools/testing/selftests/livepatch/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0

-TEST_GEN_PROGS := \

+TEST_PROGS_EXTENDED := functions.sh
+TEST_PROGS := \
 test-livepatch.sh \
 test-callbacks.sh \
 test-shadow-vars.sh


Hi Shuah,

thanks for the patch. We have already something similar queued in our git
tree. See
https://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching.git/commit/?h=for-5.1/upstream-fixes&id=abfe3c4560684864f66641438fee3075de098e89

It is missing TEST_PROGS_EXTENDED though. Should we add it?



Please do. What that does is when selftests are installed, functions.h
gets installed as well so the the test can run from installed location.

Did I miss reviewing the original? I maintain the framework and try to
catch these if patch gets sent to me.


Unfortunately you did and it was our fault. You were not CCed, no one
noticed and we were a bit trigger happy. Sorry about that. It should not
have happened.

Would this work for you?


Looks good to me.



Hi Shuah,

Thanks for spotting this and apologies for missing your CC on my earlier 
patch post.  For future reference, do you prefer a direct CC, 
linux-kselftest, or both?


And for Miroslav, you can add my ack if needed.

-- Joe



-->8--

>From c158f5595286dba46f096cc7cc4bcef5ad8b6c16 Mon Sep 17 00:00:00 2001
From: Miroslav Benes 
Date: Fri, 12 Apr 2019 15:31:42 +0200
Subject: [PATCH] selftests/livepatch: Add functions.sh to TEST_PROGS_EXTENDED

Add functions.sh to TEST_PROGS_EXTENDED so that it is installed along
with the rest of the selftests and they can be run.

Originally-by: Shuah Khan 
Signed-off-by: Miroslav Benes 
---
   tools/testing/selftests/livepatch/Makefile | 1 +
   1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/livepatch/Makefile 
b/tools/testing/selftests/livepatch/Makefile
index 114f43e2081a..fd405402c3ff 100644
--- a/tools/testing/selftests/livepatch/Makefile
+++ b/tools/testing/selftests/livepatch/Makefile
@@ -1,5 +1,6 @@
   # SPDX-License-Identifier: GPL-2.0
   
+TEST_PROGS_EXTENDED := functions.sh

   TEST_PROGS := \
test-livepatch.sh \
test-callbacks.sh \



thanks,
-- Shuah



[PATCH v2] chrome/platform: cros_ec_proto:: Add trace event to trace EC commands

2019-04-12 Thread Raul E Rangel
This is useful to see which EC commands are being executed and when.

To enable:

echo 'cros_ec:*' >> /sys/kernel/debug/tracing/set_event

Example:

/* cros_ec_cmd: version: 0, command: GET_VERSION */
/* cros_ec_cmd: version: 0, command: GET_PROTOCOL_INFO */
/* cros_ec_cmd: version: 1, command: GET_CMD_VERSIONS */
/* cros_ec_cmd: version: 1, command: USB_PD_CONTROL */

Signed-off-by: Raul E Rangel 
---

Changes in v2:
- Changed comment style to match other cros_ec files.
- Fixed commit tag.

 drivers/platform/chrome/Makefile|   4 +-
 drivers/platform/chrome/cros_ec_proto.c |   4 +
 drivers/platform/chrome/cros_ec_trace.c | 161 
 drivers/platform/chrome/cros_ec_trace.h |  51 
 4 files changed, 219 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/chrome/cros_ec_trace.c
 create mode 100644 drivers/platform/chrome/cros_ec_trace.h

diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile
index 1e2f0029b597..e542268454a4 100644
--- a/drivers/platform/chrome/Makefile
+++ b/drivers/platform/chrome/Makefile
@@ -3,12 +3,14 @@
 obj-$(CONFIG_CHROMEOS_LAPTOP)  += chromeos_laptop.o
 obj-$(CONFIG_CHROMEOS_PSTORE)  += chromeos_pstore.o
 obj-$(CONFIG_CHROMEOS_TBMC)+= chromeos_tbmc.o
+# tell define_trace.h where to find the cros ec trace header
+CFLAGS_cros_ec_trace.o:=   -I$(src)
 obj-$(CONFIG_CROS_EC_I2C)  += cros_ec_i2c.o
 obj-$(CONFIG_CROS_EC_SPI)  += cros_ec_spi.o
 cros_ec_lpcs-objs  := cros_ec_lpc.o cros_ec_lpc_reg.o
 cros_ec_lpcs-$(CONFIG_CROS_EC_LPC_MEC) += cros_ec_lpc_mec.o
 obj-$(CONFIG_CROS_EC_LPC)  += cros_ec_lpcs.o
-obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o
+obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o cros_ec_trace.o
 obj-$(CONFIG_CROS_KBD_LED_BACKLIGHT)   += cros_kbd_led_backlight.o
 obj-$(CONFIG_CROS_EC_LIGHTBAR) += cros_ec_lightbar.o
 obj-$(CONFIG_CROS_EC_VBC)  += cros_ec_vbc.o
diff --git a/drivers/platform/chrome/cros_ec_proto.c 
b/drivers/platform/chrome/cros_ec_proto.c
index 97a068dff192..3d02c8259ac6 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include "cros_ec_trace.h"
+
 #define EC_COMMAND_RETRIES 50
 
 static int prepare_packet(struct cros_ec_device *ec_dev,
@@ -51,6 +53,8 @@ static int send_command(struct cros_ec_device *ec_dev,
int ret;
int (*xfer_fxn)(struct cros_ec_device *ec, struct cros_ec_command *msg);
 
+   trace_cros_ec_cmd(msg);
+
if (ec_dev->proto_version > 2)
xfer_fxn = ec_dev->pkt_xfer;
else
diff --git a/drivers/platform/chrome/cros_ec_trace.c 
b/drivers/platform/chrome/cros_ec_trace.c
new file mode 100644
index ..335bfce514d7
--- /dev/null
+++ b/drivers/platform/chrome/cros_ec_trace.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+// Trace events for the ChromeOS Embedded Controller
+//
+// Copyright 2019 Google LLC.
+
+#define ec_cmds \
+   {EC_CMD_PROTO_VERSION, "PROTO_VERSION"}, \
+   {EC_CMD_HELLO, "HELLO"}, \
+   {EC_CMD_GET_VERSION, "GET_VERSION"}, \
+   {EC_CMD_READ_TEST, "READ_TEST"}, \
+   {EC_CMD_GET_BUILD_INFO, "GET_BUILD_INFO"}, \
+   {EC_CMD_GET_CHIP_INFO, "GET_CHIP_INFO"}, \
+   {EC_CMD_GET_BOARD_VERSION, "GET_BOARD_VERSION"}, \
+   {EC_CMD_READ_MEMMAP, "READ_MEMMAP"}, \
+   {EC_CMD_GET_CMD_VERSIONS, "GET_CMD_VERSIONS"}, \
+   {EC_CMD_GET_COMMS_STATUS, "GET_COMMS_STATUS"}, \
+   {EC_CMD_TEST_PROTOCOL, "TEST_PROTOCOL"}, \
+   {EC_CMD_GET_PROTOCOL_INFO, "GET_PROTOCOL_INFO"}, \
+   {EC_CMD_GSV_PAUSE_IN_S5, "GSV_PAUSE_IN_S5"}, \
+   {EC_CMD_GET_FEATURES, "GET_FEATURES"}, \
+   {EC_CMD_GET_SKU_ID, "GET_SKU_ID"}, \
+   {EC_CMD_SET_SKU_ID, "SET_SKU_ID"}, \
+   {EC_CMD_FLASH_INFO, "FLASH_INFO"}, \
+   {EC_CMD_FLASH_READ, "FLASH_READ"}, \
+   {EC_CMD_FLASH_WRITE, "FLASH_WRITE"}, \
+   {EC_CMD_FLASH_ERASE, "FLASH_ERASE"}, \
+   {EC_CMD_FLASH_PROTECT, "FLASH_PROTECT"}, \
+   {EC_CMD_FLASH_REGION_INFO, "FLASH_REGION_INFO"}, \
+   {EC_CMD_VBNV_CONTEXT, "VBNV_CONTEXT"}, \
+   {EC_CMD_FLASH_SPI_INFO, "FLASH_SPI_INFO"}, \
+   {EC_CMD_FLASH_SELECT, "FLASH_SELECT"}, \
+   {EC_CMD_PWM_GET_FAN_TARGET_RPM, "PWM_GET_FAN_TARGET_RPM"}, \
+   {EC_CMD_PWM_SET_FAN_TARGET_RPM, "PWM_SET_FAN_TARGET_RPM"}, \
+   {EC_CMD_PWM_GET_KEYBOARD_BACKLIGHT, "PWM_GET_KEYBOARD_BACKLIGHT"}, \
+   {EC_CMD_PWM_SET_KEYBOARD_BACKLIGHT, "PWM_SET_KEYBOARD_BACKLIGHT"}, \
+   {EC_CMD_PWM_SET_FAN_DUTY, "PWM_SET_FAN_DUTY"}, \
+   {EC_CMD_PWM_SET_DUTY, "PWM_SET_DUTY"}, \
+   {EC_CMD_PWM_GET_DUTY, "PWM_GET_DUTY"}, \
+   {EC_CMD_LIGHTBAR_CMD, "LIGHTBAR_CMD"}, \
+   {EC_CMD_LED_CONTROL, "LED_CONTROL"}, \
+   {EC_CMD_VBOOT_HASH, "VBOOT_HASH"}, \
+   {EC_CMD_MOTION_SENSE_CMD, 

Re: [PATCH v2 2/7] dt: bindings: Add multicolor class dt bindings documention

2019-04-12 Thread Dan Murphy
Jacek

On 4/12/19 1:14 PM, Jacek Anaszewski wrote:
> Hi Marek,
> 
> On 4/12/19 12:07 AM, Marek Behun wrote:
>> Hi Dan,
>> this probaly was discussed, but I did not follow brightness model
>> discussions:
>> what will happen if I set yellow by writing into yellow mode
>> brightness, and then orange by writing orange model brightness?
>> Will the resulting color be a mix of yellow and orange, or will the
>> orange overwrite the yellow setting?
> 
> Orange will overwrite yellow settings. When color name is given
> it should be treated as a hue. Then changing brightness level
> should affect the lightness of a hue, similarly like changing
> L component of HSL color model. This will be however entirely
> up to DT brightness-model designer how they will design their models.
> We are not going to verify that in the LED multi color class.
> 
> It implies that it will be possible to define arbitrary range
> of color levels, not necessarily adhering to any established color
> model. I think it could be useful to define brightness model
> that allows to go from blue color (for cold) up to red (hot)
> for representing a temperature for instance.
> 
> These ideas will need however more documentation. Generally
> we aim to propose only a convention.
> 

Ah but what about the issue of writing the monochrome LED color.  With your 
description
it implies that when we write the red LED, the red LED will come on and if we 
write the blue
LED then the red LED in theory should turn off and the blue come on.

But these could be used to mix the colors to create some abstract violet that 
is not defined in the brightness
model.  Why should the brightness models and monochrome LEDs have two different 
operations.

Dan


Re: [PATCH RESEND v6 5/6] i2c: mediatek: Add i2c support for MediaTek MT8183

2019-04-12 Thread Matthias Brugger



On 02/04/2019 14:35, Qii Wang wrote:
> Add i2c compatible for MT8183. Compare to MT2712 i2c controller,
> MT8183 has different register offsets. Ltiming_reg is added to
> adjust low width of SCL. Arb clock and dma_sync are needed.
> 
> Signed-off-by: Qii Wang 
> Reviewed-by: Nicolas Boichat 
> ---
>  drivers/i2c/busses/i2c-mt65xx.c |   62 
> +--
>  1 file changed, 60 insertions(+), 2 deletions(-)
> 


Reviewed-by: Matthias Brugger 

> diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
> index 6137ad7..745b0d0 100644
> --- a/drivers/i2c/busses/i2c-mt65xx.c
> +++ b/drivers/i2c/busses/i2c-mt65xx.c
> @@ -133,6 +133,7 @@ enum I2C_REGS_OFFSET {
>   OFFSET_DEBUGCTRL,
>   OFFSET_TRANSFER_LEN_AUX,
>   OFFSET_CLOCK_DIV,
> + OFFSET_LTIMING,
>  };
>  
>  static const u16 mt_i2c_regs_v1[] = {
> @@ -162,6 +163,32 @@ enum I2C_REGS_OFFSET {
>   [OFFSET_CLOCK_DIV] = 0x70,
>  };
>  
> +static const u16 mt_i2c_regs_v2[] = {
> + [OFFSET_DATA_PORT] = 0x0,
> + [OFFSET_SLAVE_ADDR] = 0x4,
> + [OFFSET_INTR_MASK] = 0x8,
> + [OFFSET_INTR_STAT] = 0xc,
> + [OFFSET_CONTROL] = 0x10,
> + [OFFSET_TRANSFER_LEN] = 0x14,
> + [OFFSET_TRANSAC_LEN] = 0x18,
> + [OFFSET_DELAY_LEN] = 0x1c,
> + [OFFSET_TIMING] = 0x20,
> + [OFFSET_START] = 0x24,
> + [OFFSET_EXT_CONF] = 0x28,
> + [OFFSET_LTIMING] = 0x2c,
> + [OFFSET_HS] = 0x30,
> + [OFFSET_IO_CONFIG] = 0x34,
> + [OFFSET_FIFO_ADDR_CLR] = 0x38,
> + [OFFSET_TRANSFER_LEN_AUX] = 0x44,
> + [OFFSET_CLOCK_DIV] = 0x48,
> + [OFFSET_SOFTRESET] = 0x50,
> + [OFFSET_DEBUGSTAT] = 0xe0,
> + [OFFSET_DEBUGCTRL] = 0xe8,
> + [OFFSET_FIFO_STAT] = 0xf4,
> + [OFFSET_FIFO_THRESH] = 0xf8,
> + [OFFSET_DCM_EN] = 0xf88,
> +};
> +
>  struct mtk_i2c_compatible {
>   const struct i2c_adapter_quirks *quirks;
>   const u16 *regs;
> @@ -172,6 +199,7 @@ struct mtk_i2c_compatible {
>   unsigned char support_33bits: 1;
>   unsigned char timing_adjust: 1;
>   unsigned char dma_sync: 1;
> + unsigned char ltiming_adjust: 1;
>  };
>  
>  struct mtk_i2c {
> @@ -195,6 +223,7 @@ struct mtk_i2c {
>   enum mtk_trans_op op;
>   u16 timing_reg;
>   u16 high_speed_reg;
> + u16 ltiming_reg;
>   unsigned char auto_restart;
>   bool ignore_restart_irq;
>   const struct mtk_i2c_compatible *dev_comp;
> @@ -222,6 +251,7 @@ struct mtk_i2c {
>   .support_33bits = 1,
>   .timing_adjust = 1,
>   .dma_sync = 0,
> + .ltiming_adjust = 0,
>  };
>  
>  static const struct mtk_i2c_compatible mt6577_compat = {
> @@ -234,6 +264,7 @@ struct mtk_i2c {
>   .support_33bits = 0,
>   .timing_adjust = 0,
>   .dma_sync = 0,
> + .ltiming_adjust = 0,
>  };
>  
>  static const struct mtk_i2c_compatible mt6589_compat = {
> @@ -246,6 +277,7 @@ struct mtk_i2c {
>   .support_33bits = 0,
>   .timing_adjust = 0,
>   .dma_sync = 0,
> + .ltiming_adjust = 0,
>  };
>  
>  static const struct mtk_i2c_compatible mt7622_compat = {
> @@ -258,6 +290,7 @@ struct mtk_i2c {
>   .support_33bits = 0,
>   .timing_adjust = 0,
>   .dma_sync = 0,
> + .ltiming_adjust = 0,
>  };
>  
>  static const struct mtk_i2c_compatible mt8173_compat = {
> @@ -269,6 +302,19 @@ struct mtk_i2c {
>   .support_33bits = 1,
>   .timing_adjust = 0,
>   .dma_sync = 0,
> + .ltiming_adjust = 0,
> +};
> +
> +static const struct mtk_i2c_compatible mt8183_compat = {
> + .regs = mt_i2c_regs_v2,
> + .pmic_i2c = 0,
> + .dcm = 0,
> + .auto_restart = 1,
> + .aux_len_reg = 1,
> + .support_33bits = 1,
> + .timing_adjust = 1,
> + .dma_sync = 1,
> + .ltiming_adjust = 1,
>  };
>  
>  static const struct of_device_id mtk_i2c_of_match[] = {
> @@ -277,6 +323,7 @@ struct mtk_i2c {
>   { .compatible = "mediatek,mt6589-i2c", .data = &mt6589_compat },
>   { .compatible = "mediatek,mt7622-i2c", .data = &mt7622_compat },
>   { .compatible = "mediatek,mt8173-i2c", .data = &mt8173_compat },
> + { .compatible = "mediatek,mt8183-i2c", .data = &mt8183_compat },
>   {}
>  };
>  MODULE_DEVICE_TABLE(of, mtk_i2c_of_match);
> @@ -361,6 +408,8 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c)
>  
>   mtk_i2c_writew(i2c, i2c->timing_reg, OFFSET_TIMING);
>   mtk_i2c_writew(i2c, i2c->high_speed_reg, OFFSET_HS);
> + if (i2c->dev_comp->ltiming_adjust)
> + mtk_i2c_writew(i2c, i2c->ltiming_reg, OFFSET_LTIMING);
>  
>   /* If use i2c pin from PMIC mt6397 side, need set PATH_DIR first */
>   if (i2c->have_pmic)
> @@ -460,6 +509,8 @@ static int mtk_i2c_set_speed(struct mtk_i2c *i2c, 
> unsigned int parent_clk)
>   unsigned int clk_src;
>   unsigned int step_cnt;
>   unsigned int sample_cnt;
> + unsigned int l_step_cnt;
> + unsigned int l_sample_cnt;
>   unsigned int target_speed;
>   int ret;
>  
> @@ -469,11 +520,11 @@ static int mtk_i2c_set_spe

Re: [PATCH RESEND v6 4/6] i2c: mediatek: Add i2c and apdma sync in i2c driver

2019-04-12 Thread Matthias Brugger



On 02/04/2019 14:35, Qii Wang wrote:
> When i2c and apdma use different source clocks, we should enable
> synchronization between them.
> 
> Signed-off-by: Qii Wang 
> Reviewed-by: Nicolas Boichat 
> ---
>  drivers/i2c/busses/i2c-mt65xx.c |   11 +++
>  1 file changed, 11 insertions(+)
> 

Reviewed-by: Matthias Brugger 

> diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
> index 1a7235e..6137ad7 100644
> --- a/drivers/i2c/busses/i2c-mt65xx.c
> +++ b/drivers/i2c/busses/i2c-mt65xx.c
> @@ -77,6 +77,8 @@
>  #define I2C_CONTROL_DIR_CHANGE  (0x1 << 4)
>  #define I2C_CONTROL_ACKERR_DET_EN   (0x1 << 5)
>  #define I2C_CONTROL_TRANSFER_LEN_CHANGE (0x1 << 6)
> +#define I2C_CONTROL_DMAACK_EN   (0x1 << 8)
> +#define I2C_CONTROL_ASYNC_MODE  (0x1 << 9)
>  #define I2C_CONTROL_WRAPPER (0x1 << 0)
>  
>  #define I2C_DRV_NAME "i2c-mt65xx"
> @@ -169,6 +171,7 @@ struct mtk_i2c_compatible {
>   unsigned char aux_len_reg: 1;
>   unsigned char support_33bits: 1;
>   unsigned char timing_adjust: 1;
> + unsigned char dma_sync: 1;
>  };
>  
>  struct mtk_i2c {
> @@ -218,6 +221,7 @@ struct mtk_i2c {
>   .aux_len_reg = 1,
>   .support_33bits = 1,
>   .timing_adjust = 1,
> + .dma_sync = 0,
>  };
>  
>  static const struct mtk_i2c_compatible mt6577_compat = {
> @@ -229,6 +233,7 @@ struct mtk_i2c {
>   .aux_len_reg = 0,
>   .support_33bits = 0,
>   .timing_adjust = 0,
> + .dma_sync = 0,
>  };
>  
>  static const struct mtk_i2c_compatible mt6589_compat = {
> @@ -240,6 +245,7 @@ struct mtk_i2c {
>   .aux_len_reg = 0,
>   .support_33bits = 0,
>   .timing_adjust = 0,
> + .dma_sync = 0,
>  };
>  
>  static const struct mtk_i2c_compatible mt7622_compat = {
> @@ -251,6 +257,7 @@ struct mtk_i2c {
>   .aux_len_reg = 1,
>   .support_33bits = 0,
>   .timing_adjust = 0,
> + .dma_sync = 0,
>  };
>  
>  static const struct mtk_i2c_compatible mt8173_compat = {
> @@ -261,6 +268,7 @@ struct mtk_i2c {
>   .aux_len_reg = 1,
>   .support_33bits = 1,
>   .timing_adjust = 0,
> + .dma_sync = 0,
>  };
>  
>  static const struct of_device_id mtk_i2c_of_match[] = {
> @@ -360,6 +368,9 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c)
>  
>   control_reg = I2C_CONTROL_ACKERR_DET_EN |
> I2C_CONTROL_CLK_EXT_EN | I2C_CONTROL_DMA_EN;
> + if (i2c->dev_comp->dma_sync)
> + control_reg |= I2C_CONTROL_DMAACK_EN | I2C_CONTROL_ASYNC_MODE;
> +
>   mtk_i2c_writew(i2c, control_reg, OFFSET_CONTROL);
>   mtk_i2c_writew(i2c, I2C_DELAY_LEN, OFFSET_DELAY_LEN);
>  
> 


Re: [PATCH RESEND v6 3/6] i2c: mediatek: Add arb clock in i2c driver

2019-04-12 Thread Matthias Brugger



On 02/04/2019 14:35, Qii Wang wrote:
> When two i2c controllers are internally connected to the same
> GPIO pins, the arb clock is needed to ensure that the waveforms
> do not interfere with each other. And we also need to enable
> the interrupt to find arb lost, old i2c controllers also have
> the bit.
> 
> Signed-off-by: Qii Wang 
> Reviewed-by: Nicolas Boichat 
> ---
>  drivers/i2c/busses/i2c-mt65xx.c |   25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)

Looks good:
Reviewed-by: Matthias Brugger 

> 
> diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
> index be36018..1a7235e 100644
> --- a/drivers/i2c/busses/i2c-mt65xx.c
> +++ b/drivers/i2c/busses/i2c-mt65xx.c
> @@ -35,6 +35,7 @@
>  #include 
>  
>  #define I2C_RS_TRANSFER  (1 << 4)
> +#define I2C_ARB_LOST (1 << 3)
>  #define I2C_HS_NACKERR   (1 << 2)
>  #define I2C_ACKERR   (1 << 1)
>  #define I2C_TRANSAC_COMP (1 << 0)
> @@ -181,6 +182,7 @@ struct mtk_i2c {
>   struct clk *clk_main;   /* main clock for i2c bus */
>   struct clk *clk_dma;/* DMA clock for i2c via DMA */
>   struct clk *clk_pmic;   /* PMIC clock for i2c from PMIC */
> + struct clk *clk_arb;/* Arbitrator clock for i2c */
>   bool have_pmic; /* can use i2c pins from PMIC */
>   bool use_push_pull; /* IO config push-pull mode */
>  
> @@ -299,8 +301,18 @@ static int mtk_i2c_clock_enable(struct mtk_i2c *i2c)
>   if (ret)
>   goto err_pmic;
>   }
> +
> + if (i2c->clk_arb) {
> + ret = clk_prepare_enable(i2c->clk_arb);
> + if (ret)
> + goto err_arb;
> + }
> +
>   return 0;
>  
> +err_arb:
> + if (i2c->have_pmic)
> + clk_disable_unprepare(i2c->clk_pmic);
>  err_pmic:
>   clk_disable_unprepare(i2c->clk_main);
>  err_main:
> @@ -311,6 +323,9 @@ static int mtk_i2c_clock_enable(struct mtk_i2c *i2c)
>  
>  static void mtk_i2c_clock_disable(struct mtk_i2c *i2c)
>  {
> + if (i2c->clk_arb)
> + clk_disable_unprepare(i2c->clk_arb);
> +
>   if (i2c->have_pmic)
>   clk_disable_unprepare(i2c->clk_pmic);
>  
> @@ -519,13 +534,13 @@ static int mtk_i2c_do_transfer(struct mtk_i2c *i2c, 
> struct i2c_msg *msgs,
>  
>   /* Clear interrupt status */
>   mtk_i2c_writew(i2c, restart_flag | I2C_HS_NACKERR | I2C_ACKERR |
> -I2C_TRANSAC_COMP, OFFSET_INTR_STAT);
> + I2C_ARB_LOST | I2C_TRANSAC_COMP, OFFSET_INTR_STAT);
>  
>   mtk_i2c_writew(i2c, I2C_FIFO_ADDR_CLR, OFFSET_FIFO_ADDR_CLR);
>  
>   /* Enable interrupt */
>   mtk_i2c_writew(i2c, restart_flag | I2C_HS_NACKERR | I2C_ACKERR |
> -I2C_TRANSAC_COMP, OFFSET_INTR_MASK);
> + I2C_ARB_LOST | I2C_TRANSAC_COMP, OFFSET_INTR_MASK);
>  
>   /* Set transfer and transaction len */
>   if (i2c->op == I2C_MASTER_WRRD) {
> @@ -659,7 +674,7 @@ static int mtk_i2c_do_transfer(struct mtk_i2c *i2c, 
> struct i2c_msg *msgs,
>  
>   /* Clear interrupt mask */
>   mtk_i2c_writew(i2c, ~(restart_flag | I2C_HS_NACKERR | I2C_ACKERR |
> -I2C_TRANSAC_COMP), OFFSET_INTR_MASK);
> + I2C_ARB_LOST | I2C_TRANSAC_COMP), OFFSET_INTR_MASK);
>  
>   if (i2c->op == I2C_MASTER_WR) {
>   dma_unmap_single(i2c->dev, wpaddr,
> @@ -884,6 +899,10 @@ static int mtk_i2c_probe(struct platform_device *pdev)
>   return PTR_ERR(i2c->clk_dma);
>   }
>  
> + i2c->clk_arb = devm_clk_get(&pdev->dev, "arb");
> + if (IS_ERR(i2c->clk_arb))
> + i2c->clk_arb = NULL;
> +
>   clk = i2c->clk_main;
>   if (i2c->have_pmic) {
>   i2c->clk_pmic = devm_clk_get(&pdev->dev, "pmic");
> 


Re: [PATCH v2] arm64: dts: qcom: Add Lenovo Miix 630

2019-04-12 Thread Jeffrey Hugo
On Fri, Apr 12, 2019 at 12:01 PM Bjorn Andersson
 wrote:
>
> On Thu 11 Apr 13:51 PDT 2019, Jeffrey Hugo wrote:
> > diff --git a/arch/arm64/boot/dts/qcom/msm8998-clam.dtsi 
> > b/arch/arm64/boot/dts/qcom/msm8998-clam.dtsi
> [..]
> > +&tlmm {
> > + gpio-reserved-ranges = <0 4>, <81 4>;
>
> Did you double check that you actually need these? (Iirc boot testing is
> enough)

Yes, still needed.

>
> > +
> > + touchpad: touchpad {
> > + config {
> > + pins = "gpio123";
> > + bias-pull-up;   /* pull up */
> > + };
> > + };
> > +};
> [..]
> > diff --git a/arch/arm64/boot/dts/qcom/msm8998-lenovo-miix-630.dts 
> > b/arch/arm64/boot/dts/qcom/msm8998-lenovo-miix-630.dts
> [..]
> > +/ {
> > + model = "Lenovo Miix 630";
> > + compatible = "qcom,msm8998-clam";
>
> compatible = "lenovo,miix-630", "qcom,msm8998";

Thanks for the suggestion.  Will do.

>
> > +
> > + /* bootloader doesn't understand DT, so no qcom,board-id */
>
> You can omit this comment, as I don't think people will miss these
> non-standard properties.

Makes sense.  Will do.


[PATCH v4 9/9] clk: fixed-factor: Let clk framework find parent

2019-04-12 Thread Stephen Boyd
Convert this driver to a more modern way of specifying parents now that
we have a way to specify clk parents by DT index. This lets us nicely
avoid a problem where a parent clk name isn't know because the parent
clk hasn't been registered yet.

Cc: Miquel Raynal 
Cc: Jerome Brunet 
Cc: Russell King 
Cc: Michael Turquette 
Cc: Jeffrey Hugo 
Cc: Chen-Yu Tsai 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk-fixed-factor.c | 53 +-
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
index 241b3f8c61a9..67cc7e515e42 100644
--- a/drivers/clk/clk-fixed-factor.c
+++ b/drivers/clk/clk-fixed-factor.c
@@ -64,12 +64,14 @@ const struct clk_ops clk_fixed_factor_ops = {
 };
 EXPORT_SYMBOL_GPL(clk_fixed_factor_ops);
 
-struct clk_hw *clk_hw_register_fixed_factor(struct device *dev,
-   const char *name, const char *parent_name, unsigned long flags,
-   unsigned int mult, unsigned int div)
+static struct clk_hw *
+__clk_hw_register_fixed_factor(struct device *dev, struct device_node *np,
+   const char *name, const char *parent_name, int index,
+   unsigned long flags, unsigned int mult, unsigned int div)
 {
struct clk_fixed_factor *fix;
struct clk_init_data init;
+   struct clk_parent_data pdata = { .index = index };
struct clk_hw *hw;
int ret;
 
@@ -85,11 +87,17 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device 
*dev,
init.name = name;
init.ops = &clk_fixed_factor_ops;
init.flags = flags | CLK_IS_BASIC;
-   init.parent_names = &parent_name;
+   if (parent_name)
+   init.parent_names = &parent_name;
+   else
+   init.parent_data = &pdata;
init.num_parents = 1;
 
hw = &fix->hw;
-   ret = clk_hw_register(dev, hw);
+   if (dev)
+   ret = clk_hw_register(dev, hw);
+   else
+   ret = of_clk_hw_register(np, hw);
if (ret) {
kfree(fix);
hw = ERR_PTR(ret);
@@ -97,6 +105,14 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device 
*dev,
 
return hw;
 }
+
+struct clk_hw *clk_hw_register_fixed_factor(struct device *dev,
+   const char *name, const char *parent_name, unsigned long flags,
+   unsigned int mult, unsigned int div)
+{
+   return __clk_hw_register_fixed_factor(dev, NULL, name, parent_name, -1, 
flags,
+ mult, div);
+}
 EXPORT_SYMBOL_GPL(clk_hw_register_fixed_factor);
 
 struct clk *clk_register_fixed_factor(struct device *dev, const char *name,
@@ -143,11 +159,10 @@ static const struct of_device_id 
set_rate_parent_matches[] = {
{ /* Sentinel */ },
 };
 
-static struct clk *_of_fixed_factor_clk_setup(struct device_node *node)
+static struct clk_hw *_of_fixed_factor_clk_setup(struct device_node *node)
 {
-   struct clk *clk;
+   struct clk_hw *hw;
const char *clk_name = node->name;
-   const char *parent_name;
unsigned long flags = 0;
u32 div, mult;
int ret;
@@ -165,30 +180,28 @@ static struct clk *_of_fixed_factor_clk_setup(struct 
device_node *node)
}
 
of_property_read_string(node, "clock-output-names", &clk_name);
-   parent_name = of_clk_get_parent_name(node, 0);
 
if (of_match_node(set_rate_parent_matches, node))
flags |= CLK_SET_RATE_PARENT;
 
-   clk = clk_register_fixed_factor(NULL, clk_name, parent_name, flags,
-   mult, div);
-   if (IS_ERR(clk)) {
+   hw = __clk_hw_register_fixed_factor(NULL, node, clk_name, NULL, 0,
+   flags, mult, div);
+   if (IS_ERR(hw)) {
/*
-* If parent clock is not registered, registration would fail.
 * Clear OF_POPULATED flag so that clock registration can be
 * attempted again from probe function.
 */
of_node_clear_flag(node, OF_POPULATED);
-   return clk;
+   return ERR_CAST(hw);
}
 
-   ret = of_clk_add_provider(node, of_clk_src_simple_get, clk);
+   ret = of_clk_add_hw_provider(node, of_clk_hw_simple_get, hw);
if (ret) {
-   clk_unregister(clk);
+   clk_hw_unregister_fixed_factor(hw);
return ERR_PTR(ret);
}
 
-   return clk;
+   return hw;
 }
 
 /**
@@ -203,17 +216,17 @@ CLK_OF_DECLARE(fixed_factor_clk, "fixed-factor-clock",
 
 static int of_fixed_factor_clk_remove(struct platform_device *pdev)
 {
-   struct clk *clk = platform_get_drvdata(pdev);
+   struct clk_hw *clk = platform_get_drvdata(pdev);
 
of_clk_del_provider(pdev->dev.of_node);
-   clk_unregister_fixed_factor(clk);
+   clk_hw_unregister_fixed_factor(clk);
 
return 0;
 }
 
 st

[PATCH v4 0/9] Rewrite clk parent handling

2019-04-12 Thread Stephen Boyd
There are a couple warts with clk parent handling in the common clk
framework. This patch series addresses one of those warts, parent-child
linkages. We use strings for all parent-child linkages, and this leads
to poorly written code that extracts clk names from struct clk pointers
and makes clk provider drivers use clk consumer APIs. I've converted the
fixed factor clk and I'm looking at converting users of other "basic"
clk types in follow-up patches so we can start moving drivers away from
string matching in the clk namespace.

Changes from v3:
 * Split clkdev patch into two
 * New patch to let dev_of_node() accept NULL
 * Use dev_of_node() in of_clk_hw_register() to avoid NULL deref

Changes from v2:
 * Dropped patches that got merged into v5.1-rc1
 * Introduced a new function of_clk_hw_register()
 * Introduced 'index' into clk_parent_data structure 
 * Converted fixed-factor basic type to new design
 * Fixed some bugs in a recent patch to clkdev, leading to
   an essential API for clkdev based lookups working

Changes from v1:
 * Dropped new get_parent_hw, we'll pick it up in a later series
 * Rebased to v5.0-rc6 to avoid conflicts with clk-fixes
 * Renamed 'fallback' to 'name' and 'name' to 'fw_name' in parent map
 * Made sure that clk_hw_get_parent_by_index() never sees an error pointer
   so that we don't mistakenly try to defer an error pointer
 * Fixed index passing mistake on of_clk_get_hw_from_clkspec()
 * Copy over all the data from parent maps and free it correctly too

Cc: Chen-Yu Tsai 
Cc: Greg Kroah-Hartman 
Cc: Jeffrey Hugo 
Cc: Jerome Brunet 
Cc: Matti Vaittinen 
Cc: Michael Turquette 
Cc: Miquel Raynal 
Cc: Rob Herring 
Cc: Russell King 

Stephen Boyd (9):
  clkdev: Hold clocks_mutex while iterating clocks list
  clkdev: Move clk creation outside of 'clocks_mutex'
  clk: Prepare for clk registration API that uses DT nodes
  driver core: Let dev_of_node() accept a NULL dev
  clk: Add of_clk_hw_register() API for early clk drivers
  clk: Allow parents to be specified without string names
  clk: Look for parents with clkdev based clk_lookups
  clk: Allow parents to be specified via clkspec index
  clk: fixed-factor: Let clk framework find parent

 drivers/clk/clk-fixed-factor.c |  53 --
 drivers/clk/clk.c  | 326 +
 drivers/clk/clk.h  |   2 +
 drivers/clk/clkdev.c   |  30 +--
 include/linux/clk-provider.h   |  22 +++
 include/linux/device.h |   2 +-
 6 files changed, 327 insertions(+), 108 deletions(-)


base-commit: 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b
-- 
Sent by a computer through tubes



[PATCH v4 5/9] clk: Add of_clk_hw_register() API for early clk drivers

2019-04-12 Thread Stephen Boyd
In some circumstances drivers register clks early and don't have access
to a struct device because the device model isn't initialized yet. Add
an API to let drivers register clks associated with a struct device_node
so that these drivers can participate in getting parent clks through DT.

Cc: Miquel Raynal 
Cc: Jerome Brunet 
Cc: Russell King 
Cc: Michael Turquette 
Cc: Jeffrey Hugo 
Cc: Chen-Yu Tsai 
Cc: Rob Herring 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk.c| 26 +++---
 include/linux/clk-provider.h |  1 +
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index d27775a73e67..ffa63ddcd408 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -45,6 +45,7 @@ struct clk_core {
struct clk_hw   *hw;
struct module   *owner;
struct device   *dev;
+   struct device_node  *of_node;
struct clk_core *parent;
const char  **parent_names;
struct clk_core **parents;
@@ -3313,7 +3314,8 @@ struct clk *clk_hw_create_clk(struct device *dev, struct 
clk_hw *hw,
return clk;
 }
 
-static struct clk *__clk_register(struct device *dev, struct clk_hw *hw)
+static struct clk *
+__clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw)
 {
int i, ret;
struct clk_core *core;
@@ -3339,6 +3341,7 @@ static struct clk *__clk_register(struct device *dev, 
struct clk_hw *hw)
if (dev && pm_runtime_enabled(dev))
core->rpm_enabled = true;
core->dev = dev;
+   core->of_node = np;
if (dev && dev->driver)
core->owner = dev->driver->owner;
core->hw = hw;
@@ -3429,7 +3432,7 @@ static struct clk *__clk_register(struct device *dev, 
struct clk_hw *hw)
  */
 struct clk *clk_register(struct device *dev, struct clk_hw *hw)
 {
-   return __clk_register(dev, hw);
+   return __clk_register(dev, dev_of_node(dev), hw);
 }
 EXPORT_SYMBOL_GPL(clk_register);
 
@@ -3445,10 +3448,27 @@ EXPORT_SYMBOL_GPL(clk_register);
  */
 int clk_hw_register(struct device *dev, struct clk_hw *hw)
 {
-   return PTR_ERR_OR_ZERO(__clk_register(dev, hw));
+   return PTR_ERR_OR_ZERO(__clk_register(dev, dev_of_node(dev), hw));
 }
 EXPORT_SYMBOL_GPL(clk_hw_register);
 
+/*
+ * of_clk_hw_register - register a clk_hw and return an error code
+ * @node: device_node of device that is registering this clock
+ * @hw: link to hardware-specific clock data
+ *
+ * of_clk_hw_register() is the primary interface for populating the clock tree
+ * with new clock nodes when a struct device is not available, but a struct
+ * device_node is. It returns an integer equal to zero indicating success or
+ * less than zero indicating failure. Drivers must test for an error code after
+ * calling of_clk_hw_register().
+ */
+int of_clk_hw_register(struct device_node *node, struct clk_hw *hw)
+{
+   return PTR_ERR_OR_ZERO(__clk_register(NULL, node, hw));
+}
+EXPORT_SYMBOL_GPL(of_clk_hw_register);
+
 /* Free memory allocated for a clock. */
 static void __clk_release(struct kref *ref)
 {
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index b7cf80a71293..7d2d97e15b76 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -773,6 +773,7 @@ struct clk *devm_clk_register(struct device *dev, struct 
clk_hw *hw);
 
 int __must_check clk_hw_register(struct device *dev, struct clk_hw *hw);
 int __must_check devm_clk_hw_register(struct device *dev, struct clk_hw *hw);
+int __must_check of_clk_hw_register(struct device_node *node, struct clk_hw 
*hw);
 
 void clk_unregister(struct clk *clk);
 void devm_clk_unregister(struct device *dev, struct clk *clk);
-- 
Sent by a computer through tubes



[PATCH v4 8/9] clk: Allow parents to be specified via clkspec index

2019-04-12 Thread Stephen Boyd
Some clk providers are simple DT nodes that only have a 'clocks'
property without having an associated 'clock-names' property. In these
cases, we want to let these clk providers point to their parent clks
without having to dereference the 'clocks' property at probe time to
figure out the parent's globally unique clk name. Let's add an 'index'
property to the parent_data structure so that clk providers can indicate
that their parent is a particular index in the 'clocks' DT property.

Cc: Miquel Raynal 
Cc: Jerome Brunet 
Cc: Russell King 
Cc: Michael Turquette 
Cc: Jeffrey Hugo 
Cc: Chen-Yu Tsai 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk.c| 18 +++---
 include/linux/clk-provider.h |  2 ++
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 42f29fd6bfd8..65fe50b139ea 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -44,6 +44,7 @@ struct clk_parent_map {
struct clk_core *core;
const char  *fw_name;
const char  *name;
+   int index;
 };
 
 struct clk_core {
@@ -326,7 +327,8 @@ static struct clk_core *clk_core_lookup(const char *name)
 /**
  * clk_core_get - Find the clk_core parent of a clk
  * @core: clk to find parent of
- * @name: name to search for
+ * @name: name to search for (if string based)
+ * @index: index to use for search (if DT index based)
  *
  * This is the preferred method for clk providers to find the parent of a
  * clk when that parent is external to the clk controller. The parent_names
@@ -358,22 +360,23 @@ static struct clk_core *clk_core_lookup(const char *name)
  * provider knows about the clk but it isn't provided on this system.
  * A valid clk_core pointer when the clk can be found in the provider.
  */
-static struct clk_core *clk_core_get(struct clk_core *core, const char *name)
+static struct clk_core *clk_core_get(struct clk_core *core, const char *name,
+int index)
 {
struct clk_hw *hw = ERR_PTR(-ENOENT);
struct device *dev = core->dev;
const char *dev_id = dev ? dev_name(dev) : NULL;
struct device_node *np = core->of_node;
 
-   if (np)
-   hw = of_clk_get_hw(np, -1, name);
+   if (np && index >= 0)
+   hw = of_clk_get_hw(np, index, name);
 
/*
 * If the DT search above couldn't find the provider or the provider
 * didn't know about this clk, fallback to looking up via clkdev based
 * clk_lookups
 */
-   if (PTR_ERR(hw) == -ENOENT)
+   if (PTR_ERR(hw) == -ENOENT && name)
hw = clk_find_hw(dev_id, name);
 
if (IS_ERR(hw))
@@ -397,8 +400,7 @@ static void clk_core_fill_parent_index(struct clk_core 
*core, u8 index)
if (!parent)
parent = ERR_PTR(-EPROBE_DEFER);
} else {
-   if (entry->fw_name)
-   parent = clk_core_get(core, entry->fw_name);
+   parent = clk_core_get(core, entry->fw_name, entry->index);
if (IS_ERR(parent) && PTR_ERR(parent) == -ENOENT)
parent = clk_core_lookup(entry->name);
}
@@ -3443,6 +3445,7 @@ static int clk_core_populate_parent_map(struct clk_core 
*core)
 
/* Copy everything over because it might be __initdata */
for (i = 0, parent = parents; i < num_parents; i++, parent++) {
+   parent->index = -1;
if (parent_names) {
/* throw a WARN if any entries are NULL */
WARN(!parent_names[i],
@@ -3452,6 +3455,7 @@ static int clk_core_populate_parent_map(struct clk_core 
*core)
   true);
} else if (parent_data) {
parent->hw = parent_data[i].hw;
+   parent->index = parent_data[i].index;
ret = clk_cpy_name(&parent->fw_name,
   parent_data[i].fw_name, false);
if (!ret)
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 95d5a9d4e8c3..12fef24b740f 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -255,11 +255,13 @@ struct clk_ops {
  * @hw: parent clk_hw pointer (used for clk providers with internal clks)
  * @fw_name: parent name local to provider registering clk
  * @name: globally unique parent name (used as a fallback)
+ * @index: parent index local to provider registering clk (if @fw_name absent)
  */
 struct clk_parent_data {
const struct clk_hw *hw;
const char  *fw_name;
const char  *name;
+   int index;
 };
 
 /**
-- 
Sent by a computer through tubes



[PATCH v4 1/9] clkdev: Hold clocks_mutex while iterating clocks list

2019-04-12 Thread Stephen Boyd
We recently introduced a change to support devm clk lookups. That change
introduced a code-path that used clk_find() without holding the
'clocks_mutex'. Unfortunately, clk_find() iterates over the 'clocks'
list and so we need to prevent the list from being modified at the same
time. Do this by holding the mutex and checking to make sure it's held
while iterating the list.

Note, we don't really care if the lookup is freed after we find it with
clk_find() because we're just doing a pointer comparison, but if we did
care we would need to keep holding the mutex while we dereference the
clk_lookup pointer.

Fixes: 3eee6c7d119c ("clkdev: add managed clkdev lookup registration")
Cc: Miquel Raynal 
Cc: Jerome Brunet 
Cc: Russell King 
Cc: Michael Turquette 
Cc: Jeffrey Hugo 
Cc: Chen-Yu Tsai 
Cc: Matti Vaittinen 
Signed-off-by: Stephen Boyd 
---

I plan to take this through clk-fixes for v5.1-rc series.

 drivers/clk/clkdev.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index 8c4435c53f09..6e787cc9e5b9 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -46,6 +46,8 @@ static struct clk_lookup *clk_find(const char *dev_id, const 
char *con_id)
if (con_id)
best_possible += 1;
 
+   lockdep_assert_held(&clocks_mutex);
+
list_for_each_entry(p, &clocks, node) {
match = 0;
if (p->dev_id) {
@@ -402,7 +404,10 @@ void devm_clk_release_clkdev(struct device *dev, const 
char *con_id,
struct clk_lookup *cl;
int rval;
 
+   mutex_lock(&clocks_mutex);
cl = clk_find(dev_id, con_id);
+   mutex_unlock(&clocks_mutex);
+
WARN_ON(!cl);
rval = devres_release(dev, devm_clkdev_release,
  devm_clk_match_clkdev, cl);
-- 
Sent by a computer through tubes



[PATCH v4 6/9] clk: Allow parents to be specified without string names

2019-04-12 Thread Stephen Boyd
The common clk framework is lacking in ability to describe the clk
topology without specifying strings for every possible parent-child
link. There are a few drawbacks to the current approach:

 1) String comparisons are used for everything, including describing
 topologies that are 'local' to a single clock controller.

 2) clk providers (e.g. i2c clk drivers) need to create globally unique
 clk names to avoid collisions in the clk namespace, leading to awkward
 name generation code in various clk drivers.

 3) DT bindings may not fully describe the clk topology and linkages
 between clk controllers because drivers can easily rely on globally unique
 strings to describe connections between clks.

This leads to confusing DT bindings, complicated clk name generation
code, and inefficient string comparisons during clk registration just so
that the clk framework can detect the topology of the clk tree.
Furthermore, some drivers call clk_get() and then __clk_get_name() to
extract the globally unique clk name just so they can specify the parent
of the clk they're registering. We have of_clk_parent_fill() but that
mostly only works for single clks registered from a DT node, which isn't
the norm. Let's simplify this all by introducing two new ways of
specifying clk parents.

The first method is an array of pointers to clk_hw structures
corresponding to the parents at that index. This works for clks that are
registered when we have access to all the clk_hw pointers for the
parents.

The second method is a mix of clk_hw pointers and strings of local and
global parent clk names. If the .fw_name member of the map is set we'll
look for that clk by performing a DT based lookup of the device the clk
is registered with and the .name specified in the map. If that fails,
we'll fallback to the .name member and perform a global clk name lookup
like we've always done before.

Using either one of these new methods is entirely optional. Existing
drivers will continue to work, and they can migrate to this new approach
as they see fit. Eventually, we'll want to get rid of the 'parent_names'
array in struct clk_init_data and use one of these new methods instead.

Cc: Miquel Raynal 
Cc: Jerome Brunet 
Cc: Russell King 
Cc: Michael Turquette 
Cc: Jeffrey Hugo 
Cc: Chen-Yu Tsai 
Cc: Rob Herring 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk.c| 262 ++-
 include/linux/clk-provider.h |  19 +++
 2 files changed, 219 insertions(+), 62 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index ffa63ddcd408..de49a7c4214b 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -39,6 +39,13 @@ static LIST_HEAD(clk_notifier_list);
 
 /***private data structures***/
 
+struct clk_parent_map {
+   const struct clk_hw *hw;
+   struct clk_core *core;
+   const char  *fw_name;
+   const char  *name;
+};
+
 struct clk_core {
const char  *name;
const struct clk_ops*ops;
@@ -47,8 +54,7 @@ struct clk_core {
struct device   *dev;
struct device_node  *of_node;
struct clk_core *parent;
-   const char  **parent_names;
-   struct clk_core **parents;
+   struct clk_parent_map   *parents;
u8  num_parents;
u8  new_parent_index;
unsigned long   rate;
@@ -317,17 +323,92 @@ static struct clk_core *clk_core_lookup(const char *name)
return NULL;
 }
 
+/**
+ * clk_core_get - Find the parent of a clk using a clock specifier in DT
+ * @core: clk to find parent of
+ * @name: name to search for in 'clock-names' of device providing clk
+ *
+ * This is the preferred method for clk providers to find the parent of a
+ * clk when that parent is external to the clk controller. The parent_names
+ * array is indexed and treated as a local name matching a string in the device
+ * node's 'clock-names' property. This allows clk providers to use their own
+ * namespace instead of looking for a globally unique parent string.
+ *
+ * For example the following DT snippet would allow a clock registered by the
+ * clock-controller@c001 that has a clk_init_data::parent_data array
+ * with 'xtal' in the 'name' member to find the clock provided by the
+ * clock-controller@f00abcd without needing to get the globally unique name of
+ * the xtal clk.
+ *
+ * parent: clock-controller@f00abcd {
+ * reg = <0xf00abcd 0xabcd>;
+ * #clock-cells = <0>;
+ * };
+ *
+ * clock-controller@c001 {
+ * reg = <0xc001 0xf00d>;
+ * clocks = <&parent>;
+ * clock-names = "xtal";
+ * #clock-cells = <1>;
+ * };
+ *
+ * Returns: -ENOENT when the provider can't be found or the clk doesn't
+ * exist in the provider. -EINVAL when the name can't be found. NULL when the
+ * provider knows about the clk but it isn't provided on thi

[PATCH v4 4/9] driver core: Let dev_of_node() accept a NULL dev

2019-04-12 Thread Stephen Boyd
We'd like to chain this in places where the 'dev' argument might be
NULL. Let this function take a NULL 'dev' so this can work.

Cc: Miquel Raynal 
Cc: Jerome Brunet 
Cc: Russell King 
Cc: Michael Turquette 
Cc: Jeffrey Hugo 
Cc: Chen-Yu Tsai 
Cc: Greg Kroah-Hartman 
Cc: Rob Herring 
Signed-off-by: Stephen Boyd 
---

Please ack/review so I can take this through clk tree.

 include/linux/device.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index b425a7ee04ce..0370dd0b3ae7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1231,7 +1231,7 @@ static inline void device_lock_assert(struct device *dev)
 
 static inline struct device_node *dev_of_node(struct device *dev)
 {
-   if (!IS_ENABLED(CONFIG_OF))
+   if (!IS_ENABLED(CONFIG_OF) || !dev)
return NULL;
return dev->of_node;
 }
-- 
Sent by a computer through tubes



[PATCH v4 3/9] clk: Prepare for clk registration API that uses DT nodes

2019-04-12 Thread Stephen Boyd
Split out the body of the clk_register() function so it can be shared
between the different types of registration APIs (DT, device).

Cc: Miquel Raynal 
Cc: Jerome Brunet 
Cc: Russell King 
Cc: Michael Turquette 
Cc: Jeffrey Hugo 
Cc: Chen-Yu Tsai 
Cc: Rob Herring 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 96053a96fe2f..d27775a73e67 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -3313,18 +3313,7 @@ struct clk *clk_hw_create_clk(struct device *dev, struct 
clk_hw *hw,
return clk;
 }
 
-/**
- * clk_register - allocate a new clock, register it and return an opaque cookie
- * @dev: device that is registering this clock
- * @hw: link to hardware-specific clock data
- *
- * clk_register is the primary interface for populating the clock tree with new
- * clock nodes.  It returns a pointer to the newly allocated struct clk which
- * cannot be dereferenced by driver code but may be used in conjunction with 
the
- * rest of the clock API.  In the event of an error clk_register will return an
- * error code; drivers must test for an error code after calling clk_register.
- */
-struct clk *clk_register(struct device *dev, struct clk_hw *hw)
+static struct clk *__clk_register(struct device *dev, struct clk_hw *hw)
 {
int i, ret;
struct clk_core *core;
@@ -3426,6 +3415,22 @@ struct clk *clk_register(struct device *dev, struct 
clk_hw *hw)
 fail_out:
return ERR_PTR(ret);
 }
+
+/**
+ * clk_register - allocate a new clock, register it and return an opaque cookie
+ * @dev: device that is registering this clock
+ * @hw: link to hardware-specific clock data
+ *
+ * clk_register is the primary interface for populating the clock tree with new
+ * clock nodes.  It returns a pointer to the newly allocated struct clk which
+ * cannot be dereferenced by driver code but may be used in conjunction with 
the
+ * rest of the clock API.  In the event of an error clk_register will return an
+ * error code; drivers must test for an error code after calling clk_register.
+ */
+struct clk *clk_register(struct device *dev, struct clk_hw *hw)
+{
+   return __clk_register(dev, hw);
+}
 EXPORT_SYMBOL_GPL(clk_register);
 
 /**
@@ -3440,7 +3445,7 @@ EXPORT_SYMBOL_GPL(clk_register);
  */
 int clk_hw_register(struct device *dev, struct clk_hw *hw)
 {
-   return PTR_ERR_OR_ZERO(clk_register(dev, hw));
+   return PTR_ERR_OR_ZERO(__clk_register(dev, hw));
 }
 EXPORT_SYMBOL_GPL(clk_hw_register);
 
-- 
Sent by a computer through tubes



  1   2   3   4   5   6   7   >