[PATCH v2] MAINTAINERS: mark simple firmware interface (SFI) obsolete
Len Brown has not been active in this part since around 2010. The recent activity suggests that Thomas Gleixner and Jiang Lui were maintaining this part of the kernel sources. Jiang Lui has not been active in the kernel sources since beginning 2016. So, the maintainer's role seems to be now with Thomas. The referenced git tree does not exist. Instead, I found an sfi branch in Len's kernel git repository, but that has not been updated since 2014; so that is not worth to be mentioned in MAINTAINERS now anymore either. Len Brown expects no further systems to be shipped with SFI, so we can mark it obsolete and schedule it for deletion. This change was motivated after I found that I could not send any mails to the sfi-devel mailing list, and that the mailing list does not exist anymore. Signed-off-by: Lukas Bulwahn --- Thomas, please pick this patch. v1: https://lore.kernel.org/patchwork/patch/1033696/ - got Acked-by: Len Brown v2: - also change status to Obsolete MAINTAINERS | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 2359e12e4c41..07bf81c52fe6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14150,11 +14150,9 @@ F: drivers/video/fbdev/sm712* F: Documentation/fb/sm712fb.txt SIMPLE FIRMWARE INTERFACE (SFI) -M: Len Brown -L: sfi-de...@simplefirmware.org +M: Thomas Gleixner W: http://simplefirmware.org/ -T: git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6.git -S: Supported +S: Obsolete F: arch/x86/platform/sfi/ F: drivers/sfi/ F: include/linux/sfi*.h -- 2.17.1
Re: [PATCH 4/4] locking/lockdep: Test all incompatible scenario at once in check_irq_usage()
On Wed, 3 Apr 2019 at 00:05, Frederic Weisbecker wrote: > > check_prev_add_irq() tests all incompatible scenarios one after the > other while adding a lock (@next) to a tree dependency (@prev): > > LOCK_USED_IN_HARDIRQ vs LOCK_ENABLED_HARDIRQ > LOCK_USED_IN_HARDIRQ_READ vs LOCK_ENABLED_HARDIRQ > LOCK_USED_IN_SOFTIRQ vs LOCK_ENABLED_SOFTIRQ > LOCK_USED_IN_SOFTIRQ_READ vs LOCK_ENABLED_SOFTIRQ May I ask why LOCK_USED_IN_*IRQ vs. LOCK_ENABLED_*IRQ_READ is not tested? Thanks, Yuyang
RE: perf tools:Is there any tools to found out the max latency by irq or cpu idle
Sorry, the value 131081408 is just for example. Actually the result is like this: sqrt 2019-04-10 23:53:50: 43968 sqrt 2019-04-10 23:53:51: 44060 sqrt 2019-04-10 23:53:52: 49012 sqrt 2019-04-10 23:53:53: 38172 sqrt 2019-04-10 23:53:54: 131081408 sqrt 2019-04-10 23:53:55: 43600 sqrt 2019-04-10 23:53:56: 46704 sqrt 2019-04-10 23:53:57: 46880 sqrt 2019-04-10 23:53:58: 44332 …… sqrt 2019-04-10 02:17:15: 136345876 …… sqrt 2019-04-10 04:40:35: 136335644 …… -Original Message- From: William Cohen [mailto:wco...@redhat.com] Sent: 2019年4月12日 22:06 To: Linhaifeng ; linux-perf-us...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle On 4/11/19 8:57 PM, Linhaifeng wrote: > Hi, > I have a single thread application like this: > > While (1) { > start = rdtsc(); > sqrt (1024); > end = rdtsc(); > cycles = end – start; > printf("cycles: %d-%02d-%02d %02d:%02d:%02d: %lu\n", > 1900+timeinfo->tm_year, 1+timeinfo->tm_mon, timeinfo->tm_mday, > timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec, > cycles); > } > It print the cycles of sqrt every second and run with taskset –c 1 ./sqrt. > The result of test is: > > sqrt 2019-04-10 23:53:50: 43968 > sqrt 2019-04-10 23:53:51: 44060 > sqrt 2019-04-10 23:53:52: 49012 > sqrt 2019-04-10 23:53:53: 38172 > sqrt 2019-04-10 23:53:54: 131081408 > sqrt 2019-04-10 23:53:55: 43600 > sqrt 2019-04-10 23:53:56: 46704 > sqrt 2019-04-10 23:53:57: 46880 > sqrt 2019-04-10 23:53:58: 44332 > …… > sqrt 2019-04-10 02:17:15: 131081408 > …… > sqrt 2019-04-10 04:40:35: 131081408 > …… > > Every 2hour23min there would be a large cycles. I use perf sched not found > any sched_switch events. Hi, The fact that it is the same value 131081408 every 2 hours 23 minutes looks suspect. One would expect some variation in the counts. It looks like there is some rollover or overflow issue. It would be helpful to print out the start and end values to see what is going on with the raw rdstc values. Maybe the thread is being moved between processors and the TSC are out of sync. What particular processor model was this running on? Was this running on physical hardware or inside a kvm guest? According to the Intel 64 and IA-32 Architecture Software Devloper's Manual Volume 3 (325384-sdm-vol-3abcd.pdf): The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for a 64-bit counter wraparound. Intel guarantees that the time-stamp counter will not wraparound within 10 years after being reset. The period for counter wrap is longer for Pentium 4, Intel Xeon, P6 family, and Pentium processors. -Will Cohen > > L2GW_2680:/home/fsp/zn # perf sched record -C 6-11 -o perf.sched ^C[ > perf record: Woken up 64 times to write data ] [ perf record: Captured > and wrote 204.878 MB perf.sched (1911189 samples) ] > > L2GW_2680:/home/fsp/zn # perf sched latency -i perf.sched > > -- > --- > Task | Runtime ms | Switches | Average delay ms > | Maximum delay ms | Maximum delay at | > -- > --- > -- > --- > TOTAL: | 0.000 ms | 0 | > --- > > > > Is there any other tools of perf to found out the max latency by irq or cpu > idle ? > >
[PATCH v2] MAINTAINERS: fix style in KEYS-TRUSTED entry
Mimi Zohar used spaces instead of a tab when adding Jarkko Sakkinen as further maintainer to the KEYS-TRUSTED section entry. In fact, ./scripts/checkpatch.pl -f MAINTAINERS complains: WARNING: MAINTAINERS entries use one tab after TYPE: #8581: FILE: MAINTAINERS:8581: +M: Jarkko Sakkinen The issue was detected when writing a script that parses MAINTAINERS. Fixes: 34bccd61b139 ("MAINTAINERS: add Jarkko as maintainer for trusted keys") Signed-off-by: Lukas Bulwahn --- v1: https://lore.kernel.org/patchwork/patch/1040412/ - Joe Perches clarified that this is a style defect. v2: applies cleanly to v5.1-rc4 and next-20190412 MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 2359e12e4c41..75bb276aed1c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8578,7 +8578,7 @@ F:security/keys/encrypted-keys/ KEYS-TRUSTED M: James Bottomley -M: Jarkko Sakkinen +M: Jarkko Sakkinen M: Mimi Zohar L: linux-integr...@vger.kernel.org L: keyri...@vger.kernel.org -- 2.17.1
Re: [PATCH 2/3] rtc: ds1685: use correct device struct to get platform device struct
On 4/12/2019 07:44, Thomas Bogendoerfer wrote: > On Fri, 12 Apr 2019 12:11:06 +0200 > Alexandre Belloni wrote: > >> Every patch need a commit message. Maybe you could indicate that this >> never gave any issue because parent is the first member of struct >> device. > > I'll update the commit message, I get a nice stacktrace because of that > bug, so the path from work_queue calling ds1685_rtc_poweroff never worked. > > Thomas. I'll wager that's why the thing stopped powering off my Octane. It *used* to work when I wrote the driver, but stopped after some unidentified point, and I never found the time to try and track it down. Which machine are you testing on, out of curiosity? -- Joshua Kinard Gentoo/MIPS ku...@gentoo.org rsa6144/5C63F4E3F5C6C943 2015-04-27 177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943 "The past tempts us, the present confuses us, the future frightens us. And our lives slip away, moment by moment, lost in that vast, terrible in-between." --Emperor Turhan, Centauri Republic
Zdravstvujte Vas interesuyut klientskie bazy dannyh?
Zdravstvujte Vas interesuyut klientskie bazy dannyh?
RE,
As-Salamu Alaykum, I need your help to transfer out the some amount of money, accumulated as undeclared excess profit made by me from the INVESTMENTS CAPITAL under my management in my bank. You will get 40% of the funds for your participation. Reply for more details thanks
Re: [GIT PULL] x86 fixes
The pull request you sent on Fri, 12 Apr 2019 15:10:42 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/6d0a598489ca687e1ebac37eef546526eca41347 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] IRQ fixes
The pull request you sent on Fri, 12 Apr 2019 13:52:19 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/6a022984c3feda8cc7d2d8c028b429715d0af531 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] timer fix
The pull request you sent on Fri, 12 Apr 2019 15:09:19 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > timers-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/122c215bfae884f10a189e6754d9603a06b981c3 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] scheduler fix
The pull request you sent on Fri, 12 Apr 2019 15:08:11 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > sched-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/5e6f1fee60a3d80582146835ac01d9808748434f Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] core/urgent fixes
The pull request you sent on Fri, 12 Apr 2019 12:16:30 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > core-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/54c63a75581f4b8d5d4b5660424f926510cfd98c Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] locking fix
The pull request you sent on Fri, 12 Apr 2019 13:53:43 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > locking-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/26e2b81977bb1ad70ff9b2acb4d4cb13c23facfd Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] perf fixes
The pull request you sent on Fri, 12 Apr 2019 15:06:42 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > perf-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/73fdb2c908c64a32e11c72c029d636f556859c0d Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
[PATCH] sched/fair: Use 'unsigned long' for group_shares,group_runnable
group_share and group_runnable are tracked as 'unsigned long', however some functions using them as 'long' which is ultimately assigned back to 'unsigned long' variables in reweight_entity. Since there is not scope on using a different and signed type, this change improves code consistency and avoids further type conversions. More important, to prevent undefined behavior caused by overflow. Using them as 'long' resulted in the following stack trace (on top of v4.19.34) == UBSAN: Undefined behaviour in kernel/sched/fair.c:3055:9 signed integer overflow: 1048576 * 9144968455305 cannot be represented in type 'long int' dump_backtrace+0x0/0x338 show_stack+0x28/0x38 dump_stack+0xc8/0x100 ubsan_epilogue+0x18/0x6c handle_overflow+0x170/0x1c0 __ubsan_handle_mul_overflow+0x34/0x44 update_cfs_group+0x244/0x248 dequeue_entity+0x478/0x12c0 dequeue_task_fair+0x6c/0xd98 __sched_setscheduler+0x320/0xdf0 _sched_setscheduler+0xf4/0x158 do_sched_setscheduler+0x118/0x1a0 __arm64_sys_sched_setscheduler+0x50/0x70 el0_svc_common+0xf4/0x258 el0_svc_handler+0x50/0xa8 == UBSAN: Undefined behaviour in kernel/sched/fair.c:3111:11 signed integer overflow: 97833896519391 * 952504 cannot be represented in type 'long int' Call trace: dump_backtrace+0x0/0x338 show_stack+0x28/0x38 dump_stack+0xc8/0x100 ubsan_epilogue+0x18/0x6c handle_overflow+0x170/0x1c0 __ubsan_handle_mul_overflow+0x34/0x44 update_cfs_group+0x210/0x248 enqueue_entity+0x7b4/0x1868 enqueue_task_fair+0x12c/0xe70 __sched_setscheduler+0x4cc/0xdf0 _sched_setscheduler+0xf4/0x158 do_sched_setscheduler+0x118/0x1a0 __arm64_sys_sched_setscheduler+0x50/0x70 el0_svc_common+0xf4/0x258 el0_svc_handler+0x50/0xa8 el0_svc+0x8/0xc == Cc: sta...@vger.kernel.org Signed-off-by: Cheng Jian --- kernel/sched/fair.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index fdab7eb6f351..cf003a31c220 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2920,9 +2920,9 @@ void reweight_task(struct task_struct *p, int prio) * * hence icky! */ -static long calc_group_shares(struct cfs_rq *cfs_rq) +static unsigned long calc_group_shares(struct cfs_rq *cfs_rq) { - long tg_weight, tg_shares, load, shares; + unsigned long tg_weight, tg_shares, load, shares; struct task_group *tg = cfs_rq->tg; tg_shares = READ_ONCE(tg->shares); @@ -2951,7 +2951,7 @@ static long calc_group_shares(struct cfs_rq *cfs_rq) * case no task is runnable on a CPU MIN_SHARES=2 should be returned * instead of 0. */ - return clamp_t(long, shares, MIN_SHARES, tg_shares); + return clamp_t(unsigned long, shares, MIN_SHARES, tg_shares); } /* @@ -2981,9 +2981,9 @@ static long calc_group_shares(struct cfs_rq *cfs_rq) * Where these max() serve both to use the 'instant' values to fix the slow * from-idle and avoid the /0 on to-idle, similar to (6). */ -static long calc_group_runnable(struct cfs_rq *cfs_rq, long shares) +static unsigned long calc_group_runnable(struct cfs_rq *cfs_rq, long shares) { - long runnable, load_avg; + unsigned long runnable, load_avg; load_avg = max(cfs_rq->avg.load_avg, scale_load_down(cfs_rq->load.weight)); @@ -2995,7 +2995,7 @@ static long calc_group_runnable(struct cfs_rq *cfs_rq, long shares) if (load_avg) runnable /= load_avg; - return clamp_t(long, runnable, MIN_SHARES, shares); + return clamp_t(unsigned long, runnable, MIN_SHARES, shares); } #endif /* CONFIG_SMP */ @@ -3008,7 +3008,7 @@ static inline int throttled_hierarchy(struct cfs_rq *cfs_rq); static void update_cfs_group(struct sched_entity *se) { struct cfs_rq *gcfs_rq = group_cfs_rq(se); - long shares, runnable; + unsigned long shares, runnable; if (!gcfs_rq) return; -- 2.17.1
[PATCH] regulator: tps65218: Convert to use regulator_get_current_limit_regmap
Use regulator_get_current_limit_regmap helper to save some code. Signed-off-by: Axel Lin --- drivers/regulator/tps65218-regulator.c | 48 ++ 1 file changed, 19 insertions(+), 29 deletions(-) diff --git a/drivers/regulator/tps65218-regulator.c b/drivers/regulator/tps65218-regulator.c index 7aed3fbffecc..b72035610013 100644 --- a/drivers/regulator/tps65218-regulator.c +++ b/drivers/regulator/tps65218-regulator.c @@ -29,7 +29,8 @@ #include #define TPS65218_REGULATOR(_name, _of, _id, _type, _ops, _n, _vr, _vm, _er, \ - _em, _cr, _cm, _lr, _nlr, _delay, _fuv, _sr, _sm) \ + _em, _cr, _cm, _lr, _nlr, _delay, _fuv, _sr, _sm, \ + _ct, _ncl) \ { \ .name = _name,\ .of_match = _of, \ @@ -42,6 +43,8 @@ .vsel_mask = _vm, \ .csel_reg = _cr, \ .csel_mask = _cm, \ + .curr_table = _ct, \ + .n_current_limits = _ncl, \ .enable_reg = _er, \ .enable_mask= _em, \ .volt_table = NULL, \ @@ -188,8 +191,7 @@ static const struct regulator_ops tps65218_ldo1_dcdc34_ops = { .set_suspend_disable= tps65218_pmic_set_suspend_disable, }; -static const int ls3_currents[] = { 10, 20, 50, 100 }; - +static const unsigned int ls3_currents[] = { 10, 20, 50, 100 }; static int tps65218_pmic_set_input_current_lim(struct regulator_dev *dev, int lim_uA) @@ -229,29 +231,13 @@ static int tps65218_pmic_set_current_limit(struct regulator_dev *dev, TPS65218_PROTECT_L1); } -static int tps65218_pmic_get_current_limit(struct regulator_dev *dev) -{ - int retval; - unsigned int index; - struct tps65218 *tps = rdev_get_drvdata(dev); - - retval = regmap_read(tps->regmap, dev->desc->csel_reg, &index); - if (retval < 0) - return retval; - - index = (index & dev->desc->csel_mask) >> -__builtin_ctz(dev->desc->csel_mask); - - return ls3_currents[index]; -} - static const struct regulator_ops tps65218_ls23_ops = { .is_enabled = regulator_is_enabled_regmap, .enable = tps65218_pmic_enable, .disable= tps65218_pmic_disable, .set_input_current_limit = tps65218_pmic_set_input_current_lim, .set_current_limit = tps65218_pmic_set_current_limit, - .get_current_limit = tps65218_pmic_get_current_limit, + .get_current_limit = regulator_get_current_limit_regmap, }; /* Operations permitted on DCDC5, DCDC6 */ @@ -270,53 +256,57 @@ static const struct regulator_desc regulators[] = { TPS65218_CONTROL_DCDC1_MASK, TPS65218_REG_ENABLE1, TPS65218_ENABLE1_DC1_EN, 0, 0, dcdc1_dcdc2_ranges, 2, 4000, 0, TPS65218_REG_SEQ3, - TPS65218_SEQ3_DC1_SEQ_MASK), + TPS65218_SEQ3_DC1_SEQ_MASK, NULL, 0), TPS65218_REGULATOR("DCDC2", "regulator-dcdc2", TPS65218_DCDC_2, REGULATOR_VOLTAGE, tps65218_dcdc12_ops, 64, TPS65218_REG_CONTROL_DCDC2, TPS65218_CONTROL_DCDC2_MASK, TPS65218_REG_ENABLE1, TPS65218_ENABLE1_DC2_EN, 0, 0, dcdc1_dcdc2_ranges, 2, 4000, 0, TPS65218_REG_SEQ3, - TPS65218_SEQ3_DC2_SEQ_MASK), + TPS65218_SEQ3_DC2_SEQ_MASK, NULL, 0), TPS65218_REGULATOR("DCDC3", "regulator-dcdc3", TPS65218_DCDC_3, REGULATOR_VOLTAGE, tps65218_ldo1_dcdc34_ops, 64, TPS65218_REG_CONTROL_DCDC3, TPS65218_CONTROL_DCDC3_MASK, TPS65218_REG_ENABLE1, TPS65218_ENABLE1_DC3_EN, 0, 0, ldo1_dcdc3_ranges, 2, - 0, 0, TPS65218_REG_SEQ4, TPS65218_SEQ4_DC3_SEQ_MASK), + 0, 0, TPS65218_REG_SEQ4, TPS65218_SEQ4_DC3_SEQ_MASK, + NULL, 0), TPS65218_REGULATOR("DCDC4", "regulator-dcdc4", TPS65218_DCDC_4, REGULATOR_VOLTAGE, tps65218_ldo1_dcdc34_ops, 53, TPS65218_REG_CONTROL_DCDC4, TPS65218_CONTROL_DCDC4_MASK, TPS65218_REG_ENABLE1, TPS65218_ENABLE1_DC4_EN, 0, 0, dcdc4_ranges, 2, -
Re: [PATCH-tip v3 02/14] locking/rwsem: Make owner available even if !CONFIG_RWSEM_SPIN_ON_OWNER
On 04/12/2019 02:05 PM, Waiman Long wrote: > On 04/12/2019 12:41 PM, Ingo Molnar wrote: >> >> So beyond the primary constraint of PeterZ OK-ing it all, there's also >> these two scalability regression reports from the ktest bot: >> >> [locking/rwsem] 1b94536f2d: stress-ng.bad-altstack.ops_per_sec -32.7% >> regression > A regression due to the lock handoff patch is kind of expected, but I > will into why there is such a large drop. I don't have a high core count system on hand. I run the stress-ng test on a 2-socket 40-core 80-thread Skylake system: Kernels: 1) Before lock handoff patch 2) After lock handoff patch 3) After wake all reader patch 4) After reader spin on writer patch 5) After writer spin on reader patch Tests K1 K2 K3 K4 K5 - -- -- -- -- -- bad-altstack 39928 35807 36422 40062 40747 stackmmap 187 365 435 255 198 vm 309589 296097 262045 281974 310439 vm-segv 113776 114058 112318 115422 110550 Here, the bad-altstack dropped 10% after the lock handoff patch. However, the performance is recovered with later patches. The stackmmap results don't look quite right as the numbers are much smaller than the numbers in the report. I will rerun the tests again when I acquire a high core count system. Anyway, the lock handoff patch is expected to reduce throughput under heavy contention. >> [locking/rwsem] adc32e8877: will-it-scale.per_thread_ops -21.0% regression > Will look into that also. I can reproduce the regression on the same skylake system. The results of the page_fault1 will-it-scale test are as follows: Threads K2 K3 K4 K5 --- -- -- -- -- 20 5549772 5550332 5463961 5400064 40 9540445 10286071 9705062 7706082 60 8187245 8212307 247 6647705 89 8390758 9619271 9019454 7124407 So the wake-all-reader patch is good for this benchmark. The performance was reduced a bit with the reader-spin-on-writer patch. It got even worse with the writer-spin-on-reader patch. I looked at the perf output, rwsem contention accounted for less than 1% of the total cpu cycles. So I believe the regression was caused by the behavior change introduced by the two reader optimistic spinning patches. These patch will make writer less preferred than before. I think the performance of this microbenchmark may be more dependent on writer performance. Looking at the lock event counts for K5: rwsem_opt_fail=253647 rwsem_opt_nospin=8776 rwsem_opt_rlock=259941 rwsem_opt_wlock=2543 rwsem_rlock=237747 rwsem_rlock_fail=0 rwsem_rlock_fast=0 rwsem_rlock_handoff=0 rwsem_sleep_reader=237747 rwsem_sleep_writer=23098 rwsem_wake_reader=6033 rwsem_wake_writer=47032 rwsem_wlock=15890 rwsem_wlock_fail=10 rwsem_wlock_handoff=3991 For K4, it was rwsem_opt_fail=479626 rwsem_opt_rlock=8877 rwsem_opt_wlock=114 rwsem_rlock=453874 rwsem_rlock_fail=0 rwsem_rlock_fast=1234 rwsem_rlock_handoff=0 rwsem_sleep_reader=453058 rwsem_sleep_writer=25836 rwsem_wake_reader=11054 rwsem_wake_writer=71568 rwsem_wlock=24515 rwsem_wlock_fail=3 rwsem_wlock_handoff=5245 It can be seen that a lot more readers got the lock via optimistic spinning. One possibility is that reader optimistic spinning causes readers to spread out into more lock acquisition groups than without. The K3 results show that grouping more readers into one lock acquisition group help to improve performance for this microbenchmark. I will need to run more tests to find out the root cause of this regression. It is not an easy problem to solve. In the mean time, I am going to send out an updated patchset tomorrow so that Peter can review the patch again when he is available. Cheers, Longman
RE: [PATCH] vmbus: Remove the undesired put_cpu_ptr() in hv_synic_cleanup()
From: Dexuan Cui Sent: Friday, April 12, 2019 4:35 PM > > With CONFIG_DEBUG_PREEMPT=y, the put_cpu_ptr() triggiers an underflow > warning in preempt_count_sub(). > > Fixes: 37cdd991fac8 ("vmbus: put related per-cpu variable together") > Cc: sta...@vger.kernel.org > Cc: Stephen Hemminger > Signed-off-by: Dexuan Cui Reviewed-by: Michael Kelley
[PATCH] printk: tie printk_once / printk_deferred_once into .data.once for reset
In commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE") we got the opportunity to reset state on the one shot messages, without having to reboot. However printk_once (printk_deferred_once) live in a different file and didn't get the same kind of update/conversion, so they remain unconditionally one shot, until the system is rebooted. For example, we currently have: sched/rt.c: printk_deferred_once("sched: RT throttling activated\n"); ..which could reasonably be tripped as someone is testing and tuning a new system/workload and their task placements. For consistency, and to avoid reboots in the same vein as the original commit, we make these two instances of _once the same as the WARN*_ONCE instances are. Cc: Andi Kleen Cc: Petr Mladek Cc: Sergey Senozhatsky Cc: Steven Rostedt Cc: Andrew Morton Signed-off-by: Paul Gortmaker --- Documentation/clearing-warn-once.txt | 2 +- include/linux/printk.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/clearing-warn-once.txt b/Documentation/clearing-warn-once.txt index 5b1f5d547be1..c68598b31428 100644 --- a/Documentation/clearing-warn-once.txt +++ b/Documentation/clearing-warn-once.txt @@ -1,5 +1,5 @@ -WARN_ONCE / WARN_ON_ONCE only print a warning once. +WARN_ONCE / WARN_ON_ONCE / printk_once only emit a message once. echo 1 > /sys/kernel/debug/clear_warn_once diff --git a/include/linux/printk.h b/include/linux/printk.h index d7c77ed1a4cb..84ea4d094af3 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -347,7 +347,7 @@ extern int kptr_restrict; #ifdef CONFIG_PRINTK #define printk_once(fmt, ...) \ ({ \ - static bool __print_once __read_mostly; \ + static bool __section(.data.once) __print_once; \ bool __ret_print_once = !__print_once; \ \ if (!__print_once) {\ @@ -358,7 +358,7 @@ extern int kptr_restrict; }) #define printk_deferred_once(fmt, ...) \ ({ \ - static bool __print_once __read_mostly; \ + static bool __section(.data.once) __print_once; \ bool __ret_print_once = !__print_once; \ \ if (!__print_once) {\ -- 2.7.4
[PATCH v5 2/2] tty: serial: add driver for the SiFive UART
Add a serial driver for the SiFive UART, found on SiFive FU540 devices (among others). The underlying serial IP block is relatively basic, and currently does not support serial break detection. Further information on the IP block can be found in the documentation and Chisel sources: https://static.dev.sifive.com/FU540-C000-v1.0.pdf https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/uart This driver was written in collaboration with Wesley Terpstra . Tested on a SiFive HiFive Unleashed A00 board, using BBL and the open- source FSBL (using a DT file based on what's targeted for mainline). This revision incorporates changes based on comments by Julia Lawall , Emil Renner Berthing , and Andreas Schwab . Thanks also to Andreas for testing the driver with his userspace and reporting a bug with the set_termios implementation. Signed-off-by: Paul Walmsley Signed-off-by: Paul Walmsley Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Palmer Dabbelt Cc: Wesley Terpstra Cc: linux-ser...@vger.kernel.org Cc: linux-ri...@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: Julia Lawall Cc: Emil Renner Berthing Cc: Andreas Schwab --- drivers/tty/serial/Kconfig | 24 + drivers/tty/serial/Makefile |1 + drivers/tty/serial/sifive.c | 1056 ++ include/uapi/linux/serial_core.h |3 + 4 files changed, 1084 insertions(+) create mode 100644 drivers/tty/serial/sifive.c diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig index 72966bc0ac76..561e053b690a 100644 --- a/drivers/tty/serial/Kconfig +++ b/drivers/tty/serial/Kconfig @@ -1095,6 +1095,30 @@ config SERIAL_OMAP_CONSOLE your boot loader about how to pass options to the kernel at boot time.) +config SERIAL_SIFIVE + tristate "SiFive UART support" + depends on OF + select SERIAL_CORE + help + Select this option if you are building a kernel for a device that + contains a SiFive UART IP block. This type of UART is present on + SiFive FU540 SoCs, among others. + +config SERIAL_SIFIVE_CONSOLE + bool "Console on SiFive UART" + depends on SERIAL_SIFIVE=y + select SERIAL_CORE_CONSOLE + help + Select this option if you would like to use a SiFive UART as the + system console. + + Even if you say Y here, the currently visible virtual console + (/dev/tty0) will still be used as the system console by default, but + you can alter that using a kernel command line option such as + "console=ttySIFx". (Try "man bootparam" or see the documentation of + your boot loader about how to pass options to the kernel at + boot time.) + config SERIAL_LANTIQ bool "Lantiq serial driver" depends on LANTIQ diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile index 40b702aaa85e..2aff1d07d08b 100644 --- a/drivers/tty/serial/Makefile +++ b/drivers/tty/serial/Makefile @@ -92,6 +92,7 @@ obj-$(CONFIG_SERIAL_PIC32)+= pic32_uart.o obj-$(CONFIG_SERIAL_MPS2_UART) += mps2-uart.o obj-$(CONFIG_SERIAL_OWL) += owl-uart.o obj-$(CONFIG_SERIAL_RDA) += rda-uart.o +obj-$(CONFIG_SERIAL_SIFIVE)+= sifive.o # GPIOLIB helpers for modem control lines obj-$(CONFIG_SERIAL_MCTRL_GPIO)+= serial_mctrl_gpio.o diff --git a/drivers/tty/serial/sifive.c b/drivers/tty/serial/sifive.c new file mode 100644 index ..be4687814353 --- /dev/null +++ b/drivers/tty/serial/sifive.c @@ -0,0 +1,1056 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * SiFive UART driver + * Copyright (C) 2018 Paul Walmsley + * Copyright (C) 2018-2019 SiFive + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Based partially on: + * - drivers/tty/serial/pxa.c + * - drivers/tty/serial/amba-pl011.c + * - drivers/tty/serial/uartlite.c + * - drivers/tty/serial/omap-serial.c + * - drivers/pwm/pwm-sifive.c + * + * See the following sources for further documentation: + * - Chapter 19 "Universal Asynchronous Receiver/Transmitter (UART)" of + * SiFive FE310-G000 v2p3 + * - The tree/master/src/main/scala/devices/uart directory of + * https://github.com/sifive/sifive-blocks/ + * + * The SiFive UART design is not 8250-compatible. The following common + * features are not supported: + * - Word lengths other than 8 bits + * - Break handling + * - Parity + * - Flow control + * - Modem signals (DSR, RI, etc.) + * On the other hand, the design is free from the baggage of the 8
[PATCH v5 0/2] tty: serial: add DT bindings and serial driver for the SiFive FU540 UART
This series adds a serial driver, with console support, for the UART IP block present on the SiFive FU540 SoC. The programming model is straightforward, but unique. Boot-tested on a SiFive FU540 HiFive-U board, using BBL and the open-source FSBL (with appropriate patches to the DT data). This fifth version fixes a bug in the set_termios handler, found by Andreas Schwab . The patches in this series can also be found, with the PRCI patches, DT patches, and DT prerequisite patch, at: https://github.com/sifive/riscv-linux/tree/dev/paulw/serial-v5.1-rc4 - Paul Paul Walmsley (2): dt-bindings: serial: add documentation for the SiFive UART driver tty: serial: add driver for the SiFive UART .../bindings/serial/sifive-serial.txt | 33 + drivers/tty/serial/Kconfig| 24 + drivers/tty/serial/Makefile |1 + drivers/tty/serial/sifive.c | 1056 + include/uapi/linux/serial_core.h |3 + 5 files changed, 1117 insertions(+) create mode 100644 Documentation/devicetree/bindings/serial/sifive-serial.txt create mode 100644 drivers/tty/serial/sifive.c -- 2.20.1
[PATCH v5 1/2] dt-bindings: serial: add documentation for the SiFive UART driver
Add DT binding documentation for the Linux driver for the SiFive asynchronous serial IP block. This revision incorporates changes based on feedback from Rob Herring . Signed-off-by: Paul Walmsley Signed-off-by: Paul Walmsley Cc: linux-ser...@vger.kernel.org Cc: devicet...@vger.kernel.org Cc: linux-ri...@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman Cc: Rob Herring Cc: Mark Rutland Cc: Palmer Dabbelt --- .../bindings/serial/sifive-serial.txt | 33 +++ 1 file changed, 33 insertions(+) create mode 100644 Documentation/devicetree/bindings/serial/sifive-serial.txt diff --git a/Documentation/devicetree/bindings/serial/sifive-serial.txt b/Documentation/devicetree/bindings/serial/sifive-serial.txt new file mode 100644 index ..c86b1e524159 --- /dev/null +++ b/Documentation/devicetree/bindings/serial/sifive-serial.txt @@ -0,0 +1,33 @@ +SiFive asynchronous serial interface (UART) + +Required properties: + +- compatible: should be something similar to + "sifive,-uart" for the UART as integrated + on a particular chip, and "sifive,uart" for the + general UART IP block programming model. Supported + compatible strings as of the date of this writing are: + "sifive,fu540-c000-uart" for the SiFive UART v0 as + integrated onto the SiFive FU540 chip, or "sifive,uart0" + for the SiFive UART v0 IP block with no chip integration + tweaks (if any) +- reg: address and length of the register space +- interrupts: Should contain the UART interrupt identifier +- clocks: Should contain a clock identifier for the UART's parent clock + + +UART HDL that corresponds to the IP block version numbers can be found +here: + +https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/uart + + +Example: + +uart0: serial@1001 { + compatible = "sifive,fu540-c000-uart", "sifive,uart0"; + interrupt-parent = <&plic0>; + interrupts = <80>; + reg = <0x0 0x1001 0x0 0x1000>; + clocks = <&prci PRCI_CLK_TLCLK>; +}; -- 2.20.1
Re: [RFC] what's struct ipmi_file_private ->file for?
On Sat, Apr 13, 2019 at 01:47:57AM +0100, Al Viro wrote: > AFAICS, nothing has ever used that; am I, by any > chance, missing some nasty macros in there? Just curious... I'm not sure why it's there, but it's been there since 2003 when the driver was first added. I've queued a patch to remove it. Thanks, -corey
Re: [PATCH RFC 1/2] Add polling support to pidfd
[Resending due to accidental HTML. I need to take Joel's advice and switch to a real email client] On Fri, Apr 12, 2019 at 5:54 PM Daniel Colascione wrote: > > On Fri, Apr 12, 2019 at 5:09 PM Joel Fernandes wrote: >> >> Hi Andy! >> >> On Fri, Apr 12, 2019 at 02:32:53PM -0700, Andy Lutomirski wrote: >> > On Thu, Apr 11, 2019 at 10:51 AM Joel Fernandes (Google) >> > wrote: >> > > >> > > pidfd are /proc/pid directory file descriptors referring to a task group >> > > leader. Android low memory killer (LMK) needs pidfd polling support to >> > > replace code that currently checks for existence of /proc/pid for >> > > knowing a process that is signalled to be killed has died, which is both >> > > racy and slow. The pidfd poll approach is race-free, and also allows the >> > > LMK to do other things (such as by polling on other fds) while awaiting >> > > the process being killed to die. >> > > >> > > It prevents a situation where a PID is reused between when LMK sends a >> > > kill signal and checks for existence of the PID, since the wrong PID is >> > > now possibly checked for existence. >> > > >> > > In this patch, we follow the same mechanism used uhen the parent of the >> > > task group is to be notified, that is when the tasks waiting on a poll >> > > of pidfd are also awakened. >> > > >> > > We have decided to include the waitqueue in struct pid for the following >> > > reasons: >> > > 1. The wait queue has to survive for the lifetime of the poll. Including >> > > it in task_struct would not be option in this case because the task can >> > > be reaped and destroyed before the poll returns. >> > >> > Are you sure? I admit I'm not all that familiar with the innards of >> > poll() on Linux, but I thought that the waitqueue only had to survive >> > long enough to kick the polling thread and did *not* have to survive >> > until poll() actually returned. >> >> I am not sure now. I thought epoll(2) was based on the wait_event APIs, >> however more closely looking at the eventpoll code, it looks like there are 2 >> waitqueues involved, one that we pass and the other that is a part of the >> eventpoll session itself, so you could be right about that. Daniel Colascione >> may have some more thoughts about it since he brought up the possiblity of a >> wq life-time issue. Daniel? We were just playing it safe. I think you (Joel) and Andy are talking about different meanings of poll(). Joel is talking about the VFS method; Andy is talking about the system call. ISTM that the lifetime of wait queue we give to poll_wait needs to last through the poll. Normally the wait queue gets pinned by the struct file that we give to poll_wait (which takes a reference on the struct file), but the pidfd struct file doesn't pin the struct task, so we can't use a wait queue in struct task. (remove_wait_queue, which poll implementations call to undo wait queue additions, takes the wait queue head we pass to poll_wait, and we don't want to pass a dangling pointer to remove_wait_queue.) If the lifetime requirements for the queue aren't this strict, I don't see it documented anywhere. Besides: if we don't actually need to pin the waitqueue lifetime for the duration of the poll, why bother taking a reference on the polled struct file?
[RFC] what's struct ipmi_file_private ->file for?
AFAICS, nothing has ever used that; am I, by any chance, missing some nasty macros in there? Just curious...
Re: [PATCH 4/4] locking/lockdep: Test all incompatible scenario at once in check_irq_usage()
On Thu, Apr 11, 2019 at 12:46:32PM +0200, Peter Zijlstra wrote: > On Wed, Apr 10, 2019 at 04:28:48AM +0200, Frederic Weisbecker wrote: > > Should I re-issue the set or you do the changes? > > I've made it like so; does that work? In particular, do the comments > make sense? It is always hard to write these things down. It's especially hard because we describe both the bits of the usage mask and the bits of the bit numbers of the usage mask. The resulting description can only be naughty :-) That said I can probably clarify that in lockdep_internals.h on further patches. A few comments though: > +/* > + * The bit number is encoded like: > + * > + * bit0: 0 exclusive, 1 read lock > + * bit1: 0 used in irq, 1 irq enabled > + * bit2-n: state > + */ > static int exclusive_bit(int new_bit) > { > int state = new_bit & LOCK_USAGE_STATE_MASK; > @@ -1988,45 +1968,158 @@ static int exclusive_bit(int new_bit) > return state | (dir ^ LOCK_USAGE_DIR_MASK); > } > > +/* > + * Observe that when given a bitmask where each bitnr is encoded as above, a > + * right shift of the mask transforms the individual bitnrs as -1. > + * > + * So for all bits where bitnr1 == 1, we can create the mask where bitnr1 == > 0 So by bitnr1 you're referring to direction, right? > + * by subtracting 2, or shifting the mask right by 2. In which case we can perhaps reformulate: So for all bits whose number have LOCK_ENABLED_* set (bitnr1 == 1), we can create the mask with those bit numbers using LOCK_USED_IN_* (bitnr1 == 0) instead by subtracting the bit number by 2, or shifting the mask right by 2. And same would go for below. > + * > + * Similarly, bitnr1 == 0 becomes bitnr1 == 1 by adding 2, or shifting left > 2. > + * > + * So split the mask (note that LOCKF_ENABLED_IRQ_ALL|LOCKF_USED_IN_IRQ_ALL > is > + * all bits set) and recompose with bitnr1 flipped. > + */ > +static unsigned long invert_dir_mask(unsigned long mask) > +{ > + unsigned long excl = 0; > + > + /* Invert dir */ > + excl |= (mask & LOCKF_ENABLED_IRQ_ALL) >> LOCK_USAGE_DIR_MASK; > + excl |= (mask & LOCKF_USED_IN_IRQ_ALL) << LOCK_USAGE_DIR_MASK; > + > + return excl; > +} > + > +/* > + * As above, we clear bitnr0 with bitmask ops. First, for all bits with > bitnr1 > + * set, add those with bitnr1 cleared. And then mask out all bitnr1. > + */ Same here: As above, we clear bitnr0 (LOCK_*_READ off) with bitmask ops. First, for all bits with bitnr1 set (LOCK_ENABLED_*) , add those with bitnr1 cleared (LOCK_USED_IN_*). And then mask out all bitnr1. > +static unsigned long exclusive_mask(unsigned long mask) > +{ > + unsigned long excl = invert_dir_mask(mask); > + > + /* Strip read */ > + excl |= (excl & LOCKF_IRQ_READ) >> LOCK_USAGE_READ_MASK; > + excl &= ~LOCKF_IRQ_READ; > + > + return excl; > +} Not sure I'm making things clearer but at least that's some more pointers to enum lock_usage_bit (defined on headers where I should add more layout explanations, especially to make it clear we play with bit number bits :-s ) Thanks.
[PATCH v3] mm/hotplug: treat CMA pages as unmovable
has_unmovable_pages() is used by allocating CMA and gigantic pages as well as the memory hotplug. The later doesn't know how to offline CMA pool properly now, but if an unused (free) CMA page is encountered, then has_unmovable_pages() happily considers it as a free memory and propagates this up the call chain. Memory offlining code then frees the page without a proper CMA tear down which leads to an accounting issues. Moreover if the same memory range is onlined again then the memory never gets back to the CMA pool. State after memory offline: # grep cma /proc/vmstat nr_free_cma 205824 # cat /sys/kernel/debug/cma/cma-kvm_cma/count 209920 Also, kmemleak still think those memory address are reserved but have already been used by the buddy allocator after onlining. Offlined Pages 4096 kmemleak: Cannot insert 0xc000201f7d040008 into the object search tree (overlaps existing) Call Trace: [c0003dc2faf0] [c0884b2c] dump_stack+0xb0/0xf4 (unreliable) [c0003dc2fb30] [c0424fb4] create_object+0x344/0x380 [c0003dc2fbf0] [c03d178c] __kmalloc_node+0x3ec/0x860 [c0003dc2fc90] [c0319078] kvmalloc_node+0x58/0x110 [c0003dc2fcd0] [c0484d9c] seq_read+0x41c/0x620 [c0003dc2fd60] [c04472bc] __vfs_read+0x3c/0x70 [c0003dc2fd80] [c04473ac] vfs_read+0xbc/0x1a0 [c0003dc2fdd0] [c044783c] ksys_read+0x7c/0x140 [c0003dc2fe20] [c000b108] system_call+0x5c/0x70 kmemleak: Kernel memory leak detector disabled kmemleak: Object 0xc000201cc800 (size 13757317120): kmemleak: comm "swapper/0", pid 0, jiffies 4294937297 kmemleak: min_count = -1 kmemleak: count = 0 kmemleak: flags = 0x5 kmemleak: checksum = 0 kmemleak: backtrace: cma_declare_contiguous+0x2a4/0x3b0 kvm_cma_reserve+0x11c/0x134 setup_arch+0x300/0x3f8 start_kernel+0x9c/0x6e8 start_here_common+0x1c/0x4b0 kmemleak: Automatic memory scanning thread ended Acked-by: Michal Hocko Signed-off-by: Qian Cai --- v3: Use a string pointer instead of an array per Michal. v2: Borrow some commit log texts. Call dump_page() in the error path. mm/page_alloc.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d96ca5bc555b..40029b29fb88 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8005,7 +8005,10 @@ void *__init alloc_large_system_hash(const char *tablename, bool has_unmovable_pages(struct zone *zone, struct page *page, int count, int migratetype, int flags) { - unsigned long pfn, iter, found; + unsigned long found; + unsigned long iter = 0; + unsigned long pfn = page_to_pfn(page); + const char *reason = "unmovable page"; /* * TODO we could make this much more efficient by not checking every @@ -8015,17 +8018,20 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, * can still lead to having bootmem allocations in zone_movable. */ - /* -* CMA allocations (alloc_contig_range) really need to mark isolate -* CMA pageblocks even when they are not movable in fact so consider -* them movable here. -*/ - if (is_migrate_cma(migratetype) && - is_migrate_cma(get_pageblock_migratetype(page))) - return false; + if (is_migrate_cma(get_pageblock_migratetype(page))) { + /* +* CMA allocations (alloc_contig_range) really need to mark +* isolate CMA pageblocks even when they are not movable in fact +* so consider them movable here. +*/ + if (is_migrate_cma(migratetype)) + return false; + + reason = "CMA page"; + goto unmovable; + } - pfn = page_to_pfn(page); - for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) { + for (found = 0; iter < pageblock_nr_pages; iter++) { unsigned long check = pfn + iter; if (!pfn_valid_within(check)) @@ -8105,7 +8111,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, unmovable: WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE); if (flags & REPORT_FAILURE) - dump_page(pfn_to_page(pfn+iter), "unmovable page"); + dump_page(pfn_to_page(pfn + iter), reason); return true; } -- 2.20.1 (Apple Git-117)
Re: [PATCH RFC 1/2] Add polling support to pidfd
Hi Andy! On Fri, Apr 12, 2019 at 02:32:53PM -0700, Andy Lutomirski wrote: > On Thu, Apr 11, 2019 at 10:51 AM Joel Fernandes (Google) > wrote: > > > > pidfd are /proc/pid directory file descriptors referring to a task group > > leader. Android low memory killer (LMK) needs pidfd polling support to > > replace code that currently checks for existence of /proc/pid for > > knowing a process that is signalled to be killed has died, which is both > > racy and slow. The pidfd poll approach is race-free, and also allows the > > LMK to do other things (such as by polling on other fds) while awaiting > > the process being killed to die. > > > > It prevents a situation where a PID is reused between when LMK sends a > > kill signal and checks for existence of the PID, since the wrong PID is > > now possibly checked for existence. > > > > In this patch, we follow the same mechanism used uhen the parent of the > > task group is to be notified, that is when the tasks waiting on a poll > > of pidfd are also awakened. > > > > We have decided to include the waitqueue in struct pid for the following > > reasons: > > 1. The wait queue has to survive for the lifetime of the poll. Including > > it in task_struct would not be option in this case because the task can > > be reaped and destroyed before the poll returns. > > Are you sure? I admit I'm not all that familiar with the innards of > poll() on Linux, but I thought that the waitqueue only had to survive > long enough to kick the polling thread and did *not* have to survive > until poll() actually returned. I am not sure now. I thought epoll(2) was based on the wait_event APIs, however more closely looking at the eventpoll code, it looks like there are 2 waitqueues involved, one that we pass and the other that is a part of the eventpoll session itself, so you could be right about that. Daniel Colascione may have some more thoughts about it since he brought up the possiblity of a wq life-time issue. Daniel? We were just playing it safe. Either way the waitqueue in struct pid has the advantage mentioned below: > > 2. By including the struct pid for the waitqueue means that during > > de_exec, the thread doing de_thread() automatically gets the new > > waitqueue/pid even though its task_struct is different. > > I didn't follow this. Can you clarify? Sure. de_thread() can called when all threads of a thread group need to die when any thread in the group does an execve. The thread doing the execve will become the new thread leader. In this case, the thread that did the exec gets the pid of the new leader. The semantics of wait(2) are such that the wait should not return (unblock) in the above scenario because the group is non-empty even though the task_struct of the group leader died. IOW, we should not wake up any pidfd pollers in this cases. So basically what I was trying to say in point 2 above is that because of putting the waitqueue in struct pid, the change_pid() in de_thread() automatically carries the waiting tasks to the new task_struct leader, because the pid gets transferred to the new leader. If we put it in task_struct, then that wouldn't work since the leader's task_struct would get destroyed and we would have to handle the case in some other way. At least that is the theory. Anyway we specifically test for this case in patch 2/2 and also tested that not handling this case fails the test. > Also, please don't call your new helper wake_up_pidfd_pollers(). One I will call it wake_up_pollers() then, if that's Ok. > of the goals of my patch was to make it generically possible for > kernel code to wait for a task to exit. There are other cases besides > pidfd for which this would be useful. Ahem, kthread. (The kthread > implementation currently does some seriously awful things to detect > when kthreads die.) Also, some hypothetical future vastly improved > debugging API (to supercede ptrace for new applications) might want > this. Ah I see :-) Nice to know we can use this to improve the kthread code. thanks, - Joel
[PATCH] smp: Do not warn if smp_call_function_single() is doing a self call.
If smp_call_function_single() is calling the function for itself, it's safe to run with irqs_disabled() == true. I hit the warning because I'm in the below path in the .suspend callback of a "syscore_ops" to support hibernation for a VM running on Hyper-V: hv_synic_cleanup() -> clockevents_unbind_device() -> clockevents_unbind() -> smp_call_function_single(). When the .suspend callback runs, only CPU0 is online and irqs_disabled() is true. Signed-off-by: Dexuan Cui --- kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/smp.c b/kernel/smp.c index f4cf1b0bb3b8..4fdf6a378def 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -288,7 +288,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info, * can't happen. */ WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled() -&& !oops_in_progress); +&& cpu != smp_processor_id() && !oops_in_progress); csd = &csd_stack; if (!wait) { -- 2.19.1
[PATCH] vmbus: Remove the undesired put_cpu_ptr() in hv_synic_cleanup()
With CONFIG_DEBUG_PREEMPT=y, the put_cpu_ptr() triggiers an underflow warning in preempt_count_sub(). Fixes: 37cdd991fac8 ("vmbus: put related per-cpu variable together") Cc: sta...@vger.kernel.org Cc: Stephen Hemminger Signed-off-by: Dexuan Cui --- drivers/hv/hv.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c index 632d25674e7f..45653029ee18 100644 --- a/drivers/hv/hv.c +++ b/drivers/hv/hv.c @@ -408,7 +408,6 @@ int hv_synic_cleanup(unsigned int cpu) clockevents_unbind_device(hv_cpu->clk_evt, cpu); hv_ce_shutdown(hv_cpu->clk_evt); - put_cpu_ptr(hv_cpu); } hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64); -- 2.19.1
[GIT PULL] clk fixes for v5.1-rc4
The following changes since commit 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b: Linux 5.1-rc1 (2019-03-17 14:22:26 -0700) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git tags/clk-fixes-for-linus for you to fetch changes up to f89b9e1be7da8bb0aac667a0206a00975cefe6d3: clk: imx: Fix PLL_1416X not rounding rates (2019-04-12 14:21:43 -0700) Here's more than a handful of clk driver fixes for changes that came in during the merge window: - Fix the AT91 sama5d2 programmable clk prescaler formula - A bunch of Amlogic meson clk driver fixes for the VPU clks - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc clks as critical only when really needed - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate implementation - Use the right structure to test for a frequency table in i.MX's PLL_1416x driver David Müller (1): clk: x86: Add system specific quirk to mark clocks as critical Leonard Crestez (1): clk: imx: Fix PLL_1416X not rounding rates Martin Blumenstingl (1): clk: meson: pll: fix rounding and setting a rate that matches precisely Matthias Wieloch (1): clk: at91: fix programmable clock for sama5d2 Maxime Jourdan (2): clk: meson-gxbb: round the vdec dividers to closest clk: meson: g12a: fix VPU clock muxes mask Neil Armstrong (2): clk: meson-g12a: fix VPU clock parents clk: meson: vid-pll-div: remove warning and return 0 on invalid config Stephen Boyd (3): Merge tag 'meson-clk-fixes-for-5.1' of https://github.com/BayLibre/clk-meson into clk-fixes Merge tag 'meson-clk-fixes-for-5.1-v2' of https://github.com/BayLibre/clk-meson into clk-fixes platform/x86: pmc_atom: Drop __initconst on dmi table Weiyi Lu (1): clk: mediatek: fix clk-gate flag setting drivers/clk/at91/clk-programmable.c| 57 +++--- drivers/clk/at91/pmc.h | 2 + drivers/clk/at91/sama5d2.c | 10 - drivers/clk/imx/clk-pll14xx.c | 2 +- drivers/clk/mediatek/clk-gate.c| 3 +- drivers/clk/meson/clk-pll.c| 2 +- drivers/clk/meson/g12a.c | 6 +-- drivers/clk/meson/gxbb.c | 2 + drivers/clk/meson/vid-pll-div.c| 4 +- drivers/clk/x86/clk-pmc-atom.c | 14 +-- drivers/platform/x86/pmc_atom.c| 21 ++ include/linux/platform_data/x86/clk-pmc-atom.h | 3 ++ 12 files changed, 99 insertions(+), 27 deletions(-) -- Sent by a computer through tubes
Re: [PATCH] i2c: iproc: Change driver to use 'BIT' macro
On 2019-04-03 23:05, Ray Jui wrote: > Change the iProc I2C driver to use the 'BIT' macro from all '1 << XXX' > bit operations to get rid of compiler warning and improve readability of > the code All? I see lots more '1 << XXX_SHIFT' matches. I might be behind though? Anyway, if you are cleaning up, I'm just flagging that BIT(XXX_SHIFT) looks a bit clunky to me. You might consider renaming all those single-bit XXX_SHIFT macros to simple be #define XXX BIT() instead of #define XXX_SHIFT but that triggers more churn, so is obviously more error prone. You might not dare it? Cheers, Peter > Signed-off-by: Ray Jui > --- > drivers/i2c/busses/i2c-bcm-iproc.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/i2c/busses/i2c-bcm-iproc.c > b/drivers/i2c/busses/i2c-bcm-iproc.c > index 562942d0c05c..a845b8decac8 100644 > --- a/drivers/i2c/busses/i2c-bcm-iproc.c > +++ b/drivers/i2c/busses/i2c-bcm-iproc.c > @@ -717,7 +717,7 @@ static int bcm_iproc_i2c_xfer_single_msg(struct > bcm_iproc_i2c_dev *iproc_i2c, > > /* mark the last byte */ > if (i == msg->len - 1) > - val |= 1 << M_TX_WR_STATUS_SHIFT; > + val |= BIT(M_TX_WR_STATUS_SHIFT); > > iproc_i2c_wr_reg(iproc_i2c, M_TX_OFFSET, val); > } > @@ -844,7 +844,7 @@ static int bcm_iproc_i2c_cfg_speed(struct > bcm_iproc_i2c_dev *iproc_i2c) > > iproc_i2c->bus_speed = bus_speed; > val = iproc_i2c_rd_reg(iproc_i2c, TIM_CFG_OFFSET); > - val &= ~(1 << TIM_CFG_MODE_400_SHIFT); > + val &= ~BIT(TIM_CFG_MODE_400_SHIFT); > val |= (bus_speed == 40) << TIM_CFG_MODE_400_SHIFT; > iproc_i2c_wr_reg(iproc_i2c, TIM_CFG_OFFSET, val); > > @@ -995,7 +995,7 @@ static int bcm_iproc_i2c_resume(struct device *dev) > > /* configure to the desired bus speed */ > val = iproc_i2c_rd_reg(iproc_i2c, TIM_CFG_OFFSET); > - val &= ~(1 << TIM_CFG_MODE_400_SHIFT); > + val &= ~BIT(TIM_CFG_MODE_400_SHIFT); > val |= (iproc_i2c->bus_speed == 40) << TIM_CFG_MODE_400_SHIFT; > iproc_i2c_wr_reg(iproc_i2c, TIM_CFG_OFFSET, val); > >
[PATCH] platform/chrome: wilco_ec: Add telemetry data char device interface
The Wilco Embedded Controller is able to send telemetry data which is useful for enterprise applications. A daemon running on the OS sends a command to the EC via write() to char device, and can read the response with a read() request. Both the request and the response are in an opaque binary format so that information which is proprietary to the enterprise service provider is secure. The character device will appear as /dev/wilco_telemN, where N is some small non-negative integer, starting with 0. Only one process may have the file descriptor open at a time. The calling userspace program needs to keep the device file descriptor open between the calls to write() and read() in order to preserve the response. Either 32 or 256 bytes of data are expected for arguments, and the same number of bytes will be returned. For testing purposes, if the EC receives a byte sequence beginning with 0, it will return an inverted copy of the input sequence. For an example, run the simple python script from https://gist.github.com/52ab07c8519b56c0ec671d3338760516 Signed-off-by: Nick Crews --- drivers/platform/chrome/wilco_ec/Kconfig | 7 + drivers/platform/chrome/wilco_ec/Makefile| 2 + drivers/platform/chrome/wilco_ec/core.c | 13 + drivers/platform/chrome/wilco_ec/telemetry.c | 312 +++ include/linux/platform_data/wilco-ec.h | 2 + 5 files changed, 336 insertions(+) create mode 100644 drivers/platform/chrome/wilco_ec/telemetry.c diff --git a/drivers/platform/chrome/wilco_ec/Kconfig b/drivers/platform/chrome/wilco_ec/Kconfig index e09e4cebe9b4..7846f6146b78 100644 --- a/drivers/platform/chrome/wilco_ec/Kconfig +++ b/drivers/platform/chrome/wilco_ec/Kconfig @@ -18,3 +18,10 @@ config WILCO_EC_DEBUGFS manipulation and allow for testing arbitrary commands. This interface is intended for debug only and will not be present on production devices. + +config WILCO_EC_TELEMETRY + tristate "Enable querying telemetry data from EC" + depends on WILCO_EC + help + If you say Y here, you get support to query opaque binary + telemetrydata from /dev/wilco_telem using write() and then read(). diff --git a/drivers/platform/chrome/wilco_ec/Makefile b/drivers/platform/chrome/wilco_ec/Makefile index 063e7fb4ea17..b4aa6d26a3df 100644 --- a/drivers/platform/chrome/wilco_ec/Makefile +++ b/drivers/platform/chrome/wilco_ec/Makefile @@ -4,3 +4,5 @@ wilco_ec-objs := core.o mailbox.o obj-$(CONFIG_WILCO_EC) += wilco_ec.o wilco_ec_debugfs-objs := debugfs.o obj-$(CONFIG_WILCO_EC_DEBUGFS) += wilco_ec_debugfs.o +wilco_ec_telem-objs:= telemetry.o +obj-$(CONFIG_WILCO_EC_TELEMETRY) += wilco_ec_telem.o diff --git a/drivers/platform/chrome/wilco_ec/core.c b/drivers/platform/chrome/wilco_ec/core.c index 05e1e2be1c91..517d9627fecc 100644 --- a/drivers/platform/chrome/wilco_ec/core.c +++ b/drivers/platform/chrome/wilco_ec/core.c @@ -89,8 +89,20 @@ static int wilco_ec_probe(struct platform_device *pdev) goto unregister_debugfs; } + /* Register child device that will be found by the telemetry driver. */ + ec->telem_pdev = platform_device_register_data(dev, "wilco_telem", + PLATFORM_DEVID_AUTO, + NULL, 0); + if (IS_ERR(ec->telem_pdev)) { + dev_err(dev, "Failed to create telemetry platform device\n"); + ret = PTR_ERR(ec->telem_pdev); + goto unregister_rtc; + } + return 0; +unregister_rtc: + platform_device_unregister(ec->rtc_pdev); unregister_debugfs: if (ec->debugfs_pdev) platform_device_unregister(ec->debugfs_pdev); @@ -102,6 +114,7 @@ static int wilco_ec_remove(struct platform_device *pdev) { struct wilco_ec_device *ec = platform_get_drvdata(pdev); + platform_device_unregister(ec->telem_pdev); platform_device_unregister(ec->rtc_pdev); if (ec->debugfs_pdev) platform_device_unregister(ec->debugfs_pdev); diff --git a/drivers/platform/chrome/wilco_ec/telemetry.c b/drivers/platform/chrome/wilco_ec/telemetry.c new file mode 100644 index ..6b3dc84bd0f2 --- /dev/null +++ b/drivers/platform/chrome/wilco_ec/telemetry.c @@ -0,0 +1,312 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Telemetry communication for Wilco EC + * + * Copyright 2019 Google LLC + * + * The Wilco Embedded Controller is able to send telemetry data + * which is useful for enterprise applications. A daemon running on + * the OS sends a command to the EC via a write() to a char device, and + * can read the response with a read() request. Both the request + * and the response are in an opaque binary format so that information + * which is proprietary to the enterprise service provider is secure. + * + * The
Re: [PATCH 0/4] mm: memcontrol: memory.stat cost & correctness
On Fri, Apr 12, 2019 at 11:15:03AM -0400, Johannes Weiner wrote: > The cgroup memory.stat file holds recursive statistics for the entire > subtree. The current implementation does this tree walk on-demand > whenever the file is read. This is giving us problems in production. > > 1. The cost of aggregating the statistics on-demand is high. A lot of > system service cgroups are mostly idle and their stats don't change > between reads, yet we always have to check them. There are also always > some lazily-dying cgroups sitting around that are pinned by a handful > of remaining page cache; the same applies to them. > > In an application that periodically monitors memory.stat in our fleet, > we have seen the aggregation consume up to 5% CPU time. > > 2. When cgroups die and disappear from the cgroup tree, so do their > accumulated vm events. The result is that the event counters at > higher-level cgroups can go backwards and confuse some of our > automation, let alone people looking at the graphs over time. > > To address both issues, this patch series changes the stat > implementation to spill counts upwards when the counters change. > > The upward spilling is batched using the existing per-cpu cache. In a > sparse file stress test with 5 level cgroup nesting, the additional > cost of the flushing was negligible (a little under 1% of CPU at 100% > CPU utilization, compared to the 5% of reading memory.stat during > regular operation). > > include/linux/memcontrol.h | 96 +++--- > mm/memcontrol.c| 290 +++ > mm/vmscan.c| 4 +- > mm/workingset.c| 7 +- > 4 files changed, 234 insertions(+), 163 deletions(-) > > For the series: Reviewed-by: Roman Gushchin Thanks!
Re: [PATCH v2 2/7] dt: bindings: Add multicolor class dt bindings documention
All On 4/12/19 2:10 PM, Jacek Anaszewski wrote: > Dan, > > On 4/12/19 1:50 PM, Dan Murphy wrote: >> Marek >> >> On 4/11/19 5:07 PM, Marek Behun wrote: >>> Hi Dan, >>> this probaly was discussed, but I did not follow brightness model >>> discussions: >>> what will happen if I set yellow by writing into yellow mode >>> brightness, and then orange by writing orange model brightness? >>> Will the resulting color be a mix of yellow and orange, or will the >>> orange overwrite the yellow setting? >>> >> >> This was not discussed and is a good question. If you write the yellow mode >> for a group of >> LEDs then yellow would be produced for the brightness requested. >> >> If orange is then requested then orange should be displayed at the >> brightness level requested. >> So yes the orange will over write the yellow. > > Yes, and individual color brightness levels should correspond > to the color components of the brightness-model level currently set. > >> The next question is if the absolute colors are written does it produce the >> same behavior? >> >> So if you have yellow and write to the red should the red LED brightness be >> modified or should the >> color switch to red? >> And if the red LED is on and the blue LED is written should the color switch >> to blue or should the blue and red LEDs be mixed together? > > Now, if any of the color brightness files is altered it should update > the hardware with this new setting, but brightness-model and main > brightness level should not be changed. The thing that is missing in our > proposal is lack of the way to check if brightness-model is up to date > (i.e. if it reflects what is written to the hardware). > > How about utilizing the sync file from the new colors directory? > It could return 1 on read when brightness levels of all colors > match exactly the ones assigned to the brightness model level currently > set. > There maybe a more pragmatic way to set the color brightness as opposed to defining each level. By defining the monochrome LEDs base maximum brightness for the brightness model and calculating a percentage for each LED from max I shared this with Jacek earlier and he did not go for it but lets see if others agree or disagree. With this method we don't have to define a slew of levels. This in theory may work but not sure it will work in practice. For instance lp5024_model_yellow: brightness-models { led-brightness-model; model@0 { model_name = "yellow"; layout = ; max_level = <255 247 196>; //max level of the monochrome LEDs to obtain the color model. }; }; Assumption made that a linear slope for each color model is expected. Then in the code when a new brightness level is requested for the color the brightness of the monochrome LEDs would be based off the percentage from the max_brightness of the max in the max_level array The hue is presented as just another color in the colors dir with brightness and max_brightness files. So it "appears" as a single LED. max_brightness = 255 This is the max of the LEDs in the array and this would also be presented to the user in the max_brightness of the hue. Red = (255/255) = 100% Green = (247/255) = 97% Blue = (196/255) = 77% brightness_val 128 Red = (128 * 100) / 100 = 128 Green = (128 * 97) / 100 = 124 Blue = (128 * 77) / 100 = 98 brightness_val 80 Red = (80 * 100) / 100 = 80 Green = (80 * 97) / 100 = 77 Blue = (80 * 77) / 100 = 61 brightness_val 10 Red = (10 * 100) / 100 = 10 Green = (10 * 97) / 100 = 9 Blue = (10 * 77) / 100 = 7 This method maintains a rough percentage delta between the LEDs. Where in the MAX brightness case (255) blue is 77% dimmer in intensity then Red and with a brightness that is 1/4 max the delta percentage is 75% so the delta is within 5% tolerance. This would keep the directory structures clean and the user space would just need to know colors, brightness and max_brightness for each color. Also with this simplistic model we can simplify the DT nodes to lp5024_model_yellow: brightness-models { led-brightness-model; model@0 { model_name = "yellow"; layout = ; max_level = <255 247 196>; }; }; lp5024_model_orange: brightness-models { led-brightness-model; model@1 { model_name = "orange"; layout = ; max_level = <236 140 16>; }; }; For orange max_brightness = 236 This is the max of the LEDs in the array and this would also be presented to the user in the max_brightness of the hue. Red = (236/236) = 100% Green = (140/236) = 59% Blue = (16/236) = 6% After all of this most likely someone will say this will not work. Like I indicate
[PATCH] binder: take read mode of mmap_sem in binder_alloc_free_page()
Restore the behavior of locking mmap_sem for reading in binder_alloc_free_page(), as was first done in commit 3013bf62b67a ("binder: reduce mmap_sem write-side lock"). That change was inadvertently reverted by commit 5cec2d2e5839 ("binder: fix race between munmap() and direct reclaim"). In addition, change the name of the label for the error path to accurately reflect that we're taking the lock for reading. Backporting note: This fix is only needed when *both* of the commits mentioned above are applied. That's an unlikely situation since they both landed during the development of v5.1 but only one of them is targeted for stable. Fixes: 5cec2d2e5839 ("binder: fix race between munmap() and direct reclaim") Signed-off-by: Tyler Hicks --- Hello - I was backporting 5cec2d2e5839 and didn't think it looked quite right. The patch description doesn't specifically mention a need to acquire mmap_sem for writing and the two commits (5cec2d2e5839 and 3013bf62b67a) landed relatively close in time to each other so I have a feeling that a mistake could have happened while resolving conflicts. I also noticed that commit 3013bf62b67a was slightly incomplete because it didn't rename the err_down_write_mmap_sem_failed label to err_down_read_mmap_sem_failed. I've tested this change by enabling CONFIG_ANDROID_BINDER_IPC_SELFTEST and verified that the binderfs_test selftest passes. Tyler drivers/android/binder_alloc.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index 195f120c4e8c..bb929eb87116 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -931,8 +931,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item, mm = alloc->vma_vm_mm; if (!mmget_not_zero(mm)) goto err_mmget; - if (!down_write_trylock(&mm->mmap_sem)) - goto err_down_write_mmap_sem_failed; + if (!down_read_trylock(&mm->mmap_sem)) + goto err_down_read_mmap_sem_failed; vma = binder_alloc_get_vma(alloc); list_lru_isolate(lru, item); @@ -945,7 +945,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item, trace_binder_unmap_user_end(alloc, index); } - up_write(&mm->mmap_sem); + up_read(&mm->mmap_sem); mmput(mm); trace_binder_unmap_kernel_start(alloc, index); @@ -959,7 +959,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item, mutex_unlock(&alloc->mutex); return LRU_REMOVED_RETRY; -err_down_write_mmap_sem_failed: +err_down_read_mmap_sem_failed: mmput_async(mm); err_mmget: err_page_already_freed: -- 2.7.4
[PATCH v1 1/1] Add support for IPMB driver
Support receiving IPMB requests on a Satellite MC from the BMC. Once a response is ready, this driver will send back a response to the BMC via the IPMB channel. Signed-off-by: Asmaa Mnebhi --- drivers/char/ipmi/Kconfig| 8 + drivers/char/ipmi/Makefile | 1 + drivers/char/ipmi/ipmb_dev_int.c | 418 +++ 3 files changed, 427 insertions(+) create mode 100644 drivers/char/ipmi/ipmb_dev_int.c diff --git a/drivers/char/ipmi/Kconfig b/drivers/char/ipmi/Kconfig index 94719fc..784bc84 100644 --- a/drivers/char/ipmi/Kconfig +++ b/drivers/char/ipmi/Kconfig @@ -74,6 +74,14 @@ config IPMI_SSIF have a driver that must be accessed over an I2C bus instead of a standard interface. This module requires I2C support. +config IPMB_DEVICE_INTERFACE + tristate 'IPMB Interface handler' + select I2C + help + Provides a driver for a device (Satellite MC) to + receive requests and send responses back to the BMC via + the IPMB interface. This module requires I2C support. + config IPMI_POWERNV depends on PPC_POWERNV tristate 'POWERNV (OPAL firmware) IPMI interface' diff --git a/drivers/char/ipmi/Makefile b/drivers/char/ipmi/Makefile index 3f06b20..0822adc 100644 --- a/drivers/char/ipmi/Makefile +++ b/drivers/char/ipmi/Makefile @@ -26,3 +26,4 @@ obj-$(CONFIG_IPMI_KCS_BMC) += kcs_bmc.o obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += bt-bmc.o obj-$(CONFIG_ASPEED_KCS_IPMI_BMC) += kcs_bmc_aspeed.o obj-$(CONFIG_NPCM7XX_KCS_IPMI_BMC) += kcs_bmc_npcm7xx.o +obj-$(CONFIG_IPMB_DEVICE_INTERFACE) += ipmb_dev_int.o diff --git a/drivers/char/ipmi/ipmb_dev_int.c b/drivers/char/ipmi/ipmb_dev_int.c new file mode 100644 index 000..a45a686 --- /dev/null +++ b/drivers/char/ipmi/ipmb_dev_int.c @@ -0,0 +1,418 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Mellanox IPMB driver to receive a request and send a response + * + * Copyright (C) 2018 Mellanox Techologies, Ltd. + * + * This was inspired by Brendan Higgins' ipmi-bmc-bt-i2c driver. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#definePFX "IPMB DEV INT: " + +#defineDEVICE_NAME "ipmb-dev" + +#defineMAX_MSG_LEN 128 +#defineIPMB_REQUEST_LEN_MIN7 +#defineNETFN_RSP_BIT_MASK 0x4 +#defineREQUEST_QUEUE_MAX_LEN 256 + +#defineIPMB_MSG_LEN_IDX0 +#defineRQ_SA_8BIT_IDX 1 +#defineNETFN_LUN_IDX 2 + +#defineIPMB_MSG_PAYLOAD_LEN_MAX \ + (MAX_MSG_LEN - IPMB_REQUEST_LEN_MIN - 1) + +struct ipmb_msg { + u8 len; + u8 rs_sa; + u8 netfn_rs_lun; + u8 checksum1; + u8 rq_sa; + u8 rq_seq_rq_lun; + u8 cmd; + u8 payload[IPMB_MSG_PAYLOAD_LEN_MAX]; + /* checksum2 is included in payload */ +} __packed; + +static u32 ipmb_msg_len(struct ipmb_msg *ipmb_msg) +{ + return ipmb_msg->len + 1; +} + +struct ipmb_request_elem { + struct list_head list; + struct ipmb_msg request; +}; + +struct ipmb_dev { + struct i2c_client *client; + struct miscdevice miscdev; + struct ipmb_msg request; + struct list_head request_queue; + atomic_t request_queue_len; + struct ipmb_msg response; + size_t msg_idx; + spinlock_t lock; + wait_queue_head_t wait_queue; + struct mutex file_mutex; +}; + +static int receive_ipmb_request(struct ipmb_dev *ipmb_dev_p, + bool non_blocking, + struct ipmb_msg *ipmb_request) +{ + struct ipmb_request_elem *queue_elem; + unsigned long flags; + int res; + + if (!non_blocking) { + +try_again: + res = wait_event_interruptible(ipmb_dev_p->wait_queue, + atomic_read(&ipmb_dev_p->request_queue_len)); + if (res) + return res; + } + + spin_lock_irqsave(&ipmb_dev_p->lock, flags); + + if (!atomic_read(&ipmb_dev_p->request_queue_len)) { + spin_unlock_irqrestore(&ipmb_dev_p->lock, flags); + if (non_blocking) + return -EAGAIN; + goto try_again; + } + + if (list_empty(&ipmb_dev_p->request_queue)) { + pr_err(PFX "request_queue is empty\n"); + return -EIO; + } + + queue_elem = list_first_entry(&ipmb_dev_p->request_queue, +
[PATCH v1 0/1] Add support for IPMB driver
Support receiving IPMB requests on a Satellite MC from the BMC. Once a response is ready, this driver will send back a response to the BMC via the IPMB channel. Asmaa Mnebhi (1): Add support for IPMB driver drivers/char/ipmi/Kconfig| 8 + drivers/char/ipmi/Makefile | 1 + drivers/char/ipmi/ipmb_dev_int.c | 418 +++ 3 files changed, 427 insertions(+) create mode 100644 drivers/char/ipmi/ipmb_dev_int.c -- 2.1.2
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Thu, Apr 11, 2019 at 10:51 AM Joel Fernandes (Google) wrote: > > pidfd are /proc/pid directory file descriptors referring to a task group > leader. Android low memory killer (LMK) needs pidfd polling support to > replace code that currently checks for existence of /proc/pid for > knowing a process that is signalled to be killed has died, which is both > racy and slow. The pidfd poll approach is race-free, and also allows the > LMK to do other things (such as by polling on other fds) while awaiting > the process being killed to die. > > It prevents a situation where a PID is reused between when LMK sends a > kill signal and checks for existence of the PID, since the wrong PID is > now possibly checked for existence. > > In this patch, we follow the same mechanism used uhen the parent of the > task group is to be notified, that is when the tasks waiting on a poll > of pidfd are also awakened. > > We have decided to include the waitqueue in struct pid for the following > reasons: > 1. The wait queue has to survive for the lifetime of the poll. Including > it in task_struct would not be option in this case because the task can > be reaped and destroyed before the poll returns. Are you sure? I admit I'm not all that familiar with the innards of poll() on Linux, but I thought that the waitqueue only had to survive long enough to kick the polling thread and did *not* have to survive until poll() actually returned. > > 2. By including the struct pid for the waitqueue means that during > de_exec, the thread doing de_thread() automatically gets the new > waitqueue/pid even though its task_struct is different. I didn't follow this. Can you clarify? Also, please don't call your new helper wake_up_pidfd_pollers(). One of the goals of my patch was to make it generically possible for kernel code to wait for a task to exit. There are other cases besides pidfd for which this would be useful. Ahem, kthread. (The kthread implementation currently does some seriously awful things to detect when kthreads die.) Also, some hypothetical future vastly improved debugging API (to supercede ptrace for new applications) might want this. --Andy
Re: [PATCH v2 1/1] rtc: ds3232: get SRAM access using NVMEM Framework
Hi, On 08/04/2019 08:52:39+, Han Nandor wrote: > static int ds3232_probe(struct device *dev, struct regmap *regmap, int irq, > const char *name) > { > struct ds3232 *ds3232; > int ret; > + struct nvmem_config nvmem_cfg; > > ds3232 = devm_kzalloc(dev, sizeof(*ds3232), GFP_KERNEL); > if (!ds3232) > @@ -476,6 +499,15 @@ static int ds3232_probe(struct device *dev, struct > regmap *regmap, int irq, > ds3232->dev = dev; > dev_set_drvdata(dev, ds3232); > > + nvmem_cfg.name = "ds3232_sram"; > + nvmem_cfg.stride = 1; > + nvmem_cfg.size = DS3232_REG_SRAM_SIZE; > + nvmem_cfg.word_size = 1; > + nvmem_cfg.reg_read = ds3232_nvmem_read; > + nvmem_cfg.reg_write = ds3232_nvmem_write; > + nvmem_cfg.priv = ds3232; > + nvmem_cfg.type = NVMEM_TYPE_BATTERY_BACKED; > + You have to either initialize at the declaration as done here: https://elixir.bootlin.com/linux/v5.0/source/drivers/rtc/rtc-rv8803.c#L530 or memset the struct to 0. Else, this leaves members uninitialized. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
Re: [PATCH v3 0/9] klp-convert livepatch build tooling
On Wed, Apr 10, 2019 at 11:50:49AM -0400, Joe Lawrence wrote: > Hi folks, > > This is the third installment of the klp-convert tool for generating and > processing livepatch symbols for livepatch module builds. For those > following along at home, archive links to previous versions: > > RFC: > https://lore.kernel.org/lkml/cover.1477578530.git.jpoim...@redhat.com/ > v2: > > https://lore.kernel.org/lkml/f52d29f7-7d1b-ad3d-050b-a9fa8878f...@redhat.com/ > > (Note that I don't see v2 archived at lore, but that is a link to the > most recent subthread that lore did catch.) > > > Livepatches may use symbols which are not contained in its own scope, > and, because of that, may end up compiled with relocations that will > only be resolved during module load. Yet, when the referenced symbols are > not exported, solving this relocation requires information on the object > that holds the symbol (either vmlinux or modules) and its position inside > the object, as an object may contain multiple symbols with the same name. > Providing such information must be done accordingly to what is specified > in Documentation/livepatch/module-elf-format.txt. > > Currently, there is no trivial way to embed the required information as > requested in the final livepatch elf object. klp-convert solves this > problem in two different forms: (i) by relying on a symbol map, which is > built during kernel compilation, to automatically infer the relocation > targeted symbol, and, when such inference is not possible (ii) by using > annotations in the elf object to convert the relocation accordingly to > the specification, enabling it to be handled by the livepatch loader. > > Given the above, add support for symbol mapping in the form of > Symbols.list file; add klp-convert tool; integrate klp-convert tool into > kbuild; make livepatch modules discernible during kernel compilation > pipeline; add data-structure and macros to enable users to annotate > livepatch source code; make modpost stage compatible with livepatches; > update livepatch-sample and update documentation. > > The patch was tested under three use-cases: > > use-case 1: There is a relocation in the lp that can be automatically > resolved by klp-convert. For example. see the saved_command_line > variable in lib/livepatch/test_klp_convert2.c. > > use-case 2: There is a relocation in the lp that cannot be automatically > resolved, as the name of the respective symbol appears in multiple > objects. The livepatch contains an annotation to enable a correct > relocation. See the KLP_MODULE_RELOC / KLP_SYMPOS annotation sections > in lib/livepatch/test_klp_convert{1,2}.c. > > use-case 3: There is a relocation in the lp that cannot be automatically > resolved similarly as 2, but no annotation was provided in the > livepatch, triggering an error during compilation. Reproducible by > removing the KLP_MODULE_RELOC / KLP_SYMPOS annotation sections in > lib/livepatch/test_klp_convert{1,2}.c. > > > Branches > > > Since I spent some time tinkering with v2 and accumulated a bunch of > fixes, I collected them up and am posting this new version. For review > purposes, I posted two branches up to github: > > 1 - an expanded branch that with changes separate from the original > https://github.com/joe-lawrence/linux/commits/klp-convert-v3-expanded > > 2 - a squashed branch of (1) that comprises v3: > https://github.com/joe-lawrence/linux/commits/klp-convert-v3 > > Non-trivial commits in the expanded branch have some extra commentary > and details for debugging in the commit message that were dropped when > squashing into their respective parent commits. > > > TODO > > > There are a few outstanding items that I came across while reviewing v2, > but as changes started accumulating, it made sense to defer those to a > later version. I'll reply to this thread individual topics to start > discussion sub-threads for those. > > > Changelogs > -- > > The commit changelogs were getting long, so I've moved them here for > posterity and review purposes: > > livepatch: Create and include UAPI headers > v2: > - jmoreira: split up and changelog > v3: > - joe: convert to SPDX license tags > > kbuild: Support for Symbols.list creation > v3: > - jmoreira: adjust for multiobject livepatch > - joe: add klpclean to PHONY > - joe: align KLP prefix > > livepatch: Add klp-convert tool > v2: > - khlebnikov: use HOSTLOADLIBES_ instead of HOSTLDFLAGS: -lelf must be > at the end > - jmoreira: add support to automatic relocation conversion in > klp-convert.c, changelog > v3: > - joe: convert to SPDX license tags > - jmoreira: add rela symbol name to WARNs > - jmoreira: ignore relocations to .TOC and symbols with index 0 > - joe: separate and fix valid_sympos() sympos=0 and sympos=1..n checks > - joe: fix symbol use-after-frees > - joe: fix remaining valgrind leak complaints (non-error paths only) > - joe: checkpatc
[ANNOUNCE] v5.0.7-rt5
Dear RT folks! I'm pleased to announce the v5.0.7-rt5 patch set. Changes since v5.0.7-rt4: - Update "x86: load FPU registers on return to userland" from v7 to v9. - Update "clocksource: improve Atmel TCB timer driver" from v7 to latest post by Alexandre Belloni. I hope this works, my HW refuses to cooperate so I can't verify. - Avoid allocating a spin lock with disabled interrupts in i915. Known issues - A warning triggered in "rcu_note_context_switch" originated from SyS_timer_gettime(). The issue was always there, it is now visible. Reported by Grygorii Strashko and Daniel Wagner. - rcutorture is currently broken on -RT. Reported by Juri Lelli. The delta patch against v5.0.7-rt4 is appended below and can be found here: https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.0/incr/patch-5.0.7-rt4-rt5.patch.xz You can get this release via the git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v5.0.7-rt5 The RT patch against v5.0.7 can be found here: https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.0/older/patch-5.0.7-rt5.patch.xz The split quilt queue is available at: https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.0/older/patches-5.0.7-rt5.tar.xz Sebastian diff --git a/arch/arm/configs/at91_dt_defconfig b/arch/arm/configs/at91_dt_defconfig index f4b253bd05ede..c8876d0ca41a8 100644 --- a/arch/arm/configs/at91_dt_defconfig +++ b/arch/arm/configs/at91_dt_defconfig @@ -65,6 +65,7 @@ CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=4 CONFIG_BLK_DEV_RAM_SIZE=8192 +CONFIG_ATMEL_TCLIB=y CONFIG_ATMEL_SSC=y CONFIG_SCSI=y CONFIG_BLK_DEV_SD=y diff --git a/arch/arm/configs/sama5_defconfig b/arch/arm/configs/sama5_defconfig index c2dc35dfb3215..10ebc9481f72c 100644 --- a/arch/arm/configs/sama5_defconfig +++ b/arch/arm/configs/sama5_defconfig @@ -76,6 +76,7 @@ CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=4 CONFIG_BLK_DEV_RAM_SIZE=8192 +CONFIG_ATMEL_TCLIB=y CONFIG_ATMEL_SSC=y CONFIG_EEPROM_AT24=y CONFIG_SCSI=y diff --git a/arch/arm/mach-at91/Kconfig b/arch/arm/mach-at91/Kconfig index fa493a86e2bb3..da1d97a06c53a 100644 --- a/arch/arm/mach-at91/Kconfig +++ b/arch/arm/mach-at91/Kconfig @@ -121,10 +121,8 @@ config ATMEL_CLOCKSOURCE_PIT config ATMEL_CLOCKSOURCE_TCB bool "Timer Counter Blocks (TCB) support" - depends on SOC_AT91RM9200 || SOC_AT91SAM9 || SOC_SAMA5 || COMPILE_TEST default SOC_AT91RM9200 || SOC_AT91SAM9 || SOC_SAMA5 - depends on !ATMEL_TCLIB - select ATMEL_ARM_TCB_CLKSRC + select ATMEL_TCB_CLKSRC help Select this to get a high precision clocksource based on a TC block with a 5+ MHz base clock rate. diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 749ee389a1178..33e2294b5a675 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -120,7 +121,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu); err;\ }) -#define kernel_insn_norestore(insn, output, input...) \ +#define kernel_insn_err(insn, output, input...) \ ({ \ int err;\ asm volatile("1:" #insn "\n\t" \ @@ -141,6 +142,22 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu); _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore) \ : output : input) +static inline int copy_fregs_to_user(struct fregs_state __user *fx) +{ + return user_insn(fnsave %[fx]; fwait, [fx] "=m" (*fx), "m" (*fx)); +} + +static inline int copy_fxregs_to_user(struct fxregs_state __user *fx) +{ + if (IS_ENABLED(CONFIG_X86_32)) + return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx)); + else if (IS_ENABLED(CONFIG_AS_FXSAVEQ)) + return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx)); + + /* See comment in copy_fxregs_to_kernel() below. */ + return user_insn(rex64/fxsave (%[fx]), "=m" (*fx), [fx] "R" (fx)); +} + static inline void copy_kernel_to_fxregs(struct fxregs_state *fx) { if (IS_ENABLED(CONFIG_X86_32)) { @@ -155,15 +172,23 @@ static inline void copy_kernel_to_fxregs(struct fxregs_state *fx) } } -static inline int copy_users_to_fxregs(struct fxregs_state *fx) +static inline int copy_kernel_to_fxregs_err(struct fxregs_state *fx) { if (IS_ENABLED(CONFIG_X86_32)) - return kernel_insn_norestore(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx)); + return kernel_insn_err(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx)); + else +
[PATCH v4] chrome/platform: cros_ec_proto: Add trace event to trace EC commands
This is useful to see which EC commands are being executed and when. To enable: echo 'cros_ec:*' >> /sys/kernel/debug/tracing/set_event Example: /* cros_ec_cmd: version: 0, command: EC_CMD_GET_VERSION */ /* cros_ec_cmd: version: 0, command: EC_CMD_GET_PROTOCOL_INFO */ /* cros_ec_cmd: version: 1, command: EC_CMD_GET_CMD_VERSIONS */ /* cros_ec_cmd: version: 1, command: EC_CMD_USB_PD_CONTROL */ Signed-off-by: Raul E Rangel Reviewed-by: Ross Zwisler --- Changes in v4: - Use the full command name so go to definition continues to work in my editor. Changes in v3: - Use a macro to avoid duplicating the ec command names. Changes in v2: - Changed comment style to match other cros_ec files. - Fixed commit tag. drivers/platform/chrome/Makefile| 4 +- drivers/platform/chrome/cros_ec_proto.c | 4 + drivers/platform/chrome/cros_ec_trace.c | 162 drivers/platform/chrome/cros_ec_trace.h | 51 4 files changed, 220 insertions(+), 1 deletion(-) create mode 100644 drivers/platform/chrome/cros_ec_trace.c create mode 100644 drivers/platform/chrome/cros_ec_trace.h diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile index 1e2f0029b597..e542268454a4 100644 --- a/drivers/platform/chrome/Makefile +++ b/drivers/platform/chrome/Makefile @@ -3,12 +3,14 @@ obj-$(CONFIG_CHROMEOS_LAPTOP) += chromeos_laptop.o obj-$(CONFIG_CHROMEOS_PSTORE) += chromeos_pstore.o obj-$(CONFIG_CHROMEOS_TBMC)+= chromeos_tbmc.o +# tell define_trace.h where to find the cros ec trace header +CFLAGS_cros_ec_trace.o:= -I$(src) obj-$(CONFIG_CROS_EC_I2C) += cros_ec_i2c.o obj-$(CONFIG_CROS_EC_SPI) += cros_ec_spi.o cros_ec_lpcs-objs := cros_ec_lpc.o cros_ec_lpc_reg.o cros_ec_lpcs-$(CONFIG_CROS_EC_LPC_MEC) += cros_ec_lpc_mec.o obj-$(CONFIG_CROS_EC_LPC) += cros_ec_lpcs.o -obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o +obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o cros_ec_trace.o obj-$(CONFIG_CROS_KBD_LED_BACKLIGHT) += cros_kbd_led_backlight.o obj-$(CONFIG_CROS_EC_LIGHTBAR) += cros_ec_lightbar.o obj-$(CONFIG_CROS_EC_VBC) += cros_ec_vbc.o diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c index 97a068dff192..3d02c8259ac6 100644 --- a/drivers/platform/chrome/cros_ec_proto.c +++ b/drivers/platform/chrome/cros_ec_proto.c @@ -10,6 +10,8 @@ #include #include +#include "cros_ec_trace.h" + #define EC_COMMAND_RETRIES 50 static int prepare_packet(struct cros_ec_device *ec_dev, @@ -51,6 +53,8 @@ static int send_command(struct cros_ec_device *ec_dev, int ret; int (*xfer_fxn)(struct cros_ec_device *ec, struct cros_ec_command *msg); + trace_cros_ec_cmd(msg); + if (ec_dev->proto_version > 2) xfer_fxn = ec_dev->pkt_xfer; else diff --git a/drivers/platform/chrome/cros_ec_trace.c b/drivers/platform/chrome/cros_ec_trace.c new file mode 100644 index ..21c497e7a2d1 --- /dev/null +++ b/drivers/platform/chrome/cros_ec_trace.c @@ -0,0 +1,162 @@ +// SPDX-License-Identifier: GPL-2.0 +// Trace events for the ChromeOS Embedded Controller +// +// Copyright 2019 Google LLC. + +#define TRACE_SYM(a) {a, #a} + +#define ec_cmds \ + TRACE_SYM(EC_CMD_PROTO_VERSION), \ + TRACE_SYM(EC_CMD_HELLO), \ + TRACE_SYM(EC_CMD_GET_VERSION), \ + TRACE_SYM(EC_CMD_READ_TEST), \ + TRACE_SYM(EC_CMD_GET_BUILD_INFO), \ + TRACE_SYM(EC_CMD_GET_CHIP_INFO), \ + TRACE_SYM(EC_CMD_GET_BOARD_VERSION), \ + TRACE_SYM(EC_CMD_READ_MEMMAP), \ + TRACE_SYM(EC_CMD_GET_CMD_VERSIONS), \ + TRACE_SYM(EC_CMD_GET_COMMS_STATUS), \ + TRACE_SYM(EC_CMD_TEST_PROTOCOL), \ + TRACE_SYM(EC_CMD_GET_PROTOCOL_INFO), \ + TRACE_SYM(EC_CMD_GSV_PAUSE_IN_S5), \ + TRACE_SYM(EC_CMD_GET_FEATURES), \ + TRACE_SYM(EC_CMD_GET_SKU_ID), \ + TRACE_SYM(EC_CMD_SET_SKU_ID), \ + TRACE_SYM(EC_CMD_FLASH_INFO), \ + TRACE_SYM(EC_CMD_FLASH_READ), \ + TRACE_SYM(EC_CMD_FLASH_WRITE), \ + TRACE_SYM(EC_CMD_FLASH_ERASE), \ + TRACE_SYM(EC_CMD_FLASH_PROTECT), \ + TRACE_SYM(EC_CMD_FLASH_REGION_INFO), \ + TRACE_SYM(EC_CMD_VBNV_CONTEXT), \ + TRACE_SYM(EC_CMD_FLASH_SPI_INFO), \ + TRACE_SYM(EC_CMD_FLASH_SELECT), \ + TRACE_SYM(EC_CMD_PWM_GET_FAN_TARGET_RPM), \ + TRACE_SYM(EC_CMD_PWM_SET_FAN_TARGET_RPM), \ + TRACE_SYM(EC_CMD_PWM_GET_KEYBOARD_BACKLIGHT), \ + TRACE_SYM(EC_CMD_PWM_SET_KEYBOARD_BACKLIGHT), \ + TRACE_SYM(EC_CMD_PWM_SET_FAN_DUTY), \ + TRACE_SYM(EC_CMD_PWM_SET_DUTY), \ + TRACE_SYM(EC_CMD_PWM_GET_DUTY), \ + TRACE_SYM(EC_CMD_LIGHTBAR_CMD), \ + TRACE_SYM(EC_CMD_LED_CONTROL), \ + TRACE_SYM(EC_CMD_VBOOT_HASH), \ + TRACE_SYM(EC_CMD_MOTION_SENSE_CMD), \ + TRACE_SYM(EC_CMD_FORCE_LID_OPE
Re: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall
On Thu, Apr 11, 2019 at 10:36 PM Reshetova, Elena wrote: > > > On Wed, Apr 10, 2019 at 3:24 AM Reshetova, Elena > > wrote: > > > > > > > > > > > > On Mon, Apr 08, 2019 at 09:13:58AM +0300, Elena Reshetova wrote: > > > > > > > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c > > > > > > > index 7bc105f47d21..38ddc213a5e9 100644 > > > > > > > --- a/arch/x86/entry/common.c > > > > > > > +++ b/arch/x86/entry/common.c > > > > > > > @@ -35,6 +35,12 @@ > > > > > > > #define CREATE_TRACE_POINTS > > > > > > > #include > > > > > > > > > > > > > > +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET > > > > > > > +#include > > > > > > > + > > > > > > > +void *alloca(size_t size); > > > > > > > +#endif > > > > > > > + > > > > > > > #ifdef CONFIG_CONTEXT_TRACKING > > > > > > > /* Called on entry from user mode with IRQs off. */ > > > > > > > __visible inline void enter_from_user_mode(void) > > > > > > > @@ -273,6 +279,13 @@ __visible void do_syscall_64(unsigned long > > > > > > > nr, > > struct > > > > > pt_regs *regs) > > > > > > > { > > > > > > > struct thread_info *ti; > > > > > > > > > > > > > > +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET > > > > > > > + size_t offset = ((size_t)prandom_u32()) % 256; > > > > > > > + char *ptr = alloca(offset); > > > > > > > + > > > > > > > + asm volatile("":"=m"(*ptr)); > > > > > > > +#endif > > > > > > > + > > > > > > > enter_from_user_mode(); > > > > > > > local_irq_enable(); > > > > > > > ti = current_thread_info(); > > > > > > > > > > > > Would it make sense to also do this for the compat syscalls > > > > > > (do_fast_syscall_32, do_int80_syscall_32)? > > > > > > > > > > Could someone please include the full patch, with justification and > > > > > performance impact analysis etc.? Can only find the code part of the > > > > > thread on lkml, which leaves out this context. > > > > > > > > > > > > > Sorry, this is very weird, I cannot find it either from lkml, but it > > > > was sent there > > > > to begin with (and as visible from reply-to headers). > > > > > > > > Do you want me to resent original version or with "do_fast_syscall_32, > > > > do_int80_syscall_32" additions (I am finishing testing them now). > > > > > > I will resend the original x86_64 now since this is the one I tested and > > > measured properly. The 32 bit changes seem to work fine inside my 32 bit > > > VM, > > > but since I don't have any real 32 bit HW, I am hesitant to send them out > > > without > > > real HW testing and measuring. > > > > > > This is the asm code for 32 bits (note it requires __builtin_alloca > > > definition and not > > just alloca, > > > so I will change the 64 bit version to use it also): > > > > > > #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET > > > size_t offset = ((size_t)prandom_u32()) % 256; > > > 0xc10025b6 call 0xc146f7d0 > > > 0xc10025bb movzbl %al,%eax > > > char *ptr = __builtin_alloca(offset); > > > 0xc10025be add$0x12,%eax > > > 0xc10025c1 and$0x1fc,%eax > > > 0xc10025c6 sub%eax,%esp > > > 0xc10025c8 lea0x27(%esp),%eax > > > 0xc10025cc and$0xfff0,%eax > > > > > > Also, the result is 47 different random offsets produced, > > > which is slightly better than 33 offsets for x86_64. > > > > > > > I would suggest that you macro-ify this thing: > > > > #ifdef WHATEVER > > #define add_random_stack_offset() do { void *addr = ... } while (0) > > #else > > #define add_random_stack_offset() do {} while (0) > > #endif > > > > since you'll end up with more than one call site. > > Sure, will do. So, you are ok for this to be also called from > do_fast_syscall_32 > and do_int80_syscall_32? I can send the resulting patch, just cannot test on > any > real 32 bit HW, only VM. > Sounds good to me.
Re: [PATCH 15/17] fpga: dfl: fme: add power management support
Hi Hao, this looks suspiciously like a hwmon driver ;-) https://www.kernel.org/doc/Documentation/hwmon/hwmon-kernel-api.txt Cheers, Moritz On Thu, Apr 11, 2019 at 1:08 PM Alan Tull wrote: > > On Sun, Mar 24, 2019 at 10:24 PM Wu Hao wrote: > > Hi Hao, > > > > > This patch adds support for power management private feature under > > FPGA Management Engine (FME), sysfs interfaces are introduced for > > different power management functions, users could use these sysfs > > interface to get current number of consumed power, throttling > > How about > s/number/measurement/ > ? > > > thresholds, threshold status and other information, and configure > > different value for throttling thresholds too. > > > > Signed-off-by: Luwei Kang > > Signed-off-by: Xu Yilun > > Signed-off-by: Wu Hao > > --- > > Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 + > > drivers/fpga/dfl-fme-main.c | 257 > > +++ > > 2 files changed, 313 insertions(+) > > > > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme > > b/Documentation/ABI/testing/sysfs-platform-dfl-fme > > index d3aeb88..4b6448f 100644 > > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme > > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme > > @@ -100,3 +100,59 @@ Description: Read-only. Read this file to get > > the policy of temperature > > threshold1. It only supports two value (policy): > > 0 - AP2 state (90% throttling) > > 1 - AP1 state (50% throttling) > > + > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed > > +Date: March 2019 > > +KernelVersion: 5.2 > > +Contact: Wu Hao > > +Description: Read-only. It returns current power consumed by FPGA. > > What are the units? > > > + > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1 > > +Date: March 2019 > > +KernelVersion: 5.2 > > +Contact: Wu Hao > > +Description: Read-Write. Read/Write this file to get/set current power > > + threshold1 in Watts. > > Perhaps document error codes here and for threshold2 below. > > > + > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2 > > +Date: March 2019 > > +KernelVersion: 5.2 > > +Contact: Wu Hao > > +Description: Read-Write. Read/Write this file to get/set current power > > + threshold2 in Watts. > > + > > +What: > > /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status > > +Date: March 2019 > > +KernelVersion: 5.2 > > +Contact: Wu Hao > > +Description: Read-only. It returns 1 if power consumption reaches the > > + threshold1, otherwise 0. > > I'm used to things like this requiring user to reset the status, so it > may be worth making it explicit that it will return to zero if > consumption drops below threshold if that's what's happening here. > If it's correct, perhaps could just say something like 'returns 1 if > power consumption is currently at or above threshold1, otherwise 0' > > > + > > +What: > > /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status > > +Date: March 2019 > > +KernelVersion: 5.2 > > +Contact: Wu Hao > > +Description: Read-only. It returns 1 if power consumption reaches the > > + threshold2, otherwise 0. > > Same here. > > > + > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr > > +Date: March 2019 > > +KernelVersion: 5.2 > > +Contact: Wu Hao > > +Description: Read-only. Read this file to get current Latency Tolerance > > + Reporting (ltr) value, it's only valid for integrated > > + solution as it blocks CPU on low power state. > > If we're not on the integrated solution, it returns a value but it is > not really real? > > > + > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/xeon_limit > > +Date: March 2019 > > +KernelVersion: 5.2 > > +Contact: Wu Hao > > +Description: Read-only. Read this file to get power limit for xeon, it > > + is only valid for integrated solution. > > + > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/fpga_limit > > +Date: March 2019 > > +KernelVersion: 5.2 > > +Contact: Wu Hao > > +Description: Read-only. Read this file to get power limit for fpga, it > > + is only valid for integrated solution. > > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c > > index 449a17d..dafa6580 100644 > > --- a/drivers/fpga/dfl-fme-main.c > > +++ b/drivers/fpga/dfl-fme-main.c > > @@ -415,6 +415,259 @@ static const struct dfl_feature_ops > > fme_thermal_mgmt_ops = { > > .uinit = fme_thermal_mgmt_uinit, > > }; > > > > +#define FME_PWR_STATUS 0x8 > > +#define FME_LATENCY_TOLERANCE BIT_ULL(18) > > +#define PWR_CONSUMED GENMASK_ULL
Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
On Thu, Apr 11, 2019 at 10:47:50AM -0700, Daniel Colascione wrote: > On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox wrote: > > It's not a question of the kernel deciding what the right signal is. > > The kernel knows whether a signal is fatal to a particular process or not. > > The question is whether the killing process should do the work of reaping > > the dying process's resources sometimes, always or never. Currently, > > that is never (the process reaps its own resources); Suren is suggesting > > sometimes, and I'm asking "Why not always?" > > FWIW, Suren's initial proposal is that the oom_reaper kthread do the > reaping, not the process sending the kill. Are you suggesting that > sending SIGKILL should spend a while in signal delivery reaping pages > before returning? I thought about just doing it this way, but I didn't > like the idea: it'd slow down mass-killing programs like killall(1). > Programs expect sending SIGKILL to be a fast operation that returns > immediately. Suren said that the implementation in this proposal wasn't to be taken literally. You've raised some good points here though. Do mass-killing programs like kill with a pgid or killall expect the signal-sending process to be fast, or do they not really care? >From the kernel's point of view, the same work has to be done, no matter whether the killer or the victim does it. Should the work be accounted to the killer or the victim? It probably doesn't matter. The victim doing the work allows for parallelisation, but I'm not convinced that's important either. I see another advantage for the killer doing the work -- if the task is currently blocking on I/O (eg to an NFS mount that has gone away), the killer can get rid of the task's page tables. If we have to wait for the I/O to complete before the victim reaps its own page tables, we may be waiting forever.
Re: [PATCH v2 2/2] PCI: Clean up resource_alignment parameter to not require static buffer
On 2019-04-12 2:44 p.m., Bjorn Helgaas wrote: > On Wed, Apr 10, 2019 at 03:05:32PM -0600, Logan Gunthorpe wrote: >> Clean up the 'resource_alignment' parameter code to use kstrdup >> in the initcall routine instead of a static buffer that wastes memory >> regardless of whether the feature is used. This allows us to drop >> 'COMMAND_LINE_SIZE' bytes (typically 256-4096 depending on architecture) >> of static data. >> >> This is similar to what has been done for the 'disable_acs_redir' >> parameter. >> >> This conversion also allows us to use RCU instead of the spinlock to >> deal with the concurrency issue which further reduces memory usage. > > I'm unconvinced about this part. Spinlocks are CS 101 material and > I'm a little hesitant to use a graduate-level technique like RCU in a > case where it doesn't really buy us much -- we don't need the > performance advantage and the size advantage seems minimal. But I'm > an RCU ignoramus and maybe need to be educated. That's a reasonable point. I didn't think it was that difficult and the kernel's RCU API is pretty straightforward. But I can resubmit later keeping the spinlock. You're right that it's not that big of a gain. >> As part of the clean up we also squash pci_get_resource_alignment_param() >> into resource_alignment_show() and pci_set_resource_alignment_param() >> into resource_alignment_store() seeing these functions only had one >> caller and the show/store wrappers were needlessly thin. > > Squashing makes sense and would be nice as a separate patch. Ok, will do. Logan
Re: [PATCH v2 10/11] platform/x86: asus-wmi: Switch fan boost mode
On 12.04.19 10:03, Daniel Drake wrote: > On Thu, Apr 11, 2019 at 1:47 PM Yurii Pavlovskyi > wrote: >> * 0x00 - is normal, >> * 0x01 - is obviously turbo by the amount of noise, might be useful to >> avoid CPU frequency throttling on high load, >> * 0x02 - the meaning is unknown at the time as modes are not named >> in the vendor documentation, but it does look like a quiet mode as CPU >> temperature does increase about 10 degrees on maximum load. > > I'm curious which vendor documentation you're working with here. That would be user manual for FX505 series, which is pretty minimalistic. It says it has a fan mode switch key and that's it. Following up on your comment, I've searched more and actually did found their names hidden in marketing web page on the website. You're right, they call them balanced, overboost and silent there. > From the spec, > 0 = normal > 1 = overboost > 2 = silent > Also you can use DSTS on the 0x00110018 device to check the exact > capabilities supported, which you should use to refine which modes can > be cycled through. > Bit 0 = overboost supported > Bit 1 = silent supported > > Thanks > Daniel > Thanks! I was guessing that the 3 in DSTS must've meant something. Appreciate your comments! Will definitely implement them. I'm going to post the v3 patch series approximately middle of next week. Best regards, Yurii
Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty
On Fri, Apr 12, 2019 at 1:16 PM Roman Gushchin wrote: > > On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote: > > On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner wrote: > > > > > > Right now, when somebody needs to know the recursive memory statistics > > > and events of a cgroup subtree, they need to walk the entire subtree > > > and sum up the counters manually. > > > > > > There are two issues with this: > > > > > > 1. When a cgroup gets deleted, its stats are lost. The state counters > > > should all be 0 at that point, of course, but the events are not. When > > > this happens, the event counters, which are supposed to be monotonic, > > > can go backwards in the parent cgroups. > > > > > > > We also faced this exact same issue as well and had the similar solution. > > > > > 2. During regular operation, we always have a certain number of lazily > > > freed cgroups sitting around that have been deleted, have no tasks, > > > but have a few cache pages remaining. These groups' statistics do not > > > change until we eventually hit memory pressure, but somebody watching, > > > say, memory.stat on an ancestor has to iterate those every time. > > > > > > This patch addresses both issues by introducing recursive counters at > > > each level that are propagated from the write side when stats change. > > > > > > Upward propagation happens when the per-cpu caches spill over into the > > > local atomic counter. This is the same thing we do during charge and > > > uncharge, except that the latter uses atomic RMWs, which are more > > > expensive; stat changes happen at around the same rate. In a sparse > > > file test (page faults and reclaim at maximum CPU speed) with 5 cgroup > > > nesting levels, perf shows __mod_memcg_page state at ~1%. > > > > > > > (Unrelated to this patchset) I think there should also a way to get > > the exact memcg stats. As the machines are getting bigger (more cpus > > and larger basic page size) the accuracy of stats are getting worse. > > Internally we have an additional interface memory.stat_exact for that. > > However I am not sure in the upstream kernel will an additional > > interface is better or something like /proc/sys/vm/stat_refresh which > > sync all per-cpu stats. > > I was thinking about eventually consistent counters: sync them periodically > from a worker thread. It should keep the cost of reading small, but > should increase the accuracy. Will it work for you? Worker thread based solution seems fine to me but Johannes said it would be best to not traverse the whole tree every few seconds.
Re: [PATCH v2 2/2] PCI: Clean up resource_alignment parameter to not require static buffer
On Wed, Apr 10, 2019 at 03:05:32PM -0600, Logan Gunthorpe wrote: > Clean up the 'resource_alignment' parameter code to use kstrdup > in the initcall routine instead of a static buffer that wastes memory > regardless of whether the feature is used. This allows us to drop > 'COMMAND_LINE_SIZE' bytes (typically 256-4096 depending on architecture) > of static data. > > This is similar to what has been done for the 'disable_acs_redir' > parameter. > > This conversion also allows us to use RCU instead of the spinlock to > deal with the concurrency issue which further reduces memory usage. I'm unconvinced about this part. Spinlocks are CS 101 material and I'm a little hesitant to use a graduate-level technique like RCU in a case where it doesn't really buy us much -- we don't need the performance advantage and the size advantage seems minimal. But I'm an RCU ignoramus and maybe need to be educated. > As part of the clean up we also squash pci_get_resource_alignment_param() > into resource_alignment_show() and pci_set_resource_alignment_param() > into resource_alignment_store() seeing these functions only had one > caller and the show/store wrappers were needlessly thin. Squashing makes sense and would be nice as a separate patch. > Signed-off-by: Logan Gunthorpe > Cc: Bjorn Helgaas > --- > drivers/pci/pci.c | 89 --- > 1 file changed, 53 insertions(+), 36 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 766f5779db92..13767c2409ae 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -5896,9 +5896,7 @@ resource_size_t __weak pcibios_default_alignment(void) > return 0; > } > > -#define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE > -static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; > -static DEFINE_SPINLOCK(resource_alignment_lock); > +static const char __rcu *resource_alignment_param; > > /** > * pci_specified_resource_alignment - get resource alignment specified by > user. > @@ -5916,9 +5914,9 @@ static resource_size_t > pci_specified_resource_alignment(struct pci_dev *dev, > const char *p; > int ret; > > - spin_lock(&resource_alignment_lock); > - p = resource_alignment_param; > - if (!*p && !align) > + rcu_read_lock(); > + p = rcu_dereference(resource_alignment_param); > + if (!p) > goto out; > if (pci_has_flag(PCI_PROBE_ONLY)) { > align = 0; > @@ -5956,7 +5954,7 @@ static resource_size_t > pci_specified_resource_alignment(struct pci_dev *dev, > p++; > } > out: > - spin_unlock(&resource_alignment_lock); > + rcu_read_unlock(); > return align; > } > > @@ -6082,35 +6080,48 @@ void pci_reassigndev_resource_alignment(struct > pci_dev *dev) > } > } > > -static ssize_t pci_set_resource_alignment_param(const char *buf, size_t > count) > +static ssize_t resource_alignment_show(struct bus_type *bus, char *buf) > { > - if (count > RESOURCE_ALIGNMENT_PARAM_SIZE - 1) > - count = RESOURCE_ALIGNMENT_PARAM_SIZE - 1; > - spin_lock(&resource_alignment_lock); > - strncpy(resource_alignment_param, buf, count); > - resource_alignment_param[count] = '\0'; > - spin_unlock(&resource_alignment_lock); > - return count; > -} > + const char *p; > + size_t count = 0; > > -static ssize_t pci_get_resource_alignment_param(char *buf, size_t size) > -{ > - size_t count; > - spin_lock(&resource_alignment_lock); > - count = snprintf(buf, size, "%s", resource_alignment_param); > - spin_unlock(&resource_alignment_lock); > - return count; > -} > + rcu_read_lock(); > + p = rcu_dereference(resource_alignment_param); > + if (!p) > + goto out; > > -static ssize_t resource_alignment_show(struct bus_type *bus, char *buf) > -{ > - return pci_get_resource_alignment_param(buf, PAGE_SIZE); > + count = snprintf(buf, PAGE_SIZE, "%s", p); > + > + /* > + * When set by the command line there will not be a > + * line feed, which is ugly. So conditionally add it here. > + */ > + if (buf[count - 2] != '\n' && count < PAGE_SIZE - 1) { > + buf[count - 1] = '\n'; > + buf[count++] = 0; > + } > + > +out: > + rcu_read_unlock(); > + > + return count; > } > > static ssize_t resource_alignment_store(struct bus_type *bus, > const char *buf, size_t count) > { > - return pci_set_resource_alignment_param(buf, count); > + const char *p, *old; > + > + p = kstrndup(buf, count, GFP_KERNEL); > + if (!p) > + return -ENOMEM; > + > + old = rcu_dereference_protected(resource_alignment_param, 1); > + rcu_assign_pointer(resource_alignment_param, p); > + synchronize_rcu(); > + kfree(old); > + > + return count; > } > > static BUS_ATTR_RW(resource_alignment); > @@ -6238,8 +6249,8 @@ static
Re: [PATCH 0/14] v2 multi-die/package topology support
Le 12/04/2019 à 21:52, Len Brown a écrit : I think I prefer 's/threads/cpus/g' on that. Threads makes me think SMT, and I don't think there's any guarantee the part in question will have SMT on. >>> I think 'threads' is a bit confusing as well. We seem to be using 'cpu' >>> everywhere for something we can schedule tasks on, including the sysfs >>> /sys/devices/system/cpu/ subdirs for each SMT thread on SMT systems. > I agree with Peter and Morten. > "cpu" is more clear and consistent than "thread" here. > I'll spin the series with that string changed. Agreed, I should have used that suffix from the beginning. Brice
Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty
On Fri, Apr 12, 2019 at 1:10 PM Johannes Weiner wrote: > > On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote: > > We also faced this exact same issue as well and had the similar solution. > > > > > Signed-off-by: Johannes Weiner > > > > Reviewed-by: Shakeel Butt > > Thanks for the review! > > > (Unrelated to this patchset) I think there should also a way to get > > the exact memcg stats. As the machines are getting bigger (more cpus > > and larger basic page size) the accuracy of stats are getting worse. > > Internally we have an additional interface memory.stat_exact for that. > > However I am not sure in the upstream kernel will an additional > > interface is better or something like /proc/sys/vm/stat_refresh which > > sync all per-cpu stats. > > I was talking to Roman about this earlier as well and he mentioned it > would be nice to have periodic flushing of the per-cpu caches. The > global vmstat has something similar. We might be able to hook into > those workers, but it would likely require some smarts so we don't > walk the entire cgroup tree every couple of seconds. > > We haven't had any actual problems with the per-cpu fuzziness, mainly > because the cgroups of interest also grow in size as the machines get > bigger, and so the relative error doesn't increase. > Yes, this is very machine size dependent. We see this issue more often on larger machines. > Are your requirements that the error dissipates over time (waiting for > a threshold convergence somewhere?) or do you have automation that > gets decisions wrong due to the error at any given point in time? Not sure about the first one but we do have the second case. The node controller does make decisions in an online way based on the stats. Also we do periodically collect and store stats for all jobs across the fleet. This data is processed (offline) and is used in a lot of ways. The inaccuracy in the stats do affect all that analysis particularly for small jobs.
Re: [PATCH v2 1/2] PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored
On Wed, Apr 10, 2019 at 03:05:31PM -0600, Logan Gunthorpe wrote: > In most cases, kmalloc will not be available early in boot when > pci_setup() is called. Thus, the kstrdup call that was added to fix the > __initdata bug with the disable_acs_redir parameter usually returns > NULL. Thus the parameter is discarded and it does not take into effect. > > To fix this, we store the string that's in initdata until a initcall > function can allocate the memory appropriately. This way we > don't need any additional static memory. > > Fixes: d2fd6e81912a ("PCI: Fix __initdata issue with "pci=disable_acs_redir" > parameter") > Signed-off-by: Logan Gunthorpe > Cc: Bjorn Helgaas Applied to for-linus for v5.1, thanks, Logan! > --- > drivers/pci/pci.c | 19 +-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 7c1b362f599a..766f5779db92 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -6262,8 +6262,7 @@ static int __init pci_setup(char *str) > } else if (!strncmp(str, "pcie_scan_all", 13)) { > pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS); > } else if (!strncmp(str, "disable_acs_redir=", 18)) { > - disable_acs_redir_param = > - kstrdup(str + 18, GFP_KERNEL); > + disable_acs_redir_param = str + 18; > } else { > printk(KERN_ERR "PCI: Unknown option `%s'\n", > str); > @@ -6274,3 +6273,19 @@ static int __init pci_setup(char *str) > return 0; > } > early_param("pci", pci_setup); > + > +/* > + * 'disable_acs_redir_param' is initialized in pci_setup(), above, to point > + * to data in the __initdata section which will be freed after the init > + * sequence is complete. We can't allocate memory in pci_setup() because some > + * architectures do not have any memory allocation service available during > + * an early_param() call. So we allocate memory and copy the variable here > + * before the init section is freed. > + */ > +static int __init pci_realloc_setup_params(void) > +{ > + disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL); > + > + return 0; > +} > +pure_initcall(pci_realloc_setup_params); > -- > 2.20.1 >
Re: [PATCH v2 05/11] platform/x86: asus-wmi: Support queued WMI event codes
On 12.04.19 09:48, Daniel Drake wrote: > On Thu, Apr 11, 2019 at 1:44 PM Yurii Pavlovskyi > wrote: >> Event codes are expected to be polled from a queue on at least some >> models. > > Maybe avoid the word "poll" since it suggests something else (at least to me). Ok, will change this in the code as well. >>> The fix flushes the old key codes out of the queue on load and after >> receiving event the queue is read until either .. or 1 is encountered. >> >> It might be considered a minor issue and no normal user would likely to >> observe this (there is little reason unloading the driver), but it does >> significantly frustrate a developer who is unlucky enough to encounter >> this. >> >> Introduce functionality for flushing and processing queued codes, which is >> enabled via quirk flag for ASUS7000. It might be considered if it is >> reasonable to enable it everywhere (might introduce regressions) or always >> try to flush the queue on module load and try to detect if this quirk is >> present in the future. >> >> This patch limits the effect to the specific hardware defined by ASUS7000 >> device that is used for driver detection by vendor driver of Fx505. The >> fallback is also implemented in case initial flush fails. > > Which vendor driver are you referring to here? The ASUS Keyboard hotkeys Driver V2.0.5 which is available to download for FX505 has this in his INF file: [ATKP.ntamd64] %DeviceDesc1% = NO_DRV64, ACPI\ASUS7000 But this driver is not generic. It is limited to very specific hardware, I'd guess gaming series (for instance K54C does not have this device). It was rather a way to avoid breaking anything. I think your suggestion below is much better. > > Researching more: > In our database of 144 Asus models, 142 of them have GANQ. > > Of those 142, 9 of them return One in the empty-queue case. The other > 133 match your FX505GM device exactly. So it seems valid to interpret > both 0x and 0x1 as queue-end markers. > > The 2 devices that do not have GANQ are not laptops. They also do not > have the _UID "ATK" WMI device, they only have "ASUSWMI" and their WMI > _WED method does not use a queue. > There are a few more units that have both ASUSWMI and ATK devices, and > the ASUSWMI device appears to never be queued. > Another observation is that the ASUSWMI device works with notify code > 0xD2, and the ATK device works with 0xFF. > > Nailing this down a bit further, I found a DSDT for EEEPC X101H: that > one only has ASUSWMI and it is also not queued. > > So I think you should make this queue behaviour applied more > generically, but either avoid it when the WMI device _UID is not "ATK" > (as discussed in the DCTS/DSTS thread), or avoid it when the notify > code is not 0x> > Thanks > Daniel Thanks a lot for your research, it's much appreciated! That seems to confirm that these two quirks are actually connected with ATK device. I guess it makes sense to combine the detection for both of them. Also to flush the queue we need to know the notify code beforehand, because it is checked in _WED so checking for ATK seems reasonable to me. Best regards, Yurii
Re: [PATCH v2] mm/hotplug: treat CMA pages as unmovable
On Fri 12-04-19 11:26:59, Qian Cai wrote: [...] > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index d96ca5bc555b..a9d2b0236167 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -8005,7 +8005,10 @@ void *__init alloc_large_system_hash(const char > *tablename, > bool has_unmovable_pages(struct zone *zone, struct page *page, int count, >int migratetype, int flags) > { > - unsigned long pfn, iter, found; > + unsigned long found; > + unsigned long iter = 0; > + unsigned long pfn = page_to_pfn(page); > + char reason[] = "unmovable page"; const char *reason = "unovable page"; > > /* >* TODO we could make this much more efficient by not checking every > @@ -8015,17 +8018,20 @@ bool has_unmovable_pages(struct zone *zone, struct > page *page, int count, >* can still lead to having bootmem allocations in zone_movable. >*/ > > - /* > - * CMA allocations (alloc_contig_range) really need to mark isolate > - * CMA pageblocks even when they are not movable in fact so consider > - * them movable here. > - */ > - if (is_migrate_cma(migratetype) && > - is_migrate_cma(get_pageblock_migratetype(page))) > - return false; > + if (is_migrate_cma(get_pageblock_migratetype(page))) { > + /* > + * CMA allocations (alloc_contig_range) really need to mark > + * isolate CMA pageblocks even when they are not movable in fact > + * so consider them movable here. > + */ > + if (is_migrate_cma(migratetype)) > + return false; > + > + strscpy(reason, "CMA page", 9); reason = "CMA page"; > + goto unmovable; > + } Other than that looks good. After fixing the above, feel free to add Acked-by: Michal Hocko -- Michal Hocko SUSE Labs
Re: INFO: trying to register non-static key in vmk80xx_detach
syzbot has found a reproducer for the following crash on: HEAD commit:9a33b369 usb-fuzzer: main usb gadget fuzzer driver git tree: https://github.com/google/kasan/tree/usb-fuzzer console output: https://syzkaller.appspot.com/x/log.txt?x=11bed6fd20 kernel config: https://syzkaller.appspot.com/x/.config?x=23e37f59d94ddd15 dashboard link: https://syzkaller.appspot.com/bug?extid=54c2f58f15fe6876b6ad compiler: gcc (GCC) 9.0.0 20181231 (experimental) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15fdc6bf20 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=156dc2d320 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+54c2f58f15fe6876b...@syzkaller.appspotmail.com usb 1-1: config 0 has no interface number 0 usb 1-1: New USB device found, idVendor=10cf, idProduct=8068, bcdDevice=e6.8d usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 usb 1-1: config 0 descriptor?? vmk80xx 1-1:0.117: driver 'vmk80xx' failed to auto-configure device. INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe8/0x16e lib/dump_stack.c:113 assign_lock_key kernel/locking/lockdep.c:786 [inline] register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095 __lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582 lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x44/0x60 kernel/locking/spinlock.c:152 down+0x12/0x80 kernel/locking/semaphore.c:58 vmk80xx_detach+0x59/0x100 drivers/staging/comedi/drivers/vmk80xx.c:829 comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204 comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156 comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline] comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190 comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline] comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880 comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068 usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361 really_probe+0x2da/0xb10 drivers/base/dd.c:509 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454 __device_attach+0x223/0x3a0 drivers/base/dd.c:844 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514 device_add+0xad2/0x16e0 drivers/base/core.c:2106 usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021 generic_probe+0xa2/0xda drivers/usb/core/generic.c:210 usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266 really_probe+0x2da/0xb10 drivers/base/dd.c:509 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454 __device_attach+0x223/0x3a0 drivers/base/dd.c:844 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514 device_add+0xad2/0x16e0 drivers/base/core.c:2106 usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534 hub_port_connect drivers/usb/core/hub.c:5089 [inline] hub_port_connect_change drivers/usb/core/hub.c:5204 [inline] port_event drivers/usb/core/hub.c:5350 [inline] hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432 process_one_work+0x90f/0x1580 kernel/workqueue.c:2269 worker_thread+0x9b/0xe20 kernel/workqueue.c:2415 kthread+0x313/0x420 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN PTI CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event RIP: 0010:__list_add_valid+0x47/0xa0 lib/list_debug.c:26 Code: fa 48 c1 ea 03 80 3c 02 00 75 50 49 8b 54 24 08 48 39 f2 0f 85 59 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f2 48 c1 ea 03 <80> 3c 02 00 75 3a 48 8b 16 4c 39 e2 0f 85 5e 01 00 00 48 39 f5 0f RSP: 0018:8880a84a7110 EFLAGS: 00010046 RAX: dc00 RBX: 888097143850 RCX: RDX: RSI: RDI: 888097143898 RBP: 8880a84a7170 R08: 8880a8491880 R09: ed1015094e31 R10: ed1015094e30 R11: 0003 R12: 888097143890 R13: R14: 888097143898 R15: 888097143890 FS: (00
Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty
On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote: > On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner wrote: > > > > Right now, when somebody needs to know the recursive memory statistics > > and events of a cgroup subtree, they need to walk the entire subtree > > and sum up the counters manually. > > > > There are two issues with this: > > > > 1. When a cgroup gets deleted, its stats are lost. The state counters > > should all be 0 at that point, of course, but the events are not. When > > this happens, the event counters, which are supposed to be monotonic, > > can go backwards in the parent cgroups. > > > > We also faced this exact same issue as well and had the similar solution. > > > 2. During regular operation, we always have a certain number of lazily > > freed cgroups sitting around that have been deleted, have no tasks, > > but have a few cache pages remaining. These groups' statistics do not > > change until we eventually hit memory pressure, but somebody watching, > > say, memory.stat on an ancestor has to iterate those every time. > > > > This patch addresses both issues by introducing recursive counters at > > each level that are propagated from the write side when stats change. > > > > Upward propagation happens when the per-cpu caches spill over into the > > local atomic counter. This is the same thing we do during charge and > > uncharge, except that the latter uses atomic RMWs, which are more > > expensive; stat changes happen at around the same rate. In a sparse > > file test (page faults and reclaim at maximum CPU speed) with 5 cgroup > > nesting levels, perf shows __mod_memcg_page state at ~1%. > > > > (Unrelated to this patchset) I think there should also a way to get > the exact memcg stats. As the machines are getting bigger (more cpus > and larger basic page size) the accuracy of stats are getting worse. > Internally we have an additional interface memory.stat_exact for that. > However I am not sure in the upstream kernel will an additional > interface is better or something like /proc/sys/vm/stat_refresh which > sync all per-cpu stats. I was thinking about eventually consistent counters: sync them periodically from a worker thread. It should keep the cost of reading small, but should increase the accuracy. Will it work for you?
Re: [1/3] x86: Update DEBUG_TLBFLUSH options description.
On 4/10/19 11:56 PM, Christoph Hellwig wrote: Given that this option enables generic code (which you reuse for RISC-V later in this series) please also move the config option to mm/Kconfig, proabbly keyed off another ARCH_HAVE_DEBUG_TLBFLUSH symbol that the architecture can select. Sure. Regards, Atish ___ linux-riscv mailing list linux-ri...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty
On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote: > We also faced this exact same issue as well and had the similar solution. > > > Signed-off-by: Johannes Weiner > > Reviewed-by: Shakeel Butt Thanks for the review! > (Unrelated to this patchset) I think there should also a way to get > the exact memcg stats. As the machines are getting bigger (more cpus > and larger basic page size) the accuracy of stats are getting worse. > Internally we have an additional interface memory.stat_exact for that. > However I am not sure in the upstream kernel will an additional > interface is better or something like /proc/sys/vm/stat_refresh which > sync all per-cpu stats. I was talking to Roman about this earlier as well and he mentioned it would be nice to have periodic flushing of the per-cpu caches. The global vmstat has something similar. We might be able to hook into those workers, but it would likely require some smarts so we don't walk the entire cgroup tree every couple of seconds. We haven't had any actual problems with the per-cpu fuzziness, mainly because the cgroups of interest also grow in size as the machines get bigger, and so the relative error doesn't increase. Are your requirements that the error dissipates over time (waiting for a threshold convergence somewhere?) or do you have automation that gets decisions wrong due to the error at any given point in time?
Re: [PATCH 0/4] mm: memcontrol: memory.stat cost & correctness
On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner wrote: > > The cgroup memory.stat file holds recursive statistics for the entire > subtree. The current implementation does this tree walk on-demand > whenever the file is read. This is giving us problems in production. > > 1. The cost of aggregating the statistics on-demand is high. A lot of > system service cgroups are mostly idle and their stats don't change > between reads, yet we always have to check them. There are also always > some lazily-dying cgroups sitting around that are pinned by a handful > of remaining page cache; the same applies to them. > > In an application that periodically monitors memory.stat in our fleet, > we have seen the aggregation consume up to 5% CPU time. > > 2. When cgroups die and disappear from the cgroup tree, so do their > accumulated vm events. The result is that the event counters at > higher-level cgroups can go backwards and confuse some of our > automation, let alone people looking at the graphs over time. > > To address both issues, this patch series changes the stat > implementation to spill counts upwards when the counters change. > > The upward spilling is batched using the existing per-cpu cache. In a > sparse file stress test with 5 level cgroup nesting, the additional > cost of the flushing was negligible (a little under 1% of CPU at 100% > CPU utilization, compared to the 5% of reading memory.stat during > regular operation). For whole series: Reviewed-by: Shakeel Butt > > include/linux/memcontrol.h | 96 +++--- > mm/memcontrol.c| 290 +++ > mm/vmscan.c| 4 +- > mm/workingset.c| 7 +- > 4 files changed, 234 insertions(+), 163 deletions(-) > >
Re: [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures
On Fri, Apr 12, 2019 at 03:14:18PM -0400, Johannes Weiner wrote: > With the default overcommit==guess we occasionally run into mmap > rejections despite plenty of memory that would get dropped under > pressure but just isn't accounted reclaimable. One example of this is > dying cgroups pinned by some page cache. A previous case was auxiliary > path name memory associated with dentries; we have since annotated > those allocations to avoid overcommit failures (see d79f7aa496fc ("mm: > treat indirectly reclaimable memory as free in overcommit logic")). > > But trying to classify all allocated memory reliably as reclaimable > and unreclaimable is a bit of a fool's errand. There could be a myriad > of dependencies that constantly change with kernel versions. > > It becomes even more questionable of an effort when considering how > this estimate of available memory is used: it's not compared to the > system-wide allocated virtual memory in any way. It's not even > compared to the allocating process's address space. It's compared to > the single allocation request at hand! > > So we have an elaborate left-hand side of the equation that tries to > assess the exact breathing room the system has available down to a > page - and then compare it to an isolated allocation request with no > additional context. We could fail an allocation of N bytes, but for > two allocations of N/2 bytes we'd do this elaborate dance twice in a > row and then still let N bytes of virtual memory through. This doesn't > make a whole lot of sense. > > Let's take a step back and look at the actual goal of the > heuristic. From the documentation: > >Heuristic overcommit handling. Obvious overcommits of address >space are refused. Used for a typical system. It ensures a >seriously wild allocation fails while allowing overcommit to >reduce swap usage. root is allowed to allocate slightly more >memory in this mode. This is the default. > > If all we want to do is catch clearly bogus allocation requests > irrespective of the general virtual memory situation, the physical > memory counter-part doesn't need to be that complicated, either. > > When in GUESS mode, catch wild allocations by comparing their request > size to total amount of ram and swap in the system. > > Signed-off-by: Johannes Weiner My 2c here: any kinds of percpu counters and percpu data is accounted as unreclaimable and can alter the calculation significantly. This is a special problem on hosts, which were idle for some time. Without any memory pressure, kernel caches do occupy most of the memory, so than a following attempt to start a workload fails. With a big pleasure: Acked-by: Roman Gushchin Thanks!
Re: [PATCH 04/11] platform/x86: asus-wmi: Add quirk to force DSTS WMI method detection
Wow that is a great thing you done, thanks a lot for your time and your kind words! Your suggestion indeed sounds good judging by your results. Both of my devices have UID "ATK" (actually FX505 also has two other PNP0C14 devices with HIDs SampleDev and TestDev, but I think they are disabled by default). It looks like the driver is loaded already only if ASUS_WMI_MGMT_GUID is present, so I guess that could be used for wmi_driver_register. A little obstacle is that ACPI device pointer is stored in wmi_block structure, which is internal to the wmi.c. This means we would have to add a new method to wmi.h / wmi.c for getting the UID of WMI ACPI device. I guess it might be possible to somehow navigate the device tree back to the platform driver of WMI module, but it would definitely be more obscure, and searching for the device by HID again is not only slower but generally would require a guarantee that it's the same device. So adding a new function looks reasonable to me. It might also be useful to someone implementing similar things for other vendors in the future. I'm going to implement this in updated patch. Thanks, Yurii On 11.04.19 12:55, Daniel Drake wrote: > On Thu, Apr 11, 2019 at 4:28 AM Yurii Pavlovskyi > wrote: >> The DSTS method detection fails, as nothing is returned if method is not >> defined in WMNB. As a result the control of keyboard backlight is not >> functional for TUF Gaming series laptops (at the time the only >> functionality of the driver on this model implemented with WMI methods). >> >> Patch was tested on a newer TUF Gaming FX505GM and older K54C model. >> >> FX505GM: >> Method (WMNB, 3, Serialized) >> { ... >> If ((Local0 == 0x53545344)) >> { >> ... >> Return (Zero) >> } >> ... >> // No return >> } >> >> K54C: >> Method (WMNB, 3, Serialized) >> { ... >> If ((Local0 == 0x53545344)) >> { >> ... >> Return (0x02) >> } >> ... >> Return (0xFFFE) >> } >> >> The non-existing method ASUS_WMI_METHODID_DSTS=0x53544344 (actually it is >> DCTS in little endian ASCII) is selected in asus->dsts. >> >> One way to fix this would be to call both for every known device ID until >> some answers - this would increase module load time. >> >> Another option is to check some device that is known to exist on every >> model - none known at the time. >> >> Last option, which is implemented, is to check for presence of the >> ASUS7000 device in ACPI tree (it is a dummy device), which is the >> condition used for loading the vendor driver for this model. This might >> not fix every affected model ever produced, but it likely does not >> introduce any regressions. The patch introduces a quirk that is enabled >> when ASUS7000 is found. >> >> Scope (_SB) >> { >> Device (ATK) >> { >> Name (_HID, "ASUS7000") // _HID: Hardware ID >> } >> } > > Hmm, tricky! But about time we did something about the DSTS vs DCTS guessing. > > While we don't have definitive knowledge of the right thing to do > here, I think I have enough info available to solidify some > assumptions we'd otherwise be afraid to make, and then we can > implement a better approach here. > > In our database of 144 Asus DSDTs, 14 of them respond to DCTS: > > AS_D520MT > D425MC > D640SA > D320SF-K > D415MT > D830MT > G11DF > M32CD4-K > V221ID > V272UN_SKU3 > Z240IE > ZN220IC-K > ZN241IC > ZN270IE > > Of those 14, they all additionally respond to DSTS, except for D415MT > and AS_D520MT. > > In all 14 cases, the DCTS handling is done inside a device with _UID > ASUSWMI. None of the other 130 products have a device with that _UID. > > Furthermore, we recently received access to the ASUS spec, which > confirms that the Instance Name for EeePC is "ACPI\PNP0C14\ASUSWMI_0" > whereas the Instance Name for other notebooks is "ACPI\PNP0C14\ATK_0". > > The 12 devices that respond to both DCTS and DSTS, do it on separate > different devices, albeit with the same _WDG UUID. The one with _UID > ASUSWMI responds to DCTS, and the one with _UID ATK responds to DSTS. > > So that seems fairly convincing. My thinking is that we can check the > _UID of the underlying device, and use DCTS for ASUSWMI, DSTS > otherwise. To do that, I think you'd have to rework the driver probing > so that it happens through wmi_driver_register(). Then from the probe > function onwards you will get a wmi_device, and hopefully you can get > the _UID through that instance somehow. > > There's a separate issue of what happens on those 12 machines where > there are two WMI devs, with the same _WDG GUID. > drivers/platform/x86/wmi.c drops duplicates, so one of them is being > ignored in that case. We'd ideally find a way to ignore the ASUSWMI > node and go with ATK in that situation. But I think this can be > separated from your work here. > > Thanks for these patches and welcome to the world of kernel development! > > Daniel >
NFS won't finish
Kernel: Linux freedom 5.1.0-rc4 #1 SMP Sun Apr 7 18:50:31 MDT 2019 x86_64 x86_64 x86_64 GNU/Linux Sending large files to NAS box via NFS failed (It looks like files are getting transferred, but NFS gets stuck/won't finish. Note: 1. NFS works ok with kernel 5.0.0 2. To test NFS, (using the5.1.0-rc4 kernel), I connected to my NAS box and moved a small log file (less that 20kB) and it moved from computer to NAS without problem. I then deleted the small log file from the NAS box, again without problem. (connected via NFS). I then copied a database archive (24929 bytes) from NAS box to computer, again without fail. But today I moved 3 large tarball/archives from computer to NAS box (TX: cum: 8.50GB), and it seems to have moved OK, but NFS does not want to complete the transfer. I can't get a directory listing from where I started the copy, and the process queue shows: root 6176 0.0 0.0 20824 2504 pts/5 D+ 10:28 0:11 cp site.Apr-05-2019.tgz site.Apr-12-2019.tgz site.Mar-29-2019.tgz nas/website/BACKUP/2019.04.12 DMESG: perf: interrupt took too long (2694 > 2500), lowering kernel.perf_event_max_sample_rate to 74250 [ 1644.507579] perf: interrupt took too long (3369 > 3367), lowering kernel.perf_event_max_sample_rate to 59250 [ 2349.205399] FS-Cache: Loaded [ 2349.348842] FS-Cache: Netfs 'nfs' registered for caching [ 2349.652305] NFS: Registering the id_resolver key type [ 2349.652316] Key type id_resolver registered [ 2349.652317] Key type id_legacy registered [ 3595.680685] nfs: server 192.168.20.114 not responding, still trying [ 3595.680692] nfs: server 192.168.20.114 not responding, still trying [ 3595.680695] nfs: server 192.168.20.114 not responding, still trying [ 3626.400412] INFO: task cp:6176 blocked for more than 120 seconds. [ 3626.400418] Tainted: G I 5.1.0-rc4 #1 [ 3626.400419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3626.400422] cp D 0 6176 5890 0x [ 3626.400425] Call Trace: [ 3626.400434] ? __schedule+0x2a1/0x870 [ 3626.400436] schedule+0x28/0x70 [ 3626.400440] io_schedule+0x12/0x40 [ 3626.400443] wait_on_page_bit_common+0x10a/0x280 [ 3626.400447] ? file_check_and_advance_wb_err+0xc0/0xc0 [ 3626.400450] __filemap_fdatawait_range+0xd8/0x140 [ 3626.400453] filemap_write_and_wait_range+0x3b/0x70 [ 3626.400471] nfs_file_fsync+0x44/0x1f0 [nfs] [ 3626.400474] filp_close+0x2a/0x70 [ 3626.400475] __x64_sys_close+0x1e/0x50 [ 3626.400477] do_syscall_64+0x4f/0x100 [ 3626.400479] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3626.400481] RIP: 0033:0x7f6be90958d4 [ 3626.400485] Code: Bad RIP value. [ 3626.400486] RSP: 002b:7ffe07ec4398 EFLAGS: 0246 ORIG_RAX: 0003 [ 3626.400487] RAX: ffda RBX: RCX: 7f6be90958d4 [ 3626.400488] RDX: 0002 RSI: 55b82b141000 RDI: 0004 [ 3626.400488] RBP: 7ffe07ec4770 R08: 0002 R09: 0001 [ 3626.400489] R10: 0002 R11: 0246 R12: 7ffe07ec4970 [ 3626.400490] R13: R14: 7ffe07ec4900 R15: 7ffe07ec52db [ 3747.231331] INFO: task cp:6176 blocked for more than 241 seconds. [ 3747.231337] Tainted: G I 5.1.0-rc4 #1 [ 3747.231339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. According to NTOP, it looks like data was transferred: TX: cum: 8.50GB peak: 208b rates: 0b 0b 18b My Hardware: 04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12) Subsystem: ASUSTeK Computer Inc. Motherboard Flags: bus master, fast devsel, latency 0, IRQ 26 Memory at fb9fc000 (64-bit, non-prefetchable) [size=16K] I/O ports at b800 [size=256] Expansion ROM at fb9c [disabled] [size=128K] Capabilities: Kernel driver in use: sky2 Kernel modules: sky2 06:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12) Subsystem: ASUSTeK Computer Inc. Motherboard Flags: bus master, fast devsel, latency 0, IRQ 25 Memory at fbbfc000 (64-bit, non-prefetchable) [size=16K] I/O ports at d800 [size=256] Expansion ROM at fbbc [disabled] [size=128K] Capabilities: Kernel driver in use: sky2 Kernel modules: sky2 Drivers: lsmod | grep nfs yields: nfsv3 53248 1 nfsv4 692224 0 nfs 315392 3 nfsv4,nfsv3 fscache 380928 2 nfsv4,nfs nfsd 413696 13 auth_rpcgss 102400 2 nfsd,rpcsec_gss_krb5 nfs_acl 16384 2 nfsd,nfsv3 lockd 106496 3 nfsd,nfsv3,nfs grace 16384 2 nfsd,lockd sunrpc 438272 28 nfsd,nfsv4,auth_rpcgss,lockd,nfsv3,rpcsec_gss_krb5
Re: [PATCH] soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher
On 20/03/2019 20:11, Nathan Chancellor wrote: > On Thu, Mar 07, 2019 at 03:56:51PM -0700, Nathan Chancellor wrote: >> When building with -Wsometimes-uninitialized, Clang warns: >> >> drivers/soc/mediatek/mtk-pmic-wrap.c:1358:6: error: variable 'rdata' is >> used uninitialized whenever '||' condition is true >> [-Werror,-Wsometimes-uninitialized] >> >> If pwrap_write returns non-zero, pwrap_read will not be called to >> initialize rdata, meaning that we will use some random uninitialized >> stack value in our print statement. Zero initialize rdata in case this >> happens. >> >> Link: https://github.com/ClangBuiltLinux/linux/issues/401 >> Signed-off-by: Nathan Chancellor >> --- >> >> I don't know if this is better or to just restructure the if statement >> below (I'm not an expert in this code so I'll leave that up to the >> maintainers to decide). >> >> drivers/soc/mediatek/mtk-pmic-wrap.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/soc/mediatek/mtk-pmic-wrap.c >> b/drivers/soc/mediatek/mtk-pmic-wrap.c >> index 8236a6c87e19..2f632e8790f7 100644 >> --- a/drivers/soc/mediatek/mtk-pmic-wrap.c >> +++ b/drivers/soc/mediatek/mtk-pmic-wrap.c >> @@ -1281,7 +1281,7 @@ static bool pwrap_is_pmic_cipher_ready(struct >> pmic_wrapper *wrp) >> static int pwrap_init_cipher(struct pmic_wrapper *wrp) >> { >> int ret; >> -u32 rdata; >> +u32 rdata = 0; >> >> pwrap_writel(wrp, 0x1, PWRAP_CIPHER_SWRST); >> pwrap_writel(wrp, 0x0, PWRAP_CIPHER_SWRST); >> -- >> 2.21.0 >> > > Gentle ping (if there was a response to this, I didn't receive it). I > know I sent it in the middle of a merge window so I get if it slipped > through the cracks. > applied now to v5.1-next/soc thanks
Re: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable
On Fri, Apr 12, 2019 at 04:30:24PM -0300, Jason Gunthorpe wrote: > On Mon, Mar 04, 2019 at 06:27:26AM +, Parav Pandit wrote: > > > I think we should use rdma_nl_register(RDMA_NL_RDMA_CM, cma_cb_table) which > > was removed as part of ID stats removal. > > Because of below reasons. > > 1. rdma netlink command auto loads the module > > This is probably the best argument to stay away from sysctl for module > parameters.. It is tricky to make sure the rdma module is loaded > before sysctl runs in boot, and I don't think sysctl autoloads missing > modules, or somehow copes with dynamic module loading, does it? As far as I remember, sysctl won't be available till module is loaded, so autoload won't work because path doesn't exist. > > To that end we should probably have the entire sysctl configurable > available in netlink > > Jason signature.asc Description: PGP signature
Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty
On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner wrote: > > Right now, when somebody needs to know the recursive memory statistics > and events of a cgroup subtree, they need to walk the entire subtree > and sum up the counters manually. > > There are two issues with this: > > 1. When a cgroup gets deleted, its stats are lost. The state counters > should all be 0 at that point, of course, but the events are not. When > this happens, the event counters, which are supposed to be monotonic, > can go backwards in the parent cgroups. > We also faced this exact same issue as well and had the similar solution. > 2. During regular operation, we always have a certain number of lazily > freed cgroups sitting around that have been deleted, have no tasks, > but have a few cache pages remaining. These groups' statistics do not > change until we eventually hit memory pressure, but somebody watching, > say, memory.stat on an ancestor has to iterate those every time. > > This patch addresses both issues by introducing recursive counters at > each level that are propagated from the write side when stats change. > > Upward propagation happens when the per-cpu caches spill over into the > local atomic counter. This is the same thing we do during charge and > uncharge, except that the latter uses atomic RMWs, which are more > expensive; stat changes happen at around the same rate. In a sparse > file test (page faults and reclaim at maximum CPU speed) with 5 cgroup > nesting levels, perf shows __mod_memcg_page state at ~1%. > (Unrelated to this patchset) I think there should also a way to get the exact memcg stats. As the machines are getting bigger (more cpus and larger basic page size) the accuracy of stats are getting worse. Internally we have an additional interface memory.stat_exact for that. However I am not sure in the upstream kernel will an additional interface is better or something like /proc/sys/vm/stat_refresh which sync all per-cpu stats. > Signed-off-by: Johannes Weiner Reviewed-by: Shakeel Butt > --- > include/linux/memcontrol.h | 54 +- > mm/memcontrol.c| 205 ++--- > 2 files changed, 150 insertions(+), 109 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index cae7d1b11eea..36bdfe8e5965 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -128,6 +128,7 @@ struct mem_cgroup_per_node { > > struct lruvec_stat __percpu *lruvec_stat_cpu; > atomic_long_t lruvec_stat[NR_VM_NODE_STAT_ITEMS]; > + atomic_long_t lruvec_stat_local[NR_VM_NODE_STAT_ITEMS]; > > unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; > > @@ -279,8 +280,12 @@ struct mem_cgroup { > MEMCG_PADDING(_pad2_); > > atomic_long_t vmstats[MEMCG_NR_STAT]; > + atomic_long_t vmstats_local[MEMCG_NR_STAT]; > + > atomic_long_t vmevents[NR_VM_EVENT_ITEMS]; > - atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS]; > + atomic_long_t vmevents_local[NR_VM_EVENT_ITEMS]; > + > + atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS]; > > unsigned long socket_pressure; > > @@ -565,6 +570,20 @@ struct mem_cgroup *lock_page_memcg(struct page *page); > void __unlock_page_memcg(struct mem_cgroup *memcg); > void unlock_page_memcg(struct page *page); > > +/* > + * idx can be of type enum memcg_stat_item or node_stat_item. > + * Keep in sync with memcg_exact_page_state(). > + */ > +static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int > idx) > +{ > + long x = atomic_long_read(&memcg->vmstats[idx]); > +#ifdef CONFIG_SMP > + if (x < 0) > + x = 0; > +#endif > + return x; > +} > + > /* > * idx can be of type enum memcg_stat_item or node_stat_item. > * Keep in sync with memcg_exact_page_state(). > @@ -572,7 +591,7 @@ void unlock_page_memcg(struct page *page); > static inline unsigned long memcg_page_state_local(struct mem_cgroup *memcg, >int idx) > { > - long x = atomic_long_read(&memcg->vmstats[idx]); > + long x = atomic_long_read(&memcg->vmstats_local[idx]); > #ifdef CONFIG_SMP > if (x < 0) > x = 0; > @@ -624,6 +643,24 @@ static inline void mod_memcg_page_state(struct page > *page, > mod_memcg_state(page->mem_cgroup, idx, val); > } > > +static inline unsigned long lruvec_page_state(struct lruvec *lruvec, > + enum node_stat_item idx) > +{ > + struct mem_cgroup_per_node *pn; > + long x; > + > + if (mem_cgroup_disabled()) > + return node_page_state(lruvec_pgdat(lruvec), idx); > + > + pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec); > + x = atomic_long_read(&pn->lruvec_stat[idx]); > +#ifdef CONFIG_SMP > + if (
Re: [PATCH 0/14] v2 multi-die/package topology support
> > > I think I prefer 's/threads/cpus/g' on that. Threads makes me think SMT, > > > and I don't think there's any guarantee the part in question will have > > > SMT on. > > > > I think 'threads' is a bit confusing as well. We seem to be using 'cpu' > > everywhere for something we can schedule tasks on, including the sysfs > > /sys/devices/system/cpu/ subdirs for each SMT thread on SMT systems. I agree with Peter and Morten. "cpu" is more clear and consistent than "thread" here. I'll spin the series with that string changed. > > Another thing that I find confusing is that with this series we a new > > die id/mask which is totally unrelated to the DIE level in the > > sched_domain hierarchy. We should rename DIE level to something that > > reflects what it really is. If we can agree on that ;-) > > > > NODE level? Cache topology and node interconnect topology impact performance, and so we what we look at, when we decided to run something on this CPU or that one. That logical topology lives within the physical package and die topology, but doesn't necessarily match it. For example, caches can be shared or split into pieces inside a package or die. Logical nodes may match die boundaries, or there may be multiple logical nodes within a single physical package or die. We have mechanisms for explicitly enumerating the caches, and for nodes. This patch series does not touch those mechanisms. The reason we need to know about physical packages and die is that there are other things associated with them. eg. power and temperature domains, and certain system registers follow these physical boundaries. Code that talks to those items needs to be able to understand these physical boundaries. I hope that helps. thanks, Len Brown, Intel Open Source Technology Center
[PATCH v3] chrome/platform: cros_ec_proto: Add trace event to trace EC commands
This is useful to see which EC commands are being executed and when. To enable: echo 'cros_ec:*' >> /sys/kernel/debug/tracing/set_event Example: /* cros_ec_cmd: version: 0, command: GET_VERSION */ /* cros_ec_cmd: version: 0, command: GET_PROTOCOL_INFO */ /* cros_ec_cmd: version: 1, command: GET_CMD_VERSIONS */ /* cros_ec_cmd: version: 1, command: USB_PD_CONTROL */ Signed-off-by: Raul E Rangel --- Changes in v3: - Use a macro to avoid duplicating the ec command names. Changes in v2: - Changed comment style to match other cros_ec files. - Fixed commit tag. drivers/platform/chrome/Makefile| 4 +- drivers/platform/chrome/cros_ec_proto.c | 4 + drivers/platform/chrome/cros_ec_trace.c | 162 drivers/platform/chrome/cros_ec_trace.h | 51 4 files changed, 220 insertions(+), 1 deletion(-) create mode 100644 drivers/platform/chrome/cros_ec_trace.c create mode 100644 drivers/platform/chrome/cros_ec_trace.h diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile index 1e2f0029b597..e542268454a4 100644 --- a/drivers/platform/chrome/Makefile +++ b/drivers/platform/chrome/Makefile @@ -3,12 +3,14 @@ obj-$(CONFIG_CHROMEOS_LAPTOP) += chromeos_laptop.o obj-$(CONFIG_CHROMEOS_PSTORE) += chromeos_pstore.o obj-$(CONFIG_CHROMEOS_TBMC)+= chromeos_tbmc.o +# tell define_trace.h where to find the cros ec trace header +CFLAGS_cros_ec_trace.o:= -I$(src) obj-$(CONFIG_CROS_EC_I2C) += cros_ec_i2c.o obj-$(CONFIG_CROS_EC_SPI) += cros_ec_spi.o cros_ec_lpcs-objs := cros_ec_lpc.o cros_ec_lpc_reg.o cros_ec_lpcs-$(CONFIG_CROS_EC_LPC_MEC) += cros_ec_lpc_mec.o obj-$(CONFIG_CROS_EC_LPC) += cros_ec_lpcs.o -obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o +obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o cros_ec_trace.o obj-$(CONFIG_CROS_KBD_LED_BACKLIGHT) += cros_kbd_led_backlight.o obj-$(CONFIG_CROS_EC_LIGHTBAR) += cros_ec_lightbar.o obj-$(CONFIG_CROS_EC_VBC) += cros_ec_vbc.o diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c index 97a068dff192..3d02c8259ac6 100644 --- a/drivers/platform/chrome/cros_ec_proto.c +++ b/drivers/platform/chrome/cros_ec_proto.c @@ -10,6 +10,8 @@ #include #include +#include "cros_ec_trace.h" + #define EC_COMMAND_RETRIES 50 static int prepare_packet(struct cros_ec_device *ec_dev, @@ -51,6 +53,8 @@ static int send_command(struct cros_ec_device *ec_dev, int ret; int (*xfer_fxn)(struct cros_ec_device *ec, struct cros_ec_command *msg); + trace_cros_ec_cmd(msg); + if (ec_dev->proto_version > 2) xfer_fxn = ec_dev->pkt_xfer; else diff --git a/drivers/platform/chrome/cros_ec_trace.c b/drivers/platform/chrome/cros_ec_trace.c new file mode 100644 index ..1f817aa9d886 --- /dev/null +++ b/drivers/platform/chrome/cros_ec_trace.c @@ -0,0 +1,162 @@ +// SPDX-License-Identifier: GPL-2.0 +// Trace events for the ChromeOS Embedded Controller +// +// Copyright 2019 Google LLC. + +#define EC_CMD(a) {EC_CMD_##a, #a} + +#define ec_cmds \ + EC_CMD(PROTO_VERSION), \ + EC_CMD(HELLO), \ + EC_CMD(GET_VERSION), \ + EC_CMD(READ_TEST), \ + EC_CMD(GET_BUILD_INFO), \ + EC_CMD(GET_CHIP_INFO), \ + EC_CMD(GET_BOARD_VERSION), \ + EC_CMD(READ_MEMMAP), \ + EC_CMD(GET_CMD_VERSIONS), \ + EC_CMD(GET_COMMS_STATUS), \ + EC_CMD(TEST_PROTOCOL), \ + EC_CMD(GET_PROTOCOL_INFO), \ + EC_CMD(GSV_PAUSE_IN_S5), \ + EC_CMD(GET_FEATURES), \ + EC_CMD(GET_SKU_ID), \ + EC_CMD(SET_SKU_ID), \ + EC_CMD(FLASH_INFO), \ + EC_CMD(FLASH_READ), \ + EC_CMD(FLASH_WRITE), \ + EC_CMD(FLASH_ERASE), \ + EC_CMD(FLASH_PROTECT), \ + EC_CMD(FLASH_REGION_INFO), \ + EC_CMD(VBNV_CONTEXT), \ + EC_CMD(FLASH_SPI_INFO), \ + EC_CMD(FLASH_SELECT), \ + EC_CMD(PWM_GET_FAN_TARGET_RPM), \ + EC_CMD(PWM_SET_FAN_TARGET_RPM), \ + EC_CMD(PWM_GET_KEYBOARD_BACKLIGHT), \ + EC_CMD(PWM_SET_KEYBOARD_BACKLIGHT), \ + EC_CMD(PWM_SET_FAN_DUTY), \ + EC_CMD(PWM_SET_DUTY), \ + EC_CMD(PWM_GET_DUTY), \ + EC_CMD(LIGHTBAR_CMD), \ + EC_CMD(LED_CONTROL), \ + EC_CMD(VBOOT_HASH), \ + EC_CMD(MOTION_SENSE_CMD), \ + EC_CMD(FORCE_LID_OPEN), \ + EC_CMD(CONFIG_POWER_BUTTON), \ + EC_CMD(USB_CHARGE_SET_MODE), \ + EC_CMD(PSTORE_INFO), \ + EC_CMD(PSTORE_READ), \ + EC_CMD(PSTORE_WRITE), \ + EC_CMD(RTC_GET_VALUE), \ + EC_CMD(RTC_GET_ALARM), \ + EC_CMD(RTC_SET_VALUE), \ + EC_CMD(RTC_SET_ALARM), \ + EC_CMD(PORT80_READ), \ + EC_CMD(VSTORE_INFO), \ + EC_CMD(VSTORE_READ), \ + EC_CMD(VSTORE_WRITE), \ + EC_CMD(THERMAL_SET_THRESHOLD), \ + EC_CMD(THERMAL_GET_THRESHOLD),
Re: [PATCH v2] chrome/platform: cros_ec_proto:: Add trace event to trace EC commands
On Fri, Apr 12, 2019 at 03:04:38PM -0400, Steven Rostedt wrote: > On Fri, 12 Apr 2019 12:49:44 -0600 > Raul E Rangel wrote: > > > +#define ec_cmds \ > > + {EC_CMD_PROTO_VERSION, "PROTO_VERSION"}, \ > > + {EC_CMD_HELLO, "HELLO"}, \ > > + {EC_CMD_GET_VERSION, "GET_VERSION"}, \ > > + {EC_CMD_READ_TEST, "READ_TEST"}, \ > > + {EC_CMD_GET_BUILD_INFO, "GET_BUILD_INFO"}, \ > > + {EC_CMD_GET_CHIP_INFO, "GET_CHIP_INFO"}, \ > > + {EC_CMD_GET_BOARD_VERSION, "GET_BOARD_VERSION"}, \ > > + {EC_CMD_READ_MEMMAP, "READ_MEMMAP"}, \ > > + {EC_CMD_GET_CMD_VERSIONS, "GET_CMD_VERSIONS"}, \ > > + {EC_CMD_GET_COMMS_STATUS, "GET_COMMS_STATUS"}, \ > > + {EC_CMD_TEST_PROTOCOL, "TEST_PROTOCOL"}, \ > > + {EC_CMD_GET_PROTOCOL_INFO, "GET_PROTOCOL_INFO"}, \ > > + {EC_CMD_GSV_PAUSE_IN_S5, "GSV_PAUSE_IN_S5"}, \ > > + {EC_CMD_GET_FEATURES, "GET_FEATURES"}, \ > > Usually, if I have something like this, I would do: > > > #define ec_cmds \ > EC(PROTO_VERSION), \ > EC(HELLO), \ > EC(GET_VERSION), \ > EC(READ_TEST), \ > [...] > > Then: > > #define EC(a) {EC_CMD_##a, #a} > > and then ec_cmds ends up with the same result with much less typing and > little risk for copy past errors. I generally don't like doing that because it breaks my editors Goto Definition, but I can send out a patch that uses the macro. Thanks, Raul > > -- Steve
Re: kernel BUG at fs/inode.c:LINE!
Hi Dmitry, On Fri, Apr 12, 2019 at 01:04:27PM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > On Thu, Apr 11, 2019 at 4:23 AM Al Viro wrote: > > > > On Thu, Apr 11, 2019 at 08:50:17AM +0800, Ian Kent wrote: > > > On Wed, 2019-04-10 at 14:41 +0200, Dmitry Vyukov wrote: > > > > On Wed, Apr 10, 2019 at 2:12 PM Al Viro wrote: > > > > > > > > > > On Wed, Apr 10, 2019 at 08:07:15PM +0800, Ian Kent wrote: > > > > > > > > > > > > I'm unable to find a branch matching the line numbers. > > > > > > > > > > > > > > Given that, on the face of it, the scenario is impossible I'm > > > > > > > seeking clarification on what linux-next to look at for the > > > > > > > sake of accuracy. > > > > > > > > > > > > > > So I'm wondering if this testing done using the master branch > > > > > > > or one of the daily branches one would use to check for conflicts > > > > > > > before posting? > > > > > > > > > > > > Sorry those are tags not branches. > > > > > > > > > > FWIW, that's next-20181214; it is what master had been in mid-December > > > > > and master is rebased every day. Can it be reproduced with the > > > > > current > > > > > tree? > > > > > > > > From the info on the dashboard we know that it happened only once on > > > > d14b746c (the second one is result of reproducing the first one). So > > > > it was either fixed or just hard to trigger. > > > > > > Looking at the source of tag next-20181214 in linux-next-history I see > > > this is mistake I made due to incorrect error handling which I fixed > > > soon after (there was in fact a double iput()). > > > > Right - "autofs: fix possible inode leak in autofs_fill_super()" had been > > broken (and completely pointless), leading to double iput() in that failure > > case. And yes, double iput() can trigger that BUG_ON(), and with non-zero > > odds do so with that stack trace. > > > > As far as I'm concerned, case closed - bug had been in a misguided "fix" > > for inexistent leak (coming from misreading the calling conventions for > > d_make_root()), introduced in -next at next-20181130 and kicked out of > > there in next-20181219. Dropped by Ian's request in > > Message-ID: <66d497c00cffb3e4109ca0d5287c8277954d7132.ca...@themaw.net> > > which has fixed that crap. Moreover, that posting had been in reply to > > that very syzcaller report, AFAICS. > > > > I don't know how to tell the bot to STFU and close the report in this > > situation; up to you, folks. > > > Please see the following for this: > > > syzbot will keep track of this bug report. See: > > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with > > syzbot. > > There are just 3 operations: mark as fixed by a commit, mark as > invalid, mark as duplicate. > I won't be always around. Tracking statuses of bug reports is in the > interests of kernel quality. > As I suggested before, syzbot should automatically invalidate old bug reports, especially on linux-next, that are unlikely to still be real problems. And, syzbot should send remainders about bugs that are still occurring. Instead, currently developers have to waste time debugging bugs caused by patches that were in linux-next for a few days/weeks and then dropped months ago, and have to argue with you every time about how to tell syzbot to close the bug when it never made it into git history. And meanwhile, no one is looking into the bugs that are being hit on mainline every hour or even every 10 minutes for the past year. That's not a great strategy to actually get bugs fixed. - Eric
Re: [PATCH] selftests/seccomp: Handle namespace failures gracefully
On Fri, Apr 12, 2019 at 11:07:11AM -0600, shuah wrote: > On 4/12/19 9:25 AM, Tycho Andersen wrote: > > On Thu, Apr 11, 2019 at 04:56:31PM -0700, Kees Cook wrote: > > > When running without USERNS or PIDNS the seccomp test would hang since > > > it was waiting forever for the child to trigger the user notification > > > since it seems the glibc() abort handler makes a call to getpid(), > > > which would trap again. This changes the getpid filter to getppid, and > > > makes sure ASSERTs execute to stop from spawning the listener. > > > > > > Reported-by: Shuah Khan > > > Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") > > > Signed-off-by: Kees Cook > > > > Sorry for the delay, thanks for looking at this! > > > > Reviewed-by: Tycho Andersen > > > > Thanks both. Should it go into stables. I will pull this and > add stable if that is appropriate. Yes, for 5.0+ that sounds good. Thanks! Tycho
Re: 1808d65b55 ("asm-generic/tlb: Remove arch_tlb*_mmu()"): BUG: KASAN: stack-out-of-bounds in __change_page_attr_set_clr
> On Apr 12, 2019, at 11:19 AM, Peter Zijlstra wrote: > > On Fri, Apr 12, 2019 at 03:11:22PM +, Nadav Amit wrote: >>> On Apr 12, 2019, at 4:17 AM, Peter Zijlstra wrote: > >>> To clarify, 'that' is Nadav's patch: >>> >>> 515ab7c41306 ("x86/mm: Align TLB invalidation info") >>> >>> which turns out to be the real problem. >> >> Sorry for that. I still think it should be aligned, especially with all the >> effort the Intel puts around to avoid bus-locking on unaligned atomic >> operations. > > No atomics anywhere in sight, so that's not a concern. You are right. I still think that at least TLB-wise it should be better to have the argument off-stack. I’ll try to run some experiments, based on your feedback, and send a patch on top of your revert. Sorry for the mess, again.
Re: [PATCH 1/7] s390: zcrypt: driver callback to indicate resource in use
On 4/12/19 5:43 AM, Cornelia Huck wrote: On Fri, 12 Apr 2019 08:54:54 +0200 Harald Freudenberger wrote: On 11.04.19 23:03, Tony Krowiak wrote: Introduces a new driver callback to prevent a root user from unbinding an AP queue from its device driver if the queue is in use. This prevents a root user from inadvertently taking a queue away from a guest and giving it to the host, or vice versa. The callback will be invoked whenever a change to the AP bus's apmask or aqmask sysfs interfaces may result in one or more AP queues being removed from its driver. If the callback responds in the affirmative for any driver queried, the change to the apmask or aqmask will be rejected with a device in use error. For this patch, only non-default drivers will be queried. Currently, there is only one non-default driver, the vfio_ap device driver. The vfio_ap device driver manages AP queues passed through to one or more guests and we don't want to unexpectedly take AP resources away from guests which are most likely independently administered. Signed-off-by: Tony Krowiak Hello Tony you are going out with your patches but ... what is the result of the discussions we had in the past ? Do we have a common understanding that we want to have it this way ? A driver which is able to claim resources and the bus code has lower precedence ? This is what Reinhard suggested and what we agreed to as a team quite some time ago. I submitted three versions of this patch to our internal mailing list, all of which you reviewed, so I'm not sure why you are surprised by this now. I don't know about previous discussions, but I'm curious how you arrived at this design. A driver being able to override the bus code seems odd. Restricting this to 'non-default' drivers looks even more odd. What is this trying to solve? Traditionally, root has been able to shoot any appendages of their choice. I would rather expect that in a supported setup you'd have some management software keeping track of device usage and making sure that only unused queues can be unbound. Do we need to export more information to user space so that management software can make better choices? Is there a reason other than tradition to prevent root from accidentally shooting himself in the foot or any other appendage? At present, sysfs is the only supported setup, so IMHO it makes sense to prevent as much accidentally caused damage as possible in the kernel. --- drivers/s390/crypto/ap_bus.c | 138 +-- drivers/s390/crypto/ap_bus.h | 3 + 2 files changed, 135 insertions(+), 6 deletions(-)
Re: [PATCH v2 4/6] dt-bindings: ti-lmu: Modify dt bindings for the LM3697
And one more: [5] - LED naming patch set v3 On 4/11/19 9:48 PM, Jacek Anaszewski wrote: Hi Rob, I've collected references to the DT patches we're waiting to be reviewed. This is just to make sure that none of them will be missed - recent traffic on the linux-leds list is a bit heavier. [0] - draft of LED multi color class - we would especially like to get a feedback on the proposed brightness-model DT design [1][2] - LM3697 backlight driver rework [3][4] - LM36274 backlight driver with regulator support On 4/5/19 4:28 PM, Dan Murphy wrote: The LM3697 is a single function LED driver. The single function LED driver needs to reside in the LED directory as a dedicated LED driver and not as a MFD device. The device does have common brightness and ramp features and those can be accomodated by a TI LMU framework. The LM3697 dt binding needs to be moved from the ti-lmu.txt and a dedicated LED dt binding needs to be added. The new LM3697 LED dt binding will then reside in the Documentation/devicetree/bindings/leds directory and follow the current LED and general bindings guidelines. Signed-off-by: Dan Murphy [...] [0] https://marc.info/?l=linux-kernel&m=155414072611538&w=2 [1] https://marc.info/?l=devicetree&m=155447454524411&w=2 [2] https://lore.kernel.org/patchwork/patch/1058762/ [3] https://marc.info/?l=devicetree&m=155447624324995&w=2 [4] https://lore.kernel.org/patchwork/patch/1058779/ [5] - https://marc.info/?l=linux-kernel&m=155405492231039&w=2 -- Best regards, Jacek Anaszewski
Re: [PATCH 0/14] v2 multi-die/package topology support
On Tue, Feb 26, 2019 at 1:51 PM Peter Zijlstra wrote: > > On Tue, Feb 26, 2019 at 01:19:58AM -0500, Len Brown wrote: > > Restored sysfs core_siblings, core_siblings_list > > > > v1 proposed re-defining this existing attribute to > > be the threads in a die, rather than in a package. > > > > For compatibility, decided rather to keep this > > attribute unchanged, for now, even though > > its name makes little sense, and it makes > > no sense in a multi-die system. > > So why do things that make no sense? >>> 7) /sys/devices/system/cpu/cpuX/topology/core_siblings: >>> >>> internal kernel map of cpuX's hardware threads within the same >>> physical_package_id. This definition tells you what cpus are in what package. That is fine, it is useful, and it is in use. What doesn't make sense is that it is called "core_siblings". Who is to say that the map of CPUs inside a package has anything to do with "cores"? Sometimes it does, sometimes it doesn't... > What breaks? User space applications, such as lscpu and hwloc are using this attribute per its definition, to figure out what cpus are in what packages. If we change the definition to match the attribute's name, they break. If we change the attribute name to match the definition, they break. So the plan is to simply leave this attribute and its definition in place, deprecate it, and move to the new attribute names that don't have the word "siblings" in them -- which imply a known fixed topology. We can schedule this attribute to be deleted some day, but changing it and hoping you've updated all of user-space would be a unnecessary pain. Len Brown, Intel Open Source Technology Center
Re: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable
On Mon, Mar 04, 2019 at 06:27:26AM +, Parav Pandit wrote: > I think we should use rdma_nl_register(RDMA_NL_RDMA_CM, cma_cb_table) which > was removed as part of ID stats removal. > Because of below reasons. > 1. rdma netlink command auto loads the module This is probably the best argument to stay away from sysctl for module parameters.. It is tricky to make sure the rdma module is loaded before sysctl runs in boot, and I don't think sysctl autoloads missing modules, or somehow copes with dynamic module loading, does it? To that end we should probably have the entire sysctl configurable available in netlink Jason
Re: [PATCH v2 2/7] dt: bindings: Add multicolor class dt bindings documention
Dan, On 4/12/19 8:46 PM, Dan Murphy wrote: Jacek On 4/12/19 1:14 PM, Jacek Anaszewski wrote: Hi Marek, On 4/12/19 12:07 AM, Marek Behun wrote: Hi Dan, this probaly was discussed, but I did not follow brightness model discussions: what will happen if I set yellow by writing into yellow mode brightness, and then orange by writing orange model brightness? Will the resulting color be a mix of yellow and orange, or will the orange overwrite the yellow setting? Orange will overwrite yellow settings. When color name is given it should be treated as a hue. Then changing brightness level should affect the lightness of a hue, similarly like changing L component of HSL color model. This will be however entirely up to DT brightness-model designer how they will design their models. We are not going to verify that in the LED multi color class. It implies that it will be possible to define arbitrary range of color levels, not necessarily adhering to any established color model. I think it could be useful to define brightness model that allows to go from blue color (for cold) up to red (hot) for representing a temperature for instance. These ideas will need however more documentation. Generally we aim to propose only a convention. Ah but what about the issue of writing the monochrome LED color. With your description it implies that when we write the red LED, the red LED will come on and if we write the blue LED then the red LED in theory should turn off and the blue come on. Where did I write that? Probably the way I used word "color" was not adequate in at least two cases, which may have confused the reader. Modifying monochrome color brightness levels (under colors dir) will just modify corresponding IOUT brightness. Not affecting the other. I'll try to rephrase what seems to may sound equivocal: s/When color name is given/When brightness-model is given a color name/ s/arbitrary range of color levels/arbitrary range of brightness-model levels/ But these could be used to mix the colors to create some abstract violet that is not defined in the brightness model. Why should the brightness models and monochrome LEDs have two different operations. Dan -- Best regards, Jacek Anaszewski
Re: kernel BUG at fs/inode.c:LINE!
On Fri, Apr 12, 2019 at 12:59:30PM +0200, Dmitry Vyukov wrote: > Please don't ignore the bug status tracking part. Or we will only have > options of not testing kernel or dropping most of the bug reports on > the floor. Both are equally harmful for kernel quality. > > Let's mark it as fixed by the next commit in series (which already > made it into mainline): > > #syz fix: autofs: simplify parse_options() function call > > But I won't be around always. You might want to add something like "commit discarded"...
[PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures
With the default overcommit==guess we occasionally run into mmap rejections despite plenty of memory that would get dropped under pressure but just isn't accounted reclaimable. One example of this is dying cgroups pinned by some page cache. A previous case was auxiliary path name memory associated with dentries; we have since annotated those allocations to avoid overcommit failures (see d79f7aa496fc ("mm: treat indirectly reclaimable memory as free in overcommit logic")). But trying to classify all allocated memory reliably as reclaimable and unreclaimable is a bit of a fool's errand. There could be a myriad of dependencies that constantly change with kernel versions. It becomes even more questionable of an effort when considering how this estimate of available memory is used: it's not compared to the system-wide allocated virtual memory in any way. It's not even compared to the allocating process's address space. It's compared to the single allocation request at hand! So we have an elaborate left-hand side of the equation that tries to assess the exact breathing room the system has available down to a page - and then compare it to an isolated allocation request with no additional context. We could fail an allocation of N bytes, but for two allocations of N/2 bytes we'd do this elaborate dance twice in a row and then still let N bytes of virtual memory through. This doesn't make a whole lot of sense. Let's take a step back and look at the actual goal of the heuristic. From the documentation: Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default. If all we want to do is catch clearly bogus allocation requests irrespective of the general virtual memory situation, the physical memory counter-part doesn't need to be that complicated, either. When in GUESS mode, catch wild allocations by comparing their request size to total amount of ram and swap in the system. Signed-off-by: Johannes Weiner --- mm/util.c | 51 +-- 1 file changed, 5 insertions(+), 46 deletions(-) diff --git a/mm/util.c b/mm/util.c index 05a464929b3e..e2e4f8c3fa12 100644 --- a/mm/util.c +++ b/mm/util.c @@ -652,7 +652,7 @@ EXPORT_SYMBOL_GPL(vm_memory_committed); */ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin) { - long free, allowed, reserve; + long allowed; VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) < -(s64)vm_committed_as_batch * num_online_cpus(), @@ -667,51 +667,9 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin) return 0; if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) { - free = global_zone_page_state(NR_FREE_PAGES); - free += global_node_page_state(NR_FILE_PAGES); - - /* -* shmem pages shouldn't be counted as free in this -* case, they can't be purged, only swapped out, and -* that won't affect the overall amount of available -* memory in the system. -*/ - free -= global_node_page_state(NR_SHMEM); - - free += get_nr_swap_pages(); - - /* -* Any slabs which are created with the -* SLAB_RECLAIM_ACCOUNT flag claim to have contents -* which are reclaimable, under pressure. The dentry -* cache and most inode caches should fall into this -*/ - free += global_node_page_state(NR_SLAB_RECLAIMABLE); - - /* -* Part of the kernel memory, which can be released -* under memory pressure. -*/ - free += global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE); - - /* -* Leave reserved pages. The pages are not for anonymous pages. -*/ - if (free <= totalreserve_pages) + if (pages > totalram_pages() + total_swap_pages) goto error; - else - free -= totalreserve_pages; - - /* -* Reserve some for root -*/ - if (!cap_sys_admin) - free -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10); - - if (free > pages) - return 0; - - goto error; + return 0; } allowed = vm_commit_limit(); @@ -725,7 +683,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin) * Don't let a single process grow so big a user can't recover */ if (mm) { - reserve = sysctl
Re: [PATCH v2 2/7] dt: bindings: Add multicolor class dt bindings documention
Dan, On 4/12/19 1:50 PM, Dan Murphy wrote: Marek On 4/11/19 5:07 PM, Marek Behun wrote: Hi Dan, this probaly was discussed, but I did not follow brightness model discussions: what will happen if I set yellow by writing into yellow mode brightness, and then orange by writing orange model brightness? Will the resulting color be a mix of yellow and orange, or will the orange overwrite the yellow setting? This was not discussed and is a good question. If you write the yellow mode for a group of LEDs then yellow would be produced for the brightness requested. If orange is then requested then orange should be displayed at the brightness level requested. So yes the orange will over write the yellow. Yes, and individual color brightness levels should correspond to the color components of the brightness-model level currently set. The next question is if the absolute colors are written does it produce the same behavior? So if you have yellow and write to the red should the red LED brightness be modified or should the color switch to red? And if the red LED is on and the blue LED is written should the color switch to blue or should the blue and red LEDs be mixed together? Now, if any of the color brightness files is altered it should update the hardware with this new setting, but brightness-model and main brightness level should not be changed. The thing that is missing in our proposal is lack of the way to check if brightness-model is up to date (i.e. if it reflects what is written to the hardware). How about utilizing the sync file from the new colors directory? It could return 1 on read when brightness levels of all colors match exactly the ones assigned to the brightness model level currently set. This is tricky as the user space can write the individual absolute colors and mix the LEDs to produce varying colors. But the behavior writing the brightness models are different. I would almost prefer that the user space reads the available absolute LED colors and creates the devices supported color palette and write the absolute LED colors only. But this violates the requirements asked for. Dan Marek -- Best regards, Jacek Anaszewski
[PATCH] perf/x86: descriptive failure messages for PMU init
There's a default warning message that gets printed, however, there are various failure conditions: - a msr read can fail - a msr write can fail - a msr has an unexpected value - all msrs have unexpected values (disable PMU) Also, commit commit 005bd0077a79 ("perf/x86: Modify error message in virtualized environment") completely removed printing the msr in question but these messages could be helpful for debugging vPMUs as well. Add them back and change them to pr_debugs, this keeps the behavior the same for baremetal. Lastly, use %llx to silence checkpatch Signed-off-by: Bandan Das --- arch/x86/events/core.c | 66 -- 1 file changed, 50 insertions(+), 16 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index e2b1447192a8..786e03893a0c 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -192,9 +192,16 @@ static void release_pmc_hardware(void) {} static bool check_hw_exists(void) { - u64 val, val_fail = -1, val_new= ~0; - int i, reg, reg_fail = -1, ret = 0; - int bios_fail = 0; + u64 val = -1, val_fail = -1, val_new = ~0; + int i, reg = -1, reg_fail = -1, ret = 0; + bool virt = boot_cpu_has(X86_FEATURE_HYPERVISOR) ? true : false; + enum { + READ_FAIL=1, + WRITE_FAIL =2, + PMU_FAIL =3, + BIOS_FAIL=4, + }; + int status = 0; int reg_safe = -1; /* @@ -204,10 +211,13 @@ static bool check_hw_exists(void) for (i = 0; i < x86_pmu.num_counters; i++) { reg = x86_pmu_config_addr(i); ret = rdmsrl_safe(reg, &val); - if (ret) + if (ret) { + status = READ_FAIL; goto msr_fail; + } + if (val & ARCH_PERFMON_EVENTSEL_ENABLE) { - bios_fail = 1; + status = BIOS_FAIL; val_fail = val; reg_fail = reg; } else { @@ -218,11 +228,13 @@ static bool check_hw_exists(void) if (x86_pmu.num_counters_fixed) { reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL; ret = rdmsrl_safe(reg, &val); - if (ret) + if (ret) { + status = READ_FAIL; goto msr_fail; + } for (i = 0; i < x86_pmu.num_counters_fixed; i++) { if (val & (0x03 << i*4)) { - bios_fail = 1; + status = BIOS_FAIL; val_fail = val; reg_fail = reg; } @@ -236,7 +248,7 @@ static bool check_hw_exists(void) */ if (reg_safe == -1) { - reg = reg_safe; + status = PMU_FAIL; goto msr_fail; } @@ -246,18 +258,22 @@ static bool check_hw_exists(void) * (qemu/kvm) that don't trap on the MSR access and always return 0s. */ reg = x86_pmu_event_addr(reg_safe); - if (rdmsrl_safe(reg, &val)) + if (rdmsrl_safe(reg, &val)) { + status = READ_FAIL; goto msr_fail; + } val ^= 0xUL; ret = wrmsrl_safe(reg, val); ret |= rdmsrl_safe(reg, &val_new); - if (ret || val != val_new) + if (ret || val != val_new) { + status = WRITE_FAIL; goto msr_fail; + } /* * We still allow the PMU driver to operate: */ - if (bios_fail) { + if (status == BIOS_FAIL) { pr_cont("Broken BIOS detected, complain to your hardware vendor.\n"); pr_err(FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n", reg_fail, val_fail); @@ -266,12 +282,30 @@ static bool check_hw_exists(void) return true; msr_fail: - if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { + if (virt) pr_cont("PMU not available due to virtualization, using software events only.\n"); - } else { - pr_cont("Broken PMU hardware detected, using software events only.\n"); - pr_err("Failed to access perfctr msr (MSR %x is %Lx)\n", - reg, val_new); + switch (status) { + case READ_FAIL: + if (virt) + pr_debug("Failed to read perfctr msr (MSR %x)\n", reg); + else + pr_err("Failed to read perfctr msr (MSR %x)\n", reg); + break; + case WRITE_FAIL: + if (virt) + pr_debug("Failed to write perfctr msr (MSR %x, wrote: %llx, read: %llx)\n", +reg, val, val_new); + else + pr_err("Failed to wr
Re: [PATCH] selftests: livepatch use TEST_PROGS for test shell scripts
On 4/12/19 12:51 PM, Joe Lawrence wrote: On 4/12/19 1:05 PM, shuah wrote: On 4/12/19 7:37 AM, Miroslav Benes wrote: On Fri, 12 Apr 2019, shuah wrote: On 4/12/19 1:03 AM, Miroslav Benes wrote: On Thu, 11 Apr 2019, Shuah Khan wrote: TEST_PROGS variable is for test shell scripts and common clean target in lib.mk doesn't touch them. TEST_GEN_PROGS are removed by it. Fix it to use TEST_PROGS for test shell scripts and TEST_PROGS_EXTENDED for common functions.sh. Signed-off-by: Shuah Khan --- tools/testing/selftests/livepatch/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/livepatch/Makefile b/tools/testing/selftests/livepatch/Makefile index af4aee79bebb..fd405402c3ff 100644 --- a/tools/testing/selftests/livepatch/Makefile +++ b/tools/testing/selftests/livepatch/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 -TEST_GEN_PROGS := \ +TEST_PROGS_EXTENDED := functions.sh +TEST_PROGS := \ test-livepatch.sh \ test-callbacks.sh \ test-shadow-vars.sh Hi Shuah, thanks for the patch. We have already something similar queued in our git tree. See https://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching.git/commit/?h=for-5.1/upstream-fixes&id=abfe3c4560684864f66641438fee3075de098e89 It is missing TEST_PROGS_EXTENDED though. Should we add it? Please do. What that does is when selftests are installed, functions.h gets installed as well so the the test can run from installed location. Did I miss reviewing the original? I maintain the framework and try to catch these if patch gets sent to me. Unfortunately you did and it was our fault. You were not CCed, no one noticed and we were a bit trigger happy. Sorry about that. It should not have happened. Would this work for you? Looks good to me. Hi Shuah, Thanks for spotting this and apologies for missing your CC on my earlier patch post. For future reference, do you prefer a direct CC, linux-kselftest, or both? No worries. Happy to report the problem. Couldn't have missed it with all the deleted lines showing up whenever I ran diff on my changes. :) Direct to or cc to me and cc linux-kselftest list is preferred Same as any other patch really, everybody getmaintainer.pl recommends. thanks, -- Shuah
Re: [PATCH v2] chrome/platform: cros_ec_proto:: Add trace event to trace EC commands
On Fri, 12 Apr 2019 12:49:44 -0600 Raul E Rangel wrote: > +#define ec_cmds \ > + {EC_CMD_PROTO_VERSION, "PROTO_VERSION"}, \ > + {EC_CMD_HELLO, "HELLO"}, \ > + {EC_CMD_GET_VERSION, "GET_VERSION"}, \ > + {EC_CMD_READ_TEST, "READ_TEST"}, \ > + {EC_CMD_GET_BUILD_INFO, "GET_BUILD_INFO"}, \ > + {EC_CMD_GET_CHIP_INFO, "GET_CHIP_INFO"}, \ > + {EC_CMD_GET_BOARD_VERSION, "GET_BOARD_VERSION"}, \ > + {EC_CMD_READ_MEMMAP, "READ_MEMMAP"}, \ > + {EC_CMD_GET_CMD_VERSIONS, "GET_CMD_VERSIONS"}, \ > + {EC_CMD_GET_COMMS_STATUS, "GET_COMMS_STATUS"}, \ > + {EC_CMD_TEST_PROTOCOL, "TEST_PROTOCOL"}, \ > + {EC_CMD_GET_PROTOCOL_INFO, "GET_PROTOCOL_INFO"}, \ > + {EC_CMD_GSV_PAUSE_IN_S5, "GSV_PAUSE_IN_S5"}, \ > + {EC_CMD_GET_FEATURES, "GET_FEATURES"}, \ Usually, if I have something like this, I would do: #define ec_cmds \ EC(PROTO_VERSION), \ EC(HELLO), \ EC(GET_VERSION), \ EC(READ_TEST), \ [...] Then: #define EC(a) {EC_CMD_##a, #a} and then ec_cmds ends up with the same result with much less typing and little risk for copy past errors. -- Steve
Re: [PATCH][next] RDMA/mlx5: check for error return in flow_rule rather than err
On Fri, Apr 12, 2019 at 11:40:17AM +0100, Colin King wrote: > From: Colin Ian King > > Currently when the call to create_flow_rule_vport_sq fails, the error > check is being performed on err rather than on the return pointer > flow_rule. The return flow_rule maybe NULL (which is not considered > an error) or an error code, so check for the error on flow_rule. > > Addresses-Coverity: ("Uninitialized scalar variable") > Fixes: d5ed8ac34cef ("RDMA/mlx5: Move default representors SQ steering to > rule to modify QP") > Signed-off-by: Colin Ian King > --- > drivers/infiniband/hw/mlx5/qp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Applied to for-next, thanks Jason
Re: [PATCH] clk: rockchip: rk3288: Limit use of USB PHY clock to USB
Am Freitag, 12. April 2019, 20:02:55 CEST schrieb Matthias Kaehlcke: > On Fri, Apr 12, 2019 at 11:30:37AM +0200, Heiko Stübner wrote: > > Am Freitag, 12. April 2019, 02:16:57 CEST schrieb Matthias Kaehlcke: > > > Hi Heiko, > > > > > > On Thu, Apr 11, 2019 at 09:03:07PM +0200, Heiko Stübner wrote: > > > > Hi Matthias, > > > > > > > > Am Donnerstag, 11. April 2019, 19:59:17 CEST schrieb Matthias Kaehlcke: > > > > > The USB PHY clock can be configured as (grand) parent of uart0_sclk > > > > > and > > > > > sclk_gpu. It has been observed that UART0 doesn't work reliably in > > > > > high > > > > > speed mode with the PHY clock as input when certain USB devices are > > > > > plugged to the USB HOST1 port (see https://crrev.com/c/320543). > > > > > > > > > > Prefix the name of the PHY clock with a '.' in the non-USB muxes to > > > > > effectively remove the clock as input from these muxes. > > > > > > > > > > Signed-off-by: Matthias Kaehlcke > > > > > --- > > > > > drivers/clk/rockchip/clk-rk3288.c | 4 ++-- > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/drivers/clk/rockchip/clk-rk3288.c > > > > > b/drivers/clk/rockchip/clk-rk3288.c > > > > > index 5a67b7869960..677bc5485201 100644 > > > > > --- a/drivers/clk/rockchip/clk-rk3288.c > > > > > +++ b/drivers/clk/rockchip/clk-rk3288.c > > > > > @@ -200,8 +200,8 @@ PNAME(mux_aclk_cpu_src_p) = { "cpll_aclk_cpu", > > > > > "gpll_aclk_cpu" }; > > > > > PNAME(mux_pll_src_cpll_gpll_p) = { "cpll", "gpll" }; > > > > > PNAME(mux_pll_src_npll_cpll_gpll_p) = { "npll", "cpll", "gpll" }; > > > > > PNAME(mux_pll_src_cpll_gpll_npll_p) = { "cpll", "gpll", "npll" }; > > > > > -PNAME(mux_pll_src_cpll_gpll_usb480m_p) = { "cpll", "gpll", > > > > > "usbphy480m_src" }; > > > > > -PNAME(mux_pll_src_cpll_gll_usb_npll_p) = { "cpll", "gpll", > > > > > "usbphy480m_src", "npll" }; > > > > > +PNAME(mux_pll_src_cpll_gpll_usb480m_p) = { "cpll", "gpll", > > > > > ".usbphy480m_src" }; > > > > > +PNAME(mux_pll_src_cpll_gll_usb_npll_p) = { "cpll", "gpll", > > > > > ".usbphy480m_src", "npll" }; > > > > > > > > In general I like to have things like the clock-tree described fully > > > > and let the kernel handle correct sourcing ... but: > > > > > > > > As you write this seems like a systemic problem when just connecting > > > > random peripherals can create unstable clock source frequencies, > > > > so I tend to agree here ... but: > > > > > > > > Can we please find a more "talking" name for this ... because as with > > > > the > > > > above someone will find the "." and submit a fix for it ;-) . > > > > > > > > So just name it "unstable_dummy" or so? > > > > > > I looked for some common pattern, but couldn't find one. I liked the > > > '.' since it leaves the name of the clock mostly intact, just hiding > > > it (similar to a leading '.' in a Linux file system). But I agree that > > > it might not be expressive enough. I still like the idea to keep the > > > clock name around for reference, maybe we could name it > > > "unstable:usbphy480m_src" or similar. If you don't object I'll send a > > > patch with this some time tomorrow. > > > > I've just adapted the patch to use the new parent-name you suggested > > and applied it for 5.2 So no need to resend :-) . > > Pefect, thanks! > > I don't see the patch in the git.kernel.org repo (nor > https://lore.kernel.org/patchwork/patch/1060781/), looks like the push > is still pending. oops, I really had forgotten the push ... done now with your patch at https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/log/?h=v5.2-clk/next Heiko
Re: [PATCH] selftests: livepatch use TEST_PROGS for test shell scripts
On 4/12/19 1:05 PM, shuah wrote: On 4/12/19 7:37 AM, Miroslav Benes wrote: On Fri, 12 Apr 2019, shuah wrote: On 4/12/19 1:03 AM, Miroslav Benes wrote: On Thu, 11 Apr 2019, Shuah Khan wrote: TEST_PROGS variable is for test shell scripts and common clean target in lib.mk doesn't touch them. TEST_GEN_PROGS are removed by it. Fix it to use TEST_PROGS for test shell scripts and TEST_PROGS_EXTENDED for common functions.sh. Signed-off-by: Shuah Khan --- tools/testing/selftests/livepatch/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/livepatch/Makefile b/tools/testing/selftests/livepatch/Makefile index af4aee79bebb..fd405402c3ff 100644 --- a/tools/testing/selftests/livepatch/Makefile +++ b/tools/testing/selftests/livepatch/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 -TEST_GEN_PROGS := \ +TEST_PROGS_EXTENDED := functions.sh +TEST_PROGS := \ test-livepatch.sh \ test-callbacks.sh \ test-shadow-vars.sh Hi Shuah, thanks for the patch. We have already something similar queued in our git tree. See https://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching.git/commit/?h=for-5.1/upstream-fixes&id=abfe3c4560684864f66641438fee3075de098e89 It is missing TEST_PROGS_EXTENDED though. Should we add it? Please do. What that does is when selftests are installed, functions.h gets installed as well so the the test can run from installed location. Did I miss reviewing the original? I maintain the framework and try to catch these if patch gets sent to me. Unfortunately you did and it was our fault. You were not CCed, no one noticed and we were a bit trigger happy. Sorry about that. It should not have happened. Would this work for you? Looks good to me. Hi Shuah, Thanks for spotting this and apologies for missing your CC on my earlier patch post. For future reference, do you prefer a direct CC, linux-kselftest, or both? And for Miroslav, you can add my ack if needed. -- Joe -->8-- >From c158f5595286dba46f096cc7cc4bcef5ad8b6c16 Mon Sep 17 00:00:00 2001 From: Miroslav Benes Date: Fri, 12 Apr 2019 15:31:42 +0200 Subject: [PATCH] selftests/livepatch: Add functions.sh to TEST_PROGS_EXTENDED Add functions.sh to TEST_PROGS_EXTENDED so that it is installed along with the rest of the selftests and they can be run. Originally-by: Shuah Khan Signed-off-by: Miroslav Benes --- tools/testing/selftests/livepatch/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/livepatch/Makefile b/tools/testing/selftests/livepatch/Makefile index 114f43e2081a..fd405402c3ff 100644 --- a/tools/testing/selftests/livepatch/Makefile +++ b/tools/testing/selftests/livepatch/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 +TEST_PROGS_EXTENDED := functions.sh TEST_PROGS := \ test-livepatch.sh \ test-callbacks.sh \ thanks, -- Shuah
[PATCH v2] chrome/platform: cros_ec_proto:: Add trace event to trace EC commands
This is useful to see which EC commands are being executed and when. To enable: echo 'cros_ec:*' >> /sys/kernel/debug/tracing/set_event Example: /* cros_ec_cmd: version: 0, command: GET_VERSION */ /* cros_ec_cmd: version: 0, command: GET_PROTOCOL_INFO */ /* cros_ec_cmd: version: 1, command: GET_CMD_VERSIONS */ /* cros_ec_cmd: version: 1, command: USB_PD_CONTROL */ Signed-off-by: Raul E Rangel --- Changes in v2: - Changed comment style to match other cros_ec files. - Fixed commit tag. drivers/platform/chrome/Makefile| 4 +- drivers/platform/chrome/cros_ec_proto.c | 4 + drivers/platform/chrome/cros_ec_trace.c | 161 drivers/platform/chrome/cros_ec_trace.h | 51 4 files changed, 219 insertions(+), 1 deletion(-) create mode 100644 drivers/platform/chrome/cros_ec_trace.c create mode 100644 drivers/platform/chrome/cros_ec_trace.h diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile index 1e2f0029b597..e542268454a4 100644 --- a/drivers/platform/chrome/Makefile +++ b/drivers/platform/chrome/Makefile @@ -3,12 +3,14 @@ obj-$(CONFIG_CHROMEOS_LAPTOP) += chromeos_laptop.o obj-$(CONFIG_CHROMEOS_PSTORE) += chromeos_pstore.o obj-$(CONFIG_CHROMEOS_TBMC)+= chromeos_tbmc.o +# tell define_trace.h where to find the cros ec trace header +CFLAGS_cros_ec_trace.o:= -I$(src) obj-$(CONFIG_CROS_EC_I2C) += cros_ec_i2c.o obj-$(CONFIG_CROS_EC_SPI) += cros_ec_spi.o cros_ec_lpcs-objs := cros_ec_lpc.o cros_ec_lpc_reg.o cros_ec_lpcs-$(CONFIG_CROS_EC_LPC_MEC) += cros_ec_lpc_mec.o obj-$(CONFIG_CROS_EC_LPC) += cros_ec_lpcs.o -obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o +obj-$(CONFIG_CROS_EC_PROTO)+= cros_ec_proto.o cros_ec_trace.o obj-$(CONFIG_CROS_KBD_LED_BACKLIGHT) += cros_kbd_led_backlight.o obj-$(CONFIG_CROS_EC_LIGHTBAR) += cros_ec_lightbar.o obj-$(CONFIG_CROS_EC_VBC) += cros_ec_vbc.o diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c index 97a068dff192..3d02c8259ac6 100644 --- a/drivers/platform/chrome/cros_ec_proto.c +++ b/drivers/platform/chrome/cros_ec_proto.c @@ -10,6 +10,8 @@ #include #include +#include "cros_ec_trace.h" + #define EC_COMMAND_RETRIES 50 static int prepare_packet(struct cros_ec_device *ec_dev, @@ -51,6 +53,8 @@ static int send_command(struct cros_ec_device *ec_dev, int ret; int (*xfer_fxn)(struct cros_ec_device *ec, struct cros_ec_command *msg); + trace_cros_ec_cmd(msg); + if (ec_dev->proto_version > 2) xfer_fxn = ec_dev->pkt_xfer; else diff --git a/drivers/platform/chrome/cros_ec_trace.c b/drivers/platform/chrome/cros_ec_trace.c new file mode 100644 index ..335bfce514d7 --- /dev/null +++ b/drivers/platform/chrome/cros_ec_trace.c @@ -0,0 +1,161 @@ +// SPDX-License-Identifier: GPL-2.0 +// Trace events for the ChromeOS Embedded Controller +// +// Copyright 2019 Google LLC. + +#define ec_cmds \ + {EC_CMD_PROTO_VERSION, "PROTO_VERSION"}, \ + {EC_CMD_HELLO, "HELLO"}, \ + {EC_CMD_GET_VERSION, "GET_VERSION"}, \ + {EC_CMD_READ_TEST, "READ_TEST"}, \ + {EC_CMD_GET_BUILD_INFO, "GET_BUILD_INFO"}, \ + {EC_CMD_GET_CHIP_INFO, "GET_CHIP_INFO"}, \ + {EC_CMD_GET_BOARD_VERSION, "GET_BOARD_VERSION"}, \ + {EC_CMD_READ_MEMMAP, "READ_MEMMAP"}, \ + {EC_CMD_GET_CMD_VERSIONS, "GET_CMD_VERSIONS"}, \ + {EC_CMD_GET_COMMS_STATUS, "GET_COMMS_STATUS"}, \ + {EC_CMD_TEST_PROTOCOL, "TEST_PROTOCOL"}, \ + {EC_CMD_GET_PROTOCOL_INFO, "GET_PROTOCOL_INFO"}, \ + {EC_CMD_GSV_PAUSE_IN_S5, "GSV_PAUSE_IN_S5"}, \ + {EC_CMD_GET_FEATURES, "GET_FEATURES"}, \ + {EC_CMD_GET_SKU_ID, "GET_SKU_ID"}, \ + {EC_CMD_SET_SKU_ID, "SET_SKU_ID"}, \ + {EC_CMD_FLASH_INFO, "FLASH_INFO"}, \ + {EC_CMD_FLASH_READ, "FLASH_READ"}, \ + {EC_CMD_FLASH_WRITE, "FLASH_WRITE"}, \ + {EC_CMD_FLASH_ERASE, "FLASH_ERASE"}, \ + {EC_CMD_FLASH_PROTECT, "FLASH_PROTECT"}, \ + {EC_CMD_FLASH_REGION_INFO, "FLASH_REGION_INFO"}, \ + {EC_CMD_VBNV_CONTEXT, "VBNV_CONTEXT"}, \ + {EC_CMD_FLASH_SPI_INFO, "FLASH_SPI_INFO"}, \ + {EC_CMD_FLASH_SELECT, "FLASH_SELECT"}, \ + {EC_CMD_PWM_GET_FAN_TARGET_RPM, "PWM_GET_FAN_TARGET_RPM"}, \ + {EC_CMD_PWM_SET_FAN_TARGET_RPM, "PWM_SET_FAN_TARGET_RPM"}, \ + {EC_CMD_PWM_GET_KEYBOARD_BACKLIGHT, "PWM_GET_KEYBOARD_BACKLIGHT"}, \ + {EC_CMD_PWM_SET_KEYBOARD_BACKLIGHT, "PWM_SET_KEYBOARD_BACKLIGHT"}, \ + {EC_CMD_PWM_SET_FAN_DUTY, "PWM_SET_FAN_DUTY"}, \ + {EC_CMD_PWM_SET_DUTY, "PWM_SET_DUTY"}, \ + {EC_CMD_PWM_GET_DUTY, "PWM_GET_DUTY"}, \ + {EC_CMD_LIGHTBAR_CMD, "LIGHTBAR_CMD"}, \ + {EC_CMD_LED_CONTROL, "LED_CONTROL"}, \ + {EC_CMD_VBOOT_HASH, "VBOOT_HASH"}, \ + {EC_CMD_MOTION_SENSE_CMD,
Re: [PATCH v2 2/7] dt: bindings: Add multicolor class dt bindings documention
Jacek On 4/12/19 1:14 PM, Jacek Anaszewski wrote: > Hi Marek, > > On 4/12/19 12:07 AM, Marek Behun wrote: >> Hi Dan, >> this probaly was discussed, but I did not follow brightness model >> discussions: >> what will happen if I set yellow by writing into yellow mode >> brightness, and then orange by writing orange model brightness? >> Will the resulting color be a mix of yellow and orange, or will the >> orange overwrite the yellow setting? > > Orange will overwrite yellow settings. When color name is given > it should be treated as a hue. Then changing brightness level > should affect the lightness of a hue, similarly like changing > L component of HSL color model. This will be however entirely > up to DT brightness-model designer how they will design their models. > We are not going to verify that in the LED multi color class. > > It implies that it will be possible to define arbitrary range > of color levels, not necessarily adhering to any established color > model. I think it could be useful to define brightness model > that allows to go from blue color (for cold) up to red (hot) > for representing a temperature for instance. > > These ideas will need however more documentation. Generally > we aim to propose only a convention. > Ah but what about the issue of writing the monochrome LED color. With your description it implies that when we write the red LED, the red LED will come on and if we write the blue LED then the red LED in theory should turn off and the blue come on. But these could be used to mix the colors to create some abstract violet that is not defined in the brightness model. Why should the brightness models and monochrome LEDs have two different operations. Dan
Re: [PATCH RESEND v6 5/6] i2c: mediatek: Add i2c support for MediaTek MT8183
On 02/04/2019 14:35, Qii Wang wrote: > Add i2c compatible for MT8183. Compare to MT2712 i2c controller, > MT8183 has different register offsets. Ltiming_reg is added to > adjust low width of SCL. Arb clock and dma_sync are needed. > > Signed-off-by: Qii Wang > Reviewed-by: Nicolas Boichat > --- > drivers/i2c/busses/i2c-mt65xx.c | 62 > +-- > 1 file changed, 60 insertions(+), 2 deletions(-) > Reviewed-by: Matthias Brugger > diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c > index 6137ad7..745b0d0 100644 > --- a/drivers/i2c/busses/i2c-mt65xx.c > +++ b/drivers/i2c/busses/i2c-mt65xx.c > @@ -133,6 +133,7 @@ enum I2C_REGS_OFFSET { > OFFSET_DEBUGCTRL, > OFFSET_TRANSFER_LEN_AUX, > OFFSET_CLOCK_DIV, > + OFFSET_LTIMING, > }; > > static const u16 mt_i2c_regs_v1[] = { > @@ -162,6 +163,32 @@ enum I2C_REGS_OFFSET { > [OFFSET_CLOCK_DIV] = 0x70, > }; > > +static const u16 mt_i2c_regs_v2[] = { > + [OFFSET_DATA_PORT] = 0x0, > + [OFFSET_SLAVE_ADDR] = 0x4, > + [OFFSET_INTR_MASK] = 0x8, > + [OFFSET_INTR_STAT] = 0xc, > + [OFFSET_CONTROL] = 0x10, > + [OFFSET_TRANSFER_LEN] = 0x14, > + [OFFSET_TRANSAC_LEN] = 0x18, > + [OFFSET_DELAY_LEN] = 0x1c, > + [OFFSET_TIMING] = 0x20, > + [OFFSET_START] = 0x24, > + [OFFSET_EXT_CONF] = 0x28, > + [OFFSET_LTIMING] = 0x2c, > + [OFFSET_HS] = 0x30, > + [OFFSET_IO_CONFIG] = 0x34, > + [OFFSET_FIFO_ADDR_CLR] = 0x38, > + [OFFSET_TRANSFER_LEN_AUX] = 0x44, > + [OFFSET_CLOCK_DIV] = 0x48, > + [OFFSET_SOFTRESET] = 0x50, > + [OFFSET_DEBUGSTAT] = 0xe0, > + [OFFSET_DEBUGCTRL] = 0xe8, > + [OFFSET_FIFO_STAT] = 0xf4, > + [OFFSET_FIFO_THRESH] = 0xf8, > + [OFFSET_DCM_EN] = 0xf88, > +}; > + > struct mtk_i2c_compatible { > const struct i2c_adapter_quirks *quirks; > const u16 *regs; > @@ -172,6 +199,7 @@ struct mtk_i2c_compatible { > unsigned char support_33bits: 1; > unsigned char timing_adjust: 1; > unsigned char dma_sync: 1; > + unsigned char ltiming_adjust: 1; > }; > > struct mtk_i2c { > @@ -195,6 +223,7 @@ struct mtk_i2c { > enum mtk_trans_op op; > u16 timing_reg; > u16 high_speed_reg; > + u16 ltiming_reg; > unsigned char auto_restart; > bool ignore_restart_irq; > const struct mtk_i2c_compatible *dev_comp; > @@ -222,6 +251,7 @@ struct mtk_i2c { > .support_33bits = 1, > .timing_adjust = 1, > .dma_sync = 0, > + .ltiming_adjust = 0, > }; > > static const struct mtk_i2c_compatible mt6577_compat = { > @@ -234,6 +264,7 @@ struct mtk_i2c { > .support_33bits = 0, > .timing_adjust = 0, > .dma_sync = 0, > + .ltiming_adjust = 0, > }; > > static const struct mtk_i2c_compatible mt6589_compat = { > @@ -246,6 +277,7 @@ struct mtk_i2c { > .support_33bits = 0, > .timing_adjust = 0, > .dma_sync = 0, > + .ltiming_adjust = 0, > }; > > static const struct mtk_i2c_compatible mt7622_compat = { > @@ -258,6 +290,7 @@ struct mtk_i2c { > .support_33bits = 0, > .timing_adjust = 0, > .dma_sync = 0, > + .ltiming_adjust = 0, > }; > > static const struct mtk_i2c_compatible mt8173_compat = { > @@ -269,6 +302,19 @@ struct mtk_i2c { > .support_33bits = 1, > .timing_adjust = 0, > .dma_sync = 0, > + .ltiming_adjust = 0, > +}; > + > +static const struct mtk_i2c_compatible mt8183_compat = { > + .regs = mt_i2c_regs_v2, > + .pmic_i2c = 0, > + .dcm = 0, > + .auto_restart = 1, > + .aux_len_reg = 1, > + .support_33bits = 1, > + .timing_adjust = 1, > + .dma_sync = 1, > + .ltiming_adjust = 1, > }; > > static const struct of_device_id mtk_i2c_of_match[] = { > @@ -277,6 +323,7 @@ struct mtk_i2c { > { .compatible = "mediatek,mt6589-i2c", .data = &mt6589_compat }, > { .compatible = "mediatek,mt7622-i2c", .data = &mt7622_compat }, > { .compatible = "mediatek,mt8173-i2c", .data = &mt8173_compat }, > + { .compatible = "mediatek,mt8183-i2c", .data = &mt8183_compat }, > {} > }; > MODULE_DEVICE_TABLE(of, mtk_i2c_of_match); > @@ -361,6 +408,8 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c) > > mtk_i2c_writew(i2c, i2c->timing_reg, OFFSET_TIMING); > mtk_i2c_writew(i2c, i2c->high_speed_reg, OFFSET_HS); > + if (i2c->dev_comp->ltiming_adjust) > + mtk_i2c_writew(i2c, i2c->ltiming_reg, OFFSET_LTIMING); > > /* If use i2c pin from PMIC mt6397 side, need set PATH_DIR first */ > if (i2c->have_pmic) > @@ -460,6 +509,8 @@ static int mtk_i2c_set_speed(struct mtk_i2c *i2c, > unsigned int parent_clk) > unsigned int clk_src; > unsigned int step_cnt; > unsigned int sample_cnt; > + unsigned int l_step_cnt; > + unsigned int l_sample_cnt; > unsigned int target_speed; > int ret; > > @@ -469,11 +520,11 @@ static int mtk_i2c_set_spe
Re: [PATCH RESEND v6 4/6] i2c: mediatek: Add i2c and apdma sync in i2c driver
On 02/04/2019 14:35, Qii Wang wrote: > When i2c and apdma use different source clocks, we should enable > synchronization between them. > > Signed-off-by: Qii Wang > Reviewed-by: Nicolas Boichat > --- > drivers/i2c/busses/i2c-mt65xx.c | 11 +++ > 1 file changed, 11 insertions(+) > Reviewed-by: Matthias Brugger > diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c > index 1a7235e..6137ad7 100644 > --- a/drivers/i2c/busses/i2c-mt65xx.c > +++ b/drivers/i2c/busses/i2c-mt65xx.c > @@ -77,6 +77,8 @@ > #define I2C_CONTROL_DIR_CHANGE (0x1 << 4) > #define I2C_CONTROL_ACKERR_DET_EN (0x1 << 5) > #define I2C_CONTROL_TRANSFER_LEN_CHANGE (0x1 << 6) > +#define I2C_CONTROL_DMAACK_EN (0x1 << 8) > +#define I2C_CONTROL_ASYNC_MODE (0x1 << 9) > #define I2C_CONTROL_WRAPPER (0x1 << 0) > > #define I2C_DRV_NAME "i2c-mt65xx" > @@ -169,6 +171,7 @@ struct mtk_i2c_compatible { > unsigned char aux_len_reg: 1; > unsigned char support_33bits: 1; > unsigned char timing_adjust: 1; > + unsigned char dma_sync: 1; > }; > > struct mtk_i2c { > @@ -218,6 +221,7 @@ struct mtk_i2c { > .aux_len_reg = 1, > .support_33bits = 1, > .timing_adjust = 1, > + .dma_sync = 0, > }; > > static const struct mtk_i2c_compatible mt6577_compat = { > @@ -229,6 +233,7 @@ struct mtk_i2c { > .aux_len_reg = 0, > .support_33bits = 0, > .timing_adjust = 0, > + .dma_sync = 0, > }; > > static const struct mtk_i2c_compatible mt6589_compat = { > @@ -240,6 +245,7 @@ struct mtk_i2c { > .aux_len_reg = 0, > .support_33bits = 0, > .timing_adjust = 0, > + .dma_sync = 0, > }; > > static const struct mtk_i2c_compatible mt7622_compat = { > @@ -251,6 +257,7 @@ struct mtk_i2c { > .aux_len_reg = 1, > .support_33bits = 0, > .timing_adjust = 0, > + .dma_sync = 0, > }; > > static const struct mtk_i2c_compatible mt8173_compat = { > @@ -261,6 +268,7 @@ struct mtk_i2c { > .aux_len_reg = 1, > .support_33bits = 1, > .timing_adjust = 0, > + .dma_sync = 0, > }; > > static const struct of_device_id mtk_i2c_of_match[] = { > @@ -360,6 +368,9 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c) > > control_reg = I2C_CONTROL_ACKERR_DET_EN | > I2C_CONTROL_CLK_EXT_EN | I2C_CONTROL_DMA_EN; > + if (i2c->dev_comp->dma_sync) > + control_reg |= I2C_CONTROL_DMAACK_EN | I2C_CONTROL_ASYNC_MODE; > + > mtk_i2c_writew(i2c, control_reg, OFFSET_CONTROL); > mtk_i2c_writew(i2c, I2C_DELAY_LEN, OFFSET_DELAY_LEN); > >
Re: [PATCH RESEND v6 3/6] i2c: mediatek: Add arb clock in i2c driver
On 02/04/2019 14:35, Qii Wang wrote: > When two i2c controllers are internally connected to the same > GPIO pins, the arb clock is needed to ensure that the waveforms > do not interfere with each other. And we also need to enable > the interrupt to find arb lost, old i2c controllers also have > the bit. > > Signed-off-by: Qii Wang > Reviewed-by: Nicolas Boichat > --- > drivers/i2c/busses/i2c-mt65xx.c | 25 ++--- > 1 file changed, 22 insertions(+), 3 deletions(-) Looks good: Reviewed-by: Matthias Brugger > > diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c > index be36018..1a7235e 100644 > --- a/drivers/i2c/busses/i2c-mt65xx.c > +++ b/drivers/i2c/busses/i2c-mt65xx.c > @@ -35,6 +35,7 @@ > #include > > #define I2C_RS_TRANSFER (1 << 4) > +#define I2C_ARB_LOST (1 << 3) > #define I2C_HS_NACKERR (1 << 2) > #define I2C_ACKERR (1 << 1) > #define I2C_TRANSAC_COMP (1 << 0) > @@ -181,6 +182,7 @@ struct mtk_i2c { > struct clk *clk_main; /* main clock for i2c bus */ > struct clk *clk_dma;/* DMA clock for i2c via DMA */ > struct clk *clk_pmic; /* PMIC clock for i2c from PMIC */ > + struct clk *clk_arb;/* Arbitrator clock for i2c */ > bool have_pmic; /* can use i2c pins from PMIC */ > bool use_push_pull; /* IO config push-pull mode */ > > @@ -299,8 +301,18 @@ static int mtk_i2c_clock_enable(struct mtk_i2c *i2c) > if (ret) > goto err_pmic; > } > + > + if (i2c->clk_arb) { > + ret = clk_prepare_enable(i2c->clk_arb); > + if (ret) > + goto err_arb; > + } > + > return 0; > > +err_arb: > + if (i2c->have_pmic) > + clk_disable_unprepare(i2c->clk_pmic); > err_pmic: > clk_disable_unprepare(i2c->clk_main); > err_main: > @@ -311,6 +323,9 @@ static int mtk_i2c_clock_enable(struct mtk_i2c *i2c) > > static void mtk_i2c_clock_disable(struct mtk_i2c *i2c) > { > + if (i2c->clk_arb) > + clk_disable_unprepare(i2c->clk_arb); > + > if (i2c->have_pmic) > clk_disable_unprepare(i2c->clk_pmic); > > @@ -519,13 +534,13 @@ static int mtk_i2c_do_transfer(struct mtk_i2c *i2c, > struct i2c_msg *msgs, > > /* Clear interrupt status */ > mtk_i2c_writew(i2c, restart_flag | I2C_HS_NACKERR | I2C_ACKERR | > -I2C_TRANSAC_COMP, OFFSET_INTR_STAT); > + I2C_ARB_LOST | I2C_TRANSAC_COMP, OFFSET_INTR_STAT); > > mtk_i2c_writew(i2c, I2C_FIFO_ADDR_CLR, OFFSET_FIFO_ADDR_CLR); > > /* Enable interrupt */ > mtk_i2c_writew(i2c, restart_flag | I2C_HS_NACKERR | I2C_ACKERR | > -I2C_TRANSAC_COMP, OFFSET_INTR_MASK); > + I2C_ARB_LOST | I2C_TRANSAC_COMP, OFFSET_INTR_MASK); > > /* Set transfer and transaction len */ > if (i2c->op == I2C_MASTER_WRRD) { > @@ -659,7 +674,7 @@ static int mtk_i2c_do_transfer(struct mtk_i2c *i2c, > struct i2c_msg *msgs, > > /* Clear interrupt mask */ > mtk_i2c_writew(i2c, ~(restart_flag | I2C_HS_NACKERR | I2C_ACKERR | > -I2C_TRANSAC_COMP), OFFSET_INTR_MASK); > + I2C_ARB_LOST | I2C_TRANSAC_COMP), OFFSET_INTR_MASK); > > if (i2c->op == I2C_MASTER_WR) { > dma_unmap_single(i2c->dev, wpaddr, > @@ -884,6 +899,10 @@ static int mtk_i2c_probe(struct platform_device *pdev) > return PTR_ERR(i2c->clk_dma); > } > > + i2c->clk_arb = devm_clk_get(&pdev->dev, "arb"); > + if (IS_ERR(i2c->clk_arb)) > + i2c->clk_arb = NULL; > + > clk = i2c->clk_main; > if (i2c->have_pmic) { > i2c->clk_pmic = devm_clk_get(&pdev->dev, "pmic"); >
Re: [PATCH v2] arm64: dts: qcom: Add Lenovo Miix 630
On Fri, Apr 12, 2019 at 12:01 PM Bjorn Andersson wrote: > > On Thu 11 Apr 13:51 PDT 2019, Jeffrey Hugo wrote: > > diff --git a/arch/arm64/boot/dts/qcom/msm8998-clam.dtsi > > b/arch/arm64/boot/dts/qcom/msm8998-clam.dtsi > [..] > > +&tlmm { > > + gpio-reserved-ranges = <0 4>, <81 4>; > > Did you double check that you actually need these? (Iirc boot testing is > enough) Yes, still needed. > > > + > > + touchpad: touchpad { > > + config { > > + pins = "gpio123"; > > + bias-pull-up; /* pull up */ > > + }; > > + }; > > +}; > [..] > > diff --git a/arch/arm64/boot/dts/qcom/msm8998-lenovo-miix-630.dts > > b/arch/arm64/boot/dts/qcom/msm8998-lenovo-miix-630.dts > [..] > > +/ { > > + model = "Lenovo Miix 630"; > > + compatible = "qcom,msm8998-clam"; > > compatible = "lenovo,miix-630", "qcom,msm8998"; Thanks for the suggestion. Will do. > > > + > > + /* bootloader doesn't understand DT, so no qcom,board-id */ > > You can omit this comment, as I don't think people will miss these > non-standard properties. Makes sense. Will do.
[PATCH v4 9/9] clk: fixed-factor: Let clk framework find parent
Convert this driver to a more modern way of specifying parents now that we have a way to specify clk parents by DT index. This lets us nicely avoid a problem where a parent clk name isn't know because the parent clk hasn't been registered yet. Cc: Miquel Raynal Cc: Jerome Brunet Cc: Russell King Cc: Michael Turquette Cc: Jeffrey Hugo Cc: Chen-Yu Tsai Signed-off-by: Stephen Boyd --- drivers/clk/clk-fixed-factor.c | 53 +- 1 file changed, 33 insertions(+), 20 deletions(-) diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c index 241b3f8c61a9..67cc7e515e42 100644 --- a/drivers/clk/clk-fixed-factor.c +++ b/drivers/clk/clk-fixed-factor.c @@ -64,12 +64,14 @@ const struct clk_ops clk_fixed_factor_ops = { }; EXPORT_SYMBOL_GPL(clk_fixed_factor_ops); -struct clk_hw *clk_hw_register_fixed_factor(struct device *dev, - const char *name, const char *parent_name, unsigned long flags, - unsigned int mult, unsigned int div) +static struct clk_hw * +__clk_hw_register_fixed_factor(struct device *dev, struct device_node *np, + const char *name, const char *parent_name, int index, + unsigned long flags, unsigned int mult, unsigned int div) { struct clk_fixed_factor *fix; struct clk_init_data init; + struct clk_parent_data pdata = { .index = index }; struct clk_hw *hw; int ret; @@ -85,11 +87,17 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device *dev, init.name = name; init.ops = &clk_fixed_factor_ops; init.flags = flags | CLK_IS_BASIC; - init.parent_names = &parent_name; + if (parent_name) + init.parent_names = &parent_name; + else + init.parent_data = &pdata; init.num_parents = 1; hw = &fix->hw; - ret = clk_hw_register(dev, hw); + if (dev) + ret = clk_hw_register(dev, hw); + else + ret = of_clk_hw_register(np, hw); if (ret) { kfree(fix); hw = ERR_PTR(ret); @@ -97,6 +105,14 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device *dev, return hw; } + +struct clk_hw *clk_hw_register_fixed_factor(struct device *dev, + const char *name, const char *parent_name, unsigned long flags, + unsigned int mult, unsigned int div) +{ + return __clk_hw_register_fixed_factor(dev, NULL, name, parent_name, -1, flags, + mult, div); +} EXPORT_SYMBOL_GPL(clk_hw_register_fixed_factor); struct clk *clk_register_fixed_factor(struct device *dev, const char *name, @@ -143,11 +159,10 @@ static const struct of_device_id set_rate_parent_matches[] = { { /* Sentinel */ }, }; -static struct clk *_of_fixed_factor_clk_setup(struct device_node *node) +static struct clk_hw *_of_fixed_factor_clk_setup(struct device_node *node) { - struct clk *clk; + struct clk_hw *hw; const char *clk_name = node->name; - const char *parent_name; unsigned long flags = 0; u32 div, mult; int ret; @@ -165,30 +180,28 @@ static struct clk *_of_fixed_factor_clk_setup(struct device_node *node) } of_property_read_string(node, "clock-output-names", &clk_name); - parent_name = of_clk_get_parent_name(node, 0); if (of_match_node(set_rate_parent_matches, node)) flags |= CLK_SET_RATE_PARENT; - clk = clk_register_fixed_factor(NULL, clk_name, parent_name, flags, - mult, div); - if (IS_ERR(clk)) { + hw = __clk_hw_register_fixed_factor(NULL, node, clk_name, NULL, 0, + flags, mult, div); + if (IS_ERR(hw)) { /* -* If parent clock is not registered, registration would fail. * Clear OF_POPULATED flag so that clock registration can be * attempted again from probe function. */ of_node_clear_flag(node, OF_POPULATED); - return clk; + return ERR_CAST(hw); } - ret = of_clk_add_provider(node, of_clk_src_simple_get, clk); + ret = of_clk_add_hw_provider(node, of_clk_hw_simple_get, hw); if (ret) { - clk_unregister(clk); + clk_hw_unregister_fixed_factor(hw); return ERR_PTR(ret); } - return clk; + return hw; } /** @@ -203,17 +216,17 @@ CLK_OF_DECLARE(fixed_factor_clk, "fixed-factor-clock", static int of_fixed_factor_clk_remove(struct platform_device *pdev) { - struct clk *clk = platform_get_drvdata(pdev); + struct clk_hw *clk = platform_get_drvdata(pdev); of_clk_del_provider(pdev->dev.of_node); - clk_unregister_fixed_factor(clk); + clk_hw_unregister_fixed_factor(clk); return 0; } st
[PATCH v4 0/9] Rewrite clk parent handling
There are a couple warts with clk parent handling in the common clk framework. This patch series addresses one of those warts, parent-child linkages. We use strings for all parent-child linkages, and this leads to poorly written code that extracts clk names from struct clk pointers and makes clk provider drivers use clk consumer APIs. I've converted the fixed factor clk and I'm looking at converting users of other "basic" clk types in follow-up patches so we can start moving drivers away from string matching in the clk namespace. Changes from v3: * Split clkdev patch into two * New patch to let dev_of_node() accept NULL * Use dev_of_node() in of_clk_hw_register() to avoid NULL deref Changes from v2: * Dropped patches that got merged into v5.1-rc1 * Introduced a new function of_clk_hw_register() * Introduced 'index' into clk_parent_data structure * Converted fixed-factor basic type to new design * Fixed some bugs in a recent patch to clkdev, leading to an essential API for clkdev based lookups working Changes from v1: * Dropped new get_parent_hw, we'll pick it up in a later series * Rebased to v5.0-rc6 to avoid conflicts with clk-fixes * Renamed 'fallback' to 'name' and 'name' to 'fw_name' in parent map * Made sure that clk_hw_get_parent_by_index() never sees an error pointer so that we don't mistakenly try to defer an error pointer * Fixed index passing mistake on of_clk_get_hw_from_clkspec() * Copy over all the data from parent maps and free it correctly too Cc: Chen-Yu Tsai Cc: Greg Kroah-Hartman Cc: Jeffrey Hugo Cc: Jerome Brunet Cc: Matti Vaittinen Cc: Michael Turquette Cc: Miquel Raynal Cc: Rob Herring Cc: Russell King Stephen Boyd (9): clkdev: Hold clocks_mutex while iterating clocks list clkdev: Move clk creation outside of 'clocks_mutex' clk: Prepare for clk registration API that uses DT nodes driver core: Let dev_of_node() accept a NULL dev clk: Add of_clk_hw_register() API for early clk drivers clk: Allow parents to be specified without string names clk: Look for parents with clkdev based clk_lookups clk: Allow parents to be specified via clkspec index clk: fixed-factor: Let clk framework find parent drivers/clk/clk-fixed-factor.c | 53 -- drivers/clk/clk.c | 326 + drivers/clk/clk.h | 2 + drivers/clk/clkdev.c | 30 +-- include/linux/clk-provider.h | 22 +++ include/linux/device.h | 2 +- 6 files changed, 327 insertions(+), 108 deletions(-) base-commit: 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b -- Sent by a computer through tubes
[PATCH v4 5/9] clk: Add of_clk_hw_register() API for early clk drivers
In some circumstances drivers register clks early and don't have access to a struct device because the device model isn't initialized yet. Add an API to let drivers register clks associated with a struct device_node so that these drivers can participate in getting parent clks through DT. Cc: Miquel Raynal Cc: Jerome Brunet Cc: Russell King Cc: Michael Turquette Cc: Jeffrey Hugo Cc: Chen-Yu Tsai Cc: Rob Herring Signed-off-by: Stephen Boyd --- drivers/clk/clk.c| 26 +++--- include/linux/clk-provider.h | 1 + 2 files changed, 24 insertions(+), 3 deletions(-) diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index d27775a73e67..ffa63ddcd408 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -45,6 +45,7 @@ struct clk_core { struct clk_hw *hw; struct module *owner; struct device *dev; + struct device_node *of_node; struct clk_core *parent; const char **parent_names; struct clk_core **parents; @@ -3313,7 +3314,8 @@ struct clk *clk_hw_create_clk(struct device *dev, struct clk_hw *hw, return clk; } -static struct clk *__clk_register(struct device *dev, struct clk_hw *hw) +static struct clk * +__clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw) { int i, ret; struct clk_core *core; @@ -3339,6 +3341,7 @@ static struct clk *__clk_register(struct device *dev, struct clk_hw *hw) if (dev && pm_runtime_enabled(dev)) core->rpm_enabled = true; core->dev = dev; + core->of_node = np; if (dev && dev->driver) core->owner = dev->driver->owner; core->hw = hw; @@ -3429,7 +3432,7 @@ static struct clk *__clk_register(struct device *dev, struct clk_hw *hw) */ struct clk *clk_register(struct device *dev, struct clk_hw *hw) { - return __clk_register(dev, hw); + return __clk_register(dev, dev_of_node(dev), hw); } EXPORT_SYMBOL_GPL(clk_register); @@ -3445,10 +3448,27 @@ EXPORT_SYMBOL_GPL(clk_register); */ int clk_hw_register(struct device *dev, struct clk_hw *hw) { - return PTR_ERR_OR_ZERO(__clk_register(dev, hw)); + return PTR_ERR_OR_ZERO(__clk_register(dev, dev_of_node(dev), hw)); } EXPORT_SYMBOL_GPL(clk_hw_register); +/* + * of_clk_hw_register - register a clk_hw and return an error code + * @node: device_node of device that is registering this clock + * @hw: link to hardware-specific clock data + * + * of_clk_hw_register() is the primary interface for populating the clock tree + * with new clock nodes when a struct device is not available, but a struct + * device_node is. It returns an integer equal to zero indicating success or + * less than zero indicating failure. Drivers must test for an error code after + * calling of_clk_hw_register(). + */ +int of_clk_hw_register(struct device_node *node, struct clk_hw *hw) +{ + return PTR_ERR_OR_ZERO(__clk_register(NULL, node, hw)); +} +EXPORT_SYMBOL_GPL(of_clk_hw_register); + /* Free memory allocated for a clock. */ static void __clk_release(struct kref *ref) { diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index b7cf80a71293..7d2d97e15b76 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -773,6 +773,7 @@ struct clk *devm_clk_register(struct device *dev, struct clk_hw *hw); int __must_check clk_hw_register(struct device *dev, struct clk_hw *hw); int __must_check devm_clk_hw_register(struct device *dev, struct clk_hw *hw); +int __must_check of_clk_hw_register(struct device_node *node, struct clk_hw *hw); void clk_unregister(struct clk *clk); void devm_clk_unregister(struct device *dev, struct clk *clk); -- Sent by a computer through tubes
[PATCH v4 8/9] clk: Allow parents to be specified via clkspec index
Some clk providers are simple DT nodes that only have a 'clocks' property without having an associated 'clock-names' property. In these cases, we want to let these clk providers point to their parent clks without having to dereference the 'clocks' property at probe time to figure out the parent's globally unique clk name. Let's add an 'index' property to the parent_data structure so that clk providers can indicate that their parent is a particular index in the 'clocks' DT property. Cc: Miquel Raynal Cc: Jerome Brunet Cc: Russell King Cc: Michael Turquette Cc: Jeffrey Hugo Cc: Chen-Yu Tsai Signed-off-by: Stephen Boyd --- drivers/clk/clk.c| 18 +++--- include/linux/clk-provider.h | 2 ++ 2 files changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 42f29fd6bfd8..65fe50b139ea 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -44,6 +44,7 @@ struct clk_parent_map { struct clk_core *core; const char *fw_name; const char *name; + int index; }; struct clk_core { @@ -326,7 +327,8 @@ static struct clk_core *clk_core_lookup(const char *name) /** * clk_core_get - Find the clk_core parent of a clk * @core: clk to find parent of - * @name: name to search for + * @name: name to search for (if string based) + * @index: index to use for search (if DT index based) * * This is the preferred method for clk providers to find the parent of a * clk when that parent is external to the clk controller. The parent_names @@ -358,22 +360,23 @@ static struct clk_core *clk_core_lookup(const char *name) * provider knows about the clk but it isn't provided on this system. * A valid clk_core pointer when the clk can be found in the provider. */ -static struct clk_core *clk_core_get(struct clk_core *core, const char *name) +static struct clk_core *clk_core_get(struct clk_core *core, const char *name, +int index) { struct clk_hw *hw = ERR_PTR(-ENOENT); struct device *dev = core->dev; const char *dev_id = dev ? dev_name(dev) : NULL; struct device_node *np = core->of_node; - if (np) - hw = of_clk_get_hw(np, -1, name); + if (np && index >= 0) + hw = of_clk_get_hw(np, index, name); /* * If the DT search above couldn't find the provider or the provider * didn't know about this clk, fallback to looking up via clkdev based * clk_lookups */ - if (PTR_ERR(hw) == -ENOENT) + if (PTR_ERR(hw) == -ENOENT && name) hw = clk_find_hw(dev_id, name); if (IS_ERR(hw)) @@ -397,8 +400,7 @@ static void clk_core_fill_parent_index(struct clk_core *core, u8 index) if (!parent) parent = ERR_PTR(-EPROBE_DEFER); } else { - if (entry->fw_name) - parent = clk_core_get(core, entry->fw_name); + parent = clk_core_get(core, entry->fw_name, entry->index); if (IS_ERR(parent) && PTR_ERR(parent) == -ENOENT) parent = clk_core_lookup(entry->name); } @@ -3443,6 +3445,7 @@ static int clk_core_populate_parent_map(struct clk_core *core) /* Copy everything over because it might be __initdata */ for (i = 0, parent = parents; i < num_parents; i++, parent++) { + parent->index = -1; if (parent_names) { /* throw a WARN if any entries are NULL */ WARN(!parent_names[i], @@ -3452,6 +3455,7 @@ static int clk_core_populate_parent_map(struct clk_core *core) true); } else if (parent_data) { parent->hw = parent_data[i].hw; + parent->index = parent_data[i].index; ret = clk_cpy_name(&parent->fw_name, parent_data[i].fw_name, false); if (!ret) diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 95d5a9d4e8c3..12fef24b740f 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -255,11 +255,13 @@ struct clk_ops { * @hw: parent clk_hw pointer (used for clk providers with internal clks) * @fw_name: parent name local to provider registering clk * @name: globally unique parent name (used as a fallback) + * @index: parent index local to provider registering clk (if @fw_name absent) */ struct clk_parent_data { const struct clk_hw *hw; const char *fw_name; const char *name; + int index; }; /** -- Sent by a computer through tubes
[PATCH v4 1/9] clkdev: Hold clocks_mutex while iterating clocks list
We recently introduced a change to support devm clk lookups. That change introduced a code-path that used clk_find() without holding the 'clocks_mutex'. Unfortunately, clk_find() iterates over the 'clocks' list and so we need to prevent the list from being modified at the same time. Do this by holding the mutex and checking to make sure it's held while iterating the list. Note, we don't really care if the lookup is freed after we find it with clk_find() because we're just doing a pointer comparison, but if we did care we would need to keep holding the mutex while we dereference the clk_lookup pointer. Fixes: 3eee6c7d119c ("clkdev: add managed clkdev lookup registration") Cc: Miquel Raynal Cc: Jerome Brunet Cc: Russell King Cc: Michael Turquette Cc: Jeffrey Hugo Cc: Chen-Yu Tsai Cc: Matti Vaittinen Signed-off-by: Stephen Boyd --- I plan to take this through clk-fixes for v5.1-rc series. drivers/clk/clkdev.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c index 8c4435c53f09..6e787cc9e5b9 100644 --- a/drivers/clk/clkdev.c +++ b/drivers/clk/clkdev.c @@ -46,6 +46,8 @@ static struct clk_lookup *clk_find(const char *dev_id, const char *con_id) if (con_id) best_possible += 1; + lockdep_assert_held(&clocks_mutex); + list_for_each_entry(p, &clocks, node) { match = 0; if (p->dev_id) { @@ -402,7 +404,10 @@ void devm_clk_release_clkdev(struct device *dev, const char *con_id, struct clk_lookup *cl; int rval; + mutex_lock(&clocks_mutex); cl = clk_find(dev_id, con_id); + mutex_unlock(&clocks_mutex); + WARN_ON(!cl); rval = devres_release(dev, devm_clkdev_release, devm_clk_match_clkdev, cl); -- Sent by a computer through tubes
[PATCH v4 6/9] clk: Allow parents to be specified without string names
The common clk framework is lacking in ability to describe the clk topology without specifying strings for every possible parent-child link. There are a few drawbacks to the current approach: 1) String comparisons are used for everything, including describing topologies that are 'local' to a single clock controller. 2) clk providers (e.g. i2c clk drivers) need to create globally unique clk names to avoid collisions in the clk namespace, leading to awkward name generation code in various clk drivers. 3) DT bindings may not fully describe the clk topology and linkages between clk controllers because drivers can easily rely on globally unique strings to describe connections between clks. This leads to confusing DT bindings, complicated clk name generation code, and inefficient string comparisons during clk registration just so that the clk framework can detect the topology of the clk tree. Furthermore, some drivers call clk_get() and then __clk_get_name() to extract the globally unique clk name just so they can specify the parent of the clk they're registering. We have of_clk_parent_fill() but that mostly only works for single clks registered from a DT node, which isn't the norm. Let's simplify this all by introducing two new ways of specifying clk parents. The first method is an array of pointers to clk_hw structures corresponding to the parents at that index. This works for clks that are registered when we have access to all the clk_hw pointers for the parents. The second method is a mix of clk_hw pointers and strings of local and global parent clk names. If the .fw_name member of the map is set we'll look for that clk by performing a DT based lookup of the device the clk is registered with and the .name specified in the map. If that fails, we'll fallback to the .name member and perform a global clk name lookup like we've always done before. Using either one of these new methods is entirely optional. Existing drivers will continue to work, and they can migrate to this new approach as they see fit. Eventually, we'll want to get rid of the 'parent_names' array in struct clk_init_data and use one of these new methods instead. Cc: Miquel Raynal Cc: Jerome Brunet Cc: Russell King Cc: Michael Turquette Cc: Jeffrey Hugo Cc: Chen-Yu Tsai Cc: Rob Herring Signed-off-by: Stephen Boyd --- drivers/clk/clk.c| 262 ++- include/linux/clk-provider.h | 19 +++ 2 files changed, 219 insertions(+), 62 deletions(-) diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index ffa63ddcd408..de49a7c4214b 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -39,6 +39,13 @@ static LIST_HEAD(clk_notifier_list); /***private data structures***/ +struct clk_parent_map { + const struct clk_hw *hw; + struct clk_core *core; + const char *fw_name; + const char *name; +}; + struct clk_core { const char *name; const struct clk_ops*ops; @@ -47,8 +54,7 @@ struct clk_core { struct device *dev; struct device_node *of_node; struct clk_core *parent; - const char **parent_names; - struct clk_core **parents; + struct clk_parent_map *parents; u8 num_parents; u8 new_parent_index; unsigned long rate; @@ -317,17 +323,92 @@ static struct clk_core *clk_core_lookup(const char *name) return NULL; } +/** + * clk_core_get - Find the parent of a clk using a clock specifier in DT + * @core: clk to find parent of + * @name: name to search for in 'clock-names' of device providing clk + * + * This is the preferred method for clk providers to find the parent of a + * clk when that parent is external to the clk controller. The parent_names + * array is indexed and treated as a local name matching a string in the device + * node's 'clock-names' property. This allows clk providers to use their own + * namespace instead of looking for a globally unique parent string. + * + * For example the following DT snippet would allow a clock registered by the + * clock-controller@c001 that has a clk_init_data::parent_data array + * with 'xtal' in the 'name' member to find the clock provided by the + * clock-controller@f00abcd without needing to get the globally unique name of + * the xtal clk. + * + * parent: clock-controller@f00abcd { + * reg = <0xf00abcd 0xabcd>; + * #clock-cells = <0>; + * }; + * + * clock-controller@c001 { + * reg = <0xc001 0xf00d>; + * clocks = <&parent>; + * clock-names = "xtal"; + * #clock-cells = <1>; + * }; + * + * Returns: -ENOENT when the provider can't be found or the clk doesn't + * exist in the provider. -EINVAL when the name can't be found. NULL when the + * provider knows about the clk but it isn't provided on thi
[PATCH v4 4/9] driver core: Let dev_of_node() accept a NULL dev
We'd like to chain this in places where the 'dev' argument might be NULL. Let this function take a NULL 'dev' so this can work. Cc: Miquel Raynal Cc: Jerome Brunet Cc: Russell King Cc: Michael Turquette Cc: Jeffrey Hugo Cc: Chen-Yu Tsai Cc: Greg Kroah-Hartman Cc: Rob Herring Signed-off-by: Stephen Boyd --- Please ack/review so I can take this through clk tree. include/linux/device.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/device.h b/include/linux/device.h index b425a7ee04ce..0370dd0b3ae7 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1231,7 +1231,7 @@ static inline void device_lock_assert(struct device *dev) static inline struct device_node *dev_of_node(struct device *dev) { - if (!IS_ENABLED(CONFIG_OF)) + if (!IS_ENABLED(CONFIG_OF) || !dev) return NULL; return dev->of_node; } -- Sent by a computer through tubes
[PATCH v4 3/9] clk: Prepare for clk registration API that uses DT nodes
Split out the body of the clk_register() function so it can be shared between the different types of registration APIs (DT, device). Cc: Miquel Raynal Cc: Jerome Brunet Cc: Russell King Cc: Michael Turquette Cc: Jeffrey Hugo Cc: Chen-Yu Tsai Cc: Rob Herring Signed-off-by: Stephen Boyd --- drivers/clk/clk.c | 31 ++- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 96053a96fe2f..d27775a73e67 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -3313,18 +3313,7 @@ struct clk *clk_hw_create_clk(struct device *dev, struct clk_hw *hw, return clk; } -/** - * clk_register - allocate a new clock, register it and return an opaque cookie - * @dev: device that is registering this clock - * @hw: link to hardware-specific clock data - * - * clk_register is the primary interface for populating the clock tree with new - * clock nodes. It returns a pointer to the newly allocated struct clk which - * cannot be dereferenced by driver code but may be used in conjunction with the - * rest of the clock API. In the event of an error clk_register will return an - * error code; drivers must test for an error code after calling clk_register. - */ -struct clk *clk_register(struct device *dev, struct clk_hw *hw) +static struct clk *__clk_register(struct device *dev, struct clk_hw *hw) { int i, ret; struct clk_core *core; @@ -3426,6 +3415,22 @@ struct clk *clk_register(struct device *dev, struct clk_hw *hw) fail_out: return ERR_PTR(ret); } + +/** + * clk_register - allocate a new clock, register it and return an opaque cookie + * @dev: device that is registering this clock + * @hw: link to hardware-specific clock data + * + * clk_register is the primary interface for populating the clock tree with new + * clock nodes. It returns a pointer to the newly allocated struct clk which + * cannot be dereferenced by driver code but may be used in conjunction with the + * rest of the clock API. In the event of an error clk_register will return an + * error code; drivers must test for an error code after calling clk_register. + */ +struct clk *clk_register(struct device *dev, struct clk_hw *hw) +{ + return __clk_register(dev, hw); +} EXPORT_SYMBOL_GPL(clk_register); /** @@ -3440,7 +3445,7 @@ EXPORT_SYMBOL_GPL(clk_register); */ int clk_hw_register(struct device *dev, struct clk_hw *hw) { - return PTR_ERR_OR_ZERO(clk_register(dev, hw)); + return PTR_ERR_OR_ZERO(__clk_register(dev, hw)); } EXPORT_SYMBOL_GPL(clk_hw_register); -- Sent by a computer through tubes