regression (bisected): "modprobe parport_pc" hangs in current mainline
Hello,

I encountered a regression in the current (post-5.0) mainline kernel which I bisected to commit 1aec4211204d ("parport: daisy: use new parport device model"). Running "modprobe parport_pc" hangs:

tweed:~ # ps ax | grep modprobe
 1206 pts/0    D+     0:00 modprobe parport_pc
 1209 ?        S      0:00 /sbin/modprobe -q -- parport_lowlevel
 1211 pts/1    S+     0:00 grep modprobe
tweed:~ # cat /proc/1206/stack
[<0>] call_usermodehelper_exec+0xc7/0x140
[<0>] __request_module+0x1a1/0x430
[<0>] __parport_register_driver+0x142/0x150 [parport]
[<0>] parport_bus_init+0x1d/0x30 [parport]
[<0>] parport_default_proc_register+0x28/0x1000 [parport]
[<0>] do_one_initcall+0x46/0x1cd
[<0>] do_init_module+0x5b/0x20d
[<0>] load_module+0x1b3d/0x20f0
[<0>] __do_sys_finit_module+0xbd/0xe0
[<0>] do_syscall_64+0x60/0x120
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0x
tweed:~ # cat /proc/1209/stack
[<0>] load_module+0xe6a/0x20f0
[<0>] __do_sys_finit_module+0xbd/0xe0
[<0>] do_syscall_64+0x60/0x120
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0x

call_usermodehelper_exec+0xc7/0x140 is (in a build from commit 1aec4211204d) line 583 in kernel/umh.c:

	retval = wait_for_completion_killable(&done);

and load_module+0xe6a/0x20f0 is in add_unformed_module(), line 3577 in kernel/module.c:

	err = wait_event_interruptible(module_wq, finished_loading(mod->name));

Unfortunately I don't have a version of crash able to deal with kernels as new as these, so I wasn't able to find out more for now. I have seen this both on real hardware and in a VM.

Michal Kubecek
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Tue, Mar 12, 2019 at 10:11 PM Tetsuo Handa wrote:
>
> On 2019/03/13 2:15, Dmitry Vyukov wrote:
> >> Also, this bisection is finding multiple different crash patterns, which
> >> suggests that the crashed tests are not giving correct feedback to syzbot.
> >
> > Treating different crashes as just "crash" is intended. Kernel bugs
> > can manifest in very different ways.
> > Want fun, search for "bpf: sockhash, disallow bpf_tcp_close and update
> > in parallel" in https://syzkaller.appspot.com/?fixed=upstream
> > It led to 50+ different failure modes.
>
> But syzbot already found a rather simple C reproducer
> ( https://syzkaller.appspot.com/text?tag=ReproC&x=116fc7a8c0 ) for this
> bug.
> Was this reproducer used for bisection?

The C reproducer used for bisection is provided as "C reproducer" in the
bisection report.

> I guess that if this reproducer was used,
> syzbot did not hit "WARNING: ODEBUG bug in netdev_freemem" cases.

Maybe. But we won't have more than 1 in future. Currently syzbot bisects
over a backlog of crashes, some of which accumulated multiple reproducers
over weeks/months/years. When it bisects newly reported bugs as they are
found, there will be only 1 reproducer. E.g. these two for this bug were
found within a month.

> Also, humans can sometimes find simpler C reproducers from syzbot
> provided reproducers. It would be nice if syzbot could accept and use a
> user defined C reproducer for testing.

It would be more useful to accept patches from these people that make
syzkaller create better reproducers. Manual work is not scalable. We
would need 10 reproducers per day for a dozen OSes (incl. some private
kernels/branches). Anybody is free to run syzkaller manually and do full
manual (perfect) reporting, but for us it became clear very early that
it won't work. Then see above: while that human is sleeping/on
weekend/vacation, syzbot will already have bisected its own reproducer.
Adding a manual reproducer later won't help in any way.
syzkaller already does lots of smart work for reproducers. Let's not give up on the last mile and switch back to all manual work.
Re: WARNING: bad usercopy in fanotify_read
On Wed, Mar 13, 2019 at 8:26 AM Kees Cook wrote:
>
> On Mon, Mar 11, 2019 at 1:42 PM syzbot wrote:
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17ee410b20
> > [...]
> > [ cut here ]
> > Bad or missing usercopy whitelist? Kernel memory exposure attempt detected
> > from SLAB object 'fanotify_event' (offset 40, size 8)!
> > [...]
> > copy_to_user include/linux/uaccess.h:151 [inline]
> > copy_fid_to_user fs/notify/fanotify/fanotify_user.c:236 [inline]
> > copy_event_to_user fs/notify/fanotify/fanotify_user.c:294 [inline]
>
> Looks like this is the fh/ext_fh union in struct fanotify_fid, field
> "fid" in struct fanotify_event. Given that "fid" is itself in a union
> against a struct path, I think instead of a whitelist using
> KMEM_CACHE_USERCOPY(), this should just use a bounce buffer to avoid
> leaving a whitelist open for path or ext_fh exposure.
>
> Maybe something like this (untested):

I tested. Patch is fine by me with a minor nit. You may add:

Reviewed-by: Amir Goldstein

> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 56992b32c6bb..b87da9580b3c 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -207,6 +207,7 @@ static int process_access_response(struct fsnotify_group *group,
>  static int copy_fid_to_user(struct fanotify_event *event, char __user *buf)
>  {
>  	struct fanotify_event_info_fid info = { };
> +	unsigned char bounce[FANOTIFY_INLINE_FH_LEN], *fh;
>  	struct file_handle handle = { };
>  	size_t fh_len = event->fh_len;
>  	size_t len = fanotify_event_info_len(event);
> @@ -233,7 +234,18 @@ static int copy_fid_to_user(struct fanotify_event *event, char __user *buf)
>
>  	buf += sizeof(handle);
>  	len -= sizeof(handle);
> -	if (copy_to_user(buf, fanotify_event_fh(event), fh_len))
> +
> +	/*
> +	 * For an inline fh, copy through stack to exclude the copy from
> +	 * usercopy hardening protections.
> +	 */
> +	fh = fanotify_event_fh(event);
> +	if (fh_len <= sizeof(bounce)) {

Prefer <= FANOTIFY_INLINE_FH_LEN

> +		memcpy(bounce, fh, fh_len);
> +		fh = bounce;
> +	}
> +
> +	if (copy_to_user(buf, fh, fh_len))
>  		return -EFAULT;
>
>  	/* Pad with 0's */
>
> --
> Kees Cook
Re: [PATCH v3 2/2] dt-bindings: net: bluetooth: Add device tree bindings for QTI chip wcn3998
Hi Matthias,

On 2019-03-12 22:29, Matthias Kaehlcke wrote:
> +DT folks
>
> Please add them in future versions (scripts/get_maintainer.pl should
> have listed them)

[Harish] -- Will add them in the new version of the patches.

> On Tue, Mar 12, 2019 at 05:52:59PM +0530, Harish Bandi wrote:
> > This patch enables regulators for the Qualcomm Bluetooth wcn3998
> > controller.
>
> No, it doesn't. The next version should probably say something like
> "Add compatible string for the Qualcomm WCN3998 Bluetooth controller."

[Harish] -- From the next patch onwards I will add all patch version
changes and a proper description.

> Is there any particular reason why QCA driver folks use 'wcn' instead
> of 'WCN'? The QCA documentation calls it WCN399x, so I'd suggest to
> consistently use the uppercase name in comments and documentation (and
> log messages?).

[Harish] -- I think in DT we need to use lower case like wcn; I think
that is the reason it started being used in code, comments and DT
documentation.

> > Signed-off-by: Harish Bandi
> > ---
> > changes in v3:
> > - updated to latest code base.
>
> This comment is useless, please describe what changed wrt the previous
> version.

[Harish] -- Details were added in v2, and v3 was uploaded just to rebase
on the tip of bluetooth-next for better understanding of the code in
review. From the next patch onwards I will add all patch version changes
and a proper description.

> > ---
> >  .../devicetree/bindings/net/qualcomm-bluetooth.txt | 15 +++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
> > index 824c0e2..1221535 100644
> > --- a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
> > +++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
> > @@ -53,3 +53,18 @@ serial@898000 {
> > 		max-speed = <320>;
> > 	};
> >  };
> > +
> > +&blsp1_uart3 {
> > +	pinctrl-names = "default";
> > +	pinctrl-0 = <&blsp1_uart3_default>;
> > +	status = "okay";
> > +
> > +	bluetooth: wcn3998-bt {
> > +		compatible = "qcom,wcn3998-bt";
> > +		vddio-supply = <&vreg_l6_1p8>;
> > +		vddxo-supply = <&vreg_l5_1p8>;
> > +		vddrf-supply = <&vreg_s5_1p35>;
> > +		vddch0-supply = <&vdd_ch0_3p3>;
> > +		max-speed = <320>;
> > +	};
> > +};
> > \ No newline at end of file
>
> I think the example isn't really needed since it's essentially the same
> as the one for 'qcom,wcn3990-bt'. But the important part is missing:
> add the new compatible string under 'Required properties'.
>
> You also want to update the documentation that mentions
> 'qcom,wcn3990-bt' to 'qcom,wcn399x-bt' (assuming for now that other
> possible WCN399x chips would be similar).

[Harish] -- Will check the DT properties and documentation and update
accordingly in a new patch.

> You mentioned in an earlier version of the series that there are
> multiple WCN3998 variants with different requirements for
> voltage/current. This seems to suggest that multiple compatible
> strings are needed to distinguish between them.

[Harish] -- For now we want to add WCN3998 support only. What I meant
to say in my earlier explanation is that WCN3990 is the base variant,
and on top of that we have variants like WCN3990, WCN3998 and
WCN3998-0, WCN3998-1, like that.

> So I think wcn399x would make sense for this series.
>
> Thanks
>
> Matthias

Thanks,
Harish
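What the review asks for — listing the new compatible under 'Required properties' instead of duplicating the example node — might look roughly like this (a sketch only; the exact wording and final string set are for the maintainers to settle, and the binding was still under discussion):

```
Required properties:
 - compatible: should contain one of the following:
    * "qcom,qca6174-bt"
    * "qcom,wcn3990-bt"
    * "qcom,wcn3998-bt"
```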
Re: WARNING: bad usercopy in fanotify_read
On Mon, Mar 11, 2019 at 1:42 PM syzbot wrote:
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17ee410b20
> [...]
> [ cut here ]
> Bad or missing usercopy whitelist? Kernel memory exposure attempt detected
> from SLAB object 'fanotify_event' (offset 40, size 8)!
> [...]
> copy_to_user include/linux/uaccess.h:151 [inline]
> copy_fid_to_user fs/notify/fanotify/fanotify_user.c:236 [inline]
> copy_event_to_user fs/notify/fanotify/fanotify_user.c:294 [inline]

Looks like this is the fh/ext_fh union in struct fanotify_fid, field
"fid" in struct fanotify_event. Given that "fid" is itself in a union
against a struct path, I think instead of a whitelist using
KMEM_CACHE_USERCOPY(), this should just use a bounce buffer to avoid
leaving a whitelist open for path or ext_fh exposure.

Maybe something like this (untested):

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 56992b32c6bb..b87da9580b3c 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -207,6 +207,7 @@ static int process_access_response(struct fsnotify_group *group,
 static int copy_fid_to_user(struct fanotify_event *event, char __user *buf)
 {
 	struct fanotify_event_info_fid info = { };
+	unsigned char bounce[FANOTIFY_INLINE_FH_LEN], *fh;
 	struct file_handle handle = { };
 	size_t fh_len = event->fh_len;
 	size_t len = fanotify_event_info_len(event);
@@ -233,7 +234,18 @@ static int copy_fid_to_user(struct fanotify_event *event, char __user *buf)
 
 	buf += sizeof(handle);
 	len -= sizeof(handle);
-	if (copy_to_user(buf, fanotify_event_fh(event), fh_len))
+
+	/*
+	 * For an inline fh, copy through stack to exclude the copy from
+	 * usercopy hardening protections.
+	 */
+	fh = fanotify_event_fh(event);
+	if (fh_len <= sizeof(bounce)) {
+		memcpy(bounce, fh, fh_len);
+		fh = bounce;
+	}
+
+	if (copy_to_user(buf, fh, fh_len))
 		return -EFAULT;
 
 	/* Pad with 0's */

-- 
Kees Cook
Re: [PATCH v3 1/2] Bluetooth: hci_qca: Added support for wcn3998
Hi Matthias,

On 2019-03-12 21:59, Matthias Kaehlcke wrote:
> Hi Harish,
>
> On Tue, Mar 12, 2019 at 05:52:58PM +0530, Harish Bandi wrote:
> > Added new compatible for wcn3998 and corresponding voltage and
> > current values to wcn3998 compatible. Changed driver code to
> > support wcn3998.
> >
> > Signed-off-by: Harish Bandi
> > ---
> > changes in v3:
> > - updated to latest code base.
>
> This is not useful, for future versions please describe what changed
> (e.g. 'specify regulator constraints in the driver instead of the DT')

[Harish] -- Details were added in v2, and v3 was uploaded just to rebase
on the tip of bluetooth-next for better understanding of the code in
review. From the next patch onwards I will add all patch version changes
and a proper description.

> > ---
> >  drivers/bluetooth/btqca.c   |  4 ++--
> >  drivers/bluetooth/btqca.h   |  3 ++-
> >  drivers/bluetooth/hci_qca.c | 40 ++--
> >  3 files changed, 30 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
> > index 6122685..70cab13 100644
> > --- a/drivers/bluetooth/btqca.c
> > +++ b/drivers/bluetooth/btqca.c
> > @@ -344,7 +344,7 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate,
> >
> >  	/* Download rampatch file */
> >  	config.type = TLV_TYPE_PATCH;
> > -	if (soc_type == QCA_WCN3990) {
> > +	if (soc_type >= QCA_WCN3990) {
>
> That works, but isn't super-clear and might need to be adapted when
> future non-WCN399x controllers are added. Some possible alternatives:
>
> - is_wcn399x(soc_type)
> - have a family (Rome, Cherokee (IIRC this name was used for WCN3990))
>   and a chip id (QCA6174, WCN3990, WCN3998, ...)

[Harish] -- Will change it like is_wcn399x(soc_type) and come up with a
new patch.

> >  		/* Firmware files to download are based on ROM version.
> >  		 * ROM version is derived from last two bytes of soc_ver.
> >  		 */
> > @@ -365,7 +365,7 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate,
> >
> >  	/* Download NVM configuration */
> >  	config.type = TLV_TYPE_NVM;
> > -	if (soc_type == QCA_WCN3990)
> > +	if (soc_type >= QCA_WCN3990)
> >  		snprintf(config.fwname, sizeof(config.fwname),
> >  			 "qca/crnv%02x.bin", rom_ver);
> >  	else
> >
> > diff --git a/drivers/bluetooth/btqca.h b/drivers/bluetooth/btqca.h
> > index c72c56e..f03d96e 100644
> > --- a/drivers/bluetooth/btqca.h
> > +++ b/drivers/bluetooth/btqca.h
> > @@ -132,7 +132,8 @@ enum qca_btsoc_type {
> >  	QCA_INVALID = -1,
> >  	QCA_AR3002,
> >  	QCA_ROME,
> > -	QCA_WCN3990
> > +	QCA_WCN3990,
> > +	QCA_WCN3998
>
> nit: if you add a comma after the last value the line doesn't need to
> be changed when a new type is added in the future.

[Harish] -- Will take care of it in the new patch.

> Is 'WCN3998' specific enough? You mentioned earlier that there are
> multiple WCN3998 variants with different requirements for regulator
> voltages/max currents. Which names does Qualcomm use to distinguish
> between them (e.g. WCN3998-A, WCN3998-B, ...)?

[Harish] -- For now we want to add WCN3998 support only. What I meant to
say in my earlier explanation is that WCN3990 is the base variant, and on
top of that we have variants like WCN3990, WCN3998 and WCN3998-0,
WCN3998-1, like that.

> Thanks
>
> Matthias

Thanks,
Harish
Re: [PATCH v5] PM / devfreq: Restart previous governor if new governor fails to start
On 2019-03-12 12:47, MyungJoo Ham wrote:
> > From: Saravana Kannan
> >
> > If the new governor fails to start, switch back to the old governor so
> > that the devfreq state is not left in some weird limbo.
> >
> > [MyungJoo: assume fatal on revert failure and set df->governor to NULL]
> > Signed-off-by: Sibi Sankar
> > Signed-off-by: Saravana Kannan
> > Reviewed-by: Chanwoo Choi
>
> I'll modify WARN->ERROR for the case when it's fatal:

Sure, thanks.

> +	if (ret) {
> +		dev_warn(dev,
> +			 "%s: reverting to Governor %s failed (%d)\n",
> +			 __func__, df->governor_name, ret);
> +		df->governor = NULL;
> +	}
>
> Acked-by: MyungJoo Ham

> > ---
> > V5: * assume fatal on revert failure and set df->governor to NULL
> > V4: * Removed prev_governor check.
> > V3: * Fix NULL deref for real this time.
> >     * Addressed some style preferences.
> > V2: * Fixed typo in commit text
> >     * Fixed potential NULL deref
> >
> >  drivers/devfreq/devfreq.c | 16 ++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)

-- 
Sibi Sankar
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.
Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
On Tue, Mar 12, 2019 at 12:59:34PM -0700, Mike Kravetz wrote:
> On 3/11/19 2:36 AM, Peter Xu wrote:
> >
> > The "kvm" entry is a bit special here only to make sure that existing
> > users like QEMU/KVM won't break by this newly introduced flag. What
> > we need to do is simply set the "unprivileged_userfaultfd" flag to
> > "kvm" here to automatically grant userfaultfd permission for processes
> > like QEMU/KVM without extra code to tweak these flags in the admin
> > code.
>
> Another user is Oracle DB, specifically with hugetlbfs. For them, we would
> like to add a special case like kvm described above. The admin controls
> who can have access to hugetlbfs, so I think adding code to the open
> routine as in patch 2 of this series would seem to work.

Yes, I think if there's an explicit and safe place we can hook for
hugetlbfs then we can do a similar trick as in the KVM case. Though I
noticed that we can not only create hugetlbfs files under the mountpoint
(which the admin can control), but also in some other ways. The question
(of mine... sorry if it's a silly one!) is whether all other ways to use
hugetlbfs are still under control of the admin. One I know of is
memfd_create(), which seems to be usable even by unprivileged users. If
so, should we only limit the uffd privilege to those hugetlbfs users who
use the mountpoint directly?

Another question is about fork() of privileged processes - for KVM we
only grant the privilege to the exact process that opened the /dev/kvm
node, and the privilege is lost for any forked children. Is that the
same thing for Oracle DB/hugetlbfs?

> However, I can imagine more special cases being added for other users. And,
> once you have more than one special case then you may want to combine them.
> For example, kvm and hugetlbfs together.

It looks fine to me if we're using an MMF_USERFAULTFD_ALLOW flag upon
mm_struct, since that seems to be a very general flag that can be used
for anything we want to grant the privilege for, not only KVM?

Thanks,

-- 
Peter Xu
Re: [RFC][PATCH 00/16] sched: Core scheduling
On Tue, Mar 12, 2019 at 3:45 PM Aubrey Li wrote:
>
> On Tue, Mar 12, 2019 at 7:36 AM Subhra Mazumdar wrote:
> >
> > On 3/11/19 11:34 AM, Subhra Mazumdar wrote:
> > >
> > > On 3/10/19 9:23 PM, Aubrey Li wrote:
> > >> On Sat, Mar 9, 2019 at 3:50 AM Subhra Mazumdar wrote:
> > >>> expected. Most of the performance recovery happens in patch 15 which,
> > >>> unfortunately, is also the one that introduces the hard lockup.
> > >>>
> > >> After applying Subhra's patch, the following is triggered by enabling
> > >> core sched when a cgroup is under heavy load.
> > >>
> > > It seems you are facing some other deadlock where printk is involved.
> > > Can you drop the last patch (patch 16 sched: Debug bits...) and try?
> > >
> > > Thanks,
> > > Subhra
> > >
> > Never mind, I am seeing the same lockdep deadlock output even w/o patch
> > 16. Btw the NULL fix had something missing.
>
> One more NULL pointer dereference:
>
> Mar 12 02:24:46 aubrey-ivb kernel: [ 201.916741] core sched enabled
> [ 201.950203] BUG: unable to handle kernel NULL pointer dereference at 0008
> [ 201.950254] [ cut here ]
> [ 201.959045] #PF error: [normal kernel read fault]
> [ 201.964272] !se->on_rq
> [ 201.964287] WARNING: CPU: 22 PID: 2965 at kernel/sched/fair.c:6849 set_next_buddy+0x52/0x70

A quick workaround below:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d0dac4fd94f..ef6acfe2cf7d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6834,7 +6834,7 @@ static void set_last_buddy(struct sched_entity *se)
 		return;
 
 	for_each_sched_entity(se) {
-		if (SCHED_WARN_ON(!se->on_rq))
+		if (SCHED_WARN_ON(!(se && se->on_rq)))
 			return;
 		cfs_rq_of(se)->last = se;
 	}
@@ -6846,7 +6846,7 @@ static void set_next_buddy(struct sched_entity *se)
 		return;
 
 	for_each_sched_entity(se) {
-		if (SCHED_WARN_ON(!se->on_rq))
+		if (SCHED_WARN_ON(!(se && se->on_rq)))
 			return;
 		cfs_rq_of(se)->next = se;
 	}

And now I'm running into a hard LOCKUP:

[ 326.336279] NMI watchdog: Watchdog detected hard LOCKUP on cpu 31
[ 326.336280] Modules linked in: ipt_MASQUERADE xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntracki
[ 326.336311] irq event stamp: 164460
[ 326.336312] hardirqs last enabled at (164459): [] sched_core_balance+0x247/0x470
[ 326.336312] hardirqs last disabled at (164460): [] sched_core_balance+0x113/0x470
[ 326.336313] softirqs last enabled at (164250): [] __do_softirq+0x359/0x40a
[ 326.336314] softirqs last disabled at (164213): [] irq_exit+0xc1/0xd0
[ 326.336315] CPU: 31 PID: 0 Comm: swapper/31 Tainted: G I 5.0.0-rc8-00542-gd697415be692-dirty #15
[ 326.336316] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x058.082120120902 08/21/2012
[ 326.336317] RIP: 0010:native_queued_spin_lock_slowpath+0x18f/0x1c0
[ 326.336318] Code: c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 80 51 1e 00 48 03 04 f5 40 58 39 82 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 4b
[ 326.336318] RSP: :c9000643bd58 EFLAGS: 0046
[ 326.336319] RAX: RBX: 888c0ade4400 RCX: 0080
[ 326.336320] RDX: 88980bbe5180 RSI: 0019 RDI: 888c0ade4400
[ 326.336321] RBP: 888c0ade4400 R08: 0080 R09: 001e3a80
[ 326.336321] R10: c9000643bd08 R11: R12:
[ 326.336322] R13: R14: 88980bbe4400 R15: 001f
[ 326.336323] FS: () GS:88980ba0() knlGS:
[ 326.336323] CS: 0010 DS: ES: CR0: 80050033
[ 326.336324] CR2: 7fdcd7fd7728 CR3: 0017e821a001 CR4: 000606e0
[ 326.336325] Call Trace:
[ 326.336325]  do_raw_spin_lock+0xab/0xb0
[ 326.336326]  _raw_spin_lock+0x4b/0x60
[ 326.336326]  double_rq_lock+0x99/0x140
[ 326.336327]  sched_core_balance+0x11e/0x470
[ 326.336327]  __balance_callback+0x49/0xa0
[ 326.336328]  __schedule+0x1113/0x1570
[ 326.336328]  schedule_idle+0x1e/0x40
[ 326.336329]  do_idle+0x16b/0x2a0
[ 326.336329]  cpu_startup_entry+0x19/0x20
[ 326.336330]  start_secondary+0x17f/0x1d0
[ 326.336331]  secondary_startup_64+0xa4/0xb0
[ 330.959367] ---[ end Kernel panic - not syncing: Hard LOCKUP ]---
Re: [PATCH] arch: arm: Kconfig: pedantic formatting
On 13.03.19 00:54, Andrew Jeffery wrote:
>
> For the ASPEED bits:
>
> Acked-by: Andrew Jeffery
>

Shall I split the patch?

--mtx

-- 
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287
[PATCH v2 2/2] dmaengine: tegra210-adma: update system sleep callbacks
If the driver is active until late suspend, where runtime PM cannot run,
a forced suspend is essential to put the device into a low power state.
Thus pm_runtime_force_suspend and pm_runtime_force_resume are used as
system sleep callbacks during system-wide PM transitions.

Signed-off-by: Sameer Pujar
---
 drivers/dma/tegra210-adma.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/tegra210-adma.c b/drivers/dma/tegra210-adma.c
index 650cd9c..be29171 100644
--- a/drivers/dma/tegra210-adma.c
+++ b/drivers/dma/tegra210-adma.c
@@ -796,17 +796,11 @@ static int tegra_adma_remove(struct platform_device *pdev)
 	return 0;
 }
 
-#ifdef CONFIG_PM_SLEEP
-static int tegra_adma_pm_suspend(struct device *dev)
-{
-	return pm_runtime_suspended(dev) == false;
-}
-#endif
-
 static const struct dev_pm_ops tegra_adma_dev_pm_ops = {
 	SET_RUNTIME_PM_OPS(tegra_adma_runtime_suspend,
 			   tegra_adma_runtime_resume, NULL)
-	SET_SYSTEM_SLEEP_PM_OPS(tegra_adma_pm_suspend, NULL)
+	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+				pm_runtime_force_resume)
 };
 
 static struct platform_driver tegra_admac_driver = {
-- 
2.7.4
[PATCH v2 1/2] dmaengine: tegra210-adma: use devm_clk_*() helpers
The adma driver is using the pm_clk_*() interface for managing clock
resources. With this it is observed that clocks remain ON always. This
happens on Tegra devices which use the BPMP co-processor to manage
clock resources, where clocks are enabled during the prepare phase.
This is necessary because clock calls to BPMP are always blocking. When
the pm_clk_*() interface is used on such Tegra devices, the clock
prepare count is not balanced until the remove call happens for the
driver, and hence clocks are seen ON always. Thus this patch replaces
pm_clk_*() with the devm_clk_*() framework.

Suggested-by: Mohan Kumar D
Reviewed-by: Jonathan Hunter
Signed-off-by: Sameer Pujar
---
 drivers/dma/tegra210-adma.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/dma/tegra210-adma.c b/drivers/dma/tegra210-adma.c
index 5ec0dd9..650cd9c 100644
--- a/drivers/dma/tegra210-adma.c
+++ b/drivers/dma/tegra210-adma.c
@@ -22,7 +22,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
@@ -141,6 +140,7 @@ struct tegra_adma {
 	struct dma_device	dma_dev;
 	struct device		*dev;
 	void __iomem		*base_addr;
+	struct clk		*ahub_clk;
 	unsigned int		nr_channels;
 	unsigned long		rx_requests_reserved;
 	unsigned long		tx_requests_reserved;
@@ -637,8 +637,9 @@ static int tegra_adma_runtime_suspend(struct device *dev)
 	struct tegra_adma *tdma = dev_get_drvdata(dev);
 
 	tdma->global_cmd = tdma_read(tdma, ADMA_GLOBAL_CMD);
+	clk_disable_unprepare(tdma->ahub_clk);
 
-	return pm_clk_suspend(dev);
+	return 0;
 }
 
 static int tegra_adma_runtime_resume(struct device *dev)
@@ -646,10 +647,11 @@ static int tegra_adma_runtime_resume(struct device *dev)
 	struct tegra_adma *tdma = dev_get_drvdata(dev);
 	int ret;
 
-	ret = pm_clk_resume(dev);
-	if (ret)
+	ret = clk_prepare_enable(tdma->ahub_clk);
+	if (ret) {
+		dev_err(dev, "ahub clk_enable failed: %d\n", ret);
 		return ret;
-
+	}
 	tdma_write(tdma, ADMA_GLOBAL_CMD, tdma->global_cmd);
 
 	return 0;
@@ -693,13 +695,11 @@ static int tegra_adma_probe(struct platform_device *pdev)
 	if (IS_ERR(tdma->base_addr))
 		return PTR_ERR(tdma->base_addr);
 
-	ret = pm_clk_create(&pdev->dev);
-	if (ret)
-		return ret;
-
-	ret = of_pm_clk_add_clk(&pdev->dev, "d_audio");
-	if (ret)
-		goto clk_destroy;
+	tdma->ahub_clk = devm_clk_get(&pdev->dev, "d_audio");
+	if (IS_ERR(tdma->ahub_clk)) {
+		dev_err(&pdev->dev, "Error: Missing ahub controller clock\n");
+		return PTR_ERR(tdma->ahub_clk);
+	}
 
 	pm_runtime_enable(&pdev->dev);
 
@@ -776,8 +776,6 @@ static int tegra_adma_probe(struct platform_device *pdev)
 	pm_runtime_put_sync(&pdev->dev);
 rpm_disable:
 	pm_runtime_disable(&pdev->dev);
-clk_destroy:
-	pm_clk_destroy(&pdev->dev);
 	return ret;
 }
 
@@ -794,7 +792,6 @@ static int tegra_adma_remove(struct platform_device *pdev)
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
-	pm_clk_destroy(&pdev->dev);
 
 	return 0;
 }
-- 
2.7.4
RE: [PATCH v3 6/6] dt-bindings: fpga: Add bindings for ZynqMP fpga driver
Ping !!

> -----Original Message-----
> From: Nava kishore Manne
> Sent: Tuesday, March 5, 2019 3:12 PM
> To: 'Rob Herring'
> Cc: mark.rutl...@arm.com; Michal Simek ; Rajan Vaja ;
> linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org;
> devicet...@vger.kernel.org; Jolly Shah ; chinnikishore...@gmail.com;
> 'Alan Tull' ; Moritz Fischer
> Subject: RE: [PATCH v3 6/6] dt-bindings: fpga: Add bindings for ZynqMP fpga driver
>
> Hi Rob,
>
> Thanks for the quick response.
> Please find my response inline.
>
> > -----Original Message-----
> > From: Rob Herring [mailto:r...@kernel.org]
> > Sent: Monday, March 4, 2019 10:57 PM
> > To: Nava kishore Manne
> > Cc: mark.rutl...@arm.com; Michal Simek ; Rajan Vaja ;
> > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org;
> > devicet...@vger.kernel.org; Jolly Shah ; chinnikishore...@gmail.com
> > Subject: Re: [PATCH v3 6/6] dt-bindings: fpga: Add bindings for ZynqMP
> > fpga driver
> >
> > On Mon, Mar 4, 2019 at 5:35 AM Nava kishore Manne wrote:
> > >
> > > Hi Rob,
> > >
> > > Thanks for providing the review comments..
> > > Please find my response inline.
> > >
> > > > -----Original Message-----
> > > > From: Rob Herring [mailto:r...@kernel.org]
> > > > Sent: Saturday, February 23, 2019 2:01 AM
> > > > To: Nava kishore Manne
> > > > Cc: mark.rutl...@arm.com; Michal Simek ; Rajan Vaja ;
> > > > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org;
> > > > devicet...@vger.kernel.org; Jolly Shah ; chinnikishore...@gmail.com
> > > > Subject: Re: [PATCH v3 6/6] dt-bindings: fpga: Add bindings for
> > > > ZynqMP fpga driver
> > > >
> > > > On Wed, Jan 23, 2019 at 2:46 PM Nava kishore Manne wrote:
> > > > >
> > > > > Hi Rob,
> > > > >
> > > > > Thanks for providing the comments...
> > > >
> > > > Please fix your mailer to send plain text emails to mail lists.
> > >
> > > Thanks for pointing it..
> > >
> > > > > > -----Original Message-----
> > > > > > From: Rob Herring [mailto:r...@kernel.org]
> > > > > > Sent: Monday, January 21, 2019 9:19 PM
> > > > > > To: Nava kishore Manne
> > > > > > Cc: mark.rutl...@arm.com; Michal Simek ; Rajan Vaja ;
> > > > > > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org;
> > > > > > devicet...@vger.kernel.org; Jolly Shah ; chinnikishore...@gmail.com
> > > > > > Subject: Re: [PATCH v3 6/6] dt-bindings: fpga: Add bindings for ZynqMP fpga driver
> > > > > >
> > > > > > On Mon, Jan 21, 2019 at 11:08:35PM +0530, Nava kishore Manne wrote:
> > > > > > > Add documentation to describe Xilinx ZynqMP fpga driver bindings.
> > > > > > >
> > > > > > > Signed-off-by: Nava kishore Manne
> > > > > > > ---
> > > > > > > Changes for v3:
> > > > > > > 	-Removed PCAP as a child node to the FW and created
> > > > > > > 	 an independent node since the PCAP driver is a consumer,
> > > > > > > 	 not a provider.
> > > > > > >
> > > > > > >  .../bindings/fpga/xlnx,zynqmp-pcap-fpga.txt | 13 +
> > > > > > >  1 file changed, 13 insertions(+)
> > > > > > >  create mode 100644 Documentation/devicetree/bindings/fpga/xlnx,zynqmp-pcap-fpga.txt
> > > > > > >
> > > > > > > diff --git a/Documentation/devicetree/bindings/fpga/xlnx,zynqmp-pcap-fpga.txt b/Documentation/devicetree/bindings/fpga/xlnx,zynqmp-pcap-fpga.txt
> > > > > > > new file mode 100644
> > > > > > > index ..1f6f58872311
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/devicetree/bindings/fpga/xlnx,zynqmp-pcap-fpga.txt
> > > > > > > @@ -0,0 +1,13 @@
> > > > > > > +Device Tree zynqmp-fpga bindings for the Zynq Ultrascale+ MPSoC
> > > > > > > +controlled using ZynqMP SoC firmware interface For Bitstream
> > > > > > > +configuration on ZynqMp Soc uses processor configuration
> > > > > > > +port(PCAP) to configure the programmable logic(PL) through PS by
> > > > > > > +using FW interface.
> > > > > > > +
> > > > > > > +Required properties:
> > > > > > > +- compatible: should contain "xlnx,zynqmp-pcap-fpga"
> > > > > > > +
> > > > > > > +Example:
> > > > > > > +	zynqmp_pcap: pcap {
> > > > > > > +
Hello, are you interested in client databases? [spam, translated from Russian transliteration]
[PATCH v2 3/5] slab: Use slab_list instead of lru
Currently we use the page->lru list for maintaining lists of slabs. We
have a list_head in the page structure (slab_list) that can be used for
this purpose. Doing so makes the code cleaner since we are not
overloading the lru list.

The slab_list is part of a union within the page struct (included here
stripped down):

	union {
		struct {	/* Page cache and anonymous pages */
			struct list_head lru;
			...
		};
		struct {
			dma_addr_t dma_addr;
		};
		struct {	/* slab, slob and slub */
			union {
				struct list_head slab_list;
				struct {	/* Partial pages */
					struct page *next;
					int pages;	/* Nr of pages left */
					int pobjects;	/* Approximate count */
				};
			};
		};
		...

Here we see that slab_list and lru are the same bits. We can verify
that this change is safe to do by examining the object file produced
from slab.c before and after this patch is applied.

Steps taken to verify:

 1. checkout current tip of Linus' tree

    commit a667cb7a94d4 ("Merge branch 'akpm' (patches from Andrew)")

 2. configure and build (selecting SLAB allocator)

    CONFIG_SLAB=y
    CONFIG_SLAB_FREELIST_RANDOM=y
    CONFIG_DEBUG_SLAB=y
    CONFIG_DEBUG_SLAB_LEAK=y
    CONFIG_HAVE_DEBUG_KMEMLEAK=y

 3. disassemble object file `objdump -dr mm/slab.o > before.s`
 4. apply patch
 5. build
 6. disassemble object file `objdump -dr mm/slab.o > after.s`
 7. diff before.s after.s

Use slab_list list_head instead of the lru list_head for maintaining
lists of slabs.

Reviewed-by: Roman Gushchin
Signed-off-by: Tobin C. Harding
---
 mm/slab.c | 49 +++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 28652e4218e0..09cc64ef9613 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1710,8 +1710,8 @@ static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
 {
 	struct page *page, *n;
 
-	list_for_each_entry_safe(page, n, list, lru) {
-		list_del(&page->lru);
+	list_for_each_entry_safe(page, n, list, slab_list) {
+		list_del(&page->slab_list);
 		slab_destroy(cachep, page);
 	}
 }
@@ -2265,8 +2265,8 @@ static int drain_freelist(struct kmem_cache *cache,
 			goto out;
 		}
 
-		page = list_entry(p, struct page, lru);
-		list_del(&page->lru);
+		page = list_entry(p, struct page, slab_list);
+		list_del(&page->slab_list);
 		n->free_slabs--;
 		n->total_slabs--;
 		/*
@@ -2726,13 +2726,13 @@ static void cache_grow_end(struct kmem_cache *cachep, struct page *page)
 	if (!page)
 		return;
 
-	INIT_LIST_HEAD(&page->lru);
+	INIT_LIST_HEAD(&page->slab_list);
 	n = get_node(cachep, page_to_nid(page));
 	spin_lock(&n->list_lock);
 	n->total_slabs++;
 	if (!page->active) {
-		list_add_tail(&page->lru, &(n->slabs_free));
+		list_add_tail(&page->slab_list, &n->slabs_free);
 		n->free_slabs++;
 	} else
 		fixup_slab_list(cachep, n, page, &list);
@@ -2841,9 +2841,9 @@ static inline void fixup_slab_list(struct kmem_cache *cachep,
 				void **list)
 {
 	/* move slabp to correct slabp list: */
-	list_del(&page->lru);
+	list_del(&page->slab_list);
 	if (page->active == cachep->num) {
-		list_add(&page->lru, &n->slabs_full);
+		list_add(&page->slab_list, &n->slabs_full);
 		if (OBJFREELIST_SLAB(cachep)) {
 #if DEBUG
 			/* Poisoning will be done without holding the lock */
@@ -2857,7 +2857,7 @@ static inline void fixup_slab_list(struct kmem_cache *cachep,
 			page->freelist = NULL;
 		}
 	} else
-		list_add(&page->lru, &n->slabs_partial);
+		list_add(&page->slab_list, &n->slabs_partial);
 }
 
 /* Try to find non-pfmemalloc slab if needed */
@@ -2880,20 +2880,20 @@ static noinline struct page *get_valid_first_slab(struct kmem_cache_node *n,
 	}
 
 	/* Move pfmemalloc slab to the end of list to speed up next search */
-	list_del(&page->lru);
+	list_del(&page->slab_list);
 	if (!page->active) {
-		list_add_tail(&page->lru, &n->slabs_free);
+		list_add_tail(&page->slab_list, &n->slabs_free);
 		n->free_slabs++;
 	} else
-		list_add_tail(&page->lru, &n->slabs_partial);
+		list_
[PATCH v2 4/5] slob: Use slab_list instead of lru
Currently we use the page->lru list for maintaining lists of slabs. We have a list_head in the page structure (slab_list) that can be used for this purpose. Doing so makes the code cleaner since we are not overloading the lru list.

The slab_list is part of a union within the page struct (included here stripped down):

	union {
		struct {	/* Page cache and anonymous pages */
			struct list_head lru;
			...
		};
		struct {
			dma_addr_t dma_addr;
		};
		struct {	/* slab, slob and slub */
			union {
				struct list_head slab_list;
				struct {	/* Partial pages */
					struct page *next;
					int pages;	/* Nr of pages left */
					int pobjects;	/* Approximate count */
				};
			};
		};
	...

Here we see that slab_list and lru are the same bits. We can verify that this change is safe to do by examining the object file produced from slob.c before and after this patch is applied.

Steps taken to verify:

 1. checkout current tip of Linus' tree

    commit a667cb7a94d4 ("Merge branch 'akpm' (patches from Andrew)")

 2. configure and build (select SLOB allocator)

    CONFIG_SLOB=y
    CONFIG_SLAB_MERGE_DEFAULT=y

 3. disassemble object file `objdump -dr mm/slob.o > before.s`
 4. apply patch
 5. build
 6. disassemble object file `objdump -dr mm/slob.o > after.s`
 7. diff before.s after.s

Use slab_list list_head instead of the lru list_head for maintaining lists of slabs.

Reviewed-by: Roman Gushchin
Signed-off-by: Tobin C.
Harding --- mm/slob.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/slob.c b/mm/slob.c index 307c2c9feb44..ee68ff2a2833 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -112,13 +112,13 @@ static inline int slob_page_free(struct page *sp) static void set_slob_page_free(struct page *sp, struct list_head *list) { - list_add(&sp->lru, list); + list_add(&sp->slab_list, list); __SetPageSlobFree(sp); } static inline void clear_slob_page_free(struct page *sp) { - list_del(&sp->lru); + list_del(&sp->slab_list); __ClearPageSlobFree(sp); } @@ -283,7 +283,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node) spin_lock_irqsave(&slob_lock, flags); /* Iterate through each partially free page, try to find room */ - list_for_each_entry(sp, slob_list, lru) { + list_for_each_entry(sp, slob_list, slab_list) { #ifdef CONFIG_NUMA /* * If there's a node specification, search for a partial @@ -297,7 +297,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node) continue; /* Attempt to alloc */ - prev = sp->lru.prev; + prev = sp->slab_list.prev; b = slob_page_alloc(sp, size, align); if (!b) continue; @@ -323,7 +323,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node) spin_lock_irqsave(&slob_lock, flags); sp->units = SLOB_UNITS(PAGE_SIZE); sp->freelist = b; - INIT_LIST_HEAD(&sp->lru); + INIT_LIST_HEAD(&sp->slab_list); set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE)); set_slob_page_free(sp, slob_list); b = slob_page_alloc(sp, size, align); -- 2.21.0
[PATCH v2 1/5] slub: Add comments to endif pre-processor macros
SLUB allocator makes heavy use of ifdef/endif pre-processor macros. The pairing of these statements is at times hard to follow e.g. if the pair are further than a screen apart or if there are nested pairs. We can reduce cognitive load by adding a comment to the endif statement of form #ifdef CONFIG_FOO ... #endif /* CONFIG_FOO */ Add comments to endif pre-processor macros if ifdef/endif pair is not immediately apparent. Reviewed-by: Roman Gushchin Signed-off-by: Tobin C. Harding --- mm/slub.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 1b08fbcb7e61..b282e22885cd 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1951,7 +1951,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags, } } } while (read_mems_allowed_retry(cpuset_mems_cookie)); -#endif +#endif /* CONFIG_NUMA */ return NULL; } @@ -2249,7 +2249,7 @@ static void unfreeze_partials(struct kmem_cache *s, discard_slab(s, page); stat(s, FREE_SLAB); } -#endif +#endif /* CONFIG_SLUB_CPU_PARTIAL */ } /* @@ -2308,7 +2308,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain) local_irq_restore(flags); } preempt_enable(); -#endif +#endif /* CONFIG_SLUB_CPU_PARTIAL */ } static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c) @@ -2813,7 +2813,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s, } EXPORT_SYMBOL(kmem_cache_alloc_node_trace); #endif -#endif +#endif /* CONFIG_NUMA */ /* * Slow path handling. 
This may still be called frequently since objects @@ -3845,7 +3845,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node) return ret; } EXPORT_SYMBOL(__kmalloc_node); -#endif +#endif /* CONFIG_NUMA */ #ifdef CONFIG_HARDENED_USERCOPY /* @@ -4063,7 +4063,7 @@ void __kmemcg_cache_deactivate(struct kmem_cache *s) */ slab_deactivate_memcg_cache_rcu_sched(s, kmemcg_cache_deact_after_rcu); } -#endif +#endif /* CONFIG_MEMCG */ static int slab_mem_going_offline_callback(void *arg) { @@ -4696,7 +4696,7 @@ static int list_locations(struct kmem_cache *s, char *buf, len += sprintf(buf, "No data\n"); return len; } -#endif +#endif /* CONFIG_SLUB_DEBUG */ #ifdef SLUB_RESILIENCY_TEST static void __init resiliency_test(void) @@ -4756,7 +4756,7 @@ static void __init resiliency_test(void) #ifdef CONFIG_SYSFS static void resiliency_test(void) {}; #endif -#endif +#endif /* SLUB_RESILIENCY_TEST */ #ifdef CONFIG_SYSFS enum slab_stat_type { @@ -5413,7 +5413,7 @@ STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc); STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free); STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node); STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain); -#endif +#endif /* CONFIG_SLUB_STATS */ static struct attribute *slab_attrs[] = { &slab_size_attr.attr, @@ -5614,7 +5614,7 @@ static void memcg_propagate_slab_attrs(struct kmem_cache *s) if (buffer) free_page((unsigned long)buffer); -#endif +#endif /* CONFIG_MEMCG */ } static void kmem_cache_release(struct kobject *k) -- 2.21.0
[PATCH v2 5/5] mm: Remove stale comment from page struct
We now use the slab_list list_head instead of the lru list_head. This comment has become stale. Remove stale comment from page struct slab_list list_head. Signed-off-by: Tobin C. Harding --- include/linux/mm_types.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7eade9132f02..63a34e3d7c29 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -103,7 +103,7 @@ struct page { }; struct {/* slab, slob and slub */ union { - struct list_head slab_list; /* uses lru */ + struct list_head slab_list; struct {/* Partial pages */ struct page *next; #ifdef CONFIG_64BIT -- 2.21.0
[PATCH v2 2/5] slub: Use slab_list instead of lru
Currently we use the page->lru list for maintaining lists of slabs. We have a list_head in the page structure (slab_list) that can be used for this purpose. Doing so makes the code cleaner since we are not overloading the lru list.

The slab_list is part of a union within the page struct (included here stripped down):

	union {
		struct {	/* Page cache and anonymous pages */
			struct list_head lru;
			...
		};
		struct {
			dma_addr_t dma_addr;
		};
		struct {	/* slab, slob and slub */
			union {
				struct list_head slab_list;
				struct {	/* Partial pages */
					struct page *next;
					int pages;	/* Nr of pages left */
					int pobjects;	/* Approximate count */
				};
			};
		};
	...

Here we see that slab_list and lru are the same bits. We can verify that this change is safe to do by examining the object file produced from slub.c before and after this patch is applied.

Steps taken to verify:

 1. checkout current tip of Linus' tree

    commit a667cb7a94d4 ("Merge branch 'akpm' (patches from Andrew)")

 2. configure and build (defaults to SLUB allocator)

    CONFIG_SLUB=y
    CONFIG_SLUB_DEBUG=y
    CONFIG_SLUB_DEBUG_ON=y
    CONFIG_SLUB_STATS=y
    CONFIG_HAVE_DEBUG_KMEMLEAK=y
    CONFIG_SLAB_FREELIST_RANDOM=y
    CONFIG_SLAB_FREELIST_HARDENED=y

 3. disassemble object file `objdump -dr mm/slub.o > before.s`
 4. apply patch
 5. build
 6. disassemble object file `objdump -dr mm/slub.o > after.s`
 7. diff before.s after.s

Use slab_list list_head instead of the lru list_head for maintaining lists of slabs.

Reviewed-by: Roman Gushchin
Signed-off-by: Tobin C.
Harding --- mm/slub.c | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index b282e22885cd..d692b5e0163d 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1023,7 +1023,7 @@ static void add_full(struct kmem_cache *s, return; lockdep_assert_held(&n->list_lock); - list_add(&page->lru, &n->full); + list_add(&page->slab_list, &n->full); } static void remove_full(struct kmem_cache *s, struct kmem_cache_node *n, struct page *page) @@ -1032,7 +1032,7 @@ static void remove_full(struct kmem_cache *s, struct kmem_cache_node *n, struct return; lockdep_assert_held(&n->list_lock); - list_del(&page->lru); + list_del(&page->slab_list); } /* Tracking of the number of slabs for debugging purposes */ @@ -1773,9 +1773,9 @@ __add_partial(struct kmem_cache_node *n, struct page *page, int tail) { n->nr_partial++; if (tail == DEACTIVATE_TO_TAIL) - list_add_tail(&page->lru, &n->partial); + list_add_tail(&page->slab_list, &n->partial); else - list_add(&page->lru, &n->partial); + list_add(&page->slab_list, &n->partial); } static inline void add_partial(struct kmem_cache_node *n, @@ -1789,7 +1789,7 @@ static inline void remove_partial(struct kmem_cache_node *n, struct page *page) { lockdep_assert_held(&n->list_lock); - list_del(&page->lru); + list_del(&page->slab_list); n->nr_partial--; } @@ -1863,7 +1863,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n, return NULL; spin_lock(&n->list_lock); - list_for_each_entry_safe(page, page2, &n->partial, lru) { + list_for_each_entry_safe(page, page2, &n->partial, slab_list) { void *t; if (!pfmemalloc_match(page, flags)) @@ -2407,7 +2407,7 @@ static unsigned long count_partial(struct kmem_cache_node *n, struct page *page; spin_lock_irqsave(&n->list_lock, flags); - list_for_each_entry(page, &n->partial, lru) + list_for_each_entry(page, &n->partial, slab_list) x += get_count(page); spin_unlock_irqrestore(&n->list_lock, flags); return x; @@ -3702,10 +3702,10 @@ static void 
free_partial(struct kmem_cache *s, struct kmem_cache_node *n) BUG_ON(irqs_disabled()); spin_lock_irq(&n->list_lock); - list_for_each_entry_safe(page, h, &n->partial, lru) { + list_for_each_entry_safe(page, h, &n->partial, slab_list) { if (!page->inuse) { remove_partial(n, page); - list_add(&page->lru, &discard); + list_add(&page->slab_list, &discard); } else { list_slab_objects(s, page, "Objects remaining i
[PATCH v2 0/5] mm: Use slab_list list_head instead of lru
Currently the slab allocators (ab)use the struct page 'lru' list_head. We have a list head for slab allocators to use, 'slab_list'. Clean up all three allocators by using the 'slab_list' list_head instead of overloading the 'lru' list_head.

Patch 1 - Makes no code changes, adds comments to #endif statements.

Patches 2,3,4 - Do changes as a patch per allocator, tested by building and booting (in Qemu) after configuring kernel to use appropriate allocator. Also build and boot with debug options enabled (for slab and slub). Verify the object files (before and after the set applied) are the same.

Patch 5 - Removes the now stale comment in the page struct definition.

Changes since v1:
 - Verify object files are the same before and after the patch set is applied (suggested by Matthew).
 - Add extra explanation to the commit logs explaining why these changes are safe to make (suggested by Roman).
 - Remove stale comment (thanks Willy).

thanks,
Tobin.

Tobin C. Harding (5):
  slub: Add comments to endif pre-processor macros
  slub: Use slab_list instead of lru
  slab: Use slab_list instead of lru
  slob: Use slab_list instead of lru
  mm: Remove stale comment from page struct

 include/linux/mm_types.h |  2 +-
 mm/slab.c                | 49
 mm/slob.c                | 10 +++
 mm/slub.c                | 60
 4 files changed, 61 insertions(+), 60 deletions(-)

--
2.21.0
loan application
-- Good day, It is our pleasure to write to you regarding the provision of loans by postal advertisement. Simple Federal Credit Union. We operate under short, clear and understandable terms and conditions. We provide loans at a low interest rate of 3%. Dear readers should note that this offer is for serious individuals, companies and businesses. Get your loan to solve your financial problems, such as paying bills, establishing new businesses, or reviving old ones. Interested individuals, companies and businesses should contact us at this e-mail address: i...@simplefederalcreditunion.com Do not let this opportunity pass you by. Get your loan to solve your financial problems. If you are interested in our loan, fill in this loan application form immediately. Your names: Address: Phone number: Loan amount needed: Duration: Occupation: Monthly income level: Gender: Date of birth: State: Country: Postal code: Purpose: We await your prompt response. Eric
[GIT PULL RESEND] pidfd changes for v5.1-rc1
Hi,

Thanks for the work on this system call! I am interested in making use of it in my process supervisor. It works pretty well and avoids the long-standing issue of PID reuse. One thing that instantly came to mind is to be able to delegate killing to some third process depending on the configuration. However, I don't see that permissions are attached to the open file description; they seem to be checked when calling pidfd_send_signal, as they are with kill(2). Is there any particular reason this was avoided?

For instance, if a process with CAP_KILL opens the procfd, shouldn't any process that uses a descriptor pointing to this same file description be permitted to send signals? It would be a lot more useful that way. There doesn't seem to be much benefit in using file descriptors for processes otherwise if we cannot use them that way, apart from avoiding PID reuse.

So, is something like this on the roadmap in the future, and if not, what was the reason it was avoided? I don't see a problem with using CAP_KILL to not check permissions at call time; otherwise I can see why it would be a problem in general (because processes can change credentials).

Regards,
Jonathon Kowalski
Re: [PATCH] regulator: palmas: Remove *rdev[PALMAS_NUM_REGS] from struct palmas_pmic
On 10/03/19 8:36 PM, Axel Lin wrote: This driver is using devm_regulator_register() so it is not necessary to save *rdev for clean up. Actually the pmic->rdev[id] is not used now. Reviewed-by: Keerthy Signed-off-by: Axel Lin --- drivers/regulator/palmas-regulator.c | 12 include/linux/mfd/palmas.h | 1 - 2 files changed, 13 deletions(-) diff --git a/drivers/regulator/palmas-regulator.c b/drivers/regulator/palmas-regulator.c index 7fb9e8dd834e..f13c7c8b1061 100644 --- a/drivers/regulator/palmas-regulator.c +++ b/drivers/regulator/palmas-regulator.c @@ -991,9 +991,6 @@ static int palmas_ldo_registration(struct palmas_pmic *pmic, return PTR_ERR(rdev); } - /* Save regulator for cleanup */ - pmic->rdev[id] = rdev; - /* Initialise sleep/init values from platform data */ if (pdata) { reg_init = pdata->reg_init[id]; @@ -1101,9 +1098,6 @@ static int tps65917_ldo_registration(struct palmas_pmic *pmic, return PTR_ERR(rdev); } - /* Save regulator for cleanup */ - pmic->rdev[id] = rdev; - /* Initialise sleep/init values from platform data */ if (pdata) { reg_init = pdata->reg_init[id]; @@ -1288,9 +1282,6 @@ static int palmas_smps_registration(struct palmas_pmic *pmic, pdev_name); return PTR_ERR(rdev); } - - /* Save regulator for cleanup */ - pmic->rdev[id] = rdev; } return 0; @@ -1395,9 +1386,6 @@ static int tps65917_smps_registration(struct palmas_pmic *pmic, pdev_name); return PTR_ERR(rdev); } - - /* Save regulator for cleanup */ - pmic->rdev[id] = rdev; } return 0; diff --git a/include/linux/mfd/palmas.h b/include/linux/mfd/palmas.h index 75e5c8ff85fc..c34d5f0d34d7 100644 --- a/include/linux/mfd/palmas.h +++ b/include/linux/mfd/palmas.h @@ -553,7 +553,6 @@ struct palmas_pmic { struct palmas *palmas; struct device *dev; struct regulator_desc desc[PALMAS_NUM_REGS]; - struct regulator_dev *rdev[PALMAS_NUM_REGS]; struct mutex mutex; int smps123;
Re: [PATCH RFT] regulator: lp87565: Fix missing register for LP87565_BUCK_0
On 01/03/19 11:46 AM, Axel Lin wrote: LP87565_BUCK_0 is missed, fix it. Fixes: f0168a9bf ("regulator: lp87565: Add support for lp87565 PMIC regulators") Signed-off-by: Axel Lin --- Hi J Keerthy, While reading the code, it seems strange that LP87565_BUCK_0 is never used. So current code only register 3 BUCKs for lp87565-regulator. Can you confirm if this fix is correct or not? Axel, If you look at the if check later: if (lp87565->dev_type == LP87565_DEVICE_TYPE_LP87565_Q1) { Currently the device that i am using only uses LP87565_BUCK_10 and LP87565_BUCK_23(dual phase). So your patch definitely makes sense when all 4 are used individually. Thanks for catching this. Reviewed-by: Keerthy Thanks, Axel drivers/regulator/lp87565-regulator.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/regulator/lp87565-regulator.c b/drivers/regulator/lp87565-regulator.c index 4ed41731a5b1..0418e478c6dc 100644 --- a/drivers/regulator/lp87565-regulator.c +++ b/drivers/regulator/lp87565-regulator.c @@ -193,7 +193,7 @@ static int lp87565_regulator_probe(struct platform_device *pdev) struct lp87565 *lp87565 = dev_get_drvdata(pdev->dev.parent); struct regulator_config config = { }; struct regulator_dev *rdev; - int i, min_idx = LP87565_BUCK_1, max_idx = LP87565_BUCK_3; + int i, min_idx = LP87565_BUCK_0, max_idx = LP87565_BUCK_3; platform_set_drvdata(pdev, lp87565);
[PATCH v2 1/2] bus: tegra-aconnect: use devm_clk_*() helpers
aconnect bus driver is using pm_clk_*() interface for managing clocks. With this, clocks seem to be always ON. This happens on Tegra devices which use BPMP co-processor to manage clock resources, where clocks are enabled during prepare phase. This is necessary because calls to BPMP are always blocking. When pm_clk_*() interface is used on such Tegra devices, clock prepare count is not balanced till driver remove() gets executed and hence clocks are seen ON always. Thus this patch replaces pm_clk_*() with devm_clk_*() framework. Suggested-by: Mohan Kumar D Reviewed-by: Jonathan Hunter Signed-off-by: Sameer Pujar --- drivers/bus/tegra-aconnect.c | 64 ++-- 1 file changed, 44 insertions(+), 20 deletions(-) diff --git a/drivers/bus/tegra-aconnect.c b/drivers/bus/tegra-aconnect.c index 084ae28..9349157 100644 --- a/drivers/bus/tegra-aconnect.c +++ b/drivers/bus/tegra-aconnect.c @@ -12,28 +12,38 @@ #include #include #include -#include #include +struct tegra_aconnect { + struct clk *ape_clk; + struct clk *apb2ape_clk; +}; + static int tegra_aconnect_probe(struct platform_device *pdev) { - int ret; + struct tegra_aconnect *aconnect; if (!pdev->dev.of_node) return -EINVAL; - ret = pm_clk_create(&pdev->dev); - if (ret) - return ret; + aconnect = devm_kzalloc(&pdev->dev, sizeof(struct tegra_aconnect), + GFP_KERNEL); + if (!aconnect) + return -ENOMEM; - ret = of_pm_clk_add_clk(&pdev->dev, "ape"); - if (ret) - goto clk_destroy; + aconnect->ape_clk = devm_clk_get(&pdev->dev, "ape"); + if (IS_ERR(aconnect->ape_clk)) { + dev_err(&pdev->dev, "Can't retrieve ape clock\n"); + return PTR_ERR(aconnect->ape_clk); + } - ret = of_pm_clk_add_clk(&pdev->dev, "apb2ape"); - if (ret) - goto clk_destroy; + aconnect->apb2ape_clk = devm_clk_get(&pdev->dev, "apb2ape"); + if (IS_ERR(aconnect->apb2ape_clk)) { + dev_err(&pdev->dev, "Can't retrieve apb2ape clock\n"); + return PTR_ERR(aconnect->apb2ape_clk); + } + dev_set_drvdata(&pdev->dev, aconnect); pm_runtime_enable(&pdev->dev); 
of_platform_populate(pdev->dev.of_node, NULL, NULL, &pdev->dev); @@ -41,30 +51,44 @@ static int tegra_aconnect_probe(struct platform_device *pdev) dev_info(&pdev->dev, "Tegra ACONNECT bus registered\n"); return 0; - -clk_destroy: - pm_clk_destroy(&pdev->dev); - - return ret; } static int tegra_aconnect_remove(struct platform_device *pdev) { pm_runtime_disable(&pdev->dev); - pm_clk_destroy(&pdev->dev); - return 0; } static int tegra_aconnect_runtime_resume(struct device *dev) { - return pm_clk_resume(dev); + struct tegra_aconnect *aconnect = dev_get_drvdata(dev); + int ret; + + ret = clk_prepare_enable(aconnect->ape_clk); + if (ret) { + dev_err(dev, "ape clk_enable failed: %d\n", ret); + return ret; + } + + ret = clk_prepare_enable(aconnect->apb2ape_clk); + if (ret) { + clk_disable_unprepare(aconnect->ape_clk); + dev_err(dev, "apb2ape clk_enable failed: %d\n", ret); + return ret; + } + + return 0; } static int tegra_aconnect_runtime_suspend(struct device *dev) { - return pm_clk_suspend(dev); + struct tegra_aconnect *aconnect = dev_get_drvdata(dev); + + clk_disable_unprepare(aconnect->ape_clk); + clk_disable_unprepare(aconnect->apb2ape_clk); + + return 0; } static const struct dev_pm_ops tegra_aconnect_pm_ops = { -- 2.7.4
[PATCH v2 2/2] bus: tegra-aconnect: add system sleep callbacks
pm_runtime_force_suspend() and pm_runtime_force_resume() are used as system sleep noirq suspend and resume callbacks. If the driver is active till late suspend, where runtime PM cannot run, force suspend is essential for the device. This makes sure that the device is put into low power state during system wide PM transitions to sleep states. Signed-off-by: Sameer Pujar --- drivers/bus/tegra-aconnect.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/bus/tegra-aconnect.c b/drivers/bus/tegra-aconnect.c index 9349157..ac58142 100644 --- a/drivers/bus/tegra-aconnect.c +++ b/drivers/bus/tegra-aconnect.c @@ -94,6 +94,8 @@ static int tegra_aconnect_runtime_suspend(struct device *dev) static const struct dev_pm_ops tegra_aconnect_pm_ops = { SET_RUNTIME_PM_OPS(tegra_aconnect_runtime_suspend, tegra_aconnect_runtime_resume, NULL) + SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, + pm_runtime_force_resume) }; static const struct of_device_id tegra_aconnect_of_match[] = { -- 2.7.4
[PATCHv2] x86/boot/KASLR: skip the specified crashkernel reserved region
crashkernel=x@y option may fail to reserve the required memory region if KASLR puts the kernel into the region. To avoid this uncertainty, make KASLR skip the required region. Signed-off-by: Pingfan Liu Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Baoquan He Cc: Will Deacon Cc: Nicolas Pitre Cc: Pingfan Liu Cc: Chao Fan Cc: "Kirill A. Shutemov" Cc: Ard Biesheuvel Cc: linux-kernel@vger.kernel.org --- v1 -> v2: fix some trivial formatting arch/x86/boot/compressed/kaslr.c | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index 9ed9709..e185318 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -109,6 +109,7 @@ enum mem_avoid_index { MEM_AVOID_BOOTPARAMS, MEM_AVOID_MEMMAP_BEGIN, MEM_AVOID_MEMMAP_END = MEM_AVOID_MEMMAP_BEGIN + MAX_MEMMAP_REGIONS - 1, + MEM_AVOID_CRASHKERNEL, MEM_AVOID_MAX, }; @@ -240,6 +241,25 @@ static void parse_gb_huge_pages(char *param, char *val) } } +/* parse crashkernel=x@y option */ +static void mem_avoid_crashkernel_simple(char *option) +{ + unsigned long long crash_size, crash_base; + char *cur = option; + + crash_size = memparse(option, &cur); + if (option == cur) + return; + + if (*cur == '@') { + option = cur + 1; + crash_base = memparse(option, &cur); + if (option == cur) + return; + mem_avoid[MEM_AVOID_CRASHKERNEL].start = crash_base; + mem_avoid[MEM_AVOID_CRASHKERNEL].size = crash_size; + } +} static void handle_mem_options(void) { @@ -250,7 +270,7 @@ static void handle_mem_options(void) u64 mem_size; if (!strstr(args, "memmap=") && !strstr(args, "mem=") && - !strstr(args, "hugepages")) + !strstr(args, "hugepages") && !strstr(args, "crashkernel=")) return; tmp_cmdline = malloc(len + 1); @@ -286,6 +306,8 @@ static void handle_mem_options(void) goto out; mem_limit = mem_size; + } else if (strstr(param, "crashkernel")) { + mem_avoid_crashkernel_simple(val); } } @@ -414,7 +436,7 @@ static
void mem_avoid_init(unsigned long input, unsigned long input_size, /* We don't need to set a mapping for setup_data. */ - /* Mark the memmap regions we need to avoid */ + /* Mark the regions we need to avoid */ handle_mem_options(); #ifdef CONFIG_X86_VERBOSE_BOOTUP -- 2.7.4
Re: [PATCH 4.19 000/149] 4.19.29-stable review
On Tue, 12 Mar 2019 at 22:44, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.19.29 release. > There are 149 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Thu Mar 14 17:02:30 UTC 2019. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.29-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.19.y > and the diffstat can be found below. > > thanks, > > greg k-h > Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 4.19.29-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.19.y git commit: 6d884178544e4eca1bd21c4626e16346f7e5aa9e git describe: v4.19.28-150-g6d884178544e Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.28-150-g6d884178544e No regressions (compared to build v4.19.28) No fixes (compared to build v4.19.28) Ran 23131 total tests in the following environments and test suites. 
Environments -- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64 Test Suites --- * boot * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * spectre-meltdown-checker-test * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none -- Linaro LKFT https://lkft.linaro.org
Re: [PATCH 4.14 000/135] 4.14.106-stable review
On Tue, 12 Mar 2019 at 22:47, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.14.106 release. > There are 135 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Thu Mar 14 17:02:27 UTC 2019. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.106-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.14.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 4.14.106-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: f881675936ef263f178eab5a5cf95bc86089cf20 git describe: v4.14.105-136-gf881675936ef Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.105-136-gf881675936ef No regressions (compared to build v4.14.105) No fixes (compared to build v4.14.105) Ran 22869 total tests in the following environments and test suites. 
Environments -- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64 Test Suites --- * boot * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * spectre-meltdown-checker-test * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none -- Linaro LKFT https://lkft.linaro.org
Re: [PATCH 4.9 00/96] 4.9.163-stable review
On Tue, 12 Mar 2019 at 22:48, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.9.163 release. > There are 96 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Thu Mar 14 17:10:06 UTC 2019. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.163-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.9.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 4.9.163-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.9.y git commit: 605129cbbd389949c754d0a23ce49030adae8f17 git describe: v4.9.162-97-g605129cbbd38 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.162-97-g605129cbbd38 No regressions (compared to build v4.9.162) No fixes (compared to build v4.9.162) Ran 22654 total tests in the following environments and test suites. 
Environments
------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* boot
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* spectre-meltdown-checker-test
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

--
Linaro LKFT
https://lkft.linaro.org
Re: [PATCH] ARM: dts: imx7s-warp: PMIC swbst boot-on/always-on
On Sat, Mar 2, 2019 at 7:33 PM Pierre-Jean Texier wrote:
>
> PMIC swbst regulator is used for the MikroBUS socket (pin +5V).
>
> We have to set the regulator to "boot-on" and "always-on"
> to output a voltage of 5V on this socket.
>
> Signed-off-by: Pierre-Jean Texier

Reviewed-by: Fabio Estevam
Re: [PATCH] svm: Fix AVIC incomplete IPI emulation
Oren,

On 3/11/19 6:38 PM, Suthikulpanit, Suravee wrote:
> However, looking a bit more closely, I notice the logic in
> svm_deliver_avic_intr() should also have been changed from
> kvm_vcpu_wake_up() to kvm_vcpu_kick() since the latter will result in
> clearing the IRR bit for the IPI vector when trying to send IPI as
> part of the following call path.
>
>   vcpu_enter_guest()
>    |-- inject_pending_event()
>         |-- kvm_cpu_get_interrupt()
>              |-- kvm_get_apic_interrupt()
>                   |-- apic_clear_irr()
>                   |-- apic_set_isr()
>                   |-- apic_update_ppr()
>
> Please see the patch below.
>
> Not sure if this would address the problem you are seeing.
>
> Thanks,
> Suravee
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 24dfa6a93711..d2841c3dbc04 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5219,11 +5256,13 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  	kvm_lapic_set_irr(vec, vcpu->arch.apic);
>  	smp_mb__after_atomic();
>
> -	if (avic_vcpu_is_running(vcpu))
> +	if (avic_vcpu_is_running(vcpu)) {
>  		wrmsrl(SVM_AVIC_DOORBELL,
>  		       kvm_cpu_get_apicid(vcpu->cpu));
> -	else
> -		kvm_vcpu_wake_up(vcpu);
> +	} else {
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		kvm_vcpu_kick(vcpu);
> +	}
>  }
>
>  static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)

Please ignore the part mentioned above. The current implementation
should already be fine.

Suravee
Backport Set the CPB bit unconditionally on F17h to 4.14 branch
Request to backport 0237199186e7a4aa5310741f0a6498a20c820fd7 to 4.14 branch Thanks! Alec
Re: [PATCH V1 07/11] mmc: cqhci: add quirk for setting DCMD CMD_TIMING
On 3/7/2019 11:46 PM, Sowjanya Komatineni wrote:

On 3/6/2019 6:30 PM, Adrian Hunter wrote:

On 2/03/19 7:20 AM, Sowjanya Komatineni wrote:

This patch adds a quirk for setting CMD_TIMING to 1 in the descriptor
for DCMD with R1B response type to allow the command to be sent to the
device during data activity or busy time.

The Tegra186 CQHCI host has a bug where DATA_PRESENT_SELECT is set to 1
by the CQHCI controller for DCMDs with R1B response type, and since a
DCMD does not trigger any data transfer, DCMD task complete happens
leaving the DATA FSM of the host controller in a wait state for data.

This affects the data transfer task issued after the R1B DCMD task, and
no interrupt is generated for the data transfer task.

The SW WAR for this issue is to set the CMD_TIMING bit to 1 in the DCMD
task descriptor, and as DCMD task descriptor preparation is done by the
cqhci driver, this patch adds a cqhci quirk to handle this.

Signed-off-by: Sowjanya Komatineni
---
 drivers/mmc/host/cqhci.c | 5 +++-
 drivers/mmc/host/cqhci.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c
index a8af682a9182..b34c07125f32 100644
--- a/drivers/mmc/host/cqhci.c
+++ b/drivers/mmc/host/cqhci.c
@@ -521,7 +521,10 @@ static void cqhci_prep_dcmd_desc(struct mmc_host *mmc,
 	} else {
 		if (mrq->cmd->flags & MMC_RSP_R1B) {
 			resp_type = 0x3;
-			timing = 0x0;
+			if (cq_host->quirks & CQHCI_QUIRK_CMD_TIMING_R1B_DCMD)
+				timing = 0x1;
+			else
+				timing = 0x0;

I was thinking it would be nice if there was a generic way for drivers
to make changes to descriptors before a task is started. Currently there
is host->ops->write_l() which would make it possible by checking for
host->ops->CQHCI_TDBR register and, in this case, the DCMD tag. We would
need to export get_desc(), perhaps rename it cqhci_get_desc() and put it
in cqhci.h since it is an inline function.

We take spin_lock_irqsave after the descriptor is prepared and before
writing to TDBR. Not sure, but tomorrow this may become a limitation for
drivers that make changes to descriptors if they edit descriptors in the
host->ops->write_l() call. Though in this case it is not required here.

Alternatively we could add host->ops for descriptor preparation.

Both ways sound good to me. But maybe adding a host->ops for descriptor
preparation is the better way to go, since that will be the right
interface exposed to make changes to descriptors.

DCMD descriptor attributes remain the same for any host, and also task
parameters, as QBR needs to be enabled with DCMD. So I believe it should
be ok if we just add a callback to allow hosts to update command
parameters of the DCMD descriptor only through cqhci_host_ops. For now
we can add a host->ops as update_dcmd_desc and pass the task_desc of the
DCMD for updating any params which the host may want to update.

Also, I don't see any requirement for host-specific task parameter
updates in the task descriptor, so not sure if there is any need to
provide a callback for task descriptor data preparation to hosts. Please
confirm.

Sure, for now the requirement has come up only for the DCMD desc update.
Sure, we can add task descriptor ops in a similar way later when
required.

Adrian, please confirm if you are fine with both of the above?

What do people think?

 		} else {
 			resp_type = 0x2;
 			timing = 0x1;

diff --git a/drivers/mmc/host/cqhci.h b/drivers/mmc/host/cqhci.h
index 9e68286a07b4..f96d8565cc07 100644
--- a/drivers/mmc/host/cqhci.h
+++ b/drivers/mmc/host/cqhci.h
@@ -170,6 +170,7 @@ struct cqhci_host {
 	u32 quirks;
 #define CQHCI_QUIRK_SHORT_TXFR_DESC_SZ	0x1
+#define CQHCI_QUIRK_CMD_TIMING_R1B_DCMD	0x2
 	bool enabled;
 	bool halted;
Re: [RFC PATCH v1 00/25] printk: new implementation
On (03/12/19 16:15), John Ogness wrote:
> > I suggest the following way forward (separate patchsets):
> >
> >  1. Replace log buffer (least controversial thing)
>
> Yes. I will post a series that only implements the ringbuffer using your
> simplified API. That will be enough to remove printk_safe and actually
> does most of the work of updating devkmsg, kmsg_dump, and syslog.

This may _not_ be enough to remove printk_safe. One of the reasons the
printk_safe "condom" came into existence was console_sem (which is a bit
too important to ignore):

	printk()
	 console_trylock()
	 console_unlock()
	  up()
	   raw_spin_lock_irqsave(&sem->lock, flags)
	   __up()
	    wake_up_process()
	     WARN/etc
	      printk()
	       console_trylock()
	        down_trylock()
	         raw_spin_lock_irqsave(&sem->lock, flags)   << deadlock

Back then we were looking at a
printk->console_sem->lock->printk->console_sem->lock deadlock report
from LG, if I'm not mistaken.

	-ss
Re: [RFC PATCH v1 00/25] printk: new implementation
On (03/12/19 13:38), Petr Mladek wrote:
> > Hmm. OK. So one of the things with printk is that it's fully sequential.
> > We call console drivers one by one. Slow consoles can affect what appears
> > on the fast consoles; fast consoles have no impact on slow ones.
> >
> > 	call_console_drivers()
> > 		for_each_console(c)
> > 			c->write(c, text, text_len);
> >
> > So a list of (slow_serial serial netcon) console drivers is a camel train;
> > fast netcon is not fast anymore, and slow consoles sometimes are the reason
> > we have dropped messages. And if we drop messages we drop them for all
> > consoles, including fast netcon. Turning that sequential pipeline into a
> > bunch of per-console kthreads/irq and letting fast consoles be fast is
> > not a completely bad thing. Let's think more about this, I'd like to read
> > more opinions.
>
> Per-console kthread sounds interesting but there is the problem with
> reliability. I mean that kthread need not get scheduled.

Correct, it has to get scheduled. From that point of view IRQ offloading
looks better - either to irq_work (like John suggested) or to serial
drivers' irq handler (poll uart xmit + logbuf).

kthread offloading is not super reliable. That's why I played tricks with
CPU affinity - the scheduler sometimes schedules printk_kthread on the
same CPU which spins in the console_unlock() loop printing the messages,
so printk_kthread offloading never happens. It was first discovered by
Jan Kara (back in the days of the async-printk patch set). I think at
some point Jan's async-printk patch set had two printk kthreads. We also
had some concerns regarding offloading on UP systems.

> Some of these problems might get solved by the per-console loglevel
> patchset.

Yes, some.

> Sigh, any feature might be useful in some situation. But we always
> have to consider the cost and the gain. I wonder how common is
> to actively use two consoles at the same time and what would
> be the motivation.

The Facebook fleet, for example. The motivation is to have a fancy fast
console that does things which simple serial consoles cannot do, and a
slow serial console, which is sometimes more reliable, as a last resort.
Fancy stuff usually means dependencies - net, mm, etc. So when the fancy
console stops working, the slow serial console still does.

	-ss
[PATCH] mm/hotplug: fix offline undo_isolate_page_range()
The commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
memory to zones until online") introduced move_pfn_range_to_zone() which
calls memmap_init_zone() while onlining a memory block.
memmap_init_zone() resets the pagetype flags and makes the migrate type
MOVABLE.

However, __offline_pages() also calls undo_isolate_page_range() after
offline_isolated_pages() to do the same thing. Because commit
2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
__first_valid_page() to skip offline pages, undo_isolate_page_range()
here just wastes CPU cycles looping over the offlined PFN range while
doing nothing: __first_valid_page() will return NULL, since
offline_isolated_pages() has already marked all memory sections within
the pfn range as offline via offline_mem_sections().

Also, after calling the "useless" undo_isolate_page_range() here, we
reach the point of no return by notifying MEM_OFFLINE. Those pages will
be marked MIGRATE_MOVABLE again once they are onlined. The only thing
left to do is to decrease the zone's isolated-pageblock counter, which
would otherwise leave some page allocation paths slower, as the commit
above described.

Fix an incorrect comment along the way.

Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
Signed-off-by: Qian Cai
---
 mm/memory_hotplug.c | 18 ++++++++++++++++--
 mm/sparse.c         |  2 +-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index cd23c081924d..260a8e943483 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1661,8 +1661,22 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
 	offline_isolated_pages(start_pfn, end_pfn);
-	/* reset pagetype flags and makes migrate type to be MOVABLE */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+
+	/*
+	 * Onlining will reset pagetype flags and makes migrate type
+	 * MOVABLE, so just need to decrease the number of isolated
+	 * pageblocks zone counter here.
+	 */
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		int i;
+
+		for (i = 0; i < pageblock_nr_pages; i++)
+			if (pfn_valid_within(pfn + i)) {
+				zone->nr_isolate_pageblock--;
+				break;
+			}
+	}
+
 	/* removal success */
 	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
 	zone->present_pages -= offlined_pages;
diff --git a/mm/sparse.c b/mm/sparse.c
index 69904aa6165b..56e057c432f9 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -567,7 +567,7 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-/* Mark all memory sections within the pfn range as online */
+/* Mark all memory sections within the pfn range as offline */
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
-- 
2.17.2 (Apple Git-113)
Re: [PATCH 00/10] HMM updates for 5.1
Andrew, you will not be pushing this patchset in 5.1?

Cheers,
Jérôme
Re: [PATCH 4/4] MIPS: Loongson32: dts: add ls1b & ls1c
Hi Rob,

Thanks for your reply, I have some questions on that:

On 2019/3/12 8:28 PM, Rob Herring wrote:

On Tue, Mar 12, 2019 at 4:16 AM Jiaxun Yang wrote:

Add devicetree skeleton for ls1b and ls1c

Signed-off-by: Jiaxun Yang
---

+/ {
+	model = "Loongson LS1B";
+	compatible = "loongson,ls1b";

Documented?

Should I document the vendor string or whole "loongson,ls1b"?

+
+};
+
+&ehci0 {
+	status = "okay";
+};
+
+&ohci0 {
+	status = "okay";
+};
\ No newline at end of file

Fix this.

diff --git a/arch/mips/boot/dts/loongson/ls1c.dts b/arch/mips/boot/dts/loongson/ls1c.dts
new file mode 100644
index ..778d205a586e
--- /dev/null
+++ b/arch/mips/boot/dts/loongson/ls1c.dts
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Jiaxun Yang
+ */
+
+/dts-v1/;
+#include
+
+/ {
+	model = "Loongson LS1C300A";
+	compatible = "loongson,ls1c300a";
+
+};
+
+&platintc4 {
+	status = "okay";
+};
+
+&ehci0 {
+	status = "okay";
+};
+
+&ohci0 {
+	status = "okay";
+};
\ No newline at end of file

diff --git a/arch/mips/boot/dts/loongson/ls1x.dtsi b/arch/mips/boot/dts/loongson/ls1x.dtsi
new file mode 100644
index ..f808e4328fd8
--- /dev/null
+++ b/arch/mips/boot/dts/loongson/ls1x.dtsi
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Jiaxun Yang
+ */
+
+/dts-v1/;
+#include
+
+/ {
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			device_type = "cpu";
+			reg = <0>;

Needs a (documented) compatible string.

+		};
+
+	ehci0: usb@1fe2 {
+		compatible = "generic-ehci";

It would be better to add a chip specific compatible here. Most all USB
controllers have some quirks.

Should it be documented?

+		reg = <0x1fe2 0x100>;
+		interrupt-parent = <&platintc1>;
+		interrupts = <0 IRQ_TYPE_LEVEL_HIGH>;
+
+		status = "disabled";
+	};
+
+	ohci0: usb@1fe28000 {
+		compatible = "generic-ohci";
+		reg = <0x1fe28000 0x100>;
+		interrupt-parent = <&platintc1>;
+		interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
+
+		status = "disabled";
+	};

Don't you need a serial port or something for a console?

The serial port is currently added by legacy pdev code. I'm going to add
it to the devicetree after the rework on the clk driver being sent out.

Thanks.

--
Jiaxun Yang
Re: [PATCH 09/10] mm/hmm: allow to mirror vma of a file on a DAX backed filesystem
On Tue, Mar 12, 2019 at 1:34 PM Dave Chinner wrote: > > On Tue, Mar 12, 2019 at 12:30:52PM -0700, Dan Williams wrote: > > On Tue, Mar 12, 2019 at 12:06 PM Jerome Glisse wrote: > > > On Tue, Mar 12, 2019 at 09:06:12AM -0700, Dan Williams wrote: > > > > On Tue, Mar 12, 2019 at 8:26 AM Jerome Glisse > > > > wrote: > > [..] > > > > > Spirit of the rule is better than blind application of rule. > > > > > > > > Again, I fail to see why HMM is suddenly unable to make forward > > > > progress when the infrastructure that came before it was merged with > > > > consumers in the same development cycle. > > > > > > > > A gate to upstream merge is about the only lever a reviewer has to > > > > push for change, and these requests to uncouple the consumer only > > > > serve to weaken that review tool in my mind. > > > > > > Well let just agree to disagree and leave it at that and stop > > > wasting each other time > > > > I'm fine to continue this discussion if you are. Please be specific > > about where we disagree and what aspect of the proposed rules about > > merge staging are either acceptable, painful-but-doable, or > > show-stoppers. Do you agree that HMM is doing something novel with > > merge staging, am I off base there? I expect I can find folks that > > would balk with even a one cycle deferment of consumers, but can we > > start with that concession and see how it goes? I'm missing where I've > > proposed something that is untenable for the future of HMM which is > > addressing some real needs in gaps in the kernel's support for new > > hardware. > > /me quietly wonders why the hmm infrastructure can't be staged in a > maintainer tree development branch on a kernel.org and then > all merged in one go when that branch has both infrastructure and > drivers merged into it... > > i.e. everyone doing hmm driver work gets the infrastructure from the > dev tree, not mainline. 
> That's a pretty standard procedure for
> developing complex features, and it avoids all the issues being
> argued over right now...

True, but I wasn't considering that because the mm tree does not do
stable topic branches. This kind of staging seems not amenable to a
quilt workflow and it needs to keep pace with the rest of mm.
Re: [PATCH 09/10] mm/hmm: allow to mirror vma of a file on a DAX backed filesystem
On Tue, Mar 12, 2019 at 05:46:51PM -0700, Dan Williams wrote: > On Tue, Mar 12, 2019 at 5:10 PM Jerome Glisse wrote: > > > > On Tue, Mar 12, 2019 at 02:52:14PM -0700, Andrew Morton wrote: > > > On Tue, 12 Mar 2019 12:30:52 -0700 Dan Williams > > > wrote: > > > > > > > On Tue, Mar 12, 2019 at 12:06 PM Jerome Glisse > > > > wrote: > > > > > On Tue, Mar 12, 2019 at 09:06:12AM -0700, Dan Williams wrote: > > > > > > On Tue, Mar 12, 2019 at 8:26 AM Jerome Glisse > > > > > > wrote: > > > > [..] > > > > > > > Spirit of the rule is better than blind application of rule. > > > > > > > > > > > > Again, I fail to see why HMM is suddenly unable to make forward > > > > > > progress when the infrastructure that came before it was merged with > > > > > > consumers in the same development cycle. > > > > > > > > > > > > A gate to upstream merge is about the only lever a reviewer has to > > > > > > push for change, and these requests to uncouple the consumer only > > > > > > serve to weaken that review tool in my mind. > > > > > > > > > > Well let just agree to disagree and leave it at that and stop > > > > > wasting each other time > > > > > > > > I'm fine to continue this discussion if you are. Please be specific > > > > about where we disagree and what aspect of the proposed rules about > > > > merge staging are either acceptable, painful-but-doable, or > > > > show-stoppers. Do you agree that HMM is doing something novel with > > > > merge staging, am I off base there? > > > > > > You're correct. We chose to go this way because the HMM code is so > > > large and all-over-the-place that developing it in a standalone tree > > > seemed impractical - better to feed it into mainline piecewise. > > > > > > This decision very much assumed that HMM users would definitely be > > > merged, and that it would happen soon. 
I was skeptical for a long time > > > and was eventually persuaded by quite a few conversations with various > > > architecture and driver maintainers indicating that these HMM users > > > would be forthcoming. > > > > > > In retrospect, the arrival of HMM clients took quite a lot longer than > > > was anticipated and I'm not sure that all of the anticipated usage > > > sites will actually be using it. I wish I'd kept records of > > > who-said-what, but I didn't and the info is now all rather dissipated. > > > > > > So the plan didn't really work out as hoped. Lesson learned, I would > > > now very much prefer that new HMM feature work's changelogs include > > > links to the driver patchsets which will be using those features and > > > acks and review input from the developers of those driver patchsets. > > > > This is what i am doing now and this patchset falls into that. I did > > post the ODP and nouveau bits to use the 2 new functions (dma map and > > unmap). I expect to merge both ODP and nouveau bits for that during > > the next merge window. > > > > Also with 5.1 everything that is upstream is use by nouveau at least. > > They are posted patches to use HMM for AMD, Intel, Radeon, ODP, PPC. > > Some are going through several revisions so i do not know exactly when > > each will make it upstream but i keep working on all this. > > > > So the guideline we agree on: > > - no new infrastructure without user > > - device driver maintainer for which new infrastructure is done > > must either sign off or review of explicitly say that they want > > the feature I do not expect all driver maintainer will have > > the bandwidth to do proper review of the mm part of the infra- > > structure and it would not be fair to ask that from them. They > > can still provide feedback on the API expose to the device > > driver. 
> > - driver bits must be posted at the same time as the new infra- > > structure even if they target the next release cycle to avoid > > inter-tree dependency > > - driver bits must be merge as soon as possible > > What about EXPORT_SYMBOL_GPL? I explained why i do not see value in changing export, but i will not oppose that change either. > > Thing we do not agree on: > > - If driver bits miss for any reason the +1 target directly > > revert the new infra-structure. I think it should not be black > > and white and the reasons why the driver bit missed the merge > > window should be taken into account. If the feature is still > > wanted and the driver bits missed the window for simple reasons > > then it means that we push everything by 2 release ie the > > revert is done in +1 then we reupload the infra-structure in > > +2 and finaly repush the driver bit in +3 so we loose 1 cycle. > > I think that pain is reasonable. > > > Hence why i would rather that the revert would only happen if > > it is clear that the infrastructure is not ready or can not > > be use in timely (over couple kernel release) fashion by any > > drivers. > > This seems too generous to me,
Re: [PATCH 09/10] mm/hmm: allow to mirror vma of a file on a DAX backed filesystem
On Tue, Mar 12, 2019 at 5:10 PM Jerome Glisse wrote: > > On Tue, Mar 12, 2019 at 02:52:14PM -0700, Andrew Morton wrote: > > On Tue, 12 Mar 2019 12:30:52 -0700 Dan Williams > > wrote: > > > > > On Tue, Mar 12, 2019 at 12:06 PM Jerome Glisse wrote: > > > > On Tue, Mar 12, 2019 at 09:06:12AM -0700, Dan Williams wrote: > > > > > On Tue, Mar 12, 2019 at 8:26 AM Jerome Glisse > > > > > wrote: > > > [..] > > > > > > Spirit of the rule is better than blind application of rule. > > > > > > > > > > Again, I fail to see why HMM is suddenly unable to make forward > > > > > progress when the infrastructure that came before it was merged with > > > > > consumers in the same development cycle. > > > > > > > > > > A gate to upstream merge is about the only lever a reviewer has to > > > > > push for change, and these requests to uncouple the consumer only > > > > > serve to weaken that review tool in my mind. > > > > > > > > Well let just agree to disagree and leave it at that and stop > > > > wasting each other time > > > > > > I'm fine to continue this discussion if you are. Please be specific > > > about where we disagree and what aspect of the proposed rules about > > > merge staging are either acceptable, painful-but-doable, or > > > show-stoppers. Do you agree that HMM is doing something novel with > > > merge staging, am I off base there? > > > > You're correct. We chose to go this way because the HMM code is so > > large and all-over-the-place that developing it in a standalone tree > > seemed impractical - better to feed it into mainline piecewise. > > > > This decision very much assumed that HMM users would definitely be > > merged, and that it would happen soon. I was skeptical for a long time > > and was eventually persuaded by quite a few conversations with various > > architecture and driver maintainers indicating that these HMM users > > would be forthcoming. 
> > > > In retrospect, the arrival of HMM clients took quite a lot longer than > > was anticipated and I'm not sure that all of the anticipated usage > > sites will actually be using it. I wish I'd kept records of > > who-said-what, but I didn't and the info is now all rather dissipated. > > > > So the plan didn't really work out as hoped. Lesson learned, I would > > now very much prefer that new HMM feature work's changelogs include > > links to the driver patchsets which will be using those features and > > acks and review input from the developers of those driver patchsets. > > This is what i am doing now and this patchset falls into that. I did > post the ODP and nouveau bits to use the 2 new functions (dma map and > unmap). I expect to merge both ODP and nouveau bits for that during > the next merge window. > > Also with 5.1 everything that is upstream is use by nouveau at least. > They are posted patches to use HMM for AMD, Intel, Radeon, ODP, PPC. > Some are going through several revisions so i do not know exactly when > each will make it upstream but i keep working on all this. > > So the guideline we agree on: > - no new infrastructure without user > - device driver maintainer for which new infrastructure is done > must either sign off or review of explicitly say that they want > the feature I do not expect all driver maintainer will have > the bandwidth to do proper review of the mm part of the infra- > structure and it would not be fair to ask that from them. They > can still provide feedback on the API expose to the device > driver. > - driver bits must be posted at the same time as the new infra- > structure even if they target the next release cycle to avoid > inter-tree dependency > - driver bits must be merge as soon as possible What about EXPORT_SYMBOL_GPL? > > Thing we do not agree on: > - If driver bits miss for any reason the +1 target directly > revert the new infra-structure. 
> I think it should not be black
> and white and the reasons why the driver bit missed the merge
> window should be taken into account. If the feature is still
> wanted and the driver bits missed the window for simple reasons
> then it means that we push everything by 2 release ie the
> revert is done in +1 then we reupload the infra-structure in
> +2 and finaly repush the driver bit in +3 so we loose 1 cycle.

I think that pain is reasonable.

> Hence why i would rather that the revert would only happen if
> it is clear that the infrastructure is not ready or can not
> be use in timely (over couple kernel release) fashion by any
> drivers.

This seems too generous to me, but in the interest of moving this
discussion forward let's cross that bridge if/when it happens. Hopefully
the threat of this debate recurring means consumers put in the due
diligence to get things merged at infrastructure + 1 time.
Re: [PATCH v3 1/1] mm: introduce put_user_page*(), placeholder versions
On 3/12/19 8:30 AM, Ira Weiny wrote:

On Wed, Mar 06, 2019 at 03:54:55PM -0800, john.hubb...@gmail.com wrote:

From: John Hubbard

Introduces put_user_page(), which simply calls put_page(). This
provides a way to update all get_user_pages*() callers, so that they
call put_user_page(), instead of put_page().

So I've been running with these patches for a while but today while
ramping up my testing I hit the following:

[ 1355.557819] [ cut here ]
[ 1355.563436] get_user_pages pin count overflowed

Hi Ira,

Thanks for reporting this. That overflow, at face value, means that
we've used more than the 22 bits worth of gup pin counts, so about 4
million pins of the same page...

[ 1355.563446] WARNING: CPU: 1 PID: 1740 at mm/gup.c:73 get_gup_pin_page+0xa5/0xb0
[ 1355.577391] Modules linked in: ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ext4 mbcache jbd2 mlx4_ib opa_vnic rpcrdma sunrpc rdma_ucm ib_iser rdma_cm ib_umad iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass snd_hda_codec_realtek ib_uverbs snd_hda_codec_generic crct10dif_pclmul ledtrig_audio snd_hda_intel crc32_pclmul snd_hda_codec snd_hda_core ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel crypto_simd snd_timer ib_core cryptd snd glue_helper dax_pmem soundcore nd_pmem ipmi_si device_dax nd_btt ioatdma nd_e820 ipmi_devintf ipmi_msghandler iTCO_wdt i2c_i801 iTCO_vendor_support libnvdimm pcspkr lpc_ich mei_me mei mfd_core wmi pcc_cpufreq acpi_cpufreq sch_fq_codel xfs libcrc32c mlx4_en sr_mod cdrom sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx4_core ttm crc32c_intel igb isci ahci dca libsas firewire_ohci drm i2c_algo_bit libahci scsi_transport_sas
[ 1355.577429]  firewire_core crc_itu_t i2c_core libata dm_mod [last unloaded: rdmavt]
[ 1355.686703] CPU: 1 PID: 1740 Comm: reg-mr Not tainted 5.0.0+ #10
[ 1355.693851] Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.02.04.0003.1023201411 38 10/23/2014
[ 1355.705750] RIP: 0010:get_gup_pin_page+0xa5/0xb0
[ 1355.711348] Code: e8 40 02 ff ff 80 3d ba a2 fb 00 00 b8 b5 ff ff ff 75 bb 48 c7 c7 48 0a e9 81 89 44 24 04 c6 05 a1 a2 fb 00 01 e8 35 63 e8 ff <0f> 0b 8b 44 24 04 eb 9c 0f 1f 00 66 66 66 66 90 41 57 49 bf 00 00
[ 1355.733244] RSP: 0018:c90005a23b30 EFLAGS: 00010286
[ 1355.739536] RAX: RBX: ea001422 RCX:
[ 1355.748005] RDX: 0003 RSI: 827d94a3 RDI: 0246
[ 1355.756453] RBP: ea001422 R08: 0002 R09: 00022400
[ 1355.764907] R10: 0009ccf0ad0c4203 R11: 0001 R12: 00010207
[ 1355.773369] R13: 8884130b7040 R14: fff00fff R15: 000fffe0
[ 1355.781836] FS: 7f2680d0d740() GS:88842e84() knlGS:
[ 1355.791384] CS: 0010 DS: ES: CR0: 80050033
[ 1355.798319] CR2: 00589000 CR3: 00040b05e004 CR4: 000606e0
[ 1355.806809] Call Trace:
[ 1355.810078]  follow_page_pte+0x4f3/0x5c0
[ 1355.814987]  __get_user_pages+0x1eb/0x730
[ 1355.820020]  get_user_pages+0x3e/0x50
[ 1355.824657]  ib_umem_get+0x283/0x500 [ib_uverbs]
[ 1355.830340]  ? _cond_resched+0x15/0x30
[ 1355.835065]  mlx4_ib_reg_user_mr+0x75/0x1e0 [mlx4_ib]
[ 1355.841235]  ib_uverbs_reg_mr+0x10c/0x220 [ib_uverbs]
[ 1355.847400]  ib_uverbs_write+0x2f9/0x4d0 [ib_uverbs]
[ 1355.853473]  __vfs_write+0x36/0x1b0
[ 1355.857904]  ? selinux_file_permission+0xf0/0x130
[ 1355.863702]  ? security_file_permission+0x2e/0xe0
[ 1355.869503]  vfs_write+0xa5/0x1a0
[ 1355.873751]  ksys_write+0x4f/0xb0
[ 1355.878009]  do_syscall_64+0x5b/0x180
[ 1355.882656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1355.62] RIP: 0033:0x7f2680ec3ed8
[ 1355.893420] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 45 78 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 1355.915573] RSP: 002b:7ffe65d50bc8 EFLAGS: 0246 ORIG_RAX: 0001
[ 1355.924621] RAX: ffda RBX: 7ffe65d50c74 RCX: 7f2680ec3ed8
[ 1355.933195] RDX: 0030 RSI: 7ffe65d50c80 RDI: 0003
[ 1355.941760] RBP: 0030 R08: 0007 R09: 00581260
[ 1355.950326] R10: R11: 0246 R12: 00581930
[ 1355.958885] R13: 000c R14: 00581260 R15:
[ 1355.967430] ---[ end trace bc771ac6189977a2 ]---

I'm not sure what I did to do this and I'm going to work on a
reproducer. At the time of the Warning I only had 1 GUP user?!?!?!?!

If there is a get_user_pages() call that lacks a corresponding
put_user_pages() call, then the count could start working its way up,
and up. Either that, or a bug in my patches here, could cause this.

The basic counting works correctly in fio runs on an NVMe drive
Re: [PATCH] arm64: dts: fsl: imx8mq: enable the thermal management unit (TMU)
On Tue, Mar 12, 2019 at 1:19 PM Angus Ainslie wrote:
>
> Hi Andrey,
>
> On 2019-03-11 19:35, Andrey Smirnov wrote:
> > On Mon, Mar 11, 2019 at 2:35 PM Angus Ainslie (Purism) wrote:
> >>
> >> These are the TMU nodes from the NXP vendor kernel
> >>
> >
> > Hey Angus,
> >
> > TMU block supports multiple thermal zones and vendor kernel doesn't
> > really account for that (see below). Latest version of the driver in
> > thermal tree now actually supports that feature (mulit-sensor), so I
> > think the code in DT should reflect that as well. I recently submitted
> > a series adding HWMON integration for TMU
> > (https://lore.kernel.org/lkml/2019000508.26325-1-andrew.smir...@gmail.com/T/#u)
>
> I tried applying those to linux-next. They don't apply very cleanly
> there so I gave up.
>
> > and this is my take on this patch:
> >
> > https://github.com/ndreys/linux/commit/09931e3d60af0a74377307b433db97da1be31570
> >
> > All of the code there is up for grabs, if you feel like using it.
>
> I followed that and I have a version that works with linux-next that
> does not include the GPU and VPU parts.
>
> I also tested a version with the GPU and VPU parts and it "works" but
> creates 2 useless paths in /sys/class/thermal .
>
> Should I wait for your changes to get into linux-next or resubmit a
> version that works with the current one ?

You don't really need my changes for multi sensor support, only the
patch Fabio pointed to. You can send the patch sans GPU and VPU nodes to
get it in sooner, or you can wait until the multi-sensor support patch
trickles down to get it all in in a single patch. All options are fine
by me, so it's up to you and Shawn I'd say.

Thanks,
Andrey Smirnov
Re: [PATCH 09/10] mm/hmm: allow to mirror vma of a file on a DAX backed filesystem
On Tue, Mar 12, 2019 at 02:52:14PM -0700, Andrew Morton wrote: > On Tue, 12 Mar 2019 12:30:52 -0700 Dan Williams > wrote: > > > On Tue, Mar 12, 2019 at 12:06 PM Jerome Glisse wrote: > > > On Tue, Mar 12, 2019 at 09:06:12AM -0700, Dan Williams wrote: > > > > On Tue, Mar 12, 2019 at 8:26 AM Jerome Glisse > > > > wrote: > > [..] > > > > > Spirit of the rule is better than blind application of rule. > > > > > > > > Again, I fail to see why HMM is suddenly unable to make forward > > > > progress when the infrastructure that came before it was merged with > > > > consumers in the same development cycle. > > > > > > > > A gate to upstream merge is about the only lever a reviewer has to > > > > push for change, and these requests to uncouple the consumer only > > > > serve to weaken that review tool in my mind. > > > > > > Well let just agree to disagree and leave it at that and stop > > > wasting each other time > > > > I'm fine to continue this discussion if you are. Please be specific > > about where we disagree and what aspect of the proposed rules about > > merge staging are either acceptable, painful-but-doable, or > > show-stoppers. Do you agree that HMM is doing something novel with > > merge staging, am I off base there? > > You're correct. We chose to go this way because the HMM code is so > large and all-over-the-place that developing it in a standalone tree > seemed impractical - better to feed it into mainline piecewise. > > This decision very much assumed that HMM users would definitely be > merged, and that it would happen soon. I was skeptical for a long time > and was eventually persuaded by quite a few conversations with various > architecture and driver maintainers indicating that these HMM users > would be forthcoming. > > In retrospect, the arrival of HMM clients took quite a lot longer than > was anticipated and I'm not sure that all of the anticipated usage > sites will actually be using it. 
I wish I'd kept records of > who-said-what, but I didn't and the info is now all rather dissipated. > > So the plan didn't really work out as hoped. Lesson learned, I would > now very much prefer that new HMM feature work's changelogs include > links to the driver patchsets which will be using those features and > acks and review input from the developers of those driver patchsets. This is what I am doing now and this patchset falls into that. I did post the ODP and nouveau bits that use the 2 new functions (dma map and unmap). I expect to merge both the ODP and nouveau bits for that during the next merge window. Also, as of 5.1, everything that is upstream is used by nouveau at least. There are posted patches to use HMM for AMD, Intel, Radeon, ODP, PPC. Some are going through several revisions so I do not know exactly when each will make it upstream, but I keep working on all this. So the guidelines we agree on: - no new infrastructure without a user - the device driver maintainer for which new infrastructure is done must either sign off or review it, or explicitly say that they want the feature. I do not expect all driver maintainers will have the bandwidth to do a proper review of the mm part of the infrastructure and it would not be fair to ask that of them. They can still provide feedback on the API exposed to the device driver. - driver bits must be posted at the same time as the new infrastructure, even if they target the next release cycle, to avoid inter-tree dependencies - driver bits must be merged as soon as possible Thing we do not agree on: - If the driver bits miss the +1 target for any reason, directly revert the new infrastructure. I think it should not be black and white, and the reasons why the driver bits missed the merge window should be taken into account.
If the feature is still wanted and the driver bits missed the window for simple reasons, then it means we push everything back by 2 releases: the revert is done in +1, then we re-post the infrastructure in +2 and finally re-push the driver bits in +3, so we lose a cycle. Hence why I would rather the revert only happen if it is clear that the infrastructure is not ready, or cannot be used in a timely fashion (within a couple of kernel releases) by any driver. Cheers, Jérôme
Re: [PATCH] b/arch/x86/mm/pti.c - make local symbols static
> Make both variables static. "pti_set_kernel_image_nonglobal(void)" is an awfully funny looking variable. ;) > Signed-off-by: Valdis Kletnieks > > --- > diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c > index 4fee5c3003ed..139b28a01ce4 100644 > --- a/arch/x86/mm/pti.c > +++ b/arch/x86/mm/pti.c > @@ -77,7 +77,7 @@ static void __init pti_print_if_secure(const char *reason) > pr_info("%s\n", reason); > } > > -enum pti_mode { > +static enum pti_mode { I'm struggling to figure out why we would want to do this. If there's a really good reason, I think we probably need to do it en masse: $ grep -r '^enum.{' arch/x86/ | wc -l 48 > PTI_AUTO = 0, > PTI_FORCE_OFF, > PTI_FORCE_ON > @@ -602,7 +602,7 @@ static void pti_clone_kernel_text(void) > set_memory_global(start, (end_global - start) >> PAGE_SHIFT); > } > > -void pti_set_kernel_image_nonglobal(void) > +static void pti_set_kernel_image_nonglobal(void) Yes, this function should be static.
Re: [PATCH] arch: arm: Kconfig: pedantic formatting
On Tue, 12 Mar 2019, at 00:26, Enrico Weigelt, metux IT consult wrote: > Formatting of Kconfig files doesn't look so pretty, so let the > Great White Handkerchief come around and clean it up. > > Signed-off-by: Enrico Weigelt, metux IT consult > --- > arch/arm/Kconfig | 24 +++--- > arch/arm/mach-aspeed/Kconfig | 10 - > arch/arm/mach-ep93xx/Kconfig | 8 > arch/arm/mach-hisi/Kconfig| 14 ++--- > arch/arm/mach-ixp4xx/Kconfig | 32 ++--- > arch/arm/mach-mmp/Kconfig | 2 +- > arch/arm/mach-omap1/Kconfig | 36 > arch/arm/mach-omap2/Kconfig | 12 +-- > arch/arm/mach-prima2/Kconfig | 6 +++--- > arch/arm/mach-s3c24xx/Kconfig | 32 ++--- > arch/arm/mach-s3c64xx/Kconfig | 16 +++ > arch/arm/mach-sa1100/Kconfig | 4 ++-- > arch/arm/mach-vt8500/Kconfig | 6 +++--- > arch/arm/mach-w90x900/Kconfig | 6 +++--- > arch/arm/mm/Kconfig | 48 > +-- > arch/arm/plat-samsung/Kconfig | 26 +++ > 16 files changed, 140 insertions(+), 142 deletions(-) > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 5085a1e..c89f683 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -1115,14 +1115,14 @@ config ARM_ERRATA_764369 > in the diagnostic control register of the SCU. > > config ARM_ERRATA_775420 > - bool "ARM errata: A data cache maintenance operation which > aborts, might lead to deadlock" > - depends on CPU_V7 > - help > - This option enables the workaround for the 775420 Cortex-A9 (r2p2, > - r2p6,r2p8,r2p10,r3p0) erratum. In case a date cache maintenance > - operation aborts with MMU exception, it might cause the processor > - to deadlock. This workaround puts DSB before executing ISB if > - an abort may occur on cache maintenance. > + bool "ARM errata: A data cache maintenance operation which aborts, > might lead to deadlock" > + depends on CPU_V7 > + help > + This option enables the workaround for the 775420 Cortex-A9 (r2p2, > + r2p6,r2p8,r2p10,r3p0) erratum. In case a date cache maintenance > + operation aborts with MMU exception, it might cause the processor > + to deadlock. 
This workaround puts DSB before executing ISB if > + an abort may occur on cache maintenance. > > config ARM_ERRATA_798181 > bool "ARM errata: TLBI/DSB failure on Cortex-A15" > @@ -1650,12 +1650,12 @@ config HW_PERF_EVENTS > depends on ARM_PMU > > config SYS_SUPPORTS_HUGETLBFS > - def_bool y > - depends on ARM_LPAE > + def_bool y > + depends on ARM_LPAE > > config HAVE_ARCH_TRANSPARENT_HUGEPAGE > - def_bool y > - depends on ARM_LPAE > + def_bool y > + depends on ARM_LPAE > > config ARCH_WANT_GENERAL_HUGETLB > def_bool y > diff --git a/arch/arm/mach-aspeed/Kconfig b/arch/arm/mach-aspeed/Kconfig > index 2d5570e..f6eaf05 100644 > --- a/arch/arm/mach-aspeed/Kconfig > +++ b/arch/arm/mach-aspeed/Kconfig > @@ -18,9 +18,9 @@ config MACH_ASPEED_G4 > select CPU_ARM926T > select PINCTRL_ASPEED_G4 > help > - Say yes if you intend to run on an Aspeed ast2400 or similar > - fourth generation BMCs, such as those used by OpenPower Power8 > - systems. > + Say yes if you intend to run on an Aspeed ast2400 or similar > + fourth generation BMCs, such as those used by OpenPower Power8 > + systems. > > config MACH_ASPEED_G5 > bool "Aspeed SoC 5th Generation" > @@ -28,7 +28,7 @@ config MACH_ASPEED_G5 > select CPU_V6 > select PINCTRL_ASPEED_G5 > help > - Say yes if you intend to run on an Aspeed ast2500 or similar > - fifth generation Aspeed BMCs. > + Say yes if you intend to run on an Aspeed ast2500 or similar > + fifth generation Aspeed BMCs. For the ASPEED bits: Acked-by: Andrew Jeffery
Re: [GIT PULL] UBI/UBIFS updates for 5.1-rc1
On Tue, Mar 12, 2019 at 3:18 PM Linus Torvalds wrote: > > On Tue, Mar 12, 2019 at 8:13 AM Richard Weinberger wrote: > > > > git://git.infradead.org/linux-ubifs.git tags/upstream-5.1-rc1 > > Pulling this thing is taking forever for me. I can _ping_ the site, > but the "git pull" has been hanging for a while. Ok, after leaving it for an hour and a half I just gave up. It's not worth it. Put your repos on some other host, and re-send the pull requests, Linus
Re: [PATCH v3 1/1] mm: introduce put_user_page*(), placeholder versions
On Wed, Mar 06, 2019 at 03:54:55PM -0800, john.hubb...@gmail.com wrote: > From: John Hubbard > > Introduces put_user_page(), which simply calls put_page(). > This provides a way to update all get_user_pages*() callers, > so that they call put_user_page(), instead of put_page(). So I've been running with these patches for a while but today while ramping up my testing I hit the following: [ 1355.557819] [ cut here ] [ 1355.563436] get_user_pages pin count overflowed [ 1355.563446] WARNING: CPU: 1 PID: 1740 at mm/gup.c:73 get_gup_pin_page+0xa5/0xb0 [ 1355.577391] Modules linked in: ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ext4 mbcache jbd2 mlx4_ib opa_vnic rpcrdma sunrpc rdma_ucm ib_iser rdma_cm ib_umad iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass snd_hda_codec_realtek ib_uverbs snd_hda_codec_generic crct10dif_pclmul ledtrig_audio snd_hda_intel crc32_pclmul snd_hda_codec snd_hda_core ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel crypto_simd snd_timer ib_core cryptd snd glue_helper dax_pmem soundcore nd_pmem ipmi_si device_dax nd_btt ioatdma nd_e820 ipmi_devintf ipmi_msghandler iTCO_wdt i2c_i801 iTCO_vendor_support libnvdimm pcspkr lpc_ich mei_me mei mfd_core wmi pcc_cpufreq acpi_cpufreq sch_fq_codel xfs libcrc32c mlx4_en sr_mod cdrom sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx4_core ttm crc32c_intel igb isci ahci dca libsas firewire_ohci drm i2c_algo_bit libahci scsi_transport_sas [ 1355.577429] firewire_core crc_itu_t i2c_core libata dm_mod [last unloaded: rdmavt] [ 1355.686703] CPU: 1 PID: 1740 Comm: reg-mr Not tainted 5.0.0+ #10 [ 1355.693851] Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014 [ 1355.705750] RIP: 0010:get_gup_pin_page+0xa5/0xb0 [ 1355.711348] Code: e8 40 02 ff ff 80 3d ba a2 fb 00 00 b8 b5 ff ff ff 75 bb 48 c7 c7 48 0a e9 81 89 44 24 04 c6
05 a1 a2 fb 00 01 e8 35 63 e8 ff <0f> 0b 8b 44 24 04 eb 9c 0f 1f 00 66 66 66 66 90 41 57 49 bf 00 00 [ 1355.733244] RSP: 0018:c90005a23b30 EFLAGS: 00010286 [ 1355.739536] RAX: RBX: ea001422 RCX: [ 1355.748005] RDX: 0003 RSI: 827d94a3 RDI: 0246 [ 1355.756453] RBP: ea001422 R08: 0002 R09: 00022400 [ 1355.764907] R10: 0009ccf0ad0c4203 R11: 0001 R12: 00010207 [ 1355.773369] R13: 8884130b7040 R14: fff00fff R15: 000fffe0 [ 1355.781836] FS: 7f2680d0d740() GS:88842e84() knlGS: [ 1355.791384] CS: 0010 DS: ES: CR0: 80050033 [ 1355.798319] CR2: 00589000 CR3: 00040b05e004 CR4: 000606e0 [ 1355.806809] Call Trace: [ 1355.810078] follow_page_pte+0x4f3/0x5c0 [ 1355.814987] __get_user_pages+0x1eb/0x730 [ 1355.820020] get_user_pages+0x3e/0x50 [ 1355.824657] ib_umem_get+0x283/0x500 [ib_uverbs] [ 1355.830340] ? _cond_resched+0x15/0x30 [ 1355.835065] mlx4_ib_reg_user_mr+0x75/0x1e0 [mlx4_ib] [ 1355.841235] ib_uverbs_reg_mr+0x10c/0x220 [ib_uverbs] [ 1355.847400] ib_uverbs_write+0x2f9/0x4d0 [ib_uverbs] [ 1355.853473] __vfs_write+0x36/0x1b0 [ 1355.857904] ? selinux_file_permission+0xf0/0x130 [ 1355.863702] ? security_file_permission+0x2e/0xe0 [ 1355.869503] vfs_write+0xa5/0x1a0 [ 1355.873751] ksys_write+0x4f/0xb0 [ 1355.878009] do_syscall_64+0x5b/0x180 [ 1355.882656] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1355.62] RIP: 0033:0x7f2680ec3ed8 [ 1355.893420] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 45 78 0 d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55 [ 1355.915573] RSP: 002b:7ffe65d50bc8 EFLAGS: 0246 ORIG_RAX: 0001 [ 1355.924621] RAX: ffda RBX: 7ffe65d50c74 RCX: 7f2680ec3ed8 [ 1355.933195] RDX: 0030 RSI: 7ffe65d50c80 RDI: 0003 [ 1355.941760] RBP: 0030 R08: 0007 R09: 00581260 [ 1355.950326] R10: R11: 0246 R12: 00581930 [ 1355.958885] R13: 000c R14: 00581260 R15: [ 1355.967430] ---[ end trace bc771ac6189977a2 ]--- I'm not sure what I did to do this and I'm going to work on a reproducer. 
At the time of the Warning I only had 1 GUP user?!?!?!?! I'm not using ODP, so I don't think the changes we have discussed there are a problem. Ira > > Also introduces put_user_pages(), and a few dirty/locked variations, > as a replacement for release_pages(), and also as a replacement > for open-coded loops that release multiple pages. > These may be used for subsequent performance improvements, > via batching of pages to be released. > > This is the first step of fixing a problem (also described in [1] and > [2]) with intera
Re: [PATCH] arm64: dts: sdm845: Include the interconnect resources DT header
On Mon, Mar 11, 2019 at 7:06 AM Georgi Djakov wrote: > > Include the device tree header for the on-chip interconnect endpoint > resources on sdm845 devices. This will allow using the "interconnects" > property in DT nodes to describe the interconnect path resources they use. > > The sdm845 interconnect provider DT node is already present, but the > header file with the resources is not included, so let's fix this. > > Signed-off-by: Georgi Djakov Reviewed-by: Evan Green
Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions
On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote: > On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote: > > IMHO I don't think that the copy_file_range() is going to carry us through > > the > > next wave of user performance requirements. RDMA, while the first, is not > > the > > only technology which is looking to have direct access to files. XDP is > > another.[1] > > Sure, all I doing here was demonstrating that people have been > trying to get local direct access to file mappings to DMA directly > into them for a long time. Direct Io games like these are now > largely unnecessary because we now have much better APIs to do > zero-copy data transfer between files (which can do hardware offload > if it is available!). > > It's the long term pins that RDMA does that are the problem here. > I'm asssuming that for XDP, you're talking about userspace zero copy > from files to the network hardware and vice versa? transmit is > simple (read-only mapping), but receive probably requires bpf > programs to ensure that data (minus headers) in the incoming packet > stream is correctly placed into the UMEM region? Yes, exactly. > > XDP receive seems pretty much like the same problem as RDMA writes > into the file. i.e. the incoming write DMAs are going to have to > trigger page faults if the UMEM is a long term pin so the filesystem > behaves correctly with this remote data placement. I'd suggest that > RDMA, XDP and anything other hardware that is going to pin > file-backed mappings for the long term need to use the same "inform > the fs of a write operation into it's mapping" mechanisms... Yes agreed. I have a hack patch I'm testing right now which allows the user to take a LAYOUT lease from user space and GUP triggers on that, either allowing or rejecting the pin based on the lease. I think this is the first step of what Jan suggested.[1] There is a lot more detail to work out with what happens if that lease needs to be broken. 
> > And if we start talking about wanting to do peer-to-peer DMA from > network/GPU device to storage device without going through a > file-backed CPU mapping, we still need to have the filesystem > involved to translate file offsets to storage locations the > filesystem has allocated for the data and to lock them down for as > long as the peer-to-peer DMA offload is in place. In effect, this > is the same problem as RDMA+FS-DAXs - the filesystem owns the file > offset to storage location mapping and manages storage access > arbitration, not the mm/vma mapping presented to userspace I've only daydreamed about peer-to-peer transfers. But yes, I think this is the direction we need to go. The details of doing GPU -> RDMA -> {network} -> RDMA -> FS DAX and back again... without CPU/OS involvement are only a twinkle in my eye... If that. Ira [1] https://lore.kernel.org/lkml/20190212160707.ga19...@quack2.suse.cz/
[GIT PULL] SELinux fixes for v5.1 (#1)
Hi Linus, Two small fixes for SELinux in v5.1: one adds a buffer length check to the SELinux SCTP code, the other ensures that the SELinux labeling for a NFS mount is not disabled if the filesystem is mounted twice. Please pull for v5.1-rc1. Thanks, -Paul -- The following changes since commit 45189a1998e00f6375ebd49d1e18161acddd73de: selinux: fix avc audit messages (2019-02-05 12:34:33 -0500) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git tags/selinux-pr-20190312 for you to fetch changes up to 3815a245b50124f0865415dcb606a034e97494d4: security/selinux: fix SECURITY_LSM_NATIVE_LABELS on reused superblock (2019-03-11 16:13:17 -0400) selinux/stable-5.1 PR 20190312 J. Bruce Fields (1): security/selinux: fix SECURITY_LSM_NATIVE_LABELS on reused superblock Xin Long (1): selinux: add the missing walk_size + len check in selinux_sctp_bind_connect security/selinux/hooks.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) -- paul moore www.paul-moore.com
[PATCH] Bluetooth: btqca: Fix misspelling of 'baudrate'
Rename the misspelled struct 'qca_bardrate' to 'qca_baudrate' Signed-off-by: Matthias Kaehlcke --- drivers/bluetooth/btqca.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/bluetooth/btqca.h b/drivers/bluetooth/btqca.h index c72c56ea7480..6fdc25d7bba7 100644 --- a/drivers/bluetooth/btqca.h +++ b/drivers/bluetooth/btqca.h @@ -41,7 +41,7 @@ #define QCA_WCN3990_POWERON_PULSE 0xFC #define QCA_WCN3990_POWEROFF_PULSE 0xC0 -enum qca_bardrate { +enum qca_baudrate { QCA_BAUDRATE_115200 = 0, QCA_BAUDRATE_57600, QCA_BAUDRATE_38400, -- 2.21.0.360.g471c308f928-goog
Re: [PATCH 4.9 00/96] 4.9.163-stable review
stable-rc/linux-4.9.y boot: 92 boots: 1 failed, 91 passed (v4.9.162-97-g605129cbbd38) Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.9.y/kernel/v4.9.162-97-g605129cbbd38/ Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.9.y/kernel/v4.9.162-97-g605129cbbd38/ Tree: stable-rc Branch: linux-4.9.y Git Describe: v4.9.162-97-g605129cbbd38 Git Commit: 605129cbbd389949c754d0a23ce49030adae8f17 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 47 unique boards, 21 SoC families, 13 builds out of 193 Boot Regressions Detected: arm: qcom_defconfig: gcc-7: qcom-apq8064-cm-qs600: lab-baylibre-seattle: new failure (last pass: v4.9.162-94-g0384d1b03fc9) Boot Failure Detected: arm: qcom_defconfig: gcc-7: qcom-apq8064-cm-qs600: 1 failed lab --- For more info write to
Re: [RFC][Patch v9 2/6] KVM: Enables the kernel to isolate guest free pages
On Tue, Mar 12, 2019 at 2:53 PM David Hildenbrand wrote: > > On 12.03.19 22:13, Alexander Duyck wrote: > > On Tue, Mar 12, 2019 at 12:46 PM Nitesh Narayan Lal > > wrote: > >> > >> On 3/8/19 4:39 PM, Alexander Duyck wrote: > >>> On Fri, Mar 8, 2019 at 11:39 AM Nitesh Narayan Lal > >>> wrote: > On 3/8/19 2:25 PM, Alexander Duyck wrote: > > On Fri, Mar 8, 2019 at 11:10 AM Nitesh Narayan Lal > > wrote: > >> On 3/8/19 1:06 PM, Alexander Duyck wrote: > >>> On Thu, Mar 7, 2019 at 6:32 PM Michael S. Tsirkin > >>> wrote: > On Thu, Mar 07, 2019 at 02:35:53PM -0800, Alexander Duyck wrote: > > The only other thing I still want to try and see if I can do is to > > add > > a jiffies value to the page private data in the case of the buddy > > pages. > Actually there's one extra thing I think we should do, and that is > make > sure we do not leave less than X% off the free memory at a time. > This way chances of triggering an OOM are lower. > >>> If nothing else we could probably look at doing a watermark of some > >>> sort so we have to have X amount of memory free but not hinted before > >>> we will start providing the hints. It would just be a matter of > >>> tracking how much memory we have hinted on versus the amount of memory > >>> that has been pulled from that pool. > >> This is to avoid false OOM in the guest? > > Partially, though it would still be possible. Basically it would just > > be a way of determining when we have hinted "enough". Basically it > > doesn't do us much good to be hinting on free memory if the guest is > > already constrained and just going to reallocate the memory shortly > > after we hinted on it. The idea is with a watermark we can avoid > > hinting until we start having pages that are actually going to stay > > free for a while. 
> > > >>> It is another reason why we > >>> probably want a bit in the buddy pages somewhere to indicate if a page > >>> has been hinted or not as we can then use that to determine if we have > >>> to account for it in the statistics. > >> The one benefit which I can see of having an explicit bit is that it > >> will help us to have a single hook away from the hot path within buddy > >> merging code (just like your arch_merge_page) and still avoid duplicate > >> hints while releasing pages. > >> > >> I still have to check PG_idle and PG_young which you mentioned but I > >> don't think we can reuse any existing bits. > > Those are bits that are already there for 64b. I think those exist in > > the page extension for 32b systems. If I am not mistaken they are only > > used in VMA mapped memory. What I was getting at is that those are the > > bits we could think about reusing. > > > >> If we really want to have something like a watermark, then can't we use > >> zone->free_pages before isolating to see how many free pages are there > >> and put a threshold on it? (__isolate_free_page() does a similar thing > >> but it does that on per request basis). > > Right. That is only part of it though since that tells you how many > > free pages are there. But how many of those free pages are hinted? > > That is the part we would need to track separately and then then > > compare to free_pages to determine if we need to start hinting on more > > memory or not. > Only pages which are isolated will be hinted, and once a page is > isolated it will not be counted in the zone free pages. > Feel free to correct me if I am wrong. > >>> You are correct up to here. When we isolate the page it isn't counted > >>> against the free pages. However after we complete the hint we end up > >>> taking it out of isolation and returning it to the "free" state, so it > >>> will be counted against the free pages. 
> >>> > If I am understanding it correctly you only want to hint the idle pages, > is that right? > >>> Getting back to the ideas from our earlier discussion, we had 3 stages > >>> for things. Free but not hinted, isolated due to hinting, and free and > >>> hinted. So what we would need to do is identify the size of the first > >>> pool that is free and not hinted by knowing the total number of free > >>> pages, and then subtract the size of the pages that are hinted and > >>> still free. > >> To summarize, for now, I think it makes sense to stick with the current > >> approach as this way we can avoid any locking in the allocation path and > >> reduce the number of hypercalls for a bunch of MAX_ORDER - 1 page. > > > > I'm not sure what you are talking about by "avoid any locking in the > > allocation path". Are you talking about the spin on idle bit, if so > > then yes. However I have been testing your patches and I was correct > > in the assumption that you forgot to handle the zone lock when you >
Re: [PATCH] Makefile: Add '-fno-builtin-bcmp' to CLANG_FLAGS
On Tue, Mar 12, 2019 at 2:53 PM Nathan Chancellor wrote: > > After LLVM revision r355672 [1], all known working kernel configurations > fail to link [2]: > > ld: init/do_mounts.o: in function `prepare_namespace': > do_mounts.c:(.init.text+0x5ca): undefined reference to `bcmp' > ld: do_mounts.c:(.init.text+0x5e6): undefined reference to `bcmp' > ld: init/initramfs.o: in function `do_header': > initramfs.c:(.init.text+0x6e0): undefined reference to `bcmp' > ld: initramfs.c:(.init.text+0x6f8): undefined reference to `bcmp' > ld: arch/x86/kernel/setup.o: in function `setup_arch': > setup.c:(.init.text+0x21d): undefined reference to `bcmp' > > Commit 6edfba1b33c7 ("[PATCH] x86_64: Don't define string functions to > builtin") removed '-ffreestanding' globally and the kernel doesn't > provide a bcmp definition so the linker cannot find a reference to it. > > Fix this by explicitly telling LLVM through Clang not to emit bcmp > references. This flag does not need to be behind 'cc-option' because all > working versions of Clang support this flag. > > [1]: > https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18eedcff13 > [2]: > https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/104027249 > > Link: https://github.com/ClangBuiltLinux/linux/issues/416 > Link: https://bugs.llvm.org/show_bug.cgi?id=41035 > Cc: sta...@vger.kernel.org > Signed-off-by: Nathan Chancellor Thanks for this patch. Can the maintainers please consider this an emergency patch; without it, the recent change to LLVM has caused ALL of our CI targets to go red. 
Reviewed-by: Nick Desaulniers > --- > Makefile | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Makefile b/Makefile > index 9ef547fc7ffe..6645a274b6e3 100644 > --- a/Makefile > +++ b/Makefile > @@ -501,6 +501,7 @@ ifneq ($(GCC_TOOLCHAIN),) > CLANG_FLAGS+= --gcc-toolchain=$(GCC_TOOLCHAIN) > endif > CLANG_FLAGS+= -no-integrated-as > +CLANG_FLAGS+= -fno-builtin-bcmp > KBUILD_CFLAGS += $(CLANG_FLAGS) > KBUILD_AFLAGS += $(CLANG_FLAGS) > export CLANG_FLAGS > -- > 2.21.0 > -- Thanks, ~Nick Desaulniers
Re: [PATCH] drivers/platform/x86/dell-rbtn.c - add missing #include
On Tue, 12 Mar 2019 23:46:11 +0100, Pali Rohar said: > Can you identify in which commit was introduced this problem? If yes, > then Fixes: keyword should be added into commit message. I admit not knowing how long that's been there - I mostly found myself with a large amount of free time, a good supply of caffeine, and I got irritated with how many warnings 'make C=2 W=1' throws, so I decided to start going through and beating things into shape. For those issues where I can easily identify the regression point, I will include a Fixes: as I go.
Re: KASAN: null-ptr-deref Read in reclaim_high
On Tue, Mar 12, 2019 at 09:33:44AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote: > On Tue, Mar 12, 2019 at 7:25 AM Andrew Morton > wrote: > > > > On Tue, 12 Mar 2019 07:08:38 +0100 Dmitry Vyukov wrote: > > > > > On Tue, Mar 12, 2019 at 12:37 AM Andrew Morton > > > wrote: > > > > > > > > On Mon, 11 Mar 2019 06:08:01 -0700 syzbot > > > > wrote: > > > > > > > > > syzbot has bisected this bug to: > > > > > > > > > > commit 29a4b8e275d1f10c51c7891362877ef6cffae9e7 > > > > > Author: Shakeel Butt > > > > > Date: Wed Jan 9 22:02:21 2019 + > > > > > > > > > > memcg: schedule high reclaim for remote memcgs on high_work > > > > > > > > > > bisection log: > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=155bf5db20 > > > > > start commit: 29a4b8e2 memcg: schedule high reclaim for remote > > > > > memcgs on.. > > > > > git tree: linux-next > > > > > final crash: > > > > > https://syzkaller.appspot.com/x/report.txt?x=175bf5db20 > > > > > console output: > > > > > https://syzkaller.appspot.com/x/log.txt?x=135bf5db20 > > > > > kernel config: > > > > > https://syzkaller.appspot.com/x/.config?x=611f89e5b6868db > > > > > dashboard link: > > > > > https://syzkaller.appspot.com/bug?extid=fa11f9da42b46cea3b4a > > > > > userspace arch: amd64 > > > > > syz repro: > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1425901740 > > > > > C reproducer: > > > > > https://syzkaller.appspot.com/x/repro.c?x=141630a0c0 > > > > > > > > > > Reported-by: syzbot+fa11f9da42b46cea3...@syzkaller.appspotmail.com > > > > > Fixes: 29a4b8e2 ("memcg: schedule high reclaim for remote memcgs on > > > > > high_work") > > > > > > > > The following patch > > > > memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work-v3.patch > > > > might have fixed this. Was it applied? > > > > > > Hi Andrew, > > > > > > You mean if the patch was applied during the bisection? > > > No, it wasn't. Bisection is very specifically done on the same tree > > > where the bug was hit. 
There are already too many factors that make > > > the result flaky/wrong/inconclusive without changing the tree state. > > > Now, if syzbot would know about any pending fix for this bug, then it > > > would not do the bisection at all. But it have not seen any patch in > > > upstream/linux-next with the Reported-by tag, nor it received any syz > > > fix commands for this bugs. Should have been it aware of the fix? How? > > > > memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work-v3.patch was > > added to linux-next on Jan 10. I take it that this bug was hit when > > testing the entire linux-next tree, so we can assume that > > memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work-v3.patch > > does not fix it, correct? > > In which case, over to Shakeel! > > Jan 10 is exactly when this bug was reported: > https://groups.google.com/forum/#!msg/syzkaller-bugs/5YkhNUg2PFY/4-B5M7bDCAAJ > https://syzkaller.appspot.com/bug?extid=fa11f9da42b46cea3b4a > > We don't know if that patch fixed the bug or not because nobody tested > the reproducer with that patch. > > It seems that the problem here is that nobody associated the fix with > the bug report. So people looking at open bug reports will spend time > again and again debugging this just to find that this was fixed months > ago. syzbot also doesn't have a chance to realize that this is fixed > and bisection is not necessary anymore. It also won't confirm/disprove > that the fix actually fixes the bug because even if the crash will > continue to happen it will look like the old crash just continues to > happen, so nothing to notify about. > > Associating fixes with bug reports solves all these problems for > humans and bots. > I think syzbot needs to be more aggressive about invalidating old bug reports on linux-next, e.g. automatically invalidate linux-next bugs that no longer occur after a few weeks even if there is a reproducer. Patches get added, changed, and removed in linux-next every day. 
Bugs that syzbot runs into on linux-next are often obvious enough that they get reported by other people too, resulting in bugs being fixed or dropped without people ever seeing the syzbot report. How do you propose that people associate fixes with syzbot reports when they never saw the syzbot report in the first place? This is a problem on mainline too, of course. But we *know* it's a more severe problem on linux-next, and that a bug like this that only ever happened on linux-next and stopped happening 2 months ago, is much less likely to be relevant than a bug in mainline. Kernel developers don't have time to examine every single syzbot report so you need to help them out by reducing the noise. - Eric
Re: [PATCH] drivers/platform/x86/dell-rbtn.c - add missing #include
On Tuesday 12 March 2019 07:26:06 Valdis Klētnieks wrote: > Building with W=1 complains: > CC [M] drivers/platform/x86/dell-rbtn.o > drivers/platform/x86/dell-rbtn.c:345:5: warning: no previous prototype for > 'dell_rbtn_notifier_register' [-Wmissing-prototypes] > 345 | int dell_rbtn_notifier_register(struct notifier_block *nb) > | ^~~ > drivers/platform/x86/dell-rbtn.c:371:5: warning: no previous prototype for > 'dell_rbtn_notifier_unregister' [-Wmissing-prototypes] > 371 | int dell_rbtn_notifier_unregister(struct notifier_block *nb) > | ^ Can you identify the commit in which this problem was introduced? If yes, then a Fixes: tag should be added to the commit message. > > The real problem is a missing include. Add it to keep dell-rbtn.c and .h in > sync > > Signed-off-by: Valdis Kletnieks > > diff --git a/drivers/platform/x86/dell-rbtn.c > b/drivers/platform/x86/dell-rbtn.c > index f3afe778001e..d50ca96d99f0 100644 > --- a/drivers/platform/x86/dell-rbtn.c > +++ b/drivers/platform/x86/dell-rbtn.c > @@ -17,6 +17,7 @@ > #include > #include > #include > +#include "dell-rbtn.h" > > enum rbtn_type { > RBTN_UNKNOWN, > -- Pali Rohár pali.ro...@gmail.com
Re: [PATCH 4.14 000/135] 4.14.106-stable review
stable-rc/linux-4.14.y boot: 105 boots: 1 failed, 103 passed with 1 untried/unknown (v4.14.105-136-gf881675936ef) Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.14.105-136-gf881675936ef/ Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.105-136-gf881675936ef/ Tree: stable-rc Branch: linux-4.14.y Git Describe: v4.14.105-136-gf881675936ef Git Commit: f881675936ef263f178eab5a5cf95bc86089cf20 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 58 unique boards, 23 SoC families, 12 builds out of 197 Boot Failure Detected: arm64: defconfig: gcc-7: rk3399-firefly: 1 failed lab --- For more info write to
[PATCH v2 2/6] mm: prepare to premature release of per-node lruvec_stat_cpu
Similar to the memcg's vmstats_percpu, per-memcg per-node stats consist of percpu and atomic counterparts, and we do expect that both coexist during the whole life-cycle of the memcg. To prepare for a premature release of percpu per-node data, let's pretend that lruvec_stat_cpu is an rcu-protected pointer, which can be NULL. This patch adds corresponding checks whenever required. Signed-off-by: Roman Gushchin Acked-by: Johannes Weiner --- include/linux/memcontrol.h | 21 +++-- mm/memcontrol.c| 14 +++--- 2 files changed, 26 insertions(+), 9 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 05ca77767c6a..8ac04632002a 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -126,7 +126,7 @@ struct memcg_shrinker_map { struct mem_cgroup_per_node { struct lruvec lruvec; - struct lruvec_stat __percpu *lruvec_stat_cpu; + struct lruvec_stat __rcu /* __percpu */ *lruvec_stat_cpu; atomic_long_t lruvec_stat[NR_VM_NODE_STAT_ITEMS]; unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; @@ -682,6 +682,7 @@ static inline unsigned long lruvec_page_state(struct lruvec *lruvec, static inline void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val) { + struct lruvec_stat __percpu *lruvec_stat_cpu; struct mem_cgroup_per_node *pn; long x; @@ -697,12 +698,20 @@ static inline void __mod_lruvec_state(struct lruvec *lruvec, __mod_memcg_state(pn->memcg, idx, val); /* Update lruvec */ - x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]); - if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { - atomic_long_add(x, &pn->lruvec_stat[idx]); - x = 0; + rcu_read_lock(); + lruvec_stat_cpu = (struct lruvec_stat __percpu *) + rcu_dereference(pn->lruvec_stat_cpu); + if (likely(lruvec_stat_cpu)) { + x = val + __this_cpu_read(lruvec_stat_cpu->count[idx]); + if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { + atomic_long_add(x, &pn->lruvec_stat[idx]); + x = 0; + } + __this_cpu_write(lruvec_stat_cpu->count[idx], x); + } else { + 
atomic_long_add(val, &pn->lruvec_stat[idx]); } - __this_cpu_write(pn->lruvec_stat_cpu->count[idx], x); + rcu_read_unlock(); } static inline void mod_lruvec_state(struct lruvec *lruvec, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 803c772f354b..5ef4098f3f8d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2122,6 +2122,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg) static int memcg_hotplug_cpu_dead(unsigned int cpu) { struct memcg_vmstats_percpu __percpu *vmstats_percpu; + struct lruvec_stat __percpu *lruvec_stat_cpu; struct memcg_stock_pcp *stock; struct mem_cgroup *memcg; @@ -2152,7 +2153,12 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu) struct mem_cgroup_per_node *pn; pn = mem_cgroup_nodeinfo(memcg, nid); - x = this_cpu_xchg(pn->lruvec_stat_cpu->count[i], 0); + + lruvec_stat_cpu = (struct lruvec_stat __percpu*) + rcu_dereference(pn->lruvec_stat_cpu); + if (!lruvec_stat_cpu) + continue; + x = this_cpu_xchg(lruvec_stat_cpu->count[i], 0); if (x) atomic_long_add(x, &pn->lruvec_stat[i]); } @@ -4414,6 +4420,7 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id) static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node) { + struct lruvec_stat __percpu *lruvec_stat_cpu; struct mem_cgroup_per_node *pn; int tmp = node; /* @@ -4430,11 +4437,12 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node) if (!pn) return 1; - pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat); - if (!pn->lruvec_stat_cpu) { + lruvec_stat_cpu = alloc_percpu(struct lruvec_stat); + if (!lruvec_stat_cpu) { kfree(pn); return 1; } + rcu_assign_pointer(pn->lruvec_stat_cpu, lruvec_stat_cpu); lruvec_init(&pn->lruvec); pn->usage_in_excess = 0; -- 2.20.1
[PATCH v2 6/6] mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu()
It's possible to remove a big chunk of the redundant code by making memcg_flush_offline_percpu() take a cpumask as an argument and flush percpu data on all cpus belonging to the mask instead of all possible cpus. Then memcg_hotplug_cpu_dead() can call it with a single CPU bit set. This approach removes all of the duplicated code but preserves the performance optimization made in memcg_flush_offline_percpu(): only one atomic operation per data entry.

for_each_data_entry()
    for_each_cpu(cpu, cpumask)
        sum_events()
    flush()

Otherwise it would be one atomic operation per data entry per cpu:

for_each_cpu(cpu)
    for_each_data_entry()
        flush()

Signed-off-by: Roman Gushchin --- mm/memcontrol.c | 61 - 1 file changed, 9 insertions(+), 52 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0f18bf2afea8..92c80275d5eb 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2122,11 +2122,12 @@ static void drain_all_stock(struct mem_cgroup *root_memcg) /* * Flush all per-cpu stats and events into atomics. * Try to minimize the number of atomic writes by gathering data from - * all cpus locally, and then make one atomic update. + * all cpus in cpumask locally, and then make one atomic update. * No locking is required, because no one has an access to * the offlined percpu data. 
*/ -static void memcg_flush_offline_percpu(struct mem_cgroup *memcg) +static void memcg_flush_offline_percpu(struct mem_cgroup *memcg, + const struct cpumask *cpumask) { struct memcg_vmstats_percpu __percpu *vmstats_percpu; struct lruvec_stat __percpu *lruvec_stat_cpu; @@ -2140,7 +2141,7 @@ static void memcg_flush_offline_percpu(struct mem_cgroup *memcg) int nid; x = 0; - for_each_possible_cpu(cpu) + for_each_cpu(cpu, cpumask) x += per_cpu(vmstats_percpu->stat[i], cpu); if (x) atomic_long_add(x, &memcg->vmstats[i]); @@ -2153,7 +2154,7 @@ static void memcg_flush_offline_percpu(struct mem_cgroup *memcg) lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined; x = 0; - for_each_possible_cpu(cpu) + for_each_cpu(cpu, cpumask) x += per_cpu(lruvec_stat_cpu->count[i], cpu); if (x) atomic_long_add(x, &pn->lruvec_stat[i]); @@ -2162,7 +2163,7 @@ static void memcg_flush_offline_percpu(struct mem_cgroup *memcg) for (i = 0; i < NR_VM_EVENT_ITEMS; i++) { x = 0; - for_each_possible_cpu(cpu) + for_each_cpu(cpu, cpumask) x += per_cpu(vmstats_percpu->events[i], cpu); if (x) atomic_long_add(x, &memcg->vmevents[i]); @@ -2171,8 +2172,6 @@ static void memcg_flush_offline_percpu(struct mem_cgroup *memcg) static int memcg_hotplug_cpu_dead(unsigned int cpu) { - struct memcg_vmstats_percpu __percpu *vmstats_percpu; - struct lruvec_stat __percpu *lruvec_stat_cpu; struct memcg_stock_pcp *stock; struct mem_cgroup *memcg; @@ -2180,50 +2179,8 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu) drain_stock(stock); rcu_read_lock(); - for_each_mem_cgroup(memcg) { - int i; - - vmstats_percpu = (struct memcg_vmstats_percpu __percpu *) - rcu_dereference(memcg->vmstats_percpu); - - for (i = 0; i < MEMCG_NR_STAT; i++) { - int nid; - long x; - - if (vmstats_percpu) { - x = this_cpu_xchg(vmstats_percpu->stat[i], 0); - if (x) - atomic_long_add(x, &memcg->vmstats[i]); - } - - if (i >= NR_VM_NODE_STAT_ITEMS) - continue; - - for_each_node(nid) { - struct mem_cgroup_per_node *pn; - - pn = mem_cgroup_nodeinfo(memcg, 
nid); - - lruvec_stat_cpu = (struct lruvec_stat __percpu*) - rcu_dereference(pn->lruvec_stat_cpu); - if (!lruvec_stat_cpu) - continue; - x = this_cpu_xchg(lruvec_stat_cpu->count[i], 0); - if (x) - atomic_long_add(x, &pn->lruvec_stat[i]); - } - } - - for (i = 0; i < NR_VM_EVENT_ITEMS; i++) { - long x; - - if (vmstats_percpu) { -
[PATCH v2 4/6] mm: release per-node memcg percpu data prematurely
Similar to memcg-level statistics, per-node data isn't expected to be hot after cgroup removal. Switching over to atomics and prematurely releasing percpu data helps to reduce the memory footprint of dying cgroups. Signed-off-by: Roman Gushchin Acked-by: Johannes Weiner --- include/linux/memcontrol.h | 1 + mm/memcontrol.c| 24 +++- 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 569337514230..f296693d102b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -127,6 +127,7 @@ struct mem_cgroup_per_node { struct lruvec lruvec; struct lruvec_stat __rcu /* __percpu */ *lruvec_stat_cpu; + struct lruvec_stat __percpu *lruvec_stat_cpu_offlined; atomic_long_t lruvec_stat[NR_VM_NODE_STAT_ITEMS]; unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index efd5bc131a38..1b5fe826d6d0 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4460,7 +4460,7 @@ static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node) if (!pn) return; - free_percpu(pn->lruvec_stat_cpu); + WARN_ON_ONCE(pn->lruvec_stat_cpu != NULL); kfree(pn); } @@ -4616,7 +4616,17 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) static void percpu_rcu_free(struct rcu_head *rcu) { struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu); + int node; + + for_each_node(node) { + struct mem_cgroup_per_node *pn = memcg->nodeinfo[node]; + if (!pn) + continue; + + free_percpu(pn->lruvec_stat_cpu_offlined); + WARN_ON_ONCE(pn->lruvec_stat_cpu != NULL); + } free_percpu(memcg->vmstats_percpu_offlined); WARN_ON_ONCE(memcg->vmstats_percpu); @@ -4625,6 +4635,18 @@ static void percpu_rcu_free(struct rcu_head *rcu) static void mem_cgroup_offline_percpu(struct mem_cgroup *memcg) { + int node; + + for_each_node(node) { + struct mem_cgroup_per_node *pn = memcg->nodeinfo[node]; + + if (!pn) + continue; + + pn->lruvec_stat_cpu_offlined = 
(struct lruvec_stat __percpu *) + rcu_dereference(pn->lruvec_stat_cpu); + rcu_assign_pointer(pn->lruvec_stat_cpu, NULL); + } memcg->vmstats_percpu_offlined = (struct memcg_vmstats_percpu __percpu*) rcu_dereference(memcg->vmstats_percpu); rcu_assign_pointer(memcg->vmstats_percpu, NULL); -- 2.20.1
[PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups
A cgroup can remain in the dying state for a long time, being pinned in memory by any kernel object. It can be pinned by a page shared with another cgroup (e.g. mlocked by a process in the other cgroup). It can be pinned by a vfs cache object, etc. Mostly because of percpu data, the size of a memcg structure in the kernel memory is quite large. Depending on the machine size and the kernel config, it can easily reach hundreds of kilobytes per cgroup. Depending on the memory pressure and the reclaim approach (which is a separate topic), several hundred (if not a few thousand) dying cgroups looks like a typical number. On a moderately sized machine the overall memory footprint is measured in hundreds of megabytes. So if we can't completely get rid of dying cgroups, let's make them smaller. This patchset aims to reduce the size of a dying memory cgroup by the premature release of percpu data during the cgroup removal, and the use of atomic counterparts instead. Currently it covers per-memcg vmstat_percpu and per-memcg per-node lruvec_stat_cpu. The same approach can be further applied to other percpu data. Results on my test machine (32 CPUs, single node):

                            With the patchset:   Originally:
nr_dying_descendants 0
  Slab:                     66640 kB             67644 kB
  Percpu:                    6912 kB              6912 kB
nr_dying_descendants 1000
  Slab:                     85912 kB             84704 kB
  Percpu:                   26880 kB             64128 kB

So one dying cgroup went from 75 kB to 39 kB, which is almost half the size. The difference will be even bigger on a bigger machine (especially with NUMA). 
To test the patchset, I used the following script:

CG=/sys/fs/cgroup/percpu_test/

mkdir ${CG}
echo "+memory" > ${CG}/cgroup.subtree_control

cat ${CG}/cgroup.stat | grep nr_dying_descendants
cat /proc/meminfo | grep -e Percpu -e Slab

for i in `seq 1 1000`; do
    mkdir ${CG}/${i}
    echo $$ > ${CG}/${i}/cgroup.procs
    dd if=/dev/urandom of=/tmp/test-${i} count=1 2> /dev/null
    echo $$ > /sys/fs/cgroup/cgroup.procs
    rmdir ${CG}/${i}
done

cat /sys/fs/cgroup/cgroup.stat | grep nr_dying_descendants
cat /proc/meminfo | grep -e Percpu -e Slab

rmdir ${CG}

v2:
- several renamings suggested by Johannes Weiner
- added a patch, which merges cpu offlining and percpu flush code

Roman Gushchin (6):
  mm: prepare to premature release of memcg->vmstats_percpu
  mm: prepare to premature release of per-node lruvec_stat_cpu
  mm: release memcg percpu data prematurely
  mm: release per-node memcg percpu data prematurely
  mm: flush memcg percpu stats and events before releasing
  mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu()

 include/linux/memcontrol.h | 66 ++
 mm/memcontrol.c| 179 -
 2 files changed, 186 insertions(+), 59 deletions(-)

-- 2.20.1
[PATCH v2 1/6] mm: prepare to premature release of memcg->vmstats_percpu
Prepare to handle premature release of memcg->vmstats_percpu data. Currently it's a generic pointer which is expected to be non-NULL during the whole life time of a memcg. Switch over to the rcu-protected pointer, and carefully check it for being non-NULL. This change is a required step towards dynamic premature release of percpu memcg data. Signed-off-by: Roman Gushchin Acked-by: Johannes Weiner --- include/linux/memcontrol.h | 40 +--- mm/memcontrol.c| 62 +- 2 files changed, 77 insertions(+), 25 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 534267947664..05ca77767c6a 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -274,7 +274,7 @@ struct mem_cgroup { struct task_struct *move_lock_task; /* memory.stat */ - struct memcg_vmstats_percpu __percpu *vmstats_percpu; + struct memcg_vmstats_percpu __rcu /* __percpu */ *vmstats_percpu; MEMCG_PADDING(_pad2_); @@ -597,17 +597,26 @@ static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, static inline void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val) { + struct memcg_vmstats_percpu __percpu *vmstats_percpu; long x; if (mem_cgroup_disabled()) return; - x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]); - if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { - atomic_long_add(x, &memcg->vmstats[idx]); - x = 0; + rcu_read_lock(); + vmstats_percpu = (struct memcg_vmstats_percpu __percpu *) + rcu_dereference(memcg->vmstats_percpu); + if (likely(vmstats_percpu)) { + x = val + __this_cpu_read(vmstats_percpu->stat[idx]); + if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { + atomic_long_add(x, &memcg->vmstats[idx]); + x = 0; + } + __this_cpu_write(vmstats_percpu->stat[idx], x); + } else { + atomic_long_add(val, &memcg->vmstats[idx]); } - __this_cpu_write(memcg->vmstats_percpu->stat[idx], x); + rcu_read_unlock(); } /* idx can be of type enum memcg_stat_item or node_stat_item */ @@ -740,17 +749,26 @@ static inline void 
__count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx, unsigned long count) { + struct memcg_vmstats_percpu __percpu *vmstats_percpu; unsigned long x; if (mem_cgroup_disabled()) return; - x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]); - if (unlikely(x > MEMCG_CHARGE_BATCH)) { - atomic_long_add(x, &memcg->vmevents[idx]); - x = 0; + rcu_read_lock(); + vmstats_percpu = (struct memcg_vmstats_percpu __percpu *) + rcu_dereference(memcg->vmstats_percpu); + if (likely(vmstats_percpu)) { + x = count + __this_cpu_read(vmstats_percpu->events[idx]); + if (unlikely(x > MEMCG_CHARGE_BATCH)) { + atomic_long_add(x, &memcg->vmevents[idx]); + x = 0; + } + __this_cpu_write(vmstats_percpu->events[idx], x); + } else { + atomic_long_add(count, &memcg->vmevents[idx]); } - __this_cpu_write(memcg->vmstats_percpu->events[idx], x); + rcu_read_unlock(); } static inline void count_memcg_events(struct mem_cgroup *memcg, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c532f8685aa3..803c772f354b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -697,6 +697,8 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, struct page *page, bool compound, int nr_pages) { + struct memcg_vmstats_percpu __percpu *vmstats_percpu; + /* * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is * counted as CACHE even if it's on ANON LRU. 
@@ -722,7 +724,12 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, nr_pages = -nr_pages; /* for event */ } - __this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages); + rcu_read_lock(); + vmstats_percpu = (struct memcg_vmstats_percpu __percpu *) + rcu_dereference(memcg->vmstats_percpu); + if (likely(vmstats_percpu)) + __this_cpu_add(vmstats_percpu->nr_page_events, nr_pages); + rcu_read_unlock(); } unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, @@ -756,10 +763,18 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
[PATCH v2 5/6] mm: flush memcg percpu stats and events before releasing
Flush percpu stats and events data to the corresponding atomic counters before releasing percpu memory. Although per-cpu stats are never exactly precise, dropping them on the floor regularly may lead to an accumulated error. So it's safer to flush them before releasing. To minimize the number of atomic updates, let's sum all stats/events on all cpus locally, and then make a single update per entry. Signed-off-by: Roman Gushchin --- mm/memcontrol.c | 52 + 1 file changed, 52 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1b5fe826d6d0..0f18bf2afea8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2119,6 +2119,56 @@ static void drain_all_stock(struct mem_cgroup *root_memcg) mutex_unlock(&percpu_charge_mutex); } +/* + * Flush all per-cpu stats and events into atomics. + * Try to minimize the number of atomic writes by gathering data from + * all cpus locally, and then make one atomic update. + * No locking is required, because no one has an access to + * the offlined percpu data. + */ +static void memcg_flush_offline_percpu(struct mem_cgroup *memcg) +{ + struct memcg_vmstats_percpu __percpu *vmstats_percpu; + struct lruvec_stat __percpu *lruvec_stat_cpu; + struct mem_cgroup_per_node *pn; + int cpu, i; + long x; + + vmstats_percpu = memcg->vmstats_percpu_offlined; + + for (i = 0; i < MEMCG_NR_STAT; i++) { + int nid; + + x = 0; + for_each_possible_cpu(cpu) + x += per_cpu(vmstats_percpu->stat[i], cpu); + if (x) + atomic_long_add(x, &memcg->vmstats[i]); + + if (i >= NR_VM_NODE_STAT_ITEMS) + continue; + + for_each_node(nid) { + pn = mem_cgroup_nodeinfo(memcg, nid); + lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined; + + x = 0; + for_each_possible_cpu(cpu) + x += per_cpu(lruvec_stat_cpu->count[i], cpu); + if (x) + atomic_long_add(x, &pn->lruvec_stat[i]); + } + } + + for (i = 0; i < NR_VM_EVENT_ITEMS; i++) { + x = 0; + for_each_possible_cpu(cpu) + x += per_cpu(vmstats_percpu->events[i], cpu); + if (x) + atomic_long_add(x, &memcg->vmevents[i]); + } +} + static int 
memcg_hotplug_cpu_dead(unsigned int cpu) { struct memcg_vmstats_percpu __percpu *vmstats_percpu; @@ -4618,6 +4668,8 @@ static void percpu_rcu_free(struct rcu_head *rcu) struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu); int node; + memcg_flush_offline_percpu(memcg); + for_each_node(node) { struct mem_cgroup_per_node *pn = memcg->nodeinfo[node]; -- 2.20.1
[PATCH 5/5] mm: spill memcg percpu stats and events before releasing
Spill percpu stats and events data to the corresponding atomic counters before releasing percpu memory. Although per-cpu stats are never exactly precise, dropping them on the floor regularly may lead to an accumulated error. So it's safer to sync them before releasing. To minimize the number of atomic updates, let's sum all stats/events on all cpus locally, and then make a single update per entry. Signed-off-by: Roman Gushchin --- mm/memcontrol.c | 52 + 1 file changed, 52 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 18e863890392..b7eb6fac735e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4612,11 +4612,63 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) return 0; } +/* + * Spill all per-cpu stats and events into atomics. + * Try to minimize the number of atomic writes by gathering data from + * all cpus locally, and then make one atomic update. + * No locking is required, because no one has an access to + * the offlined percpu data. + */ +static void mem_cgroup_spill_offlined_percpu(struct mem_cgroup *memcg) +{ + struct memcg_vmstats_percpu __percpu *vmstats_percpu; + struct lruvec_stat __percpu *lruvec_stat_cpu; + struct mem_cgroup_per_node *pn; + int cpu, i; + long x; + + vmstats_percpu = memcg->vmstats_percpu_offlined; + + for (i = 0; i < MEMCG_NR_STAT; i++) { + int nid; + + x = 0; + for_each_possible_cpu(cpu) + x += per_cpu(vmstats_percpu->stat[i], cpu); + if (x) + atomic_long_add(x, &memcg->vmstats[i]); + + if (i >= NR_VM_NODE_STAT_ITEMS) + continue; + + for_each_node(nid) { + pn = mem_cgroup_nodeinfo(memcg, nid); + lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined; + + x = 0; + for_each_possible_cpu(cpu) + x += per_cpu(lruvec_stat_cpu->count[i], cpu); + if (x) + atomic_long_add(x, &pn->lruvec_stat[i]); + } + } + + for (i = 0; i < NR_VM_EVENT_ITEMS; i++) { + x = 0; + for_each_possible_cpu(cpu) + x += per_cpu(vmstats_percpu->events[i], cpu); + if (x) + atomic_long_add(x, &memcg->vmevents[i]); + } +} + static void 
mem_cgroup_free_percpu(struct rcu_head *rcu) { struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu); int node; + mem_cgroup_spill_offlined_percpu(memcg); + for_each_node(node) { struct mem_cgroup_per_node *pn = memcg->nodeinfo[node]; -- 2.20.1
[PATCH v2 3/6] mm: release memcg percpu data prematurely
To reduce the memory footprint of a dying memory cgroup, let's release massive percpu data (vmstats_percpu) as early as possible, and use atomic counterparts instead. A dying cgroup can remain in the dying state for quite a long time, being pinned in memory by any reference. For example, if a page mlocked by some other cgroup, is charged to the dying cgroup, it won't go away until the page will be released. A dying memory cgroup can have some memory activity (e.g. dirty pages can be flushed after cgroup removal), but in general it's not expected to be very active in comparison to living cgroups. So reducing the memory footprint by releasing percpu data and switching over to atomics seems to be a good trade off. Signed-off-by: Roman Gushchin Acked-by: Johannes Weiner --- include/linux/memcontrol.h | 4 mm/memcontrol.c| 24 +++- 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 8ac04632002a..569337514230 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -275,6 +275,10 @@ struct mem_cgroup { /* memory.stat */ struct memcg_vmstats_percpu __rcu /* __percpu */ *vmstats_percpu; + struct memcg_vmstats_percpu __percpu *vmstats_percpu_offlined; + + /* used to release non-used percpu memory */ + struct rcu_head rcu; MEMCG_PADDING(_pad2_); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5ef4098f3f8d..efd5bc131a38 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4470,7 +4470,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg) for_each_node(node) free_mem_cgroup_per_node_info(memcg, node); - free_percpu(memcg->vmstats_percpu); + WARN_ON_ONCE(memcg->vmstats_percpu != NULL); kfree(memcg); } @@ -4613,6 +4613,26 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) return 0; } +static void percpu_rcu_free(struct rcu_head *rcu) +{ + struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu); + + free_percpu(memcg->vmstats_percpu_offlined); + 
WARN_ON_ONCE(memcg->vmstats_percpu); + + css_put(&memcg->css); +} + +static void mem_cgroup_offline_percpu(struct mem_cgroup *memcg) +{ + memcg->vmstats_percpu_offlined = (struct memcg_vmstats_percpu __percpu*) + rcu_dereference(memcg->vmstats_percpu); + rcu_assign_pointer(memcg->vmstats_percpu, NULL); + + css_get(&memcg->css); + call_rcu(&memcg->rcu, percpu_rcu_free); +} + static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); @@ -4639,6 +4659,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) drain_all_stock(memcg); mem_cgroup_id_put(memcg); + + mem_cgroup_offline_percpu(memcg); } static void mem_cgroup_css_released(struct cgroup_subsys_state *css) -- 2.20.1
Re: KASAN: null-ptr-deref Read in reclaim_high
Hi Dmitry, On Tue, Mar 12, 2019 at 09:21:09AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote: > On Tue, Mar 12, 2019 at 7:43 AM Eric Biggers wrote: > > > > On Mon, Mar 11, 2019 at 11:25:41PM -0700, Andrew Morton wrote: > > > On Tue, 12 Mar 2019 07:08:38 +0100 Dmitry Vyukov > > > wrote: > > > > > > > On Tue, Mar 12, 2019 at 12:37 AM Andrew Morton > > > > wrote: > > > > > > > > > > On Mon, 11 Mar 2019 06:08:01 -0700 syzbot > > > > > wrote: > > > > > > > > > > > syzbot has bisected this bug to: > > > > > > > > > > > > commit 29a4b8e275d1f10c51c7891362877ef6cffae9e7 > > > > > > Author: Shakeel Butt > > > > > > Date: Wed Jan 9 22:02:21 2019 + > > > > > > > > > > > > memcg: schedule high reclaim for remote memcgs on high_work > > > > > > > > > > > > bisection log: > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=155bf5db20 > > > > > > start commit: 29a4b8e2 memcg: schedule high reclaim for remote > > > > > > memcgs on.. > > > > > > git tree: linux-next > > > > > > final crash: > > > > > > https://syzkaller.appspot.com/x/report.txt?x=175bf5db20 > > > > > > console output: > > > > > > https://syzkaller.appspot.com/x/log.txt?x=135bf5db20 > > > > > > kernel config: > > > > > > https://syzkaller.appspot.com/x/.config?x=611f89e5b6868db > > > > > > dashboard link: > > > > > > https://syzkaller.appspot.com/bug?extid=fa11f9da42b46cea3b4a > > > > > > userspace arch: amd64 > > > > > > syz repro: > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1425901740 > > > > > > C reproducer: > > > > > > https://syzkaller.appspot.com/x/repro.c?x=141630a0c0 > > > > > > > > > > > > Reported-by: syzbot+fa11f9da42b46cea3...@syzkaller.appspotmail.com > > > > > > Fixes: 29a4b8e2 ("memcg: schedule high reclaim for remote memcgs on > > > > > > high_work") > > > > > > > > > > The following patch > > > > > memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work-v3.patch > > > > > might have fixed this. Was it applied? 
> > > > > > > > Hi Andrew, > > > > > > > > You mean if the patch was applied during the bisection? > > > > No, it wasn't. Bisection is very specifically done on the same tree > > > > where the bug was hit. There are already too many factors that make > > > > the result flaky/wrong/inconclusive without changing the tree state. > > > > Now, if syzbot would know about any pending fix for this bug, then it > > > > would not do the bisection at all. But it have not seen any patch in > > > > upstream/linux-next with the Reported-by tag, nor it received any syz > > > > fix commands for this bugs. Should have been it aware of the fix? How? > > > > > > memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work-v3.patch was > > > added to linux-next on Jan 10. I take it that this bug was hit when > > > testing the entire linux-next tree, so we can assume that > > > memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work-v3.patch > > > does not fix it, correct? > > > > > > In which case, over to Shakeel! > > > > > > > I don't understand what happened here. First, the syzbot report doesn't say > > which linux-next version was tested (which it should), but I get: > > > > $ git tag --contains 29a4b8e275d1f10c51c7891362877ef6cffae9e7 > > next-20190110 > > next-20190111 > > next-20190114 > > next-20190115 > > next-20190116 > > > > That's almost 2 months old, yet this bug was just reported now. Why? > > Hi Eric, > > This bug was reported on Jan 10: > https://syzkaller.appspot.com/bug?extid=fa11f9da42b46cea3b4a > https://groups.google.com/forum/#!msg/syzkaller-bugs/5YkhNUg2PFY/4-B5M7bDCAAJ > > The start revision of the bisection process (provided) is the same > that was used to create the reproducer. The end revision and bisection > log are provided in the email. > > How can we improve the format to make it more clear? > syzbot started a new thread rather than sending the bisection result in the existing thread. So I thought it was a new bug report, as did everyone else probably. 
- Eric
Re: [GIT PULL] Btrfs updates for 5.1, part 2
The pull request you sent on Tue, 12 Mar 2019 16:08:52 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git > for-5.1-part2-tag has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/92825b0298ca6822085ef483f914b6e0dea9bf66 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Ceph updates for 5.1-rc1
The pull request you sent on Tue, 12 Mar 2019 18:50:17 +0100: > https://github.com/ceph/ceph-client.git tags/ceph-for-5.1-rc1 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/2b0a80b0d0bb0a3db74588279bf851b28c6c4705 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] ext4 changes for 5.1
The pull request you sent on Tue, 12 Mar 2019 15:50:59 -0400: > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git > tags/ext4_for_linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/a5adcfcad55d5f034b33f79f1a873229d1e77b24 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Please pull NFS client updates for Linux 5.1
The pull request you sent on Tue, 12 Mar 2019 11:46:27 +: > git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-5.1-1 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/1fbf3e48123d701584bc75ccac67ef2fe412ac4c Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] overlayfs update for 5.1
The pull request you sent on Tue, 12 Mar 2019 10:03:09 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git > tags/ovl-update-5.1 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/f88c5942cfaf7d55e46d395136cccaca65b2e3bf Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] fuse update for 5.1
The pull request you sent on Tue, 12 Mar 2019 09:44:49 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git > tags/fuse-update-5.1 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/dfee9c257b102d7c0407629eef2ed32e152de0d2 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] nfsd changes for 5.1
The pull request you sent on Tue, 12 Mar 2019 14:23:59 -0400: > git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.1 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/ebc551f2b8f905eca0e25c476c1e5c098cd92103 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] UBI/UBIFS updates for 5.1-rc1
On Tue, Mar 12, 2019 at 8:13 AM Richard Weinberger wrote: > > git://git.infradead.org/linux-ubifs.git tags/upstream-5.1-rc1 Pulling this thing is taking forever for me. I can _ping_ the site, but the "git pull" has been hanging for a while. I really think people need to stop using infradead.org for git hosting. It really is annoyingly slow. If you don't have a kernel.org account, use github or gitlab or something. But use something that isn't excruciatingly slow, ok? I don't know _why_ infradead.org is so slow, and I don't much care. It doesn't seem to be due to network issues, because from a quick look at things, there's no real network traffic, it's just waiting for the server to start sending git object information. And git tends to be good about network bandwidth anyway. But git *does* require some reasonable server-side resources, and they seem to be lacking in this case. Linus
Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions
On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote: > IMHO I don't think that the copy_file_range() is going to carry us through the > next wave of user performance requirements. RDMA, while the first, is not the > only technology which is looking to have direct access to files. XDP is > another.[1] Sure, all I was doing here was demonstrating that people have been trying to get local direct access to file mappings to DMA directly into them for a long time. Direct IO games like these are now largely unnecessary because we now have much better APIs to do zero-copy data transfer between files (which can do hardware offload if it is available!). It's the long term pins that RDMA does that are the problem here. I'm assuming that for XDP, you're talking about userspace zero copy from files to the network hardware and vice versa? Transmit is simple (read-only mapping), but receive probably requires bpf programs to ensure that data (minus headers) in the incoming packet stream is correctly placed into the UMEM region? XDP receive seems pretty much like the same problem as RDMA writes into the file. i.e. the incoming write DMAs are going to have to trigger page faults if the UMEM is a long term pin so the filesystem behaves correctly with this remote data placement. I'd suggest that RDMA, XDP and any other hardware that is going to pin file-backed mappings for the long term need to use the same "inform the fs of a write operation into its mapping" mechanisms... And if we start talking about wanting to do peer-to-peer DMA from network/GPU device to storage device without going through a file-backed CPU mapping, we still need to have the filesystem involved to translate file offsets to storage locations the filesystem has allocated for the data and to lock them down for as long as the peer-to-peer DMA offload is in place. 
In effect, this is the same problem as RDMA+FS-DAX - the filesystem owns the file offset to storage location mapping and manages storage access arbitration, not the mm/vma mapping presented to userspace. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
[PATCH 3/3] RISC-V: Allow booting kernel from any 4KB aligned address
Currently, we have to boot RISCV64 kernel from a 2MB aligned physical address and RISCV32 kernel from a 4MB aligned physical address. This constraint is because initial pagetable setup (i.e. setup_vm()) maps entire RAM using hugepages (i.e. 2MB for 3-level pagetable and 4MB for 2-level pagetable). Further, the above booting constraint also results in memory wastage because if we boot kernel from some address (which is not the same as the RAM start address) then RISCV kernel will map PAGE_OFFSET virtual address linearly to physical address and memory between RAM start and kernel start will be reserved/unusable. For example, RISCV64 kernel booted from 0x8020 will waste 2MB of RAM and RISCV32 kernel booted from 0x8040 will waste 4MB of RAM. This patch re-writes the initial pagetable setup code to allow booting RISCV32 and RISCV64 kernel from any 4KB (i.e. PAGE_SIZE) aligned address. To achieve this: 1. We map kernel, dtb and only some amount of RAM (few MBs) using 4KB mappings in setup_vm() (called from head.S) 2. Once we reach paging_init() (called from setup_arch()) after memblock setup, we map all available memory banks using 4KB mappings and memblock APIs. With this patch in place, the booting constraint for RISCV32 and RISCV64 kernel is much more relaxed and we can now boot the kernel very close to RAM start thereby minimizing memory wastage. 
Signed-off-by: Anup Patel --- arch/riscv/include/asm/fixmap.h | 5 + arch/riscv/include/asm/pgtable-64.h | 5 + arch/riscv/include/asm/pgtable.h| 6 +- arch/riscv/kernel/head.S| 1 + arch/riscv/kernel/setup.c | 4 +- arch/riscv/mm/init.c| 357 +++- 6 files changed, 317 insertions(+), 61 deletions(-) diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h index 57afe604b495..5cf53dd882e5 100644 --- a/arch/riscv/include/asm/fixmap.h +++ b/arch/riscv/include/asm/fixmap.h @@ -21,6 +21,11 @@ */ enum fixed_addresses { FIX_HOLE, +#define FIX_FDT_SIZE SZ_1M + FIX_FDT_END, + FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1, + FIX_PTE, + FIX_PMD, FIX_EARLYCON_MEM_BASE, __end_of_fixed_addresses }; diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h index 7aa0ea9bd8bb..56ecc3dc939d 100644 --- a/arch/riscv/include/asm/pgtable-64.h +++ b/arch/riscv/include/asm/pgtable-64.h @@ -78,6 +78,11 @@ static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot) return __pmd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot)); } +static inline unsigned long _pmd_pfn(pmd_t pmd) +{ + return pmd_val(pmd) >> _PAGE_PFN_SHIFT; +} + #define pmd_ERROR(e) \ pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e)) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 1141364d990e..05fa2115e736 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -121,12 +121,16 @@ static inline void pmd_clear(pmd_t *pmdp) set_pmd(pmdp, __pmd(0)); } - static inline pgd_t pfn_pgd(unsigned long pfn, pgprot_t prot) { return __pgd((pfn << _PAGE_PFN_SHIFT) | pgprot_val(prot)); } +static inline unsigned long _pgd_pfn(pgd_t pgd) +{ + return pgd_val(pgd) >> _PAGE_PFN_SHIFT; +} + #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) /* Locate an entry in the page global directory */ diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S index 7966262b4f9d..12a3ec5eb8ab 
100644 --- a/arch/riscv/kernel/head.S +++ b/arch/riscv/kernel/head.S @@ -63,6 +63,7 @@ clear_bss_done: /* Initialize page tables and relocate to virtual addresses */ la sp, init_thread_union + THREAD_SIZE la a0, _start + mv a1, s1 call setup_vm call relocate diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index ecb654f6a79e..acdd0f74982b 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -30,6 +30,7 @@ #include #include +#include #include #include #include @@ -62,7 +63,8 @@ unsigned long boot_cpu_hartid; void __init parse_dtb(unsigned int hartid, void *dtb) { - if (early_init_dt_scan(__va(dtb))) + dtb = (void *)fix_to_virt(FIX_FDT) + ((uintptr_t)dtb & ~PAGE_MASK); + if (early_init_dt_scan(dtb)) return; pr_err("No DTB passed to the kernel\n"); diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index f35299f2f3d5..ee55a4b90dec 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -1,14 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ /* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. * Copyright (C) 2012 Regents of the University of California - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Founda
[PATCH 1/3] RISC-V: Add separate defconfig for 32bit systems
This patch adds rv32_defconfig for 32bit systems. The only difference between rv32_defconfig and defconfig is that rv32_defconfig has CONFIG_ARCH_RV32I=y. Signed-off-by: Anup Patel --- arch/riscv/configs/rv32_defconfig | 84 +++ 1 file changed, 84 insertions(+) create mode 100644 arch/riscv/configs/rv32_defconfig diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig new file mode 100644 index ..1a911ed8e772 --- /dev/null +++ b/arch/riscv/configs/rv32_defconfig @@ -0,0 +1,84 @@ +CONFIG_SYSVIPC=y +CONFIG_POSIX_MQUEUE=y +CONFIG_IKCONFIG=y +CONFIG_IKCONFIG_PROC=y +CONFIG_CGROUPS=y +CONFIG_CGROUP_SCHED=y +CONFIG_CFS_BANDWIDTH=y +CONFIG_CGROUP_BPF=y +CONFIG_NAMESPACES=y +CONFIG_USER_NS=y +CONFIG_CHECKPOINT_RESTORE=y +CONFIG_BLK_DEV_INITRD=y +CONFIG_EXPERT=y +CONFIG_BPF_SYSCALL=y +CONFIG_ARCH_RV32I=y +CONFIG_SMP=y +CONFIG_MODULES=y +CONFIG_MODULE_UNLOAD=y +CONFIG_NET=y +CONFIG_PACKET=y +CONFIG_UNIX=y +CONFIG_INET=y +CONFIG_IP_MULTICAST=y +CONFIG_IP_ADVANCED_ROUTER=y +CONFIG_IP_PNP=y +CONFIG_IP_PNP_DHCP=y +CONFIG_IP_PNP_BOOTP=y +CONFIG_IP_PNP_RARP=y +CONFIG_NETLINK_DIAG=y +CONFIG_PCI=y +CONFIG_PCIEPORTBUS=y +CONFIG_PCI_HOST_GENERIC=y +CONFIG_PCIE_XILINX=y +CONFIG_DEVTMPFS=y +CONFIG_BLK_DEV_LOOP=y +CONFIG_VIRTIO_BLK=y +CONFIG_BLK_DEV_SD=y +CONFIG_BLK_DEV_SR=y +CONFIG_ATA=y +CONFIG_SATA_AHCI=y +CONFIG_SATA_AHCI_PLATFORM=y +CONFIG_NETDEVICES=y +CONFIG_VIRTIO_NET=y +CONFIG_MACB=y +CONFIG_E1000E=y +CONFIG_R8169=y +CONFIG_MICROSEMI_PHY=y +CONFIG_INPUT_MOUSEDEV=y +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_SERIAL_OF_PLATFORM=y +CONFIG_SERIAL_EARLYCON_RISCV_SBI=y +CONFIG_HVC_RISCV_SBI=y +# CONFIG_PTP_1588_CLOCK is not set +CONFIG_DRM=y +CONFIG_DRM_RADEON=y +CONFIG_FRAMEBUFFER_CONSOLE=y +CONFIG_USB=y +CONFIG_USB_XHCI_HCD=y +CONFIG_USB_XHCI_PLATFORM=y +CONFIG_USB_EHCI_HCD=y +CONFIG_USB_EHCI_HCD_PLATFORM=y +CONFIG_USB_OHCI_HCD=y +CONFIG_USB_OHCI_HCD_PLATFORM=y +CONFIG_USB_STORAGE=y +CONFIG_USB_UAS=y +CONFIG_VIRTIO_MMIO=y 
+CONFIG_SIFIVE_PLIC=y +CONFIG_EXT4_FS=y +CONFIG_EXT4_FS_POSIX_ACL=y +CONFIG_AUTOFS4_FS=y +CONFIG_MSDOS_FS=y +CONFIG_VFAT_FS=y +CONFIG_TMPFS=y +CONFIG_TMPFS_POSIX_ACL=y +CONFIG_NFS_FS=y +CONFIG_NFS_V4=y +CONFIG_NFS_V4_1=y +CONFIG_NFS_V4_2=y +CONFIG_ROOT_NFS=y +CONFIG_CRYPTO_USER_API_HASH=y +CONFIG_CRYPTO_DEV_VIRTIO=y +CONFIG_PRINTK_TIME=y +# CONFIG_RCU_TRACE is not set -- 2.17.1
[PATCH 2/3] RISC-V: Make setup_vm() independent of GCC code model
The setup_vm() must access kernel symbols in a position independent way because it will be called from head.S with MMU off. If we compile kernel with cmodel=medany then PC-relative addressing will be used in setup_vm() to access kernel symbols so it works perfectly fine. Although, if we compile kernel with cmodel=medlow then either absolute addressing or PC-relative addressing (based on whichever requires fewer instructions) is used to access kernel symbols in setup_vm(). This can break setup_vm() whenever any absolute addressing is used to access kernel symbols. With the movement of setup_vm() from kernel/setup.c to mm/init.c, the setup_vm() is now broken for cmodel=medlow but it works perfectly fine for cmodel=medany. This patch fixes setup_vm() and makes it independent of GCC code model by accessing kernel symbols relative to kernel load address instead of assuming PC-relative addressing. Fixes: 6f1e9e946f0b ("RISC-V: Move setup_vm() to mm/init.c") Signed-off-by: Anup Patel --- arch/riscv/kernel/head.S | 1 + arch/riscv/mm/init.c | 71 ++-- 2 files changed, 47 insertions(+), 25 deletions(-) diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S index fe884cd69abd..7966262b4f9d 100644 --- a/arch/riscv/kernel/head.S +++ b/arch/riscv/kernel/head.S @@ -62,6 +62,7 @@ clear_bss_done: /* Initialize page tables and relocate to virtual addresses */ la sp, init_thread_union + THREAD_SIZE + la a0, _start call setup_vm call relocate diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index b379a75ac6a6..f35299f2f3d5 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -172,55 +172,76 @@ void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot) } } -asmlinkage void __init setup_vm(void) +static inline void *__early_va(void *ptr, uintptr_t load_pa) { extern char _start; + uintptr_t va = (uintptr_t)ptr; + uintptr_t sz = (uintptr_t)(&_end) - (uintptr_t)(&_start); + + if (va >= PAGE_OFFSET && va < (PAGE_OFFSET + sz)) + return (void 
*)(load_pa + (va - PAGE_OFFSET)); + return (void *)va; +} + +asmlinkage void __init setup_vm(uintptr_t load_pa) +{ uintptr_t i; - uintptr_t pa = (uintptr_t) &_start; +#ifndef __PAGETABLE_PMD_FOLDED + pmd_t *pmdp; +#endif + pgd_t *pgdp; + phys_addr_t map_pa; + pgprot_t tableprot = __pgprot(_PAGE_TABLE); pgprot_t prot = __pgprot(pgprot_val(PAGE_KERNEL) | _PAGE_EXEC); - va_pa_offset = PAGE_OFFSET - pa; - pfn_base = PFN_DOWN(pa); + va_pa_offset = PAGE_OFFSET - load_pa; + pfn_base = PFN_DOWN(load_pa); /* Sanity check alignment and size */ BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0); - BUG_ON((pa % (PAGE_SIZE * PTRS_PER_PTE)) != 0); + BUG_ON((load_pa % (PAGE_SIZE * PTRS_PER_PTE)) != 0); #ifndef __PAGETABLE_PMD_FOLDED - trampoline_pg_dir[(PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD] = - pfn_pgd(PFN_DOWN((uintptr_t)trampoline_pmd), - __pgprot(_PAGE_TABLE)); - trampoline_pmd[0] = pfn_pmd(PFN_DOWN(pa), prot); + pgdp = __early_va(trampoline_pg_dir, load_pa); + map_pa = (uintptr_t)__early_va(trampoline_pmd, load_pa); + pgdp[(PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD] = + pfn_pgd(PFN_DOWN(map_pa), tableprot); + trampoline_pmd[0] = pfn_pmd(PFN_DOWN(load_pa), prot); + + pgdp = __early_va(swapper_pg_dir, load_pa); for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) { size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i; - swapper_pg_dir[o] = - pfn_pgd(PFN_DOWN((uintptr_t)swapper_pmd) + i, - __pgprot(_PAGE_TABLE)); + map_pa = (uintptr_t)__early_va(swapper_pmd, load_pa); + pgdp[o] = pfn_pgd(PFN_DOWN(map_pa) + i, tableprot); } + pmdp = __early_va(swapper_pmd, load_pa); for (i = 0; i < ARRAY_SIZE(swapper_pmd); i++) - swapper_pmd[i] = pfn_pmd(PFN_DOWN(pa + i * PMD_SIZE), prot); + pmdp[i] = pfn_pmd(PFN_DOWN(load_pa + i * PMD_SIZE), prot); - swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] = - pfn_pgd(PFN_DOWN((uintptr_t)fixmap_pmd), - __pgprot(_PAGE_TABLE)); + map_pa = (uintptr_t)__early_va(fixmap_pmd, load_pa); + pgdp[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] = + 
pfn_pgd(PFN_DOWN(map_pa), tableprot); + pmdp = __early_va(fixmap_pmd, load_pa); + map_pa = (uintptr_t)__early_va(fixmap_pte, load_pa); fixmap_pmd[(FIXADDR_START >> PMD_SHIFT) % PTRS_PER_PMD] = - pfn_pmd(PFN_DOWN((uintptr_t)fixmap_pte), - __pgprot(_PAGE_TABLE)); + pfn_pmd(PFN_DOWN(map_pa), tableprot); #else - trampoline_pg_dir
[PATCH 0/3] Boot RISC-V kernel from any 4KB aligned address
From: Anup Patel This patchset primarily extends initial page table setup using fixmap to boot Linux RISC-V kernel (64bit and 32bit) from any 4KB aligned address. We also add a 32bit defconfig to allow people to try 32bit Linux RISC-V kernel as well. The patchset is tested on SiFive Unleashed board and QEMU virt machine. It can also be found in the riscv_setup_vm_v1 branch of https://github.com/avpatel/linux.git Anup Patel (3): RISC-V: Add separate defconfig for 32bit systems RISC-V: Make setup_vm() independent of GCC code model RISC-V: Allow booting kernel from any 4KB aligned address arch/riscv/configs/rv32_defconfig | 84 +++ arch/riscv/include/asm/fixmap.h | 5 + arch/riscv/include/asm/pgtable-64.h | 5 + arch/riscv/include/asm/pgtable.h| 6 +- arch/riscv/kernel/head.S| 2 + arch/riscv/kernel/setup.c | 4 +- arch/riscv/mm/init.c| 370 +++- 7 files changed, 419 insertions(+), 57 deletions(-) create mode 100644 arch/riscv/configs/rv32_defconfig -- 2.17.1
Re: [PATCH] dt-bindings: clock: imx8mq: Fix numbering overlaps and gaps
Quoting Patrick Wildt (2019-03-12 13:59:22) > On Tue, Mar 12, 2019 at 01:39:50PM -0700, Stephen Boyd wrote: > > Quoting Patrick Wildt (2019-03-12 00:36:54) > > > On Fri, Mar 08, 2019 at 07:29:05AM -0800, Stephen Boyd wrote: > > > > It's mostly about making sure that any existing dtbs don't have their > > > > numbers shifted around. So hopefully any overlapping identifiers aren't > > > > in use yet and then those ids can be changed while leaving the ones that > > > > are in use how they are. > > > > > > In practice I bet no one uses Linux 5.0's i.MX8M device trees since they > > > lack too much support. It's so basic it's not useful. You'd still run > > > your existing non-mainline bindings until it is. Thus I would argue > > > changing the ABI right now would be the only chance there is. > > > > > > If you think that chance is gone, then I guess the reasonable thing is > > > to keep the numbers and only move those (to the end) which overlap. Or > > > put them into that erroneous number gap. > > > > > > > The chance is quickly slipping away because we're going to be at -rc1 > > soon. I'm not the one to decide what is and isn't being used by people > > out there, so I'm happy to apply this patch now before the next -rc1 > > comes out as long as it doesn't break anything in arm-soc area. The > > confidence I'm getting isn't high though. Has anyone from NXP reviewed > > this change? Maybe I can get an ack from someone else that normally > > looks after the arm-soc/dts side of things here indicating that nothing > > should go wrong? That would increase my confidence levels. > > The person that supplied the diff apparently is from NXP, which should > be enough to say that NXP reviewed it? > > It's a bit of a shame that the ones that are CC'd keep quiet. I would > take this chance and go ahead with it. After 5.1/rc1 there will be no > chance to rectify this in a sane way. Ok. I'm just going to merge it and see if anyone complains. I'll send the PR tomorrow.
Re: [PATCH 0/2] Drivers: hv: Move Hyper-V clock/timer code to separate clocksource driver
On Tue, Mar 12, 2019 at 09:53:28PM +, Michael Kelley wrote: > From: gre...@linuxfoundation.org Sent: Tuesday, > March 12, 2019 2:47 PM > > > > > > Michael Kelley (2): > > > Drivers: hv: Move Hyper-V clockevents code to new clocksource driver > > > Drivers: hv: Move Hyper-V clocksource code to new clocksource driver > > > > You have two different patches that do different things, yet have the > > same identical shortlog text :( > > > > That's not ok, and something that I reject for trivial patches, it > > should never happen for a "real" patch as you don't do the same thing in > > both of these patches. > > Hmmm. Not identical. The first patch is "clockevents" code, and the second > patch is "clocksource" code. Wow, that's not obvious at all, sorry. You still might want to make it a bit more different :) greg k-h
Re: [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
On 3/11/19 4:31 AM, Pierre Morel wrote: On 08/03/2019 23:43, Tony Krowiak wrote: On 2/22/19 10:29 AM, Pierre Morel wrote: When the device is removed, we must make sure to clear the interrupt and reset the AP device. We also need to clear the CRYCB of the guest. Signed-off-by: Pierre Morel --- drivers/s390/crypto/vfio_ap_drv.c | 35 +++ drivers/s390/crypto/vfio_ap_ops.c | 3 ++- drivers/s390/crypto/vfio_ap_private.h | 3 +++ 3 files changed, 40 insertions(+), 1 deletion(-) diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c index eca0ffc..e5d91ff 100644 --- a/drivers/s390/crypto/vfio_ap_drv.c +++ b/drivers/s390/crypto/vfio_ap_drv.c @@ -5,6 +5,7 @@ * Copyright IBM Corp. 2018 * * Author(s): Tony Krowiak + * Pierre Morel */ #include @@ -12,6 +13,8 @@ #include #include #include +#include +#include #include "vfio_ap_private.h" #define VFIO_AP_ROOT_NAME "vfio_ap" @@ -61,6 +64,33 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev) } /** + * vfio_ap_update_crycb + * @q: A pointer to the queue being removed + * + * We clear the APID of the queue, making this queue unusable for the guest. + * After this function we can reset the queue without fear of a race with + * the guest accessing the queue again. + * We do not fear a race with the host as we still hold the device. + */ +static void vfio_ap_update_crycb(struct vfio_ap_queue *q) +{ + struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev; + + if (!matrix_mdev) + return; + You should probably check whether the APID has been cleared before proceeding. Take the case where an AP with multiple queues is removed from the configuration via the SE or SCLP. The AP bus is going to invoke the vfio_ap_queue_dev_remove() function for each of the queues. The APID will get cleared on the first remove, so it is not only unnecessary to clear it on subsequent removes, it is kind of nasty to keep resetting the masks in the guest's CRYCB (below) each time the remove callback is invoked. 
+ clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm); + + if (!matrix_mdev->kvm) + return; + + kvm_arch_crypto_set_masks(matrix_mdev->kvm, + matrix_mdev->matrix.apm, + matrix_mdev->matrix.aqm, + matrix_mdev->matrix.adm); +} + +/** * vfio_ap_queue_dev_remove: * * Free the associated vfio_ap_queue structure @@ -70,6 +100,11 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev) struct vfio_ap_queue *q; q = dev_get_drvdata(&apdev->device); + if (!q) + return; + + vfio_ap_update_crycb(q); + vfio_ap_mdev_reset_queue(q); Since the bit corresponding to the APID is cleared in the vfio_ap_update_crycb() above, shouldn't all queues on that card also be reset? I do not think so. The remove function will be called in a loop for all queues by the bus. No need to clear all queues. list_del(&q->list); kfree(q); } diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index 0196065..5b9bb33 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -59,6 +59,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q) if (retry <= 0) pr_warn("%s: queue 0x%04x not empty\n", __func__, q->apqn); + vfio_ap_free_irq(q); Shouldn't this be done for the response codes that terminate this loop such as those caught by the default case? I do not think so, the error code is returned and the caller may want to reset the queue again. I think that doing the free inside the call to reset is not right. I will investigate in this direction. Regards, Pierre
Re: [PATCH 3/4] printk: Add consoles to a virtual "console" bus
On Monday 03/11 at 14:33 +0100, Petr Mladek wrote: > On Fri 2019-03-01 16:48:19, Calvin Owens wrote: > > This patch embeds a device struct in the console struct, and registers > > them on a "console" bus so we can expose attributes in sysfs. > > > > Currently, most drivers declare static console structs, and that is > > incompatible with the dev refcount model. So we end up needing to patch > > all of the console drivers to: > > > > 1. Dynamically allocate the console struct using a new helper > > 2. Handle the allocation in (1) possibly failing > > 3. Dispose of (1) with put_device() > > > > Early console structures must still be static, since they're required > > before we're able to allocate memory. The least ugly way I can come up > > with to handle this is an "is_static" flag in the structure which makes > > the gets and puts NOPs, and is checked in ->release() to catch mistakes. > > > > diff --git a/drivers/char/lp.c b/drivers/char/lp.c > > index 5c8d780637bd..e09cb192a469 100644 > > --- a/drivers/char/lp.c > > +++ b/drivers/char/lp.c > > @@ -857,12 +857,12 @@ static void lp_console_write(struct console *co, > > const char *s, > > parport_release(dev); > > } > > > > -static struct console lpcons = { > > - .name = "lp", > > +static const struct console_operations lp_cons_ops = { > > .write = lp_console_write, > > - .flags = CON_PRINTBUFFER, > > }; > > > > +static struct console *lpcons; > > I have got the following compilation error (see below): > > CC drivers/char/lp.o > drivers/char/lp.c: In function ‘lp_register’: > drivers/char/lp.c:925:2: error: ‘lpcons’ undeclared (first use in this > function) > lpcons = allocate_console_dfl(&lp_cons_ops, "lp", NULL); > ^ > drivers/char/lp.c:925:2: note: each undeclared identifier is reported only > once for each function it appears in > In file included from drivers/char/lp.c:125:0: > drivers/char/lp.c:925:33: error: ‘lp_cons_ops’ undeclared (first use in this > function) D'oh, will fix. 
> > > #endif /* console on line printer */ > > > > /* --- initialisation code - */ > > @@ -921,6 +921,11 @@ static int lp_register(int nr, struct parport *port) > > &ppdev_cb, nr); > > if (lp_table[nr].dev == NULL) > > return 1; > > + > > + lpcons = allocate_console_dfl(&lp_cons_ops, "lp", NULL); > > + if (!lpcons) > > + return -ENOMEM; > > This should be done inside #ifdef CONFIG_LP_CONSOLE > to avoid the above compilation error. > > > + > > lp_table[nr].flags |= LP_EXIST; > > > > if (reset) > > [...] > > diff --git a/include/linux/console.h b/include/linux/console.h > > index 3c27a4a29b8c..382591683033 100644 > > --- a/include/linux/console.h > > +++ b/include/linux/console.h > > @@ -142,20 +143,28 @@ static inline int con_debug_leave(void) > > #define CON_BRL(32) /* Used for a braille device */ > > #define CON_EXTENDED (64) /* Use the extended output format a la > > /dev/kmsg */ > > > > -struct console { > > - charname[16]; > > +struct console; > > + > > +struct console_operations { > > void(*write)(struct console *, const char *, unsigned); > > int (*read)(struct console *, char *, unsigned); > > struct tty_driver *(*device)(struct console *, int *); > > void(*unblank)(void); > > int (*setup)(struct console *, char *); > > int (*match)(struct console *, char *name, int idx, char *options); > > +}; > > + > > +struct console { > > + charname[16]; > > short flags; > > short index; > > int cflag; > > void*data; > > struct console *next; > > int level; > > + const struct console_operations *ops; > > + struct device dev; > > + int is_static; > > }; > > > > /* > > @@ -167,6 +176,29 @@ struct console { > > extern int console_set_on_cmdline; > > extern struct console *early_console; > > > > +extern struct console *allocate_console(const struct console_operations > > *ops, > > + const char *name, short flags, > > + short index, void *data); > > + > > +#define allocate_console_dfl(ops, name, data) \ > > + allocate_console(ops, name, CON_PRINTBUFFER, -1, data) > > + > > 
+/* > > + * Helpers for get/put that do the right thing for static early consoles. > > + */ > > + > > +#define get_console(con) \ > > +do { \ > > + if (!con->is_static) \ > > + get_device(&(con)->dev); \ > > +} while (0) > > + > > +#define put_console(con) \ > > +do { \ > > + if (con && !con->is_static) \ > > + put_device(&((struct console *)con)->dev); \ > > +} while (0) > > + > > extern int add_preferred_console(char *name, int idx, char *options); > > extern void register_console(struct console *); > > extern int unregister_console(struct console *); > > diff --git a/include/
Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
On Tue, Mar 12, 2019 at 02:19:15PM -0700, James Bottomley wrote: > I mean in the sequence > > flush_dcache_page(page); > flush_dcache_page(page); > > The first flush_dcache_page did all the work and the second is a > tightly pipelined no-op. That's what I mean by there not really being > a double hit. Ok I wasn't sure it was clear there was a double (profiling) hit on that function. void flush_kernel_dcache_page_addr(void *addr) { unsigned long flags; flush_kernel_dcache_page_asm(addr); purge_tlb_start(flags); pdtlb_kernel(addr); purge_tlb_end(flags); } #define purge_tlb_start(flags) spin_lock_irqsave(&pa_tlb_lock, flags) #define purge_tlb_end(flags) spin_unlock_irqrestore(&pa_tlb_lock, flags) You got a system-wide spinlock in there that won't just go away the second time. So it's a bit more than a tightly pipelined "noop". Your logic of adding the flush on kunmap makes sense, all I'm saying is that it's sacrificing some performance for safety. You asked "optimized what", I meant to optimize away all the above quoted code that will end up running twice for each vhost set_bit when it should run just once like in other archs. And it clearly paid off until now (until now it ran just once and it was the only safe one). Before we can leverage your idea to flush the dcache on kunmap in common code without having to sacrifice performance in arch code, we'd need to change all other archs to add the cache flushes on kunmap too, and then remove the cache flushes from the other places like copy_page or we'd waste CPU. Then you'd have the best of both worlds, no double flush and kunmap would be enough. Thanks, Andrea
[PATCH] Makefile: Add '-fno-builtin-bcmp' to CLANG_FLAGS
After LLVM revision r355672 [1], all known working kernel configurations fail to link [2]: ld: init/do_mounts.o: in function `prepare_namespace': do_mounts.c:(.init.text+0x5ca): undefined reference to `bcmp' ld: do_mounts.c:(.init.text+0x5e6): undefined reference to `bcmp' ld: init/initramfs.o: in function `do_header': initramfs.c:(.init.text+0x6e0): undefined reference to `bcmp' ld: initramfs.c:(.init.text+0x6f8): undefined reference to `bcmp' ld: arch/x86/kernel/setup.o: in function `setup_arch': setup.c:(.init.text+0x21d): undefined reference to `bcmp' Commit 6edfba1b33c7 ("[PATCH] x86_64: Don't define string functions to builtin") removed '-ffreestanding' globally and the kernel doesn't provide a bcmp definition so the linker cannot find a reference to it. Fix this by explicitly telling LLVM through Clang not to emit bcmp references. This flag does not need to be behind 'cc-option' because all working versions of Clang support this flag. [1]: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18eedcff13 [2]: https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/104027249 Link: https://github.com/ClangBuiltLinux/linux/issues/416 Link: https://bugs.llvm.org/show_bug.cgi?id=41035 Cc: sta...@vger.kernel.org Signed-off-by: Nathan Chancellor --- Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/Makefile b/Makefile index 9ef547fc7ffe..6645a274b6e3 100644 --- a/Makefile +++ b/Makefile @@ -501,6 +501,7 @@ ifneq ($(GCC_TOOLCHAIN),) CLANG_FLAGS+= --gcc-toolchain=$(GCC_TOOLCHAIN) endif CLANG_FLAGS+= -no-integrated-as +CLANG_FLAGS+= -fno-builtin-bcmp KBUILD_CFLAGS += $(CLANG_FLAGS) KBUILD_AFLAGS += $(CLANG_FLAGS) export CLANG_FLAGS -- 2.21.0
RE: [PATCH 0/2] Drivers: hv: Move Hyper-V clock/timer code to separate clocksource driver
From: gre...@linuxfoundation.org Sent: Tuesday, March 12, 2019 2:47 PM > > > > Michael Kelley (2): > > Drivers: hv: Move Hyper-V clockevents code to new clocksource driver > > Drivers: hv: Move Hyper-V clocksource code to new clocksource driver > > You have two different patches that do different things, yet have the > same identical shortlog text :( > > That's not ok, and something that I reject for trivial patches, it > should never happen for a "real" patch as you don't do the same thing in > both of these patches. Hmmm. Not identical. The first patch is "clockevents" code, and the second patch is "clocksource" code. Michael
Re: [RFC][Patch v9 2/6] KVM: Enables the kernel to isolate guest free pages
On 12.03.19 22:13, Alexander Duyck wrote: > On Tue, Mar 12, 2019 at 12:46 PM Nitesh Narayan Lal wrote: >> >> On 3/8/19 4:39 PM, Alexander Duyck wrote: >>> On Fri, Mar 8, 2019 at 11:39 AM Nitesh Narayan Lal >>> wrote: On 3/8/19 2:25 PM, Alexander Duyck wrote: > On Fri, Mar 8, 2019 at 11:10 AM Nitesh Narayan Lal > wrote: >> On 3/8/19 1:06 PM, Alexander Duyck wrote: >>> On Thu, Mar 7, 2019 at 6:32 PM Michael S. Tsirkin >>> wrote: On Thu, Mar 07, 2019 at 02:35:53PM -0800, Alexander Duyck wrote: > The only other thing I still want to try and see if I can do is to add > a jiffies value to the page private data in the case of the buddy > pages. Actually there's one extra thing I think we should do, and that is make sure we do not leave less than X% of the free memory at a time. This way chances of triggering an OOM are lower. >>> If nothing else we could probably look at doing a watermark of some >>> sort so we have to have X amount of memory free but not hinted before >>> we will start providing the hints. It would just be a matter of >>> tracking how much memory we have hinted on versus the amount of memory >>> that has been pulled from that pool. >> This is to avoid false OOM in the guest? > Partially, though it would still be possible. Basically it would just > be a way of determining when we have hinted "enough". Basically it > doesn't do us much good to be hinting on free memory if the guest is > already constrained and just going to reallocate the memory shortly > after we hinted on it. The idea is with a watermark we can avoid > hinting until we start having pages that are actually going to stay > free for a while. > >>> It is another reason why we >>> probably want a bit in the buddy pages somewhere to indicate if a page >>> has been hinted or not as we can then use that to determine if we have >>to account for it in the statistics. 
>> The one benefit which I can see of having an explicit bit is that it >> will help us to have a single hook away from the hot path within buddy >> merging code (just like your arch_merge_page) and still avoid duplicate >> hints while releasing pages. >> >> I still have to check PG_idle and PG_young which you mentioned but I >> don't think we can reuse any existing bits. > Those are bits that are already there for 64b. I think those exist in > the page extension for 32b systems. If I am not mistaken they are only > used in VMA mapped memory. What I was getting at is that those are the > bits we could think about reusing. > >> If we really want to have something like a watermark, then can't we use >> zone->free_pages before isolating to see how many free pages are there >> and put a threshold on it? (__isolate_free_page() does a similar thing >> but it does that on per request basis). > Right. That is only part of it though since that tells you how many > free pages are there. But how many of those free pages are hinted? > That is the part we would need to track separately and then then > compare to free_pages to determine if we need to start hinting on more > memory or not. Only pages which are isolated will be hinted, and once a page is isolated it will not be counted in the zone free pages. Feel free to correct me if I am wrong. >>> You are correct up to here. When we isolate the page it isn't counted >>> against the free pages. However after we complete the hint we end up >>> taking it out of isolation and returning it to the "free" state, so it >>> will be counted against the free pages. >>> If I am understanding it correctly you only want to hint the idle pages, is that right? >>> Getting back to the ideas from our earlier discussion, we had 3 stages >>> for things. Free but not hinted, isolated due to hinting, and free and >>> hinted. 
So what we would need to do is identify the size of the first >>> pool that is free and not hinted by knowing the total number of free >>> pages, and then subtract the size of the pages that are hinted and >>> still free. >> To summarize, for now, I think it makes sense to stick with the current >> approach as this way we can avoid any locking in the allocation path and >> reduce the number of hypercalls for a bunch of MAX_ORDER - 1 page. > > I'm not sure what you are talking about by "avoid any locking in the > allocation path". Are you talking about the spin on idle bit, if so > then yes. However I have been testing your patches and I was correct > in the assumption that you forgot to handle the zone lock when you > were freeing __free_one_page. I just did a quick copy/paste from your > zone lock handling from the guest_free_page_hinting function into the > release_buddy_pages function and then I was able to enable multiple > CPUs without any issues. > >> For the
Re: [PATCH 09/10] mm/hmm: allow to mirror vma of a file on a DAX backed filesystem
On Tue, 12 Mar 2019 12:30:52 -0700 Dan Williams wrote:

> On Tue, Mar 12, 2019 at 12:06 PM Jerome Glisse wrote:
> > On Tue, Mar 12, 2019 at 09:06:12AM -0700, Dan Williams wrote:
> > > On Tue, Mar 12, 2019 at 8:26 AM Jerome Glisse wrote:
> [..]
> > > > Spirit of the rule is better than blind application of rule.
> > >
> > > Again, I fail to see why HMM is suddenly unable to make forward
> > > progress when the infrastructure that came before it was merged with
> > > consumers in the same development cycle.
> > >
> > > A gate to upstream merge is about the only lever a reviewer has to
> > > push for change, and these requests to uncouple the consumer only
> > > serve to weaken that review tool in my mind.
> >
> > Well let just agree to disagree and leave it at that and stop
> > wasting each other time
>
> I'm fine to continue this discussion if you are. Please be specific
> about where we disagree and what aspect of the proposed rules about
> merge staging are either acceptable, painful-but-doable, or
> show-stoppers. Do you agree that HMM is doing something novel with
> merge staging, am I off base there?

You're correct.  We chose to go this way because the HMM code is so
large and all-over-the-place that developing it in a standalone tree
seemed impractical - better to feed it into mainline piecewise.

This decision very much assumed that HMM users would definitely be
merged, and that it would happen soon.  I was skeptical for a long time
and was eventually persuaded by quite a few conversations with various
architecture and driver maintainers indicating that these HMM users
would be forthcoming.

In retrospect, the arrival of HMM clients took quite a lot longer than
was anticipated and I'm not sure that all of the anticipated usage
sites will actually be using it.  I wish I'd kept records of
who-said-what, but I didn't and the info is now all rather dissipated.

So the plan didn't really work out as hoped.
Lesson learned. I would now very much prefer that the changelogs for new HMM feature work include links to the driver patchsets which will be using those features, along with acks and review input from the developers of those driver patchsets.
Re: [PATCH v3] net: sh_eth: fix a missing check of of_get_phy_mode
From: Kangjie Lu
Date: Tue, 12 Mar 2019 02:43:18 -0500

> of_get_phy_mode may fail and return a negative error code;
> the fix checks the return value of of_get_phy_mode and
> returns NULL if it fails.
>
> Signed-off-by: Kangjie Lu

Applied with Fixes: tag added.
Re: [PATCH 0/2] Drivers: hv: Move Hyper-V clock/timer code to separate clocksource driver
On Tue, Mar 12, 2019 at 09:42:09PM +, Michael Kelley wrote:
> This patch series moves Hyper-V clock/timer code to a separate Hyper-V
> clocksource driver. Previously, Hyper-V clock/timer code and data
> structures were mixed in with other Hyper-V code in the ISA independent
> drivers/hv code as well as in arch dependent code. The new Hyper-V
> clocksource driver is ISA independent, with just a few dependencies on
> arch specific functions. The patch series does not change any behavior
> or functionality -- it only reorganizes the existing code and fixes up
> the linkages. A few places outside of Hyper-V code are fixed up to use
> the new #include file structure.
>
> This restructuring is in response to Marc Zyngier's review comments
> on supporting Hyper-V running on ARM64, and is a good idea in general.
> It increases the amount of code shared between the x86 and ARM64
> architectures, and reduces the size of the new code for supporting
> Hyper-V on ARM64. A new version of the Hyper-V on ARM64 patches will
> follow once this clocksource restructuring is accepted.
>
> The code is currently diff'ed against Linux 5.0. I'll rebase
> to linux-next once 5.1-rc1 is available.
>
> Michael Kelley (2):
>   Drivers: hv: Move Hyper-V clockevents code to new clocksource driver
>   Drivers: hv: Move Hyper-V clocksource code to new clocksource driver

You have two different patches that do different things, yet have the
same identical shortlog text :(

That's not ok, and something that I reject for trivial patches, it
should never happen for a "real" patch as you don't do the same thing in
both of these patches.

thanks,

greg k-h
[PATCH 2/2] Drivers: hv: Move Hyper-V clocksource code to new clocksource driver
Code for the Hyper-V specific clocksources is currently mixed in with other Hyper-V code. Move the code to a Hyper-V specific driver in the "clocksource" directory, while separating out ISA dependencies so that the new clocksource driver is ISA independent. Update the Hyper-V initialization code to call initialization and cleanup routines since the Hyper-V synthetic timers are not independently enumerated in ACPI. Update Hyper-V clocksource users KVM and VDSO to get definitions from a new include file. No behavior is changed and no new functionality is added. Signed-off-by: Michael Kelley --- arch/x86/entry/vdso/vclock_gettime.c | 1 + arch/x86/entry/vdso/vma.c | 2 +- arch/x86/hyperv/hv_init.c | 91 ++--- arch/x86/include/asm/mshyperv.h | 80 +++--- arch/x86/kvm/x86.c| 1 + drivers/clocksource/hyperv_syntimer.c | 122 ++ include/clocksource/hyperv_syntimer.h | 78 ++ 7 files changed, 219 insertions(+), 156 deletions(-) diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c index 007b3fe9..b0de2a2 100644 --- a/arch/x86/entry/vdso/vclock_gettime.c +++ b/arch/x86/entry/vdso/vclock_gettime.c @@ -21,6 +21,7 @@ #include #include #include +#include #define gtod (&VVAR(vsyscall_gtod_data)) diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index babc4e7..8b81c91 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -22,7 +22,7 @@ #include #include #include -#include +#include #if defined(CONFIG_X86_64) unsigned int __read_mostly vdso64_enabled = 1; diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 7abb09e..eb80f65 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -27,64 +27,13 @@ #include #include #include -#include #include #include #include - -#ifdef CONFIG_HYPERV_TSCPAGE - -static struct ms_hyperv_tsc_page *tsc_pg; - -struct ms_hyperv_tsc_page *hv_get_tsc_page(void) -{ - return tsc_pg; -} -EXPORT_SYMBOL_GPL(hv_get_tsc_page); - -static u64 
read_hv_clock_tsc(struct clocksource *arg) -{ - u64 current_tick = hv_read_tsc_page(tsc_pg); - - if (current_tick == U64_MAX) - rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick); - - return current_tick; -} - -static struct clocksource hyperv_cs_tsc = { - .name = "hyperv_clocksource_tsc_page", - .rating = 400, - .read = read_hv_clock_tsc, - .mask = CLOCKSOURCE_MASK(64), - .flags = CLOCK_SOURCE_IS_CONTINUOUS, -}; -#endif - -static u64 read_hv_clock_msr(struct clocksource *arg) -{ - u64 current_tick; - /* -* Read the partition counter to get the current tick count. This count -* is set to 0 when the partition is created and is incremented in -* 100 nanosecond units. -*/ - rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick); - return current_tick; -} - -static struct clocksource hyperv_cs_msr = { - .name = "hyperv_clocksource_msr", - .rating = 400, - .read = read_hv_clock_msr, - .mask = CLOCKSOURCE_MASK(64), - .flags = CLOCK_SOURCE_IS_CONTINUOUS, -}; +#include void *hv_hypercall_pg; EXPORT_SYMBOL_GPL(hv_hypercall_pg); -struct clocksource *hyperv_cs; -EXPORT_SYMBOL_GPL(hyperv_cs); u32 *hv_vp_index; EXPORT_SYMBOL_GPL(hv_vp_index); @@ -349,41 +298,11 @@ void __init hyperv_init(void) x86_init.pci.arch_init = hv_pci_init; /* -* Register Hyper-V specific clocksource. +* Register Hyper-V specific clocksource. Pass 'false' as the +* argument, indicating to not register the clocksource as the +* sched clock. 
*/ -#ifdef CONFIG_HYPERV_TSCPAGE - if (ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) { - union hv_x64_msr_hypercall_contents tsc_msr; - - tsc_pg = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL); - if (!tsc_pg) - goto register_msr_cs; - - hyperv_cs = &hyperv_cs_tsc; - - rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64); - - tsc_msr.enable = 1; - tsc_msr.guest_physical_address = vmalloc_to_pfn(tsc_pg); - - wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64); - - hyperv_cs_tsc.archdata.vclock_mode = VCLOCK_HVCLOCK; - - clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100); - return; - } -register_msr_cs: -#endif - /* -* For 32 bit guests just use the MSR based mechanism for reading -* the partition counter. -*/ - - hyperv_cs = &hyperv_cs_msr; - if (ms_hyperv.features & HV_MSR_TIME_REF_COUNT_AVAILABLE) -
[PATCH 1/2] Drivers: hv: Move Hyper-V clockevents code to new clocksource driver
Clockevents code for Hyper-V synthetic timers is currently mixed in with other Hyper-V code. Move the code to a Hyper-V specific driver in the "clocksource" directory. Update the VMbus driver to call initialization and cleanup routines since the Hyper-V synthetic timers are not independently enumerated in ACPI. No behavior is changed and no new functionality is added. Signed-off-by: Michael Kelley --- MAINTAINERS | 2 + arch/x86/include/asm/hyperv-tlfs.h| 6 + arch/x86/kernel/cpu/mshyperv.c| 2 + drivers/clocksource/Makefile | 1 + drivers/clocksource/hyperv_syntimer.c | 206 ++ drivers/hv/hv.c | 154 - drivers/hv/hyperv_vmbus.h | 3 - drivers/hv/vmbus_drv.c| 39 +++ include/clocksource/hyperv_syntimer.h | 26 + 9 files changed, 263 insertions(+), 176 deletions(-) create mode 100644 drivers/clocksource/hyperv_syntimer.c create mode 100644 include/clocksource/hyperv_syntimer.h diff --git a/MAINTAINERS b/MAINTAINERS index 21ab064..3352716 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7159,6 +7159,7 @@ F:arch/x86/include/asm/trace/hyperv.h F: arch/x86/include/asm/hyperv-tlfs.h F: arch/x86/kernel/cpu/mshyperv.c F: arch/x86/hyperv +F: drivers/clocksource/hyperv_syntimer.c F: drivers/hid/hid-hyperv.c F: drivers/hv/ F: drivers/input/serio/hyperv-keyboard.c @@ -7168,6 +7169,7 @@ F:drivers/scsi/storvsc_drv.c F: drivers/uio/uio_hv_generic.c F: drivers/video/fbdev/hyperv_fb.c F: net/vmw_vsock/hyperv_transport.c +F: include/clocksource/hyperv_syntimer.h F: include/linux/hyperv.h F: include/uapi/linux/hyperv.h F: tools/hv/ diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h index 2bdbbbc..ee62f57 100644 --- a/arch/x86/include/asm/hyperv-tlfs.h +++ b/arch/x86/include/asm/hyperv-tlfs.h @@ -401,6 +401,12 @@ enum HV_GENERIC_SET_FORMAT { #define HV_STATUS_INVALID_CONNECTION_ID18 #define HV_STATUS_INSUFFICIENT_BUFFERS 19 +/* + * The Hyper-V TimeRefCount register and the TSC + * page provide a guest VM clock with 100ns tick rate + */ +#define HV_CLOCK_HZ 
(NSEC_PER_SEC/100) + typedef struct _HV_REFERENCE_TSC_PAGE { __u32 tsc_sequence; __u32 res1; diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index e81a2db..f53a35a 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -84,6 +85,7 @@ __visible void __irq_entry hv_stimer0_vector_handler(struct pt_regs *regs) inc_irq_stat(hyperv_stimer0_count); if (hv_stimer0_handler) hv_stimer0_handler(); + add_interrupt_randomness(HYPERV_STIMER0_VECTOR, 0); ack_APIC_irq(); exiting_irq(); diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile index be6e0fb..a887955 100644 --- a/drivers/clocksource/Makefile +++ b/drivers/clocksource/Makefile @@ -83,3 +83,4 @@ obj-$(CONFIG_ATCPIT100_TIMER) += timer-atcpit100.o obj-$(CONFIG_RISCV_TIMER) += timer-riscv.o obj-$(CONFIG_CSKY_MP_TIMER)+= timer-mp-csky.o obj-$(CONFIG_GX6605S_TIMER)+= timer-gx6605s.o +obj-$(CONFIG_HYPERV) += hyperv_syntimer.o diff --git a/drivers/clocksource/hyperv_syntimer.c b/drivers/clocksource/hyperv_syntimer.c new file mode 100644 index 000..7276308 --- /dev/null +++ b/drivers/clocksource/hyperv_syntimer.c @@ -0,0 +1,206 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Clocksource driver for the synthetic counter and timers + * provided by the Hyper-V hypervisor to guest VMs, as described + * in the Hyper-V Top Level Functional Spec (TLFS). This driver + * is instruction set architecture independent. + * + * Copyright (C) 2019, Microsoft, Inc. + * + * Author: Michael Kelley + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +static struct clock_event_device __percpu *hv_clock_event; + +/* + * If false, we're using the old mechanism for stimer0 interrupts + * where it sends a VMbus message when it expires. The old + * mechanism is used when running on older versions of Hyper-V + * that don't support Direct Mode. 
While Hyper-V provides + * four stimer's per CPU, Linux uses only stimer0. + */ +static bool direct_mode_enabled; + +static int stimer0_irq; +static int stimer0_vector; +static int stimer0_message_sint; +static int stimer0_cpuhp_online; + +/* + * ISR for when stimer0 is operating in Direct Mode. Direct Mode + * does not use VMbus or any VMbus messages, so process here and not + * in the VMbus driver code. + */ +void hv_stimer0_isr(void) +{ + struct clock_event_device *ce; + + ce = this_cpu_ptr(hv_clock_event); + ce->event_handl
[PATCH 0/2] Drivers: hv: Move Hyper-V clock/timer code to separate clocksource driver
This patch series moves Hyper-V clock/timer code to a separate Hyper-V
clocksource driver. Previously, Hyper-V clock/timer code and data
structures were mixed in with other Hyper-V code in the ISA independent
drivers/hv code as well as in arch dependent code. The new Hyper-V
clocksource driver is ISA independent, with just a few dependencies on
arch specific functions. The patch series does not change any behavior
or functionality -- it only reorganizes the existing code and fixes up
the linkages. A few places outside of Hyper-V code are fixed up to use
the new #include file structure.

This restructuring is in response to Marc Zyngier's review comments
on supporting Hyper-V running on ARM64, and is a good idea in general.
It increases the amount of code shared between the x86 and ARM64
architectures, and reduces the size of the new code for supporting
Hyper-V on ARM64. A new version of the Hyper-V on ARM64 patches will
follow once this clocksource restructuring is accepted.

The code is currently diff'ed against Linux 5.0. I'll rebase
to linux-next once 5.1-rc1 is available.

Michael Kelley (2):
  Drivers: hv: Move Hyper-V clockevents code to new clocksource driver
  Drivers: hv: Move Hyper-V clocksource code to new clocksource driver

 MAINTAINERS                           |   2 +
 arch/x86/entry/vdso/vclock_gettime.c  |   1 +
 arch/x86/entry/vdso/vma.c             |   2 +-
 arch/x86/hyperv/hv_init.c             |  91 +-
 arch/x86/include/asm/hyperv-tlfs.h    |   6 +
 arch/x86/include/asm/mshyperv.h       |  80 ++---
 arch/x86/kernel/cpu/mshyperv.c        |   2 +
 arch/x86/kvm/x86.c                    |   1 +
 drivers/clocksource/Makefile          |   1 +
 drivers/clocksource/hyperv_syntimer.c | 328 ++
 drivers/hv/hv.c                       | 154
 drivers/hv/hyperv_vmbus.h             |   3 -
 drivers/hv/vmbus_drv.c                |  39 ++--
 include/clocksource/hyperv_syntimer.h | 104 +++
 14 files changed, 482 insertions(+), 332 deletions(-)
 create mode 100644 drivers/clocksource/hyperv_syntimer.c
 create mode 100644 include/clocksource/hyperv_syntimer.h

--
1.8.3.1
Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
On 3/3/19 9:09 PM, Halil Pasic wrote: On Fri, 22 Feb 2019 16:29:56 +0100 Pierre Morel wrote: We need to associate the ap_vfio_queue, which will hold the per queue information for interrupt with a matrix mediated device which hold the configuration and the way to the CRYCB. [..] +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid) +{ + int apqi, apqn; + int ret = 0; + struct vfio_ap_queue *q; + struct list_head q_list; + + INIT_LIST_HEAD(&q_list); + + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) { + apqn = AP_MKQID(apid, apqi); + q = vfio_ap_get_queue(apqn, &matrix_dev->free_list); + if (!q) { + ret = -EADDRNOTAVAIL; + goto rewind; + } + if (q->matrix_mdev) { + ret = -EADDRINUSE; You tried to get the q from matrix_dev->free_list thus modulo races q->matrix_mdev should be 0. This change breaks the error codes in a sense that it becomes impossible to provoke EADDRINUSE (the proper error code for taken by another matrix_mdev). It is necessary to determine if the queue is in use by another mdev, so it will still be necessary to traverse all of the matrix_mdev structs to see if q is in matrix_mdev->qlist. It seems that maintaining the qlist does not buy us much. + goto rewind; + } + list_move(&q->list, &q_list); + } + move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev); return 0; +rewind: + move_and_set(&q_list, &matrix_dev->free_list, NULL); + return ret; } - /** - * vfio_ap_mdev_verify_no_sharing + * vfio_ap_get_all_cards: * - * Verifies that the APQNs derived from the cross product of the AP adapter IDs - * and AP queue indexes comprising the AP matrix are not configured for another - * mediated device. AP queue sharing is not allowed. + * @matrix_mdev: the matrix mediated device for which we want to associate + * all available queues with a given apqi. + * @apqi: The apqi which associated with all defined APID of the + * mediated device will define a AP queue. 
* - * @matrix_mdev: the mediated matrix device + * We define a local list to put all queues we find on the matrix device + * free list when associating the apqi with all already defined apid for + * this matrix mediated device. * - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE. + * If we can get all the devices we roll them to the mediated device list + * If we get errors we unroll them to the free list. */ -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev) +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi) { - struct ap_matrix_mdev *lstdev; - DECLARE_BITMAP(apm, AP_DEVICES); - DECLARE_BITMAP(aqm, AP_DOMAINS); - - list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) { - if (matrix_mdev == lstdev) - continue; - - memset(apm, 0, sizeof(apm)); - memset(aqm, 0, sizeof(aqm)); - - /* -* We work on full longs, as we can only exclude the leftover -* bits in non-inverse order. The leftover is all zeros. -*/ - if (!bitmap_and(apm, matrix_mdev->matrix.apm, - lstdev->matrix.apm, AP_DEVICES)) - continue; - - if (!bitmap_and(aqm, matrix_mdev->matrix.aqm, - lstdev->matrix.aqm, AP_DOMAINS)) - continue; - - return -EADDRINUSE; + int apid, apqn; + int ret = 0; + struct vfio_ap_queue *q; + struct list_head q_list; + struct ap_matrix_mdev *tmp = NULL; + + INIT_LIST_HEAD(&q_list); + + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) { + apqn = AP_MKQID(apid, apqi); + q = vfio_ap_get_queue(apqn, &matrix_dev->free_list); + if (!q) { + ret = -EADDRNOTAVAIL; + goto rewind; + } + if (q->matrix_mdev) { + ret = -EADDRINUSE; Same here! Regards, Halil + goto rewind; + } + list_move(&q->list, &q_list); } [..]
Re: [question] panic() during reboot -f (reboot syscall)
Linus Torvalds writes:

> On Wed, Mar 6, 2019 at 5:29 AM Petr Mladek wrote:
>>
>> I wonder if it is "normal" to get panic() when the system is rebooted
>> using "reboot -f". It looks a bit weird to me.
>
> No, a panic is never normal (except possibly for test modules etc, of course).
>
>> Now, "reboot -f" just calls the reboot() syscall. I do not see
>> anything that would stop processes.
>
> There isn't supposed to be anything. It's meant for "things are
> screwed up, just reboot *now* without doing anything else".
>
> The "reboot now" is basically meant to be a poor man's power cycle.
>
>> But it shuts down devices very early, via:
>>
>>   + kernel_restart()
>>     + kernel_restart_prepare()
>>       + blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
>>       + device_shutdown()
>
> The problem is that there are conflicting goals here, and the kernel
> doesn't even *know* if this is supposed to be a normal clean reboot,
> or a "reboot -f" that just shuts down everything.
>
> On a nice clean reboot (where init has shut everything down) we
> obviously _do_ want to shut devices down etc. Quite often you need to
> do it just to make sure they come up nicely again (because the
> firmware isn't even always re-initializing things properly on a soft
> reboot).
>
> But on a "reboot -f", user space _hasn't_ cleaned up, and just wants
> things to reboot. But the kernel doesn't really know. It just gets the
> reboot system call in both cases.
>
>> By other words. It looks like the panic() is possible by design.
>> But it looks a bit weird. Any opinion?
>
> It's definitely not "by design", but it might be unavoidable in this case.
>
> Of course, "unavoidable" is relative. There could be workarounds that
> are reasonably ok in practice.
>
> Like having the filesystem panic code see "oh, system_state isn't
> SYSTEM_RUNNING, so I shouldn't be panicing".
I wonder if there is an easy way to get the scheduler to not schedule
userspace processes once the reboot system call has started.

That sounds like the simple way to avoid this kind of confusion.

Eric
Re: [PATCH] mm: remove unused variable
On Tue, 12 Mar 2019 15:03:52 +0100 Bartosz Golaszewski wrote:

> Tue, 12 Mar 2019 at 14:59 Khalid Aziz wrote:
>>
>> On 3/12/19 7:28 AM, Bartosz Golaszewski wrote:
>>> From: Bartosz Golaszewski
>>>
>>> The mm variable is set but unused. Remove it.
>>
>> It is used. Look further down for calls to set_pte_at().
>>
>> --
>> Khalid
>>
>>>
>>> Signed-off-by: Bartosz Golaszewski
>>> ---
>>>  mm/mprotect.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>> index 028c724dcb1a..130dac3ad04f 100644
>>> --- a/mm/mprotect.c
>>> +++ b/mm/mprotect.c
>>> @@ -39,7 +39,6 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>>>  		unsigned long addr, unsigned long end, pgprot_t newprot,
>>>  		int dirty_accountable, int prot_numa)
>>>  {
>>> -	struct mm_struct *mm = vma->vm_mm;
>>>  	pte_t *pte, oldpte;
>>>  	spinlock_t *ptl;
>>>  	unsigned long pages = 0;
>>>
>
> Oops, I blindly assumed the compiler is right, sorry for that. GCC
> complains it's unused when building usermode linux. I guess it's a
> matter of how set_pte_at() is defined for ARCH=um. I'll take a second
> look.

The problem is that set_pte_at() is implemented as a macro on some
architectures.  The appropriate fix is to make all architectures use
static inline C functions in all cases.  That will make the compiler
think that the `mm' arg is used, even if it is not.
Re: [PATCH 4/9] arm64: dts: meson: g12a: add uart_ao_a pinctrl
Neil Armstrong writes:

> On 07/03/2019 16:13, Neil Armstrong wrote:
>> From: Jerome Brunet
>>
>> Add the always on UART pinctrl setting to the g12a soc DT and
>> use it for the u200 reference design
>>
>> Signed-off-by: Jerome Brunet
>> Signed-off-by: Neil Armstrong
>> ---
>>  arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts |  2 ++
>>  arch/arm64/boot/dts/amlogic/meson-g12a.dtsi     | 18 ++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts b/arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts
>> index c44dbdddf2cf..f2afd0bf3e28 100644
>> --- a/arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts
>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts
>> @@ -25,5 +25,7 @@
>>
>>  &uart_AO {
>>  	status = "okay";
>> +	pinctrl-0 = <&uart_ao_a_pins>;
>> +	pinctrl-names = "default";
>>  };
>>
>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
>> index c141cc7f6b09..f8f055c49f9a 100644
>> --- a/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
>> @@ -177,6 +177,24 @@
>>  			#gpio-cells = <2>;
>>  			gpio-ranges = <&ao_pinctrl 0 0 15>;
>>  		};
>> +
>> +		uart_ao_a_pins: uart_a_ao {
>> +			mux {
>> +				groups = "uart_ao_a_tx",
>> +					 "uart_ao_a_rx";
>> +				function = "uart_ao_a";
>> +				bias-disable;
>> +			};
>> +		};
>> +
>> +		uart_ao_a_cts_rts_pins: uart_ao_a_cts_rts {
>> +			mux {
>> +				groups = "uart_ao_a_cts",
>> +					 "uart_ao_a_rts";
>> +				function = "uart_ao_a";
>> +				bias-disable;
>> +			};
>> +		};
>>  	};
>>  };
>>
>
> Will move this out of this patchset to the boards patchset

I assume you meant you'd move the first hunk, the one that modifies the
u200 board to the other series, but keep this hunk in this series?

Kevin