[PATCH v2 2/2] dt-bindings: power: reset: add document for NVMEM based reboot-mode
Add the device tree bindings document for the NVMEM based reboot-mode
driver.

Signed-off-by: Nandor Han
---
 .../power/reset/nvmem-reboot-mode.txt         | 32 +++
 1 file changed, 32 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/reset/nvmem-reboot-mode.txt

diff --git a/Documentation/devicetree/bindings/power/reset/nvmem-reboot-mode.txt b/Documentation/devicetree/bindings/power/reset/nvmem-reboot-mode.txt
new file mode 100644
index ..2e1b86c31cb3
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/reset/nvmem-reboot-mode.txt
@@ -0,0 +1,32 @@
+NVMEM reboot mode driver
+
+This driver gets the reboot mode magic value from the reboot-mode driver
+and stores it in an NVMEM cell named "reboot-mode". The bootloader can
+then read it and take different action according to the magic value
+stored.
+
+This DT node should be represented as a sub-node of a "simple-mfd" node.
+
+Required properties:
+- compatible: should be "nvmem-reboot-mode".
+- nvmem-cells: A phandle to the reboot mode provided by a nvmem device.
+- nvmem-cell-names: Should be "reboot-mode".
+
+The rest of the properties should follow the generic reboot-mode
+description found in reboot-mode.txt.
+
+Example:
+	reboot-mode-nvmem@0 {
+		compatible = "simple-mfd";
+
+		reboot-mode {
+			compatible = "nvmem-reboot-mode";
+			nvmem-cells = <&reboot_mode>;
+			nvmem-cell-names = "reboot-mode";
+
+			mode-normal = <0x5501>;
+			mode-bootloader = <0x5500>;
+			mode-recovery = <0x5502>;
+			mode-test = <0x5503>;
+		};
+	};
--
2.17.2
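The `nvmem-cells` phandle must point at a cell exposed by an NVMEM provider elsewhere in the tree. A hypothetical provider fragment, modeled on the RTC NVRAM setup described in the cover letter (the parent node, its unit address, and the cell size are assumptions for illustration, not part of this binding):

```dts
	rtc@68 {
		/* hypothetical NVMEM provider, e.g. an RTC with NVRAM */
		#address-cells = <1>;
		#size-cells = <1>;

		/* the 4-byte cell the "nvmem-reboot-mode" node references */
		reboot_mode: nvmem_reboot_mode@0 {
			reg = <0x00 0x4>;
		};
	};
```

The label `reboot_mode` is what the consumer node's `nvmem-cells = <&reboot_mode>;` resolves to.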
[PATCH v2 1/2] power: reset: nvmem-reboot-mode: use NVMEM as reboot mode write interface
Add a new reboot mode write interface that is using an NVMEM cell to
store the reboot mode magic.

Signed-off-by: Nandor Han
---
 drivers/power/reset/Kconfig             |  9 +++
 drivers/power/reset/Makefile            |  1 +
 drivers/power/reset/nvmem-reboot-mode.c | 76 +
 3 files changed, 86 insertions(+)
 create mode 100644 drivers/power/reset/nvmem-reboot-mode.c

diff --git a/drivers/power/reset/Kconfig b/drivers/power/reset/Kconfig
index 6533aa560aa1..bb4a4e854f96 100644
--- a/drivers/power/reset/Kconfig
+++ b/drivers/power/reset/Kconfig
@@ -245,5 +245,14 @@ config POWER_RESET_SC27XX
 	  PMICs includes the SC2720, SC2721, SC2723, SC2730
 	  and SC2731 chips.

+config NVMEM_REBOOT_MODE
+	tristate "Generic NVMEM reboot mode driver"
+	select REBOOT_MODE
+	help
+	  Say y here to enable the reboot mode driver. This will
+	  get the reboot mode argument and store it in a NVMEM cell,
+	  then the bootloader can read it and take different
+	  action according to the mode.
+
 endif

diff --git a/drivers/power/reset/Makefile b/drivers/power/reset/Makefile
index 0aebee954ac1..85da3198e4e0 100644
--- a/drivers/power/reset/Makefile
+++ b/drivers/power/reset/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_POWER_RESET_ZX) += zx-reboot.o
 obj-$(CONFIG_REBOOT_MODE) += reboot-mode.o
 obj-$(CONFIG_SYSCON_REBOOT_MODE) += syscon-reboot-mode.o
 obj-$(CONFIG_POWER_RESET_SC27XX) += sc27xx-poweroff.o
+obj-$(CONFIG_NVMEM_REBOOT_MODE) += nvmem-reboot-mode.o

diff --git a/drivers/power/reset/nvmem-reboot-mode.c b/drivers/power/reset/nvmem-reboot-mode.c
new file mode 100644
index ..816cfdab16a7
--- /dev/null
+++ b/drivers/power/reset/nvmem-reboot-mode.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) Vaisala Oyj. All rights reserved.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/nvmem-consumer.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/platform_device.h>
+#include <linux/reboot-mode.h>
+
+struct nvmem_reboot_mode {
+	struct reboot_mode_driver reboot;
+	struct nvmem_cell *cell;
+};
+
+static int nvmem_reboot_mode_write(struct reboot_mode_driver *reboot,
+				   unsigned int magic)
+{
+	int ret;
+	struct nvmem_reboot_mode *nvmem_rbm;
+
+	nvmem_rbm = container_of(reboot, struct nvmem_reboot_mode, reboot);
+
+	ret = nvmem_cell_write(nvmem_rbm->cell, &magic, sizeof(magic));
+	if (ret < 0)
+		dev_err(reboot->dev, "update reboot mode bits failed\n");
+
+	return ret;
+}
+
+static int nvmem_reboot_mode_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct nvmem_reboot_mode *nvmem_rbm;
+
+	nvmem_rbm = devm_kzalloc(&pdev->dev, sizeof(*nvmem_rbm), GFP_KERNEL);
+	if (!nvmem_rbm)
+		return -ENOMEM;
+
+	nvmem_rbm->reboot.dev = &pdev->dev;
+	nvmem_rbm->reboot.write = nvmem_reboot_mode_write;
+
+	nvmem_rbm->cell = devm_nvmem_cell_get(&pdev->dev, "reboot-mode");
+	if (IS_ERR(nvmem_rbm->cell)) {
+		dev_err(&pdev->dev, "failed to get the nvmem cell reboot-mode\n");
+		return PTR_ERR(nvmem_rbm->cell);
+	}
+
+	ret = devm_reboot_mode_register(&pdev->dev, &nvmem_rbm->reboot);
+	if (ret)
+		dev_err(&pdev->dev, "can't register reboot mode\n");
+
+	return ret;
+}
+
+static const struct of_device_id nvmem_reboot_mode_of_match[] = {
+	{ .compatible = "nvmem-reboot-mode" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, nvmem_reboot_mode_of_match);
+
+static struct platform_driver nvmem_reboot_mode_driver = {
+	.probe = nvmem_reboot_mode_probe,
+	.driver = {
+		.name = "nvmem-reboot-mode",
+		.of_match_table = nvmem_reboot_mode_of_match,
+	},
+};
+module_platform_driver(nvmem_reboot_mode_driver);
+
+MODULE_AUTHOR("Nandor Han");
+MODULE_DESCRIPTION("NVMEM reboot mode driver");
+MODULE_LICENSE("GPL v2");
--
2.17.2
[PATCH v2 0/2] Use NVMEM as reboot-mode write interface
Description
-----------
Extend the reboot mode driver to use a NVMEM cell as writing interface.

Testing
-------
The testing is done by configuring the DT of a custom board. The NVMEM
cell is configured in an RTC non-volatile memory.

Kernel: 4.14.60 (the patchset was rebased on kernel master)

DT configuration:

    ...
    reboot-mode-nvmem@0 {
        compatible = "simple-mfd";

        reboot-mode {
            compatible = "nvmem-reboot-mode";
            nvmem-cells = <&reboot_mode>;
            nvmem-cell-names = "reboot-mode";

            mode-test = <0x21969147>;
        };
    };
    ...
    reboot_mode: nvmem_reboot_mode@0 {
        reg = <0x00 0x4>;
    };
    ...

1. Reboot the system using the command `reboot test`
2. Verify that kernel logs show that the reboot was done in mode `test`: PASS

       [  413.957172] reboot: Restarting system with command 'test'

3. Stop in U-Boot and verify that the mode `test` magic value is present
   in the RTC's non-volatile memory: PASS

Kernel: 5.1.0-rc3

1. Configure `arch/arm/configs/imx_v6_v7_defconfig` to contain
   `CONFIG_NVMEM_REBOOT_MODE=y`
2. Verify that the kernel compiles successfully: PASS

       make ARCH=arm CROSS_COMPILE=arm-linux-gnu- imx_v6_v7_defconfig zImage
       ...
       CC      drivers/power/reset/nvmem-reboot-mode.o
       ...
       Kernel: arch/arm/boot/zImage is ready

Changes since v1:
- split the documentation into a separate patch
- add a missing header

Nandor Han (2):
  power: reset: nvmem-reboot-mode: use NVMEM as reboot mode write
    interface
  dt-bindings: power: reset: add document for NVMEM based reboot-mode

 .../power/reset/nvmem-reboot-mode.txt         | 32 ++++
 drivers/power/reset/Kconfig                   |  9 +++
 drivers/power/reset/Makefile                  |  1 +
 drivers/power/reset/nvmem-reboot-mode.c       | 76 +++++
 4 files changed, 118 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/reset/nvmem-reboot-mode.txt
 create mode 100644 drivers/power/reset/nvmem-reboot-mode.c
--
2.17.2
Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default
On Thu, Apr 11, 2019 at 2:44 PM Sinan Kaya wrote:
>
> On 4/11/2019 1:31 AM, Masahiro Yamada wrote:
> >> It looks like CONFIG_KALLSYMS_ALL is the only feature that
> >> requires CONFIG_DEBUG_KERNEL.
> >
> > Which part of KALLSYMS_ALL code requires CONFIG_DEBUG_KERNEL?
>
> I was going by what Kconfig tells me
>
> Symbol: KALLSYMS_ALL [=n]
> Depends on: DEBUG_KERNEL [=n] && KALLSYMS [=y]

Lots of features have 'depends on DEBUG_KERNEL'.
What is special about KALLSYMS_ALL here?

./drivers/gpio/Kconfig:52:	depends on DEBUG_KERNEL
./drivers/pci/Kconfig:69:	depends on DEBUG_KERNEL
./drivers/usb/gadget/Kconfig:51:	depends on DEBUG_KERNEL
./drivers/base/Kconfig:119:	depends on DEBUG_KERNEL
./drivers/base/Kconfig:130:	depends on DEBUG_KERNEL
./drivers/base/Kconfig:142:	depends on DEBUG_KERNEL
./drivers/spi/Kconfig:29:	depends on DEBUG_KERNEL
./drivers/pinctrl/Kconfig:29:	depends on DEBUG_KERNEL
./drivers/gpu/drm/Kconfig:55:	depends on DEBUG_KERNEL
./kernel/rcu/Kconfig.debug:16:	depends on DEBUG_KERNEL
./kernel/rcu/Kconfig.debug:33:	depends on DEBUG_KERNEL
./kernel/rcu/Kconfig.debug:61:	depends on DEBUG_KERNEL
./kernel/rcu/Kconfig.debug:73:	depends on DEBUG_KERNEL
./net/dccp/Kconfig:30:	depends on DEBUG_KERNEL=y
./crypto/Kconfig:173:	depends on DEBUG_KERNEL && !CRYPTO_MANAGER_DISABLE_TESTS
./init/Kconfig:951:	depends on DEBUG_KERNEL
./init/Kconfig:1476:	depends on DEBUG_KERNEL && KALLSYMS
./mm/Kconfig.debug:12:	depends on DEBUG_KERNEL
./mm/Kconfig.debug:44:	depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
./mm/Kconfig.debug:99:	depends on DEBUG_KERNEL
./mm/Kconfig:494:	depends on DEBUG_KERNEL && CMA
./lib/Kconfig.kgdb:8:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:80:	depends on DEBUG_KERNEL && PRINTK && GENERIC_CALIBRATE_DELAY
./lib/Kconfig.debug:172:	depends on DEBUG_KERNEL && !COMPILE_TEST
./lib/Kconfig.debug:264:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:363:	depends on DEBUG_KERNEL && (M68K || UML || SUPERH) || ARCH_WANT_FRAME_POINTERS
./lib/Kconfig.debug:387:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:447:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:508:	depends on DEBUG_KERNEL && SLAB
./lib/Kconfig.debug:549:	depends on DEBUG_KERNEL && HAVE_DEBUG_KMEMLEAK
./lib/Kconfig.debug:614:	depends on DEBUG_KERNEL && !IA64
./lib/Kconfig.debug:623:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:661:	depends on DEBUG_KERNEL && ARCH_HAS_DEBUG_VIRTUAL
./lib/Kconfig.debug:670:	depends on DEBUG_KERNEL && !MMU
./lib/Kconfig.debug:712:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:723:	depends on DEBUG_KERNEL && HIGHMEM
./lib/Kconfig.debug:733:	depends on DEBUG_KERNEL && HAVE_DEBUG_STACKOVERFLOW
./lib/Kconfig.debug:802:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:816:	depends on DEBUG_KERNEL && !S390
./lib/Kconfig.debug:868:	depends on DEBUG_KERNEL && !S390
./lib/Kconfig.debug:902:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:956:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:997:	depends on DEBUG_KERNEL && PROC_FS
./lib/Kconfig.debug:1010:	depends on DEBUG_KERNEL && PROC_FS
./lib/Kconfig.debug:1023:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1048:	depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
./lib/Kconfig.debug:1065:	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
./lib/Kconfig.debug::	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
./lib/Kconfig.debug:1133:	depends on DEBUG_KERNEL && RT_MUTEXES
./lib/Kconfig.debug:1140:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1150:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1157:	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
./lib/Kconfig.debug:1174:	depends on DEBUG_KERNEL && RWSEM_SPIN_ON_OWNER
./lib/Kconfig.debug:1181:	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
./lib/Kconfig.debug:1196:	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
./lib/Kconfig.debug:1207:	depends on DEBUG_KERNEL && LOCKDEP
./lib/Kconfig.debug:1216:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1226:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1237:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1308:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1346:	depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
./lib/Kconfig.debug:1355:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1365:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1375:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1385:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1402:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1417:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1444:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1457:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1536:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1615:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1680:	depends on DEBUG_KERNEL || m
./lib/Kconfig.debug:1690:	depends on DEBUG_KERNEL || m
./lib/Kconfig.debug:1699:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1710:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1724:	depends on DEBUG_KERNEL
./lib/Kconfig.debug:1731:	depends on DEBUG_KERNEL
./arch/xtensa/Kconfig.debug:5:	depends on DEBUG_KERNEL && MMU
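A list like the one above can be regenerated from any kernel tree with a plain recursive grep. A self-contained sketch (it fabricates a tiny Kconfig fragment under /tmp so the command can be demonstrated outside a kernel tree; in a real tree you would run just the final grep from the source root):

```shell
mkdir -p /tmp/kconfig-demo/lib
cd /tmp/kconfig-demo

# A minimal stand-in for a kernel Kconfig fragment.
cat > lib/Kconfig.debug <<'EOF'
config DEBUG_INFO
	bool "Compile the kernel with debug info"
	depends on DEBUG_KERNEL && !COMPILE_TEST
EOF

# Same search shape as the list in the mail; prints file:line:match.
grep -rn "depends on DEBUG_KERNEL" --include='Kconfig*' .
```

The `--include='Kconfig*'` filter restricts the recursive search to Kconfig files, which is what keeps the output down to dependency lines like those quoted above.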
Some new bio merging behaviors in __bio_try_merge_page
Hi Ming,

I found an erofs issue after commit 07173c3ec276 ("block: enable
multipage bvecs") was merged. It seems that it tries to merge more
physically continuous pages into one iovec.

However, it breaks the current erofs_read_raw_page logic, since erofs
uses the nr_iovecs argument of bio_alloc to limit the maximum number of
physically continuous blocks as well. That was practicable since the
old __bio_try_merge_page only tried to merge within the same page. It
is a kAPI behavior change which also affects bio_alloc...

...
231         err = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
232         if (unlikely(err))
233                 goto err_out;
...
284                 /* max # of continuous pages */
285                 if (nblocks > DIV_ROUND_UP(map.m_plen, PAGE_SIZE))
286                         nblocks = DIV_ROUND_UP(map.m_plen, PAGE_SIZE);
287                 if (nblocks > BIO_MAX_PAGES)
288                         nblocks = BIO_MAX_PAGES;
289
290                 bio = erofs_grab_bio(sb, blknr, nblocks, sb,
291                                      read_endio, false);
292                 if (IS_ERR(bio)) {
293                         err = PTR_ERR(bio);
294                         bio = NULL;
295                         goto err_out;
296                 }
297         }
298
299         err = bio_add_page(bio, page, PAGE_SIZE, 0);
300         /* out of the extent or bio is full */
301         if (err < PAGE_SIZE)
302                 goto submit_bio_retry;
...

After commit 07173c3ec276 ("block: enable multipage bvecs"), erofs could
read more beyond what erofs_map_blocks assigns, so out-of-bound data
could be read, and that breaks tail-end inline determination.

I can change the logic in erofs. However, out of curiosity, I have no
idea whether some other places are also designed like this. IMO, it's
better to provide a total count which indicates how many real pages have
been added to this bio.

Some thoughts?

Thanks,
Gao Xiang
Re: [PATCH] ARM: dts: imx6q-logicpd: Reduce inrush current on USBH1
On Tue, Apr 02, 2019 at 02:32:04PM -0500, Adam Ford wrote:
> Some USB peripherals draw more power, and the sourcing regulator
> takes a little time to turn on. This patch fixes an issue where
> some devices occasionally do not get detected, because the power
> isn't quite ready when communication starts, so we add a bit
> of a delay.
>
> Fixes: 1c207f911fe9 ("ARM: dts: imx: Add support for Logic PD
> i.MX6QD EVM")
>
> Signed-off-by: Adam Ford

Applied, thanks.
[PATCH v2 11/11] platform/x86: asus-wmi: Do not disable keyboard backlight on unload
The keyboard backlight is disabled when the module is unloaded, as it is
exposed as a LED device. Change this behavior to ignore setting zero
brightness while the ledclass device is unloading.

Signed-off-by: Yurii Pavlovskyi
---
 drivers/platform/x86/asus-wmi.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index f0e506feb924..f49992fa87b3 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -475,6 +475,10 @@ static void do_kbd_led_set(struct led_classdev *led_cdev, int value)
 static void kbd_led_set(struct led_classdev *led_cdev,
 			enum led_brightness value)
 {
+	/* Prevent disabling keyboard backlight on module unregister */
+	if (led_cdev->flags & LED_UNREGISTERING)
+		return;
+
 	do_kbd_led_set(led_cdev, value);
 }
--
2.17.1
Re: [PATCH V2] ARM: dts: imx6q-logicpd: Shutdown LCD regulator during suspend
On Tue, Apr 02, 2019 at 02:25:46PM -0500, Adam Ford wrote:
> The LCD power sequencer is very finicky. The backlight cannot
> be driven until after the sequencer is done. Until now, the
> regulators were marked with 'regulator-always-on' to make sure
> it came up before the backlight. This patch allows the LCD
> regulators to power down and prevent the backlight from being
> used again until the sequencer is ready. This reduces
> standby power consumption by ~100mW.
>
> Signed-off-by: Adam Ford

Applied, thanks.
[PATCH v2 09/11] platform/x86: asus-wmi: Control RGB keyboard backlight
The WMI exposes two methods for controlling the RGB keyboard backlight,
which allow to control:
* RGB components in range 00 - ff,
* switching between 4 effects,
* switching between 3 effect speed modes,
* separately enabling the backlight on boot, in the awake state (after
  driver load), in sleep mode, and probably in something called shutdown
  mode (no observable effects of enabling it are known so far).

The configuration should be written to several sysfs parameter buffers,
which are then written via WMI by writing either 1 or 2 to the
"kbbl_set" parameter. When reading the buffers, the last written value
is returned.

If 2 is written to "kbbl_set", the parameters will be reset on reboot
(temporary mode); 1 is the permanent mode, where parameters are
retained. The calls use the new 3-dword input buffer method call. The
functionality is only enabled if the corresponding DSTS methods return
exact valid values.

The following script demonstrates usage:

echo Red [00 - ff]
echo 33 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_red
echo Green [00 - ff]
echo ff > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_green
echo Blue [00 - ff]
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_blue
echo Mode: 0 - static color, 1 - blink, 2 - rainbow, 3 - strobe
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_mode
echo Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_speed
echo Enable: 02 - on boot, before module load, 08 - awake, 20 - sleep,
echo 2a or ff to set all
echo 2a > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_flags
echo Save: 1 - permanently, 2 - temporarily, reset after reboot
echo 1 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_set

Signed-off-by: Yurii Pavlovskyi
---
 .../ABI/testing/sysfs-platform-asus-wmi       |  61 ++++
 drivers/platform/x86/asus-wmi.c               | 329 ++++++++++++++++++
 include/linux/platform_data/x86/asus-wmi.h    |   2 +
 3 files changed, 392 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi b/Documentation/ABI/testing/sysfs-platform-asus-wmi
index 019e1e29370e..300a40519695 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@@ -36,3 +36,64 @@ KernelVersion:	3.5
 Contact:	"AceLan Kao"
 Description:
 		Resume on lid open. 1 means on, 0 means off.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_red
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi"
+Description:
+		RGB keyboard backlight red component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_green
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi"
+Description:
+		RGB keyboard backlight green component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_blue
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi"
+Description:
+		RGB keyboard backlight blue component: 00 .. ff.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_mode
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi"
+Description:
+		RGB keyboard backlight mode:
+		* 0 - static color,
+		* 1 - blink,
+		* 2 - rainbow,
+		* 3 - strobe.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_speed
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi"
+Description:
+		RGB keyboard backlight speed for modes 1 and 2:
+		* 0 - slow,
+		* 1 - medium,
+		* 2 - fast.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_flags
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi"
+Description:
+		RGB keyboard backlight enable flags (2a to enable everything), OR of:
+		* 02 - on boot (until module load),
+		* 08 - awake,
+		* 20 - sleep.
+
+What:		/sys/devices/platform/<platform>/kbbl/kbbl_set
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi"
+Description:
+		Write changed RGB keyboard backlight parameters:
+		* 1 - permanently,
+		* 2 - temporarily.

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index de0a8f61d4a1..b4fd200e8335 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -145,6 +145,21 @@ struct asus_rfkill {
 	u32 dev_id;
 };

+struct asus_kbbl_rgb {
+	u8 kbbl_red;
+	u8 kbbl_green;
+	u8 kbbl_blue;
+	u8 kbbl_mode;
+	u8 kbbl_speed;
+
+	u8 kbbl_set_red;
+	u8 kbbl_set_green;
+	u8 kbbl_set_blue;
+	u8 kbbl_set_mode;
+	u8 kbbl_set_speed;
+	u8 kbbl_set_flags;
+};
+
 struct asus_wmi {
 	int dsts_id;
 	int spec;
[PATCH v2 10/11] platform/x86: asus-wmi: Switch fan boost mode
The WMI exposes a write-only device ID where three modes can be switched
on some laptops (TUF Gaming FX505GM). There is a hotkey combination
Fn-F5 that has a fan icon and is designed to toggle between these 3
modes.

Add a sysfs entry that reads the last written value and updates the
value in WMI on write, and a hotkey handler that toggles the modes.

The corresponding DEVS device handler does obviously take 3 possible
argument values:

Method (SFBM, 1, NotSerialized)
{
    If ((Arg0 == Zero)) { .. }
    If ((Arg0 == One)) { .. }
    If ((Arg0 == 0x02)) { .. }
}
...
// DEVS
If ((IIA0 == 0x00110018))
{
    SFBM (IIA1)
    Return (One)
}

* 0x00 - normal,
* 0x01 - obviously turbo, judging by the amount of noise; might be
  useful to avoid CPU frequency throttling on high load,
* 0x02 - the meaning is unknown at the time of writing, as the modes are
  not named in the vendor documentation, but it does look like a quiet
  mode, as the CPU temperature does increase about 10 degrees on maximum
  load.

Signed-off-by: Yurii Pavlovskyi
---
 .../ABI/testing/sysfs-platform-asus-wmi       |  10 ++
 drivers/platform/x86/asus-wmi.c               | 119 ++++++++++--
 include/linux/platform_data/x86/asus-wmi.h    |   1 +
 3 files changed, 117 insertions(+), 13 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi b/Documentation/ABI/testing/sysfs-platform-asus-wmi
index 300a40519695..2b3184e297a7 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@@ -97,3 +97,13 @@ Description:
 		Write changed RGB keyboard backlight parameters:
 		* 1 - permanently,
 		* 2 - temporarily.
+
+What:		/sys/devices/platform/<platform>/fan_mode
+Date:		Apr 2019
+KernelVersion:	5.1
+Contact:	"Yurii Pavlovskyi"
+Description:
+		Fan boost mode:
+		* 0 - normal,
+		* 1 - turbo,
+		* 2 - quiet?

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index b4fd200e8335..f0e506feb924 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -69,6 +69,7 @@ MODULE_LICENSE("GPL");
 #define NOTIFY_KBD_BRTUP		0xc4
 #define NOTIFY_KBD_BRTDWN		0xc5
 #define NOTIFY_KBD_BRTTOGGLE		0xc7
+#define NOTIFY_KBD_FBM			0x99

 #define ASUS_FAN_DESC			"cpu_fan"
 #define ASUS_FAN_MFUN			0x13
@@ -77,6 +78,8 @@ MODULE_LICENSE("GPL");
 #define ASUS_FAN_CTRL_MANUAL		1
 #define ASUS_FAN_CTRL_AUTO		2

+#define ASUS_FAN_MODE_COUNT		3
+
 #define USB_INTEL_XUSB2PR		0xD0
 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI	0x9c31
@@ -196,6 +199,9 @@ struct asus_wmi {
 	int asus_hwmon_num_fans;
 	int asus_hwmon_pwm;

+	bool fan_mode_available;
+	u8 fan_mode;
+
 	bool kbbl_rgb_available;
 	struct asus_kbbl_rgb kbbl_rgb;

@@ -1832,6 +1838,87 @@ static int asus_wmi_fan_init(struct asus_wmi *asus)
 	return 0;
 }

+/* Fan mode ***/
+
+static int fan_mode_check_present(struct asus_wmi *asus)
+{
+	u32 result;
+	int err;
+
+	asus->fan_mode_available = false;
+
+	err = asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_FAN_MODE, &result);
+	if (err) {
+		if (err == -ENODEV)
+			return 0;
+		else
+			return err;
+	}
+
+	if (result & ASUS_WMI_DSTS_PRESENCE_BIT)
+		asus->fan_mode_available = true;
+
+	return 0;
+}
+
+static int fan_mode_write(struct asus_wmi *asus)
+{
+	int err;
+	u8 value;
+	u32 retval;
+
+	value = asus->fan_mode % ASUS_FAN_MODE_COUNT;
+	pr_info("Set fan mode: %u\n", value);
+	err = asus_wmi_set_devstate(ASUS_WMI_DEVID_FAN_MODE, value, &retval);
+
+	if (err) {
+		pr_warn("Failed to set fan mode: %d\n", err);
+		return err;
+	}
+
+	if (retval != 1) {
+		pr_warn("Failed to set fan mode (retval): 0x%x\n", retval);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int fan_mode_switch_next(struct asus_wmi *asus)
+{
+	asus->fan_mode = (asus->fan_mode + 1) % ASUS_FAN_MODE_COUNT;
+	return fan_mode_write(asus);
+}
+
+static ssize_t fan_mode_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	return show_u8(asus->fan_mode, buf);
+}
+
+static ssize_t fan_mode_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	int result;
+	u8 new_mode;
+
+	struct asus_wmi *asus = dev_get_drvdata(dev);
+
+	result = store_u8(&new_mode, buf, count);
+	if (result < 0)
+		return result;
+
[PATCH v2 08/11] platform/x86: asus-wmi: Enhance detection of thermal data
The obviously wrong value 1 for the temperature device ID in this driver
is returned by at least some devices, including TUF Gaming series
laptops, instead of 0 as expected previously. The observable effect is
that temp1_input in hwmon reads a temperature near absolute zero.

* Consider 0.1 K an erroneous value in addition to 0 K.
* Refactor detection of thermal input availability into a separate
  function.

Signed-off-by: Yurii Pavlovskyi
---
 drivers/platform/x86/asus-wmi.c | 45 -
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index a98df005d6cb..de0a8f61d4a1 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -176,6 +176,7 @@ struct asus_wmi {
 	struct asus_rfkill gps;
 	struct asus_rfkill uwb;

+	bool asus_hwmon_thermal_available;
 	bool asus_hwmon_fan_manual_mode;
 	int asus_hwmon_num_fans;
 	int asus_hwmon_pwm;
@@ -1373,6 +1374,32 @@ static struct attribute *hwmon_attributes[] = {
 	NULL
 };

+static int asus_hwmon_check_thermal_available(struct asus_wmi *asus)
+{
+	u32 value = ASUS_WMI_UNSUPPORTED_METHOD;
+	int err;
+
+	asus->asus_hwmon_thermal_available = false;
+	err = asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_THERMAL_CTRL, &value);
+
+	if (err < 0) {
+		if (err == -ENODEV)
+			return 0;
+
+		return err;
+	}
+
+	/*
+	 * If the temperature value in deci-Kelvin is near the absolute
+	 * zero temperature, something is clearly wrong.
+	 */
+	if (!value || value == 1)
+		return 0;
+
+	asus->asus_hwmon_thermal_available = true;
+	return 0;
+}
+
 static umode_t asus_hwmon_sysfs_is_visible(struct kobject *kobj,
 					   struct attribute *attr, int idx)
 {
@@ -1386,8 +1413,6 @@ static umode_t asus_hwmon_sysfs_is_visible(struct kobject *kobj,

 	if (attr == &dev_attr_pwm1.attr)
 		dev_id = ASUS_WMI_DEVID_FAN_CTRL;
-	else if (attr == &dev_attr_temp1_input.attr)
-		dev_id = ASUS_WMI_DEVID_THERMAL_CTRL;

 	if (attr == &dev_attr_fan1_input.attr
 	    || attr == &dev_attr_fan1_label.attr
@@ -1412,15 +1437,13 @@ static umode_t asus_hwmon_sysfs_is_visible(struct kobject *kobj,
 		 *   - reverved bits are non-zero
 		 *   - sfun and presence bit are not set
 		 */
-		if (value == ASUS_WMI_UNSUPPORTED_METHOD || value & 0xFFF8
+		if (value == ASUS_WMI_UNSUPPORTED_METHOD || (value & 0xFFF8)
 		    || (!asus->sfun && !(value & ASUS_WMI_DSTS_PRESENCE_BIT)))
 			ok = false;
 		else
 			ok = fan_attr <= asus->asus_hwmon_num_fans;
-	} else if (dev_id == ASUS_WMI_DEVID_THERMAL_CTRL) {
-		/* If value is zero, something is clearly wrong */
-		if (!value)
-			ok = false;
+	} else if (attr == &dev_attr_temp1_input.attr) {
+		ok = asus->asus_hwmon_thermal_available;
 	} else if (fan_attr <= asus->asus_hwmon_num_fans && fan_attr != -1) {
 		ok = true;
 	} else {
@@ -1476,6 +1499,14 @@ static int asus_wmi_fan_init(struct asus_wmi *asus)
 	}

 	pr_info("Number of fans: %d\n", asus->asus_hwmon_num_fans);
+
+	status = asus_hwmon_check_thermal_available(asus);
+	if (status) {
+		pr_warn("Could not check if thermal available: %d\n", status);
+		return -ENXIO;
+	}
+
+	pr_info("Thermal available: %d\n", asus->asus_hwmon_thermal_available);
 	return 0;
 }
--
2.17.1
Re: [PATCH V2] ARM: dts: imx6q-logicpd: Reduce inrush current on start
On Tue, Apr 02, 2019 at 02:19:08PM -0500, Adam Ford wrote:
> The main 3.3V regulator sources a series of additional regulators.
> This patch adds a small delay, so when the 3.3V regulator comes
> on it delays a bit before the subsequent regulators can come on.
> This reduces the inrush current a bit on the external DC power
> supply and helps prevent a situation where the sourcing power
> supply cannot source enough current, overloads, and the kit
> fails to start.
>
> Fixes: 1c207f911fe9 ("ARM: dts: imx: Add support for Logic PD
> i.MX6QD EVM")
>
> Signed-off-by: Adam Ford

Applied, thanks.
Re: [PATCH net] vhost: reject zero size iova range
From: Jason Wang
Date: Tue, 9 Apr 2019 12:10:25 +0800

> We used to accept a zero size iova range, which will lead to an
> infinite loop in translate_desc(). Fix this by failing the request
> in this case.
>
> Reported-by: syzbot+d21e6e297322a900c...@syzkaller.appspotmail.com
> Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
> Signed-off-by: Jason Wang

Applied and queued up for -stable.
Re: [PATCH V2] ARM: dts: imx6q-logicpd: Enable Analog audio capture
On Tue, Apr 02, 2019 at 02:25:45PM -0500, Adam Ford wrote:
> The original submission had functional audio out and was based
> on reviewing other boards using the same wm8962 codec. However,
> the Logic PD board uses an analog microphone which was being
> disabled for a digital mic. This patch corrects that and
> explicitly sets the gpio-cfg pins all to 0x which allows the
> analog microphone to capture audio.
>
> Signed-off-by: Adam Ford

Applied, thanks.
[PATCH v2 06/11] platform/x86: asus-nb-wmi: Add microphone mute key code
The microphone mute key that is present on the FX505GM laptop, and
possibly others, is missing from the sparse keymap. Add the missing
code. Also comment on the fan mode switch key, which has the same code
as an already used key.

Signed-off-by: Yurii Pavlovskyi
---
 drivers/platform/x86/asus-nb-wmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/asus-nb-wmi.c b/drivers/platform/x86/asus-nb-wmi.c
index 357d273ed336..39cf447198a9 100644
--- a/drivers/platform/x86/asus-nb-wmi.c
+++ b/drivers/platform/x86/asus-nb-wmi.c
@@ -474,6 +474,7 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
 	{ KE_KEY, 0x6B, { KEY_TOUCHPAD_TOGGLE } },
 	{ KE_IGNORE, 0x6E, },  /* Low Battery notification */
 	{ KE_KEY, 0x7a, { KEY_ALS_TOGGLE } }, /* Ambient Light Sensor Toggle */
+	{ KE_KEY, 0x7c, { KEY_MICMUTE } },
 	{ KE_KEY, 0x7D, { KEY_BLUETOOTH } }, /* Bluetooth Enable */
 	{ KE_KEY, 0x7E, { KEY_BLUETOOTH } }, /* Bluetooth Disable */
 	{ KE_KEY, 0x82, { KEY_CAMERA } },
@@ -488,7 +489,7 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
 	{ KE_KEY, 0x92, { KEY_SWITCHVIDEOMODE } }, /* SDSP CRT + TV + DVI */
 	{ KE_KEY, 0x93, { KEY_SWITCHVIDEOMODE } }, /* SDSP LCD + CRT + TV + DVI */
 	{ KE_KEY, 0x95, { KEY_MEDIA } },
-	{ KE_KEY, 0x99, { KEY_PHONE } },
+	{ KE_KEY, 0x99, { KEY_PHONE } }, /* Conflicts with fan mode switch */
 	{ KE_KEY, 0xA0, { KEY_SWITCHVIDEOMODE } }, /* SDSP HDMI only */
 	{ KE_KEY, 0xA1, { KEY_SWITCHVIDEOMODE } }, /* SDSP LCD + HDMI */
 	{ KE_KEY, 0xA2, { KEY_SWITCHVIDEOMODE } }, /* SDSP CRT + HDMI */
--
2.17.1
[PATCH v2 07/11] platform/x86: asus-wmi: Organize code into sections
The driver has grown (and will grow more) pretty big, which makes it
hard to navigate and understand. Add uniform comments to the code and
ensure that it is sorted into logical sections.

Signed-off-by: Yurii Pavlovskyi
---
 drivers/platform/x86/asus-wmi.c | 94 -
 1 file changed, 46 insertions(+), 48 deletions(-)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 5aa30f8a2a38..a98df005d6cb 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -191,6 +191,8 @@ struct asus_wmi {
 	struct asus_wmi_driver *driver;
 };

+/* Input **/
+
 static int asus_wmi_input_init(struct asus_wmi *asus)
 {
 	int err;
@@ -228,6 +230,8 @@ static void asus_wmi_input_exit(struct asus_wmi *asus)
 	asus->inputdev = NULL;
 }

+/* WMI /
+
 static int asus_wmi_evaluate_method_3dw(u32 method_id, u32 arg0, u32 arg1,
 					u32 arg2, u32 *retval)
 {
@@ -246,7 +250,7 @@ static int asus_wmi_evaluate_method_3dw(u32 method_id, u32 arg0, u32 arg1,
 				     &input, &output);

 	if (ACPI_FAILURE(status))
-		goto exit;
+		return -EIO;

 	obj = (union acpi_object *)output.pointer;
 	if (obj && obj->type == ACPI_TYPE_INTEGER)
@@ -257,10 +261,6 @@ static int asus_wmi_evaluate_method_3dw(u32 method_id, u32 arg0, u32 arg1,

 	kfree(obj);

-exit:
-	if (ACPI_FAILURE(status))
-		return -EIO;
-
 	if (tmp == ASUS_WMI_UNSUPPORTED_METHOD)
 		return -ENODEV;

@@ -344,9 +344,8 @@ static int asus_wmi_get_devstate_simple(struct asus_wmi *asus, u32 dev_id)
 					ASUS_WMI_DSTS_STATUS_BIT);
 }

-/*
- * LEDs
- */
+/* LEDs ***/
+
 /*
  * These functions actually update the LED's, and are called from a
  * workqueue. By doing this as separate work rather than when the LED
@@ -656,6 +655,7 @@ static int asus_wmi_led_init(struct asus_wmi *asus)
 	return rv;
 }

+/* RF */

 /*
  * PCI hotplug (for wlan rfkill)
@@ -1078,6 +1078,8 @@ static int asus_wmi_rfkill_init(struct asus_wmi *asus)
 	return result;
 }

+/* Quirks */
+
 static void asus_wmi_set_xusb2pr(struct asus_wmi *asus)
 {
 	struct pci_dev *xhci_pdev;
@@ -1110,9 +1112,8 @@ static void asus_wmi_set_als(void)
 	asus_wmi_set_devstate(ASUS_WMI_DEVID_ALS_ENABLE, 1, NULL);
 }

-/*
- * Hwmon device
- */
+/* Hwmon device ***/
+
 static int asus_hwmon_agfn_fan_speed_read(struct asus_wmi *asus, int fan,
 					  int *speed)
 {
@@ -1388,7 +1389,6 @@ static umode_t asus_hwmon_sysfs_is_visible(struct kobject *kobj,
 	else if (attr == &dev_attr_temp1_input.attr)
 		dev_id = ASUS_WMI_DEVID_THERMAL_CTRL;

-
 	if (attr == &dev_attr_fan1_input.attr
 	    || attr == &dev_attr_fan1_label.attr
 	    || attr == &dev_attr_pwm1.attr
@@ -1460,9 +1460,27 @@ static void asus_wmi_hwmon_exit(struct asus_wmi *asus)
 	}
 }

-/*
- * Backlight
- */
+static int asus_wmi_fan_init(struct asus_wmi *asus)
+{
+	int status;
+
+	asus->asus_hwmon_pwm = -1;
+	asus->asus_hwmon_num_fans = -1;
+	asus->asus_hwmon_fan_manual_mode = false;
+
+	status = asus_hwmon_get_fan_number(asus, &asus->asus_hwmon_num_fans);
+	if (status) {
+		asus->asus_hwmon_num_fans = 0;
+		pr_warn("Could not determine number of fans: %d\n", status);
+		return -ENXIO;
+	}
+
+	pr_info("Number of fans: %d\n", asus->asus_hwmon_num_fans);
+	return 0;
+}
+
+/* Backlight **/
+
 static int read_backlight_power(struct asus_wmi *asus)
 {
 	int ret;
@@ -1644,6 +1662,8 @@ static int is_display_toggle(int code)
 	return 0;
 }

+/* WMI events */
+
 static int asus_poll_wmi_event(u32 value)
 {
 	struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
@@ -1766,9 +1786,8 @@ static int asus_wmi_notify_queue_flush(struct asus_wmi *asus)
 	return -EIO;
 }

-/*
- * Sys helpers
- */
+/* Sysfs **/
+
 static int parse_arg(const char *buf, unsigned long count, int *val)
 {
 	if (!count)
@@ -1907,9 +1926,8 @@ static
int asus_wmi_sysfs_init(struct platform_device *device) return sysfs_create_group(>dev.kobj, _attribute_group); } -/* - * Platform device - */ +/* Platform device
[PATCH v2 05/11] platform/x86: asus-wmi: Support queued WMI event codes
Event codes are expected to be polled from a queue on at least some models. The WMI event codes are pushed into a queue backed by a circular buffer. After the INIT method is called, ACPI code is allowed to push events into this buffer; the INIT method cannot be reverted. If the module is unloaded and an event (such as a hotkey press) gets emitted before it is inserted back, subsequent events get processed delayed by one or, if the queue overflows, additionally delayed by about 3 seconds. The patch was tested on a newer TUF Gaming FX505GM and an older K54C model. FX505GM Device (ATKD) { .. Name (ATKQ, Package (0x10) { 0x, .. } Method (IANQ, 1, Serialized) { If ((AQNO >= 0x10)) { Local0 = 0x64 While ((Local0 && (AQNO >= 0x10))) { Local0-- Sleep (0x0A) } ... .. AQTI++ AQTI &= 0x0F ATKQ [AQTI] = Arg0 ... } Method (GANQ, 0, Serialized) { .. If (AQNO) { ... Local0 = DerefOf (ATKQ [AQHI]) AQHI++ AQHI &= 0x0F Return (Local0) } Return (One) } This code is almost identical to K54C's, which does return Ones on an empty queue. K54C: Method (GANQ, 0, Serialized) { If (AQNO) { ... Return (Local0) } Return (Ones) } The fix flushes the old key codes out of the queue on load, and after receiving an event the queue is read until either .. or 1 is encountered. It might be considered a minor issue, and no normal user is likely to observe it (there is little reason to unload the driver), but it does significantly frustrate a developer who is unlucky enough to encounter it. Introduce functionality for flushing and processing queued codes, enabled via a quirk flag for ASUS7000. It is worth considering whether it is reasonable to enable it everywhere (which might introduce regressions), or to always try to flush the queue on module load and try to detect the presence of this quirk in the future. This patch limits the effect to the specific hardware defined by the ASUS7000 device, which is used for driver detection by the vendor driver of the FX505. A fallback is also implemented in case the initial flush fails.
Signed-off-by: Yurii Pavlovskyi --- drivers/platform/x86/asus-nb-wmi.c | 1 + drivers/platform/x86/asus-wmi.c| 122 ++--- drivers/platform/x86/asus-wmi.h| 2 + 3 files changed, 97 insertions(+), 28 deletions(-) diff --git a/drivers/platform/x86/asus-nb-wmi.c b/drivers/platform/x86/asus-nb-wmi.c index cc5f0765a8d9..357d273ed336 100644 --- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -438,6 +438,7 @@ static void asus_nb_wmi_quirks(struct asus_wmi_driver *driver) if (acpi_dev_found("ASUS7000")) { driver->quirks->force_dsts = true; + driver->quirks->wmi_event_queue = true; } } diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c index 80f3447734fc..5aa30f8a2a38 100644 --- a/drivers/platform/x86/asus-wmi.c +++ b/drivers/platform/x86/asus-wmi.c @@ -80,6 +80,12 @@ MODULE_LICENSE("GPL"); #define USB_INTEL_XUSB2PR 0xD0 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31 +#define WMI_EVENT_QUEUE_SIZE 0x10 +#define WMI_EVENT_QUEUE_END0x1 +#define WMI_EVENT_MASK 0x +/* The event value is always the same. 
*/ +#define WMI_EVENT_VALUE0xFF + static const char * const ashs_ids[] = { "ATK4001", "ATK4002", NULL }; static bool ashs_present(void) @@ -143,6 +149,7 @@ struct asus_wmi { int dsts_id; int spec; int sfun; + bool wmi_event_queue; struct input_dev *inputdev; struct backlight_device *backlight_device; @@ -1637,77 +1644,126 @@ static int is_display_toggle(int code) return 0; } -static void asus_wmi_notify(u32 value, void *context) +static int asus_poll_wmi_event(u32 value) { - struct asus_wmi *asus = context; - struct acpi_buffer response = { ACPI_ALLOCATE_BUFFER, NULL }; + struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL }; union acpi_object *obj; acpi_status status; - int code; - int orig_code; - unsigned int key_value = 1; - bool autorelease = 1; + int code = -EIO; - status = wmi_get_event_data(value, ); - if (status != AE_OK) { - pr_err("bad event status 0x%x\n", status); - return; + status = wmi_get_event_data(value, ); + if (ACPI_FAILURE(status)) { + pr_warn("Failed to get WMI event code: %s\n", + acpi_format_exception(status)); + return code; } - obj = (union acpi_object *)response.pointer; + obj = (union acpi_object *)output.pointer; -
Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default
On 4/11/2019 1:31 AM, Masahiro Yamada wrote: >> It looks like CONFIG_KALLSYMS_ALL is the only feature that requires CONFIG_DEBUG_KERNEL. > Which part of KALLSYMS_ALL code requires CONFIG_DEBUG_KERNEL? I was going by what Kconfig tells me: Symbol: KALLSYMS_ALL [=n] Depends on: DEBUG_KERNEL [=n] && KALLSYMS [=y]
[PATCH v2 04/11] platform/x86: asus-wmi: Add quirk to force DSTS WMI method detection
The DSTS method detection fails, as nothing is returned if the method is not defined in WMNB. As a result, control of the keyboard backlight is not functional for TUF Gaming series laptops (at the time, the only functionality of the driver implemented with WMI methods on this model). The patch was tested on a newer TUF Gaming FX505GM and an older K54C model. FX505GM: Method (WMNB, 3, Serialized) { ... If ((Local0 == 0x53545344)) { ... Return (Zero) } ... // No return } K54C: Method (WMNB, 3, Serialized) { ... If ((Local0 == 0x53545344)) { ... Return (0x02) } ... Return (0xFFFE) } The non-existing method ASUS_WMI_METHODID_DSTS=0x53544344 (actually it is DCTS in little-endian ASCII) is selected in asus->dsts. One way to fix this would be to call both methods for every known device ID until one answers - this would increase module load time. Another option is to check some device that is known to exist on every model - none is known at the time. The last option, which is implemented, is to check for the presence of the ASUS7000 device in the ACPI tree (it is a dummy device), which is the condition used for loading the vendor driver for this model. This might not fix every affected model ever produced, but it likely does not introduce any regressions. The patch introduces a quirk that is enabled when ASUS7000 is found.
Scope (_SB) { Device (ATK) { Name (_HID, "ASUS7000") // _HID: Hardware ID } } Signed-off-by: Yurii Pavlovskyi --- drivers/platform/x86/asus-nb-wmi.c | 5 + drivers/platform/x86/asus-wmi.c| 14 -- drivers/platform/x86/asus-wmi.h| 5 + 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/platform/x86/asus-nb-wmi.c b/drivers/platform/x86/asus-nb-wmi.c index b6f2ff95c3ed..cc5f0765a8d9 100644 --- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "asus-wmi.h" @@ -434,6 +435,10 @@ static void asus_nb_wmi_quirks(struct asus_wmi_driver *driver) } pr_info("Using i8042 filter function for receiving events\n"); } + + if (acpi_dev_found("ASUS7000")) { + driver->quirks->force_dsts = true; + } } static const struct key_entry asus_nb_wmi_keymap[] = { diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c index cfccfc0b8c2f..80f3447734fc 100644 --- a/drivers/platform/x86/asus-wmi.c +++ b/drivers/platform/x86/asus-wmi.c @@ -1885,11 +1885,21 @@ static int asus_wmi_platform_init(struct asus_wmi *asus) * Note, on most Eeepc, there is no way to check if a method exist * or note, while on notebooks, they returns 0xFFFE on failure, * but once again, SPEC may probably be used for that kind of things. +* +* Additionally at least TUF Gaming series laptops return 0 for unknown +* methods, so the detection in this way is not possible and method must +* be forced. Likely the presence of ACPI device ASUS7000 indicates +* this. 
*/ - if (!asus_wmi_evaluate_method(ASUS_WMI_METHODID_DSTS, 0, 0, NULL)) + if (asus->driver->quirks->force_dsts) { + pr_info("DSTS method forced\n"); + asus->dsts_id = ASUS_WMI_METHODID_DSTS2; + } else if (!asus_wmi_evaluate_method(ASUS_WMI_METHODID_DSTS, + 0, 0, NULL)) { asus->dsts_id = ASUS_WMI_METHODID_DSTS; - else + } else { asus->dsts_id = ASUS_WMI_METHODID_DSTS2; + } /* CWAP allow to define the behavior of the Fn+F2 key, * this method doesn't seems to be present on Eee PCs */ diff --git a/drivers/platform/x86/asus-wmi.h b/drivers/platform/x86/asus-wmi.h index 6c1311f4b04d..94056da02fde 100644 --- a/drivers/platform/x86/asus-wmi.h +++ b/drivers/platform/x86/asus-wmi.h @@ -54,6 +54,11 @@ struct quirk_entry { */ int no_display_toggle; u32 xusb2pr; + /** +* Force DSTS instead of DSCS and skip detection. Useful if WMNB +* returns nothing on unknown method call. +*/ + bool force_dsts; bool (*i8042_filter)(unsigned char data, unsigned char str, struct serio *serio); -- 2.17.1
Re: [PATCH 00/11] asus-wmi: Support of ASUS TUF Gaming series laptops
Hi, sorry, just realized, I've broken the logging. I will re-post patches 04 to 11 as replies to original ones, 1 to 3 were ok.
Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default
On Thu, Apr 11, 2019 at 11:47 AM Kees Cook wrote: > > On Wed, Apr 10, 2019 at 5:56 PM Sinan Kaya wrote: > > > > We can't seem to have a kernel with CONFIG_EXPERT set but > > CONFIG_DEBUG_KERNEL unset these days. > > > > While some of the features under the CONFIG_EXPERT require > > CONFIG_DEBUG_KERNEL, it doesn't apply for all features. > > > > It looks like CONFIG_KALLSYMS_ALL is the only feature that > > requires CONFIG_DEBUG_KERNEL. > > > > Select CONFIG_EXPERT when CONFIG_DEBUG_KERNEL is chosen but > > you can still choose CONFIG_EXPERT without CONFIG_DEBUG_KERNEL. > > > > Signed-off-by: Sinan Kaya > > Reviewed-by: Kees Cook > > Masahiro, should this go via your tree, or somewhere else? I think somewhere else. -- Best Regards Masahiro Yamada
Re: [External] Re: Basics : Memory Configuration
From: Christopher Lameter Sent: 09 April 2019 21:31 To: Pankaj Suryawanshi Cc: linux-kernel@vger.kernel.org; linux...@kvack.org Subject: [External] Re: Basics : Memory Configuration On Tue, 9 Apr 2019, Pankaj Suryawanshi wrote: > I am confused about memory configuration and I have the below questions Hmmm... Yes, some of the terminology that you use is a bit confusing. > 1. If a 32-bit OS's maximum virtual address is 4GB, when I have 4 GB of RAM > for a 32-bit OS, what about the virtual memory size? Is virtual memory > (disk space) required, or can we directly use physical memory? The virtual memory size is the maximum virtual size of a single process. Multiple processes can run and each can use different amounts of physical memory. So both are actually independent. Okay, got it. The size of the virtual memory space per process is configurable on x86 32-bit (2G, 3G, 4G). Thus the possible virtual process size may vary depending on the hardware architecture and the configuration of the kernel. More questions: - Q. If I configure VMSPLIT = 2G/2G, what does it mean? - Q. Is disk space used by virtual memory? If this is true, then without secondary storage there is no virtual memory? Let's say for a 32-bit OS I have 4GB RAM; then what is the use case of virtual memory? > 2. In a 32-bit OS 12 bits are offset because page size=4k i.e. 2^12 and > 2^20 for page addresses. What about a 64-bit OS? What is the offset size? What is the page size? How is it > calculated? 12 bits are passed through? That's what you mean? The remainder of the bits are used to look up the physical frame number (PFN) in the page tables. 64-bit is the same. However, the number of bits used for lookups in the page tables is much higher. - Q. For a 32-bit OS the page size is 4k; what is the page size for a 64-bit OS? Are page size and offset related to each other? - Q. If I increase the page size from 4k to 8k, does it change the offset size, that is, from 2^12 to 2^13? - Q. Why are only 48 bits used in a 64-bit OS? > 3. What is PAE?
If enabled, how do we decide the size of PAE? What is the maximum > and minimum size of extended memory? PAE increases the physical memory size that can be addressed through a page table lookup. The number of bits that can be specified in the PFN is increased and thus more than 4GB of physical memory can be used by the operating system. However, the virtual memory size stays the same and an individual process still cannot use more memory. - Q. Let's say I enabled PAE for a 32-bit OS with 6GB RAM. The virtual size is the same 4GB, and a 32-bit OS can't address more than 4GB, so what is the use of 6GB with PAE enabled?
Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default
On Thu, Apr 11, 2019 at 9:59 AM Sinan Kaya wrote: > > We can't seem to have a kernel with CONFIG_EXPERT set but > CONFIG_DEBUG_KERNEL unset these days. > > While some of the features under the CONFIG_EXPERT require > CONFIG_DEBUG_KERNEL, it doesn't apply for all features. > > It looks like CONFIG_KALLSYMS_ALL is the only feature that > requires CONFIG_DEBUG_KERNEL. Which part of KALLSYMS_ALL code requires CONFIG_DEBUG_KERNEL? > Select CONFIG_EXPERT when CONFIG_DEBUG_KERNEL is chosen but > you can still choose CONFIG_EXPERT without CONFIG_DEBUG_KERNEL. > > Signed-off-by: Sinan Kaya > Reviewed-by: Kees Cook > --- > init/Kconfig | 2 -- > lib/Kconfig.debug | 1 + > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/init/Kconfig b/init/Kconfig > index 4592bf7997c0..37e10a8391a3 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1206,8 +1206,6 @@ config BPF > > menuconfig EXPERT > bool "Configure standard kernel features (expert users)" > - # Unhide debug options, to make the on-by-default options visible > - select DEBUG_KERNEL > help > This option allows certain base kernel options and settings >to be disabled or tweaked. This is for specialized > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index 0d9e81779e37..9fbf3499ec8d 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -434,6 +434,7 @@ config MAGIC_SYSRQ_SERIAL > > config DEBUG_KERNEL > bool "Kernel debugging" > + default EXPERT > help > Say Y here if you are developing drivers or trying to debug and > identify kernel problems. > -- > 2.21.0 > -- Best Regards Masahiro Yamada
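The behavioural difference between the removed `select` and the added `default EXPERT` in the hunks above is the crux of the patch: `select` forces the selected symbol on whenever the selecting symbol is enabled (the user cannot turn it off), while `default` only presets a value the user may still change. A minimal sketch with hypothetical symbols (not from the kernel tree):

```
config A
	bool "Feature A"
	select B	# B is forced on whenever A is on; user cannot unset it

config B
	bool "Feature B"
	default A	# B defaults to A's value, but the user may still unset it
```

So after the patch, enabling EXPERT still makes DEBUG_KERNEL visible and on by default, but a kernel with EXPERT=y and DEBUG_KERNEL=n finally becomes configurable.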
Re: \\ 答复: [PATCH] of: del redundant type conversion
On 4/10/19 9:21 PM, Frank Rowand wrote: > On 4/10/19 9:13 PM, Frank Rowand wrote: >> On 4/10/19 6:51 PM, xiaojiangfeng wrote: >>> My pleasure. >>> >>> I am very new to sparse. >>> >>> I guess the warning is caused by the macro min. >> >> I think the warning is likely because the type of data is 'void *'. >> >> Removing the (int) cast is a good fix, but does not resolve >> the sparse warning. > > Let me correct myself. When I ran sparse, I saw that removing min() does > eliminate the sparse warning. I'm not sure why, so I'll go dig a little > deeper. Digging leaves me with more information, but still not sure of the actual underlying cause. min() is defined in include/linux/kernel.h. Unraveling the defines, the code that sparse is complaining about is in __no_side_effects(), which is: #define __no_side_effects(x, y) \ (__is_constexpr(x) && __is_constexpr(y)) and __is_constexpr() is: #define __is_constexpr(x) \ (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8))) The compiler warning points to the second sizeof() in the __is_constexpr() for 'l', which expands as: (sizeof(int) == sizeof(*(8 ? ((void *)((long)( l) * 0l)) : (int *)8))) I'll dig into this a little more, to see if maybe the problem is related to my compiler version or sparse version. Or if the reason lies elsewhere. -Frank > > -Frank > >> >> -Frank >> >> >>> Then I submitted my changes. >>> >>> Thanks for the code review. >>> >>> >>> -----Original Message----- >>> From: Frank Rowand [mailto:frowand.l...@gmail.com] >>> Sent: April 11, 2019 2:50 >>> To: xiaojiangfeng ; robh...@kernel.org; >>> r...@kernel.org >>> Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org >>> Subject: Re: [PATCH] of: del redundant type conversion >>> >>> On 4/10/19 1:29 AM, xiaojiangfeng wrote: The type of variable l in early_init_dt_scan_chosen is int, there is no need to convert to int. 
Signed-off-by: xiaojiangfeng --- drivers/of/fdt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 4734223..de893c9 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, /* Retrieve command line */ p = of_get_flat_dt_prop(node, "bootargs", ); if (p != NULL && l > 0) - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); + strlcpy(data, p, min(l, COMMAND_LINE_SIZE)); /* * CONFIG_CMDLINE is meant to be a default in case nothing else >>> >>> Thanks for catching the redundant cast. >>> >>> There is a second problem detected by sparse on that line: >>> >>> drivers/of/fdt.c:1094:34: warning: expression using sizeof(void) >>> >>> Can you please fix both issues? >>> >>> Thanks, >>> >>> Frank >>> >> >> > >
Re: [PATCH 1/2] soc: imx: gpc: use devm_platform_ioremap_resource() to simplify code
On Mon, Apr 01, 2019 at 06:07:08AM +, Anson Huang wrote: > Use the new helper devm_platform_ioremap_resource() which wraps the > platform_get_resource() and devm_ioremap_resource() together, to > simplify the code. > > Signed-off-by: Anson Huang Applied both, thanks.
[PATCH] arm64: dts: ls1028a: Add USB dt nodes
This patch adds USB dt nodes for LS1028A. Signed-off-by: Ran Wang --- arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 20 1 files changed, 20 insertions(+), 0 deletions(-) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi index 8dd3501..d4bc314 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi @@ -144,6 +144,26 @@ clocks = <>; }; + usb0:usb3@310 { + compatible= "snps,dwc3"; + reg= <0x0 0x310 0x0 0x1>; + interrupts= <0 80 0x4>; + dr_mode= "host"; + snps,dis_rxdet_inp3_quirk; + snps,quirk-frame-length-adjustment = <0x20>; + snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>; + }; + + usb1:usb3@311 { + compatible= "snps,dwc3"; + reg= <0x0 0x311 0x0 0x1>; + interrupts= <0 81 0x4>; + dr_mode= "host"; + snps,dis_rxdet_inp3_quirk; + snps,quirk-frame-length-adjustment = <0x20>; + snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>; + }; + i2c0: i2c@200 { compatible = "fsl,vf610-i2c"; #address-cells = <1>; -- 1.7.1
Re: [PATCH 2/2] x86/pci: Clean up usage of X86_DEV_DMA_OPS
On Wed, Apr 10, 2019 at 04:45:01PM -0500, Bjorn Helgaas wrote: > [+cc Keith, Jonathan (VMD guys)] > > I'm OK with this from a PCI perspective. It would be nice if > > dma_domain_list > dma_domain_list_lock > add_dma_domain() > del_dma_domain() > set_dma_domain_ops() > > could all be moved to vmd.c, since they're really only used there. I have another patch to eventually kill that, but it will need a little more prep work and thus be delayed to the next merge window.
Re: [PATCH] clk: imx: use devm_platform_ioremap_resource() to simplify code
On Mon, Apr 01, 2019 at 05:13:02AM +, Anson Huang wrote: > Use the new helper devm_platform_ioremap_resource() which wraps the > platform_get_resource() and devm_ioremap_resource() together, to > simplify the code. > > Signed-off-by: Anson Huang Applied, thanks.
Re: [PATCH] of: del redundant type conversion
On 4/10/19 1:29 AM, xiaojiangfeng wrote: > The type of variable l in early_init_dt_scan_chosen is > int, there is no need to convert to int. > > Signed-off-by: xiaojiangfeng > --- > drivers/of/fdt.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 4734223..de893c9 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long > node, const char *uname, > /* Retrieve command line */ > p = of_get_flat_dt_prop(node, "bootargs", ); > if (p != NULL && l > 0) > - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); > + strlcpy(data, p, min(l, COMMAND_LINE_SIZE)); > > /* >* CONFIG_CMDLINE is meant to be a default in case nothing else > My first reply to this patch asked for a sparse warning on this line to also be fixed. After xiaojiangfeng followed up with a different patch to try to resolve the issues with this line of code, I see that the sparse warning was not caused by my first conjecture and this patch is the correct one to apply. I will pursue the cause of the sparse warning myself separately. Reviewed-by: Frank Rowand
Re: [PATCH] of: fix expression using sizeof(void)
On 4/10/19 6:47 PM, xiaojiangfeng wrote: > problem detected by sparse: > drivers/of/fdt.c:1094:34: warning: expression using sizeof(void) > > Signed-off-by: xiaojiangfeng > --- > drivers/of/fdt.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 4734223..75c6c55 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long > node, const char *uname, > /* Retrieve command line */ > p = of_get_flat_dt_prop(node, "bootargs", ); > if (p != NULL && l > 0) > - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); > + strlcpy(data, p, COMMAND_LINE_SIZE); > > /* >* CONFIG_CMDLINE is meant to be a default in case nothing else > The fuller discussion is in the thread where you first attempted to fix an issue with the line of code and I reported a sparse error against this line. After digging deeper, your first patch is valid, removing min() here is not the correct approach. I will add my Reviewed-by to the first patch and I will pursue the sparse warning separately. Thanks, Frank
Re: [PATCH v1 00/15] Refactor pgalloc stuff
Christophe Leroy writes: > This series converts book3e64 to pte_fragment and refactor > things that are common among subarches. > > Christophe Leroy (15): > powerpc/mm: drop __bad_pte() > powerpc/mm: define __pud_free_tlb() at all time on nohash/64 > powerpc/mm: convert Book3E 64 to pte_fragment > powerpc/mm: move pgtable_t in asm/mmu.h > powerpc/mm: get rid of nohash/32/mmu.h and nohash/64/mmu.h > powerpc/Kconfig: select PPC_MM_SLICES from subarch type > powerpc/book3e: move early_alloc_pgtable() to init section > powerpc/mm: don't use pte_alloc_kernel() until slab is available on > PPC32 > powerpc/mm: inline pte_alloc_one_kernel() and pte_alloc_one() on PPC32 > powerpc/mm: refactor pte_alloc_one() and pte_free() families > definition. > powerpc/mm: refactor definition of pgtable_cache[] > powerpc/mm: Only keep one version of pmd_populate() functions on > nohash/32 > powerpc/mm: refactor pgtable freeing functions on nohash > powerpc/mm: refactor pmd_pgtable() > powerpc/mm: refactor pgd_alloc() and pgd_free() on nohash > > arch/powerpc/include/asm/book3s/32/mmu-hash.h | 4 - > arch/powerpc/include/asm/book3s/32/pgalloc.h | 41 - > arch/powerpc/include/asm/book3s/64/mmu.h | 8 -- > arch/powerpc/include/asm/book3s/64/pgalloc.h | 49 -- > arch/powerpc/include/asm/mmu.h| 3 + > arch/powerpc/include/asm/mmu_context.h| 6 -- > arch/powerpc/include/asm/nohash/32/mmu.h | 25 -- > arch/powerpc/include/asm/nohash/32/pgalloc.h | 123 > ++ > arch/powerpc/include/asm/nohash/64/mmu.h | 12 --- > arch/powerpc/include/asm/nohash/64/pgalloc.h | 117 +--- > arch/powerpc/include/asm/nohash/mmu.h | 16 +++- > arch/powerpc/include/asm/nohash/pgalloc.h | 56 > arch/powerpc/include/asm/pgalloc.h| 51 +++ > arch/powerpc/mm/Makefile | 4 +- > arch/powerpc/mm/mmu_context.c | 2 +- > arch/powerpc/mm/pgtable-book3e.c | 4 +- > arch/powerpc/mm/pgtable_32.c | 42 + > arch/powerpc/platforms/Kconfig.cputype| 4 +- > 18 files changed, 165 insertions(+), 402 deletions(-) > delete mode 100644 
arch/powerpc/include/asm/nohash/32/mmu.h > delete mode 100644 arch/powerpc/include/asm/nohash/64/mmu.h > > -- > 2.13.3 Looks good. You can add for the series Reviewed-by: Aneesh Kumar K.V
Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects
On Thu, Apr 11, 2019 at 05:47:46AM +0100, Al Viro wrote: > On Thu, Apr 11, 2019 at 12:48:21PM +1000, Tobin C. Harding wrote: > > > Oh, so putting entries on a shrink list is enough to pin them? > > Not exactly pin, but __dentry_kill() has this: > if (dentry->d_flags & DCACHE_SHRINK_LIST) { > dentry->d_flags |= DCACHE_MAY_FREE; > can_free = false; > } > spin_unlock(>d_lock); > if (likely(can_free)) > dentry_free(dentry); > and shrink_dentry_list() - this: > if (dentry->d_lockref.count < 0) > can_free = dentry->d_flags & DCACHE_MAY_FREE; > spin_unlock(>d_lock); > if (can_free) > dentry_free(dentry); > continue; > so if dentry destruction comes before we get around to > shrink_dentry_list(), it'll stop short of dentry_free() and mark it for > shrink_dentry_list() to do just dentry_free(); if it overlaps with > shrink_dentry_list(), but doesn't progress all the way to freeing, > we will > * have dentry removed from shrink list > * notice the negative ->d_count (i.e. that it has already reached > __dentry_kill()) > * see that __dentry_kill() is not through with tearing the sucker > apart (no DCACHE_MAY_FREE set) > ... and just leave it alone, letting __dentry_kill() do the rest of its > thing - it's already off the shrink list, so __dentry_kill() will do > everything, including dentry_free(). > > The reason for that dance is the locking - shrink list belongs to whoever > has set it up and nobody else is modifying it. So __dentry_kill() doesn't > even try to remove the victim from there; it does all the teardown > (detaches from inode, unhashes, etc.) and leaves removal from the shrink > list and actual freeing to the owner of shrink list. That way we don't > have to protect all shrink lists a single lock (contention on it would > be painful) and we don't have to play with per-shrink-list locks and > all the attendant headaches (those lists usually live on stack frame > of some function, so just having the lock next to the list_head would > do us no good, etc.). 
Much easier to have the shrink_dentry_list() > do all the manipulations... > > The bottom line is, once it's on a shrink list, it'll stay there > until shrink_dentry_list(). It may get extra references after > being inserted there (e.g. be found by hash lookup), it may drop > those, whatever - it won't get freed until we run shrink_dentry_list(). > If it ends up with extra references, no problem - shrink_dentry_list() > will just kick it off the shrink list and leave it alone. > > Note, BTW, that umount coming between isolate and drop is not a problem; > it call shrink_dcache_parent() on the root. And if shrink_dcache_parent() > finds something on (another) shrink list, it won't put it to the shrink > list of its own, but it will make note of that and repeat the scan in > such case. So if we find something with zero refcount and not on > shrink list, we can move it to our shrink list and be sure that its > superblock won't go away under us... Man, that was good to read. Thanks for taking the time to write this. Tobin
Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects
On Thu, Apr 11, 2019 at 12:48:21PM +1000, Tobin C. Harding wrote: > Oh, so putting entries on a shrink list is enough to pin them? Not exactly pin, but __dentry_kill() has this: if (dentry->d_flags & DCACHE_SHRINK_LIST) { dentry->d_flags |= DCACHE_MAY_FREE; can_free = false; } spin_unlock(>d_lock); if (likely(can_free)) dentry_free(dentry); and shrink_dentry_list() - this: if (dentry->d_lockref.count < 0) can_free = dentry->d_flags & DCACHE_MAY_FREE; spin_unlock(>d_lock); if (can_free) dentry_free(dentry); continue; so if dentry destruction comes before we get around to shrink_dentry_list(), it'll stop short of dentry_free() and mark it for shrink_dentry_list() to do just dentry_free(); if it overlaps with shrink_dentry_list(), but doesn't progress all the way to freeing, we will * have dentry removed from shrink list * notice the negative ->d_count (i.e. that it has already reached __dentry_kill()) * see that __dentry_kill() is not through with tearing the sucker apart (no DCACHE_MAY_FREE set) ... and just leave it alone, letting __dentry_kill() do the rest of its thing - it's already off the shrink list, so __dentry_kill() will do everything, including dentry_free(). The reason for that dance is the locking - shrink list belongs to whoever has set it up and nobody else is modifying it. So __dentry_kill() doesn't even try to remove the victim from there; it does all the teardown (detaches from inode, unhashes, etc.) and leaves removal from the shrink list and actual freeing to the owner of shrink list. That way we don't have to protect all shrink lists a single lock (contention on it would be painful) and we don't have to play with per-shrink-list locks and all the attendant headaches (those lists usually live on stack frame of some function, so just having the lock next to the list_head would do us no good, etc.). Much easier to have the shrink_dentry_list() do all the manipulations... 
The bottom line is, once it's on a shrink list, it'll stay there until shrink_dentry_list(). It may get extra references after being inserted there (e.g. be found by hash lookup), it may drop those, whatever - it won't get freed until we run shrink_dentry_list(). If it ends up with extra references, no problem - shrink_dentry_list() will just kick it off the shrink list and leave it alone. Note, BTW, that umount coming between isolate and drop is not a problem; it calls shrink_dcache_parent() on the root. And if shrink_dcache_parent() finds something on (another) shrink list, it won't put it on a shrink list of its own, but it will make note of that and repeat the scan in such case. So if we find something with zero refcount and not on a shrink list, we can move it to our shrink list and be sure that its superblock won't go away under us...
[PATCH] vfs: update d_make_root() description
Clarify d_make_root() usage, error handling and cleanup requirements. Signed-off-by: Ian Kent --- Documentation/filesystems/porting | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index cf43bc4dbf31..1ebc1c6eb64b 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -428,8 +428,19 @@ release it yourself. -- [mandatory] d_alloc_root() is gone, along with a lot of bugs caused by code -misusing it. Replacement: d_make_root(inode). The difference is, -d_make_root() drops the reference to inode if dentry allocation fails. +misusing it. Replacement: d_make_root(inode). On success d_make_root(inode) +allocates and returns a new dentry instantiated with the passed in inode. +On failure NULL is returned and the passed in inode is dropped so failure +handling need not do any cleanup for the inode. If d_make_root(inode) +is passed a NULL inode it returns NULL and also requires no further +error handling. Typical usage is: + + inode = foofs_new_inode(); + s->s_root = d_make_root(inode); + if (!s->s_root) + /* Nothing needed for the inode cleanup */ + return -ENOMEM; + ... -- [mandatory]
Re: Reply: [PATCH] of: del redundant type conversion
On 4/10/19 9:13 PM, Frank Rowand wrote: > On 4/10/19 6:51 PM, xiaojiangfeng wrote: >> My pleasure. >> >> I am very new to sparse. >> >> I guess the warning is caused by the macro min. > > I think the warning is likely because the type of data is 'void *'. > > Removing the (int) cast is a good fix, but does not resolve > the sparse warning. Let me correct myself. When I ran sparse, I saw that removing min() does eliminate the sparse warning. I'm not sure why, so I'll go dig a little deeper. -Frank > > -Frank > > >> Then I submitted my changes. >> >> Thanks for code review. >> >> >> -----Original Message----- >> From: Frank Rowand [mailto:frowand.l...@gmail.com] >> Sent: April 11, 2019 2:50 >> To: xiaojiangfeng ; robh...@kernel.org; >> r...@kernel.org >> Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org >> Subject: Re: [PATCH] of: del redundant type conversion >> >> On 4/10/19 1:29 AM, xiaojiangfeng wrote: >>> The type of variable l in early_init_dt_scan_chosen is int, there is >>> no need to convert to int. >>> >>> Signed-off-by: xiaojiangfeng >>> --- >>> drivers/of/fdt.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index >>> 4734223..de893c9 100644 >>> --- a/drivers/of/fdt.c >>> +++ b/drivers/of/fdt.c >>> @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long >>> node, const char *uname, >>> /* Retrieve command line */ >>> p = of_get_flat_dt_prop(node, "bootargs", &l); >>> if (p != NULL && l > 0) >>> - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); >>> + strlcpy(data, p, min(l, COMMAND_LINE_SIZE)); >>> >>> /* >>> * CONFIG_CMDLINE is meant to be a default in case nothing else >>> >> >> Thanks for catching the redundant cast. >> >> There is a second problem detected by sparse on that line: >> >> drivers/of/fdt.c:1094:34: warning: expression using sizeof(void) >> >> Can you please fix both issues? >> >> Thanks, >> >> Frank >> > >
Re: Reply: [PATCH] of: del redundant type conversion
On 4/10/19 6:51 PM, xiaojiangfeng wrote: > My pleasure. > > I am very new to sparse. > > I guess the warning is caused by the macro min. I think the warning is likely because the type of data is 'void *'. Removing the (int) cast is a good fix, but does not resolve the sparse warning. -Frank > Then I submitted my changes. > > Thanks for code review. > > > -----Original Message----- > From: Frank Rowand [mailto:frowand.l...@gmail.com] > Sent: April 11, 2019 2:50 > To: xiaojiangfeng ; robh...@kernel.org; > r...@kernel.org > Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] of: del redundant type conversion > > On 4/10/19 1:29 AM, xiaojiangfeng wrote: >> The type of variable l in early_init_dt_scan_chosen is int, there is >> no need to convert to int. >> >> Signed-off-by: xiaojiangfeng >> --- >> drivers/of/fdt.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index >> 4734223..de893c9 100644 >> --- a/drivers/of/fdt.c >> +++ b/drivers/of/fdt.c >> @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long >> node, const char *uname, >> /* Retrieve command line */ >> p = of_get_flat_dt_prop(node, "bootargs", &l); >> if (p != NULL && l > 0) >> -strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); >> +strlcpy(data, p, min(l, COMMAND_LINE_SIZE)); >> >> /* >> * CONFIG_CMDLINE is meant to be a default in case nothing else >> > > Thanks for catching the redundant cast. > > There is a second problem detected by sparse on that line: > > drivers/of/fdt.c:1094:34: warning: expression using sizeof(void) > > Can you please fix both issues? > > Thanks, > > Frank >
Re: [PATCH v3 1/2] cpufreq: Add sunxi nvmem based CPU scaling driver
On 10-04-19, 13:41, Yangtao Li wrote: > For some SoCs, the CPU frequency subset and voltage value of each OPP > varies based on the silicon variant in use. The sunxi-cpufreq-nvmem > driver reads the efuse value from the SoC to provide the OPP framework > with required information. > > Signed-off-by: Yangtao Li > --- > MAINTAINERS | 7 + > drivers/cpufreq/Kconfig.arm | 10 ++ > drivers/cpufreq/Makefile | 1 + > drivers/cpufreq/cpufreq-dt-platdev.c | 2 + > drivers/cpufreq/sunxi-cpufreq-nvmem.c | 232 ++ > 5 files changed, 252 insertions(+) > create mode 100644 drivers/cpufreq/sunxi-cpufreq-nvmem.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index 391405091c6b..bfd18ba6aa1a 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -667,6 +667,13 @@ S: Maintained > F: Documentation/i2c/busses/i2c-ali1563 > F: drivers/i2c/busses/i2c-ali1563.c > > +ALLWINNER CPUFREQ DRIVER > +M: Yangtao Li > +L: linux...@vger.kernel.org > +S: Maintained > +F: Documentation/devicetree/bindings/opp/sunxi-nvmem-cpufreq.txt > +F: drivers/cpufreq/sunxi-cpufreq-nvmem.c > + > ALLWINNER SECURITY SYSTEM > M: Corentin Labbe > L: linux-cry...@vger.kernel.org > diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm > index 179a1d302f48..25933c4321a7 100644 > --- a/drivers/cpufreq/Kconfig.arm > +++ b/drivers/cpufreq/Kconfig.arm > @@ -18,6 +18,16 @@ config ACPI_CPPC_CPUFREQ > > If in doubt, say N. > > +config ARM_ALLWINNER_CPUFREQ_NVMEM > + tristate "Allwinner nvmem based CPUFreq" > + depends on ARCH_SUNXI > + depends on NVMEM_SUNXI_SID > + select PM_OPP > + help > + This adds the CPUFreq driver for Allwinner nvmem based SoC. > + > + If in doubt, say N. 
> + > config ARM_ARMADA_37XX_CPUFREQ > tristate "Armada 37xx CPUFreq support" > depends on ARCH_MVEBU && CPUFREQ_DT > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile > index 689b26c6f949..da28de67613c 100644 > --- a/drivers/cpufreq/Makefile > +++ b/drivers/cpufreq/Makefile > @@ -78,6 +78,7 @@ obj-$(CONFIG_ARM_SCMI_CPUFREQ) += > scmi-cpufreq.o > obj-$(CONFIG_ARM_SCPI_CPUFREQ) += scpi-cpufreq.o > obj-$(CONFIG_ARM_SPEAR_CPUFREQ) += spear-cpufreq.o > obj-$(CONFIG_ARM_STI_CPUFREQ)+= sti-cpufreq.o > +obj-$(CONFIG_ARM_ALLWINNER_CPUFREQ_NVMEM) += sunxi-cpufreq-nvmem.o > obj-$(CONFIG_ARM_TANGO_CPUFREQ) += tango-cpufreq.o > obj-$(CONFIG_ARM_TEGRA20_CPUFREQ)+= tegra20-cpufreq.o > obj-$(CONFIG_ARM_TEGRA124_CPUFREQ) += tegra124-cpufreq.o > diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c > b/drivers/cpufreq/cpufreq-dt-platdev.c > index 47729a22c159..50e7810f3a28 100644 > --- a/drivers/cpufreq/cpufreq-dt-platdev.c > +++ b/drivers/cpufreq/cpufreq-dt-platdev.c > @@ -105,6 +105,8 @@ static const struct of_device_id whitelist[] __initconst > = { > * platforms using "operating-points-v2" property. > */ > static const struct of_device_id blacklist[] __initconst = { > + { .compatible = "allwinner,sun50i-h6", }, > + > { .compatible = "calxeda,highbank", }, > { .compatible = "calxeda,ecx-2000", }, > > diff --git a/drivers/cpufreq/sunxi-cpufreq-nvmem.c > b/drivers/cpufreq/sunxi-cpufreq-nvmem.c > new file mode 100644 > index ..6bf4755d00d9 > --- /dev/null > +++ b/drivers/cpufreq/sunxi-cpufreq-nvmem.c > @@ -0,0 +1,232 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Allwinner CPUFreq nvmem based driver > + * > + * The sunxi-cpufreq-nvmem driver reads the efuse value from the SoC to > + * provide the OPP framework with required information. 
> + * > + * Copyright (C) 2019 Yangtao Li > + */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define MAX_NAME_LEN 7 > + > +struct sunxi_cpufreq_soc_data { > + u32 (*efuse_xlate)(const struct sunxi_cpufreq_soc_data *soc_data, > +u32 efuse); > + u32 nvmem_mask; > + u32 nvmem_shift; > +}; > + > +static struct platform_device *cpufreq_dt_pdev, *sunxi_cpufreq_pdev; > + > +static u32 sun50i_efuse_xlate(const struct sunxi_cpufreq_soc_data *soc_data, > + u32 efuse) > +{ > + return (efuse >> soc_data->nvmem_shift) & soc_data->nvmem_mask; > +} > + > +/** > + * sunxi_cpufreq_get_efuse() - Parse and return efuse value present on SoC > + * @soc_data: Pointer to sunxi_cpufreq_soc_data context > + * @versions: Set to the value parsed from efuse > + * > + * Returns 0 if success. > + */ > +static int sunxi_cpufreq_get_efuse(const struct sunxi_cpufreq_soc_data > *soc_data, > +u32 *versions) > +{ > + struct nvmem_cell *speedbin_nvmem; > + struct device_node *np; > + struct device *cpu_dev; > + u32 *speedbin; > + size_t len; > + int ret; > + > + cpu_dev
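The heart of the driver quoted above is the efuse_xlate step: shift the raw efuse word right, then mask off the speed-bin field. A standalone sketch of that bit extraction follows (the mask/shift values used in the test are illustrative, not the actual H6 fuse layout):

```c
#include <stdint.h>

/* Minimal stand-in for struct sunxi_cpufreq_soc_data. */
struct soc_data {
	uint32_t nvmem_mask;
	uint32_t nvmem_shift;
};

/* Extract the speed-bin field from a raw efuse word. */
static uint32_t efuse_xlate(const struct soc_data *sd, uint32_t efuse)
{
	return (efuse >> sd->nvmem_shift) & sd->nvmem_mask;
}
```

The driver then uses the extracted value to select a matching OPP table for the silicon variant in use.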
[v2 PATCH 6/9] mm: vmscan: don't demote for memcg reclaim
The memcg reclaim happens when the limit is breached, but demotion just migrates pages to the other node instead of reclaiming them. This sounds pointless to memcg reclaim since the usage is not reduced at all. Signed-off-by: Yang Shi --- mm/vmscan.c | 38 +- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2a96609..80cd624 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1046,8 +1046,12 @@ static void page_check_dirty_writeback(struct page *page, mapping->a_ops->is_dirty_writeback(page, dirty, writeback); } -static inline bool is_demote_ok(int nid) +static inline bool is_demote_ok(int nid, struct scan_control *sc) { + /* It is pointless to do demotion in memcg reclaim */ + if (!global_reclaim(sc)) + return false; + /* Current node is cpuless node */ if (!node_state(nid, N_CPU_MEM)) return false; @@ -1267,7 +1271,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, * Demotion only happen from primary nodes * to cpuless nodes. */ - if (is_demote_ok(page_to_nid(page))) { + if (is_demote_ok(page_to_nid(page), sc)) { list_add(&page->lru, _pages); unlock_page(page); continue; @@ -2219,7 +2223,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file, * deactivation is pointless. */ if (!file && !total_swap_pages && - !is_demote_ok(pgdat->node_id)) + !is_demote_ok(pgdat->node_id, sc)) return false; inactive = lruvec_lru_size(lruvec, inactive_lru, sc->reclaim_idx); @@ -2306,7 +2310,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, * * If current node is already PMEM node, demotion is not applicable. */ - if (!is_demote_ok(pgdat->node_id)) { + if (!is_demote_ok(pgdat->node_id, sc)) { /* * If we have no swap space, do not bother scanning * anon pages.
@@ -2315,18 +2319,18 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, scan_balance = SCAN_FILE; goto out; } + } - /* -* Global reclaim will swap to prevent OOM even with no -* swappiness, but memcg users want to use this knob to -* disable swapping for individual groups completely when -* using the memory controller's swap limit feature would be -* too expensive. -*/ - if (!global_reclaim(sc) && !swappiness) { - scan_balance = SCAN_FILE; - goto out; - } + /* +* Global reclaim will swap to prevent OOM even with no +* swappiness, but memcg users want to use this knob to +* disable swapping for individual groups completely when +* using the memory controller's swap limit feature would be +* too expensive. +*/ + if (!global_reclaim(sc) && !swappiness) { + scan_balance = SCAN_FILE; + goto out; } /* @@ -2675,7 +2679,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat, */ pages_for_compaction = compact_gap(sc->order); inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE); - if (get_nr_swap_pages() > 0 || is_demote_ok(pgdat->node_id)) + if (get_nr_swap_pages() > 0 || is_demote_ok(pgdat->node_id, sc)) inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON); if (sc->nr_reclaimed < pages_for_compaction && inactive_lru_pages > pages_for_compaction) @@ -3373,7 +3377,7 @@ static void age_active_anon(struct pglist_data *pgdat, struct mem_cgroup *memcg; /* Aging anon page as long as demotion is fine */ - if (!total_swap_pages && !is_demote_ok(pgdat->node_id)) + if (!total_swap_pages && !is_demote_ok(pgdat->node_id, sc)) return; memcg = mem_cgroup_iter(NULL, NULL, NULL); -- 1.8.3.1
[v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node
With Dave Hansen's patches merged into Linus's tree https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308fd01d9fb33a16f64d2fd95f8830a4 PMEM could be hot plugged as NUMA node now. But, how to use PMEM as NUMA node effectively and efficiently is still a question. There have been a couple of proposals posted on the mailing list [1] [2] [3]. Changelog = v1 --> v2: * Dropped the default allocation node mask. The memory placement restriction could be achieved by mempolicy or cpuset. * Dropped the new mempolicy since its semantic is not that clear yet. * Dropped PG_Promote flag. * Defined N_CPU_MEM nodemask for the nodes which have both CPU and memory. * Extended page_check_references() to implement "twice access" check for anonymous page in NUMA balancing path. * Reworked the memory demotion code. v1: https://lore.kernel.org/linux-mm/1553316275-21985-1-git-send-email-yang@linux.alibaba.com/ Design == Basically, the approach is aimed to spread data from DRAM (closest to local CPU) down further to PMEM and disk (typically assume the lower tier storage is slower, larger and cheaper than the upper tier) by their hotness. The patchset tries to achieve this goal by doing memory promotion/demotion via NUMA balancing and memory reclaim as what the below diagram shows: DRAM <--> PMEM <--> Disk ^ ^ |---| swap When DRAM has memory pressure, demote pages to PMEM via page reclaim path. Then NUMA balancing will promote pages to DRAM as long as the page is referenced again. The memory pressure on PMEM node would push the inactive pages of PMEM to disk via swap. The promotion/demotion happens only between "primary" nodes (the nodes have both CPU and memory) and PMEM nodes. No promotion/demotion between PMEM nodes and promotion from DRAM to PMEM and demotion from PMEM to DRAM. 
The HMAT is effectively going to enforce "cpu-less" nodes for any memory range that has differentiated performance from the conventional memory pool, or differentiated performance for a specific initiator, per Dan Williams. So, assuming PMEM nodes are cpuless nodes sounds reasonable. However, cpuless nodes might not be PMEM nodes. But, actually, memory promotion/demotion doesn't care what kind of memory the target nodes will be, it could be DRAM, PMEM or something else, as long as they are second-tier memory (slower, larger and cheaper than regular DRAM), otherwise it sounds pointless to do such demotion. Defined the "N_CPU_MEM" nodemask for the nodes which have both CPU and memory in order to distinguish them from cpuless nodes (memory only, i.e. PMEM nodes) and memoryless nodes (some architectures, i.e. Power, may have memoryless nodes). Typically, memory allocation would happen on such nodes by default unless cpuless nodes are specified explicitly; cpuless nodes would be just fallback nodes, so they are also known as "primary" nodes in this patchset. With a two-tier memory system (i.e. DRAM + PMEM), this sounds good enough to demonstrate the promotion/demotion approach for now, and this looks more architecture-independent. But it may be better to construct such a node mask by reading hardware information (i.e. HMAT), particularly for more complex memory hierarchies. To reduce memory thrashing and PMEM bandwidth pressure, promote twice-faulted pages in NUMA balancing. Implement the "twice access" check by extending page_check_references() for anonymous pages. When doing demotion, demote to the less-contended local PMEM node. If the local PMEM node is contended (i.e. migrate_pages() returns -ENOMEM), just do swap instead of demotion. To make things simple, demotion to the remote PMEM node is not allowed for now if the local PMEM node is online. If the local PMEM node is not online, just demote to the remote one. If no PMEM node is online, just do normal swap.
Anonymous pages only for the time being since NUMA balancing can't promote unmapped page cache. Added vmstat counters for pgdemote_kswapd, pgdemote_direct and numa_pages_promoted. There are definitely still some details that need to be sorted out, for example, whether to respect mempolicy in the demotion path, etc. Any comment is welcome. Test The stress test was done with mmtests + an applications workload (i.e. sysbench, grep, etc). Generate memory pressure by running mmtests' usemem-stress-numa-compact, then run other applications as workload to stress the promotion and demotion paths. The machine was still alive after the stress test had been running for ~30 hours. The /proc/vmstat also shows: ... pgdemote_kswapd 3316563 pgdemote_direct 1930721 ... numa_pages_promoted 81838 TODO 1. Promote page cache. There are a couple of ways to handle this in kernel, i.e. promote via active LRU in reclaim path on PMEM node, or promote in mark_page_accessed(). 2. Promote/demote HugeTLB. Now HugeTLB is not on LRU and NUMA balancing just skips it. 3. May
[v2 PATCH 7/9] mm: vmscan: check if the demote target node is contended or not
When demoting to a PMEM node, the target node may have memory pressure, and the memory pressure may cause migrate_pages() to fail. If the failure is caused by memory pressure (i.e. returning -ENOMEM), tag the node with PGDAT_CONTENDED. The tag would be cleared once the target node is balanced again. Check if the target node is PGDAT_CONTENDED or not; if it is, just skip demotion. Signed-off-by: Yang Shi --- include/linux/mmzone.h | 3 +++ mm/vmscan.c| 28 2 files changed, 31 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fba7741..de534db 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -520,6 +520,9 @@ enum pgdat_flags { * many pages under writeback */ PGDAT_RECLAIM_LOCKED, /* prevents concurrent reclaim */ + PGDAT_CONTENDED,/* the node has not enough free memory +* available +*/ }; enum zone_flags { diff --git a/mm/vmscan.c b/mm/vmscan.c index 80cd624..50cde53 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1048,6 +1048,9 @@ static void page_check_dirty_writeback(struct page *page, static inline bool is_demote_ok(int nid, struct scan_control *sc) { + int node; + nodemask_t used_mask; + /* It is pointless to do demotion in memcg reclaim */ if (!global_reclaim(sc)) return false; @@ -1060,6 +1063,13 @@ static inline bool is_demote_ok(int nid, struct scan_control *sc) if (!has_cpuless_node_online()) return false; + /* Check if the demote target node is contended or not */ + nodes_clear(used_mask); + node = find_next_best_node(nid, &used_mask, true); + + if (test_bit(PGDAT_CONTENDED, &NODE_DATA(node)->flags)) + return false; + return true; } @@ -1502,6 +1512,10 @@ static unsigned long shrink_page_list(struct list_head *page_list, nr_reclaimed += nr_succeeded; if (err) { + if (err == -ENOMEM) + set_bit(PGDAT_CONTENDED, + &NODE_DATA(target_nid)->flags); + putback_movable_pages(_pages); list_splice(_pages, _pages); @@ -2596,6 +2610,19 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc * scan target and the
percentage scanning already complete */ lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE; + + /* +* The shrink_page_list() may find the demote target node is +* contended, if so it doesn't make sense to scan anonymous +* LRU again. +* +* Need check if swap is available or not too since demotion +* may happen on swapless system. +*/ + if (!is_demote_ok(pgdat->node_id, sc) && + (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0)) + lru = LRU_FILE; + nr_scanned = targets[lru] - nr[lru]; nr[lru] = targets[lru] * (100 - percentage) / 100; nr[lru] -= min(nr[lru], nr_scanned); @@ -3458,6 +3485,7 @@ static void clear_pgdat_congested(pg_data_t *pgdat) clear_bit(PGDAT_CONGESTED, &pgdat->flags); clear_bit(PGDAT_DIRTY, &pgdat->flags); clear_bit(PGDAT_WRITEBACK, &pgdat->flags); + clear_bit(PGDAT_CONTENDED, &pgdat->flags); } /* -- 1.8.3.1
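The PGDAT_CONTENDED flag in this patch follows a simple set/test/clear lifecycle: set when migrate_pages() returns -ENOMEM, tested in is_demote_ok(), and cleared once the node is balanced again. A userspace sketch of that lifecycle using plain bit operations (the kernel uses the atomic set_bit/test_bit/clear_bit helpers; names below are illustrative):

```c
enum { PGDAT_CONTENDED_BIT = 0 };	/* illustrative bit position */

struct pgdat_model {
	unsigned long flags;
};

/* migrate_pages() returned -ENOMEM: mark the target node contended. */
static void mark_contended(struct pgdat_model *pgdat)
{
	pgdat->flags |= 1UL << PGDAT_CONTENDED_BIT;
}

/* is_demote_ok() style check: demotion is allowed only when clear. */
static int demote_ok(const struct pgdat_model *pgdat)
{
	return !(pgdat->flags & (1UL << PGDAT_CONTENDED_BIT));
}

/* clear_pgdat_congested() style reset once the node is balanced. */
static void clear_contended(struct pgdat_model *pgdat)
{
	pgdat->flags &= ~(1UL << PGDAT_CONTENDED_BIT);
}
```

Once the bit is set, subsequent reclaim passes fall back to swap instead of demotion until kswapd balances the PMEM node and clears it.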
[v2 PATCH 5/9] mm: vmscan: demote anon DRAM pages to PMEM node
Since PMEM provides larger capacity than DRAM and has much lower access latency than disk, it is a good choice to use as a middle tier between DRAM and disk in the page reclaim path. With PMEM nodes, the demotion path of anonymous pages could be: DRAM -> PMEM -> swap device This patch demotes anonymous pages only for the time being and demotes THP to PMEM as a whole. To avoid expensive page reclaim and/or compaction on the PMEM node if there is memory pressure on it, the most conservative gfp flag is used, which would fail quickly if there is memory pressure and just wake up kswapd on failure. The migrate_pages() would split THP to migrate one by one as base pages upon THP allocation failure. Demote pages to the closest non-DRAM node even though the system is swapless. The current logic of page reclaim just scans the anon LRU when swap is on and swappiness is set properly. Demoting to PMEM doesn't need to care whether swap is available or not. But, reclaiming from PMEM still skips the anon LRU if swap is not available. The demotion just happens from a DRAM node to its closest PMEM node. Demoting to a remote PMEM node or migrating from PMEM to DRAM on reclaim is not allowed for now. And, define a new migration reason for demotion, called MR_DEMOTE. Demote pages via async migration to avoid blocking. Signed-off-by: Yang Shi --- include/linux/gfp.h| 12 include/linux/migrate.h| 1 + include/trace/events/migrate.h | 3 +- mm/debug.c | 1 + mm/internal.h | 13 + mm/migrate.c | 15 - mm/vmscan.c| 127 +++-- 7 files changed, 149 insertions(+), 23 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index fdab7de..57ced51 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -285,6 +285,14 @@ * available and will not wake kswapd/kcompactd on failure. The _LIGHT * version does not attempt reclaim/compaction at all and is by default used * in page fault path, while the non-light is used by khugepaged. + * + * %GFP_DEMOTE is for migration on memory reclaim (a.k.a demotion) allocations.
+ * The allocation might happen in kswapd or direct reclaim, so assuming + * __GFP_IO and __GFP_FS are not allowed looks safer. Demotion happens for + * user pages (on LRU) only and on specific node. Generally it will fail + * quickly if memory is not available, but may wake up kswapd on failure. + * + * %GFP_TRANSHUGE_DEMOTE is used for THP demotion allocation. */ #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) @@ -300,6 +308,10 @@ #define GFP_TRANSHUGE_LIGHT((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \ __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM) #define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) +#define GFP_DEMOTE (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_NORETRY | \ + __GFP_NOMEMALLOC | __GFP_NOWARN | __GFP_THISNODE | \ + GFP_NOWAIT) +#define GFP_TRANSHUGE_DEMOTE (GFP_DEMOTE | __GFP_COMP) /* Convert GFP flags to their corresponding migrate type */ #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 837fdd1..cfb1f57 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -25,6 +25,7 @@ enum migrate_reason { MR_MEMPOLICY_MBIND, MR_NUMA_MISPLACED, MR_CONTIG_RANGE, + MR_DEMOTE, MR_TYPES }; diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h index 705b33d..c1d5b36 100644 --- a/include/trace/events/migrate.h +++ b/include/trace/events/migrate.h @@ -20,7 +20,8 @@ EM( MR_SYSCALL, "syscall_or_cpuset")\ EM( MR_MEMPOLICY_MBIND, "mempolicy_mbind") \ EM( MR_NUMA_MISPLACED, "numa_misplaced") \ - EMe(MR_CONTIG_RANGE,"contig_range") + EM( MR_CONTIG_RANGE,"contig_range") \ + EMe(MR_DEMOTE, "demote") /* * First define the enums in the above macros to be exported to userspace diff --git a/mm/debug.c b/mm/debug.c index c0b31b6..cc0d7df 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -25,6 +25,7 @@ "mempolicy_mbind", "numa_misplaced", "cma", + "demote", }; const struct 
trace_print_flags pageflag_names[] = { diff --git a/mm/internal.h b/mm/internal.h index bee4d6c..8c424b5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -383,6 +383,19 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask, } #endif +static inline bool has_cpuless_node_online(void) +{ + nodemask_t nmask; + + nodes_andnot(nmask, node_states[N_MEMORY], +node_states[N_CPU_MEM]); + + if (nodes_empty(nmask)) + return false; + + return true; +}
[v2 PATCH 9/9] mm: numa: add page promotion counter
Add counter for page promotion for NUMA balancing. Signed-off-by: Yang Shi --- include/linux/vm_event_item.h | 1 + mm/huge_memory.c | 4 mm/memory.c | 4 mm/vmstat.c | 1 + 4 files changed, 10 insertions(+) diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 499a3aa..9f52a62 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -51,6 +51,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, NUMA_HINT_FAULTS, NUMA_HINT_FAULTS_LOCAL, NUMA_PAGE_MIGRATE, + NUMA_PAGE_PROMOTE, #endif #ifdef CONFIG_MIGRATION PGMIGRATE_SUCCESS, PGMIGRATE_FAIL, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 0b18ac45..ca9d688 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1609,6 +1609,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma, vmf->pmd, pmd, vmf->address, page, target_nid); if (migrated) { + if (!node_state(page_nid, N_CPU_MEM) && + node_state(target_nid, N_CPU_MEM)) + count_vm_numa_events(NUMA_PAGE_PROMOTE, HPAGE_PMD_NR); + flags |= TNF_MIGRATED; page_nid = target_nid; } else diff --git a/mm/memory.c b/mm/memory.c index 01c1ead..7b1218b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3704,6 +3704,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) /* Migrate to the requested node */ migrated = migrate_misplaced_page(page, vma, target_nid); if (migrated) { + if (!node_state(page_nid, N_CPU_MEM) && + node_state(target_nid, N_CPU_MEM)) + count_vm_numa_event(NUMA_PAGE_PROMOTE); + page_nid = target_nid; flags |= TNF_MIGRATED; } else diff --git a/mm/vmstat.c b/mm/vmstat.c index d1e4993..fd194e3 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1220,6 +1220,7 @@ int fragmentation_index(struct zone *zone, unsigned int order) "numa_hint_faults", "numa_hint_faults_local", "numa_pages_migrated", + "numa_pages_promoted", #endif #ifdef CONFIG_MIGRATION "pgmigrate_success", -- 1.8.3.1
[v2 PATCH 1/9] mm: define N_CPU_MEM node states
Kernel has some pre-defined node masks called node states, i.e. N_MEMORY, N_CPU, etc. But, there might be cpuless nodes, i.e. PMEM nodes, and some architectures, i.e. Power, may have memoryless nodes. It is not very straightforward to get the nodes with both CPUs and memory. So, define the N_CPU_MEM node state. The nodes with both CPUs and memory are called "primary" nodes. /sys/devices/system/node/primary would show the current online "primary" nodes. Signed-off-by: Yang Shi --- drivers/base/node.c | 2 ++ include/linux/nodemask.h | 3 ++- mm/memory_hotplug.c | 6 ++ mm/page_alloc.c | 1 + mm/vmstat.c | 11 +-- 5 files changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index 86d6cd9..1b963b2 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -634,6 +634,7 @@ static ssize_t show_node_state(struct device *dev, #endif [N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY), [N_CPU] = _NODE_ATTR(has_cpu, N_CPU), + [N_CPU_MEM] = _NODE_ATTR(primary, N_CPU_MEM), }; static struct attribute *node_state_attrs[] = { @@ -645,6 +646,7 @@ static ssize_t show_node_state(struct device *dev, #endif _state_attr[N_MEMORY].attr.attr, _state_attr[N_CPU].attr.attr, + _state_attr[N_CPU_MEM].attr.attr, NULL }; diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 27e7fa3..66a8964 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -398,7 +398,8 @@ enum node_states { N_HIGH_MEMORY = N_NORMAL_MEMORY, #endif N_MEMORY, /* The node has memory(regular, high, movable) */ - N_CPU, /* The node has one or more cpus */ + N_CPU, /* The node has one or more cpus */ + N_CPU_MEM, /* The node has both cpus and memory */ NR_NODE_STATES }; diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index f767582..1140f3b 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -729,6 +729,9 @@ static void node_states_set_node(int node, struct memory_notify *arg) if (arg->status_change_nid >= 0) node_set_state(node, N_MEMORY);
+ + if (node_state(node, N_CPU)) + node_set_state(node, N_CPU_MEM); } static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn, @@ -1569,6 +1572,9 @@ static void node_states_clear_node(int node, struct memory_notify *arg) if (arg->status_change_nid >= 0) node_clear_state(node, N_MEMORY); + + if (node_state(node, N_CPU)) + node_clear_state(node, N_CPU_MEM); } static int __ref __offline_pages(unsigned long start_pfn, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 03fcf73..7cd88a4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -122,6 +122,7 @@ struct pcpu_drain { #endif [N_MEMORY] = { { [0] = 1UL } }, [N_CPU] = { { [0] = 1UL } }, + [N_CPU_MEM] = { { [0] = 1UL } }, #endif /* NUMA */ }; EXPORT_SYMBOL(node_states); diff --git a/mm/vmstat.c b/mm/vmstat.c index 36b56f8..1a431dc 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1910,15 +1910,22 @@ static void __init init_cpu_node_state(void) int node; for_each_online_node(node) { - if (cpumask_weight(cpumask_of_node(node)) > 0) + if (cpumask_weight(cpumask_of_node(node)) > 0) { node_set_state(node, N_CPU); + if (node_state(node, N_MEMORY)) + node_set_state(node, N_CPU_MEM); + } } } static int vmstat_cpu_online(unsigned int cpu) { + int node = cpu_to_node(cpu); + refresh_zone_stat_thresholds(); - node_set_state(cpu_to_node(cpu), N_CPU); + node_set_state(node, N_CPU); + if (node_state(node, N_MEMORY)) + node_set_state(node, N_CPU_MEM); return 0; } -- 1.8.3.1
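The bookkeeping this patch adds boils down to keeping N_CPU_MEM equal to the intersection of "has CPUs" and "has memory". Modelling the node state masks as plain unsigned bitmasks (node numbering and function names here are illustrative, not the kernel's nodemask API), the update rules look like:

```c
/* One bit per node, modelling node_states[] with plain unsigned masks. */
static unsigned n_cpu_mask, n_memory_mask, n_cpu_mem_mask;

static void model_node_set_cpu(int node)
{
	n_cpu_mask |= 1U << node;
	if (n_memory_mask & (1U << node))	/* both CPU and memory now */
		n_cpu_mem_mask |= 1U << node;
}

static void model_node_set_memory(int node)
{
	n_memory_mask |= 1U << node;
	if (n_cpu_mask & (1U << node))
		n_cpu_mem_mask |= 1U << node;
}

static void model_node_clear_memory(int node)
{
	n_memory_mask &= ~(1U << node);
	/* Losing memory always drops the node out of the primary set. */
	n_cpu_mem_mask &= ~(1U << node);
}
```

A cpuless (PMEM-like) node or a memoryless node never enters the primary mask, which is exactly why demotion targets can be found by subtracting N_CPU_MEM from N_MEMORY as the later has_cpuless_node_online() helper does.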
[v2 PATCH 8/9] mm: vmscan: add page demotion counter
Account the number of demoted pages into reclaim_state->nr_demoted. Add pgdemote_kswapd and pgdemote_direct VM counters showed in /proc/vmstat. Signed-off-by: Yang Shi --- include/linux/vm_event_item.h | 2 ++ include/linux/vmstat.h| 1 + mm/internal.h | 1 + mm/vmscan.c | 7 +++ mm/vmstat.c | 2 ++ 5 files changed, 13 insertions(+) diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 47a3441..499a3aa 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -32,6 +32,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, PGREFILL, PGSTEAL_KSWAPD, PGSTEAL_DIRECT, + PGDEMOTE_KSWAPD, + PGDEMOTE_DIRECT, PGSCAN_KSWAPD, PGSCAN_DIRECT, PGSCAN_DIRECT_THROTTLE, diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 2db8d60..eb5d21c 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -29,6 +29,7 @@ struct reclaim_stat { unsigned nr_activate; unsigned nr_ref_keep; unsigned nr_unmap_fail; + unsigned nr_demoted; }; #ifdef CONFIG_VM_EVENT_COUNTERS diff --git a/mm/internal.h b/mm/internal.h index 8c424b5..8ba4853 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -156,6 +156,7 @@ struct scan_control { unsigned int immediate; unsigned int file_taken; unsigned int taken; + unsigned int demoted; } nr; }; diff --git a/mm/vmscan.c b/mm/vmscan.c index 50cde53..a52c8248 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1511,6 +1511,12 @@ static unsigned long shrink_page_list(struct list_head *page_list, nr_reclaimed += nr_succeeded; + stat->nr_demoted = nr_succeeded; + if (current_is_kswapd()) + __count_vm_events(PGDEMOTE_KSWAPD, stat->nr_demoted); + else + __count_vm_events(PGDEMOTE_DIRECT, stat->nr_demoted); + if (err) { if (err == -ENOMEM) set_bit(PGDAT_CONTENDED, @@ -2019,6 +2025,7 @@ static int current_may_throttle(void) sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; sc->nr.writeback += stat.nr_writeback; sc->nr.immediate += stat.nr_immediate; + sc->nr.demoted += stat.nr_demoted; sc->nr.taken += 
nr_taken; if (file) sc->nr.file_taken += nr_taken; diff --git a/mm/vmstat.c b/mm/vmstat.c index 1a431dc..d1e4993 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1192,6 +1192,8 @@ int fragmentation_index(struct zone *zone, unsigned int order) "pgrefill", "pgsteal_kswapd", "pgsteal_direct", + "pgdemote_kswapd", + "pgdemote_direct", "pgscan_kswapd", "pgscan_direct", "pgscan_direct_throttle", -- 1.8.3.1
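The conditional accounting added to shrink_page_list() can be modeled in user space. The sketch below is illustrative only: the enum values and helper names mirror the patch, while the plain counter array stands in for the kernel's per-CPU vm_event state.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of the two new VM event counters added by this patch.
 * PGDEMOTE_KSWAPD / PGDEMOTE_DIRECT mirror the enum entries added to
 * include/linux/vm_event_item.h. */
enum vm_event { PGDEMOTE_KSWAPD, PGDEMOTE_DIRECT, NR_VM_EVENTS };

static unsigned long vm_events[NR_VM_EVENTS];

static void count_vm_events(enum vm_event item, unsigned long delta)
{
	vm_events[item] += delta;
}

/* Mirrors the shrink_page_list() hunk: credit the demoted pages to the
 * kswapd counter when reclaim runs in kswapd context, otherwise to the
 * direct-reclaim counter. */
static void account_demotion(unsigned long nr_demoted, int current_is_kswapd)
{
	if (current_is_kswapd)
		count_vm_events(PGDEMOTE_KSWAPD, nr_demoted);
	else
		count_vm_events(PGDEMOTE_DIRECT, nr_demoted);
}

static unsigned long read_event(enum vm_event item)
{
	return vm_events[item];
}
```

In the kernel the two counters then surface as the "pgdemote_kswapd" and "pgdemote_direct" lines in /proc/vmstat.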
[v2 PATCH 3/9] mm: numa: promote pages to DRAM when they get accessed twice
NUMA balancing would promote the pages to DRAM once it is accessed, but it might be just one off access. To reduce migration thrashing and memory bandwidth pressure, just promote the page which gets accessed twice by extending page_check_references() to support second reference algorithm for anonymous page. The page_check_reference() would walk all mapped pte or pmd to check if the page is referenced or not, but such walk sounds unnecessary to NUMA balancing since NUMA balancing would have pte or pmd referenced bit set all the time, so anonymous page would have at least one referenced pte or pmd. And, distinguish with page reclaim path via scan_control, scan_control would be NULL in NUMA balancing path. This approach is not definitely the optimal one to distinguish the hot or cold pages accurately. It may need much more sophisticated algorithm to distinguish hot or cold pages accurately. Signed-off-by: Yang Shi --- mm/huge_memory.c | 11 ++ mm/internal.h| 80 ++ mm/memory.c | 21 ++ mm/vmscan.c | 116 --- 4 files changed, 146 insertions(+), 82 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 404acdc..0b18ac45 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1590,6 +1590,17 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) } /* +* Promote the page when it gets NUMA fault twice. +* It is safe to set page flag since the page is locked now. +*/ + if (!node_state(page_nid, N_CPU_MEM) && + page_check_references(page, NULL) != PAGEREF_PROMOTE) { + put_page(page); + page_nid = NUMA_NO_NODE; + goto clear_pmdnuma; + } + + /* * Migrate the THP to the requested node, returns with page unlocked * and access rights restored. 
*/ diff --git a/mm/internal.h b/mm/internal.h index a514808..bee4d6c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -89,8 +89,88 @@ static inline void set_page_refcounted(struct page *page) /* * in mm/vmscan.c: */ +struct scan_control { + /* How many pages shrink_list() should reclaim */ + unsigned long nr_to_reclaim; + + /* +* Nodemask of nodes allowed by the caller. If NULL, all nodes +* are scanned. +*/ + nodemask_t *nodemask; + + /* +* The memory cgroup that hit its limit and as a result is the +* primary target of this reclaim invocation. +*/ + struct mem_cgroup *target_mem_cgroup; + + /* Writepage batching in laptop mode; RECLAIM_WRITE */ + unsigned int may_writepage:1; + + /* Can mapped pages be reclaimed? */ + unsigned int may_unmap:1; + + /* Can pages be swapped as part of reclaim? */ + unsigned int may_swap:1; + + /* e.g. boosted watermark reclaim leaves slabs alone */ + unsigned int may_shrinkslab:1; + + /* +* Cgroups are not reclaimed below their configured memory.low, +* unless we threaten to OOM. If any cgroups are skipped due to +* memory.low and nothing was reclaimed, go back for memory.low. 
+*/ + unsigned int memcg_low_reclaim:1; + unsigned int memcg_low_skipped:1; + + unsigned int hibernation_mode:1; + + /* One of the zones is ready for compaction */ + unsigned int compaction_ready:1; + + /* Allocation order */ + s8 order; + + /* Scan (total_size >> priority) pages at once */ + s8 priority; + + /* The highest zone to isolate pages for reclaim from */ + s8 reclaim_idx; + + /* This context's GFP mask */ + gfp_t gfp_mask; + + /* Incremented by the number of inactive pages that were scanned */ + unsigned long nr_scanned; + + /* Number of pages freed so far during a call to shrink_zones() */ + unsigned long nr_reclaimed; + + struct { + unsigned int dirty; + unsigned int unqueued_dirty; + unsigned int congested; + unsigned int writeback; + unsigned int immediate; + unsigned int file_taken; + unsigned int taken; + } nr; +}; + +enum page_references { + PAGEREF_RECLAIM, + PAGEREF_RECLAIM_CLEAN, + PAGEREF_KEEP, + PAGEREF_ACTIVATE, + PAGEREF_PROMOTE = PAGEREF_ACTIVATE, +}; + extern int isolate_lru_page(struct page *page); extern void putback_lru_page(struct page *page); +enum page_references page_check_references(struct page *page, + struct scan_control *sc); /* * in mm/rmap.c: diff --git a/mm/memory.c b/mm/memory.c index 47fe250..01c1ead 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3680,6 +3680,27 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) goto out; } + /* +* Promote
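The second-reference idea above can be sketched as a tiny state machine. This is a simplified model, not the patch's implementation: the real code extends page_check_references() (passing a NULL scan_control from the NUMA-balancing path) and relies on the pte/pmd referenced bits, whereas here a single per-page flag stands in for "was referenced before".

```c
#include <assert.h>

enum page_references_model { PAGEREF_KEEP_M, PAGEREF_PROMOTE_M };

struct page_model { int referenced; };

/* Sketch of the second-reference filter: the first NUMA fault only records
 * the access; promotion happens when a second fault sees the earlier
 * reference, filtering out one-off accesses. */
static enum page_references_model
check_second_reference(struct page_model *page)
{
	if (!page->referenced) {
		page->referenced = 1;	/* one-off access: keep where it is */
		return PAGEREF_KEEP_M;
	}
	return PAGEREF_PROMOTE_M;	/* accessed twice: promote to DRAM */
}
```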
[v2 PATCH 4/9] mm: migrate: make migrate_pages() return nr_succeeded
The migrate_pages() returns the number of pages that were not migrated, or an error code. When returning an error code, there is no way to know how many pages were migrated or not migrated. In the following patch, migrate_pages() is used to demote pages to PMEM node, we need account how many pages are reclaimed (demoted) since page reclaim behavior depends on this. Add *nr_succeeded parameter to make migrate_pages() return how many pages are demoted successfully for all cases. Signed-off-by: Yang Shi --- include/linux/migrate.h | 5 +++-- mm/compaction.c | 3 ++- mm/gup.c| 4 +++- mm/memory-failure.c | 7 +-- mm/memory_hotplug.c | 4 +++- mm/mempolicy.c | 7 +-- mm/migrate.c| 18 ++ mm/page_alloc.c | 4 +++- 8 files changed, 34 insertions(+), 18 deletions(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index e13d9bf..837fdd1 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -66,7 +66,8 @@ extern int migrate_page(struct address_space *mapping, struct page *newpage, struct page *page, enum migrate_mode mode); extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free, - unsigned long private, enum migrate_mode mode, int reason); + unsigned long private, enum migrate_mode mode, int reason, + unsigned int *nr_succeeded); extern int isolate_movable_page(struct page *page, isolate_mode_t mode); extern void putback_movable_page(struct page *page); @@ -84,7 +85,7 @@ extern int migrate_page_move_mapping(struct address_space *mapping, static inline void putback_movable_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t new, free_page_t free, unsigned long private, enum migrate_mode mode, - int reason) + int reason, unsigned int *nr_succeeded) { return -ENOSYS; } static inline int isolate_movable_page(struct page *page, isolate_mode_t mode) { return -EBUSY; } diff --git a/mm/compaction.c b/mm/compaction.c index f171a83..c6a0ec4 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ 
-2065,6 +2065,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order, unsigned long last_migrated_pfn; const bool sync = cc->mode != MIGRATE_ASYNC; bool update_cached; + unsigned int nr_succeeded = 0; cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask); ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags, @@ -2173,7 +2174,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order, err = migrate_pages(>migratepages, compaction_alloc, compaction_free, (unsigned long)cc, cc->mode, - MR_COMPACTION); + MR_COMPACTION, _succeeded); trace_mm_compaction_migratepages(cc->nr_migratepages, err, >migratepages); diff --git a/mm/gup.c b/mm/gup.c index f84e226..b482b8c 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1217,6 +1217,7 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages, long i; bool drain_allow = true; bool migrate_allow = true; + unsigned int nr_succeeded = 0; LIST_HEAD(cma_page_list); check_again: @@ -1257,7 +1258,8 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages, put_page(pages[i]); if (migrate_pages(_page_list, new_non_cma_page, - NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) { + NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE, + _succeeded)) { /* * some of the pages failed migration. Do get_user_pages * without migration. 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index fc8b517..b5d8a8f 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1686,6 +1686,7 @@ static int soft_offline_huge_page(struct page *page, int flags) int ret; unsigned long pfn = page_to_pfn(page); struct page *hpage = compound_head(page); + unsigned int nr_succeeded = 0; LIST_HEAD(pagelist); /* @@ -1713,7 +1714,7 @@ static int soft_offline_huge_page(struct page *page, int flags) } ret = migrate_pages(, new_page, NULL, MPOL_MF_MOVE_ALL, - MIGRATE_SYNC, MR_MEMORY_FAILURE); + MIGRATE_SYNC, MR_MEMORY_FAILURE, _succeeded); if (ret) { pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n", pfn, ret, page->flags, >flags); @@ -1742,6 +1743,7
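The API change is a common C out-parameter pattern: keep the existing return-value contract (pages not migrated, or an error code) and report partial progress through a pointer. A minimal model of that contract, with illustrative names:

```c
#include <assert.h>

/* Model of the changed migrate_pages() contract: the return value is still
 * "number of pages not migrated, or an error code", while *nr_succeeded now
 * reports successes in every case -- including error paths, where the old
 * return value alone told the caller nothing about partial progress. */
static int migrate_pages_model(int nr_pages, int nr_failing,
			       unsigned int *nr_succeeded)
{
	*nr_succeeded = nr_pages - nr_failing;
	return nr_failing;	/* 0 on full success */
}
```

This is why the demotion path in the later patch can account exactly how many pages were reclaimed even when some migrations fail.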
[v2 PATCH 2/9] mm: page_alloc: make find_next_best_node() return a cpuless node
We need to find the closest cpuless node to demote DRAM pages to. Add a
"cpuless" parameter to find_next_best_node() to skip DRAM nodes on demand.

Signed-off-by: Yang Shi
---
 mm/internal.h   | 11 +++
 mm/page_alloc.c | 14 ++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 9eeaf2b..a514808 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -292,6 +292,17 @@ static inline bool is_data_mapping(vm_flags_t flags)
 	return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE;
 }
+#ifdef CONFIG_NUMA
+extern int find_next_best_node(int node, nodemask_t *used_node_mask,
+			       bool cpuless);
+#else
+static inline int find_next_best_node(int node, nodemask_t *used_node_mask,
+				      bool cpuless)
+{
+	return 0;
+}
+#endif
+
 /* mm/util.c */
 void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct rb_node *rb_parent);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7cd88a4..bda17c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5362,6 +5362,7 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  * find_next_best_node - find the next node that should appear in a given node's fallback list
  * @node: node whose fallback list we're appending
  * @used_node_mask: nodemask_t of already used nodes
+ * @cpuless: find next best cpuless node
  *
  * We use a number of factors to determine which is the next node that should
  * appear on a given node's fallback list. The node should not have appeared
@@ -5373,7 +5374,8 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  *
  * Return: node id of the found node or %NUMA_NO_NODE if no node is found.
*/ -static int find_next_best_node(int node, nodemask_t *used_node_mask) +int find_next_best_node(int node, nodemask_t *used_node_mask, + bool cpuless) { int n, val; int min_val = INT_MAX; @@ -5381,13 +5383,18 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask) const struct cpumask *tmp = cpumask_of_node(0); /* Use the local node if we haven't already */ - if (!node_isset(node, *used_node_mask)) { + if (!node_isset(node, *used_node_mask) && + !cpuless) { node_set(node, *used_node_mask); return node; } for_each_node_state(n, N_MEMORY) { + /* Find next best cpuless node */ + if (cpuless && (node_state(n, N_CPU))) + continue; + /* Don't want a node to appear more than once */ if (node_isset(n, *used_node_mask)) continue; @@ -5419,7 +5426,6 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask) return best_node; } - /* * Build zonelists ordered by node and zones within node. * This results in maximum locality--normal zone overflows into local @@ -5481,7 +5487,7 @@ static void build_zonelists(pg_data_t *pgdat) nodes_clear(used_mask); memset(node_order, 0, sizeof(node_order)); - while ((node = find_next_best_node(local_node, _mask)) >= 0) { + while ((node = find_next_best_node(local_node, _mask, false)) >= 0) { /* * We don't want to pressure a particular node. * So adding penalty to the first node in same -- 1.8.3.1
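The skip-on-demand logic of the new "cpuless" flag can be shown with a toy model. This is not the kernel function: distance-based ordering and the used_node_mask/penalty bookkeeping are omitted, and a plain array stands in for node_state(n, N_CPU).

```c
#include <assert.h>

#define NR_NODES_MODEL 4

/* Toy model of the new "cpuless" argument: nodes are described by which
 * ones have CPUs, and the search skips CPU-bearing (DRAM) nodes when a
 * cpuless (e.g. PMEM-only) target is requested. */
static int next_best_node_model(const int has_cpu[NR_NODES_MODEL],
				const int used[NR_NODES_MODEL], int cpuless)
{
	for (int n = 0; n < NR_NODES_MODEL; n++) {
		if (used[n])
			continue;	/* node already in the fallback list */
		if (cpuless && has_cpu[n])
			continue;	/* DRAM node: skip on demand */
		return n;
	}
	return -1;			/* NUMA_NO_NODE */
}
```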
[PATCH RESEND] fs: drop unused fput_atomic definition
commit d7065da03822 ("get rid of the magic around f_count in aio") added
fput_atomic to include/linux/fs.h, motivated by its use in __aio_put_req()
in fs/aio.c.

Later, commit 3ffa3c0e3f6e ("aio: now fput() is OK from interrupt context;
get rid of manual delayed __fput()") removed the only use of fput_atomic in
__aio_put_req(), but left the since-then-unused fput_atomic definition in
include/linux/fs.h. Remove the unused definition now.

This issue was identified during a code review, prompted by a coccinelle
warning from the atomic_as_refcounter.cocci rule pointing to the use of
atomic_t in fput_atomic.

Suggested-by: Krystian Radlak
Signed-off-by: Lukas Bulwahn
---
v1:
- sent on 2019-01-12, got no response
  https://lore.kernel.org/lkml/20190112055430.5860-1-lukas.bulw...@gmail.com/

v1 resend:
- rebased to v5.1-rc4
- added Jens to the recipient list as he recently touched nearby code in
  commit 091141a42e15 ("fs: add fget_many() and fput_many()")
- compile-tested with defconfig on v5.1-rc4 and next-20190410

 include/linux/fs.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd28e7679089..79b2f43b945d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -969,7 +969,6 @@ static inline struct file *get_file(struct file *f)
 #define get_file_rcu_many(x, cnt)	\
 	atomic_long_add_unless(&(x)->f_count, (cnt), 0)
 #define get_file_rcu(x)	get_file_rcu_many((x), 1)
-#define fput_atomic(x)	atomic_long_add_unless(&(x)->f_count, -1, 1)
 #define file_count(x)	atomic_long_read(&(x)->f_count)

 #define	MAX_NON_LFS	((1UL<<31) - 1)
--
2.17.1
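For context on what the removed macro did: fput_atomic() expanded to atomic_long_add_unless(&f_count, -1, 1), i.e. "drop a reference unless this is the last one". A non-atomic user-space model of that semantic (the kernel primitive uses a cmpxchg loop):

```c
#include <assert.h>

/* Model of atomic_long_add_unless(v, a, u): add a to *v unless *v == u,
 * returning nonzero iff the add happened.  fput_atomic(x) used it with
 * a = -1, u = 1 so the caller could tell when it held the last reference.
 * Not actually atomic -- illustration only. */
static int add_unless_model(long *v, long a, long u)
{
	if (*v == u)
		return 0;	/* would hit the forbidden value: no-op */
	*v += a;
	return 1;
}
```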
Re: [PATCH v2] init: Do not select DEBUG_KERNEL by default
On 4/10/19 8:02 PM, Josh Triplett wrote: > On April 10, 2019 4:24:18 PM PDT, Kees Cook wrote: >> On Wed, Apr 10, 2019 at 4:22 PM Josh Triplett >> wrote: >>> >>> On April 10, 2019 3:58:55 PM PDT, Kees Cook >> wrote: On Wed, Apr 10, 2019 at 3:42 PM Sinan Kaya wrote: > > We can't seem to have a kernel with CONFIG_EXPERT set but > CONFIG_DEBUG_KERNEL unset these days. > > While some of the features under the CONFIG_EXPERT require > CONFIG_DEBUG_KERNEL, it doesn't apply for all features. > > It looks like CONFIG_KALLSYMS_ALL is the only feature that > requires CONFIG_DEBUG_KERNEL. > > Select CONFIG_EXPERT when CONFIG_DEBUG is chosen but you can Typo: CONFIG_DEBUG_KERNEL > still choose CONFIG_EXPERT without CONFIG_DEBUG. same. > > Signed-off-by: Sinan Kaya But with those fixed, looks good to me. Adding Josh (and others) to >> CC since he originally added the linkage to EXPERT in commit f505c553dbe2. >>> >>> CONFIG_DEBUG_KERNEL shouldn't affect code generation in any way; it >> should only make more options appear in kconfig. I originally added >> this to ensure that features you might want to *disable* aren't hidden, >> as part of the tinification effort. >>> >>> What specific problem does having CONFIG_DEBUG_KERNEL enabled cause >> for you? I'd still prefer to have a single switch for "don't hide >> things I might want to disable", rather than several. >> >> See earlier in the thread: code generation depends on >> CONFIG_DEBUG_KERNEL now unfortunately. > > Then let's fix *that*, and get checkpatch to help enforce it in the future. > EXPERT doesn't affect code generation, and neither should this. > checkpatch is not an enforcer. It takes maintainers to do that. -- ~Randy
Re: [PATCH 2/3] clk: rockchip: Make rkpwm a critical clock on rk3288
hi, 在 2019/4/10 下午11:25, Doug Anderson 写道: Hi, On Tue, Apr 9, 2019 at 11:42 PM elaine.zhang wrote: hi, 在 2019/4/10 上午4:47, Douglas Anderson 写道: Most rk3288-based boards are derived from the EVB and thus use a PWM regulator for the logic rail. However, most rk3288-based boards don't specify the PWM regulator in their device tree. We'll deal with that by making it critical. NOTE: it's important to make it critical and not just IGNORE_UNUSED because all PWMs in the system share the same clock. We don't want another PWM user to turn the clock on and off and kill the logic rail. This change is in preparation for actually having the PWMs in the rk3288 device tree actually point to the proper PWM clock. Up until now they've all pointed to the clock for the old IP block and they've all worked due to the fact that rkpwm was IGNORE_UNUSED and that the clock rates for both clocks were the same. Signed-off-by: Douglas Anderson --- drivers/clk/rockchip/clk-rk3288.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/clk/rockchip/clk-rk3288.c b/drivers/clk/rockchip/clk-rk3288.c index 06287810474e..c3321eade23e 100644 --- a/drivers/clk/rockchip/clk-rk3288.c +++ b/drivers/clk/rockchip/clk-rk3288.c @@ -697,7 +697,7 @@ static struct rockchip_clk_branch rk3288_clk_branches[] __initdata = { GATE(PCLK_TZPC, "pclk_tzpc", "pclk_cpu", 0, RK3288_CLKGATE_CON(11), 3, GFLAGS), GATE(PCLK_UART2, "pclk_uart2", "pclk_cpu", 0, RK3288_CLKGATE_CON(11), 9, GFLAGS), GATE(PCLK_EFUSE256, "pclk_efuse_256", "pclk_cpu", 0, RK3288_CLKGATE_CON(11), 10, GFLAGS), - GATE(PCLK_RKPWM, "pclk_rkpwm", "pclk_cpu", CLK_IGNORE_UNUSED, RK3288_CLKGATE_CON(11), 11, GFLAGS), + GATE(PCLK_RKPWM, "pclk_rkpwm", "pclk_cpu", 0, RK3288_CLKGATE_CON(11), 11, GFLAGS), /* ddrctrl [DDR Controller PHY clock] gates */ GATE(0, "nclk_ddrupctl0", "ddrphy", CLK_IGNORE_UNUSED, RK3288_CLKGATE_CON(11), 4, GFLAGS), @@ -837,6 +837,7 @@ static const char *const rk3288_critical_clocks[] __initconst = { "pclk_alive_niu", 
"pclk_pd_pmu", "pclk_pmu_niu", + "pclk_rkpwm", pwm have device node, can enable and disable it in the pwm drivers. pwm regulator use pwm node as: pwms = < 0 25000 1> when set Logic voltage: pwm_regulator_set_voltage() --> pwm_apply_state() -->clk_enable() -->pwm_enable() -->pwm_config() -->pinctrl_select() -- For mark pclk_rkpwm as critical,do you have any questions, or provides some log or more information. Right, if we actually specify the PWM used for the PWM regulator in the device tree then there is no need to mark it as a critical clock. In fact rk3288-veyron devices boot absolutely fine without marking this clock as critical. Actually, it seems like the way the PWM framework works (IIRC it was designed this way specifically to support PWM regulators) is that even just specifying that pwm1 is "okay" is enough to keep the clock on even if the PWM regulator isn't specified. ...however... Take a look at, for instance, the rk3288-evb device tree file. Nowhere in there does it specify that the PWM used for the PWM regulator should be on. Presumably that means that if we don't mark the clock as critical then rk3288-evb will fail to boot. That's easy for me to fix since I have the rk3288-evb schematics, but what about other rk3288 boards? We could make educated guesses about each of them and/or fix things are we hear about breakages. ...but... All the above would only be worth doing if we thought someone would get some benefit out of it. I'd bet that pretty much all rk3288-based boards use a PWM regulator. Thus, in reality, everyone will want the rkpwm clock on all the time anyway. In that case going through all that extra work / potentially breaking other boards doesn't seem worth it. Just mark the clock as critical. 
I have no problem with changing it like this, but I think it is better to modify dts: vdd_log: vdd-log { compatible = "pwm-regulator"; rockchip,pwm_id = <2>; //for rk uboot rockchip,pwm_voltage = <90>; // for rk uboot pwms = < 0 25000 1>; regulator-name = "vdd_log"; regulator-min-microvolt = <80>;//hw logic min voltage regulator-max-microvolt = <140>;//hw logic max voltage regulator-always-on; regulator-boot-on; }; Maybe we did not push the modification of this part in rk3288-evb, I will push to deal with this.(rk3229-evb.dts and rk3399 has been already pushed) -Doug
Re: [RFC patch 40/41] stacktrace: Remove obsolete functions
On Wed, Apr 10, 2019 at 12:28:34PM +0200, Thomas Gleixner wrote: > No more users of the struct stack_trace based interfaces. Remove them. > > Remove the macro stubs for !CONFIG_STACKTRACE as well as they are pointless > because the storage on the call sites is conditional on CONFIG_STACKTRACE > already. No point to be 'smart'. > > Signed-off-by: Thomas Gleixner > --- > include/linux/stacktrace.h | 46 > +++-- > kernel/stacktrace.c| 14 - > 2 files changed, 16 insertions(+), 44 deletions(-) > > --- a/include/linux/stacktrace.h > +++ b/include/linux/stacktrace.h > @@ -8,23 +8,6 @@ struct task_struct; > struct pt_regs; > > #ifdef CONFIG_STACKTRACE > -struct stack_trace { > - unsigned int nr_entries, max_entries; > - unsigned long *entries; > - int skip; /* input argument: How many entries to skip */ > -}; > - > -extern void save_stack_trace(struct stack_trace *trace); > -extern void save_stack_trace_regs(struct pt_regs *regs, > - struct stack_trace *trace); > -extern void save_stack_trace_tsk(struct task_struct *tsk, > - struct stack_trace *trace); > -extern int save_stack_trace_tsk_reliable(struct task_struct *tsk, > - struct stack_trace *trace); > - > -extern void print_stack_trace(struct stack_trace *trace, int spaces); > -extern int snprint_stack_trace(char *buf, size_t size, > - struct stack_trace *trace, int spaces); > > extern void stack_trace_print(unsigned long *trace, unsigned int nr_entries, > int spaces); > @@ -43,20 +26,23 @@ extern unsigned int stack_trace_save_reg > extern unsigned int stack_trace_save_user(unsigned long *store, > unsigned int size, > unsigned int skipnr); > +/* > + * The below is for stack trace internals and architecture > + * implementations. Do not use in generic code. 
> + */ > +struct stack_trace { > + unsigned int nr_entries, max_entries; > + unsigned long *entries; > + int skip; /* input argument: How many entries to skip */ > +}; I was a bit surprised to see struct stack_trace still standing at the end of the patch set, but I guess 41 patches is enough :-) Do we want to eventually remove the struct altogether? I was also hoping to see the fragile "skipnr" go away in favor of something less dependent on compiler optimizations, but I'm not sure how feasible that would be. Regardless, these are very nice cleanups, nice work. > -#ifdef CONFIG_USER_STACKTRACE_SUPPORT > +extern void save_stack_trace(struct stack_trace *trace); > +extern void save_stack_trace_regs(struct pt_regs *regs, > + struct stack_trace *trace); > +extern void save_stack_trace_tsk(struct task_struct *tsk, > + struct stack_trace *trace); > +extern int save_stack_trace_tsk_reliable(struct task_struct *tsk, > + struct stack_trace *trace); save_stack_trace_tsk_reliable() is still in use by generic livepatch code. Also I wonder if it would make sense to rename these to __save_stack_trace_*() or arch_save_stack_trace_*() to help discourage them from being used by generic code. -- Josh
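The shape of the new array-based interface discussed above can be mocked in user space. This is a mock, not the kernel implementation: the "frames" are fake addresses, and a real stack_trace_save() walks the actual call stack; it only shows the store/size/skipnr contract that replaces the struct stack_trace plumbing.

```c
#include <assert.h>

/* Mock of the array-based replacement API: instead of filling a
 * struct stack_trace, the helper takes a plain buffer plus capacity and
 * returns the number of entries stored after skipping skipnr frames. */
static unsigned int stack_trace_save_model(unsigned long *store,
					   unsigned int size,
					   unsigned int skipnr)
{
	const unsigned long frames[] = { 0x1000, 0x2000, 0x3000, 0x4000 };
	const unsigned int nr_frames = 4;
	unsigned int nr = 0;

	for (unsigned int i = skipnr; i < nr_frames && nr < size; i++)
		store[nr++] = frames[i];
	return nr;
}
```

Callers then pass the returned count straight to stack_trace_print()-style consumers, with no struct and no max_entries bookkeeping.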
[PATCH] slab: fix an infinite loop in leaks_show()
"cat /proc/slab_allocators" could hang forever on SMP machines with
kmemleak or object debugging enabled, because other CPUs running do_drain()
keep making kmemleak_object or debug_objects_cache dirty, so leaks_show()
can never escape its first loop,

	do {
		set_store_user_clean(cachep);
		drain_cpu_caches(cachep);
		...
	} while (!is_store_user_clean(cachep));

For example,

do_drain
  slabs_destroy
    slab_destroy
      kmem_cache_free
        __cache_free
          ___cache_free
            kmemleak_free_recursive
              delete_object_full
                __delete_object
                  put_object
                    free_object_rcu
                      kmem_cache_free
                        cache_free_debugcheck --> dirty kmemleak_object

One approach is to check cachep->name and skip both kmemleak_object and
debug_objects_cache in leaks_show(). The other is to set store_user_clean
after drain_cpu_caches(). The latter leaves a small window between
drain_cpu_caches() and set_store_user_clean() in which the per-CPU caches
could become dirty again, so slightly stale information may be reported,
but it also speeds things up significantly, which sounds like a good
compromise. For example,

# cat /proc/slab_allocators
0m42.778s # 1st approach
0m0.737s  # 2nd approach

Fixes: d31676dfde25 ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
Signed-off-by: Qian Cai
---
 mm/slab.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9142ee992493..3e1b7ff0360c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4328,8 +4328,12 @@ static int leaks_show(struct seq_file *m, void *p)
 	 * whole processing.
 	 */
 	do {
-		set_store_user_clean(cachep);
 		drain_cpu_caches(cachep);
+		/*
+		 * drain_cpu_caches() could always make kmemleak_object and
+		 * debug_objects_cache dirty, so reset afterwards.
+		 */
+		set_store_user_clean(cachep);

 		x[1] = 0;
--
2.17.2 (Apple Git-113)
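Why the ordering matters can be shown with a tiny model: if the drain itself redirties the clean flag, resetting the flag before the drain means the loop never observes a clean state, while resetting after the drain converges immediately. Illustrative names only; the single int stands in for the per-cache store_user_clean bit.

```c
#include <assert.h>

struct cache_model { int store_user_clean; };

/* Freeing objects during the drain dirties the flag, as in the
 * kmemleak_free_recursive() chain quoted in the commit message. */
static void drain_cpu_caches_model(struct cache_model *c)
{
	c->store_user_clean = 0;
}

/* Fixed ordering from the patch: drain first, then mark clean.  Returns
 * the number of loop iterations needed (bounded as a safety net). */
static int stabilize(struct cache_model *c)
{
	int iterations = 0;

	do {
		drain_cpu_caches_model(c);
		c->store_user_clean = 1;	/* reset after the drain */
		iterations++;
	} while (!c->store_user_clean && iterations < 100);
	return iterations;
}
```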
[PATCH v3 0/5] soundwire: code cleanup
SoundWire support will be provided in Linux with the Sound Open Firmware (SOF) on Intel platforms. Before we start adding the missing pieces, there are a number of warnings and style issues reported by checkpatch, cppcheck and Coccinelle that need to be cleaned-up. Changes since v2: fixed inversion of devm_kcalloc parameters, detected while rebasing additional patches. Changes since v1: added missing newlines in new patch (suggested by Joe Perches) Pierre-Louis Bossart (5): soundwire: intel: fix inversion in devm_kcalloc parameters soundwire: fix style issues soundwire: bus: remove useless initializations soundwire: stream: remove useless initialization of local variable soundwire: add missing newlines in dynamic debug logs drivers/soundwire/Kconfig | 2 +- drivers/soundwire/bus.c| 137 --- drivers/soundwire/bus.h| 16 +- drivers/soundwire/bus_type.c | 4 +- drivers/soundwire/cadence_master.c | 99 +-- drivers/soundwire/cadence_master.h | 22 +-- drivers/soundwire/intel.c | 103 ++- drivers/soundwire/intel.h | 4 +- drivers/soundwire/intel_init.c | 12 +- drivers/soundwire/mipi_disco.c | 116 +++-- drivers/soundwire/slave.c | 10 +- drivers/soundwire/stream.c | 267 +++-- 12 files changed, 404 insertions(+), 388 deletions(-) -- 2.17.1
[PATCH v3 4/5] soundwire: stream: remove useless initialization of local variable
no need to reset return value. Detected with cppcheck: [drivers/soundwire/stream.c:332]: (style) Variable 'ret' is assigned a value that is never used. Signed-off-by: Pierre-Louis Bossart --- drivers/soundwire/stream.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c index e3d2bc5cba80..ab64c2c4c33f 100644 --- a/drivers/soundwire/stream.c +++ b/drivers/soundwire/stream.c @@ -329,7 +329,7 @@ static int sdw_enable_disable_master_ports(struct sdw_master_runtime *m_rt, struct sdw_transport_params *t_params = _rt->transport_params; struct sdw_bus *bus = m_rt->bus; struct sdw_enable_ch enable_ch; - int ret = 0; + int ret; enable_ch.port_num = p_rt->num; enable_ch.ch_mask = p_rt->ch_mask; -- 2.17.1
[PATCH v3 3/5] soundwire: bus: remove useless initializations
No need for explicit initialization of page and ssp fields, they are already zeroed with a memset. Detected with cppcheck: [drivers/soundwire/bus.c:309]: (style) Variable 'msg->page' is reassigned a value before the old one has been used. Signed-off-by: Pierre-Louis Bossart --- drivers/soundwire/bus.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c index 691a31df9732..bb697fd68580 100644 --- a/drivers/soundwire/bus.c +++ b/drivers/soundwire/bus.c @@ -271,8 +271,6 @@ int sdw_fill_msg(struct sdw_msg *msg, struct sdw_slave *slave, msg->dev_num = dev_num; msg->flags = flags; msg->buf = buf; - msg->ssp_sync = false; - msg->page = false; if (addr < SDW_REG_NO_PAGE) { /* no paging area */ return 0; -- 2.17.1
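The reasoning behind the cleanup is simply that memset() already zeroes every field, so the explicit assignments were dead stores. A minimal sketch with an illustrative struct (field names borrowed from sdw_msg):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrates why the removed assignments were redundant: sdw_fill_msg()
 * begins with memset(msg, 0, sizeof(*msg)), which already leaves every
 * field -- including bools such as page and ssp_sync -- at zero/false. */
struct msg_model { bool ssp_sync; bool page; int dev_num; };

static void fill_msg_model(struct msg_model *msg, int dev_num)
{
	memset(msg, 0, sizeof(*msg));
	msg->dev_num = dev_num;	/* no need to clear page/ssp_sync again */
}
```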
[PATCH v3 5/5] soundwire: add missing newlines in dynamic debug logs
For some reason the newlines are not used everywhere. Fix as needed. Reported-by: Joe Perches Signed-off-by: Pierre-Louis Bossart --- drivers/soundwire/bus.c| 74 +-- drivers/soundwire/cadence_master.c | 12 ++-- drivers/soundwire/intel.c | 12 ++-- drivers/soundwire/stream.c | 110 ++--- 4 files changed, 104 insertions(+), 104 deletions(-) diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c index bb697fd68580..fa86957cb615 100644 --- a/drivers/soundwire/bus.c +++ b/drivers/soundwire/bus.c @@ -21,12 +21,12 @@ int sdw_add_bus_master(struct sdw_bus *bus) int ret; if (!bus->dev) { - pr_err("SoundWire bus has no device"); + pr_err("SoundWire bus has no device\n"); return -ENODEV; } if (!bus->ops) { - dev_err(bus->dev, "SoundWire Bus ops are not set"); + dev_err(bus->dev, "SoundWire Bus ops are not set\n"); return -EINVAL; } @@ -43,7 +43,7 @@ int sdw_add_bus_master(struct sdw_bus *bus) if (bus->ops->read_prop) { ret = bus->ops->read_prop(bus); if (ret < 0) { - dev_err(bus->dev, "Bus read properties failed:%d", ret); + dev_err(bus->dev, "Bus read properties failed:%d\n", ret); return ret; } } @@ -296,7 +296,7 @@ int sdw_fill_msg(struct sdw_msg *msg, struct sdw_slave *slave, return -EINVAL; } else if (!slave->prop.paging_support) { dev_err(>dev, - "address %x needs paging but no support", addr); + "address %x needs paging but no support\n", addr); return -EINVAL; } @@ -455,13 +455,13 @@ static int sdw_assign_device_num(struct sdw_slave *slave) dev_num = sdw_get_device_num(slave); mutex_unlock(>bus->bus_lock); if (dev_num < 0) { - dev_err(slave->bus->dev, "Get dev_num failed: %d", + dev_err(slave->bus->dev, "Get dev_num failed: %d\n", dev_num); return dev_num; } } else { dev_info(slave->bus->dev, -"Slave already registered dev_num:%d", +"Slave already registered dev_num:%d\n", slave->dev_num); /* Clear the slave->dev_num to transfer message on device 0 */ @@ -472,7 +472,7 @@ static int sdw_assign_device_num(struct sdw_slave *slave) ret = sdw_write(slave, 
SDW_SCP_DEVNUMBER, dev_num); if (ret < 0) { - dev_err(>dev, "Program device_num failed: %d", ret); + dev_err(>dev, "Program device_num failed: %d\n", ret); return ret; } @@ -485,7 +485,7 @@ static int sdw_assign_device_num(struct sdw_slave *slave) void sdw_extract_slave_id(struct sdw_bus *bus, u64 addr, struct sdw_slave_id *id) { - dev_dbg(bus->dev, "SDW Slave Addr: %llx", addr); + dev_dbg(bus->dev, "SDW Slave Addr: %llx\n", addr); /* * Spec definition @@ -505,7 +505,7 @@ void sdw_extract_slave_id(struct sdw_bus *bus, id->class_id = addr & GENMASK(7, 0); dev_dbg(bus->dev, - "SDW Slave class_id %x, part_id %x, mfg_id %x, unique_id %x, version %x", + "SDW Slave class_id %x, part_id %x, mfg_id %x, unique_id %x, version %x\n", id->class_id, id->part_id, id->mfg_id, id->unique_id, id->sdw_version); @@ -562,7 +562,7 @@ static int sdw_program_device_num(struct sdw_bus *bus) ret = sdw_assign_device_num(slave); if (ret) { dev_err(slave->bus->dev, - "Assign dev_num failed:%d", + "Assign dev_num failed:%d\n", ret); return ret; } @@ -573,7 +573,7 @@ static int sdw_program_device_num(struct sdw_bus *bus) if (!found) { /* TODO: Park this device in Group 13 */ - dev_err(bus->dev, "Slave Entry not found"); + dev_err(bus->dev, "Slave Entry not found\n"); } count++; @@ -618,7 +618,7 @@ int sdw_configure_dpn_intr(struct sdw_slave *slave, ret = sdw_update(slave, addr, (mask | SDW_DPN_INT_PORT_READY), val); if (ret < 0) dev_err(slave->bus->dev, - "SDW_DPN_INTMASK write failed:%d", val); + "SDW_DPN_INTMASK write
[PATCH v3 2/5] soundwire: fix style issues
Visual inspections confirmed by checkpatch.pl --strict expose a number of style issues, specifically parameter alignment is inconsistent as if different contributors used different styles. Before we restart support for SoundWire with Sound Open Firmware on Intel platforms, let's clean all this. Fix Kconfig help, spelling, SPDX format, alignment, spurious parentheses, bool comparisons to true/false, macro argument protection. No new functionality added. Signed-off-by: Pierre-Louis Bossart --- drivers/soundwire/Kconfig | 2 +- drivers/soundwire/bus.c| 87 drivers/soundwire/bus.h| 16 +-- drivers/soundwire/bus_type.c | 4 +- drivers/soundwire/cadence_master.c | 87 drivers/soundwire/cadence_master.h | 22 ++-- drivers/soundwire/intel.c | 87 drivers/soundwire/intel.h | 4 +- drivers/soundwire/intel_init.c | 12 +-- drivers/soundwire/mipi_disco.c | 116 +++-- drivers/soundwire/slave.c | 10 +- drivers/soundwire/stream.c | 161 +++-- 12 files changed, 313 insertions(+), 295 deletions(-) diff --git a/drivers/soundwire/Kconfig b/drivers/soundwire/Kconfig index 19c8efb9a5ee..84876a74874f 100644 --- a/drivers/soundwire/Kconfig +++ b/drivers/soundwire/Kconfig @@ -4,7 +4,7 @@ menuconfig SOUNDWIRE bool "SoundWire support" - ---help--- + help SoundWire is a 2-Pin interface with data and clock line ratified by the MIPI Alliance. SoundWire is used for transporting data typically related to audio functions. SoundWire interface is diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c index 1cbfedfc20ef..691a31df9732 100644 --- a/drivers/soundwire/bus.c +++ b/drivers/soundwire/bus.c @@ -49,7 +49,7 @@ int sdw_add_bus_master(struct sdw_bus *bus) } /* -* Device numbers in SoundWire are 0 thru 15. Enumeration device +* Device numbers in SoundWire are 0 through 15. Enumeration device * number (0), Broadcast device number (15), Group numbers (12 and * 13) and Master device number (14) are not used for assignment so * mask these and other higher bits. 
@@ -172,7 +172,8 @@ static inline int do_transfer(struct sdw_bus *bus, struct sdw_msg *msg) } static inline int do_transfer_defer(struct sdw_bus *bus, - struct sdw_msg *msg, struct sdw_defer *defer) + struct sdw_msg *msg, + struct sdw_defer *defer) { int retry = bus->prop.err_threshold; enum sdw_command_response resp; @@ -224,7 +225,7 @@ int sdw_transfer(struct sdw_bus *bus, struct sdw_msg *msg) ret = do_transfer(bus, msg); if (ret != 0 && ret != -ENODATA) dev_err(bus->dev, "trf on Slave %d failed:%d\n", - msg->dev_num, ret); + msg->dev_num, ret); if (msg->page) sdw_reset_page(bus, msg->dev_num); @@ -243,7 +244,7 @@ int sdw_transfer(struct sdw_bus *bus, struct sdw_msg *msg) * Caller needs to hold the msg_lock lock while calling this */ int sdw_transfer_defer(struct sdw_bus *bus, struct sdw_msg *msg, - struct sdw_defer *defer) + struct sdw_defer *defer) { int ret; @@ -253,7 +254,7 @@ int sdw_transfer_defer(struct sdw_bus *bus, struct sdw_msg *msg, ret = do_transfer_defer(bus, msg, defer); if (ret != 0 && ret != -ENODATA) dev_err(bus->dev, "Defer trf on Slave %d failed:%d\n", - msg->dev_num, ret); + msg->dev_num, ret); if (msg->page) sdw_reset_page(bus, msg->dev_num); @@ -261,9 +262,8 @@ int sdw_transfer_defer(struct sdw_bus *bus, struct sdw_msg *msg, return ret; } - int sdw_fill_msg(struct sdw_msg *msg, struct sdw_slave *slave, - u32 addr, size_t count, u16 dev_num, u8 flags, u8 *buf) +u32 addr, size_t count, u16 dev_num, u8 flags, u8 *buf) { memset(msg, 0, sizeof(*msg)); msg->addr = addr; /* addr is 16 bit and truncated here */ @@ -284,7 +284,7 @@ int sdw_fill_msg(struct sdw_msg *msg, struct sdw_slave *slave, if (addr < SDW_REG_OPTIONAL_PAGE) { /* 32k but no page */ if (slave && !slave->prop.paging_support) return 0; - /* no need for else as that will fall thru to paging */ + /* no need for else as that will fall-through to paging */ } /* paging mandatory */ @@ -323,7 +323,7 @@ int sdw_nread(struct sdw_slave *slave, u32 addr, size_t count, u8 *val) int ret; ret = 
sdw_fill_msg(&msg, slave, addr, count, - slave->dev_num, SDW_MSG_FLAG_READ, val); + slave->dev_num,
[PATCH v3 1/5] soundwire: intel: fix inversion in devm_kcalloc parameters
the number of elements and size are inverted, fix. This probably only worked because the number of properties is hard-coded to 1. Fixes: 71bb8a1b059e ('soundwire: intel: Add Intel Master driver') Signed-off-by: Pierre-Louis Bossart --- drivers/soundwire/intel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/soundwire/intel.c b/drivers/soundwire/intel.c index fd8d034cfec1..8669b314c476 100644 --- a/drivers/soundwire/intel.c +++ b/drivers/soundwire/intel.c @@ -796,8 +796,8 @@ static int intel_prop_read(struct sdw_bus *bus) /* BIOS is not giving some values correctly. So, lets override them */ bus->prop.num_freq = 1; - bus->prop.freq = devm_kcalloc(bus->dev, sizeof(*bus->prop.freq), - bus->prop.num_freq, GFP_KERNEL); + bus->prop.freq = devm_kcalloc(bus->dev, bus->prop.num_freq, + sizeof(*bus->prop.freq), GFP_KERNEL); if (!bus->prop.freq) return -ENOMEM; -- 2.17.1
Re: [PATCH v2] init: Do not select DEBUG_KERNEL by default
On 4/10/2019 11:02 PM, Josh Triplett wrote: Then let's fix *that*, and get checkpatch to help enforce it in the future. EXPERT doesn't affect code generation, and neither should this. I think we have to do both. We need to go after the users as well as solve the immediate problem per this patch. As Mathieu identified, CONFIG_DEBUG_KERNEL is being used all over the place, and getting subsystem owners to remove it, let alone add a check to checkpatch, is just going to take time. Please let us know if you are OK with this plan.
Re: [PATCH v5 0/5] PCIE support for i.MX8MQ (DT changes)
On Fri, Apr 05, 2019 at 10:29:59AM -0700, Andrey Smirnov wrote: > Andrey Smirnov (5): > arm64: dts: imx8mq: Mark iomuxc_gpr as i.MX6Q compatible > arm64: dts: imx8mq: Add a node for SRC IP block > arm64: dts: imx8mq: Combine PCIE power domains > arm64: dts: imx8mq: Add nodes for PCIe IP blocks > arm64: dts: imx8mq-evk: Enable PCIE0 interface Applied all, thanks.
Re: [RFC patch 16/41] tracing: Remove the ULONG_MAX stack trace hackery
On Wed, 10 Apr 2019 21:34:25 -0500 Josh Poimboeuf wrote: > > --- a/kernel/trace/trace_stack.c > > +++ b/kernel/trace/trace_stack.c > > @@ -18,8 +18,7 @@ > > > > #include "trace.h" > > > > -static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] = > > -{ [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX }; > > +static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1]; > > Is the "+ 1" still needed? AFAICT, accesses to this array never go past > nr_entries. Probably not. But see this for an explanation: http://lkml.kernel.org/r/20180620110758.crunhd5bfep7zuiz@kili.mountain > > Also I've been staring at the code but I can't figure out why > max_entries is "- 1". > > struct stack_trace stack_trace_max = { > .max_entries = STACK_TRACE_ENTRIES - 1, > .entries = &stack_dump_trace[0], > }; > Well, it had a reason in the past, but there doesn't seem to be a reason today. Looking at git history, that code was originally: .max_entries = STACK_TRACE_ENTRIES - 1, .entries = &stack_dump_trace[1], Where we had to make max_entries -1 as we started at the first index into the array. I'll have to take a new look into this code. After Thomas's clean up here, I'm sure we can simplify it a bit more. -- Steve
Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.
On Wed, Apr 10, 2019 at 04:44:18PM +0200, Peter Zijlstra wrote: > On Wed, Apr 10, 2019 at 12:36:33PM +0800, Aaron Lu wrote: > > On Tue, Apr 09, 2019 at 11:09:45AM -0700, Tim Chen wrote: > > > Now that we have accumulated quite a number of different fixes to your > > > orginal > > > posted patches. Would you like to post a v2 of the core scheduler with > > > the fixes? > > > > One more question I'm not sure: should a task with cookie=0, i.e. tasks > > that are untagged, be allowed to scheduled on the the same core with > > another tagged task? > > That was not meant to be possible. Good to know this. > > The current patch seems to disagree on this, e.g. in pick_task(), > > if max is already chosen but max->core_cookie == 0, then we didn't care > > about cookie and simply use class_pick for the other cpu. This means we > > could schedule two tasks with different cookies(one is zero and the > > other can be tagged). > > When core_cookie==0 we shouldn't schedule the other siblings at all. Not even with another untagged task? I was thinking to leave host side tasks untagged, like kernel threads, init and other system daemons or utilities etc., and tenant tasks tagged. Then at least two untagged tasks can be scheduled on the same core. Kindly let me know if you see a problem with this. > > But then sched_core_find() only allow idle task to match with any tagged > > tasks(we didn't place untagged tasks to the core tree of course :-). > > > > Thoughts? Do I understand this correctly? If so, I think we probably > > want to make this clear before v2. I personally feel, we shouldn't allow > > untagged tasks(like kernel threads) to match with tagged tasks. > > Agreed, cookie should always match or idle. Thanks a lot for the clarification.
Re: [PATCH v2] init: Do not select DEBUG_KERNEL by default
On April 10, 2019 4:24:18 PM PDT, Kees Cook wrote: >On Wed, Apr 10, 2019 at 4:22 PM Josh Triplett >wrote: >> >> On April 10, 2019 3:58:55 PM PDT, Kees Cook >wrote: >> >On Wed, Apr 10, 2019 at 3:42 PM Sinan Kaya wrote: >> >> >> >> We can't seem to have a kernel with CONFIG_EXPERT set but >> >> CONFIG_DEBUG_KERNEL unset these days. >> >> >> >> While some of the features under the CONFIG_EXPERT require >> >> CONFIG_DEBUG_KERNEL, it doesn't apply for all features. >> >> >> >> It looks like CONFIG_KALLSYMS_ALL is the only feature that >> >> requires CONFIG_DEBUG_KERNEL. >> >> >> >> Select CONFIG_EXPERT when CONFIG_DEBUG is chosen but you can >> > >> >Typo: CONFIG_DEBUG_KERNEL >> > >> >> still choose CONFIG_EXPERT without CONFIG_DEBUG. >> > >> >same. >> > >> >> >> >> Signed-off-by: Sinan Kaya >> > >> >But with those fixed, looks good to me. Adding Josh (and others) to >CC >> >since he originally added the linkage to EXPERT in commit >> >f505c553dbe2. >> >> CONFIG_DEBUG_KERNEL shouldn't affect code generation in any way; it >should only make more options appear in kconfig. I originally added >this to ensure that features you might want to *disable* aren't hidden, >as part of the tinification effort. >> >> What specific problem does having CONFIG_DEBUG_KERNEL enabled cause >for you? I'd still prefer to have a single switch for "don't hide >things I might want to disable", rather than several. > >See earlier in the thread: code generation depends on >CONFIG_DEBUG_KERNEL now unfortunately. Then let's fix *that*, and get checkpatch to help enforce it in the future. EXPERT doesn't affect code generation, and neither should this.
Re: [PATCH] arm64: dts: imx8qxp: Add lpuart1/lpuart2/lpuart3 nodes
On Sat, Mar 30, 2019 at 05:07:44PM +, Daniel Baluta wrote: > lpuart nodes are part of the ADMA subsystem. See Audio DMA > memory map in iMX8 QXP RM [1] > > This patch is based on the dtsi file initially submitted by > Teo Hall in i.MX NXP internal tree. > > [1] https://www.nxp.com/docs/en/reference-manual/IMX8DQXPRM.pdf > > Signed-off-by: Teo Hall > Signed-off-by: Daniel Baluta Applied, thanks.
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Martin, On 2019/4/11 1:54, Martin Blumenstingl wrote: Hi Liang, On Wed, Apr 10, 2019 at 1:08 PM Liang Yang wrote: Hi Martin, On 2019/4/5 12:30, Martin Blumenstingl wrote: Hi Liang, On Fri, Mar 29, 2019 at 8:44 AM Liang Yang wrote: Hi Martin, On 2019/3/29 2:03, Martin Blumenstingl wrote: Hi Liang, [..] I don't think it is caused by a different NAND type, but i have followed the some test on my GXL platform. we can see the result from the attachment. By the way, i don't find any information about this on meson NFC datasheet, so i will ask our VLSI. Martin, May you reproduce it with the new patch on meson8b platform ? I need a more clear and easier compared log like gxl.txt. Thanks. your gxl.txt is great, finally I can also compare my own results with something that works for you! in my results (see attachment) the "DATA_IN [256 B, force 8-bit]" instructions result in a different info buffer output. does this make any sense to you? I have asked our VLSI designer for explanation or simulation result by an e-mail. Thanks. do you have any update on this? Sorry. I haven't got reply from VLSI designer yet. We tried to improve priority yesterday, but i still can't estimate the time. There is no document or change list showing the difference between m8/b and gxl/axg serial chips. Now it seems that we can't use command NFC_CMD_N2M on nand initialization for m8/b chips and use *read byte from NFC fifo register* instead. thank you for the status update! I am trying to understand your suggestion not to use NFC_CMD_N2M: the documentation (public S922X datasheet from Hardkernel: [0]) states that P_NAND_BUF (NFC_REG_BUF in the meson_nand driver) can hold up to four bytes of data. is this the "read byte from NFC FIFO register" you mentioned? You are right.take the early meson NFC driver V2 on previous mail as a reference. Before I spend time changing the code to use the FIFO register I would like to wait for an answer from your VLSI designer. 
Setting the "correct" info buffer length for NFC_CMD_N2M on the 32-bit SoCs seems like an easier solution compared to switching to the FIFO register. Keeping NFC_CMD_N2M on the 32-bit SoCs also allows us to have only one code-path for 32 and 64 bit SoCs, meaning we don't have to maintain two separate code-paths for basically the same functionality (assuming that NFC_CMD_N2M is not completely broken on the 32-bit SoCs, we just don't know how to use it yet). All right. I am also waiting for the answer. Regards Martin [0] https://dn.odroid.com/S922X/ODROID-N2/Datasheet/S922X_Public_Datasheet_V0.2.pdf .
Re: [PATCH RFC] clk: ux500: add range to usleep_range
On Wed, Apr 10, 2019 at 03:53:51PM -0700, Stephen Boyd wrote: > Quoting Nicholas Mc Guire (2019-04-06 20:13:24) > > Providing a range for usleep_range() allows the hrtimer subsystem to > > coalesce timers - the delay is runtime configurable so a factor 2 > > is taken to provide the range. > > > > Signed-off-by: Nicholas Mc Guire > > --- > > I think this driver is in maintenance mode. I'll wait for Ulf to ack or > review this change before applying. > > > diff --git a/drivers/clk/ux500/clk-sysctrl.c > > b/drivers/clk/ux500/clk-sysctrl.c > > index 7c0403b..a1fa3fb 100644 > > --- a/drivers/clk/ux500/clk-sysctrl.c > > +++ b/drivers/clk/ux500/clk-sysctrl.c > > @@ -42,7 +42,7 @@ static int clk_sysctrl_prepare(struct clk_hw *hw) > > clk->reg_bits[0]); > > > > if (!ret && clk->enable_delay_us) > > - usleep_range(clk->enable_delay_us, clk->enable_delay_us); > > + usleep_range(clk->enable_delay_us, clk->enable_delay_us*2); > > Please add space around that multiply. > I can do that but it does not seem common and also checkpatch did not complain about this - now a simple grep -re "\*10" on the kernel shows that it seems more common not to use spaces around * that to use them. Greping specifically for cases using usleep_range() (not that many) it seems more or less evenly devided between space and no space - so the concern is overlooking that factor 2 ? thx! hofrat
Re: [RFC patch 25/41] mm/kasan: Simplify stacktrace handling
On Wed, Apr 10, 2019 at 12:28:19PM +0200, Thomas Gleixner wrote: > Replace the indirection through struct stack_trace by using the storage > array based interfaces. > > Signed-off-by: Thomas Gleixner > Cc: Andrey Ryabinin > Cc: Alexander Potapenko > Cc: Dmitry Vyukov > Cc: kasan-...@googlegroups.com > Cc: linux...@kvack.org > --- > mm/kasan/common.c | 30 -- > mm/kasan/report.c |7 --- > 2 files changed, 16 insertions(+), 21 deletions(-) > > --- a/mm/kasan/common.c > +++ b/mm/kasan/common.c > @@ -48,34 +48,28 @@ static inline int in_irqentry_text(unsig >ptr < (unsigned long)&__softirqentry_text_end); > } > > -static inline void filter_irq_stacks(struct stack_trace *trace) > +static inline unsigned int filter_irq_stacks(unsigned long *entries, > + unsigned int nr_entries) > { > - int i; > + unsigned int i; > > - if (!trace->nr_entries) > - return; > - for (i = 0; i < trace->nr_entries; i++) > - if (in_irqentry_text(trace->entries[i])) { > + for (i = 0; i < nr_entries; i++) { > + if (in_irqentry_text(entries[i])) { > /* Include the irqentry function into the stack. */ > - trace->nr_entries = i + 1; > - break; > + return i + 1; Isn't this an off-by-one error if "i" points to the last entry of the array? -- Josh
Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects
On Thu, Apr 11, 2019 at 03:33:22AM +0100, Al Viro wrote: > On Thu, Apr 11, 2019 at 11:34:40AM +1000, Tobin C. Harding wrote: > > +/* > > + * d_isolate() - Dentry isolation callback function. > > + * @s: The dentry cache. > > + * @v: Vector of pointers to the objects to isolate. > > + * @nr: Number of objects in @v. > > + * > > + * The slab allocator is holding off frees. We can safely examine > > + * the object without the danger of it vanishing from under us. > > + */ > > +static void *d_isolate(struct kmem_cache *s, void **v, int nr) > > +{ > > + struct dentry *dentry; > > + int i; > > + > > + for (i = 0; i < nr; i++) { > > + dentry = v[i]; > > + __dget(dentry); > > + } > > + > > + return NULL;/* No need for private data */ > > +} > > Huh? This is completely wrong; what you need is collecting the ones > with zero refcount (and not on shrink lists) into a private list. > *NOT* bumping the refcounts at all. And do it in your isolate thing. Oh, so putting entries on a shrink list is enough to pin them? > > > +static void d_partial_shrink(struct kmem_cache *s, void **v, int nr, > > + int node, void *_unused) > > +{ > > + struct dentry *dentry; > > + LIST_HEAD(dispose); > > + int i; > > + > > + for (i = 0; i < nr; i++) { > > + dentry = v[i]; > > + spin_lock(&dentry->d_lock); > > + dentry->d_lockref.count--; > > + > > + if (dentry->d_lockref.count > 0 || > > + dentry->d_flags & DCACHE_SHRINK_LIST) { > > + spin_unlock(&dentry->d_lock); > > + continue; > > + } > > + > > + if (dentry->d_flags & DCACHE_LRU_LIST) > > + d_lru_del(dentry); > > + > > + d_shrink_add(dentry, &dispose); > > + > > + spin_unlock(&dentry->d_lock); > > + } > Basically, that loop (sans jerking the refcount up and down) should > get moved into d_isolate(). > > + > > + if (!list_empty(&dispose)) > > + shrink_dentry_list(&dispose); > > +} > ... with this left in d_partial_shrink(). And you obviously need some way > to pass the list from the former to the latter...
Easy enough, we have a void * return value from the isolate function just for this purpose. Thanks Al, hackety hack ... Tobin
Re: [PATCH 1/5] media: platform: Aspeed: Remove use of reset line
On Tue, 2 Apr 2019 at 18:24, Eddie James wrote: > > The reset line is toggled by enabling the clocks, so it's not necessary > to manually toggle the reset as well. > > Signed-off-by: Eddie James Reviewed-by: Joel Stanley
[PATCH -next] memstick: remove set but not used variable 'data'
Fixes gcc '-Wunused-but-set-variable' warning: drivers/memstick/host/jmb38x_ms.c: In function 'jmb38x_ms_issue_cmd': drivers/memstick/host/jmb38x_ms.c:371:17: warning: variable 'data' set but not used [-Wunused-but-set-variable] It's never used since introduction and can be removed. Signed-off-by: YueHaibing --- drivers/memstick/host/jmb38x_ms.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/memstick/host/jmb38x_ms.c b/drivers/memstick/host/jmb38x_ms.c index e3a5af65dbce..acec09813419 100644 --- a/drivers/memstick/host/jmb38x_ms.c +++ b/drivers/memstick/host/jmb38x_ms.c @@ -368,7 +368,6 @@ static int jmb38x_ms_transfer_data(struct jmb38x_ms_host *host) static int jmb38x_ms_issue_cmd(struct memstick_host *msh) { struct jmb38x_ms_host *host = memstick_priv(msh); - unsigned char *data; unsigned int data_len, cmd, t_val; if (!(STATUS_HAS_MEDIA & readl(host->addr + STATUS))) { @@ -400,8 +399,6 @@ static int jmb38x_ms_issue_cmd(struct memstick_host *msh) cmd |= TPC_WAIT_INT; } - data = host->req->data; - if (!no_dma) host->cmd_flags |= DMA_DATA;
Re: [RFC patch 20/41] backtrace-test: Simplify stack trace handling
On Wed, Apr 10, 2019 at 12:28:14PM +0200, Thomas Gleixner wrote: > Replace the indirection through struct stack_trace by using the storage > array based interfaces. > > Signed-off-by: Thomas Gleixner > --- > kernel/backtracetest.c | 11 +++ > 1 file changed, 3 insertions(+), 8 deletions(-) > > --- a/kernel/backtracetest.c > +++ b/kernel/backtracetest.c > @@ -48,19 +48,14 @@ static void backtrace_test_irq(void) > #ifdef CONFIG_STACKTRACE > static void backtrace_test_saved(void) > { > - struct stack_trace trace; > unsigned long entries[8]; > + unsigned int nent; "Nent" isn't immediately readable to my eyes. How about just good old "nr_entries"? (for this patch and all the others) -- Josh
Re: [PATCH] cifs: fix page reference leak with readv/writev
How was this discovered? Does it address a reported user problem? On Wed, Apr 10, 2019 at 2:38 PM wrote: > > From: Jérôme Glisse > > CIFS can leak page references taken through GUP (get_user_pages*() > through iov_iter_get_pages()). This happens if cifs_send_async_read() > or cifs_write_from_iter() calls fail from within __cifs_readv() and > __cifs_writev() respectively. This patch moves the page unreference to > cifs_aio_ctx_release(), which happens on all code paths; this is > also simpler to follow for correctness. > > Signed-off-by: Jérôme Glisse > Cc: Steve French > Cc: linux-c...@vger.kernel.org > Cc: samba-techni...@lists.samba.org > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Linus Torvalds > Cc: Stable > --- > fs/cifs/file.c | 15 +-- > fs/cifs/misc.c | 23 ++- > 2 files changed, 23 insertions(+), 15 deletions(-) > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c > index 89006e044973..a756a4d3f70f 100644 > --- a/fs/cifs/file.c > +++ b/fs/cifs/file.c > @@ -2858,7 +2858,6 @@ static void collect_uncached_write_data(struct > cifs_aio_ctx *ctx) > struct cifs_tcon *tcon; > struct cifs_sb_info *cifs_sb; > struct dentry *dentry = ctx->cfile->dentry; > - unsigned int i; > int rc; > > tcon = tlink_tcon(ctx->cfile->tlink); > @@ -2922,10 +2921,6 @@ static void collect_uncached_write_data(struct > cifs_aio_ctx *ctx) > kref_put(&wdata->refcount, cifs_uncached_writedata_release); > } > > - if (!ctx->direct_io) > - for (i = 0; i < ctx->npages; i++) > - put_page(ctx->bv[i].bv_page); > - > cifs_stats_bytes_written(tcon, ctx->total_len); > set_bit(CIFS_INO_INVALID_MAPPING, &CIFS_I(dentry->d_inode)->flags); > > @@ -3563,7 +3558,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) > struct iov_iter *to = &ctx->iter; > struct cifs_sb_info *cifs_sb; > struct cifs_tcon *tcon; > - unsigned int i; > int rc; > > tcon = tlink_tcon(ctx->cfile->tlink); > @@ -3647,15 -3641,8 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) > kref_put(&rdata->refcount, cifs_uncached_readdata_release); > }
> > - if (!ctx->direct_io) { > - for (i = 0; i < ctx->npages; i++) { > - if (ctx->should_dirty) > - set_page_dirty(ctx->bv[i].bv_page); > - put_page(ctx->bv[i].bv_page); > - } > - > + if (!ctx->direct_io) > ctx->total_len = ctx->len - iov_iter_count(to); > - } > > /* mask nodata case */ > if (rc == -ENODATA) > diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c > index bee203055b30..9bc0d17a9d77 100644 > --- a/fs/cifs/misc.c > +++ b/fs/cifs/misc.c > @@ -768,6 +768,11 @@ cifs_aio_ctx_alloc(void) > { > struct cifs_aio_ctx *ctx; > > + /* > +* Must use kzalloc to initialize ctx->bv to NULL and ctx->direct_io > +* to false so that we know when we have to unreference pages within > +* cifs_aio_ctx_release() > +*/ > ctx = kzalloc(sizeof(struct cifs_aio_ctx), GFP_KERNEL); > if (!ctx) > return NULL; > @@ -786,7 +791,23 @@ cifs_aio_ctx_release(struct kref *refcount) > struct cifs_aio_ctx, refcount); > > cifsFileInfo_put(ctx->cfile); > - kvfree(ctx->bv); > + > + /* > +* ctx->bv is only set if setup_aio_ctx_iter() was call successfuly > +* which means that iov_iter_get_pages() was a success and thus that > +* we have taken reference on pages. > +*/ > + if (ctx->bv) { > + unsigned i; > + > + for (i = 0; i < ctx->npages; i++) { > + if (ctx->should_dirty) > + set_page_dirty(ctx->bv[i].bv_page); > + put_page(ctx->bv[i].bv_page); > + } > + kvfree(ctx->bv); > + } > + > kfree(ctx); > } > > -- > 2.20.1 > -- Thanks, Steve
Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default
On Wed, Apr 10, 2019 at 5:56 PM Sinan Kaya wrote: > > We can't seem to have a kernel with CONFIG_EXPERT set but > CONFIG_DEBUG_KERNEL unset these days. > > While some of the features under the CONFIG_EXPERT require > CONFIG_DEBUG_KERNEL, it doesn't apply for all features. > > It looks like CONFIG_KALLSYMS_ALL is the only feature that > requires CONFIG_DEBUG_KERNEL. > > Select CONFIG_EXPERT when CONFIG_DEBUG_KERNEL is chosen but > you can still choose CONFIG_EXPERT without CONFIG_DEBUG_KERNEL. > > Signed-off-by: Sinan Kaya > Reviewed-by: Kees Cook Masahiro, should this go via your tree, or somewhere else? Thanks! -Kees > --- > init/Kconfig | 2 -- > lib/Kconfig.debug | 1 + > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/init/Kconfig b/init/Kconfig > index 4592bf7997c0..37e10a8391a3 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1206,8 +1206,6 @@ config BPF > > menuconfig EXPERT > bool "Configure standard kernel features (expert users)" > - # Unhide debug options, to make the on-by-default options visible > - select DEBUG_KERNEL > help > This option allows certain base kernel options and settings >to be disabled or tweaked. This is for specialized > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index 0d9e81779e37..9fbf3499ec8d 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -434,6 +434,7 @@ config MAGIC_SYSRQ_SERIAL > > config DEBUG_KERNEL > bool "Kernel debugging" > + default EXPERT > help > Say Y here if you are developing drivers or trying to debug and > identify kernel problems. > -- > 2.21.0 > -- Kees Cook
Re: [PATCH 1/4] ARM: dts: imx6: RDU2: Use new CODEC reset pin name
On Fri, Mar 29, 2019 at 01:13:10PM -0500, Andrew F. Davis wrote: > The correct DT property for specifying a GPIO used for reset > is "reset-gpios", the driver now accepts this name, use it here. > > Note the GPIO polarity in the driver was ignored before and always > assumed to be active low, when all the DTs are fixed we will start > respecting the specified polarity. Switch polarity in DT to the > currently assumed one, this way when the driver changes the > behavior will not change. > > Signed-off-by: Andrew F. Davis I fixed up the prefix to use board name, and applied patch #1 ~ #3. Shawn
[PATCH -next] bus: ti-sysc: Use PTR_ERR_OR_ZERO in sysc_init_resets()
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Signed-off-by: YueHaibing --- drivers/bus/ti-sysc.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/bus/ti-sysc.c b/drivers/bus/ti-sysc.c index b696f26a3894..2b93be2882f3 100644 --- a/drivers/bus/ti-sysc.c +++ b/drivers/bus/ti-sysc.c @@ -432,10 +432,7 @@ static int sysc_init_resets(struct sysc *ddata) { ddata->rsts = devm_reset_control_array_get_optional_exclusive(ddata->dev); - if (IS_ERR(ddata->rsts)) - return PTR_ERR(ddata->rsts); - - return 0; + return PTR_ERR_OR_ZERO(ddata->rsts); } /**
Re: [RFC patch 16/41] tracing: Remove the ULONG_MAX stack trace hackery
On Wed, Apr 10, 2019 at 12:28:10PM +0200, Thomas Gleixner wrote: > No architecture terminates the stack trace with ULONG_MAX anymore. As the > code checks the number of entries stored anyway there is no point in > keeping all that ULONG_MAX magic around. > > The histogram code zeroes the storage before saving the stack, so if the > trace is shorter than the maximum number of entries it can terminate the > print loop if a zero entry is detected. > > Signed-off-by: Thomas Gleixner > Cc: Steven Rostedt > --- > kernel/trace/trace_events_hist.c |2 +- > kernel/trace/trace_stack.c | 20 +--- > 2 files changed, 6 insertions(+), 16 deletions(-) > > --- a/kernel/trace/trace_events_hist.c > +++ b/kernel/trace/trace_events_hist.c > @@ -5246,7 +5246,7 @@ static void hist_trigger_stacktrace_prin > unsigned int i; > > for (i = 0; i < max_entries; i++) { > - if (stacktrace_entries[i] == ULONG_MAX) > + if (!stacktrace_entries[i]) > return; > > seq_printf(m, "%*c", 1 + spaces, ' '); > --- a/kernel/trace/trace_stack.c > +++ b/kernel/trace/trace_stack.c > @@ -18,8 +18,7 @@ > > #include "trace.h" > > -static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] = > - { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX }; > +static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1]; Is the "+ 1" still needed? AFAICT, accesses to this array never go past nr_entries. Also I've been staring at the code but I can't figure out why max_entries is "- 1". struct stack_trace stack_trace_max = { .max_entries = STACK_TRACE_ENTRIES - 1, .entries = &stack_dump_trace[0], }; -- Josh
Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects
On Thu, Apr 11, 2019 at 11:34:40AM +1000, Tobin C. Harding wrote: > +/* > + * d_isolate() - Dentry isolation callback function. > + * @s: The dentry cache. > + * @v: Vector of pointers to the objects to isolate. > + * @nr: Number of objects in @v. > + * > + * The slab allocator is holding off frees. We can safely examine > + * the object without the danger of it vanishing from under us. > + */ > +static void *d_isolate(struct kmem_cache *s, void **v, int nr) > +{ > + struct dentry *dentry; > + int i; > + > + for (i = 0; i < nr; i++) { > + dentry = v[i]; > + __dget(dentry); > + } > + > + return NULL;/* No need for private data */ > +} Huh? This is completely wrong; what you need is collecting the ones with zero refcount (and not on shrink lists) into a private list. *NOT* bumping the refcounts at all. And do it in your isolate thing. > +static void d_partial_shrink(struct kmem_cache *s, void **v, int nr, > + int node, void *_unused) > +{ > + struct dentry *dentry; > + LIST_HEAD(dispose); > + int i; > + > + for (i = 0; i < nr; i++) { > + dentry = v[i]; > + spin_lock(&dentry->d_lock); > + dentry->d_lockref.count--; > + > + if (dentry->d_lockref.count > 0 || > + dentry->d_flags & DCACHE_SHRINK_LIST) { > + spin_unlock(&dentry->d_lock); > + continue; > + } > + > + if (dentry->d_flags & DCACHE_LRU_LIST) > + d_lru_del(dentry); > + > + d_shrink_add(dentry, &dispose); > + > + spin_unlock(&dentry->d_lock); > + } Basically, that loop (sans jerking the refcount up and down) should get moved into d_isolate(). > + > + if (!list_empty(&dispose)) > + shrink_dentry_list(&dispose); > +} ... with this left in d_partial_shrink(). And you obviously need some way to pass the list from the former to the latter...
Re: [PATCH v3 1/9] ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA
On Thu, Mar 28, 2019 at 11:49:16PM -0700, Andrey Smirnov wrote: > Since 25aaa75df1e6 SDMA driver uses clock rates of "ipg" and "ahb" > clock to determine if it needs to configure the IP block as operating > at 1:1 or 1:2 clock ratio (ACR bit in SDMAARM_CONFIG). Specifying both > clocks as IMX6QDL_CLK_SDMA results in driver incorrectly thinking that > ratio is 1:1 which results in broken SDMA functionality (this at least > breaks RAVE SP serdev driver on RDU2). Fix the code to specify > IMX6QDL_CLK_IPG as "ipg" clock for SDMA, to avoid detecting incorrect > clock ratio. > > Fixes: 25aaa75df1e6 ("dmaengine: imx-sdma: add clock ratio 1:1 check") Since we have a fix in the dma driver, I dropped the Fixes tag here. Applied all, thanks. Shawn > Signed-off-by: Andrey Smirnov > Reviewed-by: Lucas Stach > Cc: Angus Ainslie (Purism) > Cc: Chris Healy > Cc: Lucas Stach > Cc: Fabio Estevam > Cc: Shawn Guo > Cc: linux-arm-ker...@lists.infradead.org > Cc: linux-kernel@vger.kernel.org > --- > arch/arm/boot/dts/imx6qdl.dtsi | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/boot/dts/imx6qdl.dtsi b/arch/arm/boot/dts/imx6qdl.dtsi > index 9f9aa6e7ed0e..354feba077b2 100644 > --- a/arch/arm/boot/dts/imx6qdl.dtsi > +++ b/arch/arm/boot/dts/imx6qdl.dtsi > @@ -949,7 +949,7 @@ > compatible = "fsl,imx6q-sdma", "fsl,imx35-sdma"; > reg = <0x020ec000 0x4000>; > interrupts = <0 2 IRQ_TYPE_LEVEL_HIGH>; > - clocks = <&clks IMX6QDL_CLK_SDMA>, > + clocks = <&clks IMX6QDL_CLK_IPG>, > <&clks IMX6QDL_CLK_SDMA>; > clock-names = "ipg", "ahb"; > #dma-cells = <3>; > -- > 2.20.1 >
Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
On 04/10/2019 02:44 PM, Peter Zijlstra wrote: > On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote: >> Because of writer lock stealing, it is possible that a constant >> stream of incoming writers will cause a waiting writer or reader to >> wait indefinitely leading to lock starvation. >> >> The mutex code has a lock handoff mechanism to prevent lock starvation. >> This patch implements a similar lock handoff mechanism to disable >> lock stealing and force lock handoff to the first waiter in the queue >> after at least a 5ms waiting period. The waiting period is used to >> avoid discouraging lock stealing too much to affect performance. > I would say the handoff is not at all similar to the mutex code. It is > in fact radically different. > I mean they are similar in concept. Of course, the implementations are quite different. >> @@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, >> adjustment = RWSEM_READER_BIAS; >> oldcount = atomic_long_fetch_add(adjustment, &sem->count); >> if (unlikely(oldcount & RWSEM_WRITER_MASK)) { >> +/* >> + * Initiate handoff to reader, if applicable. >> + */ >> +if (!(oldcount & RWSEM_FLAG_HANDOFF) && >> +time_after(jiffies, waiter->timeout)) { >> +adjustment -= RWSEM_FLAG_HANDOFF; >> +lockevent_inc(rwsem_rlock_handoff); >> +} >> + >> atomic_long_sub(adjustment, &sem->count); >> return; >> } > That confuses the heck out of me... > > The above seems to rely on __rwsem_mark_wake() to be fully serialized > (and it is, by ->wait_lock, but that isn't spelled out anywhere) such > that we don't get double increment of FLAG_HANDOFF.
> > So there is NO __rwsem_mark_wake() vs __rwsem_mark_wake() race like: > > CPU0 CPU1 > > oldcount = atomic_long_fetch_add(adjustment, &sem->count) > > oldcount = > atomic_long_fetch_add(adjustment, &sem->count) > > if (!(oldcount & HANDOFF)) > adjustment -= HANDOFF; > > if (!(oldcount & HANDOFF)) > adjustment -= HANDOFF; > atomic_long_sub(adjustment) > atomic_long_sub(adjustment) > > > *whoops* double negative decrement of HANDOFF (aka double increment). Yes, __rwsem_mark_wake() is always called with wait_lock held. I can add a lockdep_assert() statement to clarify this point. > > However there is another site that fiddles with the HANDOFF bit, namely > __rwsem_down_write_failed_common(), and that does: > > + atomic_long_or(RWSEM_FLAG_HANDOFF, > &sem->count); > > _OUTSIDE_ of ->wait_lock, which would yield: > > CPU0 CPU1 > > oldcount = atomic_long_fetch_add(adjustment, &sem->count) > > atomic_long_or(HANDOFF) > > if (!(oldcount & HANDOFF)) > adjustment -= HANDOFF; > > atomic_long_sub(adjustment) > > *whoops*, incremented HANDOFF on HANDOFF. > > > And there's not a comment in sight that would elucidate if this is > possible or not. > A writer can only set the handoff bit if it is the first waiter in the queue. If it is the first waiter, a racing __rwsem_mark_wake() will see that the first waiter is a writer and so won't go into the reader path. I know I sometimes don't spell out all the conditions that may look obvious to me but not to others. I will elaborate more in comments. > Also: > > + atomic_long_or(RWSEM_FLAG_HANDOFF, > &sem->count); > + first++; > + > + /* > +* Make sure the handoff bit is seen by > +* others before proceeding. > +*/ > + smp_mb__after_atomic(); > > That comment is utter nonsense. smp_mb() doesn't (and cannot) 'make > visible'. There needs to be order between two memops on both sides. > I kind of added that for safety. I will take some time to rethink if it is really necessary. Cheers, Longman
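The "double negative decrement" in Peter's diagrams is easy to model outside the kernel. The sketch below is a hypothetical model, not the rwsem code: the bit value and helper names are made up. Two CPUs both snapshot a count without HANDOFF set, both subtract a negative adjustment, and the flag is applied twice; the carry clears the HANDOFF bit itself and corrupts the next bit up.

```c
#include <assert.h>

/* Illustrative bit position, NOT the real rwsem count layout. */
#define RWSEM_FLAG_HANDOFF 0x4L

/* One racy step: decide from a stale snapshot of count whether to set
 * HANDOFF, then apply the change by subtracting a negative adjustment
 * (subtracting -HANDOFF adds the bit). */
static void set_handoff_from_snapshot(long *count, long snapshot)
{
	long adjustment = 0;

	if (!(snapshot & RWSEM_FLAG_HANDOFF))
		adjustment -= RWSEM_FLAG_HANDOFF;
	*count -= adjustment;
}

/* Both CPUs snapshot count before either writes it back: HANDOFF gets
 * added twice, so the carry flips the bit off and sets the next bit. */
static long demo_double_handoff(void)
{
	long count = 0;
	long snap_cpu0 = count;	/* CPU0's stale read */
	long snap_cpu1 = count;	/* CPU1's stale read */

	set_handoff_from_snapshot(&count, snap_cpu0);
	set_handoff_from_snapshot(&count, snap_cpu1);
	return count;
}
```

After the two racy updates the count holds 2 * HANDOFF: the HANDOFF bit reads as clear even though it was "set" twice, which is exactly the *whoops* in the diagrams above, and why serialization under ->wait_lock (or first-waiter exclusivity) matters.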
linux-next: manual merge of the apparmor tree with Linus' tree
Hi all, Today's linux-next merge of the apparmor tree got a conflict in: security/apparmor/lsm.c between commit: e33c1b992377 ("apparmor: Restore Y/N in /sys for apparmor's "enabled"") from Linus' tree and commit: 876dd866c084 ("apparmor: Initial implementation of raw policy blob compression") from the apparmor tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc security/apparmor/lsm.c index 87500bde5a92,e1e9c3c01cd3.. --- a/security/apparmor/lsm.c +++ b/security/apparmor/lsm.c @@@ -25,8 -25,8 +25,9 @@@ #include #include #include + #include #include +#include #include "include/apparmor.h" #include "include/apparmorfs.h" @@@ -1420,46 -1424,37 +1436,77 @@@ static int param_get_aauint(char *buffe return param_get_uint(buffer, kp); } +/* Can only be set before AppArmor is initialized (i.e. on boot cmdline). */ +static int param_set_aaintbool(const char *val, const struct kernel_param *kp) +{ + struct kernel_param kp_local; + bool value; + int error; + + if (apparmor_initialized) + return -EPERM; + + /* Create local copy, with arg pointing to bool type. */ + value = !!*((int *)kp->arg); + memcpy(&kp_local, kp, sizeof(kp_local)); + kp_local.arg = &value; + + error = param_set_bool(val, &kp_local); + if (!error) + *((int *)kp->arg) = *((bool *)kp_local.arg); + return error; +} + +/* + * To avoid changing /sys/module/apparmor/parameters/enabled from Y/N to + * 1/0, this converts the "int that is actually bool" back to bool for + * display in the /sys filesystem, while keeping it "int" for the LSM + * infrastructure. 
+ */ +static int param_get_aaintbool(char *buffer, const struct kernel_param *kp) +{ + struct kernel_param kp_local; + bool value; + + /* Create local copy, with arg pointing to bool type. */ + value = !!*((int *)kp->arg); + memcpy(&kp_local, kp, sizeof(kp_local)); + kp_local.arg = &value; + + return param_get_bool(buffer, &kp_local); +} + + static int param_set_aacompressionlevel(const char *val, + const struct kernel_param *kp) + { + int error; + + if (!apparmor_enabled) + return -EINVAL; + if (apparmor_initialized) + return -EPERM; + + error = param_set_int(val, kp); + + aa_g_rawdata_compression_level = clamp(aa_g_rawdata_compression_level, + Z_NO_COMPRESSION, + Z_BEST_COMPRESSION); + pr_info("AppArmor: policy rawdata compression level set to %u\n", + aa_g_rawdata_compression_level); + + return error; + } + + static int param_get_aacompressionlevel(char *buffer, + const struct kernel_param *kp) + { + if (!apparmor_enabled) + return -EINVAL; + if (apparmor_initialized && !policy_view_capable(NULL)) + return -EPERM; + return param_get_int(buffer, kp); + } + static int param_get_audit(char *buffer, const struct kernel_param *kp) { if (!apparmor_enabled)
Re: kernel BUG at fs/inode.c:LINE!
On Thu, Apr 11, 2019 at 08:50:17AM +0800, Ian Kent wrote: > On Wed, 2019-04-10 at 14:41 +0200, Dmitry Vyukov wrote: > > On Wed, Apr 10, 2019 at 2:12 PM Al Viro wrote: > > > > > > On Wed, Apr 10, 2019 at 08:07:15PM +0800, Ian Kent wrote: > > > > > > > > I'm unable to find a branch matching the line numbers. > > > > > > > > > > Given that, on the face of it, the scenario is impossible I'm > > > > > seeking clarification on what linux-next to look at for the > > > > > sake of accuracy. > > > > > > > > > > So I'm wondering if this testing done using the master branch > > > > > or one of the daily branches one would use to check for conflicts > > > > > before posting? > > > > > > > > Sorry those are tags not branches. > > > > > > FWIW, that's next-20181214; it is what master had been in mid-December > > > and master is rebased every day. Can it be reproduced with the current > > > tree? > > > > From the info on the dashboard we know that it happened only once on > > d14b746c (the second one is result of reproducing the first one). So > > it was either fixed or just hard to trigger. > > Looking at the source of tag next-20181214 in linux-next-history I see > this is mistake I made due to incorrect error handling which I fixed > soon after (there was in fact a double iput()). Right - "autofs: fix possible inode leak in autofs_fill_super()" had been broken (and completely pointless), leading to double iput() in that failure case. And yes, double iput() can trigger that BUG_ON(), and with non-zero odds do so with that stack trace. As far as I'm concerned, case closed - bug had been in a misguided "fix" for inexistent leak (coming from misreading the calling conventions for d_make_root()), introduced in -next at next-20181130 and kicked out of there in next-20181219. Dropped by Ian's request in Message-ID: <66d497c00cffb3e4109ca0d5287c8277954d7132.ca...@themaw.net> which has fixed that crap. Moreover, that posting had been in reply to that very syzcaller report, AFAICS. 
I don't know how to tell the bot to STFU and close the report in this situation; up to you, folks. As an aside, the cause of that bug is that d_make_root() calling conventions are insufficiently documented. All we have is ||[mandatory] ||d_alloc_root() is gone, along with a lot of bugs caused by code ||misusing it. Replacement: d_make_root(inode). The difference is, ||d_make_root() drops the reference to inode if dentry allocation fails. in Documentation/filesystems/porting, and that's not good enough. Anyone willing to take a shot at that? FWIW, the calling conventions are: d_make_root(inode) normally allocates and returns a new dentry. On failure NULL is returned. A reference to inode is consumed in all cases (on success it is transferred to the new dentry, on failure it is dropped), so failure handling does not need anything done to inode. d_make_root(NULL) quietly returns NULL, which further simplifies the error handling in the typical caller. Usually it's something like inode = foofs_new_inode(); s->s_root = d_make_root(inode); if (!s->s_root) bugger off, no need to undo inode allocation We do not need to check if foofs_new_inode() has returned NULL and we do not need any special cleanups in case of failure - not for undoing the inode allocation. If anyone cares to convert that into coherent (and printable) documentation, patches are welcome...
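The reference-consuming convention Al spells out can be illustrated with a userspace analogy. The sketch below is NOT kernel code; make_root() here merely stands in for d_make_root(), with heap ownership standing in for an inode reference. The callee consumes the reference in every path, including failure and NULL input, so the caller never needs a cleanup of its own (and adding one creates exactly the double-free/double-iput() bug discussed above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogy of the d_make_root() convention (NOT kernel code):
 * ownership of 'o' is consumed on success and on failure alike, and a
 * NULL argument quietly yields NULL. */
struct obj {
	char payload[16];
};

static char *make_root(struct obj *o, bool simulate_failure)
{
	char *name;

	if (!o)
		return NULL;	/* NULL in, NULL out, nothing to undo */

	if (simulate_failure || !(name = malloc(sizeof(o->payload)))) {
		free(o);	/* reference consumed on failure */
		return NULL;
	}

	memcpy(name, o->payload, sizeof(o->payload));
	free(o);		/* reference consumed on success too */
	return name;
}
```

The caller's error path is just "bugger off": it must not free (iput) the object again, which is the mistake the misguided autofs "fix" made.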
[PATCH] rtc: mxc_v2: use dev_pm_set_wake_irq() to simplify code
By calling dev_pm_set_wake_irq() to set the MXC_V2 RTC interrupt as a wake IRQ, the generic wake-IRQ mechanism automatically enables it as a wakeup source during suspend. The suspend/resume callbacks, which exist ONLY to enable/disable IRQ wake, can then be removed, simplifying the code. Signed-off-by: Anson Huang --- drivers/rtc/rtc-mxc_v2.c | 29 - 1 file changed, 4 insertions(+), 25 deletions(-) diff --git a/drivers/rtc/rtc-mxc_v2.c b/drivers/rtc/rtc-mxc_v2.c index 007879a..5b970a8 100644 --- a/drivers/rtc/rtc-mxc_v2.c +++ b/drivers/rtc/rtc-mxc_v2.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #define SRTC_LPPDR_INIT 0x41736166 /* init for glitch detect */ @@ -305,6 +306,9 @@ static int mxc_rtc_probe(struct platform_device *pdev) return pdata->irq; device_init_wakeup(&pdev->dev, 1); + ret = dev_pm_set_wake_irq(&pdev->dev, pdata->irq); + if (ret) + dev_err(&pdev->dev, "failed to enable irq wake\n"); ret = clk_prepare_enable(pdata->clk); if (ret) @@ -367,30 +371,6 @@ static int mxc_rtc_remove(struct platform_device *pdev) return 0; } -#ifdef CONFIG_PM_SLEEP -static int mxc_rtc_suspend(struct device *dev) -{ - struct mxc_rtc_data *pdata = dev_get_drvdata(dev); - - if (device_may_wakeup(dev)) - enable_irq_wake(pdata->irq); - - return 0; -} - -static int mxc_rtc_resume(struct device *dev) -{ - struct mxc_rtc_data *pdata = dev_get_drvdata(dev); - - if (device_may_wakeup(dev)) - disable_irq_wake(pdata->irq); - - return 0; -} -#endif - -static SIMPLE_DEV_PM_OPS(mxc_rtc_pm_ops, mxc_rtc_suspend, mxc_rtc_resume); - static const struct of_device_id mxc_ids[] = { { .compatible = "fsl,imx53-rtc", }, {} @@ -400,7 +380,6 @@ static struct platform_driver mxc_rtc_driver = { .driver = { .name = "mxc_rtc_v2", .of_match_table = mxc_ids, - .pm = &mxc_rtc_pm_ops, }, .probe = mxc_rtc_probe, .remove = mxc_rtc_remove, -- 2.7.4
Re: [RFC PATCH hubcap] orangefs: orangefs_file_open() can be static
On Thu, 2019-04-11 at 09:58 +0800, kbuild test robot wrote: > Fixes: 9a959aaffd70 ("orangefs: remember count when reading.") Making something static likely does not warrant a "Fixes:" tag > Signed-off-by: kbuild test robot > --- > file.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c > index d198af9..01d0db6 100644 > --- a/fs/orangefs/file.c > +++ b/fs/orangefs/file.c > @@ -571,7 +571,7 @@ static int orangefs_lock(struct file *filp, int cmd, > struct file_lock *fl) > return rc; > } > > -int orangefs_file_open(struct inode * inode, struct file *file) > +static int orangefs_file_open(struct inode * inode, struct file *file) > { > file->private_data = NULL; > return generic_file_open(inode, file);
Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.
On Wed, Apr 10, 2019 at 10:18:10PM +0800, Aubrey Li wrote: > On Wed, Apr 10, 2019 at 12:36 PM Aaron Lu wrote: > > > > On Tue, Apr 09, 2019 at 11:09:45AM -0700, Tim Chen wrote: > > > Now that we have accumulated quite a number of different fixes to your > > > original > > > posted patches. Would you like to post a v2 of the core scheduler with > > > the fixes? > > > > One more question I'm not sure about: should a task with cookie=0, i.e. tasks > > that are untagged, be allowed to be scheduled on the same core with > > another tagged task? > > > > The current patch seems to disagree on this, e.g. in pick_task(), > > if max is already chosen but max->core_cookie == 0, then we didn't care > > about cookie and simply use class_pick for the other cpu. This means we > > could schedule two tasks with different cookies(one is zero and the > > other can be tagged). > > > > But then sched_core_find() only allows the idle task to match with any tagged > > tasks(we didn't place untagged tasks to the core tree of course :-). > > > > Thoughts? Do I understand this correctly? If so, I think we probably > > want to make this clear before v2. I personally feel we shouldn't allow > > untagged tasks(like kernel threads) to match with tagged tasks. > > Does it make sense if we take untagged tasks as hypervisor, and different > cookie tasks as different VMs? Isolation is done between VMs, not between > VM and hypervisor. > > Did you see anything harmful if an untagged task and a tagged task > run simultaneously on the same core? The VM can see the hypervisor's data then, I think. We probably do not want that to happen.
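The compatibility rule being debated can be sketched in a few lines. This is a hypothetical model, NOT the actual pick_task()/sched_core_find() code, and COOKIE_IDLE is a made-up sentinel: an idle sibling matches anything, otherwise two tasks may share a core only with equal cookies. Taking the strict view argued in the thread, an untagged task (cookie 0) does NOT match a tagged one.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up sentinel standing in for "the sibling is idle". */
#define COOKIE_IDLE (~0UL)

/* Strict core-sharing rule: idle matches anything; otherwise cookies
 * must be equal.  Cookie 0 (untagged) is just another cookie value, so
 * an untagged task cannot pair with a tagged one. */
static bool cookies_compatible(unsigned long a, unsigned long b)
{
	if (a == COOKIE_IDLE || b == COOKIE_IDLE)
		return true;
	return a == b;
}
```

Under the alternative (hypervisor-style) view from the thread, the first check would also return true whenever either cookie is 0; the harm noted above is that the "hypervisor" side could then observe the tagged task's data through the shared core.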
[PATCH] rtc: mxc: use dev_pm_set_wake_irq() to simplify code
By calling dev_pm_set_wake_irq() to set the MXC RTC interrupt as a wake IRQ, the generic wake-IRQ mechanism automatically enables it as a wakeup source during suspend. The suspend/resume callbacks, which exist ONLY to enable/disable IRQ wake, can then be removed, simplifying the code. Signed-off-by: Anson Huang --- drivers/rtc/rtc-mxc.c | 32 ++-- 1 file changed, 6 insertions(+), 26 deletions(-) diff --git a/drivers/rtc/rtc-mxc.c b/drivers/rtc/rtc-mxc.c index 28a15bd..708b9e9 100644 --- a/drivers/rtc/rtc-mxc.c +++ b/drivers/rtc/rtc-mxc.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -394,8 +395,12 @@ static int mxc_rtc_probe(struct platform_device *pdev) pdata->irq = -1; } - if (pdata->irq >= 0) + if (pdata->irq >= 0) { device_init_wakeup(&pdev->dev, 1); + ret = dev_pm_set_wake_irq(&pdev->dev, pdata->irq); + if (ret) + dev_err(&pdev->dev, "failed to enable irq wake\n"); + } rtc = devm_rtc_device_register(&pdev->dev, pdev->name, &mxc_rtc_ops, THIS_MODULE); @@ -426,35 +431,10 @@ static int mxc_rtc_remove(struct platform_device *pdev) return 0; } -#ifdef CONFIG_PM_SLEEP -static int mxc_rtc_suspend(struct device *dev) -{ - struct rtc_plat_data *pdata = dev_get_drvdata(dev); - - if (device_may_wakeup(dev)) - enable_irq_wake(pdata->irq); - - return 0; -} - -static int mxc_rtc_resume(struct device *dev) -{ - struct rtc_plat_data *pdata = dev_get_drvdata(dev); - - if (device_may_wakeup(dev)) - disable_irq_wake(pdata->irq); - - return 0; -} -#endif - -static SIMPLE_DEV_PM_OPS(mxc_rtc_pm_ops, mxc_rtc_suspend, mxc_rtc_resume); - static struct platform_driver mxc_rtc_driver = { .driver = { .name= "mxc_rtc", .of_match_table = of_match_ptr(imx_rtc_dt_ids), - .pm = &mxc_rtc_pm_ops, }, .id_table = imx_rtc_devtype, .probe = mxc_rtc_probe, -- 2.7.4
[PATCH 2/2] regulator: mcp16502: Remove setup_regulators function
It seems a little odd that the current code passes struct regulator_config by value rather than by pointer to setup_regulators. The setup_regulators() helper is so simple and has only one caller, so remove it. Signed-off-by: Axel Lin --- drivers/regulator/mcp16502.c | 37 +++- 1 file changed, 11 insertions(+), 26 deletions(-) diff --git a/drivers/regulator/mcp16502.c b/drivers/regulator/mcp16502.c index 9292ab8736c7..e5a02711cb46 100644 --- a/drivers/regulator/mcp16502.c +++ b/drivers/regulator/mcp16502.c @@ -427,36 +427,15 @@ static const struct regmap_config mcp16502_regmap_config = { .wr_table = &mcp16502_yes_reg_table, }; -/* - * set_up_regulators() - initialize all regulators - */ -static int setup_regulators(struct mcp16502 *mcp, struct device *dev, - struct regulator_config config) -{ - struct regulator_dev *rdev; - int i; - - for (i = 0; i < NUM_REGULATORS; i++) { - rdev = devm_regulator_register(dev, &mcp16502_desc[i], &config); - if (IS_ERR(rdev)) { - dev_err(dev, - "failed to register %s regulator %ld\n", - mcp16502_desc[i].name, PTR_ERR(rdev)); - return PTR_ERR(rdev); - } - } - - return 0; -} - static int mcp16502_probe(struct i2c_client *client, const struct i2c_device_id *id) { struct regulator_config config = { }; + struct regulator_dev *rdev; struct device *dev; struct mcp16502 *mcp; struct regmap *rmap; - int ret = 0; + int i, ret; dev = &client->dev; config.dev = dev; @@ -482,9 +461,15 @@ static int mcp16502_probe(struct i2c_client *client, return PTR_ERR(mcp->lpm); } - ret = setup_regulators(mcp, dev, config); - if (ret != 0) - return ret; + for (i = 0; i < NUM_REGULATORS; i++) { + rdev = devm_regulator_register(dev, &mcp16502_desc[i], &config); + if (IS_ERR(rdev)) { + dev_err(dev, + "failed to register %s regulator %ld\n", + mcp16502_desc[i].name, PTR_ERR(rdev)); + return PTR_ERR(rdev); + } + } mcp16502_gpio_set_mode(mcp, MCP16502_OPMODE_ACTIVE); -- 2.17.1
[hubcap:for-next 20/22] fs/orangefs/file.c:574:5: sparse: symbol 'orangefs_file_open' was not declared. Should it be static?
tree: https://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux for-next head: 6055a739910e69f8f76120d48e7ae74a13b1fdda commit: 9a959aaffd7090810eade53e4d960614405f57c6 [20/22] orangefs: remember count when reading. reproduce: # apt-get install sparse git checkout 9a959aaffd7090810eade53e4d960614405f57c6 make ARCH=x86_64 allmodconfig make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' sparse warnings: (new ones prefixed by >>) >> fs/orangefs/file.c:574:5: sparse: symbol 'orangefs_file_open' was not >> declared. Should it be static? fs/orangefs/file.c:580:5: sparse: symbol 'orangefs_flush' was not declared. Should it be static? Please review and possibly fold the followup patch. --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
[PATCH 1/2] regulator: mcp16502: Remove unneeded fields from struct mcp16502
At the context with rdev, we can use rdev->regmap instead of mcp->rmap. The *rdev[NUM_REGULATORS] is not required because current code uses devm_regulator_register() so we don't need to store *rdev for clean up paths. Signed-off-by: Axel Lin --- drivers/regulator/mcp16502.c | 40 +++- 1 file changed, 16 insertions(+), 24 deletions(-) diff --git a/drivers/regulator/mcp16502.c b/drivers/regulator/mcp16502.c index 3a8004abe044..9292ab8736c7 100644 --- a/drivers/regulator/mcp16502.c +++ b/drivers/regulator/mcp16502.c @@ -119,8 +119,6 @@ enum { * @lpm: LPM GPIO descriptor */ struct mcp16502 { - struct regulator_dev *rdev[NUM_REGULATORS]; - struct regmap *rmap; struct gpio_desc *lpm; }; @@ -179,13 +177,12 @@ static unsigned int mcp16502_get_mode(struct regulator_dev *rdev) { unsigned int val; int ret, reg; - struct mcp16502 *mcp = rdev_get_drvdata(rdev); reg = mcp16502_get_reg(rdev, MCP16502_OPMODE_ACTIVE); if (reg < 0) return reg; - ret = regmap_read(mcp->rmap, reg, ); + ret = regmap_read(rdev->regmap, reg, ); if (ret) return ret; @@ -211,7 +208,6 @@ static int _mcp16502_set_mode(struct regulator_dev *rdev, unsigned int mode, { int val; int reg; - struct mcp16502 *mcp = rdev_get_drvdata(rdev); reg = mcp16502_get_reg(rdev, op_mode); if (reg < 0) @@ -228,7 +224,7 @@ static int _mcp16502_set_mode(struct regulator_dev *rdev, unsigned int mode, return -EINVAL; } - reg = regmap_update_bits(mcp->rmap, reg, MCP16502_MODE, val); + reg = regmap_update_bits(rdev->regmap, reg, MCP16502_MODE, val); return reg; } @@ -247,9 +243,8 @@ static int mcp16502_get_status(struct regulator_dev *rdev) { int ret; unsigned int val; - struct mcp16502 *mcp = rdev_get_drvdata(rdev); - ret = regmap_read(mcp->rmap, MCP16502_STAT_BASE(rdev_get_id(rdev)), + ret = regmap_read(rdev->regmap, MCP16502_STAT_BASE(rdev_get_id(rdev)), ); if (ret) return ret; @@ -290,7 +285,6 @@ static int mcp16502_suspend_get_target_reg(struct regulator_dev *rdev) */ static int mcp16502_set_suspend_voltage(struct regulator_dev 
*rdev, int uV) { - struct mcp16502 *mcp = rdev_get_drvdata(rdev); int sel = regulator_map_voltage_linear_range(rdev, uV, uV); int reg = mcp16502_suspend_get_target_reg(rdev); @@ -300,7 +294,7 @@ static int mcp16502_set_suspend_voltage(struct regulator_dev *rdev, int uV) if (reg < 0) return reg; - return regmap_update_bits(mcp->rmap, reg, MCP16502_VSEL, sel); + return regmap_update_bits(rdev->regmap, reg, MCP16502_VSEL, sel); } /* @@ -328,13 +322,12 @@ static int mcp16502_set_suspend_mode(struct regulator_dev *rdev, */ static int mcp16502_set_suspend_enable(struct regulator_dev *rdev) { - struct mcp16502 *mcp = rdev_get_drvdata(rdev); int reg = mcp16502_suspend_get_target_reg(rdev); if (reg < 0) return reg; - return regmap_update_bits(mcp->rmap, reg, MCP16502_EN, MCP16502_EN); + return regmap_update_bits(rdev->regmap, reg, MCP16502_EN, MCP16502_EN); } /* @@ -342,13 +335,12 @@ static int mcp16502_set_suspend_enable(struct regulator_dev *rdev) */ static int mcp16502_set_suspend_disable(struct regulator_dev *rdev) { - struct mcp16502 *mcp = rdev_get_drvdata(rdev); int reg = mcp16502_suspend_get_target_reg(rdev); if (reg < 0) return reg; - return regmap_update_bits(mcp->rmap, reg, MCP16502_EN, 0); + return regmap_update_bits(rdev->regmap, reg, MCP16502_EN, 0); } #endif /* CONFIG_SUSPEND */ @@ -441,17 +433,16 @@ static const struct regmap_config mcp16502_regmap_config = { static int setup_regulators(struct mcp16502 *mcp, struct device *dev, struct regulator_config config) { + struct regulator_dev *rdev; int i; for (i = 0; i < NUM_REGULATORS; i++) { - mcp->rdev[i] = devm_regulator_register(dev, - _desc[i], - ); - if (IS_ERR(mcp->rdev[i])) { + rdev = devm_regulator_register(dev, _desc[i], ); + if (IS_ERR(rdev)) { dev_err(dev, "failed to register %s regulator %ld\n", - mcp16502_desc[i].name, PTR_ERR(mcp->rdev[i])); - return PTR_ERR(mcp->rdev[i]); + mcp16502_desc[i].name, PTR_ERR(rdev)); + return PTR_ERR(rdev); } } @@ -464,6 +455,7 @@ static int mcp16502_probe(struct 
i2c_client *client, struct regulator_config config = { };
[RFC PATCH hubcap] orangefs: orangefs_file_open() can be static
Fixes: 9a959aaffd70 ("orangefs: remember count when reading.") Signed-off-by: kbuild test robot --- file.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c index d198af9..01d0db6 100644 --- a/fs/orangefs/file.c +++ b/fs/orangefs/file.c @@ -571,7 +571,7 @@ static int orangefs_lock(struct file *filp, int cmd, struct file_lock *fl) return rc; } -int orangefs_file_open(struct inode * inode, struct file *file) +static int orangefs_file_open(struct inode * inode, struct file *file) { file->private_data = NULL; return generic_file_open(inode, file);
Re: [PATCH] of: del redundant type conversion
My pleasure. I am very new to sparse. I guess the warning is caused by the macro min. Then I submitted my changes. Thanks for the code review. -----Original Message----- From: Frank Rowand [mailto:frowand.l...@gmail.com] Sent: 11 April 2019 2:50 To: xiaojiangfeng ; robh...@kernel.org; r...@kernel.org Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] of: del redundant type conversion On 4/10/19 1:29 AM, xiaojiangfeng wrote: > The type of variable l in early_init_dt_scan_chosen is int, there is > no need to convert to int. > > Signed-off-by: xiaojiangfeng > --- > drivers/of/fdt.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index > 4734223..de893c9 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long > node, const char *uname, > /* Retrieve command line */ > p = of_get_flat_dt_prop(node, "bootargs", &l); > if (p != NULL && l > 0) > - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); > + strlcpy(data, p, min(l, COMMAND_LINE_SIZE)); > > /* >* CONFIG_CMDLINE is meant to be a default in case nothing else > Thanks for catching the redundant cast. There is a second problem detected by sparse on that line: drivers/of/fdt.c:1094:34: warning: expression using sizeof(void) Can you please fix both issues? Thanks, Frank
[RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c
Create an API to allow users outside of oom_kill.c to mark a victim and wake up oom_reaper thread for expedited memory reclaim of the process being killed. Signed-off-by: Suren Baghdasaryan --- include/linux/oom.h | 1 + mm/oom_kill.c | 15 +++ 2 files changed, 16 insertions(+) diff --git a/include/linux/oom.h b/include/linux/oom.h index d07992009265..6c043c7518c1 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -112,6 +112,7 @@ extern unsigned long oom_badness(struct task_struct *p, unsigned long totalpages); extern bool out_of_memory(struct oom_control *oc); +extern bool expedite_reclaim(struct task_struct *task); extern void exit_oom_victim(void); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 3a2484884cfd..6449710c8a06 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1102,6 +1102,21 @@ bool out_of_memory(struct oom_control *oc) return !!oc->chosen; } +bool expedite_reclaim(struct task_struct *task) +{ + bool res = false; + + task_lock(task); + if (task_will_free_mem(task)) { + mark_oom_victim(task); + wake_oom_reaper(task); + res = true; + } + task_unlock(task); + + return res; +} + /* * The pagefault handler calls here because it is out of memory, so kill a * memory-hogging task. If oom_lock is held by somebody else, a parallel oom -- 2.21.0.392.gf8f6787159e-goog
[PATCH] of: fix expression using sizeof(void)
Problem detected by sparse: drivers/of/fdt.c:1094:34: warning: expression using sizeof(void) Signed-off-by: xiaojiangfeng --- drivers/of/fdt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 4734223..75c6c55 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, /* Retrieve command line */ p = of_get_flat_dt_prop(node, "bootargs", &l); if (p != NULL && l > 0) - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); + strlcpy(data, p, COMMAND_LINE_SIZE); /* * CONFIG_CMDLINE is meant to be a default in case nothing else -- 1.8.5.6
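The patch leans on strlcpy() bounding the copy by the destination size while stopping early at the source's NUL terminator. Below is a minimal userspace model of strlcpy with the usual BSD semantics (NOT the kernel's implementation): it copies at most size - 1 bytes, always NUL-terminates, and returns strlen(src). Because a NUL-terminated "bootargs" string shorter than the buffer stops the copy on its own, the size argument only needs to bound the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal strlcpy model (BSD semantics, for illustration only): copy at
 * most size - 1 bytes, always NUL-terminate a non-empty destination,
 * and return the full length of src so callers can detect truncation. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

A caller comparing the return value against the buffer size can tell that a long command line was truncated, mirroring the COMMAND_LINE_SIZE bound in the patch.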
[RFC 0/2] opportunistic memory reclaim of a killed process
The time to kill a process and free its memory can be critical when the killing was done to prevent memory shortages affecting system responsiveness. In the case of Android, where processes can be restarted easily, killing a less important background process is preferred to delaying or throttling an interactive foreground process. At the same time unnecessary kills should be avoided as they cause delays when the killed process is needed again. This requires a balanced decision from the system software about how long a kill can be postponed in the hope that memory usage will decrease without such drastic measures. As killing a process and reclaiming its memory is not an instant operation, a margin of free memory has to be maintained to prevent system performance deterioration while memory of the killed process is being reclaimed. The size of this margin depends on the minimum reclaim rate to cover the worst-case scenario and this minimum rate should be deterministic. Note that on asymmetric architectures like ARM big.LITTLE the reclaim rate can vary dramatically depending on which core it’s performed at (see test results). It’s a usual scenario that a non-essential victim process is being restricted to a less performant or throttled CPU for power saving purposes. This makes the worst-case reclaim rate scenario very probable. The cases when victim’s memory reclaim can be delayed further due to process being blocked in an uninterruptible sleep or when it performs a time-consuming operation makes the reclaim time even more unpredictable. Increasing memory reclaim rate and making it more deterministic would allow for a smaller free memory margin and would lead to more opportunities to avoid killing a process. Note that while other strategies like throttling memory allocations are viable and can be employed for other non-essential processes they would affect user experience if applied towards an interactive process. 
The proposed solution uses the existing oom-reaper thread to increase the memory reclaim rate of a killed process and to make this rate more deterministic. It is by no means considered the best solution; it was chosen because it was simple to implement and allowed for test data collection. The downside of this solution is that it requires an additional “expedite” hint for something which has to be fast in all cases. It would be great to find a way that does not require additional hints. Other possible approaches include: - Implementing a dedicated syscall to perform opportunistic reclaim in the context of the process waiting for the victim’s death. A natural boost bonus occurs if the waiting process has high or RT priority and is not limited by a cpuset cgroup in its CPU choices. - Implementing a mechanism that would perform opportunistic reclaim unconditionally whenever it is possible (similar to the checks in task_will_free_mem()). - Implementing opportunistic reclaim that uses the shrinker interface, PSI or other memory pressure indications as a hint to engage. Test details: Tests were performed on a Qualcomm® Snapdragon™ 845 8-core ARM big.LITTLE system with 4 little cores (0.3-1.6GHz) and 4 big cores (0.8-2.5GHz) running Android. Memory reclaim speed was measured using signal/signal_generate, kmem/rss_stat and sched/sched_process_exit traces. 
Test results:

powersave governor, min freq
          normal kills     expedited kills
little    856 MB/sec       3236 MB/sec
big       5084 MB/sec      6144 MB/sec

performance governor, max freq
          normal kills     expedited kills
little    5602 MB/sec      8144 MB/sec
big       14656 MB/sec     12398 MB/sec

schedutil governor (default)
          normal kills     expedited kills
little    2386 MB/sec      3908 MB/sec
big       7282 MB/sec      6820-16386 MB/sec
=====================================================
min reclaim speed:  856 MB/sec       3236 MB/sec

The patches are based on 5.1-rc1

Suren Baghdasaryan (2):
  mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c
  signal: extend pidfd_send_signal() to allow expedited process killing

 include/linux/oom.h          |  1 +
 include/linux/sched/signal.h |  3 ++-
 include/linux/signal.h       | 11 ++-
 ipc/mqueue.c                 |  2 +-
 kernel/signal.c              | 37 
 kernel/time/itimer.c         |  2 +-
 mm/oom_kill.c                | 15 +++
 7 files changed, 59 insertions(+), 12 deletions(-)
-- 
2.21.0.392.gf8f6787159e-goog
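The cover letter's free-memory-margin argument is simple arithmetic: the margin must cover allocations arriving while the victim is still being reclaimed, so it scales with the worst-case reclaim time. The sketch below is a back-of-envelope model only; the reclaim rates come from the measured minimums in the table above, while the victim size and allocation rate are made-up example numbers.

```c
#include <assert.h>

/* Free-memory margin needed while a victim is reclaimed: memory that
 * can be allocated during the worst-case reclaim time.  All quantities
 * are MB and MB/sec; both divisions round up to stay conservative. */
static unsigned long margin_mb(unsigned long victim_mb,
			       unsigned long reclaim_mb_per_sec,
			       unsigned long alloc_mb_per_sec)
{
	/* worst-case time to reclaim the victim, in ms, rounded up */
	unsigned long reclaim_ms =
		(victim_mb * 1000 + reclaim_mb_per_sec - 1) / reclaim_mb_per_sec;

	/* memory allocated meanwhile = margin that must stay free */
	return (alloc_mb_per_sec * reclaim_ms + 999) / 1000;
}
```

For a hypothetical 256 MB victim and a 100 MB/sec allocator, the margin shrinks from about 30 MB at the 856 MB/sec worst-case rate to about 8 MB at the expedited 3236 MB/sec rate, which is the "smaller free memory margin" the cover letter argues for.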
[RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
Add new SS_EXPEDITE flag to be used when sending SIGKILL via pidfd_send_signal() syscall to allow expedited memory reclaim of the victim process. The usage of this flag is currently limited to SIGKILL signal and only to privileged users. Signed-off-by: Suren Baghdasaryan --- include/linux/sched/signal.h | 3 ++- include/linux/signal.h | 11 ++- ipc/mqueue.c | 2 +- kernel/signal.c | 37 kernel/time/itimer.c | 2 +- 5 files changed, 43 insertions(+), 12 deletions(-) diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index e412c092c1e8..8a227633a058 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -327,7 +327,8 @@ extern int send_sig_info(int, struct kernel_siginfo *, struct task_struct *); extern void force_sigsegv(int sig, struct task_struct *p); extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *); extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp); -extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid); +extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid, + bool expedite); extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *, const struct cred *); extern int kill_pgrp(struct pid *pid, int sig, int priv); diff --git a/include/linux/signal.h b/include/linux/signal.h index 9702016734b1..34b7852aa4a0 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -446,8 +446,17 @@ int __save_altstack(stack_t __user *, unsigned long); } while (0); #ifdef CONFIG_PROC_FS + +/* + * SS_FLAGS values used in pidfd_send_signal: + * + * SS_EXPEDITE indicates desire to expedite the operation. 
+ */ +#define SS_EXPEDITE0x0001 + struct seq_file; extern void render_sigset_t(struct seq_file *, const char *, sigset_t *); -#endif + +#endif /* CONFIG_PROC_FS */ #endif /* _LINUX_SIGNAL_H */ diff --git a/ipc/mqueue.c b/ipc/mqueue.c index aea30530c472..27c66296e08e 100644 --- a/ipc/mqueue.c +++ b/ipc/mqueue.c @@ -720,7 +720,7 @@ static void __do_notify(struct mqueue_inode_info *info) rcu_read_unlock(); kill_pid_info(info->notify.sigev_signo, - _i, info->notify_owner); + _i, info->notify_owner, false); break; case SIGEV_THREAD: set_cookie(info->notify_cookie, NOTIFY_WOKENUP); diff --git a/kernel/signal.c b/kernel/signal.c index f98448cf2def..02ed4332d17c 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -43,6 +43,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -1394,7 +1395,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp) return success ? 0 : retval; } -int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid) +int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid, + bool expedite) { int error = -ESRCH; struct task_struct *p; @@ -1402,8 +1404,17 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid) for (;;) { rcu_read_lock(); p = pid_task(pid, PIDTYPE_PID); - if (p) + if (p) { error = group_send_sig_info(sig, info, p, PIDTYPE_TGID); + + /* +* Ignore expedite_reclaim return value, it is best +* effort only. 
+*/ + if (!error && expedite) + expedite_reclaim(p); + } + rcu_read_unlock(); if (likely(!p || error != -ESRCH)) return error; @@ -1420,7 +1431,7 @@ static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid) { int error; rcu_read_lock(); - error = kill_pid_info(sig, info, find_vpid(pid)); + error = kill_pid_info(sig, info, find_vpid(pid), false); rcu_read_unlock(); return error; } @@ -1487,7 +1498,7 @@ static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid) if (pid > 0) { rcu_read_lock(); - ret = kill_pid_info(sig, info, find_vpid(pid)); + ret = kill_pid_info(sig, info, find_vpid(pid), false); rcu_read_unlock(); return ret; } @@ -1704,7 +1715,7 @@ EXPORT_SYMBOL(kill_pgrp); int kill_pid(struct pid *pid, int sig, int priv) { - return kill_pid_info(sig, __si_special(priv), pid); + return kill_pid_info(sig, __si_special(priv), pid, false); } EXPORT_SYMBOL(kill_pid); @@ -3577,10
[RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects
The dentry slab cache is susceptible to internal fragmentation. Now that we have Slab Movable Objects we can attempt to defragment the dcache. Dentry objects are inherently _not_ relocatable; however, under some conditions they can be free'd. This is the same as shrinking the dcache, but instead of shrinking the whole cache we only attempt to free those objects that are located in partially full slab pages. There is no guarantee that this will reduce the memory usage of the system; it is a compromise between fragmented memory and total cache shrinkage, with the hope that some memory pressure can be alleviated. This is implemented using the newly added Slab Movable Objects infrastructure. The dcache 'migration' function is intentionally _not_ called 'd_migrate' because we only free, we do not migrate. Call it 'd_partial_shrink' to make explicit that no reallocation is done. Implement isolate and 'migrate' functions for the dentry slab cache. Signed-off-by: Tobin C. Harding --- fs/dcache.c | 71 + 1 file changed, 71 insertions(+) diff --git a/fs/dcache.c b/fs/dcache.c index 606cfca20d42..5c707ed9ab5a 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -30,6 +30,7 @@ #include #include #include +#include #include "internal.h" #include "mount.h" @@ -3068,6 +3069,74 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode) } EXPORT_SYMBOL(d_tmpfile); +/* + * d_isolate() - Dentry isolation callback function. + * @s: The dentry cache. + * @v: Vector of pointers to the objects to isolate. + * @nr: Number of objects in @v. + * + * The slab allocator is holding off frees. We can safely examine + * the object without the danger of it vanishing from under us. + */ +static void *d_isolate(struct kmem_cache *s, void **v, int nr) +{ + struct dentry *dentry; + int i; + + for (i = 0; i < nr; i++) { + dentry = v[i]; + __dget(dentry); + } + + return NULL; /* No need for private data */ +} + +/* + * d_partial_shrink() - Dentry migration callback function. + * @s: The dentry cache. 
+ * @v: Vector of pointers to the objects to migrate. + * @nr: Number of objects in @v. + * @node: The NUMA node where new object should be allocated. + * @private: Returned by d_isolate() (currently %NULL). + * + * Dentry objects _can not_ be relocated and shrinking the whole dcache + * can be expensive. This is an effort to free dentry objects that are + * stopping slab pages from being free'd without clearing the whole dcache. + * + * This callback is called from the SLUB allocator object migration + * infrastructure in an attempt to free up slab pages by freeing dentry + * objects from partially full slabs. + */ +static void d_partial_shrink(struct kmem_cache *s, void **v, int nr, + int node, void *_unused) +{ + struct dentry *dentry; + LIST_HEAD(dispose); + int i; + + for (i = 0; i < nr; i++) { + dentry = v[i]; + spin_lock(&dentry->d_lock); + dentry->d_lockref.count--; + + if (dentry->d_lockref.count > 0 || + dentry->d_flags & DCACHE_SHRINK_LIST) { + spin_unlock(&dentry->d_lock); + continue; + } + + if (dentry->d_flags & DCACHE_LRU_LIST) + d_lru_del(dentry); + + d_shrink_add(dentry, &dispose); + + spin_unlock(&dentry->d_lock); + } + + if (!list_empty(&dispose)) + shrink_dentry_list(&dispose); +} + static __initdata unsigned long dhash_entries; static int __init set_dhash_entries(char *str) { @@ -3113,6 +3182,8 @@ static void __init dcache_init(void) sizeof_field(struct dentry, d_iname), dcache_ctor); + kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink); + /* Hash may have been set up in dcache_init_early */ if (!hashdist) return; -- 2.21.0
Re: crypto: Kernel memory overwrite attempt detected to spans multiple pages
On Wed, 2019-04-10 at 16:11 -0700, Eric Biggers wrote: > You've explained *what* it does again, but not *why*. *Why* do you want > hardened usercopy to detect copies across page boundaries, when there is no > actual buffer overflow? When some subsystem in the kernel allocates multiple pages without __GFP_COMP, there is no way afterwards to detect exactly how many pages it allocated. In other words, there is no way to see how large the buffer is, nor whether the copy operation in question would overflow it. -- All Rights Reversed.
[RFC PATCH v3 13/15] dcache: Provide a dentry constructor
In order to support object migration on the dentry cache we need to have a determined object state at all times. Without a constructor the object would have a random state after allocation. Provide a dentry constructor. Signed-off-by: Tobin C. Harding --- fs/dcache.c | 31 ++- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index aac41adf4743..606cfca20d42 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1603,6 +1603,16 @@ void d_invalidate(struct dentry *dentry) } EXPORT_SYMBOL(d_invalidate); +static void dcache_ctor(void *p) +{ + struct dentry *dentry = p; + + /* Mimic lockref_mark_dead() */ + dentry->d_lockref.count = -128; + + spin_lock_init(&dentry->d_lock); +} + /** * __d_alloc - allocate a dcache entry * @sb: filesystem it will belong to @@ -1658,7 +1668,7 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name) dentry->d_lockref.count = 1; dentry->d_flags = 0; - spin_lock_init(&dentry->d_lock); + seqcount_init(&dentry->d_seq); dentry->d_inode = NULL; dentry->d_parent = dentry; @@ -3091,14 +3101,17 @@ static void __init dcache_init_early(void) static void __init dcache_init(void) { - /* -* A constructor could be added for stable state like the lists, -* but it is probably not worth it because of the cache nature -* of the dcache. -*/ - dentry_cache = KMEM_CACHE_USERCOPY(dentry, - SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT, - d_iname); + slab_flags_t flags = + SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | SLAB_MEM_SPREAD | SLAB_ACCOUNT; + + dentry_cache = + kmem_cache_create_usercopy("dentry", + sizeof(struct dentry), + __alignof__(struct dentry), + flags, + offsetof(struct dentry, d_iname), + sizeof_field(struct dentry, d_iname), + dcache_ctor); /* Hash may have been set up in dcache_init_early */ if (!hashdist) -- 2.21.0
Re: [v7 1/3] dt-bindings: ahci-fsl-qoriq: add lx2160a chip name to the list
On Tue, Mar 12, 2019 at 09:50:17AM +0800, Peng Ma wrote: > Add lx2160a compatible to bindings documentation. > > Signed-off-by: Peng Ma > Reviewed-by: Rob Herring I assume that the bindings will go via AHCI tree. Otherwise, please let me know. Shawn
[RFC PATCH v3 15/15] dcache: Add CONFIG_DCACHE_SMO
In an attempt to make the SMO patchset as non-invasive as possible add a config option CONFIG_DCACHE_SMO (under "Memory Management options") for enabling SMO for the DCACHE. Without this option the dcache constructor is used but no other code is built in; with this option enabled, slab mobility is enabled and the isolate/migrate functions are built in. Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache via the Slab Movable Objects infrastructure. Signed-off-by: Tobin C. Harding --- fs/dcache.c | 4 mm/Kconfig | 7 +++ 2 files changed, 11 insertions(+) diff --git a/fs/dcache.c b/fs/dcache.c index 5c707ed9ab5a..5ef68b78b457 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -3069,6 +3069,7 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode) } EXPORT_SYMBOL(d_tmpfile); +#ifdef CONFIG_DCACHE_SMO /* * d_isolate() - Dentry isolation callback function. * @s: The dentry cache. @@ -3136,6 +3137,7 @@ static void d_partial_shrink(struct kmem_cache *s, void **v, int nr, if (!list_empty(&dispose)) shrink_dentry_list(&dispose); } +#endif /* CONFIG_DCACHE_SMO */ static __initdata unsigned long dhash_entries; static int __init set_dhash_entries(char *str) @@ -3182,7 +3184,9 @@ static void __init dcache_init(void) sizeof_field(struct dentry, d_iname), dcache_ctor); +#ifdef CONFIG_DCACHE_SMO kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink); +#endif /* Hash may have been set up in dcache_init_early */ if (!hashdist) return; diff --git a/mm/Kconfig b/mm/Kconfig index 47040d939f3b..92fc27ad3472 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -265,6 +265,13 @@ config SMO_NODE help On NUMA systems enable moving objects to and from a specified node. +config DCACHE_SMO + bool "Enable Slab Movable Objects for the dcache" + depends on SLUB + help + Under memory pressure we can try to free dentry slab cache objects from + the partial slab list if this is enabled. + config PHYS_ADDR_T_64BIT def_bool 64BIT -- 2.21.0
[RFC PATCH v3 08/15] tools/testing/slab: Add object migration test suite
We just added a module that enables testing the SLUB allocator's ability to defrag/shrink caches via movable objects. Tests are better when they are automated. Add automated testing via a Python script for SLUB movable objects. Example output: $ cd path/to/linux/tools/testing/slab $ ./slub_defrag.py Please run script as root $ sudo ./slub_defrag.py $ sudo ./slub_defrag.py --debug Loading module ... Slab cache smo_test created Objects per slab: 20 Running sanity checks ... Running module stress test (see dmesg for additional test output) ... Removing module slub_defrag ... Loading module ... Slab cache smo_test created Running test non-movable ... testing slab 'smo_test' prior to enabling movable objects ... verified non-movable slabs are NOT shrinkable Running test movable ... testing slab 'smo_test' after enabling movable objects ... verified movable slabs are shrinkable Removing module slub_defrag ... Signed-off-by: Tobin C. Harding --- tools/testing/slab/slub_defrag.c | 1 + tools/testing/slab/slub_defrag.py | 451 ++ 2 files changed, 452 insertions(+) create mode 100755 tools/testing/slab/slub_defrag.py diff --git a/tools/testing/slab/slub_defrag.c b/tools/testing/slab/slub_defrag.c index 4a5c24394b96..8332e69ee868 100644 --- a/tools/testing/slab/slub_defrag.c +++ b/tools/testing/slab/slub_defrag.c @@ -337,6 +337,7 @@ static int smo_run_module_tests(int nr_objs, int keep) /* * struct functions() - Map command to a function pointer. + * If you update this please update the documentation in slub_defrag.py */ struct functions { char *fn_name; diff --git a/tools/testing/slab/slub_defrag.py b/tools/testing/slab/slub_defrag.py new file mode 100755 index ..41747c0db39b --- /dev/null +++ b/tools/testing/slab/slub_defrag.py @@ -0,0 +1,451 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +import subprocess +import sys +from os import path + +# SLUB Movable Objects test suite. 
+# +# Requirements: +# - CONFIG_SLUB=y +# - CONFIG_SLUB_DEBUG=y +# - The slub_defrag module in this directory. + +# Test SMO using a kernel module that enables triggering arbitrary +# kernel code from userspace via a debugfs file. +# +# Module code is in ./slub_defrag.c; basically the functionality is as +# follows: +# +# - Creates debugfs file /sys/kernel/debugfs/smo/callfn +# - Writes to 'callfn' are parsed as a command string and the function +#   associated with the command is called. +# - Defines 4 commands (all commands operate on smo_test cache): +# - 'test': Runs module stress tests. +# - 'alloc N': Allocates N slub objects +# - 'free N POS': Frees N objects starting at POS (see below) +# - 'enable': Enables SLUB Movable Objects +# +# The module maintains a list of allocated objects. Allocation adds +# objects to the tail of the list. Free'ing frees from the head of the +# list. This has the effect of creating free slots in the slab. For +# finer grained control over where in the cache slots are free'd the POS +# (position) argument may be used. + +# The main() function is reasonably readable; the test suite does the +# following: +# +# 1. Runs the module stress tests. +# 2. Tests the cache without movable objects enabled. +#    - Creates multiple partial slabs as explained above. +#    - Verifies that partial slabs are _not_ removed by shrink (see below). +# 3. Tests the cache with movable objects enabled. +#    - Creates multiple partial slabs as explained above. +#    - Verifies that partial slabs _are_ removed by shrink (see below). + +# The sysfs file /sys/kernel/slab/<cache>/shrink enables calling the +# function kmem_cache_shrink() (see mm/slab_common.c and mm/slub.c). +# Shrinking a cache attempts to consolidate all partial slabs by moving +# objects if object migration is enabled for the cache; otherwise +# shrinking a cache simply re-orders the partial list so that the most +# densely populated slabs are at the head of the list. 
+ +# Enable/disable debugging output (also enabled via -d | --debug). +debug = False + +# Used in debug messages and when running `insmod`. +MODULE_NAME = "slub_defrag" + +# Slab cache created by the test module. +CACHE_NAME = "smo_test" + +# Set by get_slab_config() +objects_per_slab = 0 +pages_per_slab = 0 +debugfs_mounted = False # Set to true if we mount debugfs. + + +def eprint(*args, **kwargs): +print(*args, file=sys.stderr, **kwargs) + + +def dprint(*args, **kwargs): +if debug: +print(*args, file=sys.stderr, **kwargs) + + +def run_shell(cmd): +return subprocess.call([cmd], shell=True) + + +def run_shell_get_stdout(cmd): +return subprocess.check_output([cmd], shell=True) + + +def assert_root(): +user = run_shell_get_stdout('whoami') +if user != b'root\n': +eprint("Please run script as root") +sys.exit(1) + + +def mount_debugfs(): +mounted = False + +# Check if debugfs is mounted at a known