date:20190410

On Tue, Apr 02, 2019 at 02:32:04PM -0500, Adam Ford wrote:
> Some USB peripherals draw more power, and the sourcing regulator
> take a little time to turn on.  This patch fixes an issue where
> some devices occasionally do not get detected, because the power
> isn't quite ready when communication starts, so we add a bit
> of a delay.
> 
> Fixes: 1c207f911fe9 ("ARM: dts: imx: Add support for Logic PD
> i.MX6QD EVM")
> 
> Signed-off-by: Adam Ford 

Applied, thanks.

[PATCH v2 11/11] platform/x86: asus-wmi: Do not disable keyboard backlight on unload

The keyboard backlight is disabled when module is unloaded as it is
exposed as LED device. Change this behavior to ignore setting 0 brightness
when the ledclass device is unloading.

Signed-off-by: Yurii Pavlovskyi 
---
 drivers/platform/x86/asus-wmi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index f0e506feb924..f49992fa87b3 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -475,6 +475,10 @@ static void do_kbd_led_set(struct led_classdev *led_cdev, 
int value)
 static void kbd_led_set(struct led_classdev *led_cdev,
enum led_brightness value)
 {
+   /* Prevent disabling keyboard backlight on module unregister */
+   if (led_cdev->flags & LED_UNREGISTERING)
+   return;
+
do_kbd_led_set(led_cdev, value);
 }
 
-- 
2.17.1

Re: [PATCH V2] ARM: dts: imx6q-logicpd: Shutdown LCD regulator during suspend

On Tue, Apr 02, 2019 at 02:25:46PM -0500, Adam Ford wrote:
> The LCD power sequencer is very finicky.  The backlight cannot
> be driven until after the sequencer is done.  Until now, the
> regulators were marked with 'regulator-always-on' to make sure
> it came up before the backlight.  This patch allows the LCD
> regulators to power down and prevent the backlight from being
> used again until the sequencer is ready.  This reduces
> standby power consumption by ~100mW.
> 
> Signed-off-by: Adam Ford 

Applied, thanks.

[PATCH v2 09/11] platform/x86: asus-wmi: Control RGB keyboard backlight

The WMI exposes two methods for controlling RGB keyboard backlight which
allow to control:
* RGB components in range 00 - ff,
* Switch between 4 effects,
* Switch between 3 effect speed modes,
* Separately enable the backlight on boot, in awake state (after driver
  load), in sleep mode, and probably in something called shutdown mode
  (no observable effects of enabling it are known so far).

The configuration should be written to several sysfs parameter buffers
which are then written via WMI by writing either 1 or 2 to the "kbbl_set"
parameter. When reading the buffers the last written value is returned.

If the 2 is written to "kbbl_set", the parameters will be reset on reboot
(temporary mode), 1 is permanent mode, parameters are retained.

The calls use new 3-dword input buffer method call.

The functionality is only enabled if corresponding DSTS methods return
exact valid values.

The following script demonstrates usage:

echo Red [00 - ff]
echo 33 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_red
echo Green [00 - ff]
echo ff > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_green
echo Blue [00 - ff]
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_blue
echo Mode: 0 - static color, 1 - blink, 2 - rainbow, 3 - strobe
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_mode
echo Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast
echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_speed
echo Enable: 02 - on boot, before module load, 08 - awake, 20 - sleep,
echo 2a or ff to set all
echo 2a > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_flags
echo Save: 1 - permanently, 2 - temporarily, reset after reboot
echo 1 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_set

Signed-off-by: Yurii Pavlovskyi 
---
 .../ABI/testing/sysfs-platform-asus-wmi   |  61 
 drivers/platform/x86/asus-wmi.c   | 329 ++
 include/linux/platform_data/x86/asus-wmi.h|   2 +
 3 files changed, 392 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi 
b/Documentation/ABI/testing/sysfs-platform-asus-wmi
index 019e1e29370e..300a40519695 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@@ -36,3 +36,64 @@ KernelVersion:   3.5
 Contact:   "AceLan Kao" 
 Description:
Resume on lid open. 1 means on, 0 means off.
+
+What:  /sys/devices/platform//kbbl/kbbl_red
+Date:  Apr 2019
+KernelVersion: 5.1
+Contact:   "Yurii Pavlovskyi" 
+Description:
+   RGB keyboard backlight red component: 00 .. ff.
+
+What:  /sys/devices/platform//kbbl/kbbl_green
+Date:  Apr 2019
+KernelVersion: 5.1
+Contact:   "Yurii Pavlovskyi" 
+Description:
+   RGB keyboard backlight green component: 00 .. ff.
+
+What:  /sys/devices/platform//kbbl/kbbl_blue
+Date:  Apr 2019
+KernelVersion: 5.1
+Contact:   "Yurii Pavlovskyi" 
+Description:
+   RGB keyboard backlight blue component: 00 .. ff.
+
+What:  /sys/devices/platform//kbbl/kbbl_mode
+Date:  Apr 2019
+KernelVersion: 5.1
+Contact:   "Yurii Pavlovskyi" 
+Description:
+   RGB keyboard backlight mode:
+   * 0 - static color,
+   * 1 - blink,
+   * 2 - rainbow,
+   * 3 - strobe.
+
+What:  /sys/devices/platform//kbbl/kbbl_speed
+Date:  Apr 2019
+KernelVersion: 5.1
+Contact:   "Yurii Pavlovskyi" 
+Description:
+   RGB keyboard backlight speed for modes 1 and 2:
+   * 0 - slow,
+   * 1 - medium,
+   * 2 - fast.
+
+What:  /sys/devices/platform//kbbl/kbbl_flags
+Date:  Apr 2019
+KernelVersion: 5.1
+Contact:   "Yurii Pavlovskyi" 
+Description:
+   RGB keyboard backlight enable flags (2a to enable everything), 
OR of:
+   * 02 - on boot (until module load),
+   * 08 - awake,
+   * 20 - sleep.
+
+What:  /sys/devices/platform//kbbl/kbbl_set
+Date:  Apr 2019
+KernelVersion: 5.1
+Contact:   "Yurii Pavlovskyi" 
+Description:
+   Write changed RGB keyboard backlight parameters:
+   * 1 - permanently,
+   * 2 - temporarily.
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index de0a8f61d4a1..b4fd200e8335 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -145,6 +145,21 @@ struct asus_rfkill {
u32 dev_id;
 };
 
+struct asus_kbbl_rgb {
+   u8 kbbl_red;
+   u8 kbbl_green;
+   u8 kbbl_blue;
+   u8 kbbl_mode;
+   u8 kbbl_speed;
+
+   u8 kbbl_set_red;
+   u8 kbbl_set_green;
+   u8 kbbl_set_blue;
+   u8 kbbl_set_mode;
+   u8 kbbl_set_speed;
+   u8 kbbl_set_flags;
+};
+
 struct asus_wmi {
int dsts_id;
int spec;

[PATCH v2 10/11] platform/x86: asus-wmi: Switch fan boost mode

The WMI exposes a write-only device ID where three modes can be switched
on some laptops (TUF Gaming FX505GM). There is a hotkey combination Fn-F5
that does have a fan icon which is designed to toggle between these 3
modes.

Add a SysFS entry that reads the last written value and updates value in
WMI on write and a hotkey handler that toggles the modes. The
corresponding DEVS device handler does obviously take 3 possible
argument values.

Method (SFBM, 1, NotSerialized)
{
If ((Arg0 == Zero) { .. }
If ((Arg0 == One)) { .. }
If ((Arg0 == 0x02)) { .. }
}

... // DEVS
If ((IIA0 == 0x00110018))
{
   SFBM (IIA1)
   Return (One)
}

* 0x00 - is normal,
* 0x01 - is obviously turbo by the amount of noise, might be useful to
avoid CPU frequency throttling on high load,
* 0x02 - the meaning is unknown at the time as modes are not named
in the vendor documentation, but it does look like a quiet mode as CPU
temperature does increase about 10 degrees on maximum load.

Signed-off-by: Yurii Pavlovskyi 
---
 .../ABI/testing/sysfs-platform-asus-wmi   |  10 ++
 drivers/platform/x86/asus-wmi.c   | 119 --
 include/linux/platform_data/x86/asus-wmi.h|   1 +
 3 files changed, 117 insertions(+), 13 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi 
b/Documentation/ABI/testing/sysfs-platform-asus-wmi
index 300a40519695..2b3184e297a7 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
@@ -97,3 +97,13 @@ Description:
Write changed RGB keyboard backlight parameters:
* 1 - permanently,
* 2 - temporarily.
+
+What:  /sys/devices/platform//fan_mode
+Date:  Apr 2019
+KernelVersion: 5.1
+Contact:   "Yurii Pavlovskyi" 
+Description:
+   Fan boost mode:
+   * 0 - normal,
+   * 1 - turbo,
+   * 2 - quiet?
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index b4fd200e8335..f0e506feb924 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -69,6 +69,7 @@ MODULE_LICENSE("GPL");
 #define NOTIFY_KBD_BRTUP   0xc4
 #define NOTIFY_KBD_BRTDWN  0xc5
 #define NOTIFY_KBD_BRTTOGGLE   0xc7
+#define NOTIFY_KBD_FBM 0x99
 
 #define ASUS_FAN_DESC  "cpu_fan"
 #define ASUS_FAN_MFUN  0x13
@@ -77,6 +78,8 @@ MODULE_LICENSE("GPL");
 #define ASUS_FAN_CTRL_MANUAL   1
 #define ASUS_FAN_CTRL_AUTO 2
 
+#define ASUS_FAN_MODE_COUNT3
+
 #define USB_INTEL_XUSB2PR  0xD0
 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI  0x9c31
 
@@ -196,6 +199,9 @@ struct asus_wmi {
int asus_hwmon_num_fans;
int asus_hwmon_pwm;
 
+   bool fan_mode_available;
+   u8 fan_mode;
+
bool kbbl_rgb_available;
struct asus_kbbl_rgb kbbl_rgb;
 
@@ -1832,6 +1838,87 @@ static int asus_wmi_fan_init(struct asus_wmi *asus)
return 0;
 }
 
+/* Fan mode 
***/
+
+static int fan_mode_check_present(struct asus_wmi *asus)
+{
+   u32 result;
+   int err;
+
+   asus->fan_mode_available = false;
+
+   err = asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_FAN_MODE, );
+   if (err) {
+   if (err == -ENODEV)
+   return 0;
+   else
+   return err;
+   }
+
+   if (result & ASUS_WMI_DSTS_PRESENCE_BIT)
+   asus->fan_mode_available = true;
+
+   return 0;
+}
+
+static int fan_mode_write(struct asus_wmi *asus)
+{
+   int err;
+   u8 value;
+   u32 retval;
+
+   value = asus->fan_mode % ASUS_FAN_MODE_COUNT;
+   pr_info("Set fan mode: %u\n", value);
+   err = asus_wmi_set_devstate(ASUS_WMI_DEVID_FAN_MODE, value, );
+
+   if (err) {
+   pr_warn("Failed to set fan mode: %d\n", err);
+   return err;
+   }
+
+   if (retval != 1) {
+   pr_warn("Failed to set fan mode (retval): 0x%x\n", retval);
+   return -EIO;
+   }
+
+   return 0;
+}
+
+static int fan_mode_switch_next(struct asus_wmi *asus)
+{
+   asus->fan_mode = (asus->fan_mode + 1) % ASUS_FAN_MODE_COUNT;
+   return fan_mode_write(asus);
+}
+
+static ssize_t fan_mode_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct asus_wmi *asus = dev_get_drvdata(dev);
+
+   return show_u8(asus->fan_mode, buf);
+}
+
+static ssize_t fan_mode_store(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t count)
+{
+   int result;
+   u8 new_mode;
+
+   struct asus_wmi *asus = dev_get_drvdata(dev);
+
+   result = store_u8(_mode, buf, count);
+   if (result < 0)
+   return result;
+
+

[PATCH v2 08/11] platform/x86: asus-wmi: Enhance detection of thermal data

The obviously wrong value 1 for temperature device ID in this driver is
returned by at least some devices, including TUF Gaming series laptops,
instead of 0 as expected previously. Observable effect is that a
temp1_input in hwmon reads temperature near absolute zero.

* Consider 0.1 K as erroneous value in addition to 0 K.
* Refactor detection of thermal input availability to a separate function.

Signed-off-by: Yurii Pavlovskyi 
---
 drivers/platform/x86/asus-wmi.c | 45 -
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index a98df005d6cb..de0a8f61d4a1 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -176,6 +176,7 @@ struct asus_wmi {
struct asus_rfkill gps;
struct asus_rfkill uwb;
 
+   bool asus_hwmon_thermal_available;
bool asus_hwmon_fan_manual_mode;
int asus_hwmon_num_fans;
int asus_hwmon_pwm;
@@ -1373,6 +1374,32 @@ static struct attribute *hwmon_attributes[] = {
NULL
 };
 
+static int asus_hwmon_check_thermal_available(struct asus_wmi *asus)
+{
+   u32 value = ASUS_WMI_UNSUPPORTED_METHOD;
+   int err;
+
+   asus->asus_hwmon_thermal_available = false;
+   err = asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_THERMAL_CTRL, );
+
+   if (err < 0) {
+   if (err == -ENODEV)
+   return 0;
+
+   return err;
+   }
+
+   /*
+* If the temperature value in deci-Kelvin is near the absolute
+* zero temperature, something is clearly wrong.
+*/
+   if (!value || value == 1)
+   return 0;
+
+   asus->asus_hwmon_thermal_available = true;
+   return 0;
+}
+
 static umode_t asus_hwmon_sysfs_is_visible(struct kobject *kobj,
  struct attribute *attr, int idx)
 {
@@ -1386,8 +1413,6 @@ static umode_t asus_hwmon_sysfs_is_visible(struct kobject 
*kobj,
 
if (attr == _attr_pwm1.attr)
dev_id = ASUS_WMI_DEVID_FAN_CTRL;
-   else if (attr == _attr_temp1_input.attr)
-   dev_id = ASUS_WMI_DEVID_THERMAL_CTRL;
 
if (attr == _attr_fan1_input.attr
|| attr == _attr_fan1_label.attr
@@ -1412,15 +1437,13 @@ static umode_t asus_hwmon_sysfs_is_visible(struct 
kobject *kobj,
 * - reverved bits are non-zero
 * - sfun and presence bit are not set
 */
-   if (value == ASUS_WMI_UNSUPPORTED_METHOD || value & 0xFFF8
+   if (value == ASUS_WMI_UNSUPPORTED_METHOD || (value & 0xFFF8)
|| (!asus->sfun && !(value & ASUS_WMI_DSTS_PRESENCE_BIT)))
ok = false;
else
ok = fan_attr <= asus->asus_hwmon_num_fans;
-   } else if (dev_id == ASUS_WMI_DEVID_THERMAL_CTRL) {
-   /* If value is zero, something is clearly wrong */
-   if (!value)
-   ok = false;
+   } else if (attr == _attr_temp1_input.attr) {
+   ok = asus->asus_hwmon_thermal_available;
} else if (fan_attr <= asus->asus_hwmon_num_fans && fan_attr != -1) {
ok = true;
} else {
@@ -1476,6 +1499,14 @@ static int asus_wmi_fan_init(struct asus_wmi *asus)
}
 
pr_info("Number of fans: %d\n", asus->asus_hwmon_num_fans);
+
+   status = asus_hwmon_check_thermal_available(asus);
+   if (status) {
+   pr_warn("Could not check if thermal available: %d\n", status);
+   return -ENXIO;
+   }
+
+   pr_info("Thermal available: %d\n", asus->asus_hwmon_thermal_available);
return 0;
 }
 
-- 
2.17.1

Re: [PATCH V2] ARM: dts: imx6q-logicpd: Reduce inrush current on start

On Tue, Apr 02, 2019 at 02:19:08PM -0500, Adam Ford wrote:
> The main 3.3V regulator sources a series of additional regulators.
> This patch adds a small delay, so when the 3.3V regulator comes
> on it delays a bit before the subsequent regulators can come on.
> This reduces the inrush current a bit on the external DC power
> supply to help prevent a situation where the sourcing power supply
> cannot source enough current and overloads and the kit fails to
> start.
> 
> Fixes: 1c207f911fe9 ("ARM: dts: imx: Add support for Logic PD
> i.MX6QD EVM")
> 
> Signed-off-by: Adam Ford 

Applied, thanks.

Re: [PATCH net] vhost: reject zero size iova range

2019-04-10 Thread David Miller

From: Jason Wang 
Date: Tue,  9 Apr 2019 12:10:25 +0800

> We used to accept zero size iova range which will lead a infinite loop
> in translate_desc(). Fixing this by failing the request in this case.
> 
> Reported-by: syzbot+d21e6e297322a900c...@syzkaller.appspotmail.com
> Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
> Signed-off-by: Jason Wang 

Applied and queued up for -stable.

Re: [PATCH V2] ARM: dts: imx6q-logicpd: Enable Analog audio capture

On Tue, Apr 02, 2019 at 02:25:45PM -0500, Adam Ford wrote:
> The original submission had functional audio out and was based
> on reviewing other boards using the same wm8962 codec. However,
> the Logic PD board uses an analog microphone which was being
> disabled for a digital mic.  This patch corrects that and
> explicitly sets the gpio-cfg pins all to 0x which allows the
> analog microphone to capture audio.
> 
> Signed-off-by: Adam Ford 

Applied, thanks.

[PATCH v2 06/11] platform/x86: asus-nb-wmi: Add microphone mute key code

The microphone mute key that is present on FX505GM laptop and possibly
others is missing from sparse keymap. Add the missing code.

Also comment on the fan mode switch key that has the same code as the
already used key.

Signed-off-by: Yurii Pavlovskyi 
---
 drivers/platform/x86/asus-nb-wmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/asus-nb-wmi.c 
b/drivers/platform/x86/asus-nb-wmi.c
index 357d273ed336..39cf447198a9 100644
--- a/drivers/platform/x86/asus-nb-wmi.c
+++ b/drivers/platform/x86/asus-nb-wmi.c
@@ -474,6 +474,7 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
{ KE_KEY, 0x6B, { KEY_TOUCHPAD_TOGGLE } },
{ KE_IGNORE, 0x6E, },  /* Low Battery notification */
{ KE_KEY, 0x7a, { KEY_ALS_TOGGLE } }, /* Ambient Light Sensor Toggle */
+   { KE_KEY, 0x7c, { KEY_MICMUTE } },
{ KE_KEY, 0x7D, { KEY_BLUETOOTH } }, /* Bluetooth Enable */
{ KE_KEY, 0x7E, { KEY_BLUETOOTH } }, /* Bluetooth Disable */
{ KE_KEY, 0x82, { KEY_CAMERA } },
@@ -488,7 +489,7 @@ static const struct key_entry asus_nb_wmi_keymap[] = {
{ KE_KEY, 0x92, { KEY_SWITCHVIDEOMODE } }, /* SDSP CRT + TV + DVI */
{ KE_KEY, 0x93, { KEY_SWITCHVIDEOMODE } }, /* SDSP LCD + CRT + TV + DVI 
*/
{ KE_KEY, 0x95, { KEY_MEDIA } },
-   { KE_KEY, 0x99, { KEY_PHONE } },
+   { KE_KEY, 0x99, { KEY_PHONE } }, /* Conflicts with fan mode switch */
{ KE_KEY, 0xA0, { KEY_SWITCHVIDEOMODE } }, /* SDSP HDMI only */
{ KE_KEY, 0xA1, { KEY_SWITCHVIDEOMODE } }, /* SDSP LCD + HDMI */
{ KE_KEY, 0xA2, { KEY_SWITCHVIDEOMODE } }, /* SDSP CRT + HDMI */
-- 
2.17.1

[PATCH v2 07/11] platform/x86: asus-wmi: Organize code into sections

The driver has grown (and will more) pretty big which makes it hard to
navigate and understand. Add uniform comments to the code and ensure that
it is sorted into logical sections.

Signed-off-by: Yurii Pavlovskyi 
---
 drivers/platform/x86/asus-wmi.c | 94 -
 1 file changed, 46 insertions(+), 48 deletions(-)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 5aa30f8a2a38..a98df005d6cb 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -191,6 +191,8 @@ struct asus_wmi {
struct asus_wmi_driver *driver;
 };
 
+/* Input 
**/
+
 static int asus_wmi_input_init(struct asus_wmi *asus)
 {
int err;
@@ -228,6 +230,8 @@ static void asus_wmi_input_exit(struct asus_wmi *asus)
asus->inputdev = NULL;
 }
 
+/* WMI 
/
+
 static int asus_wmi_evaluate_method_3dw(u32 method_id, u32 arg0, u32 arg1,
u32 arg2, u32 *retval)
 {
@@ -246,7 +250,7 @@ static int asus_wmi_evaluate_method_3dw(u32 method_id, u32 
arg0, u32 arg1,
 , );
 
if (ACPI_FAILURE(status))
-   goto exit;
+   return -EIO;
 
obj = (union acpi_object *)output.pointer;
if (obj && obj->type == ACPI_TYPE_INTEGER)
@@ -257,10 +261,6 @@ static int asus_wmi_evaluate_method_3dw(u32 method_id, u32 
arg0, u32 arg1,
 
kfree(obj);
 
-exit:
-   if (ACPI_FAILURE(status))
-   return -EIO;
-
if (tmp == ASUS_WMI_UNSUPPORTED_METHOD)
return -ENODEV;
 
@@ -344,9 +344,8 @@ static int asus_wmi_get_devstate_simple(struct asus_wmi 
*asus, u32 dev_id)
  ASUS_WMI_DSTS_STATUS_BIT);
 }
 
-/*
- * LEDs
- */
+/* LEDs 
***/
+
 /*
  * These functions actually update the LED's, and are called from a
  * workqueue. By doing this as separate work rather than when the LED
@@ -656,6 +655,7 @@ static int asus_wmi_led_init(struct asus_wmi *asus)
return rv;
 }
 
+/* RF 
*/
 
 /*
  * PCI hotplug (for wlan rfkill)
@@ -1078,6 +1078,8 @@ static int asus_wmi_rfkill_init(struct asus_wmi *asus)
return result;
 }
 
+/* Quirks 
*/
+
 static void asus_wmi_set_xusb2pr(struct asus_wmi *asus)
 {
struct pci_dev *xhci_pdev;
@@ -1110,9 +1112,8 @@ static void asus_wmi_set_als(void)
asus_wmi_set_devstate(ASUS_WMI_DEVID_ALS_ENABLE, 1, NULL);
 }
 
-/*
- * Hwmon device
- */
+/* Hwmon device 
***/
+
 static int asus_hwmon_agfn_fan_speed_read(struct asus_wmi *asus, int fan,
  int *speed)
 {
@@ -1388,7 +1389,6 @@ static umode_t asus_hwmon_sysfs_is_visible(struct kobject 
*kobj,
else if (attr == _attr_temp1_input.attr)
dev_id = ASUS_WMI_DEVID_THERMAL_CTRL;
 
-
if (attr == _attr_fan1_input.attr
|| attr == _attr_fan1_label.attr
|| attr == _attr_pwm1.attr
@@ -1460,9 +1460,27 @@ static void asus_wmi_hwmon_exit(struct asus_wmi *asus)
}
 }
 
-/*
- * Backlight
- */
+static int asus_wmi_fan_init(struct asus_wmi *asus)
+{
+   int status;
+
+   asus->asus_hwmon_pwm = -1;
+   asus->asus_hwmon_num_fans = -1;
+   asus->asus_hwmon_fan_manual_mode = false;
+
+   status = asus_hwmon_get_fan_number(asus, >asus_hwmon_num_fans);
+   if (status) {
+   asus->asus_hwmon_num_fans = 0;
+   pr_warn("Could not determine number of fans: %d\n", status);
+   return -ENXIO;
+   }
+
+   pr_info("Number of fans: %d\n", asus->asus_hwmon_num_fans);
+   return 0;
+}
+
+/* Backlight 
**/
+
 static int read_backlight_power(struct asus_wmi *asus)
 {
int ret;
@@ -1644,6 +1662,8 @@ static int is_display_toggle(int code)
return 0;
 }
 
+/* WMI events 
*/
+
 static int asus_poll_wmi_event(u32 value)
 {
struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
@@ -1766,9 +1786,8 @@ static int asus_wmi_notify_queue_flush(struct asus_wmi 
*asus)
return -EIO;
 }
 
-/*
- * Sys helpers
- */
+/* Sysfs 
**/
+
 static int parse_arg(const char *buf, unsigned long count, int *val)
 {
if (!count)
@@ -1907,9 +1926,8 @@ static int asus_wmi_sysfs_init(struct platform_device 
*device)
return sysfs_create_group(>dev.kobj, _attribute_group);
 }
 
-/*
- * Platform device
- */
+/* Platform device

[PATCH v2 05/11] platform/x86: asus-wmi: Support queued WMI event codes

Event codes are expected to be polled from a queue on at least some
models.

The WMI event codes are pushed into queue based on circular buffer. After
INIT method is called ACPI code is allowed to push events into this buffer
the INIT method can not be reverted. If the module is unloaded and an
event (such as hotkey press) gets emitted before inserting it back the
events get processed delayed by one or, if the queue overflows,
additionally delayed by about 3 seconds.

Patch was tested on a newer TUF Gaming FX505GM and older K54C model.

FX505GM
Device (ATKD)
{ ..
Name (ATKQ, Package (0x10)
{
0x, ..
}

Method (IANQ, 1, Serialized)
{
If ((AQNO >= 0x10))
{
Local0 = 0x64
While ((Local0 && (AQNO >= 0x10)))
{
Local0--
Sleep (0x0A)
}
...
..
AQTI++
AQTI &= 0x0F
ATKQ [AQTI] = Arg0
...
}

Method (GANQ, 0, Serialized)
{
..
If (AQNO)
{
...
Local0 = DerefOf (ATKQ [AQHI])
AQHI++
AQHI &= 0x0F
Return (Local0)
}

Return (One)
}

This code is almost identical to K54C, which does return Ones on empty
queue.

K54C:
Method (GANQ, 0, Serialized)
{
If (AQNO)
{
...
Return (Local0)
}

Return (Ones)
}

The fix flushes the old key codes out of the queue on load and after
receiving event the queue is read until either .. or 1 is encountered.

It might be considered a minor issue and no normal user would likely to
observe this (there is little reason unloading the driver), but it does
significantly frustrate a developer who is unlucky enough to encounter
this.

Introduce functionality for flushing and processing queued codes, which is
enabled via quirk flag for ASUS7000. It might be considered if it is
reasonable to enable it everywhere (might introduce regressions) or always
try to flush the queue on module load and try to detect if this quirk is
present in the future.

This patch limits the effect to the specific hardware defined by ASUS7000
device that is used for driver detection by vendor driver of Fx505. The
fallback is also implemented in case initial flush fails.

Signed-off-by: Yurii Pavlovskyi 
---
 drivers/platform/x86/asus-nb-wmi.c |   1 +
 drivers/platform/x86/asus-wmi.c| 122 ++---
 drivers/platform/x86/asus-wmi.h|   2 +
 3 files changed, 97 insertions(+), 28 deletions(-)

diff --git a/drivers/platform/x86/asus-nb-wmi.c 
b/drivers/platform/x86/asus-nb-wmi.c
index cc5f0765a8d9..357d273ed336 100644
--- a/drivers/platform/x86/asus-nb-wmi.c
+++ b/drivers/platform/x86/asus-nb-wmi.c
@@ -438,6 +438,7 @@ static void asus_nb_wmi_quirks(struct asus_wmi_driver 
*driver)
 
if (acpi_dev_found("ASUS7000")) {
driver->quirks->force_dsts = true;
+   driver->quirks->wmi_event_queue = true;
}
 }
 
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 80f3447734fc..5aa30f8a2a38 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -80,6 +80,12 @@ MODULE_LICENSE("GPL");
 #define USB_INTEL_XUSB2PR  0xD0
 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI  0x9c31
 
+#define WMI_EVENT_QUEUE_SIZE   0x10
+#define WMI_EVENT_QUEUE_END0x1
+#define WMI_EVENT_MASK 0x
+/* The event value is always the same. */
+#define WMI_EVENT_VALUE0xFF
+
 static const char * const ashs_ids[] = { "ATK4001", "ATK4002", NULL };
 
 static bool ashs_present(void)
@@ -143,6 +149,7 @@ struct asus_wmi {
int dsts_id;
int spec;
int sfun;
+   bool wmi_event_queue;
 
struct input_dev *inputdev;
struct backlight_device *backlight_device;
@@ -1637,77 +1644,126 @@ static int is_display_toggle(int code)
return 0;
 }
 
-static void asus_wmi_notify(u32 value, void *context)
+static int asus_poll_wmi_event(u32 value)
 {
-   struct asus_wmi *asus = context;
-   struct acpi_buffer response = { ACPI_ALLOCATE_BUFFER, NULL };
+   struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
union acpi_object *obj;
acpi_status status;
-   int code;
-   int orig_code;
-   unsigned int key_value = 1;
-   bool autorelease = 1;
+   int code = -EIO;
 
-   status = wmi_get_event_data(value, );
-   if (status != AE_OK) {
-   pr_err("bad event status 0x%x\n", status);
-   return;
+   status = wmi_get_event_data(value, );
+   if (ACPI_FAILURE(status)) {
+   pr_warn("Failed to get WMI event code: %s\n",
+   acpi_format_exception(status));
+   return code;
}
 
-   obj = (union acpi_object *)response.pointer;
+   obj = (union acpi_object *)output.pointer;
 
-

Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default

2019-04-10 Thread Sinan Kaya


On 4/11/2019 1:31 AM, Masahiro Yamada wrote:

t looks like CONFIG_KALLSYMS_ALL is the only feature that
requires CONFIG_DEBUG_KERNEL.

Which part of KALLSYMS_ALL code requires CONFIG_DEBUG_KERNEL?



I was going by what Kconfig tells me

Symbol: KALLSYMS_ALL [=n]
 Depends on: DEBUG_KERNEL [=n] && KALLSYMS [=y]

[PATCH v2 04/11] platform/x86: asus-wmi: Add quirk to force DSTS WMI method detection

The DSTS method detection fails, as nothing is returned if method is not
defined in WMNB. As a result the control of keyboard backlight is not
functional for TUF Gaming series laptops (at the time the only
functionality of the driver on this model implemented with WMI methods).

Patch was tested on a newer TUF Gaming FX505GM and older K54C model.

FX505GM:
Method (WMNB, 3, Serialized)
{ ...
If ((Local0 == 0x53545344))
{
...
Return (Zero)
}
...
// No return
}

K54C:
Method (WMNB, 3, Serialized)
{ ...
If ((Local0 == 0x53545344))
{
...
Return (0x02)
}
...
Return (0xFFFE)
}

The non-existing method ASUS_WMI_METHODID_DSTS=0x53544344 (actually it is
DCTS in little endian ASCII) is selected in asus->dsts.

One way to fix this would be to call both for every known device ID until
some answers - this would increase module load time.

Another option is to check some device that is known to exist on every
model - none known at the time.

Last option, which is implemented, is to check for presence of the
ASUS7000 device in ACPI tree (it is a dummy device), which is the
condition used for loading the vendor driver for this model. This might
not fix every affected model ever produced, but it likely does not
introduce any regressions. The patch introduces a quirk that is enabled
when ASUS7000 is found.

Scope (_SB)
{
Device (ATK)
{
Name (_HID, "ASUS7000")  // _HID: Hardware ID
}
}

Signed-off-by: Yurii Pavlovskyi 
---
 drivers/platform/x86/asus-nb-wmi.c |  5 +
 drivers/platform/x86/asus-wmi.c| 14 --
 drivers/platform/x86/asus-wmi.h|  5 +
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/asus-nb-wmi.c 
b/drivers/platform/x86/asus-nb-wmi.c
index b6f2ff95c3ed..cc5f0765a8d9 100644
--- a/drivers/platform/x86/asus-nb-wmi.c
+++ b/drivers/platform/x86/asus-nb-wmi.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "asus-wmi.h"
 
@@ -434,6 +435,10 @@ static void asus_nb_wmi_quirks(struct asus_wmi_driver 
*driver)
}
pr_info("Using i8042 filter function for receiving events\n");
}
+
+   if (acpi_dev_found("ASUS7000")) {
+   driver->quirks->force_dsts = true;
+   }
 }
 
 static const struct key_entry asus_nb_wmi_keymap[] = {
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index cfccfc0b8c2f..80f3447734fc 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -1885,11 +1885,21 @@ static int asus_wmi_platform_init(struct asus_wmi *asus)
 * Note, on most Eeepc, there is no way to check if a method exist
 * or note, while on notebooks, they returns 0xFFFE on failure,
 * but once again, SPEC may probably be used for that kind of things.
+*
+* Additionally at least TUF Gaming series laptops return 0 for unknown
+* methods, so the detection in this way is not possible and method must
+* be forced. Likely the presence of ACPI device ASUS7000 indicates
+* this.
 */
-   if (!asus_wmi_evaluate_method(ASUS_WMI_METHODID_DSTS, 0, 0, NULL))
+   if (asus->driver->quirks->force_dsts) {
+   pr_info("DSTS method forced\n");
+   asus->dsts_id = ASUS_WMI_METHODID_DSTS2;
+   } else if (!asus_wmi_evaluate_method(ASUS_WMI_METHODID_DSTS,
+   0, 0, NULL)) {
asus->dsts_id = ASUS_WMI_METHODID_DSTS;
-   else
+   } else {
asus->dsts_id = ASUS_WMI_METHODID_DSTS2;
+   }
 
/* CWAP allow to define the behavior of the Fn+F2 key,
 * this method doesn't seems to be present on Eee PCs */
diff --git a/drivers/platform/x86/asus-wmi.h b/drivers/platform/x86/asus-wmi.h
index 6c1311f4b04d..94056da02fde 100644
--- a/drivers/platform/x86/asus-wmi.h
+++ b/drivers/platform/x86/asus-wmi.h
@@ -54,6 +54,11 @@ struct quirk_entry {
 */
int no_display_toggle;
u32 xusb2pr;
+   /**
+* Force DSTS instead of DSCS and skip detection. Useful if WMNB
+* returns nothing on unknown method call.
+*/
+   bool force_dsts;
 
bool (*i8042_filter)(unsigned char data, unsigned char str,
 struct serio *serio);
-- 
2.17.1

Re: [PATCH 00/11] asus-wmi: Support of ASUS TUF Gaming series laptops

Hi,

sorry, just realized, I've broken the logging. I will re-post patches 04 to 11 
as replies to original ones, 1 to 3 were ok.

Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default

2019-04-10 Thread Masahiro Yamada

On Thu, Apr 11, 2019 at 11:47 AM Kees Cook  wrote:
>
> On Wed, Apr 10, 2019 at 5:56 PM Sinan Kaya  wrote:
> >
> > We can't seem to have a kernel with CONFIG_EXPERT set but
> > CONFIG_DEBUG_KERNEL unset these days.
> >
> > While some of the features under the CONFIG_EXPERT require
> > CONFIG_DEBUG_KERNEL, it doesn't apply for all features.
> >
> > It looks like CONFIG_KALLSYMS_ALL is the only feature that
> > requires CONFIG_DEBUG_KERNEL.
> >
> > Select CONFIG_EXPERT when CONFIG_DEBUG_KERNEL is chosen but
> > you can still choose CONFIG_EXPERT without CONFIG_DEBUG_KERNEL.
> >
> > Signed-off-by: Sinan Kaya 
> > Reviewed-by: Kees Cook 
>
> Masahiro, should this go via your tree, or somewhere else?


I think somewhere else.


-- 
Best Regards
Masahiro Yamada

Re: [External] Re: Basics : Memory Configuration

2019-04-10 Thread Pankaj Suryawanshi

From: Christopher Lameter 
Sent: 09 April 2019 21:31
To: Pankaj Suryawanshi
Cc: linux-kernel@vger.kernel.org; linux...@kvack.org
Subject: [External] Re: Basics : Memory Configuration

On Tue, 9 Apr 2019, Pankaj Suryawanshi wrote:

> I am confuse about memory configuration and I have below questions

Hmmm... Yes some of the terminology that you use is a bit confusing.

> 1. if 32-bit os maximum virtual address is 4GB, When i have 4 gb of ram
> for 32-bit os, What about the virtual memory size ? is it required
> virtual memory(disk space) or we can directly use physical memory ?

The virtual memory size is the maximum virtual size of a single process.
Multiple processes can run and each can use different amounts of physical
memory. So both are actually independent.

Okay Got it.

The size of the virtual memory space per process is configurable on x86 32
bit (2G, 3G, 4G). Thus the possible virtual process size may vary
depending on the hardware architecture and the configuration of the
kernel.

Another Questions -
- Q. If i configures VMSPLIT = 2G/2G what does it mean ?
- Q. Disk Space is used by Virtual Memory ? If this is true, than without 
secondary storage there is no virtual memory ?
let say for 32-bit os i have 4GB ram than what is the use case of 
virtual memory ?

> 2. In 32-bit os 12 bits are offset because page size=4k i.e 2^12 and
> 2^20 for page addresses
>What about 64-bit os, What is offset size ? What is page size ? How it 
> calculated.

12 bits are passed through? Thats what you mean?

The remainder of the bits  are used to lookup the physical frame
number(PFN) in the page tables.

64 bit is the same. However, the number of bits used for lookups in the
page tables are much higher.

- Q. for 32-bit os page size is 4k, what is the page size for 64-bit os ? page 
size and offset is related to each other ?
- Q. if i increase the page size from 4k to 8k, does it change the offset size 
that it 2^12 to 2^13 ?
- Q. Why only 48 bits are used in 64-bit os ?

> 3. What is PAE? If enabled how to decide size of PAE, what is maximum
> and minimum size of extended memory.

PAE increases the physical memory size that can be addressed through a
page table lookup. The number of bits that can be specified in the PFN is
increased and thus more than 4GB of physical memory can be used by the
operating system. However, the virtual memory size stays the same and an
individual process still cannot use more memory.

- Q. Let say i enabled PAE for 32-bit os with 6GB ram.Virtual size is same 4GB, 
32-bit os cant address more thatn 4gb, Than what is the use of 6GB with PAE 
enabled.
*
 eInfochips Business Disclaimer: This e-mail message and all attachments 
transmitted with it are intended solely for the use of the addressee and may 
contain legally privileged and confidential information. If the reader of this 
message is not the intended recipient, or an employee or agent responsible for 
delivering this message to the intended recipient, you are hereby notified that 
any dissemination, distribution, copying, or other use of this message or its 
attachments is strictly prohibited. If you have received this message in error, 
please notify the sender immediately by replying to this message and please 
delete it from your computer. Any views expressed in this message are those of 
the individual sender unless otherwise stated. Company has taken enough 
precautions to prevent the spread of viruses. However the company accepts no 
liability for any damage caused by any virus transmitted by this email. 
*

Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default

2019-04-10 Thread Masahiro Yamada

On Thu, Apr 11, 2019 at 9:59 AM Sinan Kaya  wrote:
>
> We can't seem to have a kernel with CONFIG_EXPERT set but
> CONFIG_DEBUG_KERNEL unset these days.
>
> While some of the features under the CONFIG_EXPERT require
> CONFIG_DEBUG_KERNEL, it doesn't apply for all features.
>
> It looks like CONFIG_KALLSYMS_ALL is the only feature that
> requires CONFIG_DEBUG_KERNEL.

Which part of KALLSYMS_ALL code requires CONFIG_DEBUG_KERNEL?



> Select CONFIG_EXPERT when CONFIG_DEBUG_KERNEL is chosen but
> you can still choose CONFIG_EXPERT without CONFIG_DEBUG_KERNEL.
>
> Signed-off-by: Sinan Kaya 
> Reviewed-by: Kees Cook 
> ---
>  init/Kconfig  | 2 --
>  lib/Kconfig.debug | 1 +
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 4592bf7997c0..37e10a8391a3 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1206,8 +1206,6 @@ config BPF
>
>  menuconfig EXPERT
> bool "Configure standard kernel features (expert users)"
> -   # Unhide debug options, to make the on-by-default options visible
> -   select DEBUG_KERNEL
> help
>   This option allows certain base kernel options and settings
>to be disabled or tweaked. This is for specialized
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 0d9e81779e37..9fbf3499ec8d 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -434,6 +434,7 @@ config MAGIC_SYSRQ_SERIAL
>
>  config DEBUG_KERNEL
> bool "Kernel debugging"
> +   default EXPERT
> help
>   Say Y here if you are developing drivers or trying to debug and
>   identify kernel problems.
> --
> 2.21.0
>


--
Best Regards

Masahiro Yamada

Re: \\ 答复: [PATCH] of: del redundant type conversion

On 4/10/19 9:21 PM, Frank Rowand wrote:
> On 4/10/19 9:13 PM, Frank Rowand wrote:
>> On 4/10/19 6:51 PM, xiaojiangfeng wrote:
>>> My pleasure.
>>>
>>> I am very new to sparse.
>>>
>>> I guess the warning is caused by the macro min.
>>
>> I think the warning is likely because the type of data is 'void *'.
>>
>> Removing the (int) cast is a good fix, but does not resolve
>> the sparse warning.
> 
> Let me correct myself.  When I ran sparse, I see the removing min() does
> eliminate the sparse warning.  I'm not sure why, so I'll go dig a little
> deeper.

Digging leaves me with more information, but still not sure of the actual
underlying cause.  min() is defined in include/linux/kernel.h.  Unraveling
the defines, the code that sparse is complaining about is in
__no_side_effects(), which is:

#define __no_side_effects(x, y) \
(__is_constexpr(x) && __is_constexpr(y))

and __is_constexpr() is:

#define __is_constexpr(x) \
(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

The compiler warning points to the second sizeof() in the __is_constexpr() for
'l', which expands as:

  (sizeof(int) == sizeof(*(8 ? ((void *)((long)(   l) * 0l)) : (int *)8)))

I'll dig into this a little more, to see if maybe the problem is related to
my compiler version or sparse version.  Or if the reason lies elsewhere.

-Frank


> 
> -Frank
> 
>>
>> -Frank
>>
>>
>>> Then I submitted my changes.
>>>
>>> Thanks for code review.
>>>
>>>
>>> -邮件原件-
>>> 发件人: Frank Rowand [mailto:frowand.l...@gmail.com] 
>>> 发送时间: 2019年4月11日 2:50
>>> 收件人: xiaojiangfeng ; robh...@kernel.org; 
>>> r...@kernel.org
>>> 抄送: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org
>>> 主题: Re: [PATCH] of: del redundant type conversion
>>>
>>> On 4/10/19 1:29 AM, xiaojiangfeng wrote:
 The type of variable l in early_init_dt_scan_chosen is int, there is 
 no need to convert to int.

 Signed-off-by: xiaojiangfeng 
 ---
  drivers/of/fdt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 
 4734223..de893c9 100644
 --- a/drivers/of/fdt.c
 +++ b/drivers/of/fdt.c
 @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long 
 node, const char *uname,
/* Retrieve command line */
p = of_get_flat_dt_prop(node, "bootargs", );
if (p != NULL && l > 0)
 -  strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
 +  strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
  
/*
 * CONFIG_CMDLINE is meant to be a default in case nothing else

>>>
>>> Thanks for catching the redundant cast.
>>>
>>> There is a second problem detected by sparse on that line:
>>>
>>>   drivers/of/fdt.c:1094:34: warning: expression using sizeof(void)
>>>
>>> Can you please fix both issues?
>>>
>>> Thanks,
>>>
>>> Frank
>>>
>>
>>
> 
>

Re: [PATCH 1/2] soc: imx: gpc: use devm_platform_ioremap_resource() to simplify code

On Mon, Apr 01, 2019 at 06:07:08AM +, Anson Huang wrote:
> Use the new helper devm_platform_ioremap_resource() which wraps the
> platform_get_resource() and devm_ioremap_resource() together, to
> simplify the code.
> 
> Signed-off-by: Anson Huang 

Applied both, thanks.

[PATCH] arm64: dts: ls1028a: Add USB dt nodes

2019-04-10 Thread Ran Wang

This patch adds USB dt nodes for LS1028A.

Signed-off-by: Ran Wang 
---
 arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
index 8dd3501..d4bc314 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
@@ -144,6 +144,26 @@
clocks = <>;
};
 
+   usb0:usb3@310 {
+   compatible= "snps,dwc3";
+   reg= <0x0 0x310 0x0 0x1>;
+   interrupts= <0 80 0x4>;
+   dr_mode= "host";
+   snps,dis_rxdet_inp3_quirk;
+   snps,quirk-frame-length-adjustment = <0x20>;
+   snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>;
+   };
+
+   usb1:usb3@311 {
+   compatible= "snps,dwc3";
+   reg= <0x0 0x311 0x0 0x1>;
+   interrupts= <0 81 0x4>;
+   dr_mode= "host";
+   snps,dis_rxdet_inp3_quirk;
+   snps,quirk-frame-length-adjustment = <0x20>;
+   snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>;
+   };
+
i2c0: i2c@200 {
compatible = "fsl,vf610-i2c";
#address-cells = <1>;
-- 
1.7.1

Re: [PATCH 2/2] x86/pci: Clean up usage of X86_DEV_DMA_OPS

2019-04-10 Thread Christoph Hellwig

On Wed, Apr 10, 2019 at 04:45:01PM -0500, Bjorn Helgaas wrote:
> [+cc Keith, Jonathan (VMD guys)]
> 
> I'm OK with this from a PCI perspective.  It would be nice if
> 
>   dma_domain_list
>   dma_domain_list_lock
>   add_dma_domain()
>   del_dma_domain()
>   set_dma_domain_ops()
> 
> could all be moved to vmd.c, since they're really only used there.

I have another patch to eventually kill that, but it will need a little
more prep work and thus be delayed to the next merge window.

Re: [PATCH] clk: imx: use devm_platform_ioremap_resource() to simplify code

On Mon, Apr 01, 2019 at 05:13:02AM +, Anson Huang wrote:
> Use the new helper devm_platform_ioremap_resource() which wraps the
> platform_get_resource() and devm_ioremap_resource() together, to
> simplify the code.
> 
> Signed-off-by: Anson Huang 

Applied, thanks.

Re: [PATCH] of: del redundant type conversion

On 4/10/19 1:29 AM, xiaojiangfeng wrote:
> The type of variable l in early_init_dt_scan_chosen is
> int, there is no need to convert to int.
> 
> Signed-off-by: xiaojiangfeng 
> ---
>  drivers/of/fdt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 4734223..de893c9 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long 
> node, const char *uname,
>   /* Retrieve command line */
>   p = of_get_flat_dt_prop(node, "bootargs", );
>   if (p != NULL && l > 0)
> - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
> + strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
>  
>   /*
>* CONFIG_CMDLINE is meant to be a default in case nothing else
> 

My first reply to this patch asked for a sparse warning on this line to
also be fixed.  After xiaojiangfeng followed up with a different patch
to try to resolve the issues with this line of code, I see that the
sparse warning was not caused by my first conjecture and this patch
is the correct one to apply.

I will pursue the cause of the sparse warning myself separately.

Reviewed-by: Frank Rowand

Re: [PATCH] of: fix expression using sizeof(void)

On 4/10/19 6:47 PM, xiaojiangfeng wrote:
> problem detected by sparse:
> drivers/of/fdt.c:1094:34: warning: expression using sizeof(void)
> 
> Signed-off-by: xiaojiangfeng 
> ---
>  drivers/of/fdt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 4734223..75c6c55 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long 
> node, const char *uname,
>   /* Retrieve command line */
>   p = of_get_flat_dt_prop(node, "bootargs", );
>   if (p != NULL && l > 0)
> - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
> + strlcpy(data, p, COMMAND_LINE_SIZE);
>  
>   /*
>* CONFIG_CMDLINE is meant to be a default in case nothing else
> 

The fuller discussion is in the thread where you first attempted to fix an
issue with the line of code and I reported a sparse error against this line.

After digging deeper, your first patch is valid, removing min() here is not
the correct approach.  I will add my Reviewed-by to the first patch and I
will pursue the sparse warning separately.

Thanks,

Frank

Re: [PATCH v1 00/15] Refactor pgalloc stuff

2019-04-10 Thread Aneesh Kumar K.V

Christophe Leroy  writes:

> This series converts book3e64 to pte_fragment and refactor
> things that are common among subarches.
>
> Christophe Leroy (15):
>   powerpc/mm: drop __bad_pte()
>   powerpc/mm: define __pud_free_tlb() at all time on nohash/64
>   powerpc/mm: convert Book3E 64 to pte_fragment
>   powerpc/mm: move pgtable_t in asm/mmu.h
>   powerpc/mm: get rid of nohash/32/mmu.h and nohash/64/mmu.h
>   powerpc/Kconfig: select PPC_MM_SLICES from subarch type
>   powerpc/book3e: move early_alloc_pgtable() to init section
>   powerpc/mm: don't use pte_alloc_kernel() until slab is available on
> PPC32
>   powerpc/mm: inline pte_alloc_one_kernel() and pte_alloc_one() on PPC32
>   powerpc/mm: refactor pte_alloc_one() and pte_free() families
> definition.
>   powerpc/mm: refactor definition of pgtable_cache[]
>   powerpc/mm: Only keep one version of pmd_populate() functions on
> nohash/32
>   powerpc/mm: refactor pgtable freeing functions on nohash
>   powerpc/mm: refactor pmd_pgtable()
>   powerpc/mm: refactor pgd_alloc() and pgd_free() on nohash
>
>  arch/powerpc/include/asm/book3s/32/mmu-hash.h |   4 -
>  arch/powerpc/include/asm/book3s/32/pgalloc.h  |  41 -
>  arch/powerpc/include/asm/book3s/64/mmu.h  |   8 --
>  arch/powerpc/include/asm/book3s/64/pgalloc.h  |  49 --
>  arch/powerpc/include/asm/mmu.h|   3 +
>  arch/powerpc/include/asm/mmu_context.h|   6 --
>  arch/powerpc/include/asm/nohash/32/mmu.h  |  25 --
>  arch/powerpc/include/asm/nohash/32/pgalloc.h  | 123 
> ++
>  arch/powerpc/include/asm/nohash/64/mmu.h  |  12 ---
>  arch/powerpc/include/asm/nohash/64/pgalloc.h  | 117 +---
>  arch/powerpc/include/asm/nohash/mmu.h |  16 +++-
>  arch/powerpc/include/asm/nohash/pgalloc.h |  56 
>  arch/powerpc/include/asm/pgalloc.h|  51 +++
>  arch/powerpc/mm/Makefile  |   4 +-
>  arch/powerpc/mm/mmu_context.c |   2 +-
>  arch/powerpc/mm/pgtable-book3e.c  |   4 +-
>  arch/powerpc/mm/pgtable_32.c  |  42 +
>  arch/powerpc/platforms/Kconfig.cputype|   4 +-
>  18 files changed, 165 insertions(+), 402 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/nohash/32/mmu.h
>  delete mode 100644 arch/powerpc/include/asm/nohash/64/mmu.h
>
> -- 
> 2.13.3

Looks good. You can add for the series

Reviewed-by: Aneesh Kumar K.V

Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects

On Thu, Apr 11, 2019 at 05:47:46AM +0100, Al Viro wrote:
> On Thu, Apr 11, 2019 at 12:48:21PM +1000, Tobin C. Harding wrote:
> 
> > Oh, so putting entries on a shrink list is enough to pin them?
> 
> Not exactly pin, but __dentry_kill() has this:
> if (dentry->d_flags & DCACHE_SHRINK_LIST) {
> dentry->d_flags |= DCACHE_MAY_FREE;
> can_free = false;
> }
> spin_unlock(>d_lock);
> if (likely(can_free))
> dentry_free(dentry);
> and shrink_dentry_list() - this:
> if (dentry->d_lockref.count < 0)
> can_free = dentry->d_flags & DCACHE_MAY_FREE;
> spin_unlock(>d_lock);
> if (can_free)
> dentry_free(dentry);
>   continue;
> so if dentry destruction comes before we get around to
> shrink_dentry_list(), it'll stop short of dentry_free() and mark it for
> shrink_dentry_list() to do just dentry_free(); if it overlaps with
> shrink_dentry_list(), but doesn't progress all the way to freeing,
> we will
>   * have dentry removed from shrink list
>   * notice the negative ->d_count (i.e. that it has already reached
> __dentry_kill())
>   * see that __dentry_kill() is not through with tearing the sucker
> apart (no DCACHE_MAY_FREE set)
> ... and just leave it alone, letting __dentry_kill() do the rest of its
> thing - it's already off the shrink list, so __dentry_kill() will do
> everything, including dentry_free().
> 
> The reason for that dance is the locking - shrink list belongs to whoever
> has set it up and nobody else is modifying it.  So __dentry_kill() doesn't
> even try to remove the victim from there; it does all the teardown
> (detaches from inode, unhashes, etc.) and leaves removal from the shrink
> list and actual freeing to the owner of shrink list.  That way we don't
> have to protect all shrink lists a single lock (contention on it would
> be painful) and we don't have to play with per-shrink-list locks and
> all the attendant headaches (those lists usually live on stack frame
> of some function, so just having the lock next to the list_head would
> do us no good, etc.).  Much easier to have the shrink_dentry_list()
> do all the manipulations...
> 
> The bottom line is, once it's on a shrink list, it'll stay there
> until shrink_dentry_list().  It may get extra references after
> being inserted there (e.g. be found by hash lookup), it may drop
> those, whatever - it won't get freed until we run shrink_dentry_list().
> If it ends up with extra references, no problem - shrink_dentry_list()
> will just kick it off the shrink list and leave it alone.
> 
> Note, BTW, that umount coming between isolate and drop is not a problem;
> it call shrink_dcache_parent() on the root.  And if shrink_dcache_parent()
> finds something on (another) shrink list, it won't put it to the shrink
> list of its own, but it will make note of that and repeat the scan in
> such case.  So if we find something with zero refcount and not on
> shrink list, we can move it to our shrink list and be sure that its
> superblock won't go away under us...

Man, that was good to read.  Thanks for taking the time to write this.


Tobin

Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects

2019-04-10 Thread Al Viro

On Thu, Apr 11, 2019 at 12:48:21PM +1000, Tobin C. Harding wrote:

> Oh, so putting entries on a shrink list is enough to pin them?

Not exactly pin, but __dentry_kill() has this:
if (dentry->d_flags & DCACHE_SHRINK_LIST) {
dentry->d_flags |= DCACHE_MAY_FREE;
can_free = false;
}
spin_unlock(>d_lock);
if (likely(can_free))
dentry_free(dentry);
and shrink_dentry_list() - this:
if (dentry->d_lockref.count < 0)
can_free = dentry->d_flags & DCACHE_MAY_FREE;
spin_unlock(>d_lock);
if (can_free)
dentry_free(dentry);
continue;
so if dentry destruction comes before we get around to
shrink_dentry_list(), it'll stop short of dentry_free() and mark it for
shrink_dentry_list() to do just dentry_free(); if it overlaps with
shrink_dentry_list(), but doesn't progress all the way to freeing,
we will
* have dentry removed from shrink list
* notice the negative ->d_count (i.e. that it has already reached
__dentry_kill())
* see that __dentry_kill() is not through with tearing the sucker
apart (no DCACHE_MAY_FREE set)
... and just leave it alone, letting __dentry_kill() do the rest of its
thing - it's already off the shrink list, so __dentry_kill() will do
everything, including dentry_free().

The reason for that dance is the locking - shrink list belongs to whoever
has set it up and nobody else is modifying it.  So __dentry_kill() doesn't
even try to remove the victim from there; it does all the teardown
(detaches from inode, unhashes, etc.) and leaves removal from the shrink
list and actual freeing to the owner of shrink list.  That way we don't
have to protect all shrink lists a single lock (contention on it would
be painful) and we don't have to play with per-shrink-list locks and
all the attendant headaches (those lists usually live on stack frame
of some function, so just having the lock next to the list_head would
do us no good, etc.).  Much easier to have the shrink_dentry_list()
do all the manipulations...

The bottom line is, once it's on a shrink list, it'll stay there
until shrink_dentry_list().  It may get extra references after
being inserted there (e.g. be found by hash lookup), it may drop
those, whatever - it won't get freed until we run shrink_dentry_list().
If it ends up with extra references, no problem - shrink_dentry_list()
will just kick it off the shrink list and leave it alone.

Note, BTW, that umount coming between isolate and drop is not a problem;
it call shrink_dcache_parent() on the root.  And if shrink_dcache_parent()
finds something on (another) shrink list, it won't put it to the shrink
list of its own, but it will make note of that and repeat the scan in
such case.  So if we find something with zero refcount and not on
shrink list, we can move it to our shrink list and be sure that its
superblock won't go away under us...

[PATCH] vfs: update d_make_root() description

2019-04-10 Thread Ian Kent

Clearify d_make_root() usage, error handling and cleanup
requirements.

Signed-off-by: Ian Kent 
---
 Documentation/filesystems/porting |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/porting 
b/Documentation/filesystems/porting
index cf43bc4dbf31..1ebc1c6eb64b 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -428,8 +428,19 @@ release it yourself.
 --
 [mandatory]
d_alloc_root() is gone, along with a lot of bugs caused by code
-misusing it.  Replacement: d_make_root(inode).  The difference is,
-d_make_root() drops the reference to inode if dentry allocation fails.  
+misusing it.  Replacement: d_make_root(inode).  On success d_make_root(inode)
+allocates and returns a new dentry instantiated with the passed in inode.
+On failure NULL is returned and the passed in inode is dropped so failure
+handling need not do any cleanup for the inode. If d_make_root(inode)
+is passed a NULL inode it returns NULL and also requires no further
+error handling. Typical usage is:
+
+   inode = foofs_new_inode();
+   s->s_root = d_make_inode(inode);
+   if (!s->s_root)
+   /* Nothing needed for the inode cleanup */
+   return -ENOMEM;
+   ...
 
 --
 [mandatory]

Re: \\ 答复: [PATCH] of: del redundant type conversion

On 4/10/19 9:13 PM, Frank Rowand wrote:
> On 4/10/19 6:51 PM, xiaojiangfeng wrote:
>> My pleasure.
>>
>> I am very new to sparse.
>>
>> I guess the warning is caused by the macro min.
> 
> I think the warning is likely because the type of data is 'void *'.
> 
> Removing the (int) cast is a good fix, but does not resolve
> the sparse warning.

Let me correct myself.  When I ran sparse, I see the removing min() does
eliminate the sparse warning.  I'm not sure why, so I'll go dig a little
deeper.

-Frank

> 
> -Frank
> 
> 
>> Then I submitted my changes.
>>
>> Thanks for code review.
>>
>>
>> -邮件原件-
>> 发件人: Frank Rowand [mailto:frowand.l...@gmail.com] 
>> 发送时间: 2019年4月11日 2:50
>> 收件人: xiaojiangfeng ; robh...@kernel.org; 
>> r...@kernel.org
>> 抄送: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org
>> 主题: Re: [PATCH] of: del redundant type conversion
>>
>> On 4/10/19 1:29 AM, xiaojiangfeng wrote:
>>> The type of variable l in early_init_dt_scan_chosen is int, there is 
>>> no need to convert to int.
>>>
>>> Signed-off-by: xiaojiangfeng 
>>> ---
>>>  drivers/of/fdt.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 
>>> 4734223..de893c9 100644
>>> --- a/drivers/of/fdt.c
>>> +++ b/drivers/of/fdt.c
>>> @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long 
>>> node, const char *uname,
>>> /* Retrieve command line */
>>> p = of_get_flat_dt_prop(node, "bootargs", );
>>> if (p != NULL && l > 0)
>>> -   strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
>>> +   strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
>>>  
>>> /*
>>>  * CONFIG_CMDLINE is meant to be a default in case nothing else
>>>
>>
>> Thanks for catching the redundant cast.
>>
>> There is a second problem detected by sparse on that line:
>>
>>   drivers/of/fdt.c:1094:34: warning: expression using sizeof(void)
>>
>> Can you please fix both issues?
>>
>> Thanks,
>>
>> Frank
>>
> 
>

Re: \\ 答复: [PATCH] of: del redundant type conversion

On 4/10/19 6:51 PM, xiaojiangfeng wrote:
> My pleasure.
> 
> I am very new to sparse.
> 
> I guess the warning is caused by the macro min.

I think the warning is likely because the type of data is 'void *'.

Removing the (int) cast is a good fix, but does not resolve
the sparse warning.

-Frank


> Then I submitted my changes.
> 
> Thanks for code review.
> 
> 
> -邮件原件-
> 发件人: Frank Rowand [mailto:frowand.l...@gmail.com] 
> 发送时间: 2019年4月11日 2:50
> 收件人: xiaojiangfeng ; robh...@kernel.org; 
> r...@kernel.org
> 抄送: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org
> 主题: Re: [PATCH] of: del redundant type conversion
> 
> On 4/10/19 1:29 AM, xiaojiangfeng wrote:
>> The type of variable l in early_init_dt_scan_chosen is int, there is 
>> no need to convert to int.
>>
>> Signed-off-by: xiaojiangfeng 
>> ---
>>  drivers/of/fdt.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 
>> 4734223..de893c9 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long 
>> node, const char *uname,
>>  /* Retrieve command line */
>>  p = of_get_flat_dt_prop(node, "bootargs", );
>>  if (p != NULL && l > 0)
>> -strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
>> +strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
>>  
>>  /*
>>   * CONFIG_CMDLINE is meant to be a default in case nothing else
>>
> 
> Thanks for catching the redundant cast.
> 
> There is a second problem detected by sparse on that line:
> 
>   drivers/of/fdt.c:1094:34: warning: expression using sizeof(void)
> 
> Can you please fix both issues?
> 
> Thanks,
> 
> Frank
>

Re: [PATCH v3 1/2] cpufreq: Add sunxi nvmem based CPU scaling driver

2019-04-10 Thread Viresh Kumar

On 10-04-19, 13:41, Yangtao Li wrote:
> For some SoCs, the CPU frequency subset and voltage value of each OPP
> varies based on the silicon variant in use. The sunxi-cpufreq-nvmem
> driver reads the efuse value from the SoC to provide the OPP framework
> with required information.
> 
> Signed-off-by: Yangtao Li 
> ---
>  MAINTAINERS   |   7 +
>  drivers/cpufreq/Kconfig.arm   |  10 ++
>  drivers/cpufreq/Makefile  |   1 +
>  drivers/cpufreq/cpufreq-dt-platdev.c  |   2 +
>  drivers/cpufreq/sunxi-cpufreq-nvmem.c | 232 ++
>  5 files changed, 252 insertions(+)
>  create mode 100644 drivers/cpufreq/sunxi-cpufreq-nvmem.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 391405091c6b..bfd18ba6aa1a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -667,6 +667,13 @@ S:   Maintained
>  F:   Documentation/i2c/busses/i2c-ali1563
>  F:   drivers/i2c/busses/i2c-ali1563.c
>  
> +ALLWINNER CPUFREQ DRIVER
> +M:   Yangtao Li 
> +L:   linux...@vger.kernel.org
> +S:   Maintained
> +F:   Documentation/devicetree/bindings/opp/sunxi-nvmem-cpufreq.txt
> +F:   drivers/cpufreq/sunxi-cpufreq-nvmem.c
> +
>  ALLWINNER SECURITY SYSTEM
>  M:   Corentin Labbe 
>  L:   linux-cry...@vger.kernel.org
> diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
> index 179a1d302f48..25933c4321a7 100644
> --- a/drivers/cpufreq/Kconfig.arm
> +++ b/drivers/cpufreq/Kconfig.arm
> @@ -18,6 +18,16 @@ config ACPI_CPPC_CPUFREQ
>  
> If in doubt, say N.
>  
> +config ARM_ALLWINNER_CPUFREQ_NVMEM
> + tristate "Allwinner nvmem based CPUFreq"
> + depends on ARCH_SUNXI
> + depends on NVMEM_SUNXI_SID
> + select PM_OPP
> + help
> +   This adds the CPUFreq driver for Allwinner nvmem based SoC.
> +
> +   If in doubt, say N.
> +
>  config ARM_ARMADA_37XX_CPUFREQ
>   tristate "Armada 37xx CPUFreq support"
>   depends on ARCH_MVEBU && CPUFREQ_DT
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 689b26c6f949..da28de67613c 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -78,6 +78,7 @@ obj-$(CONFIG_ARM_SCMI_CPUFREQ)  += 
> scmi-cpufreq.o
>  obj-$(CONFIG_ARM_SCPI_CPUFREQ)   += scpi-cpufreq.o
>  obj-$(CONFIG_ARM_SPEAR_CPUFREQ)  += spear-cpufreq.o
>  obj-$(CONFIG_ARM_STI_CPUFREQ)+= sti-cpufreq.o
> +obj-$(CONFIG_ARM_ALLWINNER_CPUFREQ_NVMEM) += sunxi-cpufreq-nvmem.o
>  obj-$(CONFIG_ARM_TANGO_CPUFREQ)  += tango-cpufreq.o
>  obj-$(CONFIG_ARM_TEGRA20_CPUFREQ)+= tegra20-cpufreq.o
>  obj-$(CONFIG_ARM_TEGRA124_CPUFREQ)   += tegra124-cpufreq.o
> diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c 
> b/drivers/cpufreq/cpufreq-dt-platdev.c
> index 47729a22c159..50e7810f3a28 100644
> --- a/drivers/cpufreq/cpufreq-dt-platdev.c
> +++ b/drivers/cpufreq/cpufreq-dt-platdev.c
> @@ -105,6 +105,8 @@ static const struct of_device_id whitelist[] __initconst 
> = {
>   * platforms using "operating-points-v2" property.
>   */
>  static const struct of_device_id blacklist[] __initconst = {
> + { .compatible = "allwinner,sun50i-h6", },
> +
>   { .compatible = "calxeda,highbank", },
>   { .compatible = "calxeda,ecx-2000", },
>  
> diff --git a/drivers/cpufreq/sunxi-cpufreq-nvmem.c 
> b/drivers/cpufreq/sunxi-cpufreq-nvmem.c
> new file mode 100644
> index ..6bf4755d00d9
> --- /dev/null
> +++ b/drivers/cpufreq/sunxi-cpufreq-nvmem.c
> @@ -0,0 +1,232 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Allwinner CPUFreq nvmem based driver
> + *
> + * The sunxi-cpufreq-nvmem driver reads the efuse value from the SoC to
> + * provide the OPP framework with required information.
> + *
> + * Copyright (C) 2019 Yangtao Li 
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define MAX_NAME_LEN  7
> +
> +struct sunxi_cpufreq_soc_data {
> + u32 (*efuse_xlate)(const struct sunxi_cpufreq_soc_data *soc_data,
> +u32 efuse);
> + u32 nvmem_mask;
> + u32 nvmem_shift;
> +};
> +
> +static struct platform_device *cpufreq_dt_pdev, *sunxi_cpufreq_pdev;
> +
> +static u32 sun50i_efuse_xlate(const struct sunxi_cpufreq_soc_data *soc_data,
> +   u32 efuse)
> +{
> + return (efuse >> soc_data->nvmem_shift) & soc_data->nvmem_mask;
> +}
> +
> +/**
> + * sunxi_cpufreq_get_efuse() - Parse and return efuse value present on SoC
> + * @soc_data: Pointer to sunxi_cpufreq_soc_data context
> + * @versions: Set to the value parsed from efuse
> + *
> + * Returns 0 if success.
> + */
> +static int sunxi_cpufreq_get_efuse(const struct sunxi_cpufreq_soc_data 
> *soc_data,
> +u32 *versions)
> +{
> + struct nvmem_cell *speedbin_nvmem;
> + struct device_node *np;
> + struct device *cpu_dev;
> + u32 *speedbin;
> + size_t len;
> + int ret;
> +
> + cpu_dev

[v2 PATCH 6/9] mm: vmscan: don't demote for memcg reclaim

The memcg reclaim happens when the limit is breached, but demotion just
migrate pages to the other node instead of reclaiming them.  This sounds
pointless to memcg reclaim since the usage is not reduced at all.

Signed-off-by: Yang Shi 
---
 mm/vmscan.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2a96609..80cd624 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1046,8 +1046,12 @@ static void page_check_dirty_writeback(struct page *page,
mapping->a_ops->is_dirty_writeback(page, dirty, writeback);
 }
 
-static inline bool is_demote_ok(int nid)
+static inline bool is_demote_ok(int nid, struct scan_control *sc)
 {
+   /* It is pointless to do demotion in memcg reclaim */
+   if (!global_reclaim(sc))
+   return false;
+
/* Current node is cpuless node */
if (!node_state(nid, N_CPU_MEM))
return false;
@@ -1267,7 +1271,7 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
 * Demotion only happen from primary nodes
 * to cpuless nodes.
 */
-   if (is_demote_ok(page_to_nid(page))) {
+   if (is_demote_ok(page_to_nid(page), sc)) {
list_add(>lru, _pages);
unlock_page(page);
continue;
@@ -2219,7 +2223,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, 
bool file,
 * deactivation is pointless.
 */
if (!file && !total_swap_pages &&
-   !is_demote_ok(pgdat->node_id))
+   !is_demote_ok(pgdat->node_id, sc))
return false;
 
inactive = lruvec_lru_size(lruvec, inactive_lru, sc->reclaim_idx);
@@ -2306,7 +2310,7 @@ static void get_scan_count(struct lruvec *lruvec, struct 
mem_cgroup *memcg,
 *
 * If current node is already PMEM node, demotion is not applicable.
 */
-   if (!is_demote_ok(pgdat->node_id)) {
+   if (!is_demote_ok(pgdat->node_id, sc)) {
/*
 * If we have no swap space, do not bother scanning
 * anon pages.
@@ -2315,18 +2319,18 @@ static void get_scan_count(struct lruvec *lruvec, 
struct mem_cgroup *memcg,
scan_balance = SCAN_FILE;
goto out;
}
+   }
 
-   /*
-* Global reclaim will swap to prevent OOM even with no
-* swappiness, but memcg users want to use this knob to
-* disable swapping for individual groups completely when
-* using the memory controller's swap limit feature would be
-* too expensive.
-*/
-   if (!global_reclaim(sc) && !swappiness) {
-   scan_balance = SCAN_FILE;
-   goto out;
-   }
+   /*
+* Global reclaim will swap to prevent OOM even with no
+* swappiness, but memcg users want to use this knob to
+* disable swapping for individual groups completely when
+* using the memory controller's swap limit feature would be
+* too expensive.
+*/
+   if (!global_reclaim(sc) && !swappiness) {
+   scan_balance = SCAN_FILE;
+   goto out;
}
 
/*
@@ -2675,7 +2679,7 @@ static inline bool should_continue_reclaim(struct 
pglist_data *pgdat,
 */
pages_for_compaction = compact_gap(sc->order);
inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
-   if (get_nr_swap_pages() > 0 || is_demote_ok(pgdat->node_id))
+   if (get_nr_swap_pages() > 0 || is_demote_ok(pgdat->node_id, sc))
inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
if (sc->nr_reclaimed < pages_for_compaction &&
inactive_lru_pages > pages_for_compaction)
@@ -3373,7 +3377,7 @@ static void age_active_anon(struct pglist_data *pgdat,
struct mem_cgroup *memcg;
 
/* Aging anon page as long as demotion is fine */
-   if (!total_swap_pages && !is_demote_ok(pgdat->node_id))
+   if (!total_swap_pages && !is_demote_ok(pgdat->node_id, sc))
return;
 
memcg = mem_cgroup_iter(NULL, NULL, NULL);
-- 
1.8.3.1

[v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

With Dave Hansen's patches merged into Linus's tree

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308fd01d9fb33a16f64d2fd95f8830a4

PMEM could be hot plugged as NUMA node now. But, how to use PMEM as NUMA node
effectively and efficiently is still a question.

There have been a couple of proposals posted on the mailing list [1] [2] [3].

Changelog
=
v1 --> v2:
* Dropped the default allocation node mask. The memory placement restriction
could be achieved by mempolicy or cpuset.
* Dropped the new mempolicy since its semantic is not that clear yet.
* Dropped PG_Promote flag.
* Defined N_CPU_MEM nodemask for the nodes which have both CPU and memory.
* Extended page_check_references() to implement "twice access" check for
anonymous page in NUMA balancing path.
* Reworked the memory demotion code.

v1:
https://lore.kernel.org/linux-mm/1553316275-21985-1-git-send-email-yang@linux.alibaba.com/

Design
==
Basically, the approach is aimed to spread data from DRAM (closest to local
CPU) down further to PMEM and disk (typically assume the lower tier storage
is slower, larger and cheaper than the upper tier) by their hotness. The
patchset tries to achieve this goal by doing memory promotion/demotion via
NUMA balancing and memory reclaim as what the below diagram shows:

DRAM <--> PMEM <--> Disk
^ ^
|---|
swap

When DRAM has memory pressure, demote pages to PMEM via page reclaim path.
Then NUMA balancing will promote pages to DRAM as long as the page is referenced
again. The memory pressure on PMEM node would push the inactive pages of PMEM
to disk via swap.

The promotion/demotion happens only between "primary" nodes (the nodes have
both CPU and memory) and PMEM nodes. No promotion/demotion between PMEM nodes
and promotion from DRAM to PMEM and demotion from PMEM to DRAM.

The HMAT is effectively going to enforce "cpu-less" nodes for any memory range
that has differentiated performance from the conventional memory pool, or
differentiated performance for a specific initiator, per Dan Williams. So,
assuming PMEM nodes are cpuless nodes sounds reasonable.

However, cpuless nodes might be not PMEM nodes. But, actually, memory
promotion/demotion doesn't care what kind of memory will be the target nodes,
it could be DRAM, PMEM or something else, as long as they are the second tier
memory (slower, larger and cheaper than regular DRAM), otherwise it sounds
pointless to do such demotion.

Defined "N_CPU_MEM" nodemask for the nodes which have both CPU and memory in
order to distinguish with cpuless nodes (memory only, i.e. PMEM nodes) and
memoryless nodes (some architectures, i.e. Power, may have memoryless nodes).
Typically, memory allocation would happen on such nodes by default unless
cpuless nodes are specified explicitly, cpuless nodes would be just fallback
nodes, so they are also as known as "primary" nodes in this patchset. With
two tier memory system (i.e. DRAM + PMEM), this sounds good enough to
demonstrate the promotion/demotion approach for now, and this looks more
architecture-independent. But it may be better to construct such node mask
by reading hardware information (i.e. HMAT), particularly for more complex
memory hierarchy.

To reduce memory thrashing and PMEM bandwidth pressure, promote twice faulted
page in NUMA balancing. Implement "twice access" check by extending
page_check_references() for anonymous pages.

When doing demotion, demote to the less-contended local PMEM node. If the
local PMEM node is contended (i.e. migrate_pages() returns -ENOMEM), just do
swap instead of demotion. To make things simple, demotion to the remote PMEM
node is not allowed for now if the local PMEM node is online. If the local
PMEM node is not online, just demote to the remote one. If no PMEM node online,
just do normal swap.

Anonymous page only for the time being since NUMA balancing can't promote
unmapped page cache.

Added vmstat counters for pgdemote_kswapd, pgdemote_direct and
numa_pages_promoted.

There are definitely still some details need to be sorted out, for example,
shall respect to mempolicy in demotion path, etc.

Any comment is welcome.

Test

The stress test was done with mmtests + applications workload (i.e. sysbench,
grep, etc).

Generate memory pressure by running mmtest's usemem-stress-numa-compact,
then run other applications as workload to stress the promotion and demotion
path. The machine was still alive after the stress test had been running for
~30 hours. The /proc/vmstat also shows:

...
pgdemote_kswapd 3316563
pgdemote_direct 1930721
...
numa_pages_promoted 81838

TODO

1. Promote page cache. There are a couple of ways to handle this in kernel,
i.e. promote via active LRU in reclaim path on PMEM node, or promote in
mark_page_accessed().

2. Promote/demote HugeTLB. Now HugeTLB is not on LRU and NUMA balancing just
skips it.

3. May

[v2 PATCH 7/9] mm: vmscan: check if the demote target node is contended or not

When demoting to PMEM node, the target node may have memory pressure,
then the memory pressure may cause migrate_pages() fail.

If the failure is caused by memory pressure (i.e. returning -ENOMEM),
tag the node with PGDAT_CONTENDED.  The tag would be cleared once the
target node is balanced again.

Check if the target node is PGDAT_CONTENDED or not, if it is just skip
demotion.

Signed-off-by: Yang Shi 
---
 include/linux/mmzone.h |  3 +++
 mm/vmscan.c| 28 
 2 files changed, 31 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fba7741..de534db 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -520,6 +520,9 @@ enum pgdat_flags {
 * many pages under writeback
 */
PGDAT_RECLAIM_LOCKED,   /* prevents concurrent reclaim */
+   PGDAT_CONTENDED,/* the node has not enough free memory
+* available
+*/
 };
 
 enum zone_flags {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 80cd624..50cde53 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1048,6 +1048,9 @@ static void page_check_dirty_writeback(struct page *page,
 
 static inline bool is_demote_ok(int nid, struct scan_control *sc)
 {
+   int node;
+   nodemask_t used_mask;
+
/* It is pointless to do demotion in memcg reclaim */
if (!global_reclaim(sc))
return false;
@@ -1060,6 +1063,13 @@ static inline bool is_demote_ok(int nid, struct 
scan_control *sc)
if (!has_cpuless_node_online())
return false;
 
+   /* Check if the demote target node is contended or not */
+   nodes_clear(used_mask);
+   node = find_next_best_node(nid, _mask, true);
+
+   if (test_bit(PGDAT_CONTENDED, _DATA(node)->flags))
+   return false;
+
return true;
 }
 
@@ -1502,6 +1512,10 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
nr_reclaimed += nr_succeeded;
 
if (err) {
+   if (err == -ENOMEM)
+   set_bit(PGDAT_CONTENDED,
+   _DATA(target_nid)->flags);
+
putback_movable_pages(_pages);
 
list_splice(_pages, _pages);
@@ -2596,6 +2610,19 @@ static void shrink_node_memcg(struct pglist_data *pgdat, 
struct mem_cgroup *memc
 * scan target and the percentage scanning already complete
 */
lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
+
+   /*
+* The shrink_page_list() may find the demote target node is
+* contended, if so it doesn't make sense to scan anonymous
+* LRU again.
+*
+* Need check if swap is available or not too since demotion
+* may happen on swapless system.
+*/
+   if (!is_demote_ok(pgdat->node_id, sc) &&
+   (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0))
+   lru = LRU_FILE;
+
nr_scanned = targets[lru] - nr[lru];
nr[lru] = targets[lru] * (100 - percentage) / 100;
nr[lru] -= min(nr[lru], nr_scanned);
@@ -3458,6 +3485,7 @@ static void clear_pgdat_congested(pg_data_t *pgdat)
clear_bit(PGDAT_CONGESTED, >flags);
clear_bit(PGDAT_DIRTY, >flags);
clear_bit(PGDAT_WRITEBACK, >flags);
+   clear_bit(PGDAT_CONTENDED, >flags);
 }
 
 /*
-- 
1.8.3.1

[v2 PATCH 5/9] mm: vmscan: demote anon DRAM pages to PMEM node

Since PMEM provides larger capacity than DRAM and has much lower
access latency than disk, so it is a good choice to use as a middle
tier between DRAM and disk in page reclaim path.

With PMEM nodes, the demotion path of anonymous pages could be:

DRAM -> PMEM -> swap device

This patch demotes anonymous pages only for the time being and demote
THP to PMEM in a whole.  To avoid expensive page reclaim and/or
compaction on PMEM node if there is memory pressure on it, the most
conservative gfp flag is used, which would fail quickly if there is
memory pressure and just wakeup kswapd on failure.  The migrate_pages()
would split THP to migrate one by one as base page upon THP allocation
failure.

Demote pages to the cloest non-DRAM node even though the system is
swapless.  The current logic of page reclaim just scan anon LRU when
swap is on and swappiness is set properly.  Demoting to PMEM doesn't
need care whether swap is available or not.  But, reclaiming from PMEM
still skip anon LRU if swap is not available.

The demotion just happens from DRAM node to its cloest PMEM node.
Demoting to a remote PMEM node or migrating from PMEM to DRAM on reclaim
is not allowed for now.

And, define a new migration reason for demotion, called MR_DEMOTE.
Demote page via async migration to avoid blocking.

Signed-off-by: Yang Shi 
---
 include/linux/gfp.h|  12 
 include/linux/migrate.h|   1 +
 include/trace/events/migrate.h |   3 +-
 mm/debug.c |   1 +
 mm/internal.h  |  13 +
 mm/migrate.c   |  15 -
 mm/vmscan.c| 127 +++--
 7 files changed, 149 insertions(+), 23 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de..57ced51 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -285,6 +285,14 @@
  * available and will not wake kswapd/kcompactd on failure. The _LIGHT
  * version does not attempt reclaim/compaction at all and is by default used
  * in page fault path, while the non-light is used by khugepaged.
+ *
+ * %GFP_DEMOTE is for migration on memory reclaim (a.k.a demotion) allocations.
+ * The allocation might happen in kswapd or direct reclaim, so assuming
+ * __GFP_IO and __GFP_FS are not allowed looks safer.  Demotion happens for
+ * user pages (on LRU) only and on specific node.  Generally it will fail
+ * quickly if memory is not available, but may wake up kswapd on failure.
+ *
+ * %GFP_TRANSHUGE_DEMOTE is used for THP demotion allocation.
  */
 #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -300,6 +308,10 @@
 #define GFP_TRANSHUGE_LIGHT((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
 #define GFP_TRANSHUGE  (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
+#define GFP_DEMOTE (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_NORETRY | \
+   __GFP_NOMEMALLOC | __GFP_NOWARN | __GFP_THISNODE | \
+   GFP_NOWAIT)
+#define GFP_TRANSHUGE_DEMOTE   (GFP_DEMOTE | __GFP_COMP)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 837fdd1..cfb1f57 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -25,6 +25,7 @@ enum migrate_reason {
MR_MEMPOLICY_MBIND,
MR_NUMA_MISPLACED,
MR_CONTIG_RANGE,
+   MR_DEMOTE,
MR_TYPES
 };
 
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 705b33d..c1d5b36 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -20,7 +20,8 @@
EM( MR_SYSCALL, "syscall_or_cpuset")\
EM( MR_MEMPOLICY_MBIND, "mempolicy_mbind")  \
EM( MR_NUMA_MISPLACED,  "numa_misplaced")   \
-   EMe(MR_CONTIG_RANGE,"contig_range")
+   EM( MR_CONTIG_RANGE,"contig_range") \
+   EMe(MR_DEMOTE,  "demote")
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/mm/debug.c b/mm/debug.c
index c0b31b6..cc0d7df 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -25,6 +25,7 @@
"mempolicy_mbind",
"numa_misplaced",
"cma",
+   "demote",
 };
 
 const struct trace_print_flags pageflag_names[] = {
diff --git a/mm/internal.h b/mm/internal.h
index bee4d6c..8c424b5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -383,6 +383,19 @@ static inline int find_next_best_node(int node, nodemask_t 
*used_node_mask,
 }
 #endif
 
+static inline bool has_cpuless_node_online(void)
+{
+   nodemask_t nmask;
+
+   nodes_andnot(nmask, node_states[N_MEMORY],
+node_states[N_CPU_MEM]);
+
+   if (nodes_empty(nmask))
+   return false;
+
+   return true;
+}

[v2 PATCH 9/9] mm: numa: add page promotion counter

Add counter for page promotion for NUMA balancing.

Signed-off-by: Yang Shi 
---
 include/linux/vm_event_item.h | 1 +
 mm/huge_memory.c  | 4 
 mm/memory.c   | 4 
 mm/vmstat.c   | 1 +
 4 files changed, 10 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 499a3aa..9f52a62 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -51,6 +51,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
NUMA_HINT_FAULTS,
NUMA_HINT_FAULTS_LOCAL,
NUMA_PAGE_MIGRATE,
+   NUMA_PAGE_PROMOTE,
 #endif
 #ifdef CONFIG_MIGRATION
PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0b18ac45..ca9d688 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1609,6 +1609,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, 
pmd_t pmd)
migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma,
vmf->pmd, pmd, vmf->address, page, target_nid);
if (migrated) {
+   if (!node_state(page_nid, N_CPU_MEM) &&
+   node_state(target_nid, N_CPU_MEM))
+   count_vm_numa_events(NUMA_PAGE_PROMOTE, HPAGE_PMD_NR);
+
flags |= TNF_MIGRATED;
page_nid = target_nid;
} else
diff --git a/mm/memory.c b/mm/memory.c
index 01c1ead..7b1218b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3704,6 +3704,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
/* Migrate to the requested node */
migrated = migrate_misplaced_page(page, vma, target_nid);
if (migrated) {
+   if (!node_state(page_nid, N_CPU_MEM) &&
+   node_state(target_nid, N_CPU_MEM))
+   count_vm_numa_event(NUMA_PAGE_PROMOTE);
+
page_nid = target_nid;
flags |= TNF_MIGRATED;
} else
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d1e4993..fd194e3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1220,6 +1220,7 @@ int fragmentation_index(struct zone *zone, unsigned int 
order)
"numa_hint_faults",
"numa_hint_faults_local",
"numa_pages_migrated",
+   "numa_pages_promoted",
 #endif
 #ifdef CONFIG_MIGRATION
"pgmigrate_success",
-- 
1.8.3.1

[v2 PATCH 1/9] mm: define N_CPU_MEM node states

Kernel has some pre-defined node masks called node states, i.e.
N_MEMORY, N_CPU, etc.  But, there might be cpuless nodes, i.e. PMEM
nodes, and some architectures, i.e. Power, may have memoryless nodes.
It is not very straight forward to get the nodes with both CPUs and
memory.  So, define N_CPU_MEMORY node states.  The nodes with both CPUs
and memory are called "primary" nodes.  /sys/devices/system/node/primary
would show the current online "primary" nodes.

Signed-off-by: Yang Shi 
---
 drivers/base/node.c  |  2 ++
 include/linux/nodemask.h |  3 ++-
 mm/memory_hotplug.c  |  6 ++
 mm/page_alloc.c  |  1 +
 mm/vmstat.c  | 11 +--
 5 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 86d6cd9..1b963b2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -634,6 +634,7 @@ static ssize_t show_node_state(struct device *dev,
 #endif
[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
+   [N_CPU_MEM] = _NODE_ATTR(primary, N_CPU_MEM),
 };
 
 static struct attribute *node_state_attrs[] = {
@@ -645,6 +646,7 @@ static ssize_t show_node_state(struct device *dev,
 #endif
_state_attr[N_MEMORY].attr.attr,
_state_attr[N_CPU].attr.attr,
+   _state_attr[N_CPU_MEM].attr.attr,
NULL
 };
 
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 27e7fa3..66a8964 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -398,7 +398,8 @@ enum node_states {
N_HIGH_MEMORY = N_NORMAL_MEMORY,
 #endif
N_MEMORY,   /* The node has memory(regular, high, movable) 
*/
-   N_CPU,  /* The node has one or more cpus */
+   N_CPU,  /* The node has one or more cpus */
+   N_CPU_MEM,  /* The node has both cpus and memory */
NR_NODE_STATES
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f767582..1140f3b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -729,6 +729,9 @@ static void node_states_set_node(int node, struct 
memory_notify *arg)
 
if (arg->status_change_nid >= 0)
node_set_state(node, N_MEMORY);
+
+   if (node_state(node, N_CPU))
+   node_set_state(node, N_CPU_MEM);
 }
 
 static void __meminit resize_zone_range(struct zone *zone, unsigned long 
start_pfn,
@@ -1569,6 +1572,9 @@ static void node_states_clear_node(int node, struct 
memory_notify *arg)
 
if (arg->status_change_nid >= 0)
node_clear_state(node, N_MEMORY);
+
+   if (node_state(node, N_CPU))
+   node_clear_state(node, N_CPU_MEM);
 }
 
 static int __ref __offline_pages(unsigned long start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03fcf73..7cd88a4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,6 +122,7 @@ struct pcpu_drain {
 #endif
[N_MEMORY] = { { [0] = 1UL } },
[N_CPU] = { { [0] = 1UL } },
+   [N_CPU_MEM] = { { [0] = 1UL } },
 #endif /* NUMA */
 };
 EXPORT_SYMBOL(node_states);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 36b56f8..1a431dc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1910,15 +1910,22 @@ static void __init init_cpu_node_state(void)
int node;
 
for_each_online_node(node) {
-   if (cpumask_weight(cpumask_of_node(node)) > 0)
+   if (cpumask_weight(cpumask_of_node(node)) > 0) {
node_set_state(node, N_CPU);
+   if (node_state(node, N_MEMORY))
+   node_set_state(node, N_CPU_MEM);
+   }
}
 }
 
 static int vmstat_cpu_online(unsigned int cpu)
 {
+   int node = cpu_to_node(cpu);
+
refresh_zone_stat_thresholds();
-   node_set_state(cpu_to_node(cpu), N_CPU);
+   node_set_state(node, N_CPU);
+   if (node_state(node, N_MEMORY))
+   node_set_state(node, N_CPU_MEM);
return 0;
 }
 
-- 
1.8.3.1

[v2 PATCH 8/9] mm: vmscan: add page demotion counter

Account the number of demoted pages into reclaim_state->nr_demoted.

Add pgdemote_kswapd and pgdemote_direct VM counters showed in
/proc/vmstat.

Signed-off-by: Yang Shi 
---
 include/linux/vm_event_item.h | 2 ++
 include/linux/vmstat.h| 1 +
 mm/internal.h | 1 +
 mm/vmscan.c   | 7 +++
 mm/vmstat.c   | 2 ++
 5 files changed, 13 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 47a3441..499a3aa 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -32,6 +32,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGREFILL,
PGSTEAL_KSWAPD,
PGSTEAL_DIRECT,
+   PGDEMOTE_KSWAPD,
+   PGDEMOTE_DIRECT,
PGSCAN_KSWAPD,
PGSCAN_DIRECT,
PGSCAN_DIRECT_THROTTLE,
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 2db8d60..eb5d21c 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -29,6 +29,7 @@ struct reclaim_stat {
unsigned nr_activate;
unsigned nr_ref_keep;
unsigned nr_unmap_fail;
+   unsigned nr_demoted;
 };
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
diff --git a/mm/internal.h b/mm/internal.h
index 8c424b5..8ba4853 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -156,6 +156,7 @@ struct scan_control {
unsigned int immediate;
unsigned int file_taken;
unsigned int taken;
+   unsigned int demoted;
} nr;
 };
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 50cde53..a52c8248 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1511,6 +1511,12 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
 
nr_reclaimed += nr_succeeded;
 
+   stat->nr_demoted = nr_succeeded;
+   if (current_is_kswapd())
+   __count_vm_events(PGDEMOTE_KSWAPD, stat->nr_demoted);
+   else
+   __count_vm_events(PGDEMOTE_DIRECT, stat->nr_demoted);
+
if (err) {
if (err == -ENOMEM)
set_bit(PGDAT_CONTENDED,
@@ -2019,6 +2025,7 @@ static int current_may_throttle(void)
sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
sc->nr.writeback += stat.nr_writeback;
sc->nr.immediate += stat.nr_immediate;
+   sc->nr.demoted += stat.nr_demoted;
sc->nr.taken += nr_taken;
if (file)
sc->nr.file_taken += nr_taken;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1a431dc..d1e4993 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1192,6 +1192,8 @@ int fragmentation_index(struct zone *zone, unsigned int 
order)
"pgrefill",
"pgsteal_kswapd",
"pgsteal_direct",
+   "pgdemote_kswapd",
+   "pgdemote_direct",
"pgscan_kswapd",
"pgscan_direct",
"pgscan_direct_throttle",
-- 
1.8.3.1

[v2 PATCH 3/9] mm: numa: promote pages to DRAM when it gets accessed twice

NUMA balancing would promote the pages to DRAM once it is accessed, but
it might be just one off access.  To reduce migration thrashing and
memory bandwidth pressure, just promote the page which gets accessed
twice by extending page_check_references() to support second reference
algorithm for anonymous page.

The page_check_reference() would walk all mapped pte or pmd to check if
the page is referenced or not, but such walk sounds unnecessary to NUMA
balancing since NUMA balancing would have pte or pmd referenced bit set
all the time, so anonymous page would have at least one referenced pte
or pmd.  And, distinguish with page reclaim path via scan_control,
scan_control would be NULL in NUMA balancing path.

This approach is not definitely the optimal one to distinguish the
hot or cold pages accurately.  It may need much more sophisticated
algorithm to distinguish hot or cold pages accurately.

Signed-off-by: Yang Shi 
---
 mm/huge_memory.c |  11 ++
 mm/internal.h|  80 ++
 mm/memory.c  |  21 ++
 mm/vmscan.c  | 116 ---
 4 files changed, 146 insertions(+), 82 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 404acdc..0b18ac45 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1590,6 +1590,17 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, 
pmd_t pmd)
}
 
/*
+* Promote the page when it gets NUMA fault twice.
+* It is safe to set page flag since the page is locked now.
+*/
+   if (!node_state(page_nid, N_CPU_MEM) &&
+   page_check_references(page, NULL) != PAGEREF_PROMOTE) {
+   put_page(page);
+   page_nid = NUMA_NO_NODE;
+   goto clear_pmdnuma;
+   }
+
+   /*
 * Migrate the THP to the requested node, returns with page unlocked
 * and access rights restored.
 */
diff --git a/mm/internal.h b/mm/internal.h
index a514808..bee4d6c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -89,8 +89,88 @@ static inline void set_page_refcounted(struct page *page)
 /*
  * in mm/vmscan.c:
  */
+struct scan_control {
+   /* How many pages shrink_list() should reclaim */
+   unsigned long nr_to_reclaim;
+
+   /*
+* Nodemask of nodes allowed by the caller. If NULL, all nodes
+* are scanned.
+*/
+   nodemask_t  *nodemask;
+
+   /*
+* The memory cgroup that hit its limit and as a result is the
+* primary target of this reclaim invocation.
+*/
+   struct mem_cgroup *target_mem_cgroup;
+
+   /* Writepage batching in laptop mode; RECLAIM_WRITE */
+   unsigned int may_writepage:1;
+
+   /* Can mapped pages be reclaimed? */
+   unsigned int may_unmap:1;
+
+   /* Can pages be swapped as part of reclaim? */
+   unsigned int may_swap:1;
+
+   /* e.g. boosted watermark reclaim leaves slabs alone */
+   unsigned int may_shrinkslab:1;
+
+   /*
+* Cgroups are not reclaimed below their configured memory.low,
+* unless we threaten to OOM. If any cgroups are skipped due to
+* memory.low and nothing was reclaimed, go back for memory.low.
+*/
+   unsigned int memcg_low_reclaim:1;
+   unsigned int memcg_low_skipped:1;
+
+   unsigned int hibernation_mode:1;
+
+   /* One of the zones is ready for compaction */
+   unsigned int compaction_ready:1;
+
+   /* Allocation order */
+   s8 order;
+
+   /* Scan (total_size >> priority) pages at once */
+   s8 priority;
+
+   /* The highest zone to isolate pages for reclaim from */
+   s8 reclaim_idx;
+
+   /* This context's GFP mask */
+   gfp_t gfp_mask;
+
+   /* Incremented by the number of inactive pages that were scanned */
+   unsigned long nr_scanned;
+
+   /* Number of pages freed so far during a call to shrink_zones() */
+   unsigned long nr_reclaimed;
+
+   struct {
+   unsigned int dirty;
+   unsigned int unqueued_dirty;
+   unsigned int congested;
+   unsigned int writeback;
+   unsigned int immediate;
+   unsigned int file_taken;
+   unsigned int taken;
+   } nr;
+};
+
+enum page_references {
+   PAGEREF_RECLAIM,
+   PAGEREF_RECLAIM_CLEAN,
+   PAGEREF_KEEP,
+   PAGEREF_ACTIVATE,
+   PAGEREF_PROMOTE = PAGEREF_ACTIVATE,
+};
+
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
+enum page_references page_check_references(struct page *page,
+  struct scan_control *sc);
 
 /*
  * in mm/rmap.c:
diff --git a/mm/memory.c b/mm/memory.c
index 47fe250..01c1ead 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3680,6 +3680,27 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
goto out;
}
 
+   /*
+* Promote

[v2 PATCH 4/9] mm: migrate: make migrate_pages() return nr_succeeded

The migrate_pages() returns the number of pages that were not migrated,
or an error code.  When returning an error code, there is no way to know
how many pages were migrated or not migrated.

In the following patch, migrate_pages() is used to demote pages to PMEM
node, we need account how many pages are reclaimed (demoted) since page
reclaim behavior depends on this.  Add *nr_succeeded parameter to make
migrate_pages() return how many pages are demoted successfully for all
cases.

Signed-off-by: Yang Shi 
---
 include/linux/migrate.h |  5 +++--
 mm/compaction.c |  3 ++-
 mm/gup.c|  4 +++-
 mm/memory-failure.c |  7 +--
 mm/memory_hotplug.c |  4 +++-
 mm/mempolicy.c  |  7 +--
 mm/migrate.c| 18 ++
 mm/page_alloc.c |  4 +++-
 8 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e13d9bf..837fdd1 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -66,7 +66,8 @@ extern int migrate_page(struct address_space *mapping,
struct page *newpage, struct page *page,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
-   unsigned long private, enum migrate_mode mode, int reason);
+   unsigned long private, enum migrate_mode mode, int reason,
+   unsigned int *nr_succeeded);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -84,7 +85,7 @@ extern int migrate_page_move_mapping(struct address_space 
*mapping,
 static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
-   int reason)
+   int reason, unsigned int *nr_succeeded)
{ return -ENOSYS; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
diff --git a/mm/compaction.c b/mm/compaction.c
index f171a83..c6a0ec4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2065,6 +2065,7 @@ bool compaction_zonelist_suitable(struct alloc_context 
*ac, int order,
unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
bool update_cached;
+   unsigned int nr_succeeded = 0;
 
cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
@@ -2173,7 +2174,7 @@ bool compaction_zonelist_suitable(struct alloc_context 
*ac, int order,
 
err = migrate_pages(>migratepages, compaction_alloc,
compaction_free, (unsigned long)cc, cc->mode,
-   MR_COMPACTION);
+   MR_COMPACTION, _succeeded);
 
trace_mm_compaction_migratepages(cc->nr_migratepages, err,
>migratepages);
diff --git a/mm/gup.c b/mm/gup.c
index f84e226..b482b8c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1217,6 +1217,7 @@ static long check_and_migrate_cma_pages(unsigned long 
start, long nr_pages,
long i;
bool drain_allow = true;
bool migrate_allow = true;
+   unsigned int nr_succeeded = 0;
LIST_HEAD(cma_page_list);
 
 check_again:
@@ -1257,7 +1258,8 @@ static long check_and_migrate_cma_pages(unsigned long 
start, long nr_pages,
put_page(pages[i]);
 
if (migrate_pages(_page_list, new_non_cma_page,
- NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+ NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE,
+ _succeeded)) {
/*
 * some of the pages failed migration. Do get_user_pages
 * without migration.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fc8b517..b5d8a8f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1686,6 +1686,7 @@ static int soft_offline_huge_page(struct page *page, int 
flags)
int ret;
unsigned long pfn = page_to_pfn(page);
struct page *hpage = compound_head(page);
+   unsigned int nr_succeeded = 0;
LIST_HEAD(pagelist);
 
/*
@@ -1713,7 +1714,7 @@ static int soft_offline_huge_page(struct page *page, int 
flags)
}
 
ret = migrate_pages(, new_page, NULL, MPOL_MF_MOVE_ALL,
-   MIGRATE_SYNC, MR_MEMORY_FAILURE);
+   MIGRATE_SYNC, MR_MEMORY_FAILURE, _succeeded);
if (ret) {
pr_info("soft offline: %#lx: hugepage migration failed %d, type 
%lx (%pGp)\n",
pfn, ret, page->flags, >flags);
@@ -1742,6 +1743,7

[v2 PATCH 2/9] mm: page_alloc: make find_next_best_node find return cpuless node

Need find the cloest cpuless node to demote DRAM pages.  Add
"cpuless" parameter to find_next_best_node() to skip DRAM node on
demand.

Signed-off-by: Yang Shi 
---
 mm/internal.h   | 11 +++
 mm/page_alloc.c | 14 ++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 9eeaf2b..a514808 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -292,6 +292,17 @@ static inline bool is_data_mapping(vm_flags_t flags)
return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE;
 }
 
+#ifdef CONFIG_NUMA
+extern int find_next_best_node(int node, nodemask_t *used_node_mask,
+  bool cpuless);
+#else
+static inline int find_next_best_node(int node, nodemask_t *used_node_mask,
+ bool cpuless)
+{
+   return 0;
+}
+#endif
+
 /* mm/util.c */
 void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
struct vm_area_struct *prev, struct rb_node *rb_parent);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7cd88a4..bda17c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5362,6 +5362,7 @@ int numa_zonelist_order_handler(struct ctl_table *table, 
int write,
  * find_next_best_node - find the next node that should appear in a given 
node's fallback list
  * @node: node whose fallback list we're appending
  * @used_node_mask: nodemask_t of already used nodes
+ * @cpuless: find next best cpuless node
  *
  * We use a number of factors to determine which is the next node that should
  * appear on a given node's fallback list.  The node should not have appeared
@@ -5373,7 +5374,8 @@ int numa_zonelist_order_handler(struct ctl_table *table, 
int write,
  *
  * Return: node id of the found node or %NUMA_NO_NODE if no node is found.
  */
-static int find_next_best_node(int node, nodemask_t *used_node_mask)
+int find_next_best_node(int node, nodemask_t *used_node_mask,
+   bool cpuless)
 {
int n, val;
int min_val = INT_MAX;
@@ -5381,13 +5383,18 @@ static int find_next_best_node(int node, nodemask_t 
*used_node_mask)
const struct cpumask *tmp = cpumask_of_node(0);
 
/* Use the local node if we haven't already */
-   if (!node_isset(node, *used_node_mask)) {
+   if (!node_isset(node, *used_node_mask) &&
+   !cpuless) {
node_set(node, *used_node_mask);
return node;
}
 
for_each_node_state(n, N_MEMORY) {
 
+   /* Find next best cpuless node */
+   if (cpuless && (node_state(n, N_CPU)))
+   continue;
+
/* Don't want a node to appear more than once */
if (node_isset(n, *used_node_mask))
continue;
@@ -5419,7 +5426,6 @@ static int find_next_best_node(int node, nodemask_t 
*used_node_mask)
return best_node;
 }
 
-
 /*
  * Build zonelists ordered by node and zones within node.
  * This results in maximum locality--normal zone overflows into local
@@ -5481,7 +5487,7 @@ static void build_zonelists(pg_data_t *pgdat)
nodes_clear(used_mask);
 
memset(node_order, 0, sizeof(node_order));
-   while ((node = find_next_best_node(local_node, _mask)) >= 0) {
+   while ((node = find_next_best_node(local_node, _mask, false)) >= 
0) {
/*
 * We don't want to pressure a particular node.
 * So adding penalty to the first node in same
-- 
1.8.3.1

[PATCH RESEND] fs: drop unused fput_atomic definition

2019-04-10 Thread Lukas Bulwahn

commit d7065da03822 ("get rid of the magic around f_count in aio") added
fput_atomic to include/linux/fs.h, motivated by its use in __aio_put_req()
in fs/aio.c.

Later, commit 3ffa3c0e3f6e ("aio: now fput() is OK from interrupt context;
get rid of manual delayed __fput()") removed the only use of fput_atomic
in __aio_put_req(), but did not remove the since then unused fput_atomic
definition in include/linux/fs.h.

We curate this now and finally remove the unused definition.

This issue was identified during a code review due to a coccinelle warning
from the atomic_as_refcounter.cocci rule pointing to the use of atomic_t
in fput_atomic.

Suggested-by: Krystian Radlak 
Signed-off-by: Lukas Bulwahn 
---
v1:
  - sent on 2018-01-12, got no response
https://lore.kernel.org/lkml/20190112055430.5860-1-lukas.bulw...@gmail.com/

v1 resend:
  - rebased to v5.1-rc4

  - added Jens to recipient list as he touched the place lately closeby
in commit 091141a42e15 ("fs: add fget_many() and fput_many()")

  - compile-tested with defconfig on v5.1-rc4 and next-20190410


 include/linux/fs.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd28e7679089..79b2f43b945d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -969,7 +969,6 @@ static inline struct file *get_file(struct file *f)
 #define get_file_rcu_many(x, cnt)  \
atomic_long_add_unless(&(x)->f_count, (cnt), 0)
 #define get_file_rcu(x) get_file_rcu_many((x), 1)
-#define fput_atomic(x) atomic_long_add_unless(&(x)->f_count, -1, 1)
 #define file_count(x)  atomic_long_read(&(x)->f_count)
 
 #defineMAX_NON_LFS ((1UL<<31) - 1)
-- 
2.17.1

Re: [PATCH v2] init: Do not select DEBUG_KERNEL by default

2019-04-10 Thread Randy Dunlap

On 4/10/19 8:02 PM, Josh Triplett wrote:
> On April 10, 2019 4:24:18 PM PDT, Kees Cook  wrote:
>> On Wed, Apr 10, 2019 at 4:22 PM Josh Triplett 
>> wrote:
>>>
>>> On April 10, 2019 3:58:55 PM PDT, Kees Cook 
>> wrote:
 On Wed, Apr 10, 2019 at 3:42 PM Sinan Kaya  wrote:
>
> We can't seem to have a kernel with CONFIG_EXPERT set but
> CONFIG_DEBUG_KERNEL unset these days.
>
> While some of the features under the CONFIG_EXPERT require
> CONFIG_DEBUG_KERNEL, it doesn't apply for all features.
>
> It looks like CONFIG_KALLSYMS_ALL is the only feature that
> requires CONFIG_DEBUG_KERNEL.
>
> Select CONFIG_EXPERT when CONFIG_DEBUG is chosen but you can

 Typo: CONFIG_DEBUG_KERNEL

> still choose CONFIG_EXPERT without CONFIG_DEBUG.

 same.

>
> Signed-off-by: Sinan Kaya 

 But with those fixed, looks good to me. Adding Josh (and others) to
>> CC
 since he originally added the linkage to EXPERT in commit
 f505c553dbe2.
>>>
>>> CONFIG_DEBUG_KERNEL shouldn't affect code generation in any way; it
>> should only make more options appear in kconfig. I originally added
>> this to ensure that features you might want to *disable* aren't hidden,
>> as part of the tinification effort.
>>>
>>> What specific problem does having CONFIG_DEBUG_KERNEL enabled cause
>> for you? I'd still prefer to have a single switch for "don't hide
>> things I might want to disable", rather than several.
>>
>> See earlier in the thread: code generation depends on
>> CONFIG_DEBUG_KERNEL now unfortunately.
> 
> Then let's fix *that*, and get checkpatch to help enforce it in the future. 
> EXPERT doesn't affect code generation, and neither should this.
> 

checkpatch is not an enforcer.  It takes maintainers to do that.


-- 
~Randy

Re: [PATCH 2/3] clk: rockchip: Make rkpwm a critical clock on rk3288

2019-04-10 Thread elaine.zhang


hi,

在 2019/4/10 下午11:25, Doug Anderson 写道:

Hi,

On Tue, Apr 9, 2019 at 11:42 PM elaine.zhang  wrote:

hi,

在 2019/4/10 上午4:47, Douglas Anderson 写道:

Most rk3288-based boards are derived from the EVB and thus use a PWM
regulator for the logic rail.  However, most rk3288-based boards don't
specify the PWM regulator in their device tree.  We'll deal with that
by making it critical.

NOTE: it's important to make it critical and not just IGNORE_UNUSED
because all PWMs in the system share the same clock.  We don't want
another PWM user to turn the clock on and off and kill the logic rail.

This change is in preparation for actually having the PWMs in the
rk3288 device tree actually point to the proper PWM clock.  Up until
now they've all pointed to the clock for the old IP block and they've
all worked due to the fact that rkpwm was IGNORE_UNUSED and that the
clock rates for both clocks were the same.

Signed-off-by: Douglas Anderson 
---

   drivers/clk/rockchip/clk-rk3288.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/rockchip/clk-rk3288.c 
b/drivers/clk/rockchip/clk-rk3288.c
index 06287810474e..c3321eade23e 100644
--- a/drivers/clk/rockchip/clk-rk3288.c
+++ b/drivers/clk/rockchip/clk-rk3288.c
@@ -697,7 +697,7 @@ static struct rockchip_clk_branch rk3288_clk_branches[] 
__initdata = {
   GATE(PCLK_TZPC, "pclk_tzpc", "pclk_cpu", 0, RK3288_CLKGATE_CON(11), 3, 
GFLAGS),
   GATE(PCLK_UART2, "pclk_uart2", "pclk_cpu", 0, RK3288_CLKGATE_CON(11), 9, 
GFLAGS),
   GATE(PCLK_EFUSE256, "pclk_efuse_256", "pclk_cpu", 0, 
RK3288_CLKGATE_CON(11), 10, GFLAGS),
- GATE(PCLK_RKPWM, "pclk_rkpwm", "pclk_cpu", CLK_IGNORE_UNUSED, 
RK3288_CLKGATE_CON(11), 11, GFLAGS),
+ GATE(PCLK_RKPWM, "pclk_rkpwm", "pclk_cpu", 0, RK3288_CLKGATE_CON(11), 11, 
GFLAGS),

   /* ddrctrl [DDR Controller PHY clock] gates */
   GATE(0, "nclk_ddrupctl0", "ddrphy", CLK_IGNORE_UNUSED, 
RK3288_CLKGATE_CON(11), 4, GFLAGS),
@@ -837,6 +837,7 @@ static const char *const rk3288_critical_clocks[] 
__initconst = {
   "pclk_alive_niu",
   "pclk_pd_pmu",
   "pclk_pmu_niu",
+ "pclk_rkpwm",

pwm have device node, can enable and disable it in the pwm drivers.

pwm regulator use pwm node as:

pwms = < 0 25000 1>

when set Logic voltage:

pwm_regulator_set_voltage()

  --> pwm_apply_state()

  -->clk_enable()

  -->pwm_enable()

  -->pwm_config()

  -->pinctrl_select()

  --

For mark pclk_rkpwm as critical,do you have any questions, or provides
some log or more information.

Right, if we actually specify the PWM used for the PWM regulator in
the device tree then there is no need to mark it as a critical clock.
In fact rk3288-veyron devices boot absolutely fine without marking
this clock as critical.  Actually, it seems like the way the PWM
framework works (IIRC it was designed this way specifically to support
PWM regulators) is that even just specifying that pwm1 is "okay" is
enough to keep the clock on even if the PWM regulator isn't specified.

...however...

Take a look at, for instance, the rk3288-evb device tree file.
Nowhere in there does it specify that the PWM used for the PWM
regulator should be on.  Presumably that means that if we don't mark
the clock as critical then rk3288-evb will fail to boot.  That's easy
for me to fix since I have the rk3288-evb schematics, but what about
other rk3288 boards?  We could make educated guesses about each of
them and/or fix things are we hear about breakages.

...but...

All the above would only be worth doing if we thought someone would
get some benefit out of it.  I'd bet that pretty much all rk3288-based
boards use a PWM regulator.  Thus, in reality, everyone will want the
rkpwm clock on all the time anyway.  In that case going through all
that extra work / potentially breaking other boards doesn't seem worth
it.  Just mark the clock as critical.


I have no problem with changing it like this, but I think it is better 
to modify dts:


vdd_log: vdd-log {
        compatible = "pwm-regulator";
        rockchip,pwm_id = <2>; //for rk uboot
        rockchip,pwm_voltage = <90>; // for rk uboot
        pwms = < 0 25000 1>;
        regulator-name = "vdd_log";
        regulator-min-microvolt = <80>;//hw logic min voltage
        regulator-max-microvolt = <140>;//hw logic max voltage
        regulator-always-on;
        regulator-boot-on;
    };

Maybe we did not push the modification of this part in rk3288-evb, I 
will push to deal with this.(rk3229-evb.dts and rk3399 has been already 
pushed)





-Doug

Re: [RFC patch 40/41] stacktrace: Remove obsolete functions

On Wed, Apr 10, 2019 at 12:28:34PM +0200, Thomas Gleixner wrote:
> No more users of the struct stack_trace based interfaces. Remove them.
> 
> Remove the macro stubs for !CONFIG_STACKTRACE as well as they are pointless
> because the storage on the call sites is conditional on CONFIG_STACKTRACE
> already. No point to be 'smart'.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/stacktrace.h |   46 
> +++--
>  kernel/stacktrace.c|   14 -
>  2 files changed, 16 insertions(+), 44 deletions(-)
> 
> --- a/include/linux/stacktrace.h
> +++ b/include/linux/stacktrace.h
> @@ -8,23 +8,6 @@ struct task_struct;
>  struct pt_regs;
>  
>  #ifdef CONFIG_STACKTRACE
> -struct stack_trace {
> - unsigned int nr_entries, max_entries;
> - unsigned long *entries;
> - int skip;   /* input argument: How many entries to skip */
> -};
> -
> -extern void save_stack_trace(struct stack_trace *trace);
> -extern void save_stack_trace_regs(struct pt_regs *regs,
> -   struct stack_trace *trace);
> -extern void save_stack_trace_tsk(struct task_struct *tsk,
> - struct stack_trace *trace);
> -extern int save_stack_trace_tsk_reliable(struct task_struct *tsk,
> -  struct stack_trace *trace);
> -
> -extern void print_stack_trace(struct stack_trace *trace, int spaces);
> -extern int snprint_stack_trace(char *buf, size_t size,
> - struct stack_trace *trace, int spaces);
>  
>  extern void stack_trace_print(unsigned long *trace, unsigned int nr_entries,
> int spaces);
> @@ -43,20 +26,23 @@ extern unsigned int stack_trace_save_reg
>  extern unsigned int stack_trace_save_user(unsigned long *store,
> unsigned int size,
> unsigned int skipnr);
> +/*
> + * The below is for stack trace internals and architecture
> + * implementations. Do not use in generic code.
> + */
> +struct stack_trace {
> + unsigned int nr_entries, max_entries;
> + unsigned long *entries;
> + int skip;   /* input argument: How many entries to skip */
> +};

I was a bit surprised to see struct stack_trace still standing at the
end of the patch set, but I guess 41 patches is enough :-)  Do we want
to eventually remove the struct altogether?

I was also hoping to see the fragile "skipnr" go away in favor of
something less dependent on compiler optimizations, but I'm not sure how
feasible that would be.

Regardless, these are very nice cleanups, nice work.

> -#ifdef CONFIG_USER_STACKTRACE_SUPPORT
> +extern void save_stack_trace(struct stack_trace *trace);
> +extern void save_stack_trace_regs(struct pt_regs *regs,
> +   struct stack_trace *trace);
> +extern void save_stack_trace_tsk(struct task_struct *tsk,
> + struct stack_trace *trace);
> +extern int save_stack_trace_tsk_reliable(struct task_struct *tsk,
> +  struct stack_trace *trace);

save_stack_trace_tsk_reliable() is still in use by generic livepatch
code.

Also I wonder if it would make sense to rename these to
__save_stack_trace_*() or arch_save_stack_trace_*() to help discourage
them from being used by generic code.

-- 
Josh

[PATCH] slab: fix an infinite loop in leaks_show()

2019-04-10 Thread Qian Cai

"cat /proc/slab_allocators" could hang forever on SMP machines with
kmemleak or object debugging enabled due to other CPUs running do_drain()
will keep making kmemleak_object or debug_objects_cache dirty and unable
to escape the first loop in leaks_show(),

do {
set_store_user_clean(cachep);
drain_cpu_caches(cachep);
...

} while (!is_store_user_clean(cachep));

For example,

do_drain
  slabs_destroy
slab_destroy
  kmem_cache_free
__cache_free
  ___cache_free
kmemleak_free_recursive
  delete_object_full
__delete_object
  put_object
free_object_rcu
  kmem_cache_free
cache_free_debugcheck --> dirty kmemleak_object

One approach is to check cachep->name and skip both kmemleak_object and
debug_objects_cache in leaks_show(). The other is to set
store_user_clean after drain_cpu_caches() which leaves a small window
between drain_cpu_caches() and set_store_user_clean() where per-CPU
caches could be dirty again lead to slightly wrong information has been
stored but could also speed up things significantly which sounds like a
good compromise. For example,

 # cat /proc/slab_allocators
 0m42.778s # 1st approach
 0m0.737s  # 2nd approach

Fixes: d31676dfde25 ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
Signed-off-by: Qian Cai 
---
 mm/slab.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9142ee992493..3e1b7ff0360c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4328,8 +4328,12 @@ static int leaks_show(struct seq_file *m, void *p)
 * whole processing.
 */
do {
-   set_store_user_clean(cachep);
drain_cpu_caches(cachep);
+   /*
+* drain_cpu_caches() could always make kmemleak_object and
+* debug_objects_cache dirty, so reset afterwards.
+*/
+   set_store_user_clean(cachep);
 
x[1] = 0;
 
-- 
2.17.2 (Apple Git-113)

[PATCH v3 0/5] soundwire: code cleanup

SoundWire support will be provided in Linux with the Sound Open
Firmware (SOF) on Intel platforms. Before we start adding the missing
pieces, there are a number of warnings and style issues reported by
checkpatch, cppcheck and Coccinelle that need to be cleaned-up.

Changes since v2:
fixed inversion of devm_kcalloc parameters, detected while rebasing
additional patches.

Changes since v1:
added missing newlines in new patch (suggested by Joe Perches)

Pierre-Louis Bossart (5):
  soundwire: intel: fix inversion in devm_kcalloc parameters
  soundwire: fix style issues
  soundwire: bus: remove useless initializations
  soundwire: stream: remove useless initialization of local variable
  soundwire: add missing newlines in dynamic debug logs

 drivers/soundwire/Kconfig  |   2 +-
 drivers/soundwire/bus.c| 137 ---
 drivers/soundwire/bus.h|  16 +-
 drivers/soundwire/bus_type.c   |   4 +-
 drivers/soundwire/cadence_master.c |  99 +--
 drivers/soundwire/cadence_master.h |  22 +--
 drivers/soundwire/intel.c  | 103 ++-
 drivers/soundwire/intel.h  |   4 +-
 drivers/soundwire/intel_init.c |  12 +-
 drivers/soundwire/mipi_disco.c | 116 +++--
 drivers/soundwire/slave.c  |  10 +-
 drivers/soundwire/stream.c | 267 +++--
 12 files changed, 404 insertions(+), 388 deletions(-)

-- 
2.17.1

[PATCH v3 4/5] soundwire: stream: remove useless initialization of local variable

no need to reset return value.

Detected with cppcheck:
[drivers/soundwire/stream.c:332]: (style) Variable 'ret' is assigned a
value that is never used.

Signed-off-by: Pierre-Louis Bossart 
---
 drivers/soundwire/stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c
index e3d2bc5cba80..ab64c2c4c33f 100644
--- a/drivers/soundwire/stream.c
+++ b/drivers/soundwire/stream.c
@@ -329,7 +329,7 @@ static int sdw_enable_disable_master_ports(struct 
sdw_master_runtime *m_rt,
struct sdw_transport_params *t_params = _rt->transport_params;
struct sdw_bus *bus = m_rt->bus;
struct sdw_enable_ch enable_ch;
-   int ret = 0;
+   int ret;
 
enable_ch.port_num = p_rt->num;
enable_ch.ch_mask = p_rt->ch_mask;
-- 
2.17.1

[PATCH v3 3/5] soundwire: bus: remove useless initializations

No need for explicit initialization of page and ssp fields, they are
already zeroed with a memset.

Detected with cppcheck:

[drivers/soundwire/bus.c:309]: (style) Variable 'msg->page' is
reassigned a value before the old one has been used.

Signed-off-by: Pierre-Louis Bossart 
---
 drivers/soundwire/bus.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 691a31df9732..bb697fd68580 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -271,8 +271,6 @@ int sdw_fill_msg(struct sdw_msg *msg, struct sdw_slave 
*slave,
msg->dev_num = dev_num;
msg->flags = flags;
msg->buf = buf;
-   msg->ssp_sync = false;
-   msg->page = false;
 
if (addr < SDW_REG_NO_PAGE) { /* no paging area */
return 0;
-- 
2.17.1

[PATCH v3 5/5] soundwire: add missing newlines in dynamic debug logs

For some reason the newlines are not used everywhere. Fix as needed.

Reported-by: Joe Perches 
Signed-off-by: Pierre-Louis Bossart 
---
 drivers/soundwire/bus.c|  74 +--
 drivers/soundwire/cadence_master.c |  12 ++--
 drivers/soundwire/intel.c  |  12 ++--
 drivers/soundwire/stream.c | 110 ++---
 4 files changed, 104 insertions(+), 104 deletions(-)

diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index bb697fd68580..fa86957cb615 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -21,12 +21,12 @@ int sdw_add_bus_master(struct sdw_bus *bus)
int ret;
 
if (!bus->dev) {
-   pr_err("SoundWire bus has no device");
+   pr_err("SoundWire bus has no device\n");
return -ENODEV;
}
 
if (!bus->ops) {
-   dev_err(bus->dev, "SoundWire Bus ops are not set");
+   dev_err(bus->dev, "SoundWire Bus ops are not set\n");
return -EINVAL;
}
 
@@ -43,7 +43,7 @@ int sdw_add_bus_master(struct sdw_bus *bus)
if (bus->ops->read_prop) {
ret = bus->ops->read_prop(bus);
if (ret < 0) {
-   dev_err(bus->dev, "Bus read properties failed:%d", ret);
+   dev_err(bus->dev, "Bus read properties failed:%d\n", 
ret);
return ret;
}
}
@@ -296,7 +296,7 @@ int sdw_fill_msg(struct sdw_msg *msg, struct sdw_slave 
*slave,
return -EINVAL;
} else if (!slave->prop.paging_support) {
dev_err(>dev,
-   "address %x needs paging but no support", addr);
+   "address %x needs paging but no support\n", addr);
return -EINVAL;
}
 
@@ -455,13 +455,13 @@ static int sdw_assign_device_num(struct sdw_slave *slave)
dev_num = sdw_get_device_num(slave);
mutex_unlock(>bus->bus_lock);
if (dev_num < 0) {
-   dev_err(slave->bus->dev, "Get dev_num failed: %d",
+   dev_err(slave->bus->dev, "Get dev_num failed: %d\n",
dev_num);
return dev_num;
}
} else {
dev_info(slave->bus->dev,
-"Slave already registered dev_num:%d",
+"Slave already registered dev_num:%d\n",
 slave->dev_num);
 
/* Clear the slave->dev_num to transfer message on device 0 */
@@ -472,7 +472,7 @@ static int sdw_assign_device_num(struct sdw_slave *slave)
 
ret = sdw_write(slave, SDW_SCP_DEVNUMBER, dev_num);
if (ret < 0) {
-   dev_err(>dev, "Program device_num failed: %d", ret);
+   dev_err(>dev, "Program device_num failed: %d\n", ret);
return ret;
}
 
@@ -485,7 +485,7 @@ static int sdw_assign_device_num(struct sdw_slave *slave)
 void sdw_extract_slave_id(struct sdw_bus *bus,
  u64 addr, struct sdw_slave_id *id)
 {
-   dev_dbg(bus->dev, "SDW Slave Addr: %llx", addr);
+   dev_dbg(bus->dev, "SDW Slave Addr: %llx\n", addr);
 
/*
 * Spec definition
@@ -505,7 +505,7 @@ void sdw_extract_slave_id(struct sdw_bus *bus,
id->class_id = addr & GENMASK(7, 0);
 
dev_dbg(bus->dev,
-   "SDW Slave class_id %x, part_id %x, mfg_id %x, unique_id %x, 
version %x",
+   "SDW Slave class_id %x, part_id %x, mfg_id %x, unique_id %x, 
version %x\n",
id->class_id, id->part_id, id->mfg_id,
id->unique_id, id->sdw_version);
 
@@ -562,7 +562,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
ret = sdw_assign_device_num(slave);
if (ret) {
dev_err(slave->bus->dev,
-   "Assign dev_num failed:%d",
+   "Assign dev_num failed:%d\n",
ret);
return ret;
}
@@ -573,7 +573,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 
if (!found) {
/* TODO: Park this device in Group 13 */
-   dev_err(bus->dev, "Slave Entry not found");
+   dev_err(bus->dev, "Slave Entry not found\n");
}
 
count++;
@@ -618,7 +618,7 @@ int sdw_configure_dpn_intr(struct sdw_slave *slave,
ret = sdw_update(slave, addr, (mask | SDW_DPN_INT_PORT_READY), val);
if (ret < 0)
dev_err(slave->bus->dev,
-   "SDW_DPN_INTMASK write failed:%d", val);
+   "SDW_DPN_INTMASK write

[PATCH v3 2/5] soundwire: fix style issues

Visual inspections confirmed by checkpatch.pl --strict expose a number
of style issues, specifically parameter alignment is inconsistent as
if different contributors used different styles. Before we restart
support for SoundWire with Sound Open Firmware on Intel platforms,
let's clean all this.

Fix Kconfig help, spelling, SPDX format, alignment, spurious
parentheses, bool comparisons to true/false, macro argument
protection.

No new functionality added.

Signed-off-by: Pierre-Louis Bossart 
---
 drivers/soundwire/Kconfig  |   2 +-
 drivers/soundwire/bus.c|  87 
 drivers/soundwire/bus.h|  16 +--
 drivers/soundwire/bus_type.c   |   4 +-
 drivers/soundwire/cadence_master.c |  87 
 drivers/soundwire/cadence_master.h |  22 ++--
 drivers/soundwire/intel.c  |  87 
 drivers/soundwire/intel.h  |   4 +-
 drivers/soundwire/intel_init.c |  12 +--
 drivers/soundwire/mipi_disco.c | 116 +++--
 drivers/soundwire/slave.c  |  10 +-
 drivers/soundwire/stream.c | 161 +++--
 12 files changed, 313 insertions(+), 295 deletions(-)

diff --git a/drivers/soundwire/Kconfig b/drivers/soundwire/Kconfig
index 19c8efb9a5ee..84876a74874f 100644
--- a/drivers/soundwire/Kconfig
+++ b/drivers/soundwire/Kconfig
@@ -4,7 +4,7 @@
 
 menuconfig SOUNDWIRE
bool "SoundWire support"
-   ---help---
+   help
  SoundWire is a 2-Pin interface with data and clock line ratified
  by the MIPI Alliance. SoundWire is used for transporting data
  typically related to audio functions. SoundWire interface is
diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 1cbfedfc20ef..691a31df9732 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -49,7 +49,7 @@ int sdw_add_bus_master(struct sdw_bus *bus)
}
 
/*
-* Device numbers in SoundWire are 0 thru 15. Enumeration device
+* Device numbers in SoundWire are 0 through 15. Enumeration device
 * number (0), Broadcast device number (15), Group numbers (12 and
 * 13) and Master device number (14) are not used for assignment so
 * mask these and other higher bits.
@@ -172,7 +172,8 @@ static inline int do_transfer(struct sdw_bus *bus, struct 
sdw_msg *msg)
 }
 
 static inline int do_transfer_defer(struct sdw_bus *bus,
-   struct sdw_msg *msg, struct sdw_defer *defer)
+   struct sdw_msg *msg,
+   struct sdw_defer *defer)
 {
int retry = bus->prop.err_threshold;
enum sdw_command_response resp;
@@ -224,7 +225,7 @@ int sdw_transfer(struct sdw_bus *bus, struct sdw_msg *msg)
ret = do_transfer(bus, msg);
if (ret != 0 && ret != -ENODATA)
dev_err(bus->dev, "trf on Slave %d failed:%d\n",
-   msg->dev_num, ret);
+   msg->dev_num, ret);
 
if (msg->page)
sdw_reset_page(bus, msg->dev_num);
@@ -243,7 +244,7 @@ int sdw_transfer(struct sdw_bus *bus, struct sdw_msg *msg)
  * Caller needs to hold the msg_lock lock while calling this
  */
 int sdw_transfer_defer(struct sdw_bus *bus, struct sdw_msg *msg,
-   struct sdw_defer *defer)
+  struct sdw_defer *defer)
 {
int ret;
 
@@ -253,7 +254,7 @@ int sdw_transfer_defer(struct sdw_bus *bus, struct sdw_msg 
*msg,
ret = do_transfer_defer(bus, msg, defer);
if (ret != 0 && ret != -ENODATA)
dev_err(bus->dev, "Defer trf on Slave %d failed:%d\n",
-   msg->dev_num, ret);
+   msg->dev_num, ret);
 
if (msg->page)
sdw_reset_page(bus, msg->dev_num);
@@ -261,9 +262,8 @@ int sdw_transfer_defer(struct sdw_bus *bus, struct sdw_msg 
*msg,
return ret;
 }
 
-
 int sdw_fill_msg(struct sdw_msg *msg, struct sdw_slave *slave,
-   u32 addr, size_t count, u16 dev_num, u8 flags, u8 *buf)
+u32 addr, size_t count, u16 dev_num, u8 flags, u8 *buf)
 {
memset(msg, 0, sizeof(*msg));
msg->addr = addr; /* addr is 16 bit and truncated here */
@@ -284,7 +284,7 @@ int sdw_fill_msg(struct sdw_msg *msg, struct sdw_slave 
*slave,
if (addr < SDW_REG_OPTIONAL_PAGE) { /* 32k but no page */
if (slave && !slave->prop.paging_support)
return 0;
-   /* no need for else as that will fall thru to paging */
+   /* no need for else as that will fall-through to paging */
}
 
/* paging mandatory */
@@ -323,7 +323,7 @@ int sdw_nread(struct sdw_slave *slave, u32 addr, size_t 
count, u8 *val)
int ret;
 
ret = sdw_fill_msg(, slave, addr, count,
-   slave->dev_num, SDW_MSG_FLAG_READ, val);
+  slave->dev_num,

[PATCH v3 1/5] soundwire: intel: fix inversion in devm_kcalloc parameters

the number of elements and size are inverted, fix.

This probably only worked because the number of properties is
hard-coded to 1.

Fixes: 71bb8a1b059e ('soundwire: intel: Add Intel Master driver')
Signed-off-by: Pierre-Louis Bossart 
---
 drivers/soundwire/intel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/soundwire/intel.c b/drivers/soundwire/intel.c
index fd8d034cfec1..8669b314c476 100644
--- a/drivers/soundwire/intel.c
+++ b/drivers/soundwire/intel.c
@@ -796,8 +796,8 @@ static int intel_prop_read(struct sdw_bus *bus)
 
/* BIOS is not giving some values correctly. So, lets override them */
bus->prop.num_freq = 1;
-   bus->prop.freq = devm_kcalloc(bus->dev, sizeof(*bus->prop.freq),
-   bus->prop.num_freq, GFP_KERNEL);
+   bus->prop.freq = devm_kcalloc(bus->dev, bus->prop.num_freq,
+ sizeof(*bus->prop.freq), GFP_KERNEL);
if (!bus->prop.freq)
return -ENOMEM;
 
-- 
2.17.1

Re: [PATCH v2] init: Do not select DEBUG_KERNEL by default

2019-04-10 Thread Sinan Kaya


On 4/10/2019 11:02 PM, Josh Triplett wrote:

Then let's fix*that*, and get checkpatch to help enforce it in the future. 
EXPERT doesn't affect code generation, and neither should this.


I think we have to do both. We need to go after the users as well as
solve the immediate problem per this patch.

As Mathieu identified, CONFIG_DEBUG_KERNEL is being used all over the
place and getting subsystem owners to remove let alone add a check
to checkpatch is just going to take time.

Please let us know if you are OK with this plan.

Re: [PATCH v5 0/5] PCIE support for i.MX8MQ (DT changes)

On Fri, Apr 05, 2019 at 10:29:59AM -0700, Andrey Smirnov wrote:
> Andrey Smirnov (5):
>   arm64: dts: imx8mq: Mark iomuxc_gpr as i.MX6Q compatible
>   arm64: dts: imx8mq: Add a node for SRC IP block
>   arm64: dts: imx8mq: Combine PCIE power domains
>   arm64: dts: imx8mq: Add nodes for PCIe IP blocks
>   arm64: dts: imx8mq-evk: Enable PCIE0 interface

Applied all, thanks.

Re: [RFC patch 16/41] tracing: Remove the ULONG_MAX stack trace hackery

2019-04-10 Thread Steven Rostedt

On Wed, 10 Apr 2019 21:34:25 -0500
Josh Poimboeuf  wrote:

> > --- a/kernel/trace/trace_stack.c
> > +++ b/kernel/trace/trace_stack.c
> > @@ -18,8 +18,7 @@
> >  
> >  #include "trace.h"
> >  
> > -static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
> > -{ [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
> > +static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1];  
> 
> Is the "+ 1" still needed?  AFAICT, accesses to this array never go past
> nr_entries.

Probably not. But see this for an explanation:

 http://lkml.kernel.org/r/20180620110758.crunhd5bfep7zuiz@kili.mountain

> 
> Also I've been staring at the code but I can't figure out why
> max_entries is "- 1".
> 
> struct stack_trace stack_trace_max = {
>   .max_entries= STACK_TRACE_ENTRIES - 1,
>   .entries= _dump_trace[0],
> };
> 

Well, it had a reason in the past, but there doesn't seem to be a
reason today.  Looking at git history, that code was originally:

.max_entries= STACK_TRACE_ENTRIES - 1,
.entries= _dump_trace[1],

Where we had to make max_entries -1 as we started at the first index
into the array.

I'll have to take a new look into this code. After Thomas's clean up
here, I'm sure we can simplify it a bit more.

-- Steve

Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.

2019-04-10 Thread Aaron Lu

On Wed, Apr 10, 2019 at 04:44:18PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 10, 2019 at 12:36:33PM +0800, Aaron Lu wrote:
> > On Tue, Apr 09, 2019 at 11:09:45AM -0700, Tim Chen wrote:
> > > Now that we have accumulated quite a number of different fixes to your 
> > > orginal
> > > posted patches.  Would you like to post a v2 of the core scheduler with 
> > > the fixes?
> > 
> > One more question I'm not sure: should a task with cookie=0, i.e. tasks
> > that are untagged, be allowed to scheduled on the the same core with
> > another tagged task?
> 
> That was not meant to be possible.

Good to know this.

> > The current patch seems to disagree on this, e.g. in pick_task(),
> > if max is already chosen but max->core_cookie == 0, then we didn't care
> > about cookie and simply use class_pick for the other cpu. This means we
> > could schedule two tasks with different cookies(one is zero and the
> > other can be tagged).
> 
> When core_cookie==0 we shouldn't schedule the other siblings at all.

Not even with another untagged task?

I was thinking to leave host side tasks untagged, like kernel threads,
init and other system daemons or utilities etc., and tenant tasks tagged.
Then at least two untagged tasks can be scheduled on the same core.

Kindly let me know if you see a problem with this.

> > But then sched_core_find() only allow idle task to match with any tagged
> > tasks(we didn't place untagged tasks to the core tree of course :-).
> > 
> > Thoughts? Do I understand this correctly? If so, I think we probably
> > want to make this clear before v2. I personally feel, we shouldn't allow
> > untagged tasks(like kernel threads) to match with tagged tasks.
> 
> Agreed, cookie should always match or idle.

Thanks a lot for the clarification.

Re: [PATCH v2] init: Do not select DEBUG_KERNEL by default

2019-04-10 Thread Josh Triplett

On April 10, 2019 4:24:18 PM PDT, Kees Cook  wrote:
>On Wed, Apr 10, 2019 at 4:22 PM Josh Triplett 
>wrote:
>>
>> On April 10, 2019 3:58:55 PM PDT, Kees Cook 
>wrote:
>> >On Wed, Apr 10, 2019 at 3:42 PM Sinan Kaya  wrote:
>> >>
>> >> We can't seem to have a kernel with CONFIG_EXPERT set but
>> >> CONFIG_DEBUG_KERNEL unset these days.
>> >>
>> >> While some of the features under the CONFIG_EXPERT require
>> >> CONFIG_DEBUG_KERNEL, it doesn't apply for all features.
>> >>
>> >> It looks like CONFIG_KALLSYMS_ALL is the only feature that
>> >> requires CONFIG_DEBUG_KERNEL.
>> >>
>> >> Select CONFIG_EXPERT when CONFIG_DEBUG is chosen but you can
>> >
>> >Typo: CONFIG_DEBUG_KERNEL
>> >
>> >> still choose CONFIG_EXPERT without CONFIG_DEBUG.
>> >
>> >same.
>> >
>> >>
>> >> Signed-off-by: Sinan Kaya 
>> >
>> >But with those fixed, looks good to me. Adding Josh (and others) to
>CC
>> >since he originally added the linkage to EXPERT in commit
>> >f505c553dbe2.
>>
>> CONFIG_DEBUG_KERNEL shouldn't affect code generation in any way; it
>should only make more options appear in kconfig. I originally added
>this to ensure that features you might want to *disable* aren't hidden,
>as part of the tinification effort.
>>
>> What specific problem does having CONFIG_DEBUG_KERNEL enabled cause
>for you? I'd still prefer to have a single switch for "don't hide
>things I might want to disable", rather than several.
>
>See earlier in the thread: code generation depends on
>CONFIG_DEBUG_KERNEL now unfortunately.

Then let's fix *that*, and get checkpatch to help enforce it in the future. 
EXPERT doesn't affect code generation, and neither should this.

Re: [PATCH] arm64: dts: imx8qxp: Add lpuart1/lpuart2/lpuart3 nodes

On Sat, Mar 30, 2019 at 05:07:44PM +, Daniel Baluta wrote:
> lpuart nodes are part of the ADMA subsystem. See Audio DMA
> memory map in iMX8 QXP RM [1]
> 
> This patch is based on the dtsi file initially submitted by
> Teo Hall in i.MX NXP internal tree.
> 
> [1] https://www.nxp.com/docs/en/reference-manual/IMX8DQXPRM.pdf
> 
> Signed-off-by: Teo Hall 
> Signed-off-by: Daniel Baluta 

Applied, thanks.

Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

2019-04-10 Thread Liang Yang


Hi Martin,
On 2019/4/11 1:54, Martin Blumenstingl wrote:

Hi Liang,

On Wed, Apr 10, 2019 at 1:08 PM Liang Yang  wrote:


Hi Martin,

On 2019/4/5 12:30, Martin Blumenstingl wrote:

Hi Liang,

On Fri, Mar 29, 2019 at 8:44 AM Liang Yang  wrote:


Hi Martin,

On 2019/3/29 2:03, Martin Blumenstingl wrote:

Hi Liang,

[..]

I don't think it is caused by a different NAND type, but i have followed
the some test on my GXL platform. we can see the result from the
attachment. By the way, i don't find any information about this on meson
NFC datasheet, so i will ask our VLSI.
Martin, May you reproduce it with the new patch on meson8b platform ? I
need a more clear and easier compared log like gxl.txt. Thanks.

your gxl.txt is great, finally I can also compare my own results with
something that works for you!
in my results (see attachment) the "DATA_IN  [256 B, force 8-bit]"
instructions result in a different info buffer output.
does this make any sense to you?


I have asked our VLSI designer for explanation or simulation result by
an e-mail. Thanks.

do you have any update on this?

Sorry. I haven't got reply from VLSI designer yet. We tried to improve
priority yesterday, but i still can't estimate the time. There is no
document or change list showing the difference between m8/b and gxl/axg
serial chips. Now it seems that we can't use command NFC_CMD_N2M on nand
initialization for m8/b chips and use *read byte from NFC fifo register*
instead.

thank you for the status update!

I am trying to understand your suggestion not to use NFC_CMD_N2M:
the documentation (public S922X datasheet from Hardkernel: [0]) states
that P_NAND_BUF (NFC_REG_BUF in the meson_nand driver) can hold up to
four bytes of data. is this the "read byte from NFC FIFO register" you
mentioned?

You are right.take the early meson NFC driver V2 on previous mail as a 
reference.



Before I spend time changing the code to use the FIFO register I would
like to wait for an answer from your VLSI designer.
Setting the "correct" info buffer length for NFC_CMD_N2M on the 32-bit
SoCs seems like an easier solution compared to switching to the FIFO
register. Keeping NFC_CMD_N2M on the 32-bit SoCs also allows us to
have only one code-path for 32 and 64 bit SoCs, meaning we don't have
to maintain two separate code-paths for basically the same
functionality (assuming that NFC_CMD_N2M is not completely broken on
the 32-bit SoCs, we just don't know how to use it yet).


All right. I am also waiting for the answer.


Regards
Martin


[0] 
https://dn.odroid.com/S922X/ODROID-N2/Datasheet/S922X_Public_Datasheet_V0.2.pdf

.

Re: [PATCH RFC] clk: ux500: add range to usleep_range

2019-04-10 Thread Nicholas Mc Guire

On Wed, Apr 10, 2019 at 03:53:51PM -0700, Stephen Boyd wrote:
> Quoting Nicholas Mc Guire (2019-04-06 20:13:24)
> > Providing a range for usleep_range() allows the hrtimer subsystem to
> > coalesce timers - the delay is runtime configurable so a factor 2
> > is taken to provide the range.
> > 
> > Signed-off-by: Nicholas Mc Guire 
> > ---
> 
> I think this driver is in maintenance mode. I'll wait for Ulf to ack or
> review this change before applying.
> 
> > diff --git a/drivers/clk/ux500/clk-sysctrl.c 
> > b/drivers/clk/ux500/clk-sysctrl.c
> > index 7c0403b..a1fa3fb 100644
> > --- a/drivers/clk/ux500/clk-sysctrl.c
> > +++ b/drivers/clk/ux500/clk-sysctrl.c
> > @@ -42,7 +42,7 @@ static int clk_sysctrl_prepare(struct clk_hw *hw)
> > clk->reg_bits[0]);
> >  
> > if (!ret && clk->enable_delay_us)
> > -   usleep_range(clk->enable_delay_us, clk->enable_delay_us);
> > +   usleep_range(clk->enable_delay_us, clk->enable_delay_us*2);
> 
> Please add space around that multiply.
>
I can do that but it does not seem common and also checkpatch
did not complain about this - now a simple grep -re "\*10" on the
kernel shows that it seems more common not to use spaces around *
that to use them. Greping specifically for cases using usleep_range()
(not that many) it seems more or less evenly devided between space
and no space - so the concern is overlooking that factor 2 ?

thx!
hofrat

Re: [RFC patch 25/41] mm/kasan: Simplify stacktrace handling

On Wed, Apr 10, 2019 at 12:28:19PM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Andrey Ryabinin 
> Cc: Alexander Potapenko 
> Cc: Dmitry Vyukov 
> Cc: kasan-...@googlegroups.com
> Cc: linux...@kvack.org
> ---
>  mm/kasan/common.c |   30 --
>  mm/kasan/report.c |7 ---
>  2 files changed, 16 insertions(+), 21 deletions(-)
> 
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -48,34 +48,28 @@ static inline int in_irqentry_text(unsig
>ptr < (unsigned long)&__softirqentry_text_end);
>  }
>  
> -static inline void filter_irq_stacks(struct stack_trace *trace)
> +static inline unsigned int filter_irq_stacks(unsigned long *entries,
> +  unsigned int nr_entries)
>  {
> - int i;
> + unsigned int i;
>  
> - if (!trace->nr_entries)
> - return;
> - for (i = 0; i < trace->nr_entries; i++)
> - if (in_irqentry_text(trace->entries[i])) {
> + for (i = 0; i < nr_entries; i++) {
> + if (in_irqentry_text(entries[i])) {
>   /* Include the irqentry function into the stack. */
> - trace->nr_entries = i + 1;
> - break;
> + return i + 1;

Isn't this an off-by-one error if "i" points to the last entry of the
array?

-- 
Josh

Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects

On Thu, Apr 11, 2019 at 03:33:22AM +0100, Al Viro wrote:
> On Thu, Apr 11, 2019 at 11:34:40AM +1000, Tobin C. Harding wrote:
> > +/*
> > + * d_isolate() - Dentry isolation callback function.
> > + * @s: The dentry cache.
> > + * @v: Vector of pointers to the objects to isolate.
> > + * @nr: Number of objects in @v.
> > + *
> > + * The slab allocator is holding off frees. We can safely examine
> > + * the object without the danger of it vanishing from under us.
> > + */
> > +static void *d_isolate(struct kmem_cache *s, void **v, int nr)
> > +{
> > +   struct dentry *dentry;
> > +   int i;
> > +
> > +   for (i = 0; i < nr; i++) {
> > +   dentry = v[i];
> > +   __dget(dentry);
> > +   }
> > +
> > +   return NULL;/* No need for private data */
> > +}
> 
> Huh?  This is compeletely wrong; what you need is collecting the ones
> with zero refcount (and not on shrink lists) into a private list.
> *NOT* bumping the refcounts at all.  And do it in your isolate thing.

Oh, so putting entries on a shrink list is enough to pin them?

> 
> > +static void d_partial_shrink(struct kmem_cache *s, void **v, int nr,
> > + int node, void *_unused)
> > +{
> > +   struct dentry *dentry;
> > +   LIST_HEAD(dispose);
> > +   int i;
> > +
> > +   for (i = 0; i < nr; i++) {
> > +   dentry = v[i];
> > +   spin_lock(>d_lock);
> > +   dentry->d_lockref.count--;
> > +
> > +   if (dentry->d_lockref.count > 0 ||
> > +   dentry->d_flags & DCACHE_SHRINK_LIST) {
> > +   spin_unlock(>d_lock);
> > +   continue;
> > +   }
> > +
> > +   if (dentry->d_flags & DCACHE_LRU_LIST)
> > +   d_lru_del(dentry);
> > +
> > +   d_shrink_add(dentry, );
> > +
> > +   spin_unlock(>d_lock);
> > +   }
> 
> Basically, that loop (sans jerking the refcount up and down) should
> get moved into d_isolate().
> > +
> > +   if (!list_empty())
> > +   shrink_dentry_list();
> > +}
> 
> ... with this left in d_partial_shrink().  And you obviously need some way
> to pass the list from the former to the latter...

Easy enough, we have a void * return value from the isolate function
just for this purpose.

Thanks Al, hackety hack ...


Tobin

Re: [PATCH 1/5] media: platform: Aspeed: Remove use of reset line

2019-04-10 Thread Joel Stanley

On Tue, 2 Apr 2019 at 18:24, Eddie James  wrote:
>
> The reset line is toggled by enabling the clocks, so it's not necessary
> to manually toggle the reset as well.
>
> Signed-off-by: Eddie James 

Reviewed-by: Joel Stanley

[PATCH -next] memstick: remove set but not used variable 'data'

2019-04-10 Thread YueHaibing

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/memstick/host/jmb38x_ms.c: In function 'jmb38x_ms_issue_cmd':
drivers/memstick/host/jmb38x_ms.c:371:17: warning:
 variable 'data' set but not used [-Wunused-but-set-variable]

It's never used since introduction and can be removed.
Signed-off-by: YueHaibing 
---
 drivers/memstick/host/jmb38x_ms.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/memstick/host/jmb38x_ms.c 
b/drivers/memstick/host/jmb38x_ms.c
index e3a5af65dbce..acec09813419 100644
--- a/drivers/memstick/host/jmb38x_ms.c
+++ b/drivers/memstick/host/jmb38x_ms.c
@@ -368,7 +368,6 @@ static int jmb38x_ms_transfer_data(struct jmb38x_ms_host 
*host)
 static int jmb38x_ms_issue_cmd(struct memstick_host *msh)
 {
struct jmb38x_ms_host *host = memstick_priv(msh);
-   unsigned char *data;
unsigned int data_len, cmd, t_val;
 
if (!(STATUS_HAS_MEDIA & readl(host->addr + STATUS))) {
@@ -400,8 +399,6 @@ static int jmb38x_ms_issue_cmd(struct memstick_host *msh)
cmd |= TPC_WAIT_INT;
}
 
-   data = host->req->data;
-
if (!no_dma)
host->cmd_flags |= DMA_DATA;

Re: [RFC patch 20/41] backtrace-test: Simplify stack trace handling

On Wed, Apr 10, 2019 at 12:28:14PM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  kernel/backtracetest.c |   11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> --- a/kernel/backtracetest.c
> +++ b/kernel/backtracetest.c
> @@ -48,19 +48,14 @@ static void backtrace_test_irq(void)
>  #ifdef CONFIG_STACKTRACE
>  static void backtrace_test_saved(void)
>  {
> - struct stack_trace trace;
>   unsigned long entries[8];
> + unsigned int nent;

"Nent" isn't immediately readable to my eyes.  How about just good old
"nr_entries"?  (for this patch and all the others)

-- 
Josh

Re: [PATCH] cifs: fix page reference leak with readv/writev

2019-04-10 Thread Steve French

How was this discovered? Does it address a reported user problem?

On Wed, Apr 10, 2019 at 2:38 PM  wrote:
>
> From: Jérôme Glisse 
>
> CIFS can leak pages reference gotten through GUP (get_user_pages*()
> through iov_iter_get_pages()). This happen if cifs_send_async_read()
> or cifs_write_from_iter() calls fail from within __cifs_readv() and
> __cifs_writev() respectively. This patch move page unreference to
> cifs_aio_ctx_release() which will happens on all code paths this is
> all simpler to follow for correctness.
>
> Signed-off-by: Jérôme Glisse 
> Cc: Steve French 
> Cc: linux-c...@vger.kernel.org
> Cc: samba-techni...@lists.samba.org
> Cc: Alexander Viro 
> Cc: linux-fsde...@vger.kernel.org
> Cc: Linus Torvalds 
> Cc: Stable 
> ---
>  fs/cifs/file.c | 15 +--
>  fs/cifs/misc.c | 23 ++-
>  2 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 89006e044973..a756a4d3f70f 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2858,7 +2858,6 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> struct cifs_tcon *tcon;
> struct cifs_sb_info *cifs_sb;
> struct dentry *dentry = ctx->cfile->dentry;
> -   unsigned int i;
> int rc;
>
> tcon = tlink_tcon(ctx->cfile->tlink);
> @@ -2922,10 +2921,6 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> kref_put(>refcount, cifs_uncached_writedata_release);
> }
>
> -   if (!ctx->direct_io)
> -   for (i = 0; i < ctx->npages; i++)
> -   put_page(ctx->bv[i].bv_page);
> -
> cifs_stats_bytes_written(tcon, ctx->total_len);
> set_bit(CIFS_INO_INVALID_MAPPING, _I(dentry->d_inode)->flags);
>
> @@ -3563,7 +3558,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> struct iov_iter *to = >iter;
> struct cifs_sb_info *cifs_sb;
> struct cifs_tcon *tcon;
> -   unsigned int i;
> int rc;
>
> tcon = tlink_tcon(ctx->cfile->tlink);
> @@ -3647,15 +3641,8 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> kref_put(>refcount, cifs_uncached_readdata_release);
> }
>
> -   if (!ctx->direct_io) {
> -   for (i = 0; i < ctx->npages; i++) {
> -   if (ctx->should_dirty)
> -   set_page_dirty(ctx->bv[i].bv_page);
> -   put_page(ctx->bv[i].bv_page);
> -   }
> -
> +   if (!ctx->direct_io)
> ctx->total_len = ctx->len - iov_iter_count(to);
> -   }
>
> /* mask nodata case */
> if (rc == -ENODATA)
> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
> index bee203055b30..9bc0d17a9d77 100644
> --- a/fs/cifs/misc.c
> +++ b/fs/cifs/misc.c
> @@ -768,6 +768,11 @@ cifs_aio_ctx_alloc(void)
>  {
> struct cifs_aio_ctx *ctx;
>
> +   /*
> +* Must use kzalloc to initialize ctx->bv to NULL and ctx->direct_io
> +* to false so that we know when we have to unreference pages within
> +* cifs_aio_ctx_release()
> +*/
> ctx = kzalloc(sizeof(struct cifs_aio_ctx), GFP_KERNEL);
> if (!ctx)
> return NULL;
> @@ -786,7 +791,23 @@ cifs_aio_ctx_release(struct kref *refcount)
> struct cifs_aio_ctx, refcount);
>
> cifsFileInfo_put(ctx->cfile);
> -   kvfree(ctx->bv);
> +
> +   /*
> +* ctx->bv is only set if setup_aio_ctx_iter() was call successfuly
> +* which means that iov_iter_get_pages() was a success and thus that
> +* we have taken reference on pages.
> +*/
> +   if (ctx->bv) {
> +   unsigned i;
> +
> +   for (i = 0; i < ctx->npages; i++) {
> +   if (ctx->should_dirty)
> +   set_page_dirty(ctx->bv[i].bv_page);
> +   put_page(ctx->bv[i].bv_page);
> +   }
> +   kvfree(ctx->bv);
> +   }
> +
> kfree(ctx);
>  }
>
> --
> 2.20.1
>


-- 
Thanks,

Steve

Re: [PATCH v3] init: Do not select DEBUG_KERNEL by default

2019-04-10 Thread Kees Cook

On Wed, Apr 10, 2019 at 5:56 PM Sinan Kaya  wrote:
>
> We can't seem to have a kernel with CONFIG_EXPERT set but
> CONFIG_DEBUG_KERNEL unset these days.
>
> While some of the features under the CONFIG_EXPERT require
> CONFIG_DEBUG_KERNEL, it doesn't apply for all features.
>
> It looks like CONFIG_KALLSYMS_ALL is the only feature that
> requires CONFIG_DEBUG_KERNEL.
>
> Select CONFIG_EXPERT when CONFIG_DEBUG_KERNEL is chosen but
> you can still choose CONFIG_EXPERT without CONFIG_DEBUG_KERNEL.
>
> Signed-off-by: Sinan Kaya 
> Reviewed-by: Kees Cook 

Masahiro, should this go via your tree, or somewhere else?

Thanks!

-Kees

> ---
>  init/Kconfig  | 2 --
>  lib/Kconfig.debug | 1 +
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 4592bf7997c0..37e10a8391a3 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1206,8 +1206,6 @@ config BPF
>
>  menuconfig EXPERT
> bool "Configure standard kernel features (expert users)"
> -   # Unhide debug options, to make the on-by-default options visible
> -   select DEBUG_KERNEL
> help
>   This option allows certain base kernel options and settings
>to be disabled or tweaked. This is for specialized
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 0d9e81779e37..9fbf3499ec8d 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -434,6 +434,7 @@ config MAGIC_SYSRQ_SERIAL
>
>  config DEBUG_KERNEL
> bool "Kernel debugging"
> +   default EXPERT
> help
>   Say Y here if you are developing drivers or trying to debug and
>   identify kernel problems.
> --
> 2.21.0
>


-- 
Kees Cook

Re: [PATCH 1/4] ARM: dts: imx6: RDU2: Use new CODEC reset pin name

On Fri, Mar 29, 2019 at 01:13:10PM -0500, Andrew F. Davis wrote:
> The correct DT property for specifying a GPIO used for reset
> is "reset-gpios", the driver now accepts this name, use it here.
> 
> Note the GPIO polarity in the driver was ignored before and always
> assumed to be active low, when all the DTs are fixed we will start
> respecting the specified polarity. Switch polarity in DT to the
> currently assumed one, this way when the driver changes the
> behavior will not change.
> 
> Signed-off-by: Andrew F. Davis 

I fixed up the prefix to use board name, and applied patch #1 ~ #3.

Shawn

[PATCH -next] bus: ti-sysc: Use PTR_ERR_OR_ZERO in sysc_init_resets()

2019-04-10 Thread YueHaibing

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Signed-off-by: YueHaibing 
---
 drivers/bus/ti-sysc.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/bus/ti-sysc.c b/drivers/bus/ti-sysc.c
index b696f26a3894..2b93be2882f3 100644
--- a/drivers/bus/ti-sysc.c
+++ b/drivers/bus/ti-sysc.c
@@ -432,10 +432,7 @@ static int sysc_init_resets(struct sysc *ddata)
 {
ddata->rsts =
devm_reset_control_array_get_optional_exclusive(ddata->dev);
-   if (IS_ERR(ddata->rsts))
-   return PTR_ERR(ddata->rsts);
-
-   return 0;
+   return PTR_ERR_OR_ZERO(ddata->rsts);
 }
 
 /**

Re: [RFC patch 16/41] tracing: Remove the ULONG_MAX stack trace hackery

On Wed, Apr 10, 2019 at 12:28:10PM +0200, Thomas Gleixner wrote:
> No architecture terminates the stack trace with ULONG_MAX anymore. As the
> code checks the number of entries stored anyway there is no point in
> keeping all that ULONG_MAX magic around.
> 
> The histogram code zeroes the storage before saving the stack, so if the
> trace is shorter than the maximum number of entries it can terminate the
> print loop if a zero entry is detected.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Steven Rostedt 
> ---
>  kernel/trace/trace_events_hist.c |2 +-
>  kernel/trace/trace_stack.c   |   20 +---
>  2 files changed, 6 insertions(+), 16 deletions(-)
> 
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -5246,7 +5246,7 @@ static void hist_trigger_stacktrace_prin
>   unsigned int i;
>  
>   for (i = 0; i < max_entries; i++) {
> - if (stacktrace_entries[i] == ULONG_MAX)
> + if (!stacktrace_entries[i])
>   return;
>  
>   seq_printf(m, "%*c", 1 + spaces, ' ');
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -18,8 +18,7 @@
>  
>  #include "trace.h"
>  
> -static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
> -  { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
> +static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1];

Is the "+ 1" still needed?  AFAICT, accesses to this array never go past
nr_entries.

Also I've been staring at the code but I can't figure out why
max_entries is "- 1".

struct stack_trace stack_trace_max = {
.max_entries= STACK_TRACE_ENTRIES - 1,
.entries= _dump_trace[0],
};

-- 
Josh

Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects

2019-04-10 Thread Al Viro

On Thu, Apr 11, 2019 at 11:34:40AM +1000, Tobin C. Harding wrote:
> +/*
> + * d_isolate() - Dentry isolation callback function.
> + * @s: The dentry cache.
> + * @v: Vector of pointers to the objects to isolate.
> + * @nr: Number of objects in @v.
> + *
> + * The slab allocator is holding off frees. We can safely examine
> + * the object without the danger of it vanishing from under us.
> + */
> +static void *d_isolate(struct kmem_cache *s, void **v, int nr)
> +{
> + struct dentry *dentry;
> + int i;
> +
> + for (i = 0; i < nr; i++) {
> + dentry = v[i];
> + __dget(dentry);
> + }
> +
> + return NULL;/* No need for private data */
> +}

Huh?  This is compeletely wrong; what you need is collecting the ones
with zero refcount (and not on shrink lists) into a private list.
*NOT* bumping the refcounts at all.  And do it in your isolate thing.

> +static void d_partial_shrink(struct kmem_cache *s, void **v, int nr,
> +   int node, void *_unused)
> +{
> + struct dentry *dentry;
> + LIST_HEAD(dispose);
> + int i;
> +
> + for (i = 0; i < nr; i++) {
> + dentry = v[i];
> + spin_lock(>d_lock);
> + dentry->d_lockref.count--;
> +
> + if (dentry->d_lockref.count > 0 ||
> + dentry->d_flags & DCACHE_SHRINK_LIST) {
> + spin_unlock(>d_lock);
> + continue;
> + }
> +
> + if (dentry->d_flags & DCACHE_LRU_LIST)
> + d_lru_del(dentry);
> +
> + d_shrink_add(dentry, );
> +
> + spin_unlock(>d_lock);
> + }

Basically, that loop (sans jerking the refcount up and down) should
get moved into d_isolate().
> +
> + if (!list_empty())
> + shrink_dentry_list();
> +}

... with this left in d_partial_shrink().  And you obviously need some way
to pass the list from the former to the latter...

Re: [PATCH v3 1/9] ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA

On Thu, Mar 28, 2019 at 11:49:16PM -0700, Andrey Smirnov wrote:
> Since 25aaa75df1e6 SDMA driver uses clock rates of "ipg" and "ahb"
> clock to determine if it needs to configure the IP block as operating
> at 1:1 or 1:2 clock ratio (ACR bit in SDMAARM_CONFIG). Specifying both
> clocks as IMX6QDL_CLK_SDMA results in driver incorrectly thinking that
> ratio is 1:1 which results in broken SDMA funtionality(this at least
> breaks RAVE SP serdev driver on RDU2). Fix the code to specify
> IMX6QDL_CLK_IPG as "ipg" clock for SDMA, to avoid detecting incorrect
> clock ratio.
> 
> Fixes: 25aaa75df1e6 ("dmaengine: imx-sdma: add clock ratio 1:1 check")

Since we have a fix in the dma driver, I dropped the Fixes tag here.

Applied all, thanks.

Shawn

> Signed-off-by: Andrey Smirnov 
> Reviewed-by: Lucas Stach 
> Cc: Angus Ainslie (Purism) 
> Cc: Chris Healy 
> Cc: Lucas Stach 
> Cc: Fabio Estevam 
> Cc: Shawn Guo 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arm/boot/dts/imx6qdl.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/boot/dts/imx6qdl.dtsi b/arch/arm/boot/dts/imx6qdl.dtsi
> index 9f9aa6e7ed0e..354feba077b2 100644
> --- a/arch/arm/boot/dts/imx6qdl.dtsi
> +++ b/arch/arm/boot/dts/imx6qdl.dtsi
> @@ -949,7 +949,7 @@
>   compatible = "fsl,imx6q-sdma", "fsl,imx35-sdma";
>   reg = <0x020ec000 0x4000>;
>   interrupts = <0 2 IRQ_TYPE_LEVEL_HIGH>;
> - clocks = < IMX6QDL_CLK_SDMA>,
> + clocks = < IMX6QDL_CLK_IPG>,
>< IMX6QDL_CLK_SDMA>;
>   clock-names = "ipg", "ahb";
>   #dma-cells = <3>;
> -- 
> 2.20.1
>

Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation

2019-04-10 Thread Waiman Long

On 04/10/2019 02:44 PM, Peter Zijlstra wrote:
> On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
>> Because of writer lock stealing, it is possible that a constant
>> stream of incoming writers will cause a waiting writer or reader to
>> wait indefinitely leading to lock starvation.
>>
>> The mutex code has a lock handoff mechanism to prevent lock starvation.
>> This patch implements a similar lock handoff mechanism to disable
>> lock stealing and force lock handoff to the first waiter in the queue
>> after at least a 5ms waiting period. The waiting period is used to
>> avoid discouraging lock stealing too much to affect performance.
> I would say the handoff it not at all similar to the mutex code. It is
> in fact radically different.
>

I mean they are similar in concept. Of course, the implementations are
quite different.

>> @@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>>  adjustment = RWSEM_READER_BIAS;
>>  oldcount = atomic_long_fetch_add(adjustment, >count);
>>  if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
>> +/*
>> + * Initiate handoff to reader, if applicable.
>> + */
>> +if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>> +time_after(jiffies, waiter->timeout)) {
>> +adjustment -= RWSEM_FLAG_HANDOFF;
>> +lockevent_inc(rwsem_rlock_handoff);
>> +}
>> +
>>  atomic_long_sub(adjustment, >count);
>>  return;
>>  }
> That confuses the heck out of me...
>
> The above seems to rely on __rwsem_mark_wake() to be fully serialized
> (and it is, by ->wait_lock, but that isn't spelled out anywhere) such
> that we don't get double increment of FLAG_HANDOFF.
>
> So there is NO __rwsem_mark_wake() vs __wesem_mark_wake() race like:
>
>   CPU0CPU1
>
>   oldcount = atomic_long_fetch_add(adjustment, >count)
>
>   oldcount = 
> atomic_long_fetch_add(adjustment, >count)
>
>   if (!(oldcount & HANDOFF))
> adjustment -= HANDOFF;
>
>   if (!(oldcount & HANDOFF))
> adjustment -= HANDOFF;
>   atomic_long_sub(adjustment)
>   atomic_long_sub(adjustment)
>
>
> *whoops* double negative decrement of HANDOFF (aka double increment).

Yes, __rwsem_mark_wake() is always called with wait_lock held. I can add
a lockdep_assert() statement to clarify this point.

>
> However there is another site that fiddles with the HANDOFF bit, namely
> __rwsem_down_write_failed_common(), and that does:
>
> +   atomic_long_or(RWSEM_FLAG_HANDOFF, 
> >count);
>
> _OUTSIDE_ of ->wait_lock, which would yield:
>
>   CPU0CPU1
>
>   oldcount = atomic_long_fetch_add(adjustment, >count)
>
>   atomic_long_or(HANDOFF)
>
>   if (!(oldcount & HANDOFF))
> adjustment -= HANDOFF;
>
>   atomic_long_sub(adjustment)
>
> *whoops*, incremented HANDOFF on HANDOFF.
>
>
> And there's not a comment in sight that would elucidate if this is
> possible or not.
>

A writer can only set the handoff bit if it is the first waiter in the
queue. If it is the first waiter, a racing __rwsem_mark_wake() will see
that the first waiter is a writer and so won't go into the reader path.
I know I something don't spell out all the conditions that may look
obvious to me but not to others. I will elaborate more in comments.

> Also:
>
> +   atomic_long_or(RWSEM_FLAG_HANDOFF, 
> >count);
> +   first++;
> +
> +   /*
> +* Make sure the handoff bit is seen by
> +* others before proceeding.
> +*/
> +   smp_mb__after_atomic();
>
> That comment is utter nonsense. smp_mb() doesn't (and cannot) 'make
> visible'. There needs to be order between two memops on both sides.
>
I kind of add that for safety. I will take some time to rethink if it is
really necessary.

Cheers,
Longman

linux-next: manual merge of the apparmor tree with Linus' tree

2019-04-10 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the apparmor tree got a conflict in:

  security/apparmor/lsm.c

between commit:

  e33c1b992377 ("apparmor: Restore Y/N in /sys for apparmor's "enabled"")

from Linus' tree and commit:

  876dd866c084 ("apparmor: Initial implementation of raw policy blob 
compression")

from the apparmor tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc security/apparmor/lsm.c
index 87500bde5a92,e1e9c3c01cd3..
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@@ -25,8 -25,8 +25,9 @@@
  #include 
  #include 
  #include 
+ #include 
  #include 
 +#include 
  
  #include "include/apparmor.h"
  #include "include/apparmorfs.h"
@@@ -1420,46 -1424,37 +1436,77 @@@ static int param_get_aauint(char *buffe
return param_get_uint(buffer, kp);
  }
  
 +/* Can only be set before AppArmor is initialized (i.e. on boot cmdline). */
 +static int param_set_aaintbool(const char *val, const struct kernel_param *kp)
 +{
 +  struct kernel_param kp_local;
 +  bool value;
 +  int error;
 +
 +  if (apparmor_initialized)
 +  return -EPERM;
 +
 +  /* Create local copy, with arg pointing to bool type. */
 +  value = !!*((int *)kp->arg);
 +  memcpy(_local, kp, sizeof(kp_local));
 +  kp_local.arg = 
 +
 +  error = param_set_bool(val, _local);
 +  if (!error)
 +  *((int *)kp->arg) = *((bool *)kp_local.arg);
 +  return error;
 +}
 +
 +/*
 + * To avoid changing /sys/module/apparmor/parameters/enabled from Y/N to
 + * 1/0, this converts the "int that is actually bool" back to bool for
 + * display in the /sys filesystem, while keeping it "int" for the LSM
 + * infrastructure.
 + */
 +static int param_get_aaintbool(char *buffer, const struct kernel_param *kp)
 +{
 +  struct kernel_param kp_local;
 +  bool value;
 +
 +  /* Create local copy, with arg pointing to bool type. */
 +  value = !!*((int *)kp->arg);
 +  memcpy(_local, kp, sizeof(kp_local));
 +  kp_local.arg = 
 +
 +  return param_get_bool(buffer, _local);
 +}
 +
+ static int param_set_aacompressionlevel(const char *val,
+   const struct kernel_param *kp)
+ {
+   int error;
+ 
+   if (!apparmor_enabled)
+   return -EINVAL;
+   if (apparmor_initialized)
+   return -EPERM;
+ 
+   error = param_set_int(val, kp);
+ 
+   aa_g_rawdata_compression_level = clamp(aa_g_rawdata_compression_level,
+  Z_NO_COMPRESSION,
+  Z_BEST_COMPRESSION);
+   pr_info("AppArmor: policy rawdata compression level set to %u\n",
+   aa_g_rawdata_compression_level);
+ 
+   return error;
+ }
+ 
+ static int param_get_aacompressionlevel(char *buffer,
+   const struct kernel_param *kp)
+ {
+   if (!apparmor_enabled)
+   return -EINVAL;
+   if (apparmor_initialized && !policy_view_capable(NULL))
+   return -EPERM;
+   return param_get_int(buffer, kp);
+ }
+ 
  static int param_get_audit(char *buffer, const struct kernel_param *kp)
  {
if (!apparmor_enabled)


pgplC3V9Q_JZK.pgp
Description: OpenPGP digital signature

Re: kernel BUG at fs/inode.c:LINE!

2019-04-10 Thread Al Viro

On Thu, Apr 11, 2019 at 08:50:17AM +0800, Ian Kent wrote:
> On Wed, 2019-04-10 at 14:41 +0200, Dmitry Vyukov wrote:
> > On Wed, Apr 10, 2019 at 2:12 PM Al Viro  wrote:
> > > 
> > > On Wed, Apr 10, 2019 at 08:07:15PM +0800, Ian Kent wrote:
> > > 
> > > > > I'm unable to find a branch matching the line numbers.
> > > > > 
> > > > > Given that, on the face of it, the scenario is impossible I'm
> > > > > seeking clarification on what linux-next to look at for the
> > > > > sake of accuracy.
> > > > > 
> > > > > So I'm wondering if this testing done using the master branch
> > > > > or one of the daily branches one would use to check for conflicts
> > > > > before posting?
> > > > 
> > > > Sorry those are tags not branches.
> > > 
> > > FWIW, that's next-20181214; it is what master had been in mid-December
> > > and master is rebased every day.  Can it be reproduced with the current
> > > tree?
> > 
> > From the info on the dashboard we know that it happened only once on
> > d14b746c (the second one is result of reproducing the first one). So
> > it was either fixed or just hard to trigger.
> 
> Looking at the source of tag next-20181214 in linux-next-history I see
> this is mistake I made due to incorrect error handling which I fixed
> soon after (there was in fact a double iput()).

Right - "autofs: fix possible inode leak in autofs_fill_super()" had been
broken (and completely pointless), leading to double iput() in that failure
case.  And yes, double iput() can trigger that BUG_ON(), and with non-zero
odds do so with that stack trace.

As far as I'm concerned, case closed - bug had been in a misguided "fix"
for inexistent leak (coming from misreading the calling conventions for
d_make_root()), introduced in -next at next-20181130 and kicked out of
there in next-20181219.  Dropped by Ian's request in 
Message-ID: <66d497c00cffb3e4109ca0d5287c8277954d7132.ca...@themaw.net>
which has fixed that crap.  Moreover, that posting had been in reply to
that very syzcaller report, AFAICS.

I don't know how to tell the bot to STFU and close the report in this
situation; up to you, folks.

As an aside, the cause of that bug is that d_make_root() calling conventions
are insufficiently documented.  All we have is

||[mandatory]
||d_alloc_root() is gone, along with a lot of bugs caused by code
||misusing it.  Replacement: d_make_root(inode).  The difference is,
||d_make_root() drops the reference to inode if dentry allocation fails.

in Documentation/filesystems/porting, and that's not good enough.  Anyone
willing to take a shot at that?  FWIW, the calling conventions are:

d_make_root(inode) normally allocates and returns a new dentry.
On failure NULL is returned.  A reference to inode is consumed in all
cases (on success it is transferred to new dentry, on failure it is
dropped), so failure handling does not need anything done to inode.
d_make_root(NULL) quietly returns NULL, which further simplifies the
error handling in typical caller.  Usually it's something like
inode = foofs_new_inode();
s->s_root = d_make_inode(inode);
if (!s->s_root)
bugger off, no need to undo inode allocation
success
We do not need to check if foofs_new_inode() has returned NULL and we
do not need any special cleanups in case of failure - not for the
undoing the inode allocation.

If anyone cares to convert that into coherent (and printable) documentation,
patches are welcome...

[PATCH] rtc: mxc_v2: use dev_pm_set_wake_irq() to simplify code

2019-04-10 Thread Anson Huang

With calling dev_pm_set_wake_irq() to set MXC_V2 RTC as wakeup source
for suspend, generic wake irq mechanism will automatically enable
it as wakeup source when suspend, then the suspend/resume callback
which are ONLY for enabling/disabling irq wake can be removed, it
simplifies the code.

Signed-off-by: Anson Huang 
---
 drivers/rtc/rtc-mxc_v2.c | 29 -
 1 file changed, 4 insertions(+), 25 deletions(-)

diff --git a/drivers/rtc/rtc-mxc_v2.c b/drivers/rtc/rtc-mxc_v2.c
index 007879a..5b970a8 100644
--- a/drivers/rtc/rtc-mxc_v2.c
+++ b/drivers/rtc/rtc-mxc_v2.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define SRTC_LPPDR_INIT   0x41736166   /* init for glitch detect */
@@ -305,6 +306,9 @@ static int mxc_rtc_probe(struct platform_device *pdev)
return pdata->irq;
 
device_init_wakeup(>dev, 1);
+   ret = dev_pm_set_wake_irq(>dev, pdata->irq);
+   if (ret)
+   dev_err(>dev, "failed to enable irq wake\n");
 
ret = clk_prepare_enable(pdata->clk);
if (ret)
@@ -367,30 +371,6 @@ static int mxc_rtc_remove(struct platform_device *pdev)
return 0;
 }
 
-#ifdef CONFIG_PM_SLEEP
-static int mxc_rtc_suspend(struct device *dev)
-{
-   struct mxc_rtc_data *pdata = dev_get_drvdata(dev);
-
-   if (device_may_wakeup(dev))
-   enable_irq_wake(pdata->irq);
-
-   return 0;
-}
-
-static int mxc_rtc_resume(struct device *dev)
-{
-   struct mxc_rtc_data *pdata = dev_get_drvdata(dev);
-
-   if (device_may_wakeup(dev))
-   disable_irq_wake(pdata->irq);
-
-   return 0;
-}
-#endif
-
-static SIMPLE_DEV_PM_OPS(mxc_rtc_pm_ops, mxc_rtc_suspend, mxc_rtc_resume);
-
 static const struct of_device_id mxc_ids[] = {
{ .compatible = "fsl,imx53-rtc", },
{}
@@ -400,7 +380,6 @@ static struct platform_driver mxc_rtc_driver = {
.driver = {
.name = "mxc_rtc_v2",
.of_match_table = mxc_ids,
-   .pm = _rtc_pm_ops,
},
.probe = mxc_rtc_probe,
.remove = mxc_rtc_remove,
-- 
2.7.4

Re: [RFC PATCH hubcap] orangefs: orangefs_file_open() can be static

2019-04-10 Thread Joe Perches

On Thu, 2019-04-11 at 09:58 +0800, kbuild test robot wrote:
> Fixes: 9a959aaffd70 ("orangefs: remember count when reading.")

Making something static likely does not warrant a "Fixes:" tag

> Signed-off-by: kbuild test robot 
> ---
>  file.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
> index d198af9..01d0db6 100644
> --- a/fs/orangefs/file.c
> +++ b/fs/orangefs/file.c
> @@ -571,7 +571,7 @@ static int orangefs_lock(struct file *filp, int cmd, 
> struct file_lock *fl)
>   return rc;
>  }
>  
> -int orangefs_file_open(struct inode * inode, struct file *file)
> +static int orangefs_file_open(struct inode * inode, struct file *file)
>  {
>   file->private_data = NULL;
>   return generic_file_open(inode, file);

Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.

2019-04-10 Thread Aaron Lu

On Wed, Apr 10, 2019 at 10:18:10PM +0800, Aubrey Li wrote:
> On Wed, Apr 10, 2019 at 12:36 PM Aaron Lu  wrote:
> >
> > On Tue, Apr 09, 2019 at 11:09:45AM -0700, Tim Chen wrote:
> > > Now that we have accumulated quite a number of different fixes to your 
> > > orginal
> > > posted patches.  Would you like to post a v2 of the core scheduler with 
> > > the fixes?
> >
> > One more question I'm not sure: should a task with cookie=0, i.e. tasks
> > that are untagged, be allowed to scheduled on the the same core with
> > another tagged task?
> >
> > The current patch seems to disagree on this, e.g. in pick_task(),
> > if max is already chosen but max->core_cookie == 0, then we didn't care
> > about cookie and simply use class_pick for the other cpu. This means we
> > could schedule two tasks with different cookies(one is zero and the
> > other can be tagged).
> >
> > But then sched_core_find() only allow idle task to match with any tagged
> > tasks(we didn't place untagged tasks to the core tree of course :-).
> >
> > Thoughts? Do I understand this correctly? If so, I think we probably
> > want to make this clear before v2. I personally feel, we shouldn't allow
> > untagged tasks(like kernel threads) to match with tagged tasks.
> 
> Does it make sense if we take untagged tasks as hypervisor, and different
> cookie tasks as different VMs? Isolation is done between VMs, not between
> VM and hypervisor.
> 
> Did you see anything harmful if an untagged task and a tagged task
> run simultaneously on the same core?

VM can see hypervisor's data then, I think.
We probably do not want that happen.

[PATCH] rtc: mxc: use dev_pm_set_wake_irq() to simplify code

2019-04-10 Thread Anson Huang

With calling dev_pm_set_wake_irq() to set MXC RTC as wakeup source
for suspend, generic wake irq mechanism will automatically enable
it as wakeup source when suspend, then the suspend/resume callback
which are ONLY for enabling/disabling irq wake can be removed, it
simplifies the code.

Signed-off-by: Anson Huang 
---
 drivers/rtc/rtc-mxc.c | 32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/drivers/rtc/rtc-mxc.c b/drivers/rtc/rtc-mxc.c
index 28a15bd..708b9e9 100644
--- a/drivers/rtc/rtc-mxc.c
+++ b/drivers/rtc/rtc-mxc.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -394,8 +395,12 @@ static int mxc_rtc_probe(struct platform_device *pdev)
pdata->irq = -1;
}
 
-   if (pdata->irq >= 0)
+   if (pdata->irq >= 0) {
device_init_wakeup(>dev, 1);
+   ret = dev_pm_set_wake_irq(>dev, pdata->irq);
+   if (ret)
+   dev_err(>dev, "failed to enable irq wake\n");
+   }
 
rtc = devm_rtc_device_register(>dev, pdev->name, _rtc_ops,
  THIS_MODULE);
@@ -426,35 +431,10 @@ static int mxc_rtc_remove(struct platform_device *pdev)
return 0;
 }
 
-#ifdef CONFIG_PM_SLEEP
-static int mxc_rtc_suspend(struct device *dev)
-{
-   struct rtc_plat_data *pdata = dev_get_drvdata(dev);
-
-   if (device_may_wakeup(dev))
-   enable_irq_wake(pdata->irq);
-
-   return 0;
-}
-
-static int mxc_rtc_resume(struct device *dev)
-{
-   struct rtc_plat_data *pdata = dev_get_drvdata(dev);
-
-   if (device_may_wakeup(dev))
-   disable_irq_wake(pdata->irq);
-
-   return 0;
-}
-#endif
-
-static SIMPLE_DEV_PM_OPS(mxc_rtc_pm_ops, mxc_rtc_suspend, mxc_rtc_resume);
-
 static struct platform_driver mxc_rtc_driver = {
.driver = {
   .name= "mxc_rtc",
   .of_match_table = of_match_ptr(imx_rtc_dt_ids),
-  .pm  = _rtc_pm_ops,
},
.id_table = imx_rtc_devtype,
.probe = mxc_rtc_probe,
-- 
2.7.4

[PATCH 2/2] regulator: mcp16502: Remove setup_regulators function

2019-04-10 Thread Axel Lin

It seems a little bit odd current code pass struct regulator_config rather
than a pointer to setup_regulators. The setup_regulators is so simple and
only has one caller, so remove it.

Signed-off-by: Axel Lin 
---
 drivers/regulator/mcp16502.c | 37 +++-
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/drivers/regulator/mcp16502.c b/drivers/regulator/mcp16502.c
index 9292ab8736c7..e5a02711cb46 100644
--- a/drivers/regulator/mcp16502.c
+++ b/drivers/regulator/mcp16502.c
@@ -427,36 +427,15 @@ static const struct regmap_config mcp16502_regmap_config 
= {
.wr_table   = _yes_reg_table,
 };
 
-/*
- * set_up_regulators() - initialize all regulators
- */
-static int setup_regulators(struct mcp16502 *mcp, struct device *dev,
-   struct regulator_config config)
-{
-   struct regulator_dev *rdev;
-   int i;
-
-   for (i = 0; i < NUM_REGULATORS; i++) {
-   rdev = devm_regulator_register(dev, _desc[i], );
-   if (IS_ERR(rdev)) {
-   dev_err(dev,
-   "failed to register %s regulator %ld\n",
-   mcp16502_desc[i].name, PTR_ERR(rdev));
-   return PTR_ERR(rdev);
-   }
-   }
-
-   return 0;
-}
-
 static int mcp16502_probe(struct i2c_client *client,
  const struct i2c_device_id *id)
 {
struct regulator_config config = { };
+   struct regulator_dev *rdev;
struct device *dev;
struct mcp16502 *mcp;
struct regmap *rmap;
-   int ret = 0;
+   int i, ret;
 
dev = >dev;
config.dev = dev;
@@ -482,9 +461,15 @@ static int mcp16502_probe(struct i2c_client *client,
return PTR_ERR(mcp->lpm);
}
 
-   ret = setup_regulators(mcp, dev, config);
-   if (ret != 0)
-   return ret;
+   for (i = 0; i < NUM_REGULATORS; i++) {
+   rdev = devm_regulator_register(dev, _desc[i], );
+   if (IS_ERR(rdev)) {
+   dev_err(dev,
+   "failed to register %s regulator %ld\n",
+   mcp16502_desc[i].name, PTR_ERR(rdev));
+   return PTR_ERR(rdev);
+   }
+   }
 
mcp16502_gpio_set_mode(mcp, MCP16502_OPMODE_ACTIVE);
 
-- 
2.17.1

[hubcap:for-next 20/22] fs/orangefs/file.c:574:5: sparse: symbol 'orangefs_file_open' was not declared. Should it be static?

2019-04-10 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux for-next
head:   6055a739910e69f8f76120d48e7ae74a13b1fdda
commit: 9a959aaffd7090810eade53e4d960614405f57c6 [20/22] orangefs: remember 
count when reading.
reproduce:
# apt-get install sparse
git checkout 9a959aaffd7090810eade53e4d960614405f57c6
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'


sparse warnings: (new ones prefixed by >>)

>> fs/orangefs/file.c:574:5: sparse: symbol 'orangefs_file_open' was not 
>> declared. Should it be static?
   fs/orangefs/file.c:580:5: sparse: symbol 'orangefs_flush' was not declared. 
Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation

[PATCH 1/2] regulator: mcp16502: Remove unneeded fields from struct mcp16502

2019-04-10 Thread Axel Lin

At the context with rdev, we can use rdev->regmap instead of mcp->rmap.
The *rdev[NUM_REGULATORS] is not required because current code uses
devm_regulator_register() so we don't need to store *rdev for clean up
paths.

Signed-off-by: Axel Lin 
---
 drivers/regulator/mcp16502.c | 40 +++-
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/drivers/regulator/mcp16502.c b/drivers/regulator/mcp16502.c
index 3a8004abe044..9292ab8736c7 100644
--- a/drivers/regulator/mcp16502.c
+++ b/drivers/regulator/mcp16502.c
@@ -119,8 +119,6 @@ enum {
  * @lpm: LPM GPIO descriptor
  */
 struct mcp16502 {
-   struct regulator_dev *rdev[NUM_REGULATORS];
-   struct regmap *rmap;
struct gpio_desc *lpm;
 };
 
@@ -179,13 +177,12 @@ static unsigned int mcp16502_get_mode(struct 
regulator_dev *rdev)
 {
unsigned int val;
int ret, reg;
-   struct mcp16502 *mcp = rdev_get_drvdata(rdev);
 
reg = mcp16502_get_reg(rdev, MCP16502_OPMODE_ACTIVE);
if (reg < 0)
return reg;
 
-   ret = regmap_read(mcp->rmap, reg, );
+   ret = regmap_read(rdev->regmap, reg, );
if (ret)
return ret;
 
@@ -211,7 +208,6 @@ static int _mcp16502_set_mode(struct regulator_dev *rdev, 
unsigned int mode,
 {
int val;
int reg;
-   struct mcp16502 *mcp = rdev_get_drvdata(rdev);
 
reg = mcp16502_get_reg(rdev, op_mode);
if (reg < 0)
@@ -228,7 +224,7 @@ static int _mcp16502_set_mode(struct regulator_dev *rdev, 
unsigned int mode,
return -EINVAL;
}
 
-   reg = regmap_update_bits(mcp->rmap, reg, MCP16502_MODE, val);
+   reg = regmap_update_bits(rdev->regmap, reg, MCP16502_MODE, val);
return reg;
 }
 
@@ -247,9 +243,8 @@ static int mcp16502_get_status(struct regulator_dev *rdev)
 {
int ret;
unsigned int val;
-   struct mcp16502 *mcp = rdev_get_drvdata(rdev);
 
-   ret = regmap_read(mcp->rmap, MCP16502_STAT_BASE(rdev_get_id(rdev)),
+   ret = regmap_read(rdev->regmap, MCP16502_STAT_BASE(rdev_get_id(rdev)),
  );
if (ret)
return ret;
@@ -290,7 +285,6 @@ static int mcp16502_suspend_get_target_reg(struct 
regulator_dev *rdev)
  */
 static int mcp16502_set_suspend_voltage(struct regulator_dev *rdev, int uV)
 {
-   struct mcp16502 *mcp = rdev_get_drvdata(rdev);
int sel = regulator_map_voltage_linear_range(rdev, uV, uV);
int reg = mcp16502_suspend_get_target_reg(rdev);
 
@@ -300,7 +294,7 @@ static int mcp16502_set_suspend_voltage(struct 
regulator_dev *rdev, int uV)
if (reg < 0)
return reg;
 
-   return regmap_update_bits(mcp->rmap, reg, MCP16502_VSEL, sel);
+   return regmap_update_bits(rdev->regmap, reg, MCP16502_VSEL, sel);
 }
 
 /*
@@ -328,13 +322,12 @@ static int mcp16502_set_suspend_mode(struct regulator_dev 
*rdev,
  */
 static int mcp16502_set_suspend_enable(struct regulator_dev *rdev)
 {
-   struct mcp16502 *mcp = rdev_get_drvdata(rdev);
int reg = mcp16502_suspend_get_target_reg(rdev);
 
if (reg < 0)
return reg;
 
-   return regmap_update_bits(mcp->rmap, reg, MCP16502_EN, MCP16502_EN);
+   return regmap_update_bits(rdev->regmap, reg, MCP16502_EN, MCP16502_EN);
 }
 
 /*
@@ -342,13 +335,12 @@ static int mcp16502_set_suspend_enable(struct 
regulator_dev *rdev)
  */
 static int mcp16502_set_suspend_disable(struct regulator_dev *rdev)
 {
-   struct mcp16502 *mcp = rdev_get_drvdata(rdev);
int reg = mcp16502_suspend_get_target_reg(rdev);
 
if (reg < 0)
return reg;
 
-   return regmap_update_bits(mcp->rmap, reg, MCP16502_EN, 0);
+   return regmap_update_bits(rdev->regmap, reg, MCP16502_EN, 0);
 }
 #endif /* CONFIG_SUSPEND */
 
@@ -441,17 +433,16 @@ static const struct regmap_config mcp16502_regmap_config 
= {
 static int setup_regulators(struct mcp16502 *mcp, struct device *dev,
struct regulator_config config)
 {
+   struct regulator_dev *rdev;
int i;
 
for (i = 0; i < NUM_REGULATORS; i++) {
-   mcp->rdev[i] = devm_regulator_register(dev,
-  _desc[i],
-  );
-   if (IS_ERR(mcp->rdev[i])) {
+   rdev = devm_regulator_register(dev, _desc[i], );
+   if (IS_ERR(rdev)) {
dev_err(dev,
"failed to register %s regulator %ld\n",
-   mcp16502_desc[i].name, PTR_ERR(mcp->rdev[i]));
-   return PTR_ERR(mcp->rdev[i]);
+   mcp16502_desc[i].name, PTR_ERR(rdev));
+   return PTR_ERR(rdev);
}
}
 
@@ -464,6 +455,7 @@ static int mcp16502_probe(struct i2c_client *client,
struct regulator_config config = { };

[RFC PATCH hubcap] orangefs: orangefs_file_open() can be static

2019-04-10 Thread kbuild test robot



Fixes: 9a959aaffd70 ("orangefs: remember count when reading.")
Signed-off-by: kbuild test robot 
---
 file.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index d198af9..01d0db6 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -571,7 +571,7 @@ static int orangefs_lock(struct file *filp, int cmd, struct 
file_lock *fl)
return rc;
 }
 
-int orangefs_file_open(struct inode * inode, struct file *file)
+static int orangefs_file_open(struct inode * inode, struct file *file)
 {
file->private_data = NULL;
return generic_file_open(inode, file);

Re: \\ 答复: [PATCH] of: del redundant type conversion

2019-04-10 Thread xiaojiangfeng

My pleasure.

I am very new to sparse.

I guess the warning is caused by the macro min.

Then I submitted my changes.

Thanks for code review.


-邮件原件-
发件人: Frank Rowand [mailto:frowand.l...@gmail.com] 
发送时间: 2019年4月11日 2:50
收件人: xiaojiangfeng ; robh...@kernel.org; 
r...@kernel.org
抄送: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org
主题: Re: [PATCH] of: del redundant type conversion

On 4/10/19 1:29 AM, xiaojiangfeng wrote:
> The type of variable l in early_init_dt_scan_chosen is int, there is 
> no need to convert to int.
> 
> Signed-off-by: xiaojiangfeng 
> ---
>  drivers/of/fdt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 
> 4734223..de893c9 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long 
> node, const char *uname,
>   /* Retrieve command line */
>   p = of_get_flat_dt_prop(node, "bootargs", );
>   if (p != NULL && l > 0)
> - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
> + strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
>  
>   /*
>* CONFIG_CMDLINE is meant to be a default in case nothing else
> 

Thanks for catching the redundant cast.

There is a second problem detected by sparse on that line:

  drivers/of/fdt.c:1094:34: warning: expression using sizeof(void)

Can you please fix both issues?

Thanks,

Frank

[RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c

2019-04-10 Thread Suren Baghdasaryan

Create an API to allow users outside of oom_kill.c to mark a victim and
wake up oom_reaper thread for expedited memory reclaim of the process being
killed.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/oom.h |  1 +
 mm/oom_kill.c   | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index d07992009265..6c043c7518c1 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -112,6 +112,7 @@ extern unsigned long oom_badness(struct task_struct *p,
unsigned long totalpages);
 
 extern bool out_of_memory(struct oom_control *oc);
+extern bool expedite_reclaim(struct task_struct *task);
 
 extern void exit_oom_victim(void);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3a2484884cfd..6449710c8a06 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1102,6 +1102,21 @@ bool out_of_memory(struct oom_control *oc)
return !!oc->chosen;
 }
 
+bool expedite_reclaim(struct task_struct *task)
+{
+   bool res = false;
+
+   task_lock(task);
+   if (task_will_free_mem(task)) {
+   mark_oom_victim(task);
+   wake_oom_reaper(task);
+   res = true;
+   }
+   task_unlock(task);
+
+   return res;
+}
+
 /*
  * The pagefault handler calls here because it is out of memory, so kill a
  * memory-hogging task. If oom_lock is held by somebody else, a parallel oom
-- 
2.21.0.392.gf8f6787159e-goog

[PATCH] of: fix expression using sizeof(void)

2019-04-10 Thread xiaojiangfeng

problem detected by sparse:
drivers/of/fdt.c:1094:34: warning: expression using sizeof(void)

Signed-off-by: xiaojiangfeng 
---
 drivers/of/fdt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4734223..75c6c55 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1091,7 +1091,7 @@ int __init early_init_dt_scan_chosen(unsigned long node, 
const char *uname,
/* Retrieve command line */
p = of_get_flat_dt_prop(node, "bootargs", );
if (p != NULL && l > 0)
-   strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
+   strlcpy(data, p, COMMAND_LINE_SIZE);
 
/*
 * CONFIG_CMDLINE is meant to be a default in case nothing else
-- 
1.8.5.6

[RFC 0/2] opportunistic memory reclaim of a killed process

2019-04-10 Thread Suren Baghdasaryan

The time to kill a process and free its memory can be critical when the
killing was done to prevent memory shortages affecting system
responsiveness.

In the case of Android, where processes can be restarted easily, killing a
less important background process is preferred to delaying or throttling
an interactive foreground process. At the same time unnecessary kills
should be avoided as they cause delays when the killed process is needed
again. This requires a balanced decision from the system software about
how long a kill can be postponed in the hope that memory usage will
decrease without such drastic measures.

As killing a process and reclaiming its memory is not an instant operation,
a margin of free memory has to be maintained to prevent system performance
deterioration while memory of the killed process is being reclaimed. The
size of this margin depends on the minimum reclaim rate to cover the
worst-case scenario and this minimum rate should be deterministic.

Note that on asymmetric architectures like ARM big.LITTLE the reclaim rate
can vary dramatically depending on which core it’s performed at (see test
results). It’s a usual scenario that a non-essential victim process is
being restricted to a less performant or throttled CPU for power saving
purposes. This makes the worst-case reclaim rate scenario very probable.

The cases when victim’s memory reclaim can be delayed further due to
process being blocked in an uninterruptible sleep or when it performs a
time-consuming operation makes the reclaim time even more unpredictable.

Increasing memory reclaim rate and making it more deterministic would
allow for a smaller free memory margin and would lead to more opportunities
to avoid killing a process.

Note that while other strategies like throttling memory allocations are
viable and can be employed for other non-essential processes they would
affect user experience if applied towards an interactive process.

Proposed solution uses existing oom-reaper thread to increase memory
reclaim rate of a killed process and to make this rate more deterministic.
By no means the proposed solution is considered the best and was chosen
because it was simple to implement and allowed for test data collection.
The downside of this solution is that it requires additional “expedite”
hint for something which has to be fast in all cases. Would be great to
find a way that does not require additional hints.

Other possible approaches include:
- Implementing a dedicated syscall to perform opportunistic reclaim in the
context of the process waiting for the victim’s death. A natural boost
bonus occurs if the waiting process has high or RT priority and is not
limited by cpuset cgroup in its CPU choices.
- Implement a mechanism that would perform opportunistic reclaim if it’s
possible unconditionally (similar to checks in task_will_free_mem()).
- Implement opportunistic reclaim that uses shrinker interface, PSI or
other memory pressure indications as a hint to engage.

Test details:
Tests are performed on a Qualcomm® Snapdragon™ 845 8-core ARM big.LITTLE
system with 4 little cores (0.3-1.6GHz) and 4 big cores (0.8-2.5GHz)
running Android.
Memory reclaim speed was measured using signal/signal_generate,
kmem/rss_stat and sched/sched_process_exit traces.

Test results:
powersave governor, min freq
normal kills  expedited kills
little  856 MB/sec3236 MB/sec
big 5084 MB/sec   6144 MB/sec

performance governor, max freq
normal kills  expedited kills
little  5602 MB/sec   8144 MB/sec
big 14656 MB/sec  12398 MB/sec

schedutil governor (default)
normal kills  expedited kills
little  2386 MB/sec   3908 MB/sec
big 7282 MB/sec   6820-16386 MB/sec
=
min reclaim speed:  856 MB/sec3236 MB/sec

The patches are based on 5.1-rc1

Suren Baghdasaryan (2):
  mm: oom: expose expedite_reclaim to use oom_reaper outside of
oom_kill.c
  signal: extend pidfd_send_signal() to allow expedited process killing

 include/linux/oom.h  |  1 +
 include/linux/sched/signal.h |  3 ++-
 include/linux/signal.h   | 11 ++-
 ipc/mqueue.c |  2 +-
 kernel/signal.c  | 37 
 kernel/time/itimer.c |  2 +-
 mm/oom_kill.c| 15 +++
 7 files changed, 59 insertions(+), 12 deletions(-)

-- 
2.21.0.392.gf8f6787159e-goog

[RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing

2019-04-10 Thread Suren Baghdasaryan

Add new SS_EXPEDITE flag to be used when sending SIGKILL via
pidfd_send_signal() syscall to allow expedited memory reclaim of the
victim process. The usage of this flag is currently limited to SIGKILL
signal and only to privileged users.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/sched/signal.h |  3 ++-
 include/linux/signal.h   | 11 ++-
 ipc/mqueue.c |  2 +-
 kernel/signal.c  | 37 
 kernel/time/itimer.c |  2 +-
 5 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e412c092c1e8..8a227633a058 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -327,7 +327,8 @@ extern int send_sig_info(int, struct kernel_siginfo *, 
struct task_struct *);
 extern void force_sigsegv(int sig, struct task_struct *p);
 extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid 
*pgrp);
-extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid 
*pid);
+extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
+   bool expedite);
 extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
const struct cred *);
 extern int kill_pgrp(struct pid *pid, int sig, int priv);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 9702016734b1..34b7852aa4a0 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -446,8 +446,17 @@ int __save_altstack(stack_t __user *, unsigned long);
 } while (0);
 
 #ifdef CONFIG_PROC_FS
+
+/*
+ * SS_FLAGS values used in pidfd_send_signal:
+ *
+ * SS_EXPEDITE indicates desire to expedite the operation.
+ */
+#define SS_EXPEDITE0x0001
+
 struct seq_file;
 extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
-#endif
+
+#endif /* CONFIG_PROC_FS */
 
 #endif /* _LINUX_SIGNAL_H */
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index aea30530c472..27c66296e08e 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -720,7 +720,7 @@ static void __do_notify(struct mqueue_inode_info *info)
rcu_read_unlock();
 
kill_pid_info(info->notify.sigev_signo,
- _i, info->notify_owner);
+ _i, info->notify_owner, false);
break;
case SIGEV_THREAD:
set_cookie(info->notify_cookie, NOTIFY_WOKENUP);
diff --git a/kernel/signal.c b/kernel/signal.c
index f98448cf2def..02ed4332d17c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -1394,7 +1395,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo 
*info, struct pid *pgrp)
return success ? 0 : retval;
 }
 
-int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
+int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
+ bool expedite)
 {
int error = -ESRCH;
struct task_struct *p;
@@ -1402,8 +1404,17 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, 
struct pid *pid)
for (;;) {
rcu_read_lock();
p = pid_task(pid, PIDTYPE_PID);
-   if (p)
+   if (p) {
error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
+
+   /*
+* Ignore expedite_reclaim return value, it is best
+* effort only.
+*/
+   if (!error && expedite)
+   expedite_reclaim(p);
+   }
+
rcu_read_unlock();
if (likely(!p || error != -ESRCH))
return error;
@@ -1420,7 +1431,7 @@ static int kill_proc_info(int sig, struct kernel_siginfo 
*info, pid_t pid)
 {
int error;
rcu_read_lock();
-   error = kill_pid_info(sig, info, find_vpid(pid));
+   error = kill_pid_info(sig, info, find_vpid(pid), false);
rcu_read_unlock();
return error;
 }
@@ -1487,7 +1498,7 @@ static int kill_something_info(int sig, struct 
kernel_siginfo *info, pid_t pid)
 
if (pid > 0) {
rcu_read_lock();
-   ret = kill_pid_info(sig, info, find_vpid(pid));
+   ret = kill_pid_info(sig, info, find_vpid(pid), false);
rcu_read_unlock();
return ret;
}
@@ -1704,7 +1715,7 @@ EXPORT_SYMBOL(kill_pgrp);
 
 int kill_pid(struct pid *pid, int sig, int priv)
 {
-   return kill_pid_info(sig, __si_special(priv), pid);
+   return kill_pid_info(sig, __si_special(priv), pid, false);
 }
 EXPORT_SYMBOL(kill_pid);
 
@@ -3577,10

[RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects

The dentry slab cache is susceptible to internal fragmentation.  Now
that we have Slab Movable Objects we can attempt to defragment the
dcache.  Dentry objects are inherently _not_ relocatable however under
some conditions they can be free'd.  This is the same as shrinking the
dcache but instead of shrinking the whole cache we only attempt to free
those objects that are located in partially full slab pages.  There is
no guarantee that this will reduce the memory usage of the system, it is
a compromise between fragmented memory and total cache shrinkage with
the hope that some memory pressure can be alleviated.

This is implemented using the newly added Slab Movable Objects
infrastructure.  The dcache 'migration' function is intentionally _not_
called 'd_migrate' because we only free, we do not migrate.  Call it
'd_partial_shrink' to make explicit that no reallocation is done.

Implement isolate and 'migrate' functions for the dentry slab cache.

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 71 +
 1 file changed, 71 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 606cfca20d42..5c707ed9ab5a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "mount.h"
 
@@ -3068,6 +3069,74 @@ void d_tmpfile(struct dentry *dentry, struct inode 
*inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+/*
+ * d_isolate() - Dentry isolation callback function.
+ * @s: The dentry cache.
+ * @v: Vector of pointers to the objects to isolate.
+ * @nr: Number of objects in @v.
+ *
+ * The slab allocator is holding off frees. We can safely examine
+ * the object without the danger of it vanishing from under us.
+ */
+static void *d_isolate(struct kmem_cache *s, void **v, int nr)
+{
+   struct dentry *dentry;
+   int i;
+
+   for (i = 0; i < nr; i++) {
+   dentry = v[i];
+   __dget(dentry);
+   }
+
+   return NULL;/* No need for private data */
+}
+
+/*
+ * d_partial_shrink() - Dentry migration callback function.
+ * @s: The dentry cache.
+ * @v: Vector of pointers to the objects to migrate.
+ * @nr: Number of objects in @v.
+ * @node: The NUMA node where new object should be allocated.
+ * @private: Returned by d_isolate() (currently %NULL).
+ *
+ * Dentry objects _can not_ be relocated and shrinking the whole dcache
+ * can be expensive.  This is an effort to free dentry objects that are
+ * stopping slab pages from being free'd without clearing the whole dcache.
+ *
+ * This callback is called from the SLUB allocator object migration
+ * infrastructure in attempt to free up slab pages by freeing dentry
+ * objects from partially full slabs.
+ */
+static void d_partial_shrink(struct kmem_cache *s, void **v, int nr,
+ int node, void *_unused)
+{
+   struct dentry *dentry;
+   LIST_HEAD(dispose);
+   int i;
+
+   for (i = 0; i < nr; i++) {
+   dentry = v[i];
+   spin_lock(>d_lock);
+   dentry->d_lockref.count--;
+
+   if (dentry->d_lockref.count > 0 ||
+   dentry->d_flags & DCACHE_SHRINK_LIST) {
+   spin_unlock(>d_lock);
+   continue;
+   }
+
+   if (dentry->d_flags & DCACHE_LRU_LIST)
+   d_lru_del(dentry);
+
+   d_shrink_add(dentry, );
+
+   spin_unlock(>d_lock);
+   }
+
+   if (!list_empty())
+   shrink_dentry_list();
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3113,6 +3182,8 @@ static void __init dcache_init(void)
   sizeof_field(struct dentry, d_iname),
   dcache_ctor);
 
+   kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink);
+
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
return;
-- 
2.21.0

Re: crypto: Kernel memory overwrite attempt detected to spans multiple pages

2019-04-10 Thread Rik van Riel

On Wed, 2019-04-10 at 16:11 -0700, Eric Biggers wrote:

> You've explained *what* it does again, but not *why*.  *Why* do you
> want
> hardened usercopy to detect copies across page boundaries, when there
> is no
> actual buffer overflow?

When some subsystem in the kernel allocates multiple
pages without _GFP_COMP, there is no way afterwards
to detect exactly how many pages it allocated.

In other words, there is no way to see how large the
buffer is, nor whether the copy operation in question
would overflow it.

-- 
All Rights Reversed.

signature.asc
Description: This is a digitally signed message part

[RFC PATCH v3 13/15] dcache: Provide a dentry constructor

In order to support object migration on the dentry cache we need to have
a determined object state at all times. Without a constructor the object
would have a random state after allocation.

Provide a dentry constructor.

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index aac41adf4743..606cfca20d42 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1603,6 +1603,16 @@ void d_invalidate(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_invalidate);
 
+static void dcache_ctor(void *p)
+{
+   struct dentry *dentry = p;
+
+   /* Mimic lockref_mark_dead() */
+   dentry->d_lockref.count = -128;
+
+   spin_lock_init(>d_lock);
+}
+
 /**
  * __d_alloc   -   allocate a dcache entry
  * @sb: filesystem it will belong to
@@ -1658,7 +1668,7 @@ struct dentry *__d_alloc(struct super_block *sb, const 
struct qstr *name)
 
dentry->d_lockref.count = 1;
dentry->d_flags = 0;
-   spin_lock_init(>d_lock);
+
seqcount_init(>d_seq);
dentry->d_inode = NULL;
dentry->d_parent = dentry;
@@ -3091,14 +3101,17 @@ static void __init dcache_init_early(void)
 
 static void __init dcache_init(void)
 {
-   /*
-* A constructor could be added for stable state like the lists,
-* but it is probably not worth it because of the cache nature
-* of the dcache.
-*/
-   dentry_cache = KMEM_CACHE_USERCOPY(dentry,
-   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
-   d_iname);
+   slab_flags_t flags =
+   SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | SLAB_MEM_SPREAD | 
SLAB_ACCOUNT;
+
+   dentry_cache =
+   kmem_cache_create_usercopy("dentry",
+  sizeof(struct dentry),
+  __alignof__(struct dentry),
+  flags,
+  offsetof(struct dentry, d_iname),
+  sizeof_field(struct dentry, d_iname),
+  dcache_ctor);
 
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
-- 
2.21.0

Re: [v7 1/3] dt-bindings: ahci-fsl-qoriq: add lx2160a chip name to the list

On Tue, Mar 12, 2019 at 09:50:17AM +0800, Peng Ma wrote:
> Add lx2160a compatible to bindings documentation.
> 
> Signed-off-by: Peng Ma 
> Reviewed-by: Rob Herring 

I assume that the bindings will go via AHCI tree.  Otherwise, please let
me know.

Shawn

[RFC PATCH v3 15/15] dcache: Add CONFIG_DCACHE_SMO

In an attempt to make the SMO patchset as non-invasive as possible add a
config option CONFIG_DCACHE_SMO (under "Memory Management options") for
enabling SMO for the DCACHE.  Whithout this option dcache constructor is
used but no other code is built in, with this option enabled slab
mobility is enabled and the isolate/migrate functions are built in.

Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache via
Slab Movable Objects infrastructure.

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 4 
 mm/Kconfig  | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5c707ed9ab5a..5ef68b78b457 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3069,6 +3069,7 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+#ifdef CONFIG_DCACHE_SMO
 /*
  * d_isolate() - Dentry isolation callback function.
  * @s: The dentry cache.
@@ -3136,6 +3137,7 @@ static void d_partial_shrink(struct kmem_cache *s, void 
**v, int nr,
if (!list_empty())
shrink_dentry_list();
 }
+#endif /* CONFIG_DCACHE_SMO */
 
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
@@ -3182,7 +3184,9 @@ static void __init dcache_init(void)
   sizeof_field(struct dentry, d_iname),
   dcache_ctor);
 
+#ifdef CONFIG_DCACHE_SMO
kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink);
+#endif
 
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
diff --git a/mm/Kconfig b/mm/Kconfig
index 47040d939f3b..92fc27ad3472 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -265,6 +265,13 @@ config SMO_NODE
help
  On NUMA systems enable moving objects to and from a specified node.
 
+config DCACHE_SMO
+   bool "Enable Slab Movable Objects for the dcache"
+   depends on SLUB
+   help
+ Under memory pressure we can try to free dentry slab cache objects 
from
+ the partial slab list if this is enabled.
+
 config PHYS_ADDR_T_64BIT
def_bool 64BIT
 
-- 
2.21.0

[RFC PATCH v3 08/15] tools/testing/slab: Add object migration test suite