Re: [PATCH RESEND] Documentation: filesystems: update filesystem locking documentation

2018-05-25 Thread Al Viro
On Wed, May 23, 2018 at 10:29:10PM -0400, Sean Anderson wrote:
> Documentation/filesystems/Locking no longer reflects current locking
> semantics. i_mutex is no longer used for locking, and has been superseded
> by i_rwsem. Additionally, ->iterate_shared() was not documented.

Your mission, should you choose to accept it, shall be to locate the
old sig regarding the usual reaction to use of Quoted-Printable...

IOW, fix your mail setup.  Applied, but...
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs

2018-05-25 Thread Alexei Starovoitov
On Fri, May 25, 2018 at 06:50:09PM +0200, Eugene Syromiatnikov wrote:
> On Thu, May 24, 2018 at 04:34:51PM -0700, Alexei Starovoitov wrote:
> > On Thu, May 24, 2018 at 09:41:08AM +0200, Jesper Dangaard Brouer wrote:
> > > On Wed, 23 May 2018 15:02:45 -0700
> > > Alexei Starovoitov  wrote:
> > > 
> > > > On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> > > > > Some BPF sysctl knobs affect the loading of BPF programs, and during
> > > > > system boot/init stages these sysctls are not yet configured.
> > > > > A concrete example is systemd, that has implemented loading of BPF
> > > > > programs.
> > > > > 
> > > > > Thus, to allow controlling these setting at early boot, this patch set
> > > > > adds the ability to change the default setting of these sysctl knobs
> > > > > as well as option to override them via a boot-time kernel parameter
> > > > > (in order to avoid rebuilding kernel each time a need of changing these
> > > > > defaults arises).
> > > > > 
> > > > > The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> > > > > net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.  
> > > > 
> > > > - systemd is root. today it only uses cgroup-bpf progs which require root,
> > > >   so disabling unpriv during boot time makes no difference to systemd.
> > > >   what is the actual reason to present time?
> systemd also runs a lot of code, some of which is unprivileged.

systemd processes sysctl configs first. It's essential for system
security to do so. If you have concerns in how systemd does that
please bring it up with systemd folks.

> > > > - say in the future systemd wants to use so_reuseport+bpf for faster
> > > >   networking. With unpriv disable during boot, it will force systemd
> > > >   to do such networking from root, which will lower its security barrier.
> No, it will force systemd not to use SO_REUSEPORT BPF.

sorry this argument makes no sense to me.

> > > > - bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
> > > >   Flipping it during the boot or right after or any time after
> > > >   is the same thing. Why add such boot flag then?
> Well, that one was for completeness.

Should we convert all sysctls to bootparams for 'completeness' ?

> > > > - jit_harden can be turned on by systemd. so turning it during the boot
> > > >   will make systemd progs to be constant blinded.
> > > >   Constant blinding protects kernel from unprivileged JIT spraying.
> > > >   Are you worried that systemd will attack the kernel with JIT spraying?
> I'm worried that systemd can be exploited for a JIT spraying attack.

I'm afraid we're not on the same page with definition of 'JIT spraying attack'.

> Another thing I'm concerned with is that the generated code is different,
> which introduces additional complication during debugging.

specifically what kind of complication?



Re: [PATCH V3 2/6] hwmon: Add support for RPi voltage sensor

2018-05-25 Thread Guenter Roeck
On Fri, May 25, 2018 at 09:24:35PM +0200, Stefan Wahren wrote:
> Currently there is no easy way to detect undervoltage conditions on a
> remote Raspberry Pi. This hwmon driver retrieves the state of the
> undervoltage sensor via the mailbox interface. The handling is based on
> Noralf's modifications to the downstream firmware driver. In case of
> an undervoltage condition only an entry is written to the kernel log.
> 
> CC: "Noralf Trønnes" 
> Signed-off-by: Stefan Wahren 

Acked-by: Guenter Roeck 

... assuming this will go through some arm tree.

> ---
>  Documentation/hwmon/raspberrypi-hwmon |  22 +
>  drivers/hwmon/Kconfig |  10 ++
>  drivers/hwmon/Makefile|   1 +
>  drivers/hwmon/raspberrypi-hwmon.c | 166 ++
>  4 files changed, 199 insertions(+)
>  create mode 100644 Documentation/hwmon/raspberrypi-hwmon
>  create mode 100644 drivers/hwmon/raspberrypi-hwmon.c
> 
> diff --git a/Documentation/hwmon/raspberrypi-hwmon b/Documentation/hwmon/raspberrypi-hwmon
> new file mode 100644
> index 0000000..3c92e2c
> --- /dev/null
> +++ b/Documentation/hwmon/raspberrypi-hwmon
> @@ -0,0 +1,22 @@
> +Kernel driver raspberrypi-hwmon
> +===============================
> +
> +Supported boards:
> +  * Raspberry Pi A+ (via GPIO on SoC)
> +  * Raspberry Pi B+ (via GPIO on SoC)
> +  * Raspberry Pi 2 B (via GPIO on SoC)
> +  * Raspberry Pi 3 B (via GPIO on port expander)
> +  * Raspberry Pi 3 B+ (via PMIC)
> +
> +Author: Stefan Wahren 
> +
> +Description
> +-----------
> +
> +This driver periodically polls a mailbox property of the VC4 firmware to detect
> +undervoltage conditions.
> +
> +Sysfs entries
> +-------------
> +
> +in0_lcrit_alarm  Undervoltage alarm
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index f10840a..fdaab82 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1298,6 +1298,16 @@ config SENSORS_PWM_FAN
> This driver can also be built as a module.  If so, the module
> will be called pwm-fan.
>  
> +config SENSORS_RASPBERRYPI_HWMON
> + tristate "Raspberry Pi voltage monitor"
> + depends on RASPBERRYPI_FIRMWARE || COMPILE_TEST
> + help
> +   If you say yes here you get support for voltage sensor on the
> +   Raspberry Pi.
> +
> +   This driver can also be built as a module. If so, the module
> +   will be called raspberrypi-hwmon.
> +
>  config SENSORS_SHT15
>   tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
>   depends on GPIOLIB || COMPILE_TEST
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index e7d52a3..a929770 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -141,6 +141,7 @@ obj-$(CONFIG_SENSORS_PC87427) += pc87427.o
>  obj-$(CONFIG_SENSORS_PCF8591)+= pcf8591.o
>  obj-$(CONFIG_SENSORS_POWR1220)  += powr1220.o
>  obj-$(CONFIG_SENSORS_PWM_FAN)+= pwm-fan.o
> +obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON)  += raspberrypi-hwmon.o
>  obj-$(CONFIG_SENSORS_S3C)+= s3c-hwmon.o
>  obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
>  obj-$(CONFIG_SENSORS_SCH5627)+= sch5627.o
> diff --git a/drivers/hwmon/raspberrypi-hwmon.c b/drivers/hwmon/raspberrypi-hwmon.c
> new file mode 100644
> index 0000000..fb4e4a6
> --- /dev/null
> +++ b/drivers/hwmon/raspberrypi-hwmon.c
> @@ -0,0 +1,166 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Raspberry Pi voltage sensor driver
> + *
> + * Based on firmware/raspberrypi.c by Noralf Trønnes
> + *
> + * Copyright (C) 2018 Stefan Wahren 
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define UNDERVOLTAGE_STICKY_BIT  BIT(16)
> +
> +struct rpi_hwmon_data {
> + struct device *hwmon_dev;
> + struct rpi_firmware *fw;
> + u32 last_throttled;
> + struct delayed_work get_values_poll_work;
> +};
> +
> +static void rpi_firmware_get_throttled(struct rpi_hwmon_data *data)
> +{
> + u32 new_uv, old_uv, value;
> + int ret;
> +
> + /* Request firmware to clear sticky bits */
> + value = 0xffff;
> +
> + ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
> + &value, sizeof(value));
> + if (ret) {
> + dev_err_once(data->hwmon_dev, "Failed to get throttled (%d)\n",
> +  ret);
> + return;
> + }
> +
> + new_uv = value & UNDERVOLTAGE_STICKY_BIT;
> + old_uv = data->last_throttled & UNDERVOLTAGE_STICKY_BIT;
> + data->last_throttled = value;
> +
> + if (new_uv == old_uv)
> + return;
> +
> + if (new_uv)
> + dev_crit(data->hwmon_dev, "Undervoltage detected!\n");
> + else
> + dev_info(data->hwmon_dev, "Voltage normalised\n");
> +
> + sysfs_notify(&data->hwmon_dev->kobj, NULL, "in0_lcrit_alarm");
> +}
> +
> +static void get_values_poll(struct work_struct *work)

[PATCH V3 5/6] ARM: multi_v7_defconfig: Enable RPi voltage sensor

2018-05-25 Thread Stefan Wahren
The patch enables the hwmon driver for the Raspberry Pi.

Signed-off-by: Stefan Wahren 
---
 arch/arm/configs/multi_v7_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
index 7b2283a..2af46fd 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -449,6 +449,7 @@ CONFIG_SENSORS_LM90=y
 CONFIG_SENSORS_LM95245=y
 CONFIG_SENSORS_NTC_THERMISTOR=m
 CONFIG_SENSORS_PWM_FAN=m
+CONFIG_SENSORS_RASPBERRYPI_HWMON=m
 CONFIG_SENSORS_INA2XX=m
 CONFIG_CPU_THERMAL=y
 CONFIG_IMX_THERMAL=y
-- 
2.7.4



[PATCH V3 3/6] firmware: raspberrypi: Register hwmon driver

2018-05-25 Thread Stefan Wahren
Since the raspberrypi-hwmon driver is tied to the VC4 firmware rather than
to particular hardware, its registration belongs in the firmware driver.

Signed-off-by: Stefan Wahren 
---
 drivers/firmware/raspberrypi.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/firmware/raspberrypi.c b/drivers/firmware/raspberrypi.c
index 6692888f..0602626 100644
--- a/drivers/firmware/raspberrypi.c
+++ b/drivers/firmware/raspberrypi.c
@@ -21,6 +21,8 @@
 #define MBOX_DATA28(msg)   ((msg) & ~0xf)
 #define MBOX_CHAN_PROPERTY 8
 
+static struct platform_device *rpi_hwmon;
+
 struct rpi_firmware {
struct mbox_client cl;
struct mbox_chan *chan; /* The property channel. */
@@ -183,6 +185,20 @@ rpi_firmware_print_firmware_revision(struct rpi_firmware *fw)
}
 }
 
+static void
+rpi_register_hwmon_driver(struct device *dev, struct rpi_firmware *fw)
+{
+   u32 packet;
+   int ret = rpi_firmware_property(fw, RPI_FIRMWARE_GET_THROTTLED,
+   &packet, sizeof(packet));
+
+   if (ret)
+   return;
+
+   rpi_hwmon = platform_device_register_data(dev, "raspberrypi-hwmon",
+ -1, NULL, 0);
+}
+
 static int rpi_firmware_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
@@ -209,6 +225,7 @@ static int rpi_firmware_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, fw);
 
rpi_firmware_print_firmware_revision(fw);
+   rpi_register_hwmon_driver(dev, fw);
 
return 0;
 }
@@ -217,6 +234,8 @@ static int rpi_firmware_remove(struct platform_device *pdev)
 {
struct rpi_firmware *fw = platform_get_drvdata(pdev);
 
+   platform_device_unregister(rpi_hwmon);
+   rpi_hwmon = NULL;
mbox_free_channel(fw->chan);
 
return 0;
-- 
2.7.4



[PATCH V3 2/6] hwmon: Add support for RPi voltage sensor

2018-05-25 Thread Stefan Wahren
Currently there is no easy way to detect undervoltage conditions on a
remote Raspberry Pi. This hwmon driver retrieves the state of the
undervoltage sensor via the mailbox interface. The handling is based on
Noralf's modifications to the downstream firmware driver. In case of
an undervoltage condition only an entry is written to the kernel log.
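For illustration only, the edge detection done in rpi_firmware_get_throttled() can be reduced to a small userspace function. This is a sketch, not driver code: the enum and the name uv_edge() are hypothetical; only the BIT(16) sticky mask and the compare-old-with-new logic come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same mask as UNDERVOLTAGE_STICKY_BIT (BIT(16)) in the driver. */
#define UNDERVOLTAGE_STICKY_BIT (1u << 16)

/* Hypothetical result type standing in for the dev_crit()/dev_info() calls. */
enum uv_event {
	UV_NO_CHANGE,
	UV_DETECTED,   /* driver logs "Undervoltage detected!" */
	UV_NORMALISED, /* driver logs "Voltage normalised" */
};

/*
 * Compare the sticky undervoltage bit of the newly polled value with the
 * previous one and remember the new value, as the driver does with
 * data->last_throttled. Only a transition produces an event, so a
 * persistent undervoltage is reported once rather than on every poll.
 */
static enum uv_event uv_edge(uint32_t *last_throttled, uint32_t value)
{
	uint32_t new_uv = value & UNDERVOLTAGE_STICKY_BIT;
	uint32_t old_uv = *last_throttled & UNDERVOLTAGE_STICKY_BIT;

	*last_throttled = value;

	if (new_uv == old_uv)
		return UV_NO_CHANGE;

	return new_uv ? UV_DETECTED : UV_NORMALISED;
}
```

Polling 0, then 0x00010001, then 0x00010000, then 0 gives UV_NO_CHANGE, UV_DETECTED, UV_NO_CHANGE and UV_NORMALISED.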

CC: "Noralf Trønnes" 
Signed-off-by: Stefan Wahren 
---
 Documentation/hwmon/raspberrypi-hwmon |  22 +
 drivers/hwmon/Kconfig |  10 ++
 drivers/hwmon/Makefile|   1 +
 drivers/hwmon/raspberrypi-hwmon.c | 166 ++
 4 files changed, 199 insertions(+)
 create mode 100644 Documentation/hwmon/raspberrypi-hwmon
 create mode 100644 drivers/hwmon/raspberrypi-hwmon.c

diff --git a/Documentation/hwmon/raspberrypi-hwmon b/Documentation/hwmon/raspberrypi-hwmon
new file mode 100644
index 0000000..3c92e2c
--- /dev/null
+++ b/Documentation/hwmon/raspberrypi-hwmon
@@ -0,0 +1,22 @@
+Kernel driver raspberrypi-hwmon
+===============================
+
+Supported boards:
+  * Raspberry Pi A+ (via GPIO on SoC)
+  * Raspberry Pi B+ (via GPIO on SoC)
+  * Raspberry Pi 2 B (via GPIO on SoC)
+  * Raspberry Pi 3 B (via GPIO on port expander)
+  * Raspberry Pi 3 B+ (via PMIC)
+
+Author: Stefan Wahren 
+
+Description
+-----------
+
+This driver periodically polls a mailbox property of the VC4 firmware to detect
+undervoltage conditions.
+
+Sysfs entries
+-------------
+
+in0_lcrit_alarmUndervoltage alarm
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index f10840a..fdaab82 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1298,6 +1298,16 @@ config SENSORS_PWM_FAN
  This driver can also be built as a module.  If so, the module
  will be called pwm-fan.
 
+config SENSORS_RASPBERRYPI_HWMON
+   tristate "Raspberry Pi voltage monitor"
+   depends on RASPBERRYPI_FIRMWARE || COMPILE_TEST
+   help
+ If you say yes here you get support for voltage sensor on the
+ Raspberry Pi.
+
+ This driver can also be built as a module. If so, the module
+ will be called raspberrypi-hwmon.
+
 config SENSORS_SHT15
tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
depends on GPIOLIB || COMPILE_TEST
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index e7d52a3..a929770 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -141,6 +141,7 @@ obj-$(CONFIG_SENSORS_PC87427)   += pc87427.o
 obj-$(CONFIG_SENSORS_PCF8591)  += pcf8591.o
 obj-$(CONFIG_SENSORS_POWR1220)  += powr1220.o
 obj-$(CONFIG_SENSORS_PWM_FAN)  += pwm-fan.o
+obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON)+= raspberrypi-hwmon.o
 obj-$(CONFIG_SENSORS_S3C)  += s3c-hwmon.o
 obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
 obj-$(CONFIG_SENSORS_SCH5627)  += sch5627.o
diff --git a/drivers/hwmon/raspberrypi-hwmon.c b/drivers/hwmon/raspberrypi-hwmon.c
new file mode 100644
index 0000000..fb4e4a6
--- /dev/null
+++ b/drivers/hwmon/raspberrypi-hwmon.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Raspberry Pi voltage sensor driver
+ *
+ * Based on firmware/raspberrypi.c by Noralf Trønnes
+ *
+ * Copyright (C) 2018 Stefan Wahren 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define UNDERVOLTAGE_STICKY_BITBIT(16)
+
+struct rpi_hwmon_data {
+   struct device *hwmon_dev;
+   struct rpi_firmware *fw;
+   u32 last_throttled;
+   struct delayed_work get_values_poll_work;
+};
+
+static void rpi_firmware_get_throttled(struct rpi_hwmon_data *data)
+{
+   u32 new_uv, old_uv, value;
+   int ret;
+
+   /* Request firmware to clear sticky bits */
+   value = 0xffff;
+
+   ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
+   &value, sizeof(value));
+   if (ret) {
+   dev_err_once(data->hwmon_dev, "Failed to get throttled (%d)\n",
+ret);
+   return;
+   }
+
+   new_uv = value & UNDERVOLTAGE_STICKY_BIT;
+   old_uv = data->last_throttled & UNDERVOLTAGE_STICKY_BIT;
+   data->last_throttled = value;
+
+   if (new_uv == old_uv)
+   return;
+
+   if (new_uv)
+   dev_crit(data->hwmon_dev, "Undervoltage detected!\n");
+   else
+   dev_info(data->hwmon_dev, "Voltage normalised\n");
+
+   sysfs_notify(&data->hwmon_dev->kobj, NULL, "in0_lcrit_alarm");
+}
+
+static void get_values_poll(struct work_struct *work)
+{
+   struct rpi_hwmon_data *data;
+
+   data = container_of(work, struct rpi_hwmon_data,
+   get_values_poll_work.work);
+
+   rpi_firmware_get_throttled(data);
+
+   /*
+* We can't run faster than the sticky shift (100ms) since we get
+* flipping in the sticky bits that are cleared.
+*/
+   schedu

[PATCH V3 4/6] ARM: bcm2835_defconfig: Enable RPi voltage sensor

2018-05-25 Thread Stefan Wahren
The patch enables the hwmon driver for the Raspberry Pi.

Signed-off-by: Stefan Wahren 
---
 arch/arm/configs/bcm2835_defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/configs/bcm2835_defconfig b/arch/arm/configs/bcm2835_defconfig
index e4d188f..e9bc889 100644
--- a/arch/arm/configs/bcm2835_defconfig
+++ b/arch/arm/configs/bcm2835_defconfig
@@ -86,7 +86,7 @@ CONFIG_SPI=y
 CONFIG_SPI_BCM2835=y
 CONFIG_SPI_BCM2835AUX=y
 CONFIG_GPIO_SYSFS=y
-# CONFIG_HWMON is not set
+CONFIG_SENSORS_RASPBERRYPI_HWMON=m
 CONFIG_THERMAL=y
 CONFIG_BCM2835_THERMAL=y
 CONFIG_WATCHDOG=y
-- 
2.7.4



[PATCH V3 1/6] ARM: bcm2835: Add GET_THROTTLED firmware property

2018-05-25 Thread Stefan Wahren
Recent Raspberry Pi firmware provides a mailbox property to detect
under-voltage conditions. Here is the current definition.

The u32 value returned by the firmware is divided into 2 parts:
  - lower 16-bits are the live value
  - upper 16-bits are the history or sticky value

  Bits:
  0: undervoltage
  1: arm frequency capped
  2: currently throttled
  16: undervoltage has occurred
  17: arm frequency capped has occurred
  18: throttling has occurred
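As a sketch of how a consumer might split the returned value, the userspace C below decodes the six documented bits. The macro, struct and function names are hypothetical; only the bit positions are taken from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Live bits (lower 16 bits); the sticky copies sit 16 bits higher. */
#define THR_UNDERVOLTAGE  (1u << 0)
#define THR_ARM_FREQ_CAP  (1u << 1)
#define THR_THROTTLED     (1u << 2)
#define THR_STICKY_SHIFT  16

struct throttled_state {
	bool undervoltage_now;
	bool arm_freq_capped_now;
	bool throttled_now;
	bool undervoltage_occurred;
	bool arm_freq_capped_occurred;
	bool throttling_occurred;
};

static struct throttled_state decode_throttled(uint32_t value)
{
	uint32_t live = value & 0xffffu;             /* bits 0..15: live value */
	uint32_t sticky = value >> THR_STICKY_SHIFT; /* bits 16..31: history */
	struct throttled_state s;

	/* Assigning a non-zero mask result to bool yields true. */
	s.undervoltage_now = live & THR_UNDERVOLTAGE;
	s.arm_freq_capped_now = live & THR_ARM_FREQ_CAP;
	s.throttled_now = live & THR_THROTTLED;
	s.undervoltage_occurred = sticky & THR_UNDERVOLTAGE;
	s.arm_freq_capped_occurred = sticky & THR_ARM_FREQ_CAP;
	s.throttling_occurred = sticky & THR_THROTTLED;
	return s;
}
```

For example, 0x00050001 decodes as "undervoltage now" plus "undervoltage has occurred" and "throttling has occurred" (bits 16 and 18).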

Signed-off-by: Stefan Wahren 
---
 include/soc/bcm2835/raspberrypi-firmware.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/soc/bcm2835/raspberrypi-firmware.h b/include/soc/bcm2835/raspberrypi-firmware.h
index 8ee8991..c4a5c9e 100644
--- a/include/soc/bcm2835/raspberrypi-firmware.h
+++ b/include/soc/bcm2835/raspberrypi-firmware.h
@@ -75,6 +75,7 @@ enum rpi_firmware_property_tag {
RPI_FIRMWARE_GET_EDID_BLOCK = 0x00030020,
RPI_FIRMWARE_GET_CUSTOMER_OTP =   0x00030021,
RPI_FIRMWARE_GET_DOMAIN_STATE =   0x00030030,
+   RPI_FIRMWARE_GET_THROTTLED =  0x00030046,
RPI_FIRMWARE_SET_CLOCK_STATE =0x00038001,
RPI_FIRMWARE_SET_CLOCK_RATE = 0x00038002,
RPI_FIRMWARE_SET_VOLTAGE =0x00038003,
-- 
2.7.4



[PATCH V3 6/6] arm64: defconfig: Enable RPi voltage sensor

2018-05-25 Thread Stefan Wahren
The patch enables the hwmon driver for the Raspberry Pi.

Signed-off-by: Stefan Wahren 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 3299505..e5c7198 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -353,6 +353,7 @@ CONFIG_BATTERY_BQ27XXX=y
 CONFIG_SENSORS_ARM_SCPI=y
 CONFIG_SENSORS_LM90=m
 CONFIG_SENSORS_INA2XX=m
+CONFIG_SENSORS_RASPBERRYPI_HWMON=m
 CONFIG_THERMAL_GOV_POWER_ALLOCATOR=y
 CONFIG_CPU_THERMAL=y
 CONFIG_THERMAL_EMULATION=y
-- 
2.7.4



[PATCH V3 0/6] hwmon: Add support for Raspberry Pi voltage sensor

2018-05-25 Thread Stefan Wahren
A common issue for the Raspberry Pi is an inadequate power supply. 
Noralf Trønnes started a discussion [1] about writing such undervoltage
conditions into the kernel log.

Changes in V3:
- rebase
- simplify probing

Changes in V2:
- simplified Kconfig dependency suggested by Robin Murphy
- replace dt-binding by probing from firmware driver
- add hwmon documentation
- minor improvements suggested by Guenter Roeck

[1] - https://github.com/raspberrypi/linux/issues/2367

Stefan Wahren (6):
  ARM: bcm2835: Add GET_THROTTLED firmware property
  hwmon: Add support for RPi voltage sensor
  firmware: raspberrypi: Register hwmon driver
  ARM: bcm2835_defconfig: Enable RPi voltage sensor
  ARM: multi_v7_defconfig: Enable RPi voltage sensor
  arm64: defconfig: Enable RPi voltage sensor

 Documentation/hwmon/raspberrypi-hwmon  |  22 
 arch/arm/configs/bcm2835_defconfig |   2 +-
 arch/arm/configs/multi_v7_defconfig|   1 +
 arch/arm64/configs/defconfig   |   1 +
 drivers/firmware/raspberrypi.c |  19 
 drivers/hwmon/Kconfig  |  10 ++
 drivers/hwmon/Makefile |   1 +
 drivers/hwmon/raspberrypi-hwmon.c  | 166 +
 include/soc/bcm2835/raspberrypi-firmware.h |   1 +
 9 files changed, 222 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/hwmon/raspberrypi-hwmon
 create mode 100644 drivers/hwmon/raspberrypi-hwmon.c

-- 
2.7.4



[PATCH v3 0/6] arm64: untag user pointers passed to the kernel

2018-05-25 Thread Andrey Konovalov
arm64 has a feature called Top Byte Ignore, which allows embedding pointer
tags in the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.

This patchset makes a few of the kernel interfaces accept tagged user
pointers. The kernel is already able to handle user faults with tagged
pointers and has the untagged_addr macro, which this patchset reuses.
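With Top Byte Ignore the hardware disregards bits 63:56 of the address, and arm64's untagged_addr() recovers the plain address with sign_extend64(addr, 55), i.e. it copies bit 55 into the top byte. A userspace sketch of that arithmetic follows; sign_extend64_sketch() is a hypothetical reimplementation of the kernel helper and relies on GCC/Clang behaviour for out-of-range signed conversion and arithmetic right shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace version of the kernel's sign_extend64(): treat bit `index`
 * as the sign bit and replicate it into every higher bit. Relies on
 * GCC/Clang semantics for the int64_t conversion and right shift.
 */
static int64_t sign_extend64_sketch(uint64_t value, int index)
{
	int shift = 63 - index;

	assert(index >= 0 && index < 64);
	return (int64_t)(value << shift) >> shift;
}

/*
 * Drop a TBI tag: bits 63..56 become copies of bit 55. User addresses
 * have bit 55 clear, so their top byte becomes 0; an address whose top
 * byte is 0xff and bit 55 set (as kernel addresses are) is unchanged.
 */
static uint64_t untag_user_addr(uint64_t addr)
{
	return (uint64_t)sign_extend64_sketch(addr, 55);
}
```

So 0x2a00007fdeadbeef untags to 0x0000007fdeadbeef, while 0xff80000000001000 comes back unchanged.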

We're not trying to cover all possible ways the kernel accepts user
pointers in one patchset, so this one should be considered as a start.

Thanks!

[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.

Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.

Changes in v1:
- Rebased onto 4.17-rc1.

Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
  defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped “mm, arm64: untag user addresses in memory syscalls”.
- Rebased onto 3eb2ce82 (4.16-rc7).

Andrey Konovalov (6):
  arm64: add type casts to untagged_addr macro
  uaccess: add untagged_addr definition for other arches
  arm64: untag user addresses in access_ok and __uaccess_mask_ptr
  mm, arm64: untag user addresses in mm/gup.c
  lib, arm64: untag addrs passed to strncpy_from_user and strnlen_user
  arm64: update Documentation/arm64/tagged-pointers.txt

 Documentation/arm64/tagged-pointers.txt |  5 +++--
 arch/arm64/include/asm/uaccess.h| 14 +-
 include/linux/uaccess.h |  4 
 lib/strncpy_from_user.c |  2 ++
 lib/strnlen_user.c  |  2 ++
 mm/gup.c|  4 
 6 files changed, 24 insertions(+), 7 deletions(-)

-- 
2.17.0.921.gf22659ad46-goog



[PATCH v3 1/6] arm64: add type casts to untagged_addr macro

2018-05-25 Thread Andrey Konovalov
This patch makes the untagged_addr macro accept all kinds of address types
(void *, unsigned long, etc.) and removes the need for type casts at each
place where it is used. This is done by using __typeof__.
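The effect of the __typeof__ cast can be seen in a small userspace sketch (GCC/Clang and a 64-bit host assumed). Here sign_extend64() is a local stand-in for the kernel helper and uint64_t replaces __u64; otherwise the macro body has the same shape as the patched one.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's sign_extend64() (GCC/Clang semantics). */
static inline int64_t sign_extend64(uint64_t value, int index)
{
	int shift = 63 - index;

	return (int64_t)(value << shift) >> shift;
}

/*
 * Convert the argument to a 64-bit integer for the arithmetic, then cast
 * the result back to whatever type the caller passed in, so pointer and
 * integer callers both get their own type back without writing a cast.
 */
#define untagged_addr(addr) \
	((__typeof__(addr))sign_extend64((uint64_t)(addr), 55))
```

With the old definition, a char * argument would need a cast to an integer on the way in and another cast back to a pointer on the way out; with this one, `char *q = untagged_addr(p);` compiles as is.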

Signed-off-by: Andrey Konovalov 
---
 arch/arm64/include/asm/uaccess.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e66b0fca99c2..2d6451cbaa86 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -102,7 +102,8 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
  * up with a tagged userland pointer. Clear the tag to get a sane pointer to
  * pass on to access_ok(), for instance.
  */
-#define untagged_addr(addr)sign_extend64(addr, 55)
+#define untagged_addr(addr)\
+   ((__typeof__(addr))sign_extend64((__u64)(addr), 55))
 
 #define access_ok(type, addr, size)__range_ok(addr, size)
 #define user_addr_max  get_fs
-- 
2.17.0.921.gf22659ad46-goog



[PATCH v3 3/6] arm64: untag user addresses in access_ok and __uaccess_mask_ptr

2018-05-25 Thread Andrey Konovalov
copy_from_user (and a few other similar functions) are used to copy data
from user memory into kernel memory or vice versa. Since a user can
provide a tagged pointer to one of the syscalls that use copy_from_user,
we need to correctly handle such pointers.

Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr,
before performing access validity checks.
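The asm in __uaccess_mask_ptr() is compact, so here is a rough C model of what it computes after this patch. It is only a sketch: the real code uses bics/csel precisely so that no conditional branch can be mispredicted, which a C ternary does not guarantee. The function names and the 48-bit addr_limit used in the example are assumptions, and the `& ~addr_limit` test only works as a range check because addr_limit is assumed to be an all-ones mask of the form 2^n - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Clear a TBI tag by copying bit 55 into bits 63..56 (GCC/Clang semantics). */
static uint64_t untag(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 8) >> 8);
}

/*
 * Rough C model of the patched __uaccess_mask_ptr():
 *   bics xzr, untagged, limit  -> Z set iff (untagged & ~limit) == 0
 *   csel safe, ptr, xzr, eq    -> keep the ORIGINAL (tagged) ptr, else 0
 * The tag is stripped only for the check; the returned pointer keeps it,
 * since TBI lets the hardware ignore the top byte on the actual access.
 */
static uint64_t mask_user_ptr(uint64_t ptr, uint64_t addr_limit)
{
	uint64_t untagged = untag(ptr);

	return (untagged & ~addr_limit) == 0 ? ptr : 0;
}
```

With a 48-bit limit, 0x2a00007fdeadbeef passes and is returned still tagged, whereas 0x0001000000000000 yields 0. Checking the tagged value directly, as before the patch, would have rejected the first one too.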

Signed-off-by: Andrey Konovalov 
---
 arch/arm64/include/asm/uaccess.h | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 2d6451cbaa86..fa7318d3d7d5 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -105,7 +105,8 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
 #define untagged_addr(addr)\
((__typeof__(addr))sign_extend64((__u64)(addr), 55))
 
-#define access_ok(type, addr, size)__range_ok(addr, size)
+#define access_ok(type, addr, size)\
+   __range_ok(untagged_addr(addr), size)
 #define user_addr_max  get_fs
 
 #define _ASM_EXTABLE(from, to) \
@@ -237,7 +238,8 @@ static inline void uaccess_enable_not_uao(void)
 
 /*
  * Sanitise a uaccess pointer such that it becomes NULL if above the
- * current addr_limit.
+ * current addr_limit. In case the pointer is tagged (has the top byte set),
+ * untag the pointer before checking.
  */
 #define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr)
 static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
@@ -245,10 +247,11 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
void __user *safe_ptr;
 
asm volatile(
-   "   bicsxzr, %1, %2\n"
+   "   bicsxzr, %3, %2\n"
"   csel%0, %1, xzr, eq\n"
: "=&r" (safe_ptr)
-   : "r" (ptr), "r" (current_thread_info()->addr_limit)
+   : "r" (ptr), "r" (current_thread_info()->addr_limit),
+ "r" (untagged_addr(ptr))
: "cc");
 
csdb();
-- 
2.17.0.921.gf22659ad46-goog



[PATCH v3 4/6] mm, arm64: untag user addresses in mm/gup.c

2018-05-25 Thread Andrey Konovalov
mm/gup.c provides a kernel interface that accepts user addresses and
manipulates user pages directly (for example get_user_pages, which is used
by the futex syscall). Here we also need to handle the case of tagged user
pointers.

Add untagging to gup.c functions that use user pointers for vma lookup.

Signed-off-by: Andrey Konovalov 
---
 mm/gup.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 541904a7c60f..5d0e9715bab7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -650,6 +650,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (!nr_pages)
return 0;
 
+   start = untagged_addr(start);
+
VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
 
/*
@@ -804,6 +806,8 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct *vma;
int ret, major = 0;
 
+   address = untagged_addr(address);
+
if (unlocked)
fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 
-- 
2.17.0.921.gf22659ad46-goog



[PATCH v3 5/6] lib, arm64: untag addrs passed to strncpy_from_user and strnlen_user

2018-05-25 Thread Andrey Konovalov
strncpy_from_user and strnlen_user accept user addresses as arguments, and
do not go through the same path as copy_from_user and others, so here we
need to handle the case of tagged user addresses separately.

Untag user pointers passed to these functions.
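To see why the untagging has to happen before the range check, the arithmetic of the check in strncpy_from_user()/strnlen_user() can be modelled in userspace. The function names, the 48-bit max_addr and the sample addresses are assumptions for this sketch; the real functions go on to copy or scan the string, which is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Clear a TBI tag by copying bit 55 into bits 63..56 (GCC/Clang semantics). */
static uint64_t untag(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 8) >> 8);
}

/*
 * Model of the check at the top of strncpy_from_user(): the call only
 * proceeds when src_addr < max_addr and may then touch at most
 * max_addr - src_addr bytes; otherwise it fails with -EFAULT.
 */
static long long user_window(uint64_t src_addr, uint64_t max_addr)
{
	if (src_addr < max_addr)
		return (long long)(max_addr - src_addr);
	return -EFAULT;
}
```

A tagged address such as 0x2a00007fdeadbeef lies far above any user max_addr and is rejected with -EFAULT, while its untagged form passes the check.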

Signed-off-by: Andrey Konovalov 
---
 lib/strncpy_from_user.c | 2 ++
 lib/strnlen_user.c  | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index b53e1b5d80f4..97467cd2bc59 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -106,6 +106,8 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
if (unlikely(count <= 0))
return 0;
 
+   src = untagged_addr(src);
+
max_addr = user_addr_max();
src_addr = (unsigned long)src;
if (likely(src_addr < max_addr)) {
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 60d0bbda8f5e..8b5f56466e00 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -108,6 +108,8 @@ long strnlen_user(const char __user *str, long count)
if (unlikely(count <= 0))
return 0;
 
+   str = untagged_addr(str);
+
max_addr = user_addr_max();
src_addr = (unsigned long)str;
if (likely(src_addr < max_addr)) {
-- 
2.17.0.921.gf22659ad46-goog



[PATCH v3 6/6] arm64: update Documentation/arm64/tagged-pointers.txt

2018-05-25 Thread Andrey Konovalov
Add a note that work on passing tagged user pointers to the kernel via
syscalls has started, but might not be complete yet.

Signed-off-by: Andrey Konovalov 
---
 Documentation/arm64/tagged-pointers.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
index a25a99e82bb1..361481283f00 100644
--- a/Documentation/arm64/tagged-pointers.txt
+++ b/Documentation/arm64/tagged-pointers.txt
@@ -35,8 +35,9 @@ Using non-zero address tags in any of these locations may result in an
 error code being returned, a (fatal) signal being raised, or other modes
 of failure.
 
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
+Some initial work for supporting non-zero address tags passed to the
+kernel via system calls has been done, but the kernel doesn't provide
+any guarantees at this point. Using a non-zero address tag for sp is
 strongly discouraged.
 
 Programs maintaining a frame pointer and frame records that use non-zero
-- 
2.17.0.921.gf22659ad46-goog



[PATCH v3 2/6] uaccess: add untagged_addr definition for other arches

2018-05-25 Thread Andrey Konovalov
To allow arm64 syscalls to accept tagged pointers from userspace, we must
untag them when they are passed to the kernel. Since untagging is done in
generic parts of the kernel (like the mm subsystem), the untagged_addr
macro should be defined for all architectures.

Define it as a noop for all architectures other than arm64.
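The override pattern can be sketched in a single userspace file: an "arch" section may define untagged_addr first, and the generic #ifndef fallback applies only when it did not. SKETCH_ARCH_ARM64 is a hypothetical switch standing in for including the arm64 header. One deliberate difference from the hunk below: the sketch expands the fallback to (addr) rather than a bare addr, the usual macro hygiene that keeps an argument such as a + 1 grouped when the macro is used inside a larger expression.

```c
#include <assert.h>
#include <stdint.h>

/* --- stand-in for an arch header; only built with -DSKETCH_ARCH_ARM64 --- */
#ifdef SKETCH_ARCH_ARM64
static inline int64_t sign_extend64(uint64_t value, int index)
{
	int shift = 63 - index;

	return (int64_t)(value << shift) >> shift;
}
#define untagged_addr(addr) \
	((__typeof__(addr))sign_extend64((uint64_t)(addr), 55))
#endif

/* --- stand-in for include/linux/uaccess.h: generic no-op fallback --- */
#ifndef untagged_addr
#define untagged_addr(addr) (addr)
#endif
```

Built without SKETCH_ARCH_ARM64, untagged_addr(a + 1) * 2 evaluates to (a + 1) * 2; with a bare addr expansion it would become a + 1 * 2.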

Signed-off-by: Andrey Konovalov 
---
 include/linux/uaccess.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index efe79c1cdd47..c045b4eff95e 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -13,6 +13,10 @@
 
 #include 
 
+#ifndef untagged_addr
+#define untagged_addr(addr) addr
+#endif
+
 /*
  * Architectures should provide two primitives (raw_copy_{to,from}_user())
  * and get rid of their private instances of copy_{to,from}_user() and
-- 
2.17.0.921.gf22659ad46-goog



Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs

2018-05-25 Thread Eugene Syromiatnikov
On Thu, May 24, 2018 at 04:34:51PM -0700, Alexei Starovoitov wrote:
> On Thu, May 24, 2018 at 09:41:08AM +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 23 May 2018 15:02:45 -0700
> > Alexei Starovoitov  wrote:
> > 
> > > On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> > > > Some BPF sysctl knobs affect the loading of BPF programs, and during
> > > > system boot/init stages these sysctls are not yet configured.
> > > > A concrete example is systemd, that has implemented loading of BPF
> > > > programs.
> > > > 
> > > > Thus, to allow controlling these setting at early boot, this patch set
> > > > adds the ability to change the default setting of these sysctl knobs
> > > > as well as option to override them via a boot-time kernel parameter
> > > > (in order to avoid rebuilding kernel each time a need of changing these
> > > > defaults arises).
> > > > 
> > > > The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> > > > net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.  
> > > 
> > > - systemd is root. today it only uses cgroup-bpf progs which require root,
> > >   so disabling unpriv during boot time makes no difference to systemd.
> > >   what is the actual reason to present time?
systemd also runs a lot of code, some of which is unprivileged.

> > > - say in the future systemd wants to use so_reuseport+bpf for faster
> > >   networking. With unpriv disable during boot, it will force systemd
> > >   to do such networking from root, which will lower its security barrier.
No, it will force systemd not to use SO_REUSEPORT BPF.

> > > - bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
> > >   Flipping it during the boot or right after or any time after
> > >   is the same thing. Why add such boot flag then?
Well, that one was for completeness.

> > > - jit_harden can be turned on by systemd. so turning it during the boot
> > >   will make systemd progs to be constant blinded.
> > >   Constant blinding protects kernel from unprivileged JIT spraying.
> > >   Are you worried that systemd will attack the kernel with JIT spraying?
I'm worried that systemd can be exploited for a JIT spraying attack.

Another thing I'm concerned with is that the generated code is different,
which introduces additional complication during debugging.

> > I think you are missing that, we want the ability to change these
> > defaults in-order to avoid depending on /etc/sysctl.conf settings, and
> > that the these sysctl.conf setting happen too late.
> 
> What does it mean 'happens too late' ?
> Too late for what?
> sysctl.conf has plenty of system critical knobs like
> kernel.perf_event_paranoid, kernel.core_pattern, etc
> The behavior of the host is drastically different after sysctl config
> is applied.
> 
> > For example with jit_harden, there will be a difference between the
> > loaded BPF program that got loaded at boot-time with systemd (no
> > constant blinding) and when someone reloads that systemd service after
> > /etc/sysctl.conf have been evaluated and setting bpf_jit_harden (now
> > slower due to constant blinding).   This is inconsistent behavior.
> 
> net.core.bpf_jit_harden can be flipped back and forth at run-time,
> so bpf progs before and after will be either blinded or not.
> I don't see any inconsistency.

That can't be the reason to maintain that inconsistency.
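The ordering question can be made concrete: a service that loads BPF programs early could read the knob back at load time and record which setting its programs were JITed under. A minimal sketch, assuming procfs is mounted at /proc; `parse_sysctl_int()` and `read_sysctl_int()` are hypothetical helpers, not part of systemd or libbpf.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the contents of a single-integer sysctl file, e.g. "1\n".
 * Returns 0 on success, -1 if the text is not one integer. */
static int parse_sysctl_int(const char *buf, int *out)
{
	char *end;
	long val;

	assert(buf && out);
	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -1;
	if (val < INT_MIN || val > INT_MAX)
		return -1;
	*out = (int)val;
	return 0;
}

/* Read a knob such as /proc/sys/net/core/bpf_jit_harden.
 * Returns 0 on success, -1 if the file is missing or malformed. */
static int read_sysctl_int(const char *path, int *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_sysctl_int(buf, out);
}
```

Calling `read_sysctl_int("/proc/sys/net/core/bpf_jit_harden", &v)` once before and once after /etc/sysctl.conf is applied could show whether programs loaded at those two points were built under different hardening settings; the file may be absent on kernels built without the BPF JIT, in which case the helper returns -1.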


Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

2018-05-25 Thread Leo Yan
Hi Arnaldo, Rob,

On Fri, May 25, 2018 at 12:27:13PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 25, 2018 at 03:03:47PM +0100, Robert Walker escreveu:
> > Hi Leo,
> > 
> > Following the discussions from your reply to this with a simplified patch,
> > this version of the patch works better as you also need to emit a branch
> > sample when handling a CS_ETM_TRACE_ON packet to indicate the end of a block
> > of trace.

I will also follow the suggestion Rob made in another email:
"The deadbeefdeadbeef addresses are a bit ugly - these are just dummy
values emitted in the decoder layer - maybe these should be changed
to 0."

> > This patch does not break the output from perf inject to generate
> > instruction samples for AutoFDO, so I am happy with that.

Thanks for confirmation.

> > Regards
> > 
> > Rob
> > 
> > Reviewed-by: Robert Walker 
> 
> So, Leo, can you please resubmit, bumping the v2 to v3 (or the latest
> one, I haven't fully reread this thread) add this "Reviewed-by: Robert"
> tag and any other that people may have provided, so that I can merge it?

Sure!  I will respin the patch series as v3, following Rob's
suggestion and adding Rob's review tag.

BTW, I'd like to get an ack from Mathieu as well.  Mathieu is working on
CPU-wide tracing; I talked with him, and he will review the patch
series to check whether it conflicts with CPU-wide tracing.

[...]

Thanks,
Leo Yan


Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

2018-05-25 Thread Arnaldo Carvalho de Melo
Em Fri, May 25, 2018 at 03:03:47PM +0100, Robert Walker escreveu:
> Hi Leo,
> 
> Following the discussions from your reply to this with a simplified patch,
> this version of the patch works better as you also need to emit a branch
> sample when handling a CS_ETM_TRACE_ON packet to indicate the end of a block
> of trace.
> 
> This patch does not break the output from perf inject to generate
> instruction samples for AutoFDO, so I am happy with that.
> 
> Regards
> 
> Rob
> 
> Reviewed-by: Robert Walker 

So, Leo, can you please resubmit, bumping the v2 to v3 (or the latest
one, I haven't fully reread this thread) add this "Reviewed-by: Robert"
tag and any other that people may have provided, so that I can merge it?

Thanks,

- Arnaldo
 
> 
> On 22/05/18 09:39, Leo Yan wrote:
> > Hi Rob,
> > 
> > On Mon, May 21, 2018 at 12:27:42PM +0100, Robert Walker wrote:
> > > Hi Leo,
> > > 
> > > On 21/05/18 09:52, Leo Yan wrote:
> > > > Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> > > > traces") reworks the sample generation flow from CoreSight trace to
> > > > match the correct format so the perf report tool can display the
> > > > samples properly.  But the change has a side effect on packet
> > > > handling: it only generates samples when
> > > > 'prev_packet->last_instr_taken_branch' is true, which results in the
> > > > start tracing packet and exception packets being dropped.
> > > > 
> > > > This patch checks two extra conditions for complete samples:
> > > > 
> > > > - If 'prev_packet->sample_type' is zero we can use this condition to
> > > >   know that this is the start tracing packet; for this case, the start
> > > >   packet's end_addr is zero as well so we need to handle it in the
> > > >   function cs_etm__last_executed_instr();
> > > > 
> > > I think you also need to add something in to handle discontinuities in
> > > trace - for example it is possible to configure the ETM to only trace
> > > execution in specific code regions or to trace a few cycles every so
> > > often. In these cases, prev_packet->sample_type will not be zero, but
> > > whatever the previous packet was.  You will get a CS_ETM_TRACE_ON packet in
> > > such cases, generated by an I_TRACE_ON element in the trace stream.
> > > You also get this on exception return.
> > > 
> > > However, you should also keep the test for prev_packet->sample_type == 0
> > > as you may not see a CS_ETM_TRACE_ON when decoding a buffer that has
> > > wrapped.
> > Thanks for reviewing.  Let's dig into this issue in more detail,
> > especially the handling of the CS_ETM_TRACE_ON packet; I'd like to
> > divide it into two sub-cases.
> > 
> > - The first case is using the python script:
> > 
> >    I use the python script to analyze packets with the command below:
> >    ./perf script --itrace=ril128 -s arm-cs-trace-disasm.py -F cpu,event,ip,addr,sym -- -v -d objdump -k ./vmlinux
> > 
> >    What I observe is that when we pass the python script with the
> >    parameter '-s arm-cs-trace-disasm.py', the instruction tracing
> >    option '--itrace=ril128' isn't really used; the perf tool creates
> >    another new process to launch the python script and re-enters the
> >    cmd_script() function, but the second time cmd_script() is invoked
> >    for the python script execution, the option '--itrace=ril128' is
> >    dropped and only the parameters defined by the python script are
> >    valid.
> > 
> >    As a result, I can see that the variable
> >    'etmq->etm->synth_opts.last_branch' is always FALSE when running the
> >    python script.  So all CS_ETM_TRACE_ON packets will be ignored in
> >    the function cs_etm__flush().
> > 
> >    Even though the CS_ETM_TRACE_ON packets are not handled, the program
> >    flow still works well.  The reason is that, without interference
> >    from CS_ETM_TRACE_ON, the CS_ETM_RANGE packets can smoothly create
> >    an instruction range by ignoring the CS_ETM_TRACE_ON packet in the
> >    middle.
> > 
> >    Please see the example below.  In this example there are 3 packets:
> >    the first is a CS_ETM_RANGE packet labelled 'PACKET_1', which
> >    properly generates branch sample data with the previous packet, as
> >    expected; the second packet, PACKET_2, is CS_ETM_TRACE_ON, but
> >    'etmq->etm->synth_opts.last_branch' is false, so the function
> >    cs_etm__flush() doesn't handle it and skips the swap operation
> >    "etmq->prev_packet = tmp"; the third packet, PACKET_3, is a
> >    CS_ETM_RANGE packet, and we can see it smoothly creates a continuous
> >    instruction range between PACKET_1 and PACKET_3.
> > 
> >    cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0x08a5f79c end_addr=0x08a5f7bc last_instr_taken_branch=1
> >    PACKET_1: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0x08a5f858 end_addr=0x08a5f864 last_instr_taken_branch=1
> >    cs_etm__synth_branch_sample: ip=0x08a5f7b8 addr=0x

Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-25 Thread Waiman Long
On 05/25/2018 05:40 AM, Patrick Bellasi wrote:
> On 24-May 11:22, Waiman Long wrote:
>> On 05/24/2018 11:16 AM, Juri Lelli wrote:
>>> On 24/05/18 11:09, Waiman Long wrote:
 On 05/24/2018 10:36 AM, Juri Lelli wrote:
> On 17/05/18 16:55, Waiman Long wrote:
>
> [...]
>
>> +A parent cgroup cannot distribute all its CPUs to child
>> +scheduling domain cgroups unless its load balancing flag is
>> +turned off.
>> +
>> +  cpuset.sched.load_balance
>> +A read-write single value file which exists on non-root
>> +cpuset-enabled cgroups.  It is a binary value flag that accepts
>> +either "0" (off) or a non-zero value (on).  This flag is set
>> +by the parent and is not delegatable.
>> +
>> +When it is on, tasks within this cpuset will be load-balanced
>> +by the kernel scheduler.  Tasks will be moved from CPUs with
>> +high load to other CPUs within the same cpuset with less load
>> +periodically.
>> +
>> +When it is off, there will be no load balancing among CPUs on
>> +this cgroup.  Tasks will stay in the CPUs they are running on
>> +and will not be moved to other CPUs.
>> +
>> +The initial value of this flag is "1".  This flag is then
>> +inherited by child cgroups with cpuset enabled.  Its state
>> +can only be changed on a scheduling domain cgroup with no
>> +cpuset-enabled children.
> [...]
>
>> +	/*
>> +	 * On default hierachy, a load balance flag change is only allowed
>> +	 * in a scheduling domain with no child cpuset.
>> +	 */
>> +	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && balance_flag_changed &&
>> +	   (!is_sched_domain(cs) || css_has_online_children(&cs->css))) {
>> +		err = -EINVAL;
>> +		goto out;
>> +	}
> The rule is actually
>
>  - no child cpuset
>  - and it must be a scheduling domain
> I am always a bit confused by the usage of "scheduling domain", which
> overlaps with the SD concept from the scheduler standpoint.

It is supposed to mimic the scheduler's SD concept.

>
> AFAIU a cpuset sched domain is not guaranteed to be turned into an
> actual scheduler SD, am I wrong?
>
> If that's the case, why not disambiguate these two concepts by
> calling the cpuset one a "cpus partition" or possibly a "cpuset domain"?

Good point. Peter had a similar comment. I will probably change the name
and clarify it better in the documentation.

Cheers,
Longman



Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

2018-05-25 Thread Robert Walker

Hi Leo,

Following the discussions from your reply to this with a simplified 
patch, this version of the patch works better as you also need to emit a 
branch sample when handling a CS_ETM_TRACE_ON packet to indicate the end 
of a block of trace.


This patch does not break the output from perf inject to generate 
instruction samples for AutoFDO, so I am happy with that.


Regards

Rob

Reviewed-by: Robert Walker 


On 22/05/18 09:39, Leo Yan wrote:

Hi Rob,

On Mon, May 21, 2018 at 12:27:42PM +0100, Robert Walker wrote:

Hi Leo,

On 21/05/18 09:52, Leo Yan wrote:

Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
traces") reworks the sample generation flow from CoreSight trace to
match the correct format so the perf report tool can display the samples
properly.  But the change has a side effect on packet handling: it only
generates samples when 'prev_packet->last_instr_taken_branch' is true,
which results in the start tracing packet and exception packets being
dropped.

This patch checks two extra conditions for complete samples:

- If 'prev_packet->sample_type' is zero we can use this condition to
   know that this is the start tracing packet; for this case, the start
   packet's end_addr is zero as well so we need to handle it in the
   function cs_etm__last_executed_instr();


I think you also need to add something in to handle discontinuities in
trace - for example it is possible to configure the ETM to only trace
execution in specific code regions or to trace a few cycles every so
often. In these cases, prev_packet->sample_type will not be zero, but
whatever the previous packet was.  You will get a CS_ETM_TRACE_ON packet in
such cases, generated by an I_TRACE_ON element in the trace stream.
You also get this on exception return.

However, you should also keep the test for prev_packet->sample_type == 0
as you may not see a CS_ETM_TRACE_ON when decoding a buffer that has
wrapped.

Thanks for reviewing.  Let's dig into this issue in more detail,
especially the handling of the CS_ETM_TRACE_ON packet; I'd like to
divide it into two sub-cases.

- The first case is using the python script:

   I use the python script to analyze packets with the command below:
   ./perf script --itrace=ril128 -s arm-cs-trace-disasm.py -F cpu,event,ip,addr,sym -- -v -d objdump -k ./vmlinux

   What I observe is that when we pass the python script with the
   parameter '-s arm-cs-trace-disasm.py', the instruction tracing
   option '--itrace=ril128' isn't really used; the perf tool creates
   another new process to launch the python script and re-enters the
   cmd_script() function, but the second time cmd_script() is invoked
   for the python script execution, the option '--itrace=ril128' is
   dropped and only the parameters defined by the python script are
   valid.

   As a result, I can see that the variable
   'etmq->etm->synth_opts.last_branch' is always FALSE when running the
   python script.  So all CS_ETM_TRACE_ON packets will be ignored in
   the function cs_etm__flush().

   Even though the CS_ETM_TRACE_ON packets are not handled, the program
   flow still works well.  The reason is that, without interference
   from CS_ETM_TRACE_ON, the CS_ETM_RANGE packets can smoothly create
   an instruction range by ignoring the CS_ETM_TRACE_ON packet in the
   middle.

   Please see the example below.  In this example there are 3 packets:
   the first is a CS_ETM_RANGE packet labelled 'PACKET_1', which
   properly generates branch sample data with the previous packet, as
   expected; the second packet, PACKET_2, is CS_ETM_TRACE_ON, but
   'etmq->etm->synth_opts.last_branch' is false, so the function
   cs_etm__flush() doesn't handle it and skips the swap operation
   "etmq->prev_packet = tmp"; the third packet, PACKET_3, is a
   CS_ETM_RANGE packet, and we can see it smoothly creates a continuous
   instruction range between PACKET_1 and PACKET_3.

   cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0x08a5f79c end_addr=0x08a5f7bc last_instr_taken_branch=1
   PACKET_1: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0x08a5f858 end_addr=0x08a5f864 last_instr_taken_branch=1
   cs_etm__synth_branch_sample: ip=0x08a5f7b8 addr=0x08a5f858 pid=2290 tid=2290 id=100021 stream_id=100021 period=1 cpu=1 flags=0 cpumode=2

   cs_etm__flush: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0x08a5f858 end_addr=0x08a5f864 last_instr_taken_branch=1
   PACKET_2: cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=2 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=1

   cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0x08a5f858 end_addr=0x08a5f864 last_instr_taken_branch=1
   PACKET_3: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=2 start_addr=0x08be7528 end_addr=0x08be7538 last_instr_taken_branch=1
   cs_etm__synt

Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets

2018-05-25 Thread Robert Walker

Hi Leo,


On 23/05/18 14:22, Leo Yan wrote:

Hi Rob,

On Wed, May 23, 2018 at 12:21:18PM +0100, Robert Walker wrote:

Hi Leo,

On 22/05/18 10:52, Leo Yan wrote:

On Tue, May 22, 2018 at 04:39:20PM +0800, Leo Yan wrote:

[...]

Rather than the patch I posted in my previous email, I think the new
patch below is more reasonable.

In the change below, 'etmq->prev_packet' is only used to store the
previous CS_ETM_RANGE packet; we don't need to save the CS_ETM_TRACE_ON
packet into 'etmq->prev_packet'.

On the other hand, cs_etm__flush() can use 'etmq->period_instructions'
to indicate whether an instruction sample needs to be generated.  If
it's non-zero, an instruction sample is generated and
'etmq->period_instructions' is cleared; so the next time another
CS_ETM_TRACE_ON packet arrives, generating an instruction sample is
skipped because 'etmq->period_instructions' is zero.

What do you think about this?

Thanks,
Leo Yan


I don't think this covers the cases where CS_ETM_TRACE_ON is used to
indicate a discontinuity in the trace.  For example, there is work in
progress to configure the ETM so that it only traces a few thousand cycles
with a gap of many thousands of cycles between each chunk of trace - this
can be used to sample program execution in the form of instruction events
with branch stacks for feedback directed optimization (AutoFDO).

In this case, the raw trace is something like:

 ...
 I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x007E7B886908;
 I_ATOM_F3 : Atom format 3.; EEN
 I_ATOM_F1 : Atom format 1.; E
# Trace stops here

# Some time passes, and then trace is turned on again
 I_TRACE_ON : Trace On.
 I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.;
Addr=0x0057224322F4; Ctxt: AArch64,EL0, NS;
 I_ATOM_F3 : Atom format 3.; ENN
 I_ATOM_F5 : Atom format 5.; ENENE
 ...

This results in the following packets from the decoder:

CS_ETM_RANGE: [0x7e7b886908-0x7e7b886930] br
CS_ETM_RANGE: [0x7e7b88699c-0x7e7b8869a4] br
CS_ETM_RANGE: [0x7e7b8869d8-0x7e7b8869f0]
CS_ETM_RANGE: [0x7e7b8869f0-0x7e7b8869fc] br
CS_ETM_TRACE_ON
CS_ETM_RANGE: [0x57224322f4-0x5722432304] br
CS_ETM_RANGE: [0x57224320e8-0x57224320ec]
CS_ETM_RANGE: [0x57224320ec-0x57224320f8]
CS_ETM_RANGE: [0x57224320f8-0x572243212c] br
CS_ETM_RANGE: [0x5722439b80-0x5722439bec]
CS_ETM_RANGE: [0x5722439bec-0x5722439c14] br
CS_ETM_RANGE: [0x5722437c30-0x5722437c6c]
CS_ETM_RANGE: [0x5722437c6c-0x5722437c7c] br

Without handling the CS_ETM_TRACE_ON, this would be interpreted as a branch
from 0x7e7b8869f8 to 0x57224322f4, when there is actually a gap of many
thousand instructions between these.

I think this patch will break the branch stacks - by removing the
prev_packet swap from cs_etm__flush(), the next time a CS_ETM_RANGE packet
is handled, cs_etm__sample() will see prev_packet contains the last
CS_ETM_RANGE from the previous block of trace, causing an erroneous call to
cs_etm__update_last_branch_rb().  In the example above, the branch stack
will contain an erroneous branch from 0x7e7b8869f8 to 0x57224322f4.

I think what you need to do is add a check for the previous packet being a
CS_ETM_TRACE_ON when determining the generate_sample value.

I can still see a hole in the packet handling with your suggestion;
let's focus on the three packets below:

CS_ETM_RANGE:[0x7e7b8869f0-0x7e7b8869fc] br
CS_ETM_TRACE_ON: [0xdeadbeefdeadbeef-0xdeadbeefdeadbeef]
CS_ETM_RANGE:[0x57224322f4-0x5722432304] br

When the CS_ETM_TRACE_ON packet arrives, cs_etm__flush() doesn't use
'etmq->prev_packet' to generate a branch sample; as a result we miss
the information for 0x7e7b8869fc, and with the packet swap
'etmq->prev_packet' is set to the CS_ETM_TRACE_ON packet.

When the last CS_ETM_RANGE packet arrives, cs_etm__sample() combines
the values from the CS_ETM_TRACE_ON packet and the last CS_ETM_RANGE
packet to generate a branch sample; in the end we get the sample
packets below:

   packet(n):   sample::addr=0x7e7b8869f0
   packet(n+1): sample::ip=0xdeadbeefdeadbeeb sample::addr=0x57224322f4

So I think we also need to generate a branch sample here, which gives
the results below:

   packet(n):   sample::addr=0x7e7b8869f0
   packet(n+1): sample::ip=0x7e7b8869f8 sample::addr=0xdeadbeefdeadbeef
   packet(n+2): sample::ip=0xdeadbeefdeadbeeb sample::addr=0x57224322f4

We can then rely on the address range
[0xdeadbeefdeadbeef..0xdeadbeefdeadbeeb] to tell that there is a
discontinuity in the trace.

Yes, I agree you need the extra branch sample from cs_etm__flush().
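The sample sequence described above can be reproduced with a deliberately simplified model of the packet loop. This is a hypothetical sketch, not perf's code: the packet and sample types are invented for illustration, it assumes fixed 4-byte A64 instructions (so the last executed instruction is end_addr - 4), and it treats 0xdeadbeefdeadbeef as the decoder's dummy address for a CS_ETM_TRACE_ON packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified packet model (not perf's structs). */
enum pkt_type { PKT_RANGE, PKT_TRACE_ON };

struct pkt {
	enum pkt_type type;
	uint64_t start_addr;          /* first instruction of the range */
	uint64_t end_addr;            /* address just past the last instruction */
	bool last_instr_taken_branch;
};

struct branch_sample {
	uint64_t ip;    /* last executed instruction of the previous packet */
	uint64_t addr;  /* start address of the current packet */
};

/* Assumes fixed 4-byte A64 instructions. */
static uint64_t last_executed_instr(const struct pkt *p)
{
	assert(p->end_addr >= 4);
	return p->end_addr - 4;
}

/*
 * Emit branch samples for a packet sequence; 'out' needs room for n - 1
 * entries. A TRACE_ON packet is not skipped: reaching it emits a sample
 * that ends at its dummy address (the extra sample from the flush path),
 * and it then becomes the previous packet, so the next range emits a
 * sample that starts from the marker. The gap therefore stays visible
 * instead of being reported as a direct branch between two ranges.
 */
static size_t synth_branch_samples(const struct pkt *pkts, size_t n,
				   struct branch_sample *out)
{
	const struct pkt *prev = NULL;
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		const struct pkt *cur = &pkts[i];

		if (prev && (cur->type == PKT_TRACE_ON ||
			     prev->type == PKT_TRACE_ON ||
			     prev->last_instr_taken_branch)) {
			out[count].ip = last_executed_instr(prev);
			out[count].addr = cur->start_addr;
			count++;
		}
		prev = cur;	/* the prev_packet swap */
	}
	return count;
}
```

Fed the three packets listed above, this model yields ip=0x7e7b8869f8 addr=0xdeadbeefdeadbeef followed by ip=0xdeadbeefdeadbeeb addr=0x57224322f4, i.e. packets (n+1) and (n+2) from the list; the marker address, rather than a fake branch between the two ranges, shows where the trace was discontinuous.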

With a discontinuity in trace, I get output from perf script like this:

branches:u:    59ee6e2e08 sqlite3VdbeExec (speedtest1) =>   59ee6e2e64 sqlite3VdbeExec (spe
branches:u:    59ee6e2e7c sqlite3VdbeExec (speedtest1) =>   59ee6e2eec sqlite3VdbeExec (spe
branches:u:    59ee6e2efc sqlite3VdbeExec (speedtest1) =>   59ee6e2f14 sqlite3VdbeExec (spe
branches:u:    59ee6e2f3c sqlite3

Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus

2018-05-25 Thread Juri Lelli
On 25/05/18 11:31, Patrick Bellasi wrote:

[...]

> Right, so the problem seems to be that we "need" to call
> arch_update_cpu_topology() and we do that by calling
> partition_sched_domains() which was initially introduced by:
> 
>029190c515f1 ("cpuset sched_load_balance flag")
> 
> back in 2007, where it's also quite well explained the reasons behind
> the sched_load_balance flag and the idea to have "partitioned" SDs.
> 
> I also (hopefully) understood that there are at least two actors involved:
> 
>  - A) arch code
>which creates SDs and SGs, usually to group CPUs depending on the
>memory hierarchy, to support different time granularity of load
>balancing operations
> 
>Special case here are HP and hibernation which, by on-/off-lining
>CPUs they directly affect the SDs/SGs definitions.
> 
>  - B) cpusets
>which expose to userspace the possibility to define,
>_if possible_, a finer granularity set of SGs to further restrict the
>scope of load balancing operations
> 
> Since B is a "possible finer granularity" refinement of A, then we
> trigger A's reconfigurations based on B's constraints.
> 
> That's why, for example, in consequence of an HP online event,
> we have:
> 
>    --- core.c ---
>    HP[sched:active]
>      | sched_cpu_activate()
>        | cpuset_cpu_active()
>    --- cpuset.c -
>          | cpuset_update_active_cpus()
>            | schedule_work(&cpuset_hotplug_work)
>               \.. System Kworker \
>                 | cpuset_hotplug_workfn()
>                   if (cpus_updated || force_rebuild)
>                     | rebuild_sched_domains()
>                       | rebuild_sched_domains_locked()
>                         | generate_sched_domains()
>    --- topology.c ---
>                         | partition_sched_domains()
>                           | arch_update_cpu_topology()
> 
> 
> IOW, we need to go through cpusets to rebuild the SDs whenever
> there are HP events or we "need" to do an arch_update_cpu_topology()
> via the arch topology driver (drivers/base/arch_topology.c).

I don't think the arch topology driver is always involved in this (e.g.,
arch/x86/kernel/itmt::sched_itmt_update_handler()).

Still we need to check if topology changed, as you say.

> This last bit is also interesting: whenever we detect arch topology
> information that requires an SD rebuild, we need to force a
> partition_sched_domains(). But, for that, in:
> 
>commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs")
> 
> we just introduced the support for the "force_rebuild" flag to be set.
> 
> Thus, potentially we can just extend the check I've proposed to consider the
> force rebuild flag, to be something like:
> 
> ---8<---
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 8f586e8bdc98..1f051fafaa3a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -874,11 +874,19 @@ static void rebuild_sched_domains_locked(void)
>  	    !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
>  		goto out;
>  
> +	/* Special case for the 99% of systems with one, full, sched domain */
> +	if (!force_rebuild &&
> +	    !top_cpuset.isolation_count &&
> +	    is_sched_load_balance(&top_cpuset))
> +		goto out;
> +	force_rebuild = false;
> +
>  	/* Generate domain masks and attrs */
>  	ndoms = generate_sched_domains(&doms, &attr);
>  
>  	/* Have scheduler rebuild the domains */
>  	partition_sched_domains(ndoms, doms, attr);
>  out:
>  	put_online_cpus();
> ---8<---
> 
> 
> Which would still allow using something like:
> 
>    cpuset_force_rebuild()
>    rebuild_sched_domains()
> 
> to actually rebuild the SDs as a consequence of arch topology changes.

That might work.

> 
> > 
> > Maybe we could move the check you are proposing into update_cpumasks_hier()?
> 
> Yes, that's another option... although there we are outside of
> get_online_cpus(). Could be a problem?

Mmm, using force_rebuild flag seems safer indeed.
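The force_rebuild variant of the fast path can be modeled in isolation to see which calls end up rebuilding the domains. A hypothetical user-space sketch: the globals stand in for the cpuset.c state referenced in the quoted diff, the `_model` functions are illustrative names, and a counter replaces the generate_sched_domains() plus partition_sched_domains() pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the cpuset.c state in the quoted diff. */
static bool force_rebuild;
static int isolation_count;
static bool top_load_balance = true;
static int rebuilds;	/* counts generate+partition passes in this model */

/* Models cpuset_force_rebuild(): request one unconditional rebuild. */
static void cpuset_force_rebuild_model(void)
{
	force_rebuild = true;
}

/* Models the proposed early exit in rebuild_sched_domains_locked(). */
static void rebuild_sched_domains_model(void)
{
	assert(isolation_count >= 0);

	/* One full sched domain and no forced rebuild: nothing to redo. */
	if (!force_rebuild && !isolation_count && top_load_balance)
		return;
	force_rebuild = false;
	rebuilds++;	/* generate_sched_domains() + partition_sched_domains() */
}
```

In this model the common case (load balancing on in the top cpuset, no isolated CPUs) skips the rebuild, a preceding force-rebuild request, as an arch-topology update would issue, still triggers exactly one rebuild, and turning off top-level load balancing makes every call rebuild again.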


Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus

2018-05-25 Thread Patrick Bellasi
Hi Juri,
following are some notes I took while trying to understand what's going on;
they could be useful to check whether I have a correct view of all the
different components and how they come together.

At the end there are also a couple of possible updates and a question on your
proposal.

Cheers Patrick

On 24-May 12:39, Juri Lelli wrote:
> On 24/05/18 10:04, Patrick Bellasi wrote:
> 
> [...]
> 
> > From 84bb8137ce79f74849d97e30871cf67d06d8d682 Mon Sep 17 00:00:00 2001
> > From: Patrick Bellasi 
> > Date: Wed, 23 May 2018 16:33:06 +0100
> > Subject: [PATCH 1/1] cgroup/cpuset: disable sched domain rebuild when not
> >  required
> > 
> > The generate_sched_domains() already addresses the "special case for 99%
> > of systems" which require a single full sched domain at the root,
> > spanning all the CPUs. However, the current support is based on an
> > expensive sequence of operations which destroy and recreate the exact
> > same scheduling domain configuration.
> > 
> > If we notice that:
> > 
> >  1) CPUs in "cpuset.isolcpus" are excluded from load balancing by the
> > isolcpus= kernel boot option, and will never be load balanced
> > regardless of the value of "cpuset.sched_load_balance" in any
> > cpuset.
> > 
> >  2) the root cpuset has load_balance enabled by default at boot and
> > it's the only parameter which userspace can change at run-time.
> > 
> > we know that, by default, every system comes up with a complete and
> > properly configured set of scheduling domains covering all the CPUs.
> > 
> > Thus, on every system, unless the user explicitly disables load balance
> > for the top_cpuset, the scheduling domains already configured at boot
> > time by the scheduler/topology code and updated in consequence of
> > hotplug events, are already properly configured for cpuset too.
> > 
> > This configuration is the default one for 99% of the systems,
> > and it's also the one used by most of the Android devices which never
> > disable load balance from the top_cpuset.
> > 
> > Thus, while load balance is enabled for the top_cpuset,
> > destroying/rebuilding the scheduling domains at every cpuset.cpus
> > reconfiguration is a useless operation which will always produce the
> > same result.
> > 
> > Let's move the "special case" optimization earlier, into:
> > 
> >rebuild_sched_domains_locked()
> > 
> > thus completely skipping the expensive:
> > 
> >generate_sched_domains()
> >partition_sched_domains()
> > 
> > for all the cases where we know that the scheduling domains already
> > defined will not be affected by any value of cpuset.cpus.
> 
> [...]
> 
> > +	/* Special case for the 99% of systems with one, full, sched domain */
> > +	if (!top_cpuset.isolation_count &&
> > +	    is_sched_load_balance(&top_cpuset))
> > +		goto out;
> > +
> 
> Mmm, looks like we still need to destroy and recreate if there is a
> new_topology (see arch_update_cpu_topology() in
> partition_sched_domains()).

Right, so the problem seems to be that we "need" to call
arch_update_cpu_topology() and we do that by calling
partition_sched_domains() which was initially introduced by:

   029190c515f1 ("cpuset sched_load_balance flag")

back in 2007, where the reasons behind the sched_load_balance flag and
the idea of having "partitioned" SDs are also explained quite well.

I also (hopefully) understood that there are at least two actors involved:

 - A) arch code
   which creates SDs and SGs, usually to group CPUs depending on the
   memory hierarchy, to support different time granularities of load
   balancing operations

   Special cases here are HP and hibernation which, by on-/off-lining
   CPUs, directly affect the SDs/SGs definitions.

 - B) cpusets
   which expose to userspace the possibility to define,
   _if possible_, a finer granularity set of SGs to further restrict the
   scope of load balancing operations

Since B is a "possible finer granularity" refinement of A, we
trigger A's reconfigurations based on B's constraints.

That's why, for example, as a consequence of an HP online event,
we have:

   --- core.c ---
   HP[sched:active]
     | sched_cpu_activate()
       | cpuset_cpu_active()
   --- cpuset.c -
         | cpuset_update_active_cpus()
           | schedule_work(&cpuset_hotplug_work)
              \.. System Kworker \
                | cpuset_hotplug_workfn()
                  if (cpus_updated || force_rebuild)
                    | rebuild_sched_domains()
                      | rebuild_sched_domains_locked()
                        | generate_sched_domains()
   --- topology.c ---
                        | partition_sched_domains()
                          | arch_update_cpu_topology()


IOW, we need to go through cpusets to rebuild the SDs whenever
there are HP events or we "need" to do an arch_update_cpu_topology()
via the arch topology driver (drivers/base/arch_topology.c).

This last bit is also interesting, whenever we detec

Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2

2018-05-25 Thread Patrick Bellasi
On 24-May 11:22, Waiman Long wrote:
> On 05/24/2018 11:16 AM, Juri Lelli wrote:
> > On 24/05/18 11:09, Waiman Long wrote:
> >> On 05/24/2018 10:36 AM, Juri Lelli wrote:
> >>> On 17/05/18 16:55, Waiman Long wrote:
> >>>
> >>> [...]
> >>>
>  +A parent cgroup cannot distribute all its CPUs to child
>  +scheduling domain cgroups unless its load balancing flag is
>  +turned off.
>  +
>  +  cpuset.sched.load_balance
>  +A read-write single value file which exists on non-root
>  +cpuset-enabled cgroups.  It is a binary value flag that accepts
>  +either "0" (off) or a non-zero value (on).  This flag is set
>  +by the parent and is not delegatable.
>  +
>  +When it is on, tasks within this cpuset will be load-balanced
>  +by the kernel scheduler.  Tasks will be moved from CPUs with
>  +high load to other CPUs within the same cpuset with less load
>  +periodically.
>  +
>  +When it is off, there will be no load balancing among CPUs on
>  +this cgroup.  Tasks will stay in the CPUs they are running on
>  +and will not be moved to other CPUs.
>  +
>  +The initial value of this flag is "1".  This flag is then
>  +inherited by child cgroups with cpuset enabled.  Its state
>  +can only be changed on a scheduling domain cgroup with no
>  +cpuset-enabled children.
> >>> [...]
> >>>
>  +/*
>  + * On default hierachy, a load balance flag change is only allowed
>  + * in a scheduling domain with no child cpuset.
>  + */
>  +if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && balance_flag_changed &&
>  +   (!is_sched_domain(cs) || css_has_online_children(&cs->css))) {
>  +err = -EINVAL;
>  +goto out;
>  +}
> >>> The rule is actually
> >>>
> >>>  - no child cpuset
> >>>  - and it must be a scheduling domain

I'm always a bit confused by the usage of "scheduling domain", which
overlaps with the SD concept from the scheduler standpoint.

AFAIU a cpuset sched domain is not guaranteed to be turned into an
actual scheduler SD, am I wrong?

If that's the case, why not disambiguate these two concepts by
calling the cpuset one a "cpus partition" or perhaps "cpuset domain"?

-- 
#include 

Patrick Bellasi
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Add parameter for disabling ACS redirection for P2P

2018-05-25 Thread Christian König

On 24.05.2018 at 23:48, Logan Gunthorpe wrote:

Hi,

As discussed in our PCI P2PDMA series, we'd like to add a kernel
parameter for selectively disabling ACS redirection for select
bridges. Seeing as this turned out to be a small series in itself, we've
decided to send this separately from the P2P work.

This series generalizes the code already written for the existing
resource_alignment option. The first patch creates a helper function
to match PCI devices against strings based on the code that already
existed in pci_specified_resource_alignment().

The second patch expands the new helper to optionally take a path of
PCI devfns. This is to address Alex's renumbering concern when using
simple bus-devfns. The implementation is essentially how he described it and
similar to the Intel VT-d spec (Section 8.3.1).

The final patch adds the disable_acs_redir kernel parameter which takes
a list of PCI devices and will disable the ACS P2P Request Redirect,
ACS P2P Completion Redirect and ACS P2P Egress Control bits for the
selected devices. This allows P2P traffic between selected bridges and,
since it's done at boot before the IOMMU groups are created, the IOMMU
groups will be created correctly based on the bits.

Thanks,

Logan


Logan Gunthorpe (3):
   PCI: Make specifying PCI devices in kernel parameters reusable
   PCI: Allow specifying devices using a base bus and path of devfns
   PCI: Introduce the disable_acs_redir parameter


Thanks a lot for taking care of it like that. It looks much cleaner to me
than just trying to disable ACS without a parameter.


Series is Acked-by: Christian König .

Thanks,
Christian.




  Documentation/admin-guide/kernel-parameters.txt |  39 ++-
  drivers/pci/pci.c   | 358 
  2 files changed, 336 insertions(+), 61 deletions(-)

--
2.11.0




Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-25 Thread Ravi Bangoria
Thanks Oleg for the review,

On 05/24/2018 09:56 PM, Oleg Nesterov wrote:
> On 04/17, Ravi Bangoria wrote:
>>
>> @@ -941,6 +1091,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
>>  if (ret)
>>  goto err_buffer;
>>  
>> +if (tu->ref_ctr_offset)
>> +sdt_increment_ref_ctr(tu);
>> +
> 
> iiuc, this is probe_event_enable()...
> 
> Looks racy, but afaics the race with uprobe_mmap() will be closed by the next
> change. However, it seems that probe_event_disable() can race with
> trace_uprobe_mmap() too and the next 7/9 patch won't help,
> 
>> +if (tu->ref_ctr_offset)
>> +sdt_decrement_ref_ctr(tu);
>> +
>>  uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
>>  tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
> 
> so what if trace_uprobe_mmap() comes right after uprobe_unregister() ?
> Note that trace_probe_is_enabled() is T until we update tp.flags.

Sure, I'll look at your comments.

Apart from these, I've also found a deadlock between uprobe_lock and
mm->mmap_sem. trace_uprobe_mmap() takes these locks in

   mm->mmap_sem
   uprobe_lock

order, but some other code path takes them in the reverse order. I've
included a sample lockdep warning at the end. The issue is that
mm->mmap_sem is not under the control of trace_uprobe_mmap(), and we
have to take uprobe_lock to loop over all trace_uprobes.

Any idea how this can be resolved?


Sample lockdep warning:

[  499.258006] ==
[  499.258205] WARNING: possible circular locking dependency detected
[  499.258409] 4.17.0-rc3+ #76 Not tainted
[  499.258528] --
[  499.258731] perf/6744 is trying to acquire lock:
[  499.258895] e4895f49 (uprobe_lock){+.+.}, at: trace_uprobe_mmap+0x78/0x130
[  499.259147]
[  499.259147] but task is already holding lock:
[  499.259349] 9ec93a76 (&mm->mmap_sem){}, at: vm_mmap_pgoff+0xe0/0x160
[  499.259597]
[  499.259597] which lock already depends on the new lock.
[  499.259597]
[  499.259848]
[  499.259848] the existing dependency chain (in reverse order) is:
[  499.260086]
[  499.260086] -> #4 (&mm->mmap_sem){}:
[  499.260277]__lock_acquire+0x53c/0x910
[  499.260442]lock_acquire+0xf4/0x2f0
[  499.260595]down_write_killable+0x6c/0x150
[  499.260764]copy_process.isra.34.part.35+0x1594/0x1be0
[  499.260967]_do_fork+0xf8/0x910
[  499.261090]ppc_clone+0x8/0xc
[  499.261209]
[  499.261209] -> #3 (&dup_mmap_sem){}:
[  499.261378]__lock_acquire+0x53c/0x910
[  499.261540]lock_acquire+0xf4/0x2f0
[  499.261669]down_write+0x6c/0x110
[  499.261793]percpu_down_write+0x48/0x140
[  499.261954]register_for_each_vma+0x6c/0x2a0
[  499.262116]uprobe_register+0x230/0x320
[  499.262277]probe_event_enable+0x1cc/0x540
[  499.262435]perf_trace_event_init+0x1e0/0x350
[  499.262587]perf_trace_init+0xb0/0x110
[  499.262750]perf_tp_event_init+0x38/0x90
[  499.262910]perf_try_init_event+0x10c/0x150
[  499.263075]perf_event_alloc+0xbb0/0xf10
[  499.263235]sys_perf_event_open+0x2a8/0xdd0
[  499.263396]system_call+0x58/0x6c
[  499.263516]
[  499.263516] -> #2 (&uprobe->register_rwsem){}:
[  499.263723]__lock_acquire+0x53c/0x910
[  499.263884]lock_acquire+0xf4/0x2f0
[  499.264002]down_write+0x6c/0x110
[  499.264118]uprobe_register+0x1ec/0x320
[  499.264283]probe_event_enable+0x1cc/0x540
[  499.264442]perf_trace_event_init+0x1e0/0x350
[  499.264603]perf_trace_init+0xb0/0x110
[  499.264766]perf_tp_event_init+0x38/0x90
[  499.264930]perf_try_init_event+0x10c/0x150
[  499.265092]perf_event_alloc+0xbb0/0xf10
[  499.265261]sys_perf_event_open+0x2a8/0xdd0
[  499.265424]system_call+0x58/0x6c
[  499.265542]
[  499.265542] -> #1 (event_mutex){+.+.}:
[  499.265738]__lock_acquire+0x53c/0x910
[  499.265896]lock_acquire+0xf4/0x2f0
[  499.266019]__mutex_lock+0xa0/0xab0
[  499.266142]trace_add_event_call+0x44/0x100
[  499.266310]create_trace_uprobe+0x4a0/0x8b0
[  499.266474]trace_run_command+0xa4/0xc0
[  499.266631]trace_parse_run_command+0xe4/0x200
[  499.266799]probes_write+0x20/0x40
[  499.266922]__vfs_write+0x6c/0x240
[  499.267041]vfs_write+0xd0/0x240
[  499.267166]ksys_write+0x6c/0x110
[  499.267295]system_call+0x58/0x6c
[  499.267413]
[  499.267413] -> #0 (uprobe_lock){+.+.}:
[  499.267591]validate_chain.isra.34+0xbd0/0x1000
[  499.267747]__lock_acquire+0x53c/0x910
[  499.267917]lock_acquire+0xf4/0x2f0
[  499.268048]__mutex_lock+0xa0/0xab0
[  499.268170]trace_uprobe_mmap+0x78/0x130
[  499.268335]uprobe_mmap+0x80/0x3b0
[  499.268464]mmap_region+0x290/0x660
[ 

Re: [PATCH v8 2/6] cpuset: Add new v2 cpuset.sched.domain flag

2018-05-25 Thread Peter Zijlstra
On Thu, May 24, 2018 at 02:53:31PM -0400, Waiman Long wrote:
> On 05/24/2018 11:41 AM, Peter Zijlstra wrote:
> > On Thu, May 17, 2018 at 04:55:41PM -0400, Waiman Long wrote:
> >> A new cpuset.sched.domain boolean flag is added to cpuset v2. This new
> >> flag indicates that the CPUs in the current cpuset should be treated
> >> as a separate scheduling domain.
> > The traditional name for this is a partition.
> 
> Do you want to call it cpuset.sched.partition? That name sounds strange
> to me.

Let me explore the whole domain x load-balance space first. I'm thinking
the two parameters are mostly redundant, but I might be overlooking
something (trivial or otherwise).

> >> +  cpuset.sched.domain
> >> +  A read-write single value file which exists on non-root
> >> +  cpuset-enabled cgroups.  It is a binary value flag that accepts
> >> +  either "0" (off) or a non-zero value (on).
> > I would be conservative and only allow 0/1.
> 
> I stated that because echoing other integer value like 2 into the flag
> file won't return any error. I will modify it to say just 0 and 1.

Ah, I would make the file error on >1.

Because then you can always extend the meaning afterwards, since you
know it won't have been written to with the new value.

> >> +  3) There is no child cgroups with cpuset enabled.
> >> +
> >> +  Setting this flag will take the CPUs away from the effective
> >> +  CPUs of the parent cgroup. Once it is set, this flag cannot be
> >> +  cleared if there are any child cgroups with cpuset enabled.
> > This I'm not clear on. Why?
> >
> That is for pragmatic reason as it is easier to code this way. We could
> remove this restriction but that will make the code more complex.

Best to mention that I think.